Has anyone made any progress on this? We're in the middle of a reading week, so Mac use has been lower than normal.
Interestingly, we're currently trying the Nexthink monitoring solution, and it has highlighted that all of our AD-bound Macs throw up regular connection failures to our AD domain controllers. I've enabled more verbose logging on the opendirectoryd service, but so far nothing jumps out.
@allanp81 No progress here; it has actually gotten worse, and we've started seeing it on a lot of MacBook Airs.
What's odd, though, is that if you leave a MacBook Air stuck at startup, it will go to sleep, and when you then restart it, it comes back up just fine.
We've noticed that on our machines seeing the issue, lookups return different IP addresses depending on whether we query the hostname or the FQDN.
Having done a verbose boot, we see the same kauth timeout message; it's the last thing that appears, and the machine loads no further.
Could the disparity between the IPs be causing this? A lookup on our AD domain returns 5 IP addresses, so maybe it's hitting a different AD server each time and that's causing the randomness at startup?
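One quick way to test the lookup theory is to resolve both names from an affected Mac and diff the results. Below is a minimal Python sketch using only the standard-library resolver; the script name and host names are placeholders, and it only checks plain address lookups. As far as I know, the AD plugin locates domain controllers via DNS SRV records and site information, so matching results here wouldn't rule out a DC-selection problem.

```python
import socket
import sys


def resolve_ipv4(name: str) -> list[str]:
    """Return the de-duplicated IPv4 addresses the system resolver gives for name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    # Sorted as strings, which is only for stable display, not numeric order.
    return sorted({info[4][0] for info in infos})


def compare_lookups(short_ips: list[str], fqdn_ips: list[str]) -> dict[str, list[str]]:
    """Split two address lists into shared and lookup-specific addresses."""
    a, b = set(short_ips), set(fqdn_ips)
    return {
        "common": sorted(a & b),
        "only_short": sorted(a - b),
        "only_fqdn": sorted(b - a),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: compare_dns.py <short-hostname> <fqdn>
    short_name, fqdn = sys.argv[1], sys.argv[2]
    try:
        result = compare_lookups(resolve_ipv4(short_name), resolve_ipv4(fqdn))
    except socket.gaierror as err:
        sys.exit(f"lookup failed: {err}")
    for key, ips in result.items():
        print(f"{key}: {', '.join(ips) or '-'}")
```

For example: python3 compare_dns.py mac-lab-01 mac-lab-01.ad.example.edu (both names made up). Anything listed under only_short or only_fqdn is an address one lookup returns and the other doesn't.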
Just to say that we had this problem in the summer, and then with the 10.11.6 update it suddenly went away. In the last few weeks it has returned, and we're also seeing it on some laptops; previously it was pretty much just iMacs.
I got one of our trouble machines up on the workbench and it failed to boot several times, so we plugged in an external USB 3 drive containing a 10.11.6 install of OS X (our new NetBoot image). This is obviously a very stripped-down image, not enrolled in Casper or joined to AD, etc. It also failed to boot, with pretty much the same symptoms.
We plugged the same USB 3 drive into a newer MacBook and it booted instantly (arguably even faster than from the internal drive!).
Looking at our inventory information, it does appear that all of the EFI/SMC versions are out of date compared with what Apple says are the latest (https://support.apple.com/en-us/HT201518), but Apple makes them almost impossible to update, as you need the version of OS X that came with the machine to do it!!!
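If you want to flag out-of-date firmware straight from inventory data rather than comparing by eye, here is a rough Python sketch. It assumes the older MODEL.NNNN.XNN Boot ROM format (e.g. MBA61.0099.B22) and that releases only ever increase the number, then the letter, then the build; the "latest" values have to be copied by hand from Apple's HT201518 table, and nothing here talks to Casper or Apple.

```python
import re
from typing import NamedTuple


class BootROM(NamedTuple):
    model: str    # e.g. "MBA61"
    version: int  # e.g. 99, from "0099"
    stage: str    # letter prefix of the build, e.g. "B"
    build: int    # e.g. 22, from "B22"


# Assumed format: MODEL.NNNN.XNN, as in "MBA61.0099.B22". Newer formats won't match.
_BOOT_ROM = re.compile(r"^([A-Za-z]+\d+)\.(\d{4})\.([A-Z])(\d+)$")


def parse_boot_rom(text: str) -> BootROM:
    m = _BOOT_ROM.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised Boot ROM string: {text!r}")
    return BootROM(m.group(1), int(m.group(2)), m.group(3), int(m.group(4)))


def is_outdated(installed: str, latest: str) -> bool:
    """True if installed sorts before latest (assumes version, then letter, then build increase)."""
    a, b = parse_boot_rom(installed), parse_boot_rom(latest)
    if a.model != b.model:
        raise ValueError(f"different models: {a.model} vs {b.model}")
    return (a.version, a.stage, a.build) < (b.version, b.stage, b.build)
```

In practice you'd keep a small hand-maintained dict of latest versions keyed by model and call is_outdated on each machine's reported Boot ROM string.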
I'm pulling out what little hair I have left over this issue, as I'm getting a lot of flak for something that's pretty much totally out of my control.
@amosdeane The issue went away for us for some time too, and then came back on iMacs and Mac minis only.
Now the issue has caught fire and spread to MacBook Airs.
@allanp81 What model were you using for your test with the outdated EFI?
I'm seeing this mainly on Early 2014 MacBook Airs running Boot ROM version MBA61.0099.B22.
@amosdeane @allanp81 I wonder if it has something to do with Security Update 2016-002 for the MacBook Airs. It was released on Oct. 24, 2016, which is exactly when we started seeing this issue on MacBook Airs, which had never had it before...
Perfectly possible. We haven't applied any security updates to any of our affected Macs. They're pretty much all running 10.11.5.
So... this morning I dug out the firmwareupdate.pkg that comes with the 10.11.5/10.11.6 updates, which was available on our local SUS server. I ran this on some of our Late 2012 and Late 2013 models, and so far so good.
The Late 2012 model on our workbench, which refused to boot the majority of the time, is now booting successfully every time.
I also noticed that on the Late 2013 models it changes the boot screen from grey to black, so it's potentially a major change to EFI and SMC (the version numbers have jumped quite a bit).
Before I get my hopes up: I assume others have gone down this route? It seems that unless you do a proper install of El Capitan, or at least an update from 10.11.5 to 10.11.6 etc., it's almost impossible to do any firmware updates.
The firmware issue sounds worth looking into again. We did check it previously, and I think we ran an update when going from 10.11.5 to 10.11.6, but we could double-check in case there were issues with it. @Chuey, I haven't seen the boot issue on a MacBook Air yet, as we've mainly got MacBooks here, but we'll check for any recent updates.
Looks like my excitement was short-lived. The Late 2012 Mac here is still exhibiting the same completely random boot hang.
It literally doesn't matter what you're doing: you could be logged in as admin, logged in as an AD user, or even just restarting from the login screen after a successful boot. It will randomly fail to boot some of the time.
None of the system logs reveal anything useful. A verbose boot doesn't uncover anything useful either :(
We're literally staring at replacing these Macs as the only solution!!!
@amosdeane Thanks for the update, let me know if you find anything else out.
@allanp81 Lame, not what I wanted to hear. Replacing machines with new ones is not an option in our environment due to the number of MacBook Airs / iMacs we have showing this issue. We are an Apple Certified Repair Center, so maybe I can contact Apple directly about this issue, since we're seeing a lot of it...
Please do contact Apple if you can!!!
One of our older 2012 iMacs is actually failing the Apple Hardware Test consistently with memory errors, but in true Apple fashion they've taken something useful and removed it from newer models. All you can do now is run Apple Diagnostics, which seems to finish in under 2 minutes, so it can't be particularly thorough.
If anyone else is experiencing this issue, even if you have nothing to add to the discussion, could you just post briefly to confirm that you have it?
We've been going round the machines where we're seeing the issue, and so far it looks like:
- Firmware update alone doesn't fix the issue
- Firmware update followed by a reimage: so far we haven't seen the issue
We've created a matrix recording when each machine was last imaged, whether it has had its firmware updated, and when the issue was last seen. We've also got a daemon running on the machines that constantly sends us this info so we can actively monitor it (mostly to appease management).
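For anyone wanting to keep a similar matrix, it boils down to one record per machine. A minimal Python sketch of how the records could be tallied, assuming the daemon's reports have already been collected somewhere; the class, field and function names are hypothetical and not part of Casper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MachineRecord:
    name: str
    last_imaged: date
    firmware_updated: bool
    issue_last_seen: Optional[date] = None  # None if the hang has never been seen

    def recurred_since_reimage(self) -> bool:
        """True if a boot hang has been recorded after the most recent reimage."""
        return self.issue_last_seen is not None and self.issue_last_seen > self.last_imaged


def summarise(records: list[MachineRecord]) -> dict[tuple[bool, bool], int]:
    """Count machines by (firmware_updated, recurred_since_reimage)."""
    counts: dict[tuple[bool, bool], int] = {}
    for r in records:
        key = (r.firmware_updated, r.recurred_since_reimage())
        counts[key] = counts.get(key, 0) + 1
    return counts
```

The interesting bucket would be (True, True): firmware updated, yet the hang has come back since the last reimage.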
Only time will tell, as a reimage seems to work for a period of time, but the main thing we've seen so far is that none of our problem machines has had a firmware update for a long time. Before we rolled out Casper, they would have been imaged using DeployStudio, so they will have received an image built on another Mac. The person who used to do this here always built his images on whatever was the newest model at the time. Our problem Macs are 2012 and 2013 models, so I'm guessing they originally came with Mavericks, or maybe something even older?
I would imagine that most of us here are in a similar situation, applying an image (whether the old-fashioned way or using AutoDMG). Either way, I don't think the Macs will get a firmware update, as normally this happens when installing a version of OS X or performing an update.
It seems we're having the same issue on some machines, particularly the busy ones in library areas.
The Macs are 21.5-inch iMac14,1 models running OS X 10.11.6.
I haven't looked at this issue myself yet; my colleagues have.
That said, I can't think of anything I would have tried that they haven't already tried.
Personally, I suspect it's related to logouts not completing correctly (getting stuck),
and the machines then being hard-reset so that someone can log in.
Quite why logout sometimes does not complete (apart from applications still being open with unsaved documents) is unclear.
I did think of writing my own logout routine to 'force' a full logout, even if that resulted in losing unsaved documents,
but I haven't done this yet since I've been busy with other things.
Besides, implementing such a thing should not be necessary.
But it would be interesting to see whether this then began to resolve the startup issue.
If the problem is related to corrupted boot caches, then a safe boot followed by a restart into a normal boot may sometimes resolve it, and I think on occasion it has for some people, although that method is not 100% reliable.
What we know is that 'something', or several different 'somethings', on some machines causes system startup to deviate from its normal path, resulting in a startup freeze. At this point no one seems to know exactly what is causing it.
Reformatting and re-imaging the affected machines does resolve the problem for a while, and then it occurs again.
Although I said re-imaging above, we are actually thin imaging using Casper and installing the OS from DeployStudio (which we didn't do last year), but last year we saw this problem too, though less frequently. The OS is built using the AutoDMG tool, so it's not an old-style 'clone' image.
The main 'pattern' so far is that we are only seeing this in especially busy areas, where lots of different logins occur, i.e. in excess of 100 different users logging in.
Incidentally, in Casper 9.96 the "MDM Capable Users" field (Machine Record > General > MDM Capable Users) does not seem to get reset after reimaging; it just keeps growing, with occasional empty ",," entries, which makes me think it might be a Casper bug.
But even if so, that's unlikely to be related to this startup issue.
Thanks to @PeterClarke for posting much of what I wanted to say. We're having issues in our heavily trafficked areas as well. These are iMac16,2 machines (2015, 4K Retina, 21.5", 16GB/1TB) purchased this summer. Had they been older machines, we might have reverted to our 10.10.5 image, which was rock solid last academic year.
I opened a ticket with Apple and called it "Stuck on Startup" as well. However, while observing one of our teaching labs, I noticed that on logout I got the same Apple logo with a progress bar that you'd see at boot, before the login window reappeared. This makes more sense, as our users "shouldn't" be rebooting those machines, and the Shut Down option is removed via a configuration profile.
However, once they're hung, it's a crapshoot whether they'll reboot. We've got several machines that we've had to re-image at roughly two-week intervals. Of course, I have an identical piece of hardware in my office on which I can't reproduce the problem. I guess I need to invite a few hundred students in to use it.
That's not good to hear that you're seeing exactly the same issue on an iMac16,2, as we're about to swap some of our older machines for these to see if that cures the issue.
Like @PeterClarke, we have machines on the office workbench with the same spec and software setup on which we can't reproduce the issue, so it's definitely a problem that appears once you pass a certain amount of usage or number of users.
@allanp81 @amosdeane
Is anyone deleting mobile accounts on logout? I've thought about enabling that setting in my configuration profile for a select few high-traffic areas, or for the MacBook Air carts that have been having this issue. I too think it has something to do with the amount of usage, and I'm wondering whether deleting every mobile account on logout would help.
We don't delete on logout; we have a launch daemon that deletes accounts at startup, but it doesn't seem to make any difference whether we run it or not. It looks like the Macs hang before even getting that far into the boot process anyway.
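Since the working theory is that heavy use matters, it may help to at least count the cached network accounts on an affected Mac before and after a cleanup. A dry-run Python sketch that lists candidates without deleting anything; the UID threshold and protected account names are assumptions to adjust for your own setup. A stricter test would check each account for the OriginalNodeName attribute that mobile accounts carry, rather than relying on UID ranges.

```python
import subprocess
import sys

# Assumption: cached AD (mobile) accounts have UIDs above this value, while
# accounts created locally start at 501. Check this against your own directory.
MIN_NETWORK_UID = 1000

# Hypothetical local admin accounts that must never be treated as candidates.
PROTECTED = {"admin", "localadmin"}


def parse_uid_list(output: str) -> dict[str, int]:
    """Parse 'dscl . -list /Users UniqueID' output: one 'name  uid' pair per line."""
    users: dict[str, int] = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip anything that isn't exactly a name followed by an integer UID
        # (negative UIDs such as nobody's -2 are accepted).
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            users[parts[0]] = int(parts[1])
    return users


def network_account_candidates(users: dict[str, int]) -> list[str]:
    """Names that look like cached network accounts, skipping _system and protected ones."""
    return sorted(
        name
        for name, uid in users.items()
        if uid > MIN_NETWORK_UID and not name.startswith("_") and name not in PROTECTED
    )


def list_local_users() -> dict[str, int]:
    """Read the local directory node via dscl (macOS only)."""
    out = subprocess.run(
        ["dscl", ".", "-list", "/Users", "UniqueID"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_uid_list(out)


if __name__ == "__main__" and sys.platform == "darwin":
    # Dry run only: report the candidates and their count, delete nothing.
    names = network_account_candidates(list_local_users())
    print("\n".join(names))
    print(f"{len(names)} candidate network accounts")
```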
We have found that it occurs in areas where we do delete accounts (also at startup), and we have tried various ways of doing it, but without any success, I'm afraid. It does seem to occur on machines with a large volume of users, and @PeterClarke's suggestion that it's related to failed logouts seems plausible.
We suspected that it began to occur on machines that hung on logout and were then forcibly shut down. This then caused them to (sometimes) get stuck on startup, causing more forced shutdowns, which made the problem progressively worse until they got stuck on startup a large percentage of the time, and so on.
We have not found that it occurs solely on a particular model or models, although until recently it didn't seem to affect laptops. Now, however, it is happening on both MBPs and MBAs!
Just thought I'd chime in and say we are seeing this in our district on Late 2013 21.5-inch iMacs running 10.11.6 in high-traffic areas like library labs. There are likely hundreds of accounts on these Macs. There are also reports of it happening on 2012 MacBook Pros in a shared cart that gets used regularly.
I've had this happen in all 3 of our iMac labs (Late 2015 21.5-inch iMacs and Late 2013 27-inch iMacs). I completely reimaged the Late 2015 labs with 10.11.6, and this seems to have solved the issue; I haven't had any boot-up issues in weeks. They were all running 10.11.6 previously, however, so I don't really know what solved it.
A safe boot and a disk check usually fix the issue. As for why it occurs, I think it's a disk permissions issue.