Stuck on startup

Chuey
Contributor III

Hello all,

I'm still seeing issues with machines on 10.10.5 and 10.11.4 that won't fully load and get stuck on startup. They will load either halfway or even 95% but never past that. Sometimes I'll turn them off, leave them overnight, come back and they work the next day. Sometimes I'll boot to single user mode, run fsck -fy, reboot, and it works. It's just so hit and miss. These machines are bound to AD. Is anyone still seeing these issues in their environment?

allanp81
Valued Contributor

Clutching at straws here, but could it be something to do with user configuration profiles?

I've noticed that when I run "sudo profiles -P" on an affected machine it returns well over 100 user configuration profiles. An unaffected machine generally has about half that.
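To compare affected and unaffected machines more systematically, the output of "sudo profiles -P" could be saved to a text file and tallied per owner. The following is only a rough sketch: the line shape it expects (owner name, index in square brackets, then "attribute: profileIdentifier:") is an assumption about what the command prints on 10.10/10.11, and the function names and sample identifiers are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape of `sudo profiles -P` output, for example:
#   _computerlevel[1] attribute: profileIdentifier: com.example.wifi
#   jsmith[3] attribute: profileIdentifier: com.example.dock
PROFILE_LINE = re.compile(r"^(\S+)\[\d+\] attribute: profileIdentifier: \S+")


def count_profiles_by_owner(profiles_output: str) -> Counter:
    """Tally installed profiles per owner (`_computerlevel` or a user name)."""
    counts = Counter()
    for line in profiles_output.splitlines():
        match = PROFILE_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def user_profile_total(counts: Counter) -> int:
    """Number of user-level profiles: everything not owned by `_computerlevel`."""
    return sum(n for owner, n in counts.items() if owner != "_computerlevel")
```

Running user_profile_total(count_profiles_by_owner(saved_text)) on output from an affected and an unaffected Mac would give two numbers that can be compared directly.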

Doing a "sudo jamf removemdmprofile" only clears the 2 computer level profiles, all of the user profiles remain. I can't find any combination of commands to remove these, it either complains they don't exist anymore in DS (if we're doing account cleanup) or that it can't be removed as it's non-removable.

The only method I've found to clear them is to trash the /var/db/ConfigurationProfiles directory and then re-apply MDM. Sadly, all of the affected Macs local to me are currently in use, so I can't put this theory to the test to see if it solves anything.

allanp81
Valued Contributor

Deleting the config profiles made no difference sadly, was worth a shot though. It really is a bizarre issue.

One of the machines I was looking at wouldn't boot at least 20 times in a row yesterday and then just randomly did. This morning it booted fine 2-3 times after logging in with AD accounts and then on the 4th boot it refused to boot.

There literally is no pattern to it!!! I'm just going to reimage it with 10.11.3 instead of 10.11.5 and wait and see. Failing to see what else we can do at this point.

allanp81
Valued Contributor

Still trying to figure out what the issue could be.

Looking at our affected machines, one of the main differences is that we install Xcode 7.3.1 and then make all users part of the developers group so they can compile stuff.

Anyone else running Xcode in an affected area?

Chuey
Contributor III

@allanp81 No, I am not running Xcode on any of these machines.

allanp81
Valued Contributor

I've now got some of the technical staff imaging them back to 10.11.3 to see if the issue goes away, if yes then we'll have to look at rolling them all back to that version.

Chuey
Contributor III

@allanp81 It seems the issue stopped when I reverted back to 10.11.4. I don't like rolling back but I'm also not prepared for Sierra yet without some good testing. When I see the issue on 10.11.6 I just re-image it back to 10.11.4. Let me know if reverting helps with your issue too.

Thanks

Chuey
Contributor III

Has anyone seen this issue on MacBook Airs? I've seen a few MacBook Airs with stuck on startup but not sure if it is related to the same issue with the iMacs / Mac Minis. . .

allanp81
Valued Contributor

We don't have any MacBook Airs; we've only seen it on iMacs so far, and only late 2012/2013 models. That might be pure coincidence though, as they're the only models we have in large usage areas.

draeconis
New Contributor II

We were seeing this issue on 10.11.5, Apple advised we upgrade to 10.11.6, and the issue seemed to go away. They wouldn't explain what the issue was.

More recently it's come back, and is instantly a problem if we do an upgrade from 10.10.5 to 10.11.6.

Newly imaged 10.11.6 (15G1004) machines also experience this issue, but only after 2-3 weeks of heavy use.

None of the standard stuff seems to help. Not even insane stuff like trashing the contents of /var/folders/ or /var/db/spindump/.

Upgrading to 10.12 doesn't help, same issue. Can't test 10.12.1 yet as the Sierra installer in the App Store is still 10.12 for some reason.

Most machines (though not all, and not consistently) show the following when booting verbosely.

kauth external resolver timed out (1 timeout(s) of 60 seconds)

Even Safe Boot doesn't work properly any more on these machines. Very perplexing.
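If the kauth message quoted above also turns up in a saved log file, it could be picked out with a small script like the one below. This is a sketch only: the pattern is taken verbatim from the line in this post, the function names are hypothetical, and no assumption is made about whether the timeout count in the message is cumulative.

```python
import re
from typing import List, Tuple

# Pattern built from the message as quoted in the post:
#   kauth external resolver timed out (1 timeout(s) of 60 seconds)
KAUTH_TIMEOUT = re.compile(
    r"kauth external resolver timed out \((\d+) timeout\(s\) of (\d+) seconds\)"
)


def kauth_timeouts(log_text: str) -> List[Tuple[int, int]]:
    """Return one (timeout_count, seconds) pair per matching message."""
    return [(int(count), int(secs)) for count, secs in KAUTH_TIMEOUT.findall(log_text)]


def highest_timeout_count(log_text: str) -> int:
    """Largest timeout count reported in the text, or 0 if the message never appears."""
    return max((count for count, _ in kauth_timeouts(log_text)), default=0)
```

Comparing highest_timeout_count across a good boot and a hung boot might show whether the message is specific to the failures or appears every time.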

allanp81
Valued Contributor

Interesting, seems to me like Apple have no interest in fixing this issue.

Chuey
Contributor III

@draeconis Thanks for the information. I too am seeing the kauth external resolver error a lot.

@allanp81 Why would they care? They only care about 1-1 and not enterprise environments. Sierra is out but I haven't even entertained the idea because I've had zero time to test thoroughly.

allanp81
Valued Contributor

Has anyone made any progress on this? We're in the middle of a reading week so mac use has been lower than normal.

Interestingly, we're currently trying the Nexthink monitoring solution and it's highlighted that all of our AD-bound Macs throw up regular connection failures to our AD controllers. I've enabled more verbose logging on the opendirectoryd service but so far nothing jumps out.

Chuey
Contributor III

@allanp81 No progress here. It's actually gotten worse and we've started seeing it on a lot of MacBook Airs.

What's odd though is if you leave the MacBook Air stuck on the startup, it will go to sleep, then when you restart it, it will come back on just fine.

allanp81
Valued Contributor

We've noticed here that our affected machines return different IP addresses depending on whether we look up the hostname or the FQDN.

Having done the verbose boot we see the same thing about the kauth timeout, which is the last thing that appears and then it loads no further.

Could the disparity between the IPs be causing this? A lookup on our AD domain returns 5 IP addresses so maybe it's hitting a different AD server each time and that's causing the randomness in startup?
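The hostname versus FQDN comparison described above could be scripted roughly as follows. It is a minimal sketch using Python's standard socket module; the function names and the example host names in the usage note are hypothetical, and the resolver is passed in as a parameter so the comparison logic can be exercised without touching real DNS.

```python
import socket
from typing import Callable, Dict, Set


def resolve_ipv4(name: str) -> Set[str]:
    """Return the IPv4 addresses the system resolver gives for `name` (empty on failure)."""
    try:
        _, _, addresses = socket.gethostbyname_ex(name)
    except OSError:  # covers socket.gaierror and socket.herror
        return set()
    return set(addresses)


def compare_lookups(
    short_name: str,
    fqdn: str,
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> Dict[str, object]:
    """Compare the address sets returned for a short host name and its FQDN."""
    short_ips = resolver(short_name)
    fqdn_ips = resolver(fqdn)
    return {
        "only_short": sorted(short_ips - fqdn_ips),  # returned for the short name only
        "only_fqdn": sorted(fqdn_ips - short_ips),   # returned for the FQDN only
        "consistent": short_ips == fqdn_ips,
    }
```

Called as compare_lookups("lab-imac-01", "lab-imac-01.ad.example.com") on an affected Mac, a result with "consistent" set to False would confirm the mismatch. It says nothing on its own about whether the mismatch causes the hang.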

amosdeane
New Contributor III

Just to say that we had this problem in the summer and then with the 10.11.6 update it suddenly went away. In the last few weeks it has returned and we are also seeing it on some laptops, where previously it was pretty much just iMacs.

allanp81
Valued Contributor

I got one of our trouble machines up on the workbench and it failed to boot several times, so we plugged in an external USB 3 drive holding a 10.11.6 install of OS X that is our new netboot image. This is obviously a very stripped down image, not enrolled with Casper or joined to AD etc. This failed to boot, with pretty much the same symptoms.

Plugged the same USB 3 drive into a newer MacBook and it booted instantly (you could say even faster than the internal drive!).

Looking at our inventory information it does appear that all of the EFI/SMC versions are out of date compared to what Apple say are the latest (https://support.apple.com/en-us/HT201518), but they make it almost impossible to update them as you need the version of OS X that came with the machine to be able to do this!!!

I'm pulling out what little hair I have left over this issue now as I'm getting a lot of flak for something that's pretty much totally out of my control.

Chuey
Contributor III

@amosdeane Issue went away for us for some time too and then came back on iMacs and Mac Minis only.

Now the issue has caught fire and spread to MacBook Airs.

@allanp81 What make / model were you using for your test with the outdated EFI ?

I'm seeing this mainly on Early 2014 MacBook Airs running the MBA61.0099.B22 Boot ROM Version.

Chuey
Contributor III

@amosdeane @allanp81 I wonder if it has something to do with the Security Update 2016-002 for MacBook Airs. That was released on Oct. 24, 2016 and that is the same time we started seeing this issue on MacBook Airs which previously never had this issue. . . .

allanp81
Valued Contributor

Perfectly possible. We haven't applied any security updates to any of our affected Macs. They're pretty much all running 10.11.5.

allanp81
Valued Contributor

So... this morning I dug out the firmwareupdate.pkg that comes with the 10.11.5/10.11.6 updates and was available on our local SUS server. I ran this on some of our late 2012 and late 2013 models and so far so good.

The late 2012 model I have on our workbench that pretty much refused to boot the majority of the time is now booting successfully every time.

I also noticed that on the late 2013 models it changes the boot screen from grey to black, so potentially a major change to EFI and SMC (the version numbers have jumped quite a bit).

Before I get my hopes up, I'm assuming others have gone down this route? It seems that unless you do a proper install of El Capitan, or at least an update from 10.11.5 to 10.11.6 etc., it's almost impossible to do any firmware updates.

amosdeane
New Contributor III

It sounds worth looking into the firmware issue again. We did check this out previously and I think we ran an update when doing 10.11.5-6 but we could double check this in case there were issues with it. @Chuey, I haven't seen the boot issue on a MacBook Air yet, as we've mainly got MacBooks here, but we'll check for any recent updates.

allanp81
Valued Contributor

Looks like my excitement was short lived. The late 2012 Mac here is still exhibiting the same completely random boot hang.

It literally doesn't matter what you're doing, could be logged in as admin, logged in as an AD user or even just restart from the login screen after a successful boot. It will just randomly not boot some of the time.

None of the system logs reveal anything useful. A verbose boot doesn't uncover anything useful either :(

We're literally staring at replacing these macs as the only solution!!!

Chuey
Contributor III

@amosdeane Thanks for the update, let me know if you find anything else out.

@allanp81 Lame, not what I wanted to hear. Replacing machines with new ones is not an option in our environment due to the amount of MacBook Airs / iMacs we have that are showing this issue. We are an Apple Certified Repair Center, so maybe I can contact Apple directly about this issue we are seeing a lot of . . .

allanp81
Valued Contributor

Please do contact Apple if you can!!!

allanp81
Valued Contributor

One of our older 2012 iMacs is actually failing the Apple Hardware Test consistently with memory errors, but in true Apple fashion they've taken something that was useful and removed it from newer models. All you can do now is run the Apple Diagnostics check, which seems to finish in under 2 minutes so can't be particularly thorough.

amosdeane
New Contributor III

Can I just ask that if anyone else is experiencing this issue, even if you don't have anything to add to the discussion, you just put a brief post to confirm that you have it.

allanp81
Valued Contributor

We've been going round the machines where we're seeing the issues and so far it looks like:

Firmware update alone doesn't fix the issue
Firmware update followed by a reimage: so far we haven't seen the issue

We've created a matrix to compile when a machine was last imaged, has it had its firmware updated, when was the issue last seen. We've also got a daemon running on the machines that constantly sends us this info so we can actively monitor it (mostly to appease management).
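A tracking matrix like the one described could be represented along these lines. This is a sketch under assumptions: the field names, the MachineRecord class, and the grouping logic are invented for illustration and are not the actual daemon or spreadsheet mentioned in the post.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class MachineRecord:
    name: str
    last_imaged: date
    firmware_updated: bool
    issue_last_seen: Optional[date]  # None if the hang has never been observed


def issue_seen_since_reimage(record: MachineRecord) -> bool:
    """True if the boot hang was observed on or after the most recent reimage."""
    return (
        record.issue_last_seen is not None
        and record.issue_last_seen >= record.last_imaged
    )


def summarise(records: List[MachineRecord]) -> Dict[Tuple[bool, bool], List[str]]:
    """Group machine names by (firmware_updated, issue seen since reimage)."""
    summary: Dict[Tuple[bool, bool], List[str]] = {}
    for record in records:
        key = (record.firmware_updated, issue_seen_since_reimage(record))
        summary.setdefault(key, []).append(record.name)
    return summary
```

The group keyed (True, True) is the interesting one: any machine landing there had its firmware updated, was reimaged, and still showed the hang afterwards.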

Only time will tell, as a reimage seems to work for a period of time, but the main thing we've seen so far is that all of our problem machines clearly haven't had a firmware update for a long time. Prior to us rolling out Casper they would've been imaged using DeployStudio, so will have been distributed with an image created on a Mac. The guy that used to do it here always used whatever was the newest model at the time to build his images. Our problem Macs are 2012 and 2013 so I'm guessing would've originally come with Mavericks or maybe even older than that?

I would imagine that most of us here are in a similar situation where we're applying an image (whether the old fashioned way or using AutoDMG). Either way I don't think the Macs will get a firmware update, as normally this would get done when either installing a version of OS X or performing an update.

PeterClarke
Contributor II

It seems like we are having the same issue on some machines - particularly the busy ones in library areas.
The Macs are: 21.5 inch iMac 14,1 running OS X 10.11.6
I haven't yet looked at this issue myself - my colleagues have.
Though I can't think of anything that I would have tried - that they have not already tried...

Personally I have a suspicion that it's related to logouts not happening correctly - getting stuck,
and then the machines being crashed, in order to log in..

Quite why logout sometimes does not complete (apart from applications still being open, with unsaved documents) is unclear.
I did think of writing my own logout routine to 'force' a full logout - even if that did result in losing unsaved documents..
but haven't yet done this since I've been busy with other things.
Besides, implementing such a thing should not be necessary..

But it would be interesting to see if this then began to resolve the startup issue..

If the problem was related to 'corrupted boot caches' - then the "safe boot" followed by restart and "normal boot" may sometimes resolve this - and I think on occasion has for some people. Although that method is not always 100% reliable.

What we know is that 'something' or several different 'somethings' are on some machines, causing the system startup to not follow a normal pathway - resulting in a startup freeze.. And at this point no-one seems to know exactly what is causing this.

Reformatting and Re-imaging the machines affected - does resolve the problem - for a while, and then it occurs again.
Although above I said re-imaging - we are thin imaging using Casper, and installing an OS from DeployStudio (which we didn't do last year)
but last year we saw this problem too - though less frequently. The OS is built using the AutoDMG tool..
So it's not an old-style 'clone' image.

The main 'pattern' so far - is that we are only seeing this happening in especially busy areas - where lots of different logins are occurring..
i.e. - in excess of 100 different users logging in..

Incidentally, in Casper v9.96 the number of "MDM Capable Users" (Machine Record : General : MDM Capable Users)
does not seem to get reset after reimaging - it just increases, with occasional ",," entries, which makes me think that might be a Casper bug?
But even if so, that's unlikely to be related to this startup issue..

Brad_G
Contributor II

Thanks to @PeterClarke for posting much of what I wanted to say. We're having issues in our heavily traveled areas as well. These are iMac16,2 machines (2015, 4K Retina, 21.5" 16GB/1TB) purchased this summer. Had they been older machines we may have reverted back to our 10.10.5 image that was rock solid last academic year.

I opened a ticket with Apple and called it "Stuck on Startup" as well. But in my observation in one of our teaching labs I noticed upon logout that I got the same Apple logo with slider on it just like you would upon boot before the login window reappeared. This makes more sense as our users "shouldn't" be rebooting those machines and the Shutdown feature is removed via Config. Profile.

However, once they're hung it's a crap shoot if they'll reboot. We've got several machines that we've re-imaged at what seems about a two week interval. Of course I have an identical piece of hardware in my office that I can't reproduce the problem on. Guess I need to invite a few hundred students in to use it.

allanp81
Valued Contributor

That's not good to hear that you're seeing exactly the same issue on a 16,2, as we're about to swap out some of our older machines with these to see if it cures the issue.

Like @PeterClarke we have machines in the office on the workbench with same spec, same software setup that we can't reproduce the issue on so it's definitely a problem when you pass a certain amount of usage/number of users.

Chuey
Contributor III

@allanp81 @amosdeane Is anyone deleting mobile accounts on logout? I thought about setting that feature in my config profile in a select few high traffic areas or carts of MacBook Airs that have been having this issue. I too think it has something to do with the amount of usage and am wondering if deleting every mobile account on logout would help the issue?

allanp81
Valued Contributor

We don't delete on logout, we have a launch daemon that deletes at startup but it doesn't seem to make any difference whether we run this or not. It looks like the macs hang before even getting that far into the boot process anyway.

amosdeane
New Contributor III

We have found that it occurs in areas where we do delete accounts (also on startup) and we have tried various different variations in how we do it, but not with any success I am afraid. It does seem to occur on machines that have a large volume of users, and @PeterClarke 's suggestion of it being related to failed logouts seems possible.

We had a suspicion that it began to occur on machines that hung on logout, and were then forcibly shut down. This then caused them to (sometimes) get stuck on startup, causing more forced shutdowns, which made the problem progressively worse until a large percentage of the time they would get stuck on startup, and so on.

We have not found that it occurs solely on a particular make or makes, although until recently it didn't seem to affect laptops. Now it is happening on both MBPs and MBAs, however!

russeller
Contributor III

Just thought I'd chime in and say we are seeing this in our district with Late-2013 iMacs 21.5 running 10.11.6 in high traffic areas like Library Labs. There are likely hundreds of accounts on these Macs. There are also reports of it happening on 2012 MacBook Pros in a shared cart that get used regularly.

prichards
New Contributor II

I've had this happen in all 3 of our iMac labs (Late 2015 21.5 iMacs and Late 2013 27 iMacs). I completely reimaged the Late 2015 labs with 10.11.6 and this seems to have solved the issue. Haven't had any boot up issues in weeks. They were all running 10.11.6 previously however so I don't really know what solved it.

Malcolm
Contributor II

Safe boot and a disk check usually fixes the issue. As for why it occurs, I think it appears to be a disk permission issue.

allanp81
Valued Contributor

We're still hoping that we're seeing positive results by ensuring that the EFI and SMC are up to date and then reimaging.

For reference, these appear to be the latest:

Late 2012 iMac13,1
Boot ROM IM131.010A.B09
SMC 2.9f8

Late 2013 iMac14,3
Boot ROM IM143.0118.B13
SMC 2.17f7
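The versions listed above could be checked against what a machine reports with something like the sketch below. It assumes the text comes from "system_profiler SPHardwareDataType" and that the report contains "Model Identifier", "Boot ROM Version" and "SMC Version (system)" lines. The LATEST table holds only the two models quoted in this post, and the function names are hypothetical. It tests for an exact match only and does not try to work out whether a different version is older or newer.

```python
import re
from typing import Dict, Optional

# Versions as quoted in this post; other models would need adding by hand.
LATEST = {
    "iMac13,1": {"boot_rom": "IM131.010A.B09", "smc": "2.9f8"},
    "iMac14,3": {"boot_rom": "IM143.0118.B13", "smc": "2.17f7"},
}

# Assumed field labels in `system_profiler SPHardwareDataType` output.
FIELDS = {
    "model": re.compile(r"Model Identifier:\s*(\S+)"),
    "boot_rom": re.compile(r"Boot ROM Version:\s*(\S+)"),
    "smc": re.compile(r"SMC Version \(system\):\s*(\S+)"),
}


def parse_hardware_report(text: str) -> Dict[str, str]:
    """Extract model, Boot ROM and SMC versions from a hardware report."""
    found = {}
    for key, pattern in FIELDS.items():
        match = pattern.search(text)
        if match:
            found[key] = match.group(1)
    return found


def firmware_status(text: str) -> Optional[Dict[str, bool]]:
    """Per-field exact match against LATEST, or None if the model is not listed."""
    info = parse_hardware_report(text)
    expected = LATEST.get(info.get("model", ""))
    if expected is None:
        return None
    return {field: info.get(field) == value for field, value in expected.items()}
```

A result such as {"boot_rom": False, "smc": True} would flag a machine whose Boot ROM differs from the version listed here while its SMC matches.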

We've reimaged pretty much all of them over the last 2.5 weeks and so far haven't seen the issue again on any, having made sure we did the firmware update first.

Can anyone else confirm if they're seeing the issue on machines with up to date firmware?

apizz
Valued Contributor

Just to chime in here, we are experiencing this issue as well. We have 6 x iMac Intel (21.5-Inch, Late 2015) 16,1 w/ firmware passwords all running 10.11.6 in our Middle School. We are not running the latest security updates on these machines. They are running SMC 2.31f36 ; Boot ROM IM161.0207.B03

I've been looking at our system logs and I'm seeing a lot of forced shutdowns and power disconnects. We actually saw several of our students unplugging the power from these machines.

Going over the thread though, it appears that the improper shutdowns I initially thought were the cause are in fact symptoms of a larger problem. As of yet, we haven't seen this issue on any of our other iMacs or Mac computers.
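For anyone wanting to count the shutdowns in their own logs, one rough approach is to tally the "Previous shutdown cause" lines the kernel writes at boot. The sketch below assumes that message appears in the saved log text in the form "Previous shutdown cause: N". It only counts the codes and does not interpret them, since this thread does not establish which codes correspond to forced power-offs. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed kernel message shape, e.g. "... kernel[0]: Previous shutdown cause: 3"
SHUTDOWN_CAUSE = re.compile(r"Previous shutdown cause:\s*(-?\d+)")


def shutdown_cause_counts(log_text: str) -> Counter:
    """Count how often each numeric shutdown cause code appears in the log text."""
    return Counter(int(code) for code in SHUTDOWN_CAUSE.findall(log_text))
```

Comparing the counts from a problem lab machine with those from an office machine of the same model would show whether the lab machines really are being powered off abnormally more often.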

Chuey
Contributor III

I put a ticket in with Apple through my GSX account and uploaded a log file from a troubled machine. They got back to me 4 days later and said:

There is nothing hardware related that stood out after review of the log files attached, however if you are still having the issue in a clean known good OS, please initiate a technical support chat if you do require any further assistance troubleshooting the issue.

Thanks Apple

allanp81
Valued Contributor

So far ours have been looking ok. I've now added the EFI/SMC update to the imaging workflow and tested it in a few places and it successfully updates the mac before it boots into the OS for the first time.

We've still got a few machines with the issue but these were imaged about 3 weeks ago and were prior to the firmware update.

We're still kind of pinning all our hopes on the firmware update then a reimage fixing it, but sadly only time will tell.

As expected Apple's reply is useless and I'm sure their "fix" will just be to update to Sierra, which of course is really simple...

Has anyone actually logged a ticket with Jamf about it other than posting on this thread? I'm not classing this as a Casper issue as I think the boot issue kicks in long before anything Casper related is loaded, but it might be interesting to hear their take on it.