Stuck on startup

Chuey
Contributor III

Hello all,

I'm still seeing issues with machines on 10.10.5 and 10.11.4 that won't fully load and get stuck on startup. They will load either halfway or even to 95%, but never past that. Sometimes I'll turn them off, leave them overnight, and they work the next day. Sometimes I'll boot to single user mode, run fsck -fy, reboot, and it works. It's very hit and miss. These machines are bound to AD. Is anyone still seeing these issues in their environment?

195 REPLIES

apizz
Valued Contributor

@allanp81 I was going to try and give the upgrade a try on at least one machine this week.

I also found a thread on Apple's forums from back in February about an hp_io_enabler_compound.kext preventing startup. In our environment this appears to be the only Apple kext on our computers, so I was going to try messing with this as well.

allanp81
Valued Contributor

@aporlebeke I'd be very surprised if that kext has anything to do with it.

We've just had a few of ours do it again, although they had their firmware updates done after they were last imaged.

The most recent time these were imaged was the 8th of November. We've reimaged them again and will see.

Chuey
Contributor III

@aporlebeke @allanp81 I was seeing huge issues in carts of MacBook Airs and also iMacs / Mac Minis. These machines were running Operating System Build: 15G31

After I created a new image with all patches / security updates the new build is Operating System Build: 15G1108

Since creating a new image and re-deploying things have been quiet.

allanp81
Valued Contributor

@Chuey How long ago were they imaged?

Chuey
Contributor III

@allanp81 I piloted this image to 1 cart of 30 MacBook Airs that were having major issues about 3 weeks ago. Since imaging that cart not 1 peep from them.

I just re-imaged 5 carts of 30 MacBook Airs on Wednesday and Friday of last week.

allanp81
Valued Contributor

@Chuey If you're going to see the issue then it would be any day based on our experiences here.

Chuey
Contributor III

@allanp81 Are you using Operating System Build: 15G1108 and still seeing the issue?

allanp81
Valued Contributor

@Chuey No we're still using 10.11.5 as our base image. We did manually update some of the machines to 10.11.6 when we first started seeing the issues but it didn't seem to help, they were still randomly getting stuck on boot.

kayzlot1
New Contributor

Just chiming in here to report that we are having the same exact issue. High usage area, common lab, lots of user accounts. It is all iMacs in our environment (Retina 5K, Late-2014, Running 10.11.6, 1TB Hybrid drives).

In our environment, repeated single-user or verbose boots will eventually get them to boot. I've looked very closely at the verbose logs and I'm not really seeing any red flags. There is this, which shows up almost every time; it seems it can't delete the dyld caches.

Oct 11 16:19:17 localhost kernel[0]: Sandbox: launchd(1) System Policy: deny(1) file-write-unlink /private/var/run/dyld_shared_cache_x86_64h
Oct 11 16:19:17 localhost com.apple.xpc.launchd[1]: Failed to remove file or directory: name = dyld_shared_cache_x86_64h, error = 1: Operation not permitted. Further logging suppressed.

When it stalls in verbose boot it tends to stall right after "SDXC: pause". System.log shows this coming up after that (not displayed in verbose boot):

Oct 11 16:17:45 localhost kernel[0]: Sandbox: launchd(1) System Policy: deny(1) file-write-unlink /private/var/run/dyld_shared_cache_x86_64h.map
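To see how often a given machine logs this denial, something like the following could be pointed at system.log. This is only a rough sketch: count_dyld_denials is a made-up helper name, the search string is copied from the log lines quoted here, and /var/log/system.log is assumed to be the log location.

```shell
#!/bin/bash
# Count sandbox denials for the dyld shared cache in a log read from stdin.
# count_dyld_denials is a hypothetical helper name; the fixed search string
# is taken from the log lines quoted in this post.
count_dyld_denials() {
    # -c prints the number of matching lines, -F treats the pattern literally.
    # grep exits non-zero when the count is 0, so "|| true" keeps this quiet.
    grep -cF 'deny(1) file-write-unlink /private/var/run/dyld_shared_cache' || true
}

# Assumed default log location; skipped if the file is not readable.
if [ -r /var/log/system.log ]; then
    count_dyld_denials < /var/log/system.log
fi
```

A count of zero on a machine that still hangs would suggest the denial is a side effect rather than the cause.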

I re-imaged the entire lab about a month ago and just now the issue started cropping up again. We deploy lots of Adobe CC apps, as well as Office 16. I do have quite a lot of unofficial kexts loading, so that will be my next thing to look at.

My other thought was that it could be related to the Hybrid drives in some way, but I have no evidence for that. I just know how much we've struggled deploying to them or having them 'unfuse' themselves. Maybe firmware related to CoreStorage?

Eager to hear if Apple has responded with anything useful - it's getting to be the end of the semester and we really can't have this many machines out of order.

allanp81
Valued Contributor

@kayzlot1 The machines we've seen the issue on originally had fusion drives in them but were all upgraded with Samsung 850 Pro SSDs. On some machines the Apple SSD portion of the fusion drive is still present, but not on all of them (I guess different people did the upgrades?). The drives are separate though; they haven't been "joined" with the new drives.

We also deploy Adobe CC and Office 2016.

We too see the same errors about the dyld caches that you're seeing but I couldn't find anything useful as to what this meant and also how to fix it.

Chuey
Contributor III

@kayzlot1 Can you tell me what Operating System Build version you are using to image your machines with?

allanp81
Valued Contributor

Has anyone tried:

sudo update_dyld_shared_cache -force

In theory that could be run from single user mode.

Holding down shift on boot will also clear this cache at the same time as performing a safe boot.

kayzlot1
New Contributor

@Chuey 15G31
Edit: Should clarify, that's the original image. Post-deploy after all security updates applied they are sitting on 15G1108.

@allanp81 I'm pretty sure iMacs have a mini-PCIe or M.2 SSD that works in conjunction with the standard SATA hard drive. Sounds like whoever did the installs forgot to pull the SSDs.

Chuey
Contributor III

@kayzlot1 Ok, I used the latest Install.app and AutoDMG to create our image. I applied all updates to the image before compiling, so it was up to date before I shipped it out to computers.

I'm not 100% positive or anything but it seems like machines running 10.11.6 OS Build 15G31 had an issue when they applied the security update from Oct. 24th. That is when all our MacBook Airs started doing this startup issue. Since upgrading our image to the latest build with all updates and re-imaging MacBook Airs I have not seen the issue.

allanp81
Valued Contributor

Seems unlikely, as we're seeing the issue on older versions of OS X.

Chuey
Contributor III

@allanp81 What other versions besides 10.10.X and 10.11.X are you seeing this on?

allanp81
Valued Contributor

@Chuey we've seen it on 10.11.5 and 10.11.3 base images and 10.11.6 that were upgraded from 10.11.5.

Chuey
Contributor III

@allanp81 I'd create a clean image that is as patched and up to date as possible with the system build and re-image the entire machine. That is the only thing I've seen help the issue in my environment on any flavor of 10.11.X.

allanp81
Valued Contributor

@Chuey Well, as per my previous posts, since we firmware upgraded and then reimaged the affected machines we haven't seen a recurrence, and some of the machines have gone 3 weeks without showing the issue.

Zeek
Contributor

It's because the hard drive on 2010-2013 computers doesn't support the latest OS X unless it's a solid state drive. We were having the same problem and Apple told us to change the hard drive to an SSD, and we have had no more problems.

Chuey
Contributor III

@Zeek I've had 2012 Mac Minis with SSDs and upgraded RAM get stuck on startup but they had System Build 15G31. Since upgrading them to System Build 15G1108 I've not seen the issue on them. Just my experience though.

allanp81
Valued Contributor

@Zeek where did you hear that? It definitely works fine on spinning or fusion drives (albeit slow).

allanp81
Valued Contributor

Sigh, one of ours that was imaged 2 weeks ago to the day has started locking up on boot with no warning. Updated it to 10.11.6 and the latest security update to bring it up to 15G1108, and same issue.

Literally nothing fixes it, even removing all MDM remnants and still intermittent boot. No local accounts, nothing.

Chuey
Contributor III

@allanp81 The only thing I can think of is completely re-imaging the machine with the 15G1108 build and not doing an in place upgrade.

On November 8 I imaged a cart of 30 MacBook Airs that were having major startup issues with an image that was 15G1108, and I have not had 1 issue from that cart since.

davidhiggs
Contributor III

@kkt @LibertyJSS @rdwhitt and others. We had issues with our high use areas, hanging on boot or login window after about 2 weeks of heavy use. This happened on 10.10 and 10.11, rebuilding the machine was the only fix that worked. There was a kauth hangup for us in the logs. Then one day I went back to basics and looked at my AD binding config and realised I had overlooked a setting which could be related to the issue:

Create mobile account at login

I'd always had this option enabled for 1 to 1 setups and never gave it a thought in shared use computing. Once I rebuilt my Macs with this binding option off, 4 weeks later I knew it had worked. We've been ticking along nicely without failure for 6 months now.

There's definitely a bug there. We don't really have a use for this option and most people shouldn't for shared use desktop Macs. So give it a go if you have it enabled. I think that once the machine had hit a certain number of mobile account users, it just crapped out.
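For anyone wanting to check how a bound Mac is currently set before rebuilding, a rough sketch like this could read the setting out of `dsconfigad -show`. mobile_account_setting is a made-up helper name, and the "label = value" layout it parses is an assumption about that command's output on your OS version.

```shell
#!/bin/bash
# Print the value of "Create mobile account at login" from `dsconfigad -show`
# output read on stdin. mobile_account_setting is a hypothetical helper name;
# the "label = value" layout is an assumption about the output format.
mobile_account_setting() {
    awk -F'=' '/Create mobile account at login/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim whitespace around the value
        print $2
        exit
    }'
}

# Only runs where dsconfigad exists (i.e. on a Mac).
if command -v dsconfigad >/dev/null 2>&1; then
    dsconfigad -show | mobile_account_setting
fi
```

If it reports Enabled, `sudo dsconfigad -mobile disable` should be the command-line counterpart of unticking the option, though it is worth trying on a single machine first.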

allanp81
Valued Contributor

@davidhiggs That's definitely something we'll try as we have mobile accounts enabled. I've noticed that in our dev environment it doesn't seem to then apply user level configuration profiles if we disable using mobile accounts, not sure if that's by design or just a totally separate issue.

allanp81
Valued Contributor

@davidhiggs I've tried this but once I untick the option to use mobile accounts it seems to stop any user level configuration profiles from being applied. Not sure why.

SGill
Contributor III

Will have to test this setting, too. We've had Create Mobile Accounts on for many years, and apparently it only started being a problem in high traffic labs as of 10.10+. I'm wondering if high numbers of /Users accounts or high numbers of /var/folders/ directories are the actual problem, too.

Our pain point seems to be when the number exceeds 50 or so local profiles.

Really odd that this number is limited by some undocumented "handful" figure and not simply by the size of your local storage.

jrippy
Contributor II

@davidhiggs I've talked to Apple Education Support and they've said the same thing. Essentially, Mobile Accounts were never meant to service more than a handful of people on a machine. Turning that setting off has fixed the issues we were having as well.

allanp81
Valued Contributor

Once you've disabled mobile accounts are you applying user level configuration profiles as well?

Chriskmpruitt
Contributor

We have this same issue. Our library machines are the ones that have heavy use and over a hundred managed mobile accounts. I have one machine with me right now that would lock up on boot 8 out of 10 times. The machine had 114 MM accounts on it. I have reduced that count to 30 MM accounts, now the machine is 10 for 10 on NOT locking up on startup.

We have been doing Managed mobile accounts for years, what changed?
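For comparing counts across machines, a rough sketch like the one below could tally the accounts. count_accounts_above is a made-up helper name, and treating a UniqueID above 1000 as a network or mobile account is an assumption that depends on how your directory assigns IDs.

```shell
#!/bin/bash
# Count accounts whose UniqueID is above a threshold, reading
# `dscl . list /Users UniqueID` output (name, then UID) on stdin.
# count_accounts_above is a hypothetical helper; 1000 as the cut-off for
# network/mobile accounts is an assumption that depends on your directory.
count_accounts_above() {
    awk -v min="$1" '$2 > min {n++} END {print n + 0}'
}

# Only runs where dscl exists (i.e. on a Mac).
if command -v dscl >/dev/null 2>&1; then
    dscl . list /Users UniqueID | count_accounts_above 1000
fi
```

Logging that number alongside whether the machine hung on boot would show whether there is a real threshold or just a loose correlation.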

jrippy
Contributor II

@Chriskmpruitt

We have been doing Managed mobile accounts for years, what changed?

Yosemite and El Capitan.
No idea on what really changed in the underlying code but you know Apple. Just like with their AD plugin or wifi, they have to break everything sometimes.

davidhiggs
Contributor III

We were initially using JAMF AD binding options, but I switched to using config profiles while troubleshooting. Even though it didn't fix the kauth timeout at the time, I preferred this method.

@Chriskmpruitt 10.10 and 10.11 must not be coping with a large number of cached credentials.
@allanp81 We don't have any user level profiles currently. If I have some time, I'll see if I get the same issue you do.

allanp81
Valued Contributor

@Chriskmpruitt how did you clear the mobile accounts? We're clearing all users on each boot already.

allanp81
Valued Contributor

Also, I suppose the question then becomes what is building up on a machine that uses mobile accounts to make it eventually start failing to boot.

We use a script run by a launch daemon that runs on each startup to clear out any accounts that aren't admin so we're not getting a build up of local accounts. Clearly this isn't enough so something else is getting broken/filled up that then causes the intermittent boot issue.

I've tried clearing all caches I can think of etc. but obviously there has to be something.

kayzlot1
New Contributor

I just got done re-imaging our entire space again. One machine started acting up about 3 weeks after the last re-image, and afterward it spread like wildfire. It is definitely affecting the most heavily used machines first, which makes the mobile account theory make a lot of sense.

We are going to implement the mobile account change ASAP and see if that helps.

russeller
Contributor III

@allanp81 are you just removing the home folder or are you removing them from the local directory? Are you running something like dscl . -delete /users/student_account in your script?

Chriskmpruitt
Contributor

@allanp81 since we are still testing, I am just deleting the accounts one by one. If someone has a script to delete accounts (last login older than a month or something) I would give it a try on a cart or two.

allanp81
Valued Contributor

@ssrussell We're doing pretty much exactly that.

Rocky
New Contributor III

@ssrussell Here is the simple version of the script we have been using to delete mobile accounts. We are deleting every 7 days in places and still having the hanging at startup.

#!/bin/bash
# Collect local accounts with a UniqueID above 1000.
userList=$(dscl . list /Users UniqueID | awk '$2 > 1000 {print $1}')
# Delete the account and home directory for each user whose home folder is stale.
for a in $userList; do
    # To change the timeframe, adjust -mtime; for instance, -mtime +3 matches
    # folders last modified more than three days ago.
    # grep -qx matches the whole path, so "ann" does not also match "/Users/joann".
    if find /Users -maxdepth 1 -mindepth 1 -type d -not -name "*.*" -mtime +21 | grep -qx "/Users/$a"; then
        dscl . delete /Users/"$a"  # delete the account
        rm -r /Users/"$a"          # delete the home directory
    fi
done
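Before letting anything like this loose on a cart, a dry run that only lists what would qualify may be worth having. The following is a rough sketch of that idea: list_stale_homes is a made-up helper name, the 21-day value simply mirrors the -mtime value in the script above, and it deletes nothing.

```shell
#!/bin/bash
# Print home folders under a base directory that have not been modified in
# more than N days, without deleting anything.
# list_stale_homes is a hypothetical helper; skipping names that contain a
# dot mirrors the find filter in the script above.
list_stale_homes() {
    local base="$1" days="$2"
    find "$base" -mindepth 1 -maxdepth 1 -type d -not -name "*.*" -mtime +"$days"
}

# Skipped on systems without /Users.
if [ -d /Users ]; then
    list_stale_homes /Users 21
fi
```

Note that this lists folders rather than accounts, so something like /Users/Shared can show up here even though the deletion script would never touch it, since that script only acts on names returned by dscl.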