Posted on 05-19-2016 07:19 AM
Hello all,
I'm still seeing issues with Macs on 10.10.5 and 10.11.4 that won't fully load and get stuck on startup. They will load either halfway or even to 95%, but never past that. Sometimes I'll turn them off, leave them overnight, and they work the next day. Sometimes I'll boot to single-user mode, run fsck -fy, reboot, and it works. It's just so hit and miss. These machines are bound to AD. Is anyone still seeing these issues in their environment?
Posted on 12-05-2016 10:31 AM
@Rocky Thanks! I'm going to test this out. Some of our Macs are full to the brim with student accounts. It's nice that this will expire old, stale accounts while leaving the fresher ones on the Macs, instead of just wiping all students at startup.
Posted on 12-05-2016 11:55 AM
@Rocky I think they are saying that you have to both clear out old mobilized AD accounts and de-mobilize new ones in the AD Bind settings (on desktops only) to see relief from the random startup failures. Many of the computers seeing the issue will show a "kauth" error at startup.
I'll be testing turning off Mobile accounts in the AD-plugin on desktops soon, but haven't yet. In my testing, it only becomes an issue on Macs with a large buildup of mobilized AD accounts.
Posted on 12-06-2016 03:33 AM
I've tested turning off mobile accounts, but unfortunately it seems to stop user-level config profiles from working. I've logged a ticket with Jamf about this but haven't heard anything back since sending screenshots of the issue.
The main question still is: what is getting clogged when a load of mobile accounts have logged into a Mac? Clearing the accounts using the dscl command as above doesn't fix it, as we're routinely doing that anyway. If you run "sudo profiles -P" from Terminal you'll see all of the config profiles still there for all of the users that have used the machine, whether the account still exists or not. There's no simple way to clear these, and even when you do clear them it doesn't change anything.
What other caches could there be that need clearing or is it just something we'll never get to the bottom of without Apple's assistance? (fat chance of Apple helping as they'll just say upgrade to Sierra).
Posted on 12-06-2016 06:57 AM
If upgrading to Sierra fixes this...I'm in! :)
Posted on 12-06-2016 07:00 AM
Upgrading to Sierra does not fix this issue. I hadn't seen the problem in a while, but just had to re-load a Sierra test machine to resolve the issue.
Posted on 12-06-2016 07:04 AM
@amiller6 Were you able to retest with "Create Mobile Accounts" set to Off ?
Posted on 12-06-2016 07:12 AM
I just reloaded the machine and am using that configuration now. I see that has caused some issues for others though, so not sure how that will work. Unfortunately, the machine went a couple of weeks without showing the issue, so it may be some time before I would see the issue again.
Posted on 12-06-2016 07:31 AM
I think it's when you get around 100+ accounts. That seems to be when we start to see the issue. I could in theory make 100 temporary guest accounts and start going through the laborious process of logging in and out with each of them.
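For anyone wanting to check how close a lab Mac is to that figure, here is a minimal sketch that counts home folders. It assumes each mobilized account has one folder directly under /Users; count_home_folders is a made-up helper name, not part of any tool mentioned in this thread.

```shell
#!/bin/bash
# count_home_folders [USERS_DIR]
# Hypothetical helper: counts home folders directly under USERS_DIR
# (default /Users), ignoring Shared and hidden entries.
count_home_folders() {
    local users_dir="${1:-/Users}"
    find "$users_dir" -mindepth 1 -maxdepth 1 -type d \
        ! -name "Shared" ! -name ".*" | wc -l | tr -d ' '
}
```

Calling count_home_folders /Users prints a single number, which could be logged per machine to see whether the failures really cluster around 100 accounts.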
Posted on 12-06-2016 02:42 PM
when creating mobile accounts, i'm not sure where Apple keeps the user cached credentials. it's possible they're separated and might not be cleaned up when the dscl delete command is used, which sounds like what @allanp81 is experiencing.
to be sure, i would re-image the machine if possible. when i made this AD config change, i didn't find it necessary to delete/reset the machines in the JSS.
Posted on 12-07-2016 02:29 AM
Hello, we have the same issues as described in this thread. We have re-imaged our affected Macs and rolled back the OS, and we have different models of Macs; after a few weeks of heavy usage, the same issues came back.
David Higgs was right: having the Create Mobile Accounts setting enabled is what causes this issue.
If you do not wish to rebuild/re-image your existing affected Macs, you can manually apply the fixes below, which I have applied and tested on all our affected Macs.
Disable Create Mobile Accounts.
Enable the root account.
Log on as root and delete the cached AD user files below:
In /var/db/dslocal/nodes/Default/groups, delete all files beginning with com.apple.sharepoint.group. (each ends in a number, one per user, followed by .plist)
(You can spot these easily as they are listed in numerical order.)
In /var/db/dslocal/nodes/Default/sharepoints, delete each USERNAME's Public Folder.plist (all your AD users' actual names will be listed here)
In /var/db/dslocal/nodes/Default/users, delete each USERNAME.plist
(All your AD users will be listed, depending on how you have assigned them; e.g. our students are registered by the year they start, so 16000000 etc.) (Deleting an affected user's plist here also automatically removes them from the Users & Groups section of System Preferences.)
Empty the Trash.
Restart the Mac. The first boot will take a while as it needs to rebuild the databases.
We have a script that deletes users' local home folders and users' folders in /Library/Managed Preferences. If you do not allow users to save work on the local Mac, you can also delete these manually so everything is nice and clean.
If you do get a Mac that just refuses to boot, or takes ages, even after resetting the SMC and PRAM, Safe Mode fixes, etc., connect the affected Mac to another Mac in Target Disk Mode. You must make sure hidden/system files are shown so that you can see the affected files. I found a nice app on the web that does this. Then follow the steps above.
Hope this helps.
Frank
Posted on 12-07-2016 04:47 AM
@Kong I'm assuming if you regularly cleaned up those locations you could leave mobile accounts enabled?
Posted on 12-07-2016 06:04 AM
Hi allanp81. If you leave Create Mobile Accounts enabled after my cleanup, you will get the same issues again after heavy AD usage, unless you know how to write a script that cleans up these hidden cached AD user credentials on startup or maybe once a week. In our case, we have no need for Create Mobile Accounts to be enabled, as these are student Macs and most of them are in open-access areas. We had never had this option enabled over the last few years; it was enabled by mistake, and that's when we started to get the stuck-at-boot issues.
Posted on 12-07-2016 06:36 AM
@Kong we run a script on startup to clear the accounts so in theory I could just clear those locations at the same time
Posted on 12-07-2016 09:55 AM
@allanp81 do you mind sharing that script?
We have been using Rocky's script to delete accounts on startup. We are deleting all accounts that have not been modified in 5 days. This machine is still locking up with only 8 accounts on it.
I just manually deleted all the com.apple.sharepoint.group plists in /var/db/dslocal/nodes/default/groups, and my test machine just booted 10 out of 10 times with no lockups.
Some of our machines go home with students so we need Managed mobile accounts turned on.
Posted on 12-08-2016 01:01 AM
Interestingly I have noticed that if you remove a mobile account via the gui, it removes those references under the /var/db/dslocal/nodes/Default location...
Posted on 12-08-2016 07:43 AM
Hi, can I just ask who uses Autodmg when building your base image? Has anyone had this issue when thin imaging?
Posted on 12-08-2016 07:53 AM
Yes, I'm seeing it in thin deployments...no Autodmg in the loop.
Happens on busy lab macs with about 100 or more AD-mobilized accounts.
I think I'll be moving forward with turning off mobilization soon (via a Configuration Profile--the Mobilize choice is also in the Directory payload).
Still not sure why there seems to be an upper limit here other than local storage--it could be an Apple bug that began around 10.10.3. This one is difficult to recreate due to the conditions that must be present in order to see it.
Posted on 12-08-2016 08:00 AM
Yes, we do, although I did wonder about that: we were seeing this issue, less often, before we started using AutoDMG-built images.
But previously we had an old-account removal script running, whereas at present we don't.
When we were using the old-account removal script, it had a bug that caused some accounts not to be removed.
I was going to rewrite it without the bug. (The original version was copied from elsewhere.)
The 'bug', by the way, was technically a 'feature': the original script made use of the Unix mtime function, which actually works differently from the way everyone expects it to work!
I am thinking that, because the number of (mobile) accounts was mostly kept limited (though in busy areas, even with the account-culling script, we sometimes got to over 300 accounts on a library computer), we rarely saw this issue.
It seems to be happening more since we have used OS X 10.11 (OSX 10.11.6 currently)
we also saw it more rarely in OS X 10.10.x, and it almost never happened in OS X 10.9.x..
- But that's just my observation..
Busy areas with lots of account churn - such as in library areas, seem most prone to this problem
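The post doesn't say which mtime behaviour tripped up the script, so this is only a guess at the usual suspects: find's -mtime counts in whole 24-hour periods (and the rounding rules are not the same across find implementations), and a folder's own mtime only changes when something is added, removed or renamed directly inside it, not when files deeper in the tree are edited. A minimal sketch that sidesteps the first problem by working in minutes with -mmin; list_stale_homes is a hypothetical helper and the 5-day figure is just borrowed from the earlier post.

```shell
#!/bin/bash
# list_stale_homes USERS_DIR DAYS
# Hypothetical helper: prints home folders under USERS_DIR whose own
# modification time is more than DAYS days old, measured in minutes
# so the whole-day rounding of -mtime does not come into play.
list_stale_homes() {
    local users_dir="$1" days="$2"
    local minutes=$(( days * 24 * 60 ))
    find "$users_dir" -mindepth 1 -maxdepth 1 -type d \
        ! -name "Shared" -mmin +"$minutes"
}
```

Running list_stale_homes /Users 5 only prints candidates; nothing is deleted. Because a home folder's mtime is a weak signal of last login, it is worth eyeballing the list before wiring it into a removal script.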
Posted on 12-08-2016 08:27 AM
We cleared out the groups and sharepoints directories on over 15 Macs today and it fixed all of them instantly. We've added it to our cleanup scripts, so we'll see what happens from this point. Fingers crossed.
I'm also going to use composer to do a snapshot to see what deleting a mobile account via the gui actually does.
Posted on 12-08-2016 09:21 AM
I use the Autorun Data to re-image a computer with a different name, and for some reason every time I change the name it goes back to the old one. Any idea how to fix it?
I also went to the Sharing option on the device and changed the name, but when I run sudo jamf recon it changes back to the previous name.
Posted on 12-08-2016 03:37 PM
Incomplete solution deleted.
Posted on 12-09-2016 07:53 AM
It appears that this has now fixed our issue based on our testing today. It has fixed all machines with the issue.
We are just running the following script on each boot of a student machine:
#!/bin/bash
# Arrays are a bash feature, so run this under bash rather than sh.
# List home folders, skipping Shared and .localized.
UserList=`ls /Users | grep -v "Shared" | grep -v ".localized"`
Dansarray=( $UserList )
#printf "%s\n" "${Dansarray[@]}"
for u in "${Dansarray[@]}" ; do
    if [ "$u" = "administrator" ] || [ "$u" = "admin" ] || [ "$u" = "kingston" ] || [ "$u" = "Administrator" ] || [ "$u" = "arduser" ] ;
    then
        echo "$u -- detected skipping..."
    else
        echo "$u -- Deleting..."
        # Remove the directory record first, then the home folder.
        /usr/bin/dscl . delete "/Users/$u" && /bin/rm -rf "/Users/$u"
        # Remove the sharepoint plists that dscl leaves behind.
        find /private/var/db/dslocal/nodes/Default/sharepoints -name "*" -type f -delete
        find /private/var/db/dslocal/nodes/Default/groups -name "com.apple.sharepoint*" -type f -delete
    fi
done
This was written by a colleague about 18 months ago so I have just appended the 2 lines to remove the references to sharepoint.
Posted on 12-09-2016 10:53 AM
Interesting to see that it's occurring without Autodmg in thin images. We currently have a support case with Apple and they are insisting that we build an image without any 3rd-party tools to remove them from the equation. We are doing this, but it sounds like this is a red herring.
Posted on 12-09-2016 11:57 AM
Details are fuzzy, but I recall using internet restore on a machine and still seeing the issue. I don't believe this is an Autodmg issue(although that is what I'm using to create my base image).
Posted on 12-09-2016 12:02 PM
@amosdeane @amiller6 look above, it's essentially been solved by Kong and from my testing looks like a fix has been found. No need to disable mobile accounts etc, just do a proper cleanup of them and bob's your mother's husband's brother.
Posted on 12-12-2016 02:49 AM
Ok, that sounds very positive. I'm just slightly cautious as we've thought that we've fixed this one a few times and then it's come back! We're going to test this out. Thanks all for the suggestion!
Posted on 12-12-2016 03:27 AM
@amosdeane It is looking promising. Our main affected room has 47 macs in it and so far we've never had a morning where all of them would power on successfully.
This morning I watched using our custom availability tool and all 47 came on first time without any hitches so it really is looking good so far.
Usage will drop off in the run up to Christmas so I'm not going to call it properly until we reconvene after the break in January but so far every mac that wouldn't boot worked fine after clearing those obsolete plists.
I find it ridiculous that just a build of ~100 plists can stop a whole OS from booting if this does turn out to be the fix.
Posted on 12-12-2016 08:21 AM
allanp81 that sounds encouraging. If we could finally crack this problem I feel like just starting the christmas festivities right away....
Posted on 12-13-2016 07:02 AM
I did a quick comparison of the file system before and after deleting a mobile account via the GUI, versus deleting an account using the dscl command. The main difference appears to be that deleting via the GUI removes the following (along with the /Users/account directory):
/private/var/db/dslocal/nodes/Default/groups/com.apple.sharepoint.group.1.plist (this increments for each new user)
/private/var/db/dslocal/nodes/Default/sharepoints/user, name's Public Folder.plist
/private/var/db/dslocal/nodes/Default/users/username.plist
Using the dscl command only removes the last of those 3 lines and leaves the other 2 files. Over time you could end up with 100s of these and this seems to be what causes the intermittent boot issues. All of our machines now appear to be working fine since adding those 2 lines to our account cleanup script.
I don't know if this is a bug in the way the dscl command works but you can manually clear these easy enough by removing them all or making something more complicated if you wanted to.
The com.apple.sharepoint and public folder.plist are to do with user shares and the public directory that exists within a user's home directory. If you are deleting all local mobile accounts then there's no harm in doing this.
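To see how many of these leftovers a machine has collected before deleting anything, a dry-run listing along these lines might help. It is only a sketch: list_sharepoint_leftovers is a hypothetical helper, the paths are the ones from the comparison above, and it assumes you run it as root since the dslocal folders are not readable by ordinary users.

```shell
#!/bin/bash
# list_sharepoint_leftovers ROOT
# Hypothetical helper: prints the sharepoint-related plists under the
# local directory node of the volume mounted at ROOT ("/" for the
# boot volume). Nothing is deleted.
list_sharepoint_leftovers() {
    local root="${1%/}"
    local base="$root/private/var/db/dslocal/nodes/Default"
    if [ -d "$base/groups" ]; then
        find "$base/groups" -name "com.apple.sharepoint*" -type f
    fi
    if [ -d "$base/sharepoints" ]; then
        find "$base/sharepoints" -type f
    fi
    return 0
}
```

Piping the output through wc -l (for example list_sharepoint_leftovers / | wc -l) gives a quick count to compare against the number of accounts that actually still exist.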
Posted on 12-13-2016 07:10 AM
Posted on 12-13-2016 07:15 AM
@allanp81 @amosdeane We had 4 MacBook Airs that were stuck on startup this morning. We booted to an external hard drive, browsed to the troubled computer's partition, and ran these two commands suggested by allanp81 above:
find /private/var/db/dslocal/nodes/Default/sharepoints -name "*" -type f -delete
find /private/var/db/dslocal/nodes/Default/groups -name "com.apple.sharepoint*" -type f -delete
Once we rebooted the machine, BOOM, it worked and even seemed to boot faster. I think this is definitely the fix. We did not delete the Users home folder or the dscl record. All we did was delete those plist files and it instantly booted. Thanks so much. I think we are going to create a script with those commands and apply it as a LaunchDaemon.
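A LaunchDaemon for this could look roughly like the sketch below. Everything specific in it is an assumption: the label com.example.sharepointcleanup and the script path /usr/local/bin/clean_sharepoints.sh are placeholders, and whether RunAtLoad fires early enough in the boot to avoid the hang has not been tested here.

```shell
#!/bin/bash
# write_cleanup_daemon DEST
# Hypothetical helper: writes a minimal launchd plist to DEST that runs
# a cleanup script once when the daemon is loaded (i.e. at boot).
# Label and script path are placeholders, not values from this thread.
write_cleanup_daemon() {
    local dest="$1"
    cat > "$dest" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.sharepointcleanup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/clean_sharepoints.sh</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF
}
```

On a real Mac the file would go in /Library/LaunchDaemons, owned by root:wheel with mode 644, and the script it points to would hold the two find commands.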
Posted on 12-13-2016 07:24 AM
@aporlebeke Yes that is correct, we've been running that script now for over a year and I just added the 2 find commands to the script. We have always cleared local mobile accounts on our student machines to prevent build ups.
@Chuey I'm assuming you put in the path to the local disk though and not the external hard drive that you booted from?
Posted on 12-13-2016 07:35 AM
@allanp81 Correct, we made sure we deleted them on the local hard drive and not our external drive.
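Since the two commands as posted point at the boot volume, here is a sketch of the same cleanup with the target volume passed in explicitly, so it can be run from an external boot drive or over Target Disk Mode. clean_sharepoint_leftovers is a hypothetical helper and "/Volumes/Macintosh HD" in the usage note is just an example mount point.

```shell
#!/bin/bash
# clean_sharepoint_leftovers ROOT
# Hypothetical helper: deletes the sharepoint plists under the local
# directory node of the volume mounted at ROOT. ROOT is required so the
# boot volume is never cleaned by accident; pass "/" to target it.
clean_sharepoint_leftovers() {
    if [ -z "${1:-}" ]; then
        echo "usage: clean_sharepoint_leftovers ROOT" >&2
        return 2
    fi
    local root="${1%/}"
    local base="$root/private/var/db/dslocal/nodes/Default"
    if [ ! -d "$base" ]; then
        echo "No local directory node found under '$1'" >&2
        return 1
    fi
    if [ -d "$base/sharepoints" ]; then
        find "$base/sharepoints" -type f -delete
    fi
    if [ -d "$base/groups" ]; then
        find "$base/groups" -name "com.apple.sharepoint*" -type f -delete
    fi
    return 0
}
```

Run as root, something like clean_sharepoint_leftovers "/Volumes/Macintosh HD" would clean the internal disk of the stuck Mac while leaving the external boot drive alone.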
Posted on 12-13-2016 07:37 AM
@Chuey Excellent, well it's looking good then. I might see if I can try and streamline it a bit as those 2 commands will also delete the files associated to the admin account, although I'm not sure it matters unless you've changed any sharing settings to do with that user.
Posted on 12-13-2016 08:07 AM
@allanp81 I noticed if you do not delete home folders associated to users that it will not re-build those files in private/var/folders. I was able to login and browse mounted shares no problem.
Not sure if that is an issue or what ?
Posted on 12-13-2016 08:08 AM
Yes not sure. We're only going to clear those sharepoint files at the same time as deleting the users.
Posted on 12-13-2016 08:34 AM
From what we can tell, it looks like an area that handles the user's local 'Public Folders' and sharing rights for these folders specifically.
It seems to reliably resolve this issue, although we've seen many things fix it, only for the issue to come back without reason, so we'll keep testing it for now.
Since you're deleting these files without using $u in your script, you could always put this outside the do loop, since after the first time it runs it'll be redundant :).
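As a rough illustration of that suggestion, the posted script could be restructured so the two find commands run once after the loop. This is a sketch, not the script anyone here is running: the function names and the DSCL override (there so the logic can be exercised without touching a real directory service) are my own additions, and the keep list is copied from the script above.

```shell
#!/bin/bash
# Accounts to leave alone; copied from the posted script.
KEEP_USERS="administrator admin kingston Administrator arduser"

# Returns 0 if the given short name is on the keep list.
should_keep() {
    local name="$1" k
    for k in $KEEP_USERS; do
        [ "$name" = "$k" ] && return 0
    done
    return 1
}

# cleanup_accounts USERS_DIR NODE_DIR
# Deletes every account with a home folder in USERS_DIR that is not on
# the keep list, then clears the sharepoint plists in NODE_DIR once.
cleanup_accounts() {
    local users_dir="$1" node_dir="$2" home u
    [ -n "$users_dir" ] && [ -n "$node_dir" ] || return 2
    for home in "$users_dir"/*; do
        [ -d "$home" ] || continue
        u=$(basename "$home")
        if [ "$u" = "Shared" ] || should_keep "$u"; then
            echo "$u -- detected skipping..."
            continue
        fi
        echo "$u -- Deleting..."
        # Remove the directory record, then the home folder.
        "${DSCL:-/usr/bin/dscl}" . delete "/Users/$u" && /bin/rm -rf "$home"
    done
    # These do not depend on $u, so once after the loop is enough.
    find "$node_dir/sharepoints" -type f -delete
    find "$node_dir/groups" -name "com.apple.sharepoint*" -type f -delete
}
```

On a student Mac the call would be cleanup_accounts /Users /private/var/db/dslocal/nodes/Default, run as root.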
Posted on 12-13-2016 09:35 AM
@draeconis haha,yes good point. Will update it once we roll it out properly.
Posted on 12-16-2016 04:05 AM
How are people getting on with this now? Does it look like it's fixed it for everyone?
Posted on 12-16-2016 09:01 AM
It's been a week now since the first machines I did (end of last week) and about 4 days for the other machines (beginning of this week), and I have seen no recurrence thus far. It's finals week at the university I'm at, so pretty heavy use at the beginning of the week, tapering off towards the end. I'm very optimistic this is working.