Slow login times on AD domain-bound Macs

dudzikj
New Contributor III

Hi All,

I know this is an age-old problem with a million different possible fixes out there, but it's driving me nuts and hopefully somebody has seen this particular flavor before. My apologies in advance for the super-long post, but there is a lot going on here.

We have a fleet of Macs deployed in computer labs on campus (All Yosemite builds) that are experiencing extremely slow login times when a user attempts to log in with their domain account for the first time. Anywhere from two minutes to ten minutes. We are not using network-based home folders, and we are not using mobile accounts.

One thing we found is that by switching to the time server in our environment, rather than using the default time.apple.com server, we have managed to get subsequent new-user logins to a usable desktop within 25 seconds. That holds until the machine is rebooted. Then we're back to the same situation, where the first new user account that logs in has to wait a couple of minutes to get to a usable desktop.

So it goes something like this:

Newuser1 sits down at a freshly booted Mac and logs in. It takes about one minute forty five seconds for the login to process and for the user to get to a usable desktop. During this time, the default user template is being applied, home directory created, etc., etc. Newuser1 then logs out of the computer.

Newuser2 logs into the computer and gets to a usable desktop 25 seconds later. When newuser2 is done, he/she reboots the computer.

If either of the previous users comes back to the computer after it has been rebooted, they will be able to log in and get to their desktop in about 25 seconds (we clear the home folders once a week). However, if a user who already has a home folder logs into the machine first, and then newuser3 comes along, newuser3 will have to wait 1:45 to get to the desktop. After that, additional new users will be able to get to a usable desktop within 25 seconds.

So all of this says to me that when the first new user after a reboot logs in and waits the 1:45 to get to the desktop, something is happening in the background that becomes cached/remembered and available to the process of creating additional new user home folders, and stays available until the next time the machine is rebooted, and then the process starts all over again. I just can't figure out what, and my head is getting sore from being beaten against the wall over this problem.

In Directory Utility < Services my AD bind settings are the following:

User Experience:
Create mobile account at login - unchecked
Force local home directory on startup disk - checked
Use UNC path from Active Directory to derive network home location - unchecked

Mappings: All set to defaults

Administrative:
Prefer this domain server - unchecked (and I have been strongly advised not to put anything in there)
Allow authentication from any domain in the forest - unchecked
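
For anyone who would rather set these options from a script than from Directory Utility, here is a rough sketch. It assumes the `dsconfigad` flags `-mobile`, `-localhome`, `-useuncpath`, `-alldomains` and `-nopreferred` correspond to the checkboxes listed above; the function names and the `DRY_RUN` variable are made up for this example and are not from the original setup.

```shell
#!/bin/sh
# Sketch: mirror the Directory Utility checkboxes with dsconfigad.
# Must run as root on a Mac that is already bound to AD.
# ad_bind_options, apply_ad_bind_options and DRY_RUN are hypothetical names.

ad_bind_options() {
    # One flag per checkbox: no mobile account, force local home,
    # no UNC path, no "any domain in the forest", no preferred server.
    echo "-mobile disable -localhome enable -useuncpath disable -alldomains disable -nopreferred"
}

apply_ad_bind_options() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command so it can be reviewed first.
        echo "dsconfigad $(ad_bind_options)"
    else
        # Word splitting of the option string is intentional here.
        # shellcheck disable=SC2046
        dsconfigad $(ad_bind_options)
    fi
}
```

Calling `apply_ad_bind_options` prints the command; `DRY_RUN=0 apply_ad_bind_options` would actually run it. `dsconfigad -show` afterwards should list the resulting settings.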

In Directory Services < Search Policies I have removed all references to "All Domains" and have explicitly defined the domain that I want people logging in from.

I have also defined my domain in System Preferences < Network < DNS < Search Domains. I tried adding "local" and/or ".local" to the Search Domains but it had no effect.

I also tried changing the timeout options for the ActiveDirectory.plist file, (I had to find the plist file first because it's not in the same place as most of the articles say it is), but that seemed to have no effect either.

Now, I will also say that my default user template is ~400 MB. It could be smaller, but I don't think that template in and of itself is the problem, because if I create a new, local user, that user can log into the machine in 25 seconds or less. It seems like this is somewhat related to the network login, but I just can't find the connection.

I have had our network engineering team monitor the traffic on one of my test machines. They can see a bunch of LDAP packet errors happening during login, but the percentage of packet errors is not consistent with the amount of time that it takes to log in, so none of us think those errors are realistically the problem, and there are no other smoking guns in the traffic logs.

Has anybody run into a similar situation, and if so, any ideas?

Thanks in advance for any help/insights.

15 REPLIES

koepke
New Contributor II

My company experienced the exact same issue back in the 10.8.x days, with well over 500K objects in AD. We came to the conclusion that the Mac was traversing the entire AD structure to locate the user object. There was no consistency in login times for different users, but it always took over 70 seconds the first time a user logged in. At the time we were on Active Directory Schema 2003. After we upgraded to Active Directory Schema 2008 R2 or 2012 the problem went away. I'm not sure which version, as my department doesn't have any contact with the AD team. We now have normal login times: first login is maybe 20 seconds, and after that it's closer to 5.

Do you know what Schema version you are using?

koepke
New Contributor II

Probably not helpful but:

OS X, OS X Server: Active Directory attributes required to be accessible to OS X computer account objects
https://support.apple.com/en-us/HT202244

jkuo
Contributor

What about running a script on startup to set the time server to your domain controller? I have one running on a recurring basis on our domain joined Macs specifically to keep them in sync and to avoid authentication delays due to time offsets. The only thing I'm not sure of is when in the startup sequence this would run and whether or not it would take. (Mine runs on a recurring schedule, and we keep the Macs on all the time).

The script might be something like this:

#!/bin/sh

# Set the time zone and sync the clock as early as possible.
# Run as root (systemsetup and ntpdate both need it).
TIMEZONE="America/Chicago"
TIMESERVER="xx.xx.xx.xx"

SetDateAndTime () {
        systemsetup -settimezone "$TIMEZONE"
        # Update network time from the DC
        ntpdate -u "$TIMESERVER"

        # Toggle network time off and back on
        systemsetup -setusingnetworktime off
        systemsetup -setusingnetworktime on
        sleep 10 # let the clock change before running date
}

SetDateAndTime

dudzikj
New Contributor III

Hi all,

Thank you for your quick responses and ideas.

@koepke We are running on a 2008 R2 AD schema. As a troubleshooting step, I tried setting the "Prefer this domain server" field to each individual domain controller in turn, but that didn't seem to have any effect on the issue.

@jkuo I have a similar script that bounces the time server at startup, but I like that yours specifically sets the time zone (mine just tells the machine to stop time services and then start again, pointed to our time servers). I think yours shaved a couple of seconds off of the login times.

So, here's where we're at now.

I shaved the default user template down to 70 MB. That got me to a 45 second initial login, with subsequent login times of ~17 seconds. That's acceptable, but I'd like to see that initial login time faster, if possible.

Now the one interesting thing is that every so often, for whatever reason, initial logins just all of a sudden start working and I can get into the machine in 20 seconds or less. I have not been able to figure out why, and that generally only lasts for a few minutes, tops.

This morning I started going through the system.log file and noticed that Microsoft SCEP is scanning seemingly every file that is being used to create the user's profile. I removed SCEP from the machine, and all of a sudden I am getting consistent 15-17 second initial login times.

So now it seems like I just need to figure out a way to get SCEP to not scan the home folder as it is being created. I don't want to permanently exclude the /Users/ folder, so I'm hoping there's a way to pause SCEP so that it won't scan during home folder creation.

At least I'm fairly certain that I know where the problem is now. :)

easyedc
Valued Contributor II

Have you tried something simple like forcing a timeout delay? We run

sudo defaults write /Library/Preferences/com.apple.loginwindow DSBindTimeout -int 15

to resolve this problem for us in the past.

jmahlman
Valued Contributor

We are having this exact issue....did you actually find a resolution?

Note: The DSBindTimeout didn't work for us either.

bradtchapman
Valued Contributor II

@jmahlman, have you run through the Apple KB article "Verifying DNS consistency for binding Active Directory" ?

https://support.apple.com/en-us/HT201885

You'll want to ensure that your organization has the proper DNS SRV records set for LDAP discovery. Also, a '.local' domain is verboten. :)
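
A quick way to eyeball those SRV records from a bound Mac might look like the sketch below. The record prefixes are the usual AD ones rather than anything taken from this thread, `example.edu` is a placeholder, and the helper names are invented for illustration. It also assumes the forest root is the same as the domain, which matters for `_gc._tcp`.

```shell
#!/bin/sh
# Sketch: list the DNS SRV records an AD client typically looks up, then query them.
# ad_srv_names and check_ad_srv are hypothetical helper names.

ad_srv_names() {
    # $1 = AD DNS domain. _gc._tcp normally lives under the forest root,
    # assumed here to be the same as the domain.
    domain=$1
    for prefix in _ldap._tcp _kerberos._tcp _kpasswd._tcp _gc._tcp; do
        echo "${prefix}.${domain}"
    done
}

check_ad_srv() {
    # Requires dig. Prints each record name followed by whatever targets DNS returns.
    ad_srv_names "$1" | while read -r name; do
        echo "== $name"
        dig +short -t SRV "$name"
    done
}
```

Run it as `check_ad_srv example.edu` with your own domain. A name with nothing listed under it suggests that record is missing or not visible from that client.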

dudzikj
New Contributor III

Beyond keeping the profiles as small as possible and wishing we could get rid of antivirus software, I never got much farther with the issue.

I did find that overall image size seems to have an effect on login times, even if the default user template is small. We actually saw login times get slower once we started testing with El Cap and noticed that they improved significantly on bare bones testing images that didn't have everything plus the kitchen sink on them. Ultimately what we ended up doing for that was creating a stripped, bare bones image that has a very basic software package for a lot of our areas where the machines are general use, and having a bigger, full meal deal image for the specialized labs, which generally have newer hardware. SSDs definitely made a difference in those areas, although still not as much as we had hoped.

jmahlman
Valued Contributor

@bradtchapman Haven't checked that out, will do that tomorrow. It's been working fine for us for years, this is the first we've heard of the issue in our environment. We did see some oddities with the DNS lookup which we're going to run by our network guy..but it's just so odd that it only happens on first login and not on subsequent ones.

@dudzikj We never had an issue with our builds, and they really haven't changed much except for the OS. This is the first we've heard about it in the environment..and we've been on 10.11 since last August.

jmahlman
Valued Contributor

So, we set the od logs to "Info" level and grabbed a few. Going through and comparing both sets of logs (one from the first login and one from the second login) the only major difference we found is on the first login (the long one) these lines show up:

2017-02-08 13:02:24.710310 EST - AID: 0x0000000000000000 - idle disconnected '/Active Directory/UA/ua.lan:ldap:A06C0891-C050-4F53-8EFC-08310B2B87C1' to 10.32.2.4 after 90 seconds
2017-02-08 13:02:24.710464 EST - AID: 0x0000000000000000 - no longer watching destination 10.32.2.4
2017-02-08 13:02:24.710601 EST - AID: 0x0000000000000000 - Module: ldap - closing socket 12 for connection 0x7ffb4adcc410
2017-02-08 13:02:25.077899 EST - AID: 0x0000000000000000 - idle disconnected '/Active Directory/UA/Global Catalog:ldap:32DF27A0-02C4-4146-98F8-9864FF8ACA44' to 10.32.2.4 after 90 seconds
2017-02-08 13:02:25.078065 EST - AID: 0x0000000000000000 - no longer watching destination 10.32.2.4
2017-02-08 13:02:25.078173 EST - AID: 0x0000000000000000 - Module: ldap - closing socket 8 for connection 0x7ffb4ad21160
2017-02-08 13:02:47.988351 EST - AID: 0x0000000000000000 - Trigger - new node trigger watching for 'opendirectoryd:nodes;(register|unregister);.*'

These lines also show up in the longest break in the logs; roughly 1 minute.
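
For anyone comparing logs the same way, a small helper can point at the longest pause instead of hunting for it by eye. This is only a sketch: it assumes every line starts with a `YYYY-MM-DD HH:MM:SS.ffffff` timestamp like the ones above and that the capture does not cross midnight. The name `largest_gap` is made up.

```shell
#!/bin/sh
# Sketch: report the largest time gap between consecutive log lines.
# Assumes field 2 is HH:MM:SS.ffffff and the log stays within one day.
# largest_gap is a hypothetical helper name.

largest_gap() {
    awk '
    $2 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
        split($2, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (seen && now - prev > gap) {
            gap = now - prev
            first = prevline                   # last line before the pause
            second = $0                        # first line after the pause
        }
        prev = now; prevline = $0; seen = 1
    }
    END {
        printf "%.3f seconds\n", gap
        if (gap > 0) {
            print "  pause starts after: " first
            print "  pause ends with:    " second
        }
    }' "$@"
}
```

Pass it a file name (`largest_gap firstlogin.log`, file name hypothetical) or pipe log text into it. Within the seven lines quoted above, the biggest pause it would find is about 22.9 seconds, right before the "new node trigger" line.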

charles_hitch
Contributor II

@jmahlman What version of macOS are you running and what antivirus are you using? We had this issue for a long time and found it was being caused by an incompatible version of McAfee. After upgrading McAfee our login times dropped dramatically and the issue is basically resolved now.

The other thing I have found can impact login times is having bad DNS servers in the resolution list. One of our servers was unknowingly decommissioned. Once we removed that from the list, logins returned to normal.
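
One way to spot a dead resolver like that is to probe every DNS server the Mac knows about. The sketch below assumes `scutil --dns` prints lines of the form `nameserver[0] : 10.0.0.1` and that `dig` exits non-zero when a server never replies; the function names and `example.edu` are placeholders, not anything from our environment.

```shell
#!/bin/sh
# Sketch: pull resolver addresses out of `scutil --dns` output and probe each one.
# list_nameservers and probe_nameservers are hypothetical names.

list_nameservers() {
    # Reads scutil --dns style text on stdin and prints each unique server address.
    awk '/nameserver\[[0-9]+\]/ { print $3 }' | sort -u
}

probe_nameservers() {
    # $1 = a name every healthy resolver should answer, e.g. the AD domain.
    scutil --dns | list_nameservers | while read -r ns; do
        # Short timeout and a single try so a dead server shows up quickly.
        if dig +time=2 +tries=1 +short "@$ns" "$1" >/dev/null 2>&1; then
            echo "ok    $ns"
        else
            echo "FAIL  $ns"
        fi
    done
}
```

Running `probe_nameservers example.edu` with your own domain should print one line per configured DNS server. Any `FAIL` line is a candidate for removal from the list.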

jmahlman
Valued Contributor

@charles.hitch We're using 10.11.6 and Sophos 9.whatever the latest is ;)

We checked our DNS today and everything looks okay.

I did some testing today by installing our programs one by one on a clean bound system and the ones I installed didn't cause any hang ups, so I'm going to continue testing that more when I head back into the office. We're also going to talk to our AD guy to see if he knows anything...

jmahlman
Valued Contributor

@charles.hitch We took your advice and looked at the antivirus a bit more and found that it was causing the problem! We're in the process of looking for a replacement AV anyway, so this will be another reason to look at something other than Sophos. We're also going to try to install a newer build AFTER imaging instead of during.

Thank you all so much for the assistance!

Chuey
Contributor III

@jmahlman We too use Sophos as our AV version 9.x something. What specifically with Sophos was delaying your login times? I'd like to know. Thanks for any info

jmahlman
Valued Contributor

@Chuey Honestly, we don't know...we really haven't had time to figure that out yet. We really just needed to fix the slow login times quickly, so we did a mass removal of Sophos on all of our public machines and we're working on trying different versions of Sophos or different settings. Our Sophos subscription is up for renewal in a few weeks and we were thinking of switching anyway, so this is just more reason to do it.

If we find anything out, I'll update here.

Also, the version that was installed was 9.5.x autoupdated from 9.2.8 at image time. Computers are all 10.11.6.