Insane AD authentication issue - help?!

jenuon
New Contributor

Hi All,
Long time stalker, first time poster. We are running through the process and hope to have Casper in place for the new year. We're currently a Symantec house and not enjoying the lack of features.

We are experiencing some very inconsistent issues with AD authentication in our lab environments.

All machines are running 10.9 or higher. The thick image we deploy is essentially OOBE with our corporate apps installed. Not much tweaking is done aside from the lproj profile, local accounts, proxy, and a configuration profile (mobileconfig) that disables AirPort and Bluetooth and sets the login window text, energy settings and a shared drive. User accounts are created locally from a profile but are not mobile, so user credentials aren't stored after logout.

What happens is that "Network Accounts are Unavailable" appears at the login screen. Randomly: sometimes after a restart, sometimes after a user logs out, sometimes after waking from sleep. Always sometimes.

Sometimes restarting resolves it.
Sometimes reseating the network cable resolves it.
Sometimes rebinding to AD resolves it.
Sometimes logging in as a local user resolves it.
Removing the mobileconfig profile doesn't change anything.
Checked the System keychain - all appears fine.
AD Binding seems perfectly fine.
Time is always okay and set correctly.

There's never any consistency in the start or the resolution of the issue. I haven't been able to pin down a period of time that triggers it, and I'm starting to grasp at very thin straws.

Can anyone suggest some further troubleshooting I can complete to help get to the bottom of the issue? All of our infrastructure teams are convinced this is an Apple issue and I have no idea where to start to push back.


calumhunter
Valued Contributor

How are you binding the machines to AD, and at what point in the imaging process?
My understanding is that "Network Accounts are Unavailable" means directory services isn't getting a response from the AD domain controller.
I usually see this on a machine at startup: it gets to the login window quickly but takes maybe 10-15 seconds before it gives me the green light. I also see it on logout if the machine is on wireless. If you're using 802.1X, it will always be red, as the Mac needs to authenticate to the Wi-Fi and get an IP before it can authenticate to AD for login.

When in doubt, check DNS: there might be a stale DNS record for a DC that was removed but not properly demoted.
http://support.apple.com/kb/HT3394
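One way to act on the stale-record advice is to list the domain controllers that DNS advertises for the domain and check that each one answers on LDAP. This is a sketch, assuming the standard AD SRV record name _ldap._tcp.dc._msdcs.<domain> and the dig and nc tools that ship with OS X; the function names are made up for illustration.

```shell
#!/bin/bash
# Sketch: list the DCs that DNS advertises for an AD domain and check that each
# one accepts a connection on LDAP (389). Function names are illustrative only.

# srv_targets: read SRV answers ("priority weight port target.") on stdin and
# print each target host name without its trailing dot, sorted and de-duplicated.
srv_targets() {
  awk 'NF == 4 { sub(/\.$/, "", $4); print $4 }' | sort -u
}

# list_dcs: query the standard AD SRV record for domain controllers.
list_dcs() {
  dig +short -t SRV "_ldap._tcp.dc._msdcs.$1" | srv_targets
}

# check_dcs: flag advertised DCs that don't accept a TCP connection on 389,
# which is what a stale record for a removed DC would look like.
check_dcs() {
  local dc
  for dc in $(list_dcs "$1"); do
    if nc -z -w 2 "$dc" 389 >/dev/null 2>&1; then
      echo "OK      $dc"
    else
      echo "NO LDAP $dc"
    fi
  done
}
```

For example, check_dcs our.domain.com; any line marked NO LDAP is a candidate stale record to raise with the AD team.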

Centrify also has an addiag tool, which I found really useful. It comes with their free Express product; maybe download it on a freshly unbound machine and run it just to check that your environment passes all the tests.

jenuon
New Contributor

Hey Calum,

Thanks for the speedy response.

None of the affected machines use Wi-Fi; the only machines with Wi-Fi configured are a few laptops, and those don't have Wi-Fi disabled.

The dig command returned the same results from both a "working" and "not working" machine. The issue I see is that once it gets a red light it won't recover unless restarted, reseated or rebound.

I will try the addiag tool tomorrow.

Merkley
New Contributor III

We are having the same issue, on both iMacs and MacBook Airs. The model doesn't seem to matter, and we have a wide variety of models in our district. All the machines are on 10.9.4, and Casper does the binding after imaging. I've also unbound and rebound machines, but a day later it stops working again.

I'll look at the addiag tool as well and see what happens. Our current AD settings on the machines use the UNC path to get the home folder. Mobile account is unchecked, and so is the local home folder option. In the Administrative tab, I unchecked "Allow authentication from any domain in the forest".

I'll report back what the tool finds and if it passes.

Merkley
New Contributor III

I promised I would report back my findings. My network passed the Centrify AD check. I also checked the KB article you linked, and everything looked correct.

The configuration profiles that I have installing modify the login screen to include a banner, disable auto login and some energy saver settings.

Our next plan is to image a lab with just 10.9.4 and manually bind the 30 machines to AD to see if that works. We aren't going to include Casper or any third-party software yet, so we can check whether it's something on our network.

Any other ideas to try would help.

jenuon
New Contributor

Our network also passed all of the centrify express tests.

I had some interesting results running:
sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.mDNSResponder.plist
followed by:
sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.mDNSResponder.plist

I had a lab of 31 machines:
5 already logged in
7 logged in with no problems
12 more then logged in with no problems after I ran the above via ARD at the login screen
This left 7 that still wouldn't allow AD authentication until they were restarted

So, could be DNS?!?
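To repeat the mDNSResponder test across a lab, the two launchctl commands can be wrapped in a small script to send through ARD, which runs commands as root, so sudo is dropped. This is a sketch assuming the 10.9 plist path quoted above; DRY_RUN=1 only prints what would run.

```shell
#!/bin/bash
# Sketch: reload mDNSResponder as in the post above (10.9 plist path).
# Intended to run as root (e.g. an ARD "Send UNIX Command").
# DRY_RUN=1 prints the commands instead of running them.

PLIST=/System/Library/LaunchDaemons/com.apple.mDNSResponder.plist

run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

reload_mdnsresponder() {
  run launchctl unload -w "$PLIST"
  run launchctl load -w "$PLIST"
}
```

Call reload_mdnsresponder at the end of the script you push, then note which machines recover, as in the tally above.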

henrysalas
New Contributor

Do you have IPv6 enabled? Also, how many domain controllers do you have in your environment?

Merkley
New Contributor III

I didn't have IPv6 disabled. I've now disabled it on the servers and in my test environment and will see if that helps. We have 8 domain controllers.
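For anyone repeating the IPv6 test on the clients, networksetup can turn IPv6 off per network service. The sketch assumes the usual networksetup -listallnetworkservices layout (a header line, then one service per line, with disabled services prefixed by an asterisk); DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch: turn IPv6 off on every enabled network service. Run as root.
# DRY_RUN=1 prints the networksetup commands instead of running them.

# enabled_services: filter `networksetup -listallnetworkservices` output.
# The first line is an explanatory header; disabled services start with "*".
enabled_services() {
  tail -n +2 | grep -v '^\*'
}

disable_ipv6_all() {
  local svc
  networksetup -listallnetworkservices | enabled_services |
  while IFS= read -r svc; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "networksetup -setv6off \"$svc\""
    else
      networksetup -setv6off "$svc"
    fi
  done
}
```

To undo it on a service, networksetup -setv6automatic "Ethernet" (or the relevant service name).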

frozenarse
Contributor II

This is going to be a long shot but we JUST resolved a similar issue.

Are you able to SSH into a Mac that is stuck on "Network Accounts are Unavailable"? Run netstat and check whether all the problematic machines are trying to use a common domain controller. In our case, machines were trying to use a read-only domain controller that had recently been set up. The firewall between the read-only DC and the other "real" domain controllers was flagging the traffic between them as P2P and blocking it.

I doubt you have a similar setup, but looking for similarities with netstat is how we started down the path to finding our problem.

Do you have a preferred Domain controller configured when you bind?
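The netstat check can be narrowed to AD traffic. This sketch filters netstat -an output for connections to the Kerberos (88), LDAP (389), SMB (445), kpasswd (464) and global catalog (3268) ports. It assumes the OS X netstat format, where the port is appended to the address after a final dot; the function name is illustrative.

```shell
#!/bin/bash
# Sketch: from `netstat -an` output on stdin, print "remote-address port" for
# connections to common AD ports. Assumes the OS X "address.port" format.

dc_peers() {
  awk '$1 ~ /^(tcp|udp)/ {
         n = split($5, a, ".")        # last piece of the foreign address is the port
         port = a[n] + 0
         if (port == 88 || port == 389 || port == 445 || port == 464 || port == 3268) {
           ip = a[1]
           for (i = 2; i < n; i++) ip = ip "." a[i]
           print ip, port
         }
       }' | sort -u
}
```

Run netstat -an | dc_peers over SSH on each stuck Mac and compare the addresses across machines.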

dlondon
Valued Contributor

Hi,

I noticed that some of our lab machines would no longer allow users to authenticate after the screen saver kicked in at the login screen. Does that describe your issue? Like you, a reboot fixed things. There are other posts here about this on 10.9.x. I tried the SleepWatcher solution, which momentarily disables and then re-enables the network interface when the Mac comes out of sleep, and it looks to have sorted it out.

Regards,

David
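A rough version of the SleepWatcher approach: a wake script that takes the wired interface down and back up. It's a sketch under assumptions: SleepWatcher is installed and configured to run this script on wake (for example via its -w option), en0 is the Ethernet port, and the 2-second pause is arbitrary. The script only acts when run as root on OS X; DRY_RUN=1 prints the commands.

```shell
#!/bin/bash
# Hypothetical wake script for SleepWatcher: bounce the wired interface so
# DHCP, DNS and directory services reconnect after sleep.
# Assumptions: IFACE defaults to en0; DRY_RUN=1 prints instead of acting.

IFACE="${IFACE:-en0}"

run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

bounce_interface() {
  run ifconfig "$IFACE" down
  run sleep 2
  run ifconfig "$IFACE" up
}

# Only act when running as root on OS X.
if [ "$(uname -s)" = "Darwin" ] && [ "$(id -u)" -eq 0 ]; then
  bounce_interface
fi
```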

Merkley
New Contributor III

Disabling IPv6 didn't help; it was still acting really strangely. I had our network team look at the firewall, and they found that some of our domain controllers weren't in the correct firewall groups. After they changed the firewall it's working better, but I'm not sure it's at 100% yet. I did look into the screen saver theory, but it doesn't fit: we were logging accounts in on the machines one after another, it would work for one student and then fail for others, and the screen saver never came on because we were using the machines when it failed. The network team has some more things to try, and I'm hoping it's something wrong on our network.

jenuon
New Contributor

Hi All,

Thanks for the responses.

We've troubleshot IPv6 with no definitive result. It's enabled by default, but disabling it doesn't seem to resolve the issue.

Thanks frozenarse (great handle!), but we've already looked for a common domain controller and there's no pattern across the three in our environment.

Today we headed down the AD password interval track. Our AD admin advised that yesterday a couple of machines were being rejected by all the domain controllers because their machine password was set incorrectly. We have a lab of 30: we set 15 to a 0 interval and two others to a 1-day interval. The rest of the lab is our "control" group, which has hit-and-miss logins.

I've been watching the 15 with the 0 interval all morning and so far all AD login attempts have worked. Even returning from screen savers.

FYI for others, we have run this via ARD:
sudo dsconfigad -passinterval 0
sudo killall opendirectoryd
dscacheutil -flushcache
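The three commands above can be packaged so they are pushed to the test group and then verified. A sketch only: it assumes the usual dsconfigad -show layout with one "Name = value" pair per line, and DRY_RUN=1 echoes the commands rather than running them. The function names are illustrative.

```shell
#!/bin/bash
# Sketch: set the machine password change interval and restart directory
# services, as in the post above. Run as root (ARD runs commands as root).
# DRY_RUN=1 prints the commands instead of running them.

run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# passinterval_from_show: pull "Password change interval" out of `dsconfigad -show`.
passinterval_from_show() {
  awk -F'= *' '/Password change interval/ { print $2 }'
}

apply_passinterval() {
  local days="${1:-0}"          # 0 disables automatic machine password changes
  run dsconfigad -passinterval "$days"
  run killall opendirectoryd    # launchd starts it again
  run dscacheutil -flushcache
}
```

After pushing it, dsconfigad -show | passinterval_from_show should print 0 on the test machines and the previous value (14 by default) elsewhere.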

Now to wait a couple of days and see what happens...

henrysalas
New Contributor

Hi Jenuon,

I have experienced slow logins when IPv6 is enabled. We have a large AD environment with DCs all over the world and a few trusts in place. I have seen "Network Accounts are Unavailable" for a few seconds, but then the Mac establishes connectivity with our domain. Below is a printout of my Mac's AD settings. Make sure DNS is clean in your environment: in Terminal, run nslookup "your domain" and you should get the IP addresses of all your DCs. Verify forward and reverse DNS resolution. Are you running WINS? Are you running Microsoft DHCP?

dsconfigad -show
Active Directory Forest = our domain.com
Active Directory Domain = our domain.com
Computer Account = mymac

Advanced Options - User Experience
  Create mobile account at login = Enabled
  Require confirmation = Disabled
  Force home to startup disk = Enabled
  Mount home as sharepoint = Enabled
  Use Windows UNC path for home = Disabled
  Network protocol to be used = smb
  Default user Shell = /bin/bash

Advanced Options - Mappings
  Mapping UID to attribute = not set
  Mapping user GID to attribute = not set
  Mapping group GID to attribute = not set
  Generate Kerberos authority = Enabled

Advanced Options - Administrative
  Preferred Domain controller = FQDN of nearest DC
  Allowed admin groups = Enterprise Admins,Domain Admins
  Authentication from any domain = Disabled
  Packet signing = allow
  Packet encryption = allow
  Password change interval = 14
  Restrict Dynamic DNS updates = not set
  Namespace mode = domain
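The forward and reverse DNS check can be scripted. This sketch resolves the domain name to its DC addresses, reverse-resolves each address, and confirms that the resulting host name resolves back to the same address. fwd_lookup and rev_lookup are hypothetical thin wrappers around dig, kept separate so they can be replaced when testing.

```shell
#!/bin/bash
# Sketch: forward/reverse DNS consistency check for the DCs behind a domain name.

fwd_lookup() { dig +short "$1" A; }
rev_lookup() { dig +short -x "$1" | sed 's/\.$//'; }

# check_fwd_rev: print one line per address; return 1 if any check fails.
check_fwd_rev() {
  local domain="$1" ip name back status=0
  for ip in $(fwd_lookup "$domain"); do
    name=$(rev_lookup "$ip" | head -n 1)
    if [ -z "$name" ]; then
      echo "NO PTR   $ip"; status=1; continue
    fi
    back=$(fwd_lookup "$name")
    if printf '%s\n' "$back" | grep -qxF "$ip"; then
      echo "OK       $ip -> $name"
    else
      echo "MISMATCH $ip -> $name -> ${back:-<none>}"; status=1
    fi
  done
  return $status
}
```

For example, check_fwd_rev our.domain.com; any NO PTR or MISMATCH line is worth taking to whoever runs DNS.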

jenuon
New Contributor

Hi Henry,

nslookup of our domain returns the 3 DCs.

dsconfigad -show
Active Directory Forest = correct
Active Directory Domain = correct
Computer Account = correct

Advanced Options - User Experience
  Create mobile account at login = Disabled
  Require confirmation = Disabled
  Force home to startup disk = Enabled
  Mount home as sharepoint = Enabled
  Use Windows UNC path for home = Disabled
  Network protocol to be used = afp
  Default user Shell = /bin/bash

Advanced Options - Mappings
  Mapping UID to attribute = correct
  Mapping user GID to attribute = correct
  Mapping group GID to attribute = not set
  Generate Kerberos authority = Enabled

Advanced Options - Administrative
  Preferred Domain controller = not set
  Allowed admin groups = not set
  Authentication from any domain = Enabled
  Packet signing = allow
  Packet encryption = allow
  Password change interval = 14
  Restrict Dynamic DNS updates = not set
  Namespace mode = domain
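Comparing two dsconfigad -show outputs by eye is error-prone. This sketch prints the settings whose values differ between two saved outputs, for example from a Mac that never drops off and one that does. It assumes the one-setting-per-line "Name = value" layout that dsconfigad -show normally prints; the file names in the usage note are placeholders.

```shell
#!/bin/bash
# Sketch: print "Setting: valueA | valueB" for every setting whose value
# differs between two saved `dsconfigad -show` outputs.

diff_settings() {
  awk -F' *= *' '
    NF < 2 { next }                          # skip headers and blank lines
    { key = $1; sub(/^ +/, "", key) }        # strip the leading indent
    FNR == NR { a[key] = $2; next }          # first file: remember values
    (key in a) && a[key] != $2 { printf "%s: %s | %s\n", key, a[key], $2 }
  ' "$1" "$2"
}
```

Usage: save dsconfigad -show > good.txt on one Mac and > bad.txt on another, then run diff_settings good.txt bad.txt.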

We have established that running killall opendirectoryd makes the login window responsive again. The problem is we can't figure out why, or how to trace this back to the cause.

Not to mention that, for now, the only way to keep machines from "dropping" off again would be to keep running that command at the login screen.
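Until the cause is found, one stopgap is a watchdog that restarts opendirectoryd when AD lookups fail, automating the manual killall fix. This is a hypothetical sketch, not a recommended fix: PROBE_USER (default ad.probe) is an assumed AD account, id is used as the lookup, and a cached result may hide an outage. It would need to run as root, for example from a LaunchDaemon with a StartInterval.

```shell
#!/bin/bash
# Hypothetical watchdog: if a known AD account stops resolving, restart
# opendirectoryd (launchd starts it again). Run as root.
# PROBE_USER is an assumption: any AD account that exists in your domain.
# DRY_RUN=1 prints the actions instead of performing them.

PROBE_USER="${PROBE_USER:-ad.probe}"

run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# ad_reachable: succeeds if the probe account can be looked up.
ad_reachable() { id -u "$PROBE_USER" >/dev/null 2>&1; }

watchdog_once() {
  if ad_reachable; then
    echo "AD lookup OK for $PROBE_USER"
  else
    run logger -t ad-watchdog "lookup of $PROBE_USER failed; restarting opendirectoryd"
    run killall opendirectoryd
  fi
}
```

Logging each restart through logger also gives a timeline in system.log, which may help line the drop-offs up against other events.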

Gordo_L
New Contributor

Same boat just rowing right behind you!

Sounds like we're in very similar environments.

Centrify, JAMF, AD, sleep/wake issues, failed authentication, dropped binds, reboots, etc. The whole nine yards...

I'm keeping a close watch on this thread!

We have a wide range of Mac platforms; however, the problems you describe seem to be more visible on our Mac minis and MacBook Airs.

I hate to be so vague in a technical forum, but I have been reading several posts from OS X 10.9.2 onward where NUMEROUS people are experiencing similar problems, and I have yet to find one that clearly points to a solid fix...

I haven't yet had the opportunity to dig into the root cause, but I did upgrade all of the problem machines to 10.9.4. The Mac minis still seem to be problematic even with the new OS, whereas the MacBook Airs seem to have quieted down a bit for now. Still not convinced the dragon has retreated to his cave quite yet! :)

I was going down the DNS troubleshooting path (still not ruling it out), but after reading through this thread I've decided to put a known-good, previously used image on a few of the Mac minis and work my way up from there. No JAMF, no profiles, just a working older OS X version and a straight bind via Centrify with our wireless, just to see what happens.

jenuon
New Contributor

This afternoon we have applied the following commands to force specific domain settings to see if these help:

sudo dsconfigad -restrictDDNS en0

To restrict dynamic DNS updates to the Ethernet interface (en0)

sudo dsconfigad -preferred specific.domain.inour.environment

To prefer one specific DC (other DCs are still used if it is unavailable)

sudo dscl /Search -append / CSPSearchPath "/Active Directory/DOMAIN/our.domain.com"
and
sudo dscl /Search -delete / CSPSearchPath "/Active Directory/DOMAIN/All Domains"

To make authentication requests a lot more specific.
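The search-path change can also be scripted. The node paths contain spaces ("Active Directory", "All Domains"), so each must reach dscl as a single quoted argument. DOMAIN and our.domain.com are the placeholders from the post; set AD_SHORT to your short domain name. DRY_RUN=1 prints each argument in angle brackets so the word boundaries are visible. This is a sketch, not a tested deployment script.

```shell
#!/bin/bash
# Sketch: restrict the authentication search path to one AD domain node.
# Run as root. AD_SHORT (default DOMAIN) is the short domain name placeholder.

AD_NODE="/Active Directory/${AD_SHORT:-DOMAIN}"

# run: in DRY_RUN mode, print one <...> per argument to show quoting.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '<%s>' "$@"; echo
  else
    "$@"
  fi
}

restrict_search_path() {
  local domain="$1"
  run dscl /Search -append / CSPSearchPath "$AD_NODE/$domain"
  run dscl /Search -delete / CSPSearchPath "$AD_NODE/All Domains"
}

show_search_path() { dscl /Search -read / CSPSearchPath; }
```

Run restrict_search_path our.domain.com, then show_search_path to confirm the result.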

Until tomorrow... when they all boot up at 7am and I check them at 9...

bentoms
Release Candidate Programs Tester

Hi all,

Are all your macs setup like:

Create mobile account at login = Disabled
Require confirmation = Disabled
Force home to startup disk = Enabled
Mount home as sharepoint = Enabled

Just asking as we do not see the issue, BUT we are creating mobile accounts at login.

jenuon
New Contributor

Our labs aren't running mobile accounts. It's an interesting point... I'll add it to tomorrow's list of troubleshooting :D

DraconicBlue
New Contributor III

That could be part of the issue. If you set the accounts as mobile, the accounts are cached and login is allowed even when the Mac can't immediately connect to AD.

We have all of our accounts set to managed and mobile, and as long as the account has logged into the system once, the user can log in at the screen saver lock and the login window even when you get the "Network Accounts are Unavailable" message.

henrysalas
New Contributor

Hello,

I agree with JRossa: enable mobile accounts. I also suggest changing the settings below.

Preferred Domain controller = not set ===> set it to your local DC
Allowed admin groups = not set ===> Domain Admins should be an allowed admin group
Authentication from any domain = Enabled ===> change to Disabled
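The changes suggested in the last few posts map onto dsconfigad options. A sketch, assuming a Mac that is already bound and a script run as root; PREFERRED_DC is a placeholder for your nearest DC, and DRY_RUN=1 prints the commands. Mobile accounts change what is cached locally and where home folders live, so trying this on a few lab machines first seems prudent.

```shell
#!/bin/bash
# Sketch: apply the suggested AD options on an already-bound Mac. Run as root.
# PREFERRED_DC is a placeholder; DRY_RUN=1 prints the commands instead.

PREFERRED_DC="${PREFERRED_DC:-dc1.our.domain.com}"

run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

apply_ad_options() {
  run dsconfigad -mobile enable -mobileconfirm disable  # cache accounts, no prompt
  run dsconfigad -preferred "$PREFERRED_DC"             # try this DC first
  run dsconfigad -groups "Domain Admins"                # allowed admin group
  run dsconfigad -alldomains disable                    # no forest-wide auth
}
```

Afterwards, dsconfigad -show should reflect the four changes.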

Olivier
New Contributor II

+1 here. We've had this sporadic problem of temporarily losing connectivity to the domain for one or two years.

In our case, it is not an AD issue, nor a domain controller issue, nor a Jamf issue, nor a DNS issue. We use "Create mobile account at login = Enabled". No custom OS X AD tweaking, no 802.1X.

What I noticed is that when you run "odutil show nodenames", some of the nodes are offline. Generally, rebooting the machine or running "sudo dsmemberutil flushcache" somehow re-triggers full connectivity to AD.
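The odutil check can be turned into a quick report. This sketch lists nodes reported as offline and, if FLUSH=1 is set, runs the dsmemberutil flushcache mentioned above. It assumes each node line of odutil show nodenames output starts with the node path and ends with its status word (online or offline); the function names are illustrative.

```shell
#!/bin/bash
# Sketch: report Open Directory nodes that odutil lists as offline.
# Assumes node lines start with "/" and end with the status word.

# offline_nodes: read `odutil show nodenames` output on stdin, print offline node paths.
offline_nodes() {
  awk '$1 ~ /^\// && $NF == "offline" { $NF = ""; sub(/ +$/, ""); print }'
}

check_nodes() {
  local bad
  bad=$(odutil show nodenames | offline_nodes)
  if [ -z "$bad" ]; then
    echo "All nodes online"
    return 0
  fi
  echo "Offline nodes:"
  echo "$bad"
  if [ "${FLUSH:-0}" = "1" ]; then
    dsmemberutil flushcache    # needs root
  fi
  return 1
}
```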

I generally run "odutil set log debug" and look in the /var/log/opendirectory* logs. The reason OS X decided that AD is partly unreachable will be logged there. Unfortunately, it's a different reason every time: once it says the KDC is unreachable, another time that it cannot enumerate the AD sites from SRV records in DNS, and sometimes the computer password entry in the system keychain simply disappears, so the Mac has no way to establish the trust with AD...
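When chasing those reasons, a keyword filter helps cut through debug-level noise. The keyword list below is a guess based on the failure reasons listed above (KDC unreachable, SRV lookups, keychain password), not a catalogue of real log messages; adjust it to what your logs actually show, and lower the log level again after debugging.

```shell
#!/bin/bash
# Sketch: filter opendirectoryd logging for likely AD failure lines.
# Enable detailed logging first with: sudo odutil set log debug
# PATTERN is an assumed keyword list, not a list of known message formats.

PATTERN='unreachable|unavailable|offline|fail|kdc|keychain'

# ad_log_errors: search a log file (default: the opendirectoryd log).
ad_log_errors() {
  grep -iE "$PATTERN" "${1:-/var/log/opendirectoryd.log}"
}

# watch_ad_log: follow the live log, showing only matching lines.
watch_ad_log() {
  tail -f /var/log/opendirectoryd.log | grep --line-buffered -iE "$PATTERN"
}
```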

Note that we also believe Time Machine restores done by end users may kill AD connectivity (this time permanently): users may restore the system settings from a very old backup, which also restores an outdated computer password in the system keychain, so the AD trust relationship is broken.

catman2
New Contributor

@Olivier We have just resolved this, I believe, by changing the AD Kerberos configuration: the service ticket lifetime was set to a very short time. We changed it to 600 minutes (the default) and it all seems to work much better now. Just a heads up...
-Shay

scentsy
Contributor

@catman2 We have a similar issue. We get "Network Accounts are Unavailable", and it disappears after 55 seconds (once the Mac is connected via Ethernet).

Could you help me out and let me know what settings you have for the following? (Thank you in advance.)
(We have the defaults set for our Kerberos Policy: Enabled, 600 min, 10 hrs, 7 days, 5 min.)
What is your Kerberos policy set to for the following settings:
Maximum lifetime for service ticket:

Maximum lifetime for user ticket:

Maximum lifetime for user ticket renewal:

Maximum tolerance for computer clocks:

catman2
New Contributor

Hi,
@scentsy Essentially, all are at default, except max lifetime for user ticket, which is 12 hours.
Maximum lifetime for service ticket: 10 hours <-- was misconfigured as 10 min and probably the cause of problems for us
Maximum lifetime for user ticket: 12 hours
Maximum lifetime for user ticket renewal: default (I think 7 days?)
Maximum tolerance for computer clocks: 5 minutes

If possible, I suggest you set up a test AD domain and test against that.

Here are some more lessons learned:

Basically, in my case, connecting or disconnecting Wi-Fi or LAN would trigger the problem.
The best way to test, for me, was to use Directory Utility to try to browse AD and authenticate on that screen.
I would get error 2100, which indicates a problem.

The most interesting insight I had was that once the problems began, OS X didn't even try to communicate with AD: I used Wireshark and saw that it wouldn't send so much as a single packet to AD, while opendirectoryd.log showed it was unable to reach AD.
I never understood why that would be.

I also noticed that just before that, our KDC(s) seemed to reset the session with OS X.
My assumption is that OS X was trying each KDC in our domain and failing.
Restarting the KDC service on the DCs seemed to release the block, but soon enough it would show up again.

Eventually, I compared my production domain with a test domain we had set up, where we were lucky not to see this problem. After comparing GPOs between them, we realized the service ticket lifetime was incorrectly configured.

The list below is from memory; I don't have the tools here, so forgive any inaccuracies...

Tools on OS X:
odutil (set log to warning)
odutil show nodenames
tail -f /var/log/opendirectoryd.log (I used SSH from a remote machine to see what's going on)
tail -f /var/log/system.log

To reset the problem:
sudo launchctl unload /System/Library/LaunchDaemons/com.apple.opendirectoryd.plist
sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.opendirectoryd.plist

Wireshark (display filter: kerberos or ldap; capture filter: host <ip of DC> or host <ip of other DC>...)
Ticket Viewer (launch it from Keychain Access, I believe under File > Ticket Viewer)
I also used that to delete all tickets and observe new tickets being granted.
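A capture filter like the one above is built from host and port primitives joined with "or" and "and". This sketch builds one for a list of DC addresses, limited to DNS (53), Kerberos (88), LDAP (389) and kpasswd (464), and hands it to tcpdump. The interface name and the /tmp/ad.pcap output path are assumptions; open the file in Wireshark and apply the "kerberos or ldap" display filter.

```shell
#!/bin/bash
# Sketch: capture DNS/Kerberos/LDAP traffic to specific DCs for Wireshark.

# build_filter: "host A or host B ..." restricted to AD-related ports.
build_filter() {
  [ $# -gt 0 ] || { echo "usage: build_filter DC_IP..." >&2; return 1; }
  local f="" ip
  for ip in "$@"; do
    f="${f:+$f or }host $ip"
  done
  echo "($f) and (port 53 or port 88 or port 389 or port 464)"
}

# capture_ad IFACE DC_IP...: needs root; writes /tmp/ad.pcap until Ctrl-C.
capture_ad() {
  local iface="$1"; shift
  tcpdump -i "$iface" -n -w /tmp/ad.pcap "$(build_filter "$@")"
}
```

For example, sudo bash -c '. ./capture.sh; capture_ad en0 10.0.0.1 10.0.0.2', where capture.sh is wherever you saved the functions, then reproduce the problem by unplugging and replugging the network.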

hope that helps...
good luck m8

scentsy
Contributor

I appreciate your help...I'll use the tools/troubleshooting you have provided.

Thanks again.