Posted on 10-21-2013 09:24 AM
I'm having a problem show up randomly on our 10.8.5 computers. The user will log out and when they attempt to log back in they are just shaken off. When I look at the system log I see this:
Oct 21 09:14:58 COMPNAME SecurityAgent[129]: User info context values set for username
Oct 21 09:14:58 COMPNAME authorizationhost[300]: in pam_sm_authenticate(): Got user: username
Oct 21 09:14:58 COMPNAME authorizationhost[300]: in pam_sm_authenticate(): Got ruser: (null)
Oct 21 09:14:58 COMPNAME authorizationhost[300]: in pam_sm_authenticate(): Got service: authorization
Oct 21 09:14:58 COMPNAME authorizationhost[300]: in pam_sm_authenticate(): Context initialised
Oct 21 09:14:58 COMPNAME authorizationhost[300]: in pam_sm_authenticate(): Stashing kcm credentials in enviroment for kcminit: username@domain.COM
Oct 21 09:14:58 COMPNAME authorizationhost[300]: in pam_sm_authenticate(): pam_sm_authenticate: ntlm
Oct 21 09:14:58 COMPNAME rpcsvchost[113]: failed to create secure channel: STATUS_ACCESS_DENIED (0xC0000022)
Oct 21 09:14:58 COMPNAME authorizationhost[300]: in pam_sm_authenticate(): OpenDirectory - The authtok is incorrect.
Oct 21 09:14:58 COMPNAME authorizationhost[300]: Failed to authenticate user <username> (error: 9).
If I attempt to unbind the computer from the domain and bind it again, I get the error:
An error occurred binding to Active Directory: dsconfigad: Authentication server could not be contacted. (5200). (Attempt 1)
DNS forward and reverse lookups are working correctly. Pings fine.
The fact that I can't bind to the domain again seems to point me to the domain controller, any ideas?
The primary and secondary domain controllers are 2008r2, and I get the same result if I manually specify either of the domain controllers to prefer.
Posted on 10-21-2013 09:29 AM
Is the clock more than 5 minutes out from the Domain?
Posted on 10-21-2013 09:34 AM
Nope, the clock is spot on.
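For anyone else checking this: Kerberos rejects authentication by default when the client and domain controller clocks differ by more than 5 minutes. Below is a minimal sketch of that comparison. The names MAX_SKEW and clock_skew_ok are made up for illustration, and how you obtain the server's time is left to you.

```shell
#!/bin/bash
# Sketch: decide whether two clocks are within Kerberos' default 5-minute tolerance.
# MAX_SKEW and clock_skew_ok are hypothetical names, not part of any Apple tool.
MAX_SKEW=300   # seconds; the Kerberos default maximum clock skew

# Usage: clock_skew_ok <local_epoch_seconds> <server_epoch_seconds>
# Returns 0 when the absolute difference is within MAX_SKEW, 1 otherwise.
clock_skew_ok() {
    local diff=$(( $1 - $2 ))
    if (( diff < 0 )); then
        diff=$(( -diff ))
    fi
    (( diff <= MAX_SKEW ))
}
```

For example, clock_skew_ok "$(date +%s)" "$server_epoch" succeeds only when the two values are within 300 seconds of each other, where server_epoch is whatever timestamp you collected from the domain controller.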
Posted on 10-21-2013 09:38 AM
So I've been able to bind to the domain now, but not from the command line and not from the Users & Groups + button. If I go into Directory Utility and try to bind from there, I am able to.
The dsconfigad -show output for the computer is:
Active Directory Forest = domain.com
Active Directory Domain = domain.com
Computer Account = computername$
Advanced Options - User Experience
  Create mobile account at login = Enabled
  Require confirmation = Disabled
  Force home to startup disk = Enabled
  Mount home as sharepoint = Enabled
  Use Windows UNC path for home = Disabled
  Network protocol to be used = smb
  Default user Shell = /bin/bash
Advanced Options - Mappings
  Mapping UID to attribute = not set
  Mapping user GID to attribute = not set
  Mapping group GID to attribute = not set
  Generate Kerberos authority = Enabled
Advanced Options - Administrative
  Preferred Domain controller = not set
  Allowed admin groups = not set
  Authentication from any domain = Enabled
  Packet signing = allow
  Packet encryption = allow
  Password change interval = 14
  Restrict Dynamic DNS updates = not set
  Namespace mode = domain
Posted on 10-21-2013 09:49 AM
After I've unbound and bound the computer to the domain the only difference I can see in the computer record in AD is that it shows the OS version correctly under Operating System tab of the computer properties.
There are no problems with unbinding the computer from the domain and binding it again in any manner. The user can log in, everything works fine. However, it is happening on a bunch of different computers. Can the OS version shown in the AD record properties affect how it works?
Posted on 10-21-2013 10:29 AM
A while back when we had some Active Directory communication issues, one thing I discovered was that the information displayed by dsconfigad -show can be a little misleading. It's actually read from a local plist file stored on the system, but that isn't what creates the communication between the Mac and AD. It looks like starting with 10.7, Apple is using a System-level keychain entry that stores the computer's AD password, the one that gets rotated quietly by the client every 14 days by default. This is also what creates the communication between the two systems.
If you look in Keychain Access on any Mac bound to AD, under the System keychain in the sidebar, check for an entry called something like "/Active Directory/DOMAIN", where DOMAIN is your org's domain name. When you locate it, double-click on it and you'll see that the "Account" is the computer's AD record name, like Mac001$, and the password when you show it is something random.
What I found while investigating these issues is that if that keychain entry is missing, communication between the Mac and AD is broken. While cached mobile accounts will still work OK (for a little while), those accounts won't be notified of password expirations at the login screen, and things like lookups in Terminal or from Directory Utility will fail. Despite all this, running dsconfigad -show gives you what appears to be a fully AD-configured Mac, so that's why I say that dsconfigad -show isn't the most accurate way to check connectivity.
To that end, I developed an Extension Attribute that goes through the step of verifying the keychain entry exists, is valid, and also does a test lookup when run to determine if AD Communication is verified or not. We can use that now to locate systems where it looks like everything is OK, but in fact something is broken. My EA will report if the lookup failed or if the keychain entry is MIA, or both, or in the case where everything is good, reports "Yes" for AD Communication Verified.
You may want to look for that keychain item on the Macs affected by the login issue. Since it's a System-level keychain, you can log in with a local admin account and see it.
Also I assume you aren't using mobile accounts with cached credentials?
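If you'd rather check for that keychain item from Terminal than from Keychain Access, something along these lines should work. It is only a sketch: DOMAIN is a placeholder for your own domain name, and ad_keychain_status is a helper name made up for this example.

```shell
#!/bin/bash
# Sketch: report whether the System keychain holds the AD computer-account entry.
# AD_NODE is a placeholder; replace DOMAIN with your own domain name.
AD_NODE="/Active Directory/DOMAIN"
SYSTEM_KEYCHAIN="/Library/Keychains/System.keychain"

# ad_keychain_status is a hypothetical helper name.
# It prints "present" when an item labelled AD_NODE is found, "missing" otherwise.
ad_keychain_status() {
    if security find-generic-password -l "$AD_NODE" "$SYSTEM_KEYCHAIN" >/dev/null 2>&1; then
        echo "present"
    else
        echo "missing"
    fi
}
```

Calling ad_keychain_status on an affected Mac and on a healthy one gives a quick comparison without having to open the GUI.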
Posted on 10-21-2013 10:37 AM
The AD record will show the computer's OS at the time of the bind.
It sounds like your Macs are resetting their AD passwords and the changed password either isn't being sent back to the domain or isn't being replicated.
The dsconfigad dump above shows the standard 14 day password reset cycle that mac clients try.
It's burnt me before, so post bind I run:
sudo dsconfigad -passinterval 0
(check the man page for dsconfigad).
The above will stop the mac clients from updating their password.
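A slightly more careful version reads the current interval first and only changes it when needed. This is a sketch, not a tested deployment script: the helper names are invented for the example, the value is parsed the same way the dsconfigad -show dump above is laid out, and the flag is the one the dsconfigad man page lists as -passinterval.

```shell
#!/bin/bash
# Sketch: read the computer-password change interval and disable rotation only if needed.
# current_pass_interval and disable_pass_rotation are hypothetical helper names.

# Prints the value from the "Password change interval" line of dsconfigad -show.
current_pass_interval() {
    dsconfigad -show | awk '/Password change interval/ { print $NF }'
}

disable_pass_rotation() {
    local interval
    interval=$(current_pass_interval)
    if [[ "$interval" == "0" ]]; then
        echo "already disabled"
    else
        # Needs root. An interval of 0 stops the Mac from changing its computer account password.
        dsconfigad -passinterval 0 && echo "disabled (was ${interval:-unknown})"
    fi
}
```

Run disable_pass_rotation as root after binding. It prints what it found, so the result can be logged.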
Posted on 10-21-2013 12:29 PM
Looking at the System keychain, it has the /Active Directory/DOMAIN entry, with all the information you described in it. Would you mind sharing your extension attribute?
When I look at the keychain record and see the password, which is quite a jumble, I assume that password is what AD uses to authenticate the computer record in AD? Is that the same password that is managed by the -passinterval flag of dsconfigad? If so, do you know if there is somewhere I can view the corresponding password in AD to see whether that is what has broken?
Posted on 10-21-2013 12:58 PM
@jhuhmann
You're correct in assuming that jumble of a password is how Active Directory communicates with the Mac. I can't necessarily tell you if that information is somewhere viewable in AD, but my guess is it may be. I'm not really certain though.
You can try using @bentoms' method and dropping the password change interval to 0 (disabled) as a test to see if it resolves your issue. I'll just note that in most environments this wouldn't be considered secure, so I'd personally only do it as a troubleshooting step. It's possible the password is getting out of sync due to some configuration issue and causing users not to be able to log in again.
As for my EA, I'll see about posting it here. I'll have to clean it up to remove sensitive network information. Part of what it does up front is determine if the Mac can ping our primary DC, to check if the Mac is connected to the internal network (we have an externally facing Limited Access JSS) then goes on to check if the keychain exists and finally, checks its own computer record in AD to see if the lookup is successful. If anything fails along the way it stops and notes the results. If it gets through it all, we were successful and communication is verified. If off the network it stops and returns N/A as the result.
Posted on 10-21-2013 01:08 PM
Here is my 2c worth of experience.
One of my five customers has a very strict AD computer password timeout policy: if you go more than 15 days without changing the password, your computer winds up in a disabled OU, and while it is technically bound you cannot do any name lookups, log in while connected to the network, etc. A simple rebind after the computer record has been deleted clears this all up. We have the passinterval set to 14 for that environment and it does pretty well until somebody goes on a two-week vacation.
For most of the other accounts we set passinterval to 0 and don't let it change, and that solved most of the AD flakiness in those accounts.
In my opinion, get your AD binding automated, and either have it checked at each boot with a launch daemon or train your help desk on how to rebind computers. It's an imperfect science, although it seems to be getting more stable on later OS versions.
Posted on 10-21-2013 01:16 PM
I tried dropping the password change interval to 0, but it didn't resolve it, but if it was already out of sync I would guess that probably wouldn't fix it.
I don't think that is the problem though, I'm trying to join a computer to the domain using dsconfigad, or using the join box in system preferences, and I get the error:
dsconfigad: Authentication server could not be contacted. (5200), or Unable to add server. Authentication server could not be contacted. (5200) if I do it from the system preferences. In this instance there is no existing password to get out of sync. It seems like whatever is happening is disrupting the communications between the two for both existing connections and new ones.
Posted on 10-21-2013 01:28 PM
@nessts
The idea of getting the help desk comfortable and able to re-bind to the domain sounds like a glorious impossibility. I'm a k12 tech with 8 buildings, 1600 devices, and two techs for it all, of which I'm one. I need to figure out the problem because simply on the basis of time, frequently re-binding systems to the domain isn't really a possibility for us.
Posted on 10-21-2013 05:23 PM
My 2 cents:
I had a similar issue where it would hit random users: one day they could log in, the next week no go. Here's what worked for me:
1. Got rid of mobile accounts.
2. It seems a local account was being created. Check System > Library > CoreServices > Directory Utility: launch Directory Utility, choose Directory Editor, look for the username in the left pane, and delete it. Then unbind, reboot, and bind.
3. Rinse, repeat.
hth
LS
Posted on 10-22-2013 06:13 AM
I don't think it is something happening with the specific account because when a computer is having the problem it will happen for any account, regardless of if they have logged into the computer before. When I was thinking it was something wrong with the user account I tried forcing the user to reset their password to see if that had any effect on anything. They could enter their old username/password, AD communicated to the computer that the account password needed to be changed, and gave them the prompt to do so, but when they tried to change it, it failed. The computer is communicating with AD in some form, but failing at some point.
Posted on 10-22-2013 08:00 AM
OK. I figured it out. If the JNUC was this coming week instead of last week I'd buy you all a beer. It wasn't any of the things mentioned, but I'm still grateful for the suggestions.
Turns out it was a DNS problem. Sort of. The root cause was that the hard drive on our secondary domain controller/secondary DNS server filled up, for a reason that has yet to be determined. That caused AD to stop replicating, which caused DNS to stop updating somehow, which ended up affecting the DNS entries in our primary DNS.
It took me an entire day of troubleshooting the clients before I moved to the right server, but I guess when a client says, "Authentication server could not be contacted. (5200)" it really means that it can't contact the authentication server. I wish I could blame the error for being unclear.
Thanks again.
Posted on 04-24-2014 07:12 AM
@jhuhmann I'm having the exact same problem you experienced. What steps did you take to resolve this? I am totally stumped...
Posted on 04-24-2014 07:26 AM
@myronjoffe In my case the problem was a local server that runs DNS, DHCP, and Active Directory. A service filled the hard drive and caused DNS/DHCP to stop working correctly, which caused my AD authentication to fail because of missing computer records. It didn't happen everywhere at once but manifested as individual computers had their DHCP lease expire and could not renew with a new lease. It was a cascading problem, the root cause of which in my case was disk space on a server.
I cleared up the server hard drive space problem, and once dynamic DNS entries started getting entered again it started working. Check the logs (Event Log if your server is Windows) on your DNS server, primary and secondary if you have them.
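From the client side, one quick sanity check is whether each DNS server still answers for the SRV records that AD clients use to locate domain controllers. The sketch below assumes dig is available on the Mac. domain.com is a placeholder, and ad_srv_names and check_ad_srv are helper names invented for this example.

```shell
#!/bin/bash
# Sketch: check the DNS SRV records an AD client relies on to find domain controllers.
# ad_srv_names and check_ad_srv are hypothetical helper names.

# Prints the standard AD locator record names for a domain, one per line.
ad_srv_names() {
    local domain="$1"
    printf '%s\n' \
        "_ldap._tcp.${domain}" \
        "_kerberos._tcp.${domain}" \
        "_ldap._tcp.dc._msdcs.${domain}"
}

# Usage: check_ad_srv <domain> [dns_server]
# Pass a DNS server to query it directly, so primary and secondary can be compared.
check_ad_srv() {
    local domain="$1" server="$2" name answer
    for name in $(ad_srv_names "$domain"); do
        answer=$(dig +short ${server:+"@$server"} -t SRV "$name")
        if [[ -n "$answer" ]]; then
            echo "OK      $name"
        else
            echo "MISSING $name"
        fi
    done
}
```

Running check_ad_srv domain.com once against each DNS server's address should show whether one of them has stopped serving the records, which is roughly what had happened here.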
Posted on 05-14-2014 09:44 PM
in regards to your post https://jamfnation.jamfsoftware.com/discussion.html?id=8722#responseChild46642
we have a few computer labs where computers are losing that System keychain /Active Directory/DOMAIN which subsequently breaks their AD connection... is there some alternative way to recreate this key when you find it's missing or is an unbind/rebind the only answer?
Did you find out what was causing this keychain entry to go MIA? I was going to try setting our pass intervals to 0 to see if it stops happening, but I'm not sure if that's the cause of the disappearance.
Any help with how you set up your EA to monitor this would be hugely appreciated too!
Thanks for any help you can offer!
Posted on 11-25-2014 02:30 PM
I'll go ahead and chime in that we are seeing this same issue very randomly on machines in our district. We are bound to AD, and the affected machines are missing the AD/Domain keychain. We have been rebinding on a case by case basis but it is quite frustrating with our limited staffing. This is the console message that identified the issue for us:
Failed to retrieve keychain password for 'ms103574$' module '' node '/Active Directory/DOMAIN' - -25300
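The -25300 at the end is the Security framework's errSecItemNotFound code, which matches the missing keychain entry. To find which computer accounts a Mac has logged this for, a sketch like the following may help. The default log path is an assumption that fits the OS versions in this thread, and failed_keychain_accounts is a made-up helper name.

```shell
#!/bin/bash
# Sketch: list computer accounts named in "Failed to retrieve keychain password" log lines.
# failed_keychain_accounts is a hypothetical helper; the default log path is an assumption.

# Usage: failed_keychain_accounts [logfile]
failed_keychain_accounts() {
    local log="${1:-/var/log/system.log}"
    grep "Failed to retrieve keychain password" "$log" \
        | sed -E "s/.*password for '([^']+)'.*/\1/" \
        | sort -u
}
```

An empty result means the message was not found in that log file.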
Posted on 11-25-2014 02:56 PM
Hmm. Looks like I never followed up on posting my Extension Attribute here as was requested by a few of you.
Allow me to correct that now. Here is a cleaned up version of the EA script. Back when it was written we still managed a decent enough number of 10.6 Macs that we needed to accommodate for any running Snow Leopard. Apparently the use of the System AD keychain only came about in 10.7 and up. Some of you can probably remove that section, but it works as is.
#!/bin/bash

OSvers=$( sw_vers -productVersion | cut -d. -f2 )

## Notes: Change the below 'dc.domain.company.com' to the name of your primary domain controller, or some internal server
if ping -c 2 -o dc.domain.company.com; then
    ## Mac is on the network, continue
    ## Notes: Change the 'domain.company.com' to the AD domain typically returned from dsconfigad -show
    if [[ $(dsconfigad -show | awk '/Active Directory Domain/{ print $NF }') == "domain.company.com" ]]; then
        ADCompName=$(dsconfigad -show | awk '/Computer Account/{ print $NF }')
        ## Mac has correct dsconfigad info, continue
        if [[ "$OSvers" -ge "7" ]]; then
            ## Change 'DOMAIN' below to your domain
            security find-generic-password -l "/Active Directory/DOMAIN" | grep "Active Directory"
            if [ "$?" == "0" ]; then
                ## AD keychain file exists, continue
                ## Change the 'DOMAIN' below to your domain
                dscl "/Active Directory/DOMAIN/All Domains" read /Computers/"$ADCompName" | grep "$ADCompName"
                if [ "$?" == "0" ]; then
                    ## Successful lookup of computer record. AD communication is working
                    res="Yes"
                else
                    res="No - AD Lookup Failed"
                fi
            else
                res="No - AD Keychain Not Found"
            fi
        else
            if [[ "$OSvers" -le "6" ]]; then
                ## OS is 10.6.x, moving on to AD lookup
                ## Change the 'DOMAIN' below to your domain
                dscl "/Active Directory/DOMAIN/All Domains" read /Computers/"$ADCompName" | grep "$ADCompName"
                if [ "$?" == "0" ]; then
                    ## Successful lookup of computer record. AD communication is working
                    res="Yes"
                else
                    res="No - AD Lookup Failed"
                fi
            fi
        fi
    else
        res="No - Not Joined to domain.company.com domain"
    fi
else
    ## Mac is not on the network or has no network connection
    echo "Not connected to company network"
    res="Remote"
fi

echo "<result>$res</result>"
Also, it does the ping because our Macs can check in from outside the company network, so it needs to make sure it can reach AD before attempting the lookups, which would otherwise fail. If the ping doesn't work, it just reports the Mac as remote so as not to falsely report it as not communicating with AD.
Posted on 11-25-2014 04:29 PM
Awesome, thanks so much for sharing your EA mm2270!
Posted on 12-23-2014 12:45 PM
I had this issue and was able to successfully use the 'dsconfigad -add' command with the following steps. I chose a specific domain controller and confirmed that I could ping it from the unbound problem Mac. Then I specified that domain controller with the -preferred option when issuing the -add. It was then successfully bound. Finally, I ran 'sudo dsconfigad -nopreferred' to restore the discovery function for resiliency.
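Put together, those steps look roughly like the sketch below. All the values are placeholders, bind_with_preferred_dc is a helper name invented for this example, and it has to run as root. Since no -password is given, dsconfigad should prompt for the bind account's password.

```shell
#!/bin/bash
# Sketch of the bind sequence described above. All values are placeholders.
# bind_with_preferred_dc is a hypothetical helper name; run it as root.

# Usage: bind_with_preferred_dc <domain> <domain_controller> <computer_id> <bind_user>
bind_with_preferred_dc() {
    local domain="$1" dc="$2" computer="$3" admin="$4"

    # Confirm the chosen domain controller answers before trying to bind
    if ! ping -c 2 "$dc" >/dev/null 2>&1; then
        echo "Cannot reach $dc" >&2
        return 1
    fi

    # Bind against that specific controller
    dsconfigad -add "$domain" -computer "$computer" -username "$admin" -preferred "$dc" || return 1

    # Restore automatic domain controller discovery
    dsconfigad -nopreferred
}
```

For example: bind_with_preferred_dc domain.com dc1.domain.com mac001 binduser, where every argument is swapped for your own values.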