Cannot talk with AD Domain - Which Systems Affected? Extension Attribute

pickerin
Contributor II

We've recently had a rash of systems that suddenly cannot talk to the Active Directory Domain. We haven't identified exactly what's happening, but the symptom is that the user cannot change their password from the Accounts Preference Pane, being told they do not have permission and to speak with an administrator. If they change their password externally to the Macintosh (using a Windows system, or another password management tool), then their keychain password and AD password get out of sync and various other issues ensue.

I wrote the following Extension Attribute after doing some research, and we can now identify the systems having this problem and proactively fix them (the only fix we've found is to re-bind the system to the AD Domain):

#!/bin/bash

domain="YOURDOMAIN"
user="someuser"

# Can we query a UPN? The node path contains spaces, so it must be quoted.
# stderr is captured too, in case dscl writes the error message there.
domainAns=$(dscl "/Active Directory/${domain}/All Domains" -read "/Users/${user}" dsAttrTypeNative:userPrincipalName 2>&1)
if [[ $domainAns =~ "is not valid" ]]; then
    result="Invalid"
else
    result="Valid"
fi

echo "<result>$result</result>"

Now, hopefully we can figure out why it's happening and fix it.

11 REPLIES

jhbush
Valued Contributor II

pickerin, we've seen this as well. What's odd is that running id on a user not local to the machine works fine, but we get the same error you mentioned. Please let us know if you discover the cause.

pickerin
Contributor II

We're pretty sure it has something to do with the Computer Account "Password Change Interval". You can see that the default is "14 Days". So, every 14 days the computer has to change the Computer Account password with the domain. If for some reason that doesn't happen, AD expires the Computer Account password and denies access to the Computer (not the user, the Computer).

You can see these settings by performing: dsconfigad -show
You'll find: Password change interval = 14
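If you want to report that value from each Mac (for example as another Extension Attribute), here is a rough sketch. It assumes the `dsconfigad -show` output contains a line of the form "Password change interval = 14" as quoted above; `parse_passinterval` is just a helper name I made up, not part of dsconfigad.

```shell
#!/bin/bash

# Hypothetical helper: read `dsconfigad -show` output on stdin and print
# the value of the "Password change interval" line.
parse_passinterval() {
    awk -F'= *' '/Password change interval/ {
        sub(/[ \t]+$/, "", $2)   # drop trailing whitespace
        print $2
        exit
    }'
}

# Only call the real tool where it exists (i.e. on a Mac).
if command -v dsconfigad >/dev/null 2>&1; then
    interval=$(dsconfigad -show | parse_passinterval)
    # Falls back to "Unknown" when the Mac is not bound or the line is missing.
    echo "<result>${interval:-Unknown}</result>"
fi
```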

Some folks have said that you can avoid this problem by setting that interval to 0 (never change), which you can do with this command: dsconfigad -lu localadminusername -lp localadminpassword -passinterval 0

Unfortunately, the article I found that in (can't find it now) also said you have to issue it, then REBIND to the Domain for it to take effect.

If we find out more, I'll let you know. I'm actually hoping that my Extension Attribute, exercising the Computer account each day will avoid the problem going forward.

gregp
Contributor

Which OS is this happening on?

Lion had serious problems in our environment, because our machine accounts are in one domain and our user accounts in another. Lion did not handle that well and we saw symptoms similar to yours. Mtn Lion, on the other hand, is fine (except for unbinding, at least as of 10.8.1. Need to recheck that).

pickerin
Contributor II

This is with Mountain Lion. Yes, Lion was an AD nightmare, but everything that made Lion awful seems to have been fixed in Mountain Lion.

I run unbinds on 10.8 all the time, never had an issue.

My only ancillary issue is that, using my Casper AD admin account, I don't seem to be able to re-bind Macs to the domain. I have to delete the old Computer account first (I tried doing a "reset" from the AD side, which is what allows re-binding for our Windows boxes, but it didn't work).

maiksanftenberg
Contributor II

If you are not able to overwrite existing Computer Accounts in the Domain it looks like missing permissions.
You don't have to delete the machine.
Most likely it's enough if you "reset" the account.

We have bound all of our 10.8.x machines with Centrify Express. Version 5.1 comes with Pref Pane integration, which is looking really great.

Cheers.

pickerin
Contributor II

Yep, we tried to figure out the proper permissions and it's not working. My AD admins provided the following for our JSS AD bind account:

Allow - Permission: Create All Child Objects - Applies To: Computer Objects
Allow - Permission: Create Computer Objects - Applies To: This object and all child objects
Allow - Permission: Delete All Child Objects - Applies To: Computer Objects
Allow - Permission: Delete Computer Objects - Applies To: This object and all child objects
Allow - Permission: Read All Properties - Applies To: Computer Objects
Allow - Permission: Write All Properties - Applies To: Computer Objects

Even with these permissions on the OU where we place Macintosh Systems, I cannot re-bind a system even after Resetting the Object manually. I have to delete it. :(

jgutzman
New Contributor

Pickerin

Was your solution for the computer password issue in AD to set the password change interval to 0 and simply reconnect to the domain (unbind and bind)?

Obviously that's not ideal, but I'd love to keep these machines active on our network!

Thanks for the update

pickerin
Contributor II

The only solution I've found when this happens is to:
1. Unbind the computer locally (because it thinks it's bound to AD, but AD doesn't agree)
2. Rebind the computer

The password change interval doesn't need to be changed, and the problem doesn't manifest consistently. I've even had systems fail my Extension Attribute, but then recover on their own later. The real problem is that if the problem is happening AND the user wants to change their own password, they cannot (Change Password on the Account pane is greyed out because the computer no longer has rights on the domain). That's the only time it gets escalated to my support team.
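For anyone who wants to script the unbind and rebind, here is a minimal sketch using dsconfigad. The domain and bind account values are placeholders, `run` and `rebind` are names made up for illustration, and it only prints the commands unless you set DRY_RUN=0. Note that passing a password on the command line exposes it to other processes on the Mac, so treat this as a starting point rather than a finished policy script.

```shell
#!/bin/bash

# Placeholder values (assumptions): replace with your own domain and bind account.
domain="YOURDOMAIN.example.com"
bind_user="binduser"
bind_pass="bindpassword"
dry_run="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$dry_run" = "1" ]; then
        printf 'would run:'
        printf ' %q' "$@"
        printf '\n'
    else
        "$@"
    fi
}

rebind() {
    # -force removes the local binding even when AD no longer agrees with it.
    run dsconfigad -remove -force -username "$bind_user" -password "$bind_pass"
    # Join the domain again with the same bind account.
    run dsconfigad -add "$domain" -username "$bind_user" -password "$bind_pass"
}

# Usage: call rebind (with DRY_RUN=0 set in the environment to really do it).
```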

NoahRJ
Contributor II

@pickerin Not to dredge up an old topic, but I'm running into the EXACT same issue you've described. We also isolated that it's probably an issue with the machine password expiration (we also set the interval to 14 days), but haven't been able to isolate why, when a machine password expires, it cannot be changed correctly. I've looked up a few machine records in dscl from my Mac and have identified that almost all machines that are getting flagged by the Extension Attribute have a SMBPasswordLastSet date that is outside of spec.

However, we don't have opendirectoryd logs that are verbose enough to really use that date (or 14 days out) to determine the cause of why the machine tried to reach out to the DC to change its password and failed. I know that the machine first changes the password locally and then attempts to update it in Active Directory, but I don't know where the communication breakdown might occur from there. It looks like when engaging a dscl query or attempting a password change, a machine won't even attempt communication with a DC (as visible from nettop), so I'm sure it's not able to talk to AD to get the machine password synced.

Like you've seen also, we get some false positives on our EA where the machine fails a dscl (or an id) lookup and then recovers the next time it checks in. Hard to track down what could be setting those off, and it's pretty frustrating because the symptoms of that one are all but impossible to replicate, and we don't want to set odutil logging to debug on all our machines because of how huge those logs get.
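One possible way to cut down on those transient false positives (an idea, not something anyone in this thread has confirmed works) is to retry the lookup a few times before reporting Invalid. In the sketch below, `retry` and `upn_lookup` are hypothetical helper names, and the "is not valid" check is the same one used in the Extension Attribute at the top of the thread.

```shell
#!/bin/bash

# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or attempts run out.
retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        # Pause between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Hypothetical check: fails when dscl reports the AD node "is not valid".
upn_lookup() {
    local out
    out=$(dscl "/Active Directory/$1/All Domains" -read "/Users/$2" \
        dsAttrTypeNative:userPrincipalName 2>&1)
    [[ $out != *"is not valid"* ]]
}

# Example use inside an Extension Attribute (3 tries, 10 seconds apart):
#   if retry 3 10 upn_lookup "YOURDOMAIN" "someuser"; then
#       echo "<result>Valid</result>"
#   else
#       echo "<result>Invalid</result>"
#   fi
```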

Have you had any success in the last year troubleshooting this? I have a support case open with Apple on this, but any information you (or anyone else) could provide on this would be incredibly valuable.

sgoetz
Contributor

So there are a few things a Mac needs in order to talk to AD. First, in Keychain Access, the System keychain should contain an Active Directory entry. That is the computer password. If it is missing, the Mac won't talk to AD and needs to be unbound and re-bound. Next, the time on the Mac needs to be within 5 minutes of the time on the Domain Controllers; if the difference is greater than 5 minutes, AD will not respond. I actually wrote a script that checks all the things that need to be present for AD communication, and if anything is wrong it unbinds the machine and rebinds it automatically. I run it once a day against all AD-bound machines.
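This is not sgoetz's script, just a rough sketch of the two checks described above. The function names are made up, the keychain lookup assumes the entry is labelled "/Active Directory/YOURDOMAIN" (verify that in Keychain Access on one of your own Macs), and how you obtain the Domain Controller's time is left to you.

```shell
#!/bin/bash

max_skew=300   # 5 minutes, the limit mentioned in the post above

# Returns 0 when two epoch timestamps are within max_skew seconds of each other.
within_skew() {
    local a=$1 b=$2 diff
    diff=$(( a - b ))
    [ "$diff" -lt 0 ] && diff=$(( -diff ))
    [ "$diff" -le "$max_skew" ]
}

# Returns 0 when the System keychain holds an item for the AD domain.
# Assumption: the item label is "/Active Directory/<domain>".
has_ad_keychain_item() {
    security find-generic-password -l "/Active Directory/$1" \
        /Library/Keychains/System.keychain >/dev/null 2>&1
}

# Example use (dc_epoch is a placeholder for the DC time you looked up):
#   has_ad_keychain_item "YOURDOMAIN" || echo "computer password missing"
#   within_skew "$(date +%s)" "$dc_epoch" || echo "clock skew too large"
```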

Xopher
New Contributor III

@sgoetz
Any chance you could share your script for AD communication checks?
Thanks