Apple Macs losing AD binding

Zed2014
New Contributor II

Hi,

I will buy dinner for anyone that can help me solve this problem:

  1. 18 x Apple iMacs bound to our AD network using the built-in AD plugin
  2. ALL are on a VLAN
  3. DeepFrozen
  4. Were imaged using DeployStudio
  5. Currently on OS Mavericks (OS 10.9.4)
  6. Using AD only no OD servers

Now the problem:

Every now and again these Macs drop off the network randomly and the only solution is to unbind and rebind.

Things I've tried:
1. Using a preferred server for domain and set that same server for Time synchronisation
2. Set pass interval to 0 (this is recommended if you don't want it to check the machine password every 14 days with AD)
3. Set manual IP addresses then reverted back to DHCP (on that VLAN)
4. Cleared ALL DNS entries, cleared caches etc
5. Using a domain that does NOT end in .local
6. Disabled bonjour advertising services (to completely disable the .local issue even though I know I'm not using a .local domain)
7. Altered the mdns_timeout period in the file /System/Library/SystemConfiguration/IPMonitor.bundle/Contents/Info.plist (again, I know this is for .local domain issues)
8. Tried Centrify Express just as a binding tool
9. Taken them ALL off the VLAN and back on after finding out it makes no difference.

Just as a side note, does anybody know why Apple decided to remove the red dot (the traffic-light status indicator)? I'm talking about the "Network Accounts are unavailable" red dot, which stays on until they do become available. In Mavericks it's just a little pop-up flag that displays the same message but disappears instantly when the user starts to type a username, and obviously if they haven't waited long enough they get the infamous "shake" of the window.

Would be grateful for any advice/suggestions.

Many thanks

Zahid :)


nateburt
New Contributor III

Zahid, thanks for more hints on maintaining a healthy domain binding on a Mac. Our most common problem is lack of authentication, even though the binding is still good. A reboot is the simple fix, though it seems to be a time sync issue for us. Running "ntpdate -q timeservername" (or domainname) shows the time drift between the client and the domain. Less than 5 minutes of drift shouldn't matter for the domain account, but authentication seems to be a lot touchier. Updating the time "fixes" it for us right away.
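
A minimal sketch of that drift check, for anyone who wants to script it. It assumes `ntpdate -q` prints lines containing `offset <seconds>` (worth confirming on your own OS version); `timeservername` is the placeholder from above, and the 300-second threshold is simply the 5-minute figure mentioned, not something read from your domain policy.

```shell
#!/bin/bash
# Sketch: flag clock drift beyond a threshold.

# Extract the offset (in seconds) from `ntpdate -q` output read on stdin.
# Assumes lines containing "offset <number>"; the last one found wins.
parse_offset() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "offset") off = $(i + 1) }
         END { gsub(/,/, "", off); print off }'
}

# Return 0 (true) if the absolute offset exceeds the threshold in seconds.
drift_exceeds() {
    local offset="$1" threshold="$2"
    awk -v o="$offset" -v t="$threshold" \
        'BEGIN { if (o < 0) o = -o; exit !(o > t) }'
}

# Example use on a Mac (not executed here):
#   offset=$(ntpdate -q timeservername | parse_offset)
#   if drift_exceeds "$offset" 300; then
#       echo "Clock drift too large: ${offset}s"
#   fi
```

Splitting the parsing from the comparison makes it easy to swap in a different time tool if `ntpdate` is not available.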
-Nate

BCS-Casper
New Contributor

I am not sure if this will resolve your issue, but it may help to add an additional search policy for AD after binding. Open Directory Utility, click Active Directory and then click the Search Policy button. Click the plus sign and add the available directory domain. Move the newly added domain up in the Directory Domains list and click apply. Your computers will find the directory a bit faster and log in authenticating against AD.

Again, not sure if this will resolve your issue, but it may help. Adding the additional search policy doesn't seem to be a feature of the Directory Bindings function in the JSS. Good luck!

Tammi

davidacland
Honored Contributor II

From the list you've done everything I would normally do to try and make it as robust as possible. The only possibilities I can think of are:

- Time server related (as @nateburt mentioned)
- Network connection speed (by that I mean the speed the clients connect to the network). I have seen some sites where there is a significant delay from initial boot up to the point where the Mac has an IP and can reach other devices (particularly the DC).
- Computer objects being moved in AD. The Macs should be able to cope with this but I'm including it in the list just in case.
- Another security setting / policy in AD throwing the Macs out (I'll leave this one open for others to comment on!)
- A possible problem with the particular version of Mac OS and your AD domain, you could test 10.9.5 or 10.10.1 to see if the behaviour changes.
- Finally, deep freeze could be causing an issue, again as a test you could try thawing the Mac to see how it gets on (unless this causes too many other issues, in which case the below recommendation might suit you).

If re-binding is a task you're finding yourself doing regularly, you could put in a script that gets the computer to read its own object from AD. If it can't, the script flags that there is an issue (possibly via an extension attribute and smart group membership, followed by an email to you), then automatically rejoins the domain (again emailing to confirm it has left the "broken binding" smart group). We've implemented a system like this for a few customers. The script will need to be carefully crafted to avoid false positives etc., but it does help to work around problems like this.

Zed2014
New Contributor II

Thanks Tammi,

Davidacland, I like your suggestion of rejoining the domain automatically. I'd be grateful if you could elaborate on this. I have created a couple of scripts that just help me when binding/unbinding, and I was tempted to just put the bind script into some sort of startup script, but I don't want to be binding unnecessarily because, as you know, this will delay the user even more.

thoule
Valued Contributor II

I'd save rebinding automatically for a last resort - it's better to fix the problem. As you know, there is a computer object in AD which the computer binds to. There is a password between your computer and the AD object for that binding to work which you don't see or know about. And that password is randomly changed every so often. I wonder if when you revert via Deep Freeze, you are reverting to the old password for that AD object, which 'breaks' the binding. Try to avoid reverting via Deep Freeze and see if that fixes it. You can also ask your AD admin about not auto-changing AD passwords, but I suspect that may not be possible.

davidacland
Honored Contributor II

The outline script steps are:

  • Read the computer object from AD using dscl (I would put a dscacheutil -flushcache at the start to make sure it's not just reading a cached value)
  • Use $? (the exit status of the last command) to determine if it was successful (a 0 should mean all is well)
  • If there is an error, send the result up to an extension attribute (text field) using curl and the REST API

The other prep work would be:

  • Create the extension attribute that can hold the value
  • Create a smart group that looks for specific text in that extension attribute
  • Create a policy that is scoped ongoing to members of that smart group (with an inventory update at the end to make sure it doesn't keep running)

You then just need to create a launchdaemon or a policy to trigger the script at a regular interval.
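
The check step in the outline above can be sketched roughly like this. Treat it as a sketch under assumptions rather than a finished tool: `DOMAIN` is a placeholder for your AD domain's short name, the computer account name is taken from `dsconfigad -show`, and the function name `check_ad_binding` is made up for illustration.

```shell
#!/bin/bash
# Sketch: can this Mac read its own computer object from AD?

AD_NODE="/Active Directory/DOMAIN/All Domains"   # placeholder: change DOMAIN

# Returns 0 if the computer record can be read, 2 if the Mac does not
# think it is bound, and dscl's non-zero status if the lookup fails.
check_ad_binding() {
    local comp_name

    # Flush the directory cache so we are not reading a stale value.
    dscacheutil -flushcache

    comp_name=$(dsconfigad -show | awk '/Computer Account/{ print $NF }')
    if [[ -z "$comp_name" ]]; then
        return 2
    fi

    # The exit status of dscl becomes the function's return value.
    dscl "$AD_NODE" -read "/Computers/$comp_name" RecordName >/dev/null 2>&1
}

# Example use (reporting is left to you):
#   if check_ad_binding; then echo "binding OK"; else echo "binding problem"; fi
```

To limit false positives, it is probably worth confirming the Mac can reach a domain controller before trusting a failed lookup.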

On the API point, you could create a script populated extension attribute and then run a full recon if there is an error. I would probably prefer to use the API as you only want the one field populated but either way would work.

Sorry I don't have an up to date script to hand otherwise I would just post that!

mm2270
Legendary Contributor III

Here's an example of the Extension Attribute we use to determine if a Mac is properly joined to the company domain.
It will report if the lookup fails, or (from 10.7 and above) if the AD keychain entry is missing, or if the Mac was outside the network when it reported in. Or lastly, if everything is in order.

We pretty much have zero trust in dsconfigad -show by itself. As has been mentioned in other threads, it's just reading from the local DOMAIN.plist file that gets generated at the time of binding and doesn't get removed unless the Mac is disjoined from AD (normally or forcibly), so just having that file there is not a good indication of whether communication to AD is actually good. We only use it in the script to determine if the Mac thinks it's joined and what the Computer Account name should be when doing a lookup later on.
Unfortunately, Casper Suite's built in AD field uses the same information as dsconfigad -show so it can be inaccurate as well. Hence why we made the EA.

#!/bin/bash

OSvers=$( sw_vers -productVersion | cut -d. -f2 )

## NOTE: Change "dc.domain.comp.org" to an internal master domain controller,
## or some other internal device for the ping command

if ping -c 2 -o dc.domain.comp.org; then
    ## Mac is on the network, continue
    if [[ $(dsconfigad -show | awk '/Active Directory Domain/{ print $NF }') == "domain.comp.org" ]]; then
        ADCompName=$(dsconfigad -show | awk '/Computer Account/{ print $NF }')
        ## Mac has correct dsconfigad info, continue
        if [[ "$OSvers" -ge "7"  ]]; then

            ## NOTE: Change the "DOMAIN" in the command below to your domain name

            security find-generic-password -l "/Active Directory/DOMAIN" | grep "Active Directory"
            if [ "$?" == "0" ]; then
                ## AD keychain file exists, continue
                ## NOTE: Change the "DOMAIN" in the command below to your domain name

                dscl "/Active Directory/DOMAIN/All Domains" read /Computers/"$ADCompName" | grep -i "$ADCompName"
                if [ "$?" == "0" ]; then
                    ## Successful lookup of computer record. AD communication is working
                    res="Yes"
                else
                    res="No - AD Lookup Failed"
                fi
            else
                res="No - AD Keychain Not Found"
            fi
        else
            if [[ "$OSvers" == "6" ]]; then
                ## OS is 10.6.x, moving on to AD lookup
                ## NOTE: Change the "DOMAIN" in the command below to your domain name

                dscl "/Active Directory/DOMAIN/All Domains" read /Computers/"$ADCompName" | grep "$ADCompName"
                if [ "$?" == "0" ]; then
                    ## Successful lookup of computer record. AD communication is working
                    res="Yes"
                else
                    res="No - AD Lookup Failed"
                fi
            fi
        fi
    else
        ## NOTE: Change the "domain.comp.org" in the command below to your fqdn
        res="No - Not Joined to domain.comp.org domain"
    fi
else
    ## Mac is not on the network or has no network connection
    echo "Not connected to company network"
    res="Remote"
fi

echo "<result>$res</result>"

You could probably use something like the above and modify it to rejoin to AD in the case of lookups failing or the keychain being missing. Just have a function to re-bind in the script that would get called in those instances. Or, use a separate policy that gets triggered on Macs that fall into a Smart Group that doesn't have a value of "Yes" for example.

As @davidacland mentioned, you want to be careful with this, as you may need to rejoin Macs to their existing records in AD, overwriting them. So just make certain it needs to be done before actually (re)binding, since if it uses the incorrect information it could clobber an existing AD computer record, causing issues for some other Mac.
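
A rough sketch of what such a guarded re-bind function might look like. Everything named here is an assumption for illustration: `binduser`/`changeme` are hypothetical credentials (don't hard-code real ones), `domain.comp.org` is the placeholder FQDN from the EA, and the choice to rebind only on the two "No" results that indicate a broken binding is one possible policy, not the only one.

```shell
#!/bin/bash
# Sketch: rebind only when the EA-style result says the binding is broken.

AD_DOMAIN="domain.comp.org"      # placeholder FQDN from the EA above
BIND_USER="binduser"             # hypothetical service account
BIND_PASS="changeme"             # hypothetical; do not hard-code in production

# Decide whether a result string from the EA warrants a rebind.
needs_rebind() {
    case "$1" in
        "No - AD Lookup Failed"|"No - AD Keychain Not Found") return 0 ;;
        *) return 1 ;;   # "Yes", "Remote", not joined, or anything unexpected
    esac
}

# Force-unbind, then rejoin under the computer name the Mac already had,
# so its own AD record is reused instead of clobbering another Mac's.
rebind_ad() {
    local comp_name="$1"
    dsconfigad -remove -force -username "$BIND_USER" -password "$BIND_PASS"
    dsconfigad -add "$AD_DOMAIN" -computer "$comp_name" \
        -username "$BIND_USER" -password "$BIND_PASS" -force
}

# Example use (capture the name BEFORE unbinding; strip the trailing "$"):
#   comp=$(dsconfigad -show | awk '/Computer Account/{ print $NF }')
#   needs_rebind "$res" && rebind_ad "${comp%\$}"
```

Leaving "Remote" and "Not Joined" out of the rebind cases is deliberate in this sketch: a Mac off the network can't rebind anyway, and a Mac that was never joined has no known computer name to reuse.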

JustinN
New Contributor III

I wonder how this can be adapted to Jamf School. I can run this on our machines with School, but I don't think we have reporting options.

eddiel0w
New Contributor III

dsconfigad -passinterval '0'

http://support.apple.com/en-us/HT3422
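
To confirm the setting took, something like the following may help. It assumes `dsconfigad -show` prints a line of the form `Password change interval = N`, which should be verified on your own machines; the function name is made up for illustration.

```shell
#!/bin/bash
# Sketch: read back the machine password change interval.

# Reads `dsconfigad -show` output on stdin and prints the interval in days.
parse_passinterval() {
    awk -F'= *' '/Password change interval/{ print $2 }'
}

# Example use on a bound Mac (not executed here):
#   sudo dsconfigad -passinterval 0
#   dsconfigad -show | parse_passinterval    # 0 = never change the password
```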

calumhunter
Valued Contributor

i'd put money on what @thoule said about deep freeze and the AD computer object password

Zed2014
New Contributor II

Thanks everybody for your advice.

However, mm2270, could you please tell me how to compile the script (it looks like a C program), if it does indeed need to be compiled into an executable, and how to attach it to a launch daemon?

Thanks

davidacland
Honored Contributor II

Hi Zed, it's a bash script, so there's nothing to compile. You add it directly to an extension attribute in the JSS (Settings > Computer Management > Management Framework > Extension Attributes).

Select "Script" as the input type.

Once the Macs run their next inventory update the field will be populated with the results from the script.

The only downside (although I could be wrong), is a broken AD connection wouldn't be reported for up to a week, depending on when the last inventory update ran.

Zed2014
New Contributor II

Sorry davidacland, I'm being really thick here, but what is JSS (JavaScript?)? Where would I find this? Is it on the Mac or on a Windows DC?

davidacland
Honored Contributor II

It's the JAMF Software Server, ie https://jssaddress.com:8443

I might have missed something here though, do you have Casper?

Zed2014
New Contributor II

No Casper :(

calumhunter
Valued Contributor

@Zed2014 I would ask your AD admin what the computer object password refresh time is. I think the default is 14 days.

Find out what that delay is and perhaps turn off Deep Freeze for a couple of days longer than that delay and see if the machines still become unbound. I'm pretty sure that's what the problem is.

I haven't used deep freeze in many many years so i can't remember how you go about freezing and unfreezing. But perhaps it is possible to schedule a script to run once a week or something that thaws the deep freeze image, unbinds and rebinds to ad, then freezes again. That should help keep the machines bound

Zed2014
New Contributor II

Thanks Calumhunter. I'm an AD admin myself and yes, it is set to 14 days. However, I have overridden this, as mentioned in point 2 of my original post (under "Things I've tried").

I've turned Deep Freeze off for several days with no difference. What is so peculiar is the randomness, i.e. Mac No. 3 might drop off and then Mac No. 14. Very strange, as they were all imaged together at the same time, so I'd expect consistent behaviour, but it's far from consistent, hence diagnosing is very difficult.

davidacland
Honored Contributor II

Without Casper you can still use the "self-healing" approach I mentioned, just ignore the extension attribute / API bits. Instead, run the check script locally on the Mac on a schedule of your choosing (via a launchdaemon). If an issue is found, run another command / script to rebind.
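
For the scheduling part, here is a minimal sketch that writes a LaunchDaemon plist from a shell function. The label `org.example.adcheck`, the script path and the hourly interval are all hypothetical; `Label`, `ProgramArguments`, `StartInterval` and `RunAtLoad` are standard launchd keys.

```shell
#!/bin/bash
# Sketch: write a LaunchDaemon plist that runs a check script on a schedule.

# Arguments: label, script path, interval in seconds, output file.
write_launchdaemon() {
    local label="$1" script="$2" interval="$3" out="$4"
    cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>${label}</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>${script}</string>
    </array>
    <key>StartInterval</key>
    <integer>${interval}</integer>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF
}

# Example use (label and paths are hypothetical; run as root):
#   plist=/Library/LaunchDaemons/org.example.adcheck.plist
#   write_launchdaemon org.example.adcheck /usr/local/bin/adcheck.sh 3600 "$plist"
#   chown root:wheel "$plist" && chmod 644 "$plist"
#   launchctl load "$plist"
```

On Deep Freeze machines the plist and script would presumably need to be installed while the Mac is thawed, otherwise they will vanish on the next reboot.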

dward
New Contributor

all,

I have seen this with AD-bound Mac systems on 10.8.5 and up when using Deep Freeze. Faronics knows of this bug and it is something we have worked with them on resolving. I'm not the best AD guy, but it has something to do with a back-end AD process for updating a certificate. When frozen, this update is not possible. If more info is needed I can gather my list of support tickets for additional details.

danshaw
Contributor II

Hey @mm2270 - I realize that this post is a couple years old, but I stumbled on your extension attribute script above and have it working for our situation. Thanks. What I am trying to get a better understanding for is all of the layers you have and if a computer has to pass all of those situations to be able to change their password.

In our company we require that users change their password every 60 days. It leads to a lot of frustration when the Macs can't change the password due to AD issues. And we have found that AD can be a challenge.

For example if AD lookup failed or the AD keychain not found comes up, does that mean they won't be able to change their password? I am not clear on what exactly is needed for a good connection to AD.

Thanks for the help!

Gonzalez
New Contributor III

It was suggested I share this with the community. I use this to create an extension attribute that reports the number of days since the AD keychain entry was last modified (or 9999 if the entry is missing). Then I just re-use ongoing policies to rejoin the domain, with the scope defined by the EA. By default the machine account password will update every 14 days, but we increased our interval to accommodate vacations in Europe.

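#!/bin/bash

## NOTE: Change "ADNAME" in the command below to your domain name
ADPSWD=$(/usr/bin/security find-generic-password -s "/Active Directory/ADNAME" /Library/Keychains/System.keychain 2> /dev/null | grep mdat | awk '{print substr($2, 2, length($2)-7)}')

if [[ $ADPSWD -gt 0 ]]; then
    ## Current time and keychain modification time, both as epoch seconds
    d1=$(date -j -f "%a %b %d %T %Z %Y" "$(date)" "+%s")
    d2=$(date -j -f "%Y%m%d%H%M%S" "$ADPSWD" "+%s")
    ## Age of the AD keychain entry in whole days
    result=$(( (d1 - d2) / 86400 ))
else
    ## Keychain entry not found
    result="9999"
fi

echo "<result>$result</result>"
exit 0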
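
To make the `awk` step less opaque: the `mdat` line in the `security` output carries the keychain item's modification time, and the `substr` call drops the leading quote and the trailing `Z\000"`. A small worked example follows, using a made-up sample line in the format `security` is assumed to print (verify against your own output); `extract_mdat` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: how the awk expression extracts the timestamp from an mdat line.

# Reads `security find-generic-password` output on stdin and prints the
# modification timestamp as YYYYmmddHHMMSS.
extract_mdat() {
    grep mdat | awk '{print substr($2, 2, length($2)-7)}'
}

# Hypothetical sample line (values invented for illustration).
sample='    "mdat"<timedate>=0x32303136303931323138333032355A00  "20160912183025Z\000"'

# Field 2 is the quoted string; skipping 1 character and keeping
# length-7 characters leaves only the 14 digits of the timestamp.
printf '%s\n' "$sample" | extract_mdat
```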

FastGM3
Contributor

Is it just our site or has anyone else noticed that AD Macs are falling off their domain more frequently? I know the built in password interval of 14 days for AD has always been the default for OSX but we've never seen computers drop AD like we have since El Cap. I don't know if it's something on our AD server that has changed or if OSX is now actually using that 14 day interval.

I know I can fix this by increasing the password interval days or disabling it all together. I'm just alarmed by the amount of computers this year in our environment dropping off AD and was wondering if I'm the only one seeing this?

Chuck

AVmcclint
Honored Contributor

I've been at my current job for 2 years now and I can honestly say that I NEVER once had any problems with Macs joining and staying joined to our Active Directory... until the past week. In the span of a single week, I have suddenly encountered 2 (and possibly a 3rd) Macs that can no longer communicate with Active Directory.

My observations:
1) using the script above as a guide, I was able to verify that the active directory keychain item was missing from the affected computers.
2) these problems only surfaced after our Network and InfoSec teams enabled a proxy. (EDIT: and server patching)
3) The only way to get them back on the network is to force unbind them and then rejoin them. I use our JSS Active Directory utility to automatically join and insert themselves in the proper group membership.

My questions:
1) what would cause the AD keychain item to be deleted? I know for a fact they were there and properly working with AD previously.
2) why would the addition of a proxy server affect this (assuming that's the cause)?

AVmcclint
Honored Contributor

It appears that our sudden problem with the Macs dropping off AD may be related to some server patching that occurred the day before the first reported problem. There's a handful of Windows Server updates that specifically "fix" problems with computer passwords (but ends up breaking for Macs). I still don't know exactly what it changes or what I have to do to fix it. Until I can figure that out, I have come up with a way to not only notify me of an affected computer but also have JSS fix it on its own.
First, there's the EA that @mm2270 shared with us above.
Next, I built a smart group that is based on the "No" results of the EA
Next, I built a simple script based on the command found here.
Next, I created a Policy that is scoped to the smart group I created. I set it for Ongoing so it runs at check-in and also made a custom trigger so I can run it at-will. (I might make a Self Service item for it too.) In SCRIPTS, I put the script I built with the setting to run BEFORE. Then I included our standard DIRECTORY BINDING. MAINTENANCE has Update Inventory (to remove the computer from the smart group).

So far the tests are successful and I think this will save me the hassle of trying to fix it on each affected computer before. ... at least until I can fix the underlying cause.

rafemoody
New Contributor

@AVmcclint Please keep us posted on your search. I am finding many of our machines losing the AD Keychain item but I haven't been able to track it back to a root cause. I will look into server updates. Thank you for that bit of information.

AVmcclint
Honored Contributor

The automated detection and repair is working flawlessly so far. At least I've got that going for me. It has detected and fixed about a dozen machines now. I'm working on getting the server team to undo the patches that relate to computer accounts, but I don't know how long that's going to take.

I should clarify that I built the smart group based on the "No - AD Keychain Not Found" results. This weeds out the AD lookup failures because that is sluggish on our WiFi even on a good day. I find Macs that report AD lookup failures usually report that at startup and will be normal within 15 minutes after login is completed so there is no need to repair those.

jdizzle222
New Contributor

Can you be more specific about the "computer accounts patches"? Which ones are causing this issue?

AVmcclint
Honored Contributor

Our Windows Server folks pointed to this Security Update: https://support.microsoft.com/en-us/kb/3178465 As the most likely culprit. I am still trying to navigate the political minefield of getting them to undo those patches just so I can confirm that those are the actual cause or not.

alemai
New Contributor

@AVmcclint I am having the exact same issue you are having. Did removing the security update fix your issues?

AVmcclint
Honored Contributor

I was never able to get our Windows Server admins to undo the security update. "Macs don't matter." I'm stuck using my automatic detection and repair option. So far it seems to work OK, but I would rather it not happen in the first place.

jhuls
Contributor III

@AVmcclint Are the same systems repeatedly unbinding from your AD, or is it that, since the security patch, certain Macs lose their binding once and your auto-detection fix addresses it?

AVmcclint
Honored Contributor

Before the server patches, my Macs NEVER forgot their AD bindings - not even once. Since the server patches last month, it has happened about 40 times. Some of the Macs have had it happen repeatedly (one Mac had it happen 6 times) while other Macs have never had it happen even once. The autodetection/fix is something I put into place to help me stay on top of it as it happens instead of users telling me there's a problem with their Macs and then I have to manually fix it. I keep looking for other explanations for this happening - Ethernet, WiFi, VPN, uptime, apps running - and there is nothing consistent among the incidents.
As long as our server guys refuse to roll back those server patches, this is my only option.

gachowski
Valued Contributor II

@AVmcclint

Or it could be another opportunity to ditch AD.

C

pueo
Contributor II

@AVmcclint

Hello. This issue is still ongoing for us too! We are a new customer to Jamf Pro (Dec 2016). As it takes quite some time to discover everything Jamf Pro can do, we are just getting around to this issue now, as it has started to really affect our environment.
OS X devices seem to be dropping off more and more. I have not tried anything else other than your workflow you described above. It works and we are happy with the result.
Hopefully a real fix will appear one day.
Thanks everyone for their input.

Cheers
a.

AVmcclint
Honored Contributor

After a LOT of troubleshooting and observation, I noticed something odd that is a factor in this (for us). For the computer this happened to the most frequently - at least once every other week for several weeks - I talked with the user about something unrelated and I found out that he always turned his WiFi off before shutting his MBP down at the end of the day. I asked him why he did that and he said "it's just a habit of mine." I asked him to leave the WiFi enabled from now on to see what happens. Lo and behold, the problem went away! His computer went 2 full months without dropping off AD. Now it did eventually happen again but it definitely wasn't as frequent as before. Another thing I did after that was to add dsconfigad -passinterval 0 to my workflow and that also seems to help reduce repeat occurrences.

Why would disabling WiFi have any effect on the AD binding? We are using 802.1x with AD Cert authentication for all network interfaces and the AD certs are still valid and not expired. The problem still exists but at least now it's so infrequent that it doesn't keep me up at nights pondering the situation.

pueo
Contributor II

Hello

Does anyone have any further updates to this issue?
The AD drop off is occurring on most of our Macs at some point, some more than others. So I thought I would try 'dsconfigad -passinterval 0' on our Mac Lab as a test. It didn't do much to resolve the issue; the Macs are still dropping off the domain. It's never all of them, only a handful. I must admit this issue has increased over the past 10 months. Our network time is OK, and the WiFi is inactive on the Mac Lab Mac Pros. Maybe the AD drop off was occurring just as much before we implemented Jamf Pro, but without Jamf Pro and the awesome script from @mm2270 one would never know, other than users reporting they cannot see domain printers or file shares.

almonte32
New Contributor III

We experienced this many, many times over many months. Testing and troubleshooting with Apple engineers didn't explain it, even with Directory Utility > Directory Editor reporting error -2100. I kept at it, and in my case it turned out to be the DNS server aging/scavenging settings. The defaults don't work for everyone; the computers that had the issue were not updating their DNS record on the server. After trial and error I set my scavenge settings to:
No-Refresh Interval: 6 hours
Refresh-Interval: 15 days

nielandj
New Contributor III

almonte32, could you share the full process you used to fix this? We had two labs drop AD twice in the last two weeks (The other two are fine, go figure...). It'd be nice to be able to resolve this. Was this setting changed on the domain server or on the machines themselves?

And yes, this is way more frequent for us in Sierra than it was in ElCap.

pueo
Contributor II

Hello

Ok, so I am going to write out everything i have done for a set of 30 Laptops (20 Airs and 10 2010 MBPs). These devices are all running Sierra 10.12.6. This has NOT resolved the issue but so far it has slowed it down considerably.
All of our Payloads are managed by Jamf Pro.

Learning Common Macs setup
- 10.12.6
- Deepfreeze
- Machine Authentication - 802.1x
- Wifi Payload - Auto Join SSID, WPA2 Enterprise, PEAP, Use as Directory Authentication, Use as a Login Windows Configuration
- All other options are in one Payload.
- Radius Cert
- Root Cert
- Alertus
- Sophos

Steps Taken to resolve the AD Drop Off Issue
I did this to each of the 30 laptops. It took many days, especially on the machines with no SSDs.
- AD: Disjoin and Rejoin, 'dsconfigad -passinterval 0', Mobile Accounts - ON, Set Preferred domain server, unchecked Allow Authentication from any domain in the forest
- Ran any updates
- Turned off Auto Check for updates
- Removed any other Network Connections and set the Wifi to be primary adaptor.
- Added New WiFi Payload - Auto Join SSID, WPA2 Enterprise, PEAP, Use as Directory Authentication, Use as a Login Windows Configuration with 2 Certs (Radius and Root-CA cert) (via JSS)
- Other Payloads - Restrictions, Login window, Parental controls.
- Opened Keychain and changed 'When using this certificate' from 'Default' to 'Always Trust'

This issue is also occurring on our Ethernet-connected labs (all running 10.12.6), and it is spreading like wildfire. Over the weekend pretty much all of our Mac Pros fell off the domain. Staff/Faculty devices are also affected, but with Mobile accounts and the script running via a Smart Group we seldom see these machines in the office. Only when they need to add a Network Printer, as the Printer list does not show up.

Cheers,
A.