Apple Macs losing AD binding

Zed2014
New Contributor II

Hi,

I will buy dinner for anyone that can help me solve this problem:

  1. 18 x Apple iMacs bound to our AD network using the built-in AD plugin
  2. ALL are on a VLAN
  3. DeepFrozen
  4. Were imaged using DeployStudio
  5. Currently on OS X Mavericks (10.9.4)
  6. Using AD only, no OD servers

Now the problem:

Every now and again these Macs drop off the network randomly and the only solution is to unbind and rebind.

Things I've tried:
1. Using a preferred server for domain and set that same server for Time synchronisation
2. Set pass interval to 0 (this is recommended if you don't want it to check the machine password every 14 days with AD)
3. Set manual IP addresses then reverted back to DHCP (on that VLAN)
4. Cleared ALL DNS entries, cleared caches etc
5. Using a domain that does NOT end in .local
6. Disabled bonjour advertising services (to completely disable the .local issue even though I know I'm not using a .local domain)
7. Altered the mdns_timeout period in the file /System/Library/SystemConfiguration/IPMonitor.bundle/Contents/Info.plist (again, I know this is for .local domain issues)
8. Tried Centrify Express just as a binding tool
9. Taken them ALL off the VLAN and back on after finding out it makes no difference.
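
For anyone wanting to reproduce items 1 and 2 from the command line, a minimal sketch follows. It assumes the built-in AD plugin and uses dc1.example.com as a placeholder server name. It only prints the commands unless DRY_RUN=0 is set, and it is not a confirmed fix for the problem described here.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 and run as
# root on a bound Mac to apply them. dc1.example.com is a placeholder server name.
DRY_RUN="${DRY_RUN:-1}"
PREFERRED_DC="dc1.example.com"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Prefer one domain controller for directory lookups.
run dsconfigad -preferred "$PREFERRED_DC"
# Stop the periodic computer account password change (default interval is 14 days).
run dsconfigad -passinterval 0
# Point time sync at the same server.
run systemsetup -setnetworktimeserver "$PREFERRED_DC"
```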

Just as a side note, does anybody know why Apple decided to remove the red dot (the traffic-light status indicator)? I'm talking about the "Network Accounts are unavailable" red dot, which stayed on until they became available. In Mavericks it's just a little pop-up flag that displays the same message but disappears as soon as the user starts to type a username, and if they haven't waited long enough they get the infamous "shake" of the window.

Would be grateful for any advice/suggestions.

Many thanks

Zahid 🙂


almonte32
New Contributor III

@nielandj

Hi, sorry, I don't check this account's posts very often.

Before I continue: for anyone who doesn't believe this is specifically a DNS issue, you can contact a level 2 Apple engineer, and if they have any networking experience, they can confirm it.
The most probable cause of "Network Accounts are unavailable" (if the computer was already bound and working fine) is DNS between the client computer and the DNS server, which in most cases is the Active Directory server running DNS as well.

Ok, so if you right click the DNS server and you change those settings to what I wrote, I can almost guarantee you will resolve your issue.

Which part of the process don't you understand, so I can be more specific? (I edited the post to include an image of the scavenging settings I have.)

DNS breaks communication between the client and the server, which in turn breaks AD communication, because the scavenging intervals are sometimes too short: Host (A) records get removed when the client and server are not in constant communication, and once the record is gone the two can no longer find each other.
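
One way to check for a scavenged record from the client side is to compare what DNS returns for the Mac's name with the address the Mac is actually using. This is only a sketch: the host names and addresses are made up, and the interface name en0 in the last comment is an assumption.

```shell
#!/bin/bash
# Compare the address DNS returns for a host name with the address the Mac is using.
# Prints "match", "mismatch", or "missing" (no A record, e.g. after scavenging).
check_a_record() {
    local lookup_output="$1" local_ip="$2" dns_ip
    dns_ip=$(printf '%s\n' "$lookup_output" | awk '/has address/ { print $NF; exit }')
    if [ -z "$dns_ip" ]; then
        echo "missing"
    elif [ "$dns_ip" = "$local_ip" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Canned examples (made-up names and addresses):
check_a_record "imac01.example.edu has address 10.0.5.21" "10.0.5.21"
check_a_record "Host imac01.example.edu not found: 3(NXDOMAIN)" "10.0.5.21"

# On a real Mac the inputs might come from something like:
#   check_a_record "$(host "$(hostname)")" "$(ipconfig getifaddr en0)"
```

A "missing" or "mismatch" result on a Mac that has just dropped off would point toward the DNS explanation; a "match" would suggest looking elsewhere.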

-ea

[Image: DNS scavenging settings]

nielandj
New Contributor III

Thanks, @almonte32 ! Unfortunately, I have no control over our DNS server as I am at a large university (and that's handled by other techs I have no real contact with). I'll see if I can contact those folks and ask if it's possible to investigate making the changes (we have thousands of machines, so asking to make a change for a small section of machines that are having issues just may not happen).

It's funny, I actually was working on this problem again as I had a large group drop off the last couple of days after a month of playing nice. This thread was the top result, and I had forgotten I had commented here.

lmeinecke
New Contributor III

I was able to get @mm2270's script to work in our environment by adding -F to the grep flags on line 22. For some reason the special character $ on our hostnames (though I thought that was a macOS-appended default, as we do not have it in the ADUC name) meant grep returned no result with just -i. This caused a machine joined to the domain to report "No - AD Lookup Failed" from line 27 when it was indeed joined. Sharing this in case anyone else is seeing similar issues.
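
A minimal sketch of why -F matters here, assuming the computer account name ends in $. The account name and the sample line are made up for illustration and are not real dscl output.

```shell
#!/bin/bash
# Hypothetical computer account name and a made-up line standing in for dscl output.
ADCompName='imac-lab01$'
sample_output='RecordName: imac-lab01$'

# Without -F, grep reads the trailing $ as an end-of-line anchor, so it looks for
# "imac-lab01" at the very end of the line and finds nothing.
if printf '%s\n' "$sample_output" | grep -qi "$ADCompName"; then
    regex_match="yes"
else
    regex_match="no"
fi

# With -F the pattern is a fixed string, so the literal $ is matched.
if printf '%s\n' "$sample_output" | grep -qiF "$ADCompName"; then
    fixed_match="yes"
else
    fixed_match="no"
fi

echo "regex: $regex_match, fixed string: $fixed_match"
```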

I'm just now starting to apply this to our environment to understand how many of our hosts have lost their connection to the domain. We have about 60 hosts bound and are finding issues here and there with AD password changes from our intranet site being out of sync with macOS. When the AD binding breaks, users never get the "update your keychain" prompt and are effectively logging in to mobile profiles while on the network.

mconners
Valued Contributor

Can anyone else who attempted the "fix" @almonte32 posted above verify this has helped you with your AD connection failures on the Macs? I would like to share with our AD/DNS folks who are actually looking into the DNS for another problem. I figured if they are poking around, now might be a good time to ask them to look at this too.

mconners
Valued Contributor

Hello @lmeinecke I am also working on modifying the script for our environment. We are running into a small issue with finding no keychain for the AD. Did you find a workaround for this considering the script is quite old? I was going to comment out the keychain echo piece but without being much of a scripter, I didn't know if this would throw out false positives.

lmeinecke
New Contributor III

I did not have to modify the security find-generic-password line. Make sure you're specifying your domain name in all CAPS as I found that was the case with mine and mentioned in other threads. It wasn't FQDN either. If you're core.test.com it's just CORE.

I was failing on that line last Friday and couldn't get this to work for anything, but then I slept on it and came back. Sure enough, it was working. It could have been the VPN, I don't know. I do know the script works over VPN, as I'm showing bound to AD when the connection is active. If I run the script manually after the tunnel comes up, it takes 10-15 sec before the AD state result transitions from remote to yes.

I've got this deployed in our environment now and I'm waiting on recon to run on all hosts to get data updates. I've already got smart groups starting to update showing healthy AD accounts that can lookup their own hostname and some that are showing remote because they are offsite without VPN nailed up.

aamjohns
Contributor II

Hi,
This document discusses what I suspected after reading the topic: the computer account password is not getting updated. "Computers running Deep Freeze lose connection or fall off domain"

I'm not saying the document has a solution that will work for you, but it does seem like they are describing the issue. An auto-rebind script seems like a good plan.

NightFlight
New Contributor III

Not tested, but you could include a boot-time launchd job that dumps all DNS and related traffic using tcpdump. Then just review the logs during your outage.

Example:

tcpdump -i any port 53 or port 88 or port 389

We hardcode /etc/krb5.conf. This allows us to get tickets even though the machine binding is broken. Lock Ticket Viewer to the Dock and train the users to use it, etc.

Example /etc/krb5.conf

[libdefaults]
                default_realm = DOMAINX.COM
                dns_lookup_kdc = false
                dns_lookup_realm = true

[realms]
                DOMAINX.COM = {
                                kdc = kdc.domainx.com
                                admin_server = kdc.domainx.com
                }
                DOMAINY.COM = {
                                kdc = kdc_a.domainy.com
                                kdc = kdc_b.domainy.com
                                admin_server = kdc_a.domainy.com
                }

[domain_realm]
                .domainx.com = DOMAINX.COM
                domainx.com = DOMAINX.COM
                .domainy.com = DOMAINY.COM
                domainy.com = DOMAINY.COM

Programmatically generating /etc/krb5.conf is pretty straightforward using the output from the host command.

Eg

host -t SRV _ldap._tcp.domainx.com
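
A rough sketch of that idea: turn the SRV answers into kdc lines for the [realms] section. It assumes the usual "has SRV record <priority> <weight> <port> <target>." answer format from host, and the dc1/dc2 names are made up. The function name srv_to_kdc_lines is just for illustration.

```shell
#!/bin/bash
# Turn `host -t SRV` output into "kdc = ..." lines for the [realms] section.
# Assumes each answer line ends with the target host name followed by a dot,
# e.g. "... has SRV record 0 100 389 dc1.domainx.com."
srv_to_kdc_lines() {
    awk '/has SRV record/ { host = $NF; sub(/\.$/, "", host); print "kdc = " host }'
}

# Canned sample so the sketch runs without a network; dc1 and dc2 are made-up names.
sample='_ldap._tcp.domainx.com has SRV record 0 100 389 dc1.domainx.com.
_ldap._tcp.domainx.com has SRV record 0 100 389 dc2.domainx.com.'

kdc_lines=$(printf '%s\n' "$sample" | srv_to_kdc_lines)
printf '%s\n' "$kdc_lines"

# Against a live domain the call would look like:
#   host -t SRV _ldap._tcp.domainx.com | srv_to_kdc_lines
```

In an environment with many unreachable DCs you would probably want to filter the list down to servers the client can actually reach before writing them into the file.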

This issue can be a downright plague. We have over 100 DCs in our global environment and a lot of them are unreachable. macOS doesn't handle that well at all, hence turning off some of the DNS lookups above.

Hope this helps.

AVmcclint
Valued Contributor III

I just rediscovered this thread and it made me think.... I have had maybe 2 Macs randomly lose their AD binding in the past 2 years. I've done nothing toward fixing this other than the detection via an EA (that sends me an email) and the automated script that fixes it as it happens. The only things I have done are upgrades to High Sierra and Mojave. However with the initial manifestation being triggered by a Windows Server patch, I might have to think that something on the Windows Server side fixed it for me too. It just happened so long ago, and I had an automated system in place that didn't require my constant attention, I basically forgot about this issue. It still bothers me that no one can put their finger on the definitive cause. There's no telling if this kind of thing could happen again.

AdamCraig
Contributor III

@AVmcclint would you be able to share your rebind script? I am trying to build an automated process, and the EA I have for detecting a bad bind seems to be accurate, but my rebind script is failing about 2/3 of the time with the error "dsconfigad: Permission error".

AVmcclint
Valued Contributor III

EA for determining AD bound status

#!/bin/bash

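OSvers=$( sw_vers -productVersion | awk -F. '{ print (($1 > 10) ? 99 : $2) }' )
## Minor version on 10.x; macOS 11 and later report 99 so they take the 10.7+ branch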
## NOTE: Change "dc.domain.comp.org" to an internal master domain controller,
## or some other internal device for the ping command

if ping -c 2 -o dc.domain.comp.org; then
    ## Mac is on the network, continue
    if [[ $(dsconfigad -show | awk '/Active Directory Domain/{ print $NF }') == "company.com" ]]; then
        ADCompName=$(dsconfigad -show | awk '/Computer Account/{ print $NF }')
        ## Mac has correct dsconfigad info, continue
        if [[ "$OSvers" -ge "7"  ]]; then

            ## NOTE: Change the "DOMAIN" in the command below to your domain name

            security find-generic-password -l "/Active Directory/DOMAIN" | grep "Active Directory"
            if [ "$?" == "0" ]; then
                ## AD keychain file exists, continue
                ## NOTE: Change the "DOMAIN" in the command below to your domain name

                dscl "/Active Directory/DOMAIN/All Domains" read /Computers/"$ADCompName" | grep -i "$ADCompName"
                if [ "$?" == "0" ]; then
                    ## Successful lookup of computer record. AD communication is working
                    res="Yes"
                else
                    res="No - AD Lookup Failed"
                fi
            else
                res="No - AD Keychain Not Found"
            fi
        else
            if [[ "$OSvers" == "6" ]]; then
                ## OS is 10.6.x, moving on to AD lookup
                ## NOTE: Change the "DOMAIN" in the command below to your domain name

                dscl "/Active Directory/DOMAIN/All Domains" read /Computers/"$ADCompName" | grep "$ADCompName"
                if [ "$?" == "0" ]; then
                    ## Successful lookup of computer record. AD communication is working
                    res="Yes"
                else
                    res="No - AD Lookup Failed"
                fi
            fi
        fi
    else
        ## NOTE: Change the "domain.comp.org" in the command below to your fqdn
        res="No - Not Joined to domain.comp.org domain"
    fi
else
    ## Mac is not on the network or has no network connection
    echo "Not connected to company network"
    res="Remote"
fi

echo "<result>$res</result>"

Force Unbind script

#!/bin/sh

# Use this to force a computer to unbind from Active Directory.
# Then use JSS to re-join to the domain
# https://derflounder.wordpress.com/2013/10/09/force-unbinding-with-dsconfigad-without-using-an-active-directory-admin-account/

dsconfigad -force -remove -u johndoe -p nopasswordhere

sleep 6

Build a Policy that runs at recurring check in and is scoped to a smart group with the following criteria:
[Screenshot: smart group criteria]

[Screenshot: policy configuration]
The Script is the one above.
Directory Binding is whatever you have configured for Directory Binding.
Maintenance is to run Inventory -Ignore Files and Processes in this screenshot. I run an extra command for our internal purposes that has no bearing on the fix.

AdamCraig
Contributor III

Thanks so much! It turns out my error was due to incorrect permissions on the service account we were using to bind, as well as our computers being spread across different OUs in AD.