
Hi,

I will buy dinner for anyone who can help me solve this problem:

  1. 18 Apple iMacs bound to our AD network using the built-in AD plugin
  2. ALL are on a VLAN
  3. Running Deep Freeze
  4. Imaged using DeployStudio
  5. Currently on OS X Mavericks (10.9.4)
  6. Using AD only, no OD servers

Now the problem:

Every now and again these Macs drop off the network randomly and the only solution is to unbind and rebind.

Things I've tried:
1. Used a preferred server for the domain and set that same server for time synchronisation
2. Set the password change interval to 0 (recommended if you don't want the machine password changed every 14 days with AD)
3. Set manual IP addresses, then reverted back to DHCP (on that VLAN)
4. Cleared ALL DNS entries, cleared caches etc.
5. Used a domain that does NOT end in .local
6. Disabled Bonjour advertising services (to completely rule out the .local issue, even though I know I'm not using a .local domain)
7. Altered the mdns_timeout period in the file /System/Library/SystemConfiguration/IPMonitor.bundle/Contents/Info.plist (again, I know this is for .local domain issues)
8. Tried Centrify Express just as a binding tool
9. Took them ALL off the VLAN and back on; it made no difference.
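On item 1: AD binding relies on Kerberos, which rejects authentication when the Mac's clock and the domain controller's clock differ by more than the allowed skew (5 minutes by default in AD). That is why pointing time sync at the same DC matters. A small sketch of that comparison, assuming you already have both times as Unix epoch seconds; the function name is made up for illustration:

```bash
#!/bin/bash
# Sketch: are two clocks within Kerberos' default 5-minute (300 s) skew?
# Both arguments are Unix epoch seconds; an optional third argument
# overrides the tolerance. within_kerberos_skew is a hypothetical helper.
within_kerberos_skew() {
    local local_epoch=$1 dc_epoch=$2 max_skew=${3:-300}
    local diff=$(( local_epoch - dc_epoch ))
    # Take the absolute value of the difference
    if (( diff < 0 )); then
        diff=$(( -diff ))
    fi
    # Exit status 0 when within tolerance, 1 otherwise
    (( diff <= max_skew ))
}
```

Usage: `within_kerberos_skew "$(date +%s)" "$dc_epoch" || echo "Clock skew too large for Kerberos"`, where `$dc_epoch` is a placeholder for however you read the DC's time.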

As a side note, does anybody know why Apple decided to remove the red dot (the traffic-light indicator)? I'm talking about the "Network Accounts are unavailable" red dot, which stayed on until network accounts became available. In Mavericks it's just a little pop-up flag that displays the same message but disappears as soon as the user starts typing a username, so if they haven't waited long enough they get the infamous "shake" of the login window.

Would be grateful for any advice/suggestions.

Many thanks

Zahid :)

Can you be more specific about the "computer accounts patches"? Which ones are causing this issue?


Our Windows Server folks pointed to this Security Update: https://support.microsoft.com/en-us/kb/3178465 as the most likely culprit. I am still trying to navigate the political minefield of getting them to undo those patches just so I can confirm whether or not they are the actual cause.


@AVmcclint I am having the exact same issue you are having. Did removing the security update fix your issues?


I was never able to get our Windows Server admins to undo the security update. "Macs don't matter." I'm stuck using my automatic detection and repair option. So far it seems to work OK, but I would rather it not happen in the first place.


@AVmcclint Are you experiencing this where the same systems repeatedly unbind from your AD, or is it that since the security patch certain Macs find themselves no longer bound just once, and your auto-detection fix addresses it?


Before the server patches, my Macs NEVER forgot their AD bindings - not even once. Since the server patches last month, it has happened about 40 times. Some of the Macs have had it happen repeatedly (one Mac had it happen 6 times) while other Macs have never had it happen even once. The autodetection/fix is something I put into place to help me stay on top of it as it happens instead of users telling me there's a problem with their Macs and then I have to manually fix it. I keep looking for other explanations for this happening - Ethernet, WiFi, VPN, uptime, apps running - and there is nothing consistent among the incidents.
As long as our server guys refuse to roll back those server patches, this is my only option.


@AVmcclint

Or it could be another opportunity to ditch AD.

C


@AVmcclint

Hello. This issue is still ongoing for us too! We are a new Jamf Pro customer (Dec 2016). As it takes quite some time to discover everything Jamf Pro can do, we are only now getting around to this issue, as it has started to really affect our environment.
OS X devices seem to be dropping off more and more. I have not tried anything other than the workflow you described above. It works, and we are happy with the result.
Hopefully a real fix will appear one day.
Thanks, everyone, for your input.

Cheers
a.


After a LOT of troubleshooting and observation, I noticed something odd that is a factor in this (for us). The computer this happened to most frequently (at least once every other week for several weeks) belonged to a user I was talking with about something unrelated, and I found out that he always turned his WiFi off before shutting his MBP down at the end of the day. I asked him why he did that, and he said, "It's just a habit of mine." I asked him to leave WiFi enabled from now on to see what happens. Lo and behold, the problem went away! His computer went 2 full months without dropping off AD. It did eventually happen again, but definitely not as frequently as before. Another thing I did after that was to add dsconfigad -passinterval 0 to my workflow, and that also seems to help reduce repeat occurrences.

Why would disabling WiFi have any effect on the AD binding? We are using 802.1x with AD certificate authentication for all network interfaces, and the AD certs are still valid and not expired. The problem still exists, but at least now it's so infrequent that it doesn't keep me up at night pondering the situation.


Hello

Does anyone have any further updates to this issue?
The AD drop-off is occurring on most of our Macs at some point, some more than others. So I thought I would try 'dsconfigad -passinterval 0' on our Mac lab as a test. It didn't do much to resolve the issue; the Macs are still dropping off the domain. It's never all of them, only a handful. I must admit this issue has increased over the past 10 months. Our network time is OK, and WiFi is inactive on the Mac lab Mac Pros. Maybe the AD drop-off was occurring just as much before we implemented Jamf Pro, but without Jamf Pro and the awesome script from @mm2270, we would never have known, other than users reporting that they cannot see domain printers or file shares.


After experiencing this many, many times over many months, and testing and troubleshooting with Apple engineers, no one could explain why, even after Directory Utility > Directory Editor showed us error -2100. I kept at it, and in my case it turned out to be the DNS server's aging/scavenging settings. The defaults don't work for everyone: the computers that had the issue were not updating their DNS records on the server. After some trial and error, I set my scavenging settings to:
No-Refresh Interval: 6 hours
Refresh Interval: 15 days
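For anyone wondering what those two numbers do: under Windows DNS aging, a dynamically registered record becomes eligible for scavenging once its timestamp is older than the No-Refresh Interval plus the Refresh Interval. A quick sketch of that arithmetic with the values above (hours throughout; the helper name is hypothetical):

```bash
#!/bin/bash
# Sketch of Windows DNS aging arithmetic: a record is eligible for
# scavenging once its timestamp is older than No-Refresh + Refresh.
# All values are in hours; scavenge_eligible is a hypothetical helper.
scavenge_eligible() {
    local age_hours=$1 no_refresh_hours=$2 refresh_hours=$3
    (( age_hours > no_refresh_hours + refresh_hours ))
}

# Settings from the post: 6 hours + 15 days
NO_REFRESH=6
REFRESH=$(( 15 * 24 ))
echo "Eligible for scavenging after $(( NO_REFRESH + REFRESH )) hours without a refresh"
# prints "Eligible for scavenging after 366 hours without a refresh"
```

For comparison, the Windows defaults are 7 days for each interval (336 hours in total). The settings above let a client refresh its record's timestamp after only 6 hours, and a record that is never refreshed stays around for 366 hours before it can be scavenged.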


almonte32, could you share the full process you used to fix this? We had two labs drop AD twice in the last two weeks (the other two are fine, go figure...). It'd be nice to be able to resolve this. Was this setting changed on the domain server or on the machines themselves?

And yes, this is way more frequent for us in Sierra than it was in El Capitan.


Hello

OK, so I am going to write out everything I have done for a set of 30 laptops (20 Airs and 10 2010 MBPs). These devices are all running Sierra 10.12.6. This has NOT resolved the issue, but so far it has slowed it down considerably.
All of our Payloads are managed by Jamf Pro.

Learning Common Macs setup
- 10.12.6
- Deep Freeze
- Machine Authentication - 802.1x
- WiFi Payload - Auto Join SSID, WPA2 Enterprise, PEAP, Use as Directory Authentication, Use as a Login Window Configuration
- All other options are in one Payload.
- Radius Cert
- Root Cert
- Alertus
- Sophos

Steps Taken to resolve the AD Drop Off Issue
I did this on each of the 30 laptops. It took many days, especially on the machines without SSDs.
- AD: Disjoin and Rejoin, 'dsconfigad -passinterval 0', Mobile Accounts - ON, set a preferred domain server, unchecked "Allow authentication from any domain in the forest"
- Ran any updates
- Turned off automatic checking for updates
- Removed any other network connections and set WiFi as the primary adapter.
- Added a new WiFi Payload - Auto Join SSID, WPA2 Enterprise, PEAP, Use as Directory Authentication, Use as a Login Window Configuration with 2 certs (RADIUS and Root CA) (via JSS)
- Other Payloads - Restrictions, Login Window, Parental Controls.
- Opened Keychain Access and changed 'When using this certificate' from 'Default' to 'Always Trust'
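If you'd rather script the AD options from the first bullet than click through Directory Utility on 30 laptops, dsconfigad can set them on an already-bound Mac. A minimal sketch; the preferred DC name is a placeholder, and the dry-run switch is my own addition so you can review the commands first:

```bash
#!/bin/bash
# Sketch: apply the AD settings from the list above with dsconfigad.
# PREFERRED_DC is a hypothetical placeholder. With DRY_RUN=1 (the default
# here) commands are only printed; set DRY_RUN=0 and run as root to apply.
PREFERRED_DC="${PREFERRED_DC:-dc1.example.com}"

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_ad_settings() {
    run dsconfigad -passinterval 0             # 0 = never change the computer password
    run dsconfigad -mobile enable              # create mobile accounts at login
    run dsconfigad -preferred "$PREFERRED_DC"  # prefer one domain controller
    run dsconfigad -alldomains disable         # no auth from other domains in the forest
}
```

Call `apply_ad_settings` once to review the output, then `DRY_RUN=0 apply_ad_settings` as root.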

This issue is also occurring on our Ethernet-connected labs (all running 10.12.6); the issue is spreading like wildfire. Over the weekend, pretty much all of our Mac Pros fell off the domain. Staff/faculty devices are also affected, but with mobile accounts and the script running via a Smart Group, we seldom see these machines in the office, only when users need to add a network printer, because the printer list does not show up.

Cheers,
A.


"Network accounts are unavailable". We have this issue often, too. It doesn't happen as much now because we added the AD bind to a configuration profile rather than a policy, so the profile is "installed" rather than just deployed. The issue I keep seeing, however, is that after I've imaged the MacBooks and confirmed they log in to an AD account, if a Mac isn't used for a while, it never seems to see AD again once it's turned back on. I have to log in to our HAdmin account and unbind/rebind to get it to finally "sync" with our AD server and remove that god-awful red dot on the login screen.


@nielandj

Hi, sorry, I don't check this account's posts very often.

Before I continue, for all those who don't believe this has to do with DNS specifically: you can contact a level 2 Apple engineer, and if he has any networking experience, he can confirm this to be the case.
The most probable cause of "Network Accounts Unavailable" (if the computer was already bound and working fine) is DNS between the client computer and the DNS server, which in most cases is the Active Directory server also running DNS.

OK, so if you right-click the DNS server and change those settings to what I wrote, I can almost guarantee it will resolve your issue.

Which part of the process don't you understand, so I can be more specific? (I edited my post to include an image of my scavenging settings.)

The reason DNS breaks client/host communication, which in turn breaks AD communication, is that the scavenging settings are sometimes too aggressive and remove DNS Host (A) records when the client and server don't have constant communication.
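If the theory is that scavenging deletes the Mac's Host (A) record, one way to catch it is to check whether the Mac's own name still resolves. A minimal sketch that reads `host` output on stdin; the helper name and the FQDN in the usage line are placeholders:

```bash
#!/bin/bash
# Sketch: succeed only if `host` output contains an A record line such as
# "mac01.example.com has address 10.0.0.5". IPv6 lines read
# "has IPv6 address" and are deliberately not counted.
# a_record_present is a hypothetical helper name.
a_record_present() {
    grep -q ' has address '
}
```

Usage: `host mac01.example.com | a_record_present || echo "A record missing"`, substituting the Mac's real FQDN.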

-ea


Thanks, @almonte32 ! Unfortunately, I have no control over our DNS server as I am at a large university (and that's handled by other techs I have no real contact with). I'll see if I can contact those folks and ask if it's possible to investigate making the changes (we have thousands of machines, so asking to make a change for a small section of machines that are having issues just may not happen).

It's funny, I actually was working on this problem again as I had a large group drop off the last couple of days after a month of playing nice. This thread was the top result, and I had forgotten I had commented here.


I was able to get @mm2270's script to work in our environment by adding -F to the grep flags on line 22. For some reason, the $ special character on our hostnames (I thought that was a macOS-appended default, as we don't have it in the ADUC name) was not returning a result from grep with just -i. This caused a machine joined to the domain to report "No - AD Lookup Failed" (line 27) when it was indeed joined. Sharing this in case anyone else is seeing similar issues.
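The reason -F helps: AD computer account names (sAMAccountName) end in a $, and in a regular expression a trailing $ means "end of line", so the pattern can never match a line that literally ends in $. A small demonstration using a made-up dscl-style record line:

```bash
#!/bin/bash
# Hypothetical sample line and computer account name (note the trailing $)
record_line='RecordName: MAC-LAB-01$'
name='MAC-LAB-01$'

# Regex match: "MAC-LAB-01$" means "MAC-LAB-01 at end of line", which fails
# because the line actually ends with a literal "$"
if printf '%s\n' "$record_line" | grep -qi "$name"; then regex=match; else regex=no-match; fi

# Fixed-string match (-F): "$" is taken literally, so this succeeds
if printf '%s\n' "$record_line" | grep -qiF "$name"; then fixed=match; else fixed=no-match; fi

echo "regex: $regex, fixed: $fixed"
# prints "regex: no-match, fixed: match"
```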

I'm just now starting to apply this to our environment to try to understand how many of our hosts have lost their connection to the domain. We have about 60 hosts bound and are finding issues here and there with AD password changes made from our intranet site being out of sync with macOS. When the AD binding breaks, users never get the "update your keychain" prompt and are effectively logging in to mobile profiles while on the network.


Can anyone else who attempted the "fix" @almonte32 posted above verify this has helped you with your AD connection failures on the Macs? I would like to share with our AD/DNS folks who are actually looking into the DNS for another problem. I figured if they are poking around, now might be a good time to ask them to look at this too.


Hello @lmeinecke, I am also working on modifying the script for our environment. We are running into a small issue where no AD keychain entry is found. Did you find a workaround for this, considering the script is quite old? I was going to comment out the keychain check, but not being much of a scripter, I didn't know if this would produce false positives.


I did not have to modify the security find-generic-password line. Make sure you're specifying your domain name in all CAPS; I found that was the case with mine, and it's mentioned in other threads. It isn't the FQDN either: if you're core.test.com, it's just CORE.
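A sketch of deriving that upper-case short name from the FQDN, assuming your NetBIOS domain name is the same as the first DNS label. That is common but not guaranteed; `dscl localhost -list "/Active Directory"` should show the name the Mac actually uses. The function name is hypothetical:

```bash
#!/bin/bash
# Sketch: "core.test.com" -> "CORE". Assumes the AD short (NetBIOS) name
# equals the first DNS label, which is not true in every environment.
ad_short_name() {
    local fqdn=$1
    # Strip everything from the first dot, then upper-case the result
    printf '%s\n' "${fqdn%%.*}" | tr '[:lower:]' '[:upper:]'
}
```

For example, `security find-generic-password -l "/Active Directory/$(ad_short_name core.test.com)"` looks up the same keychain label as the EA script.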

I was failing on that line last Friday and couldn't get this to work for anything, but then I slept on it and came back. Sure enough, it was working. It could have been VPN; I don't know. I do know the script works over VPN, as it shows bound to AD when the connection is active. If I run the script manually after the tunnel comes up, it does take 10-15 seconds before the AD state result transitions from Remote to Yes.

I've got this deployed in our environment now, and I'm waiting on recon to run on all hosts to get data updates. I've already got smart groups starting to update, showing healthy AD accounts that can look up their own hostname and some that are showing Remote because they are offsite without VPN nailed up.


Hi,
This document discusses what I suspected from reading this topic: the computer account password is not getting updated. It's titled "Computers running Deep Freeze lose connection or fall off domain".

I'm not saying the document has a solution that will work for you, but it does seem like they are describing the issue. An auto-rebind script seems like a good plan.


Not tested, but you could add a launchd job that runs at boot and captures all DNS and related traffic using tcpdump. Then just review the captures during your outage.

Example:

tcpdump -i any port 53 or port 88 or port 389
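To make that concrete, here is a script a boot-time LaunchDaemon (RunAtLoad) could call. Untested sketch: the output path, rotation sizes and helper names are my own assumptions, and it needs to run as root.

```bash
#!/bin/bash
# Capture DNS (53), Kerberos (88) and LDAP (389) traffic to a ring of pcap
# files. CAPTURE_FILE, build_filter and start_capture are hypothetical.
CAPTURE_FILE="${CAPTURE_FILE:-/var/log/ad-trace.pcap}"

build_filter() {
    # Join the given ports into "port A or port B or ..."
    local filter="" p
    for p in "$@"; do
        filter="${filter:+$filter or }port $p"
    done
    printf '%s\n' "$filter"
}

start_capture() {
    # -n: skip name resolution so tcpdump doesn't add its own DNS traffic
    # -C 10 -W 5: rotate after ~10 million bytes, keep at most 5 files
    exec tcpdump -i any -n -w "$CAPTURE_FILE" -C 10 -W 5 "$(build_filter 53 88 389)"
}
```

The ring of files keeps the disk from filling while you wait for the next outage; open them in Wireshark afterwards.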

We hardcode /etc/krb5.conf. This allows us to get tickets even when the machine binding is broken. Lock Ticket Viewer to the Dock, train the users to use it, etc.

Example /etc/krb5.conf

[libdefaults]
                default_realm = DOMAINX.COM
                dns_lookup_kdc = false
                dns_lookup_realm = true

[realms]
                DOMAINX.COM = {
                                kdc = kdc.domainx.com
                                admin_server = kdc.domainx.com
                }
                DOMAINY.COM = {
                                kdc = kdc_a.domainy.com
                                kdc = kdc_b.domainy.com
                                admin_server = kdc_a.domainy.com
                }

[domain_realm]
                .domainx.com = DOMAINX.COM
                domainx.com = DOMAINX.COM
                .domainy.com = DOMAINY.COM
                domainy.com = DOMAINY.COM

Programmatically generating /etc/krb5.conf is pretty straightforward using the output from the host command.

E.g.:

host -t SRV _ldap._tcp.domainx.com
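A rough sketch of that generation step: it reads `host -t SRV` output on stdin and prints a [realms] stanza in the same shape as the example above. Assumptions, all mine: every SRV target can act as a KDC, the first one listed is used as admin_server, and SRV priority/weight are ignored. Review the output before putting it in /etc/krb5.conf.

```bash
#!/bin/bash
# Sketch: build a krb5.conf [realms] entry from lines like
#   "_ldap._tcp.domainx.com has SRV record 0 100 389 kdc.domainx.com."
# realm_stanza is a hypothetical helper; the realm is the upper-cased domain.
realm_stanza() {
    local domain=$1 realm
    realm=$(printf '%s' "$domain" | tr '[:lower:]' '[:upper:]')
    awk -v realm="$realm" '
        / has SRV record / {
            host = $NF
            sub(/\.$/, "", host)        # drop the trailing dot
            hosts[++n] = host
        }
        END {
            printf "\t%s = {\n", realm
            for (i = 1; i <= n; i++) printf "\t\tkdc = %s\n", hosts[i]
            if (n > 0) printf "\t\tadmin_server = %s\n", hosts[1]
            printf "\t}\n"
        }'
}
```

Usage: `host -t SRV _ldap._tcp.domainx.com | realm_stanza domainx.com`.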

This issue can be a downright plague. We have over 100 DCs in our global environment, and a lot of them are unreachable. macOS doesn't handle that well at all, hence turning off some of the DNS lookups above.

Hope this helps.


I just rediscovered this thread and it made me think.... I have had maybe 2 Macs randomly lose their AD binding in the past 2 years. I've done nothing toward fixing this other than the detection via an EA (that sends me an email) and the automated script that fixes it as it happens. The only things I have done are upgrades to High Sierra and Mojave. However with the initial manifestation being triggered by a Windows Server patch, I might have to think that something on the Windows Server side fixed it for me too. It just happened so long ago, and I had an automated system in place that didn't require my constant attention, I basically forgot about this issue. It still bothers me that no one can put their finger on the definitive cause. There's no telling if this kind of thing could happen again.


@AVmcclint, would you be able to share your rebind script? I am trying to build an automated process. The EA I have for detecting whether the binding has gone bad seems to be accurate, but my rebind script fails about 2/3 of the time with the error "dsconfigad: Permission error".


EA for determining AD bound status

#!/bin/bash

## Major and minor macOS version, e.g. 10.14.6 -> 10 and 14, 11.7 -> 11 and 7
OSmajor=$( sw_vers -productVersion | cut -d. -f1 )
OSvers=$( sw_vers -productVersion | cut -d. -f2 )

## NOTE: Change "dc.domain.comp.org" to an internal master domain controller,
## or some other internal device for the ping command

if ping -c 2 -o dc.domain.comp.org; then
    ## Mac is on the network, continue
    ## NOTE: Change "company.com" to your AD domain's FQDN
    if [[ $(dsconfigad -show | awk '/Active Directory Domain/{ print $NF }') == "company.com" ]]; then
        ADCompName=$(dsconfigad -show | awk '/Computer Account/{ print $NF }')
        ## Mac has correct dsconfigad info, continue
        ## 10.7 through 10.x, plus macOS 11 and later (where the minor number restarts)
        if [[ "$OSmajor" -ge 11 || "$OSvers" -ge 7 ]]; then

            ## NOTE: Change the "DOMAIN" in the command below to your domain name

            security find-generic-password -l "/Active Directory/DOMAIN" | grep "Active Directory"
            if [ "$?" == "0" ]; then
                ## AD keychain file exists, continue
                ## NOTE: Change the "DOMAIN" in the command below to your domain name
                ## -F: the computer account name ends in "$", which must match literally

                dscl "/Active Directory/DOMAIN/All Domains" read /Computers/"$ADCompName" | grep -iF "$ADCompName"
                if [ "$?" == "0" ]; then
                    ## Successful lookup of computer record. AD communication is working
                    res="Yes"
                else
                    res="No - AD Lookup Failed"
                fi
            else
                res="No - AD Keychain Not Found"
            fi
        else
            if [[ "$OSvers" == "6" ]]; then
                ## OS is 10.6.x, moving on to AD lookup
                ## NOTE: Change the "DOMAIN" in the command below to your domain name

                dscl "/Active Directory/DOMAIN/All Domains" read /Computers/"$ADCompName" | grep -F "$ADCompName"
                if [ "$?" == "0" ]; then
                    ## Successful lookup of computer record. AD communication is working
                    res="Yes"
                else
                    res="No - AD Lookup Failed"
                fi
            fi
        fi
    else
        ## NOTE: Change the "domain.comp.org" in the command below to your fqdn
        res="No - Not Joined to domain.comp.org domain"
    fi
else
    ## Mac is not on the network or has no network connection
    echo "Not connected to company network"
    res="Remote"
fi

echo "<result>$res</result>"

Force Unbind script

#!/bin/sh

# Use this to force a computer to unbind from Active Directory.
# Then use JSS to re-join to the domain
# https://derflounder.wordpress.com/2013/10/09/force-unbinding-with-dsconfigad-without-using-an-active-directory-admin-account/

dsconfigad -force -remove -u johndoe -p nopasswordhere

sleep 6

Build a policy that runs at recurring check-in and is scoped to a smart group with the following criteria:


The script is the force-unbind script above.
Directory Binding is whatever you have configured for Directory Binding.
Maintenance is set to update inventory. Ignore Files and Processes in this screenshot; I run an extra command for our internal purposes that has no bearing on the fix.