Mavericks + Active Directory - Domain Join Issues

Lewandowski
New Contributor III

Good afternoon all,

We're prepping for the install of 3 iMac labs and we're having a problem getting any computers running Mavericks to domain join. We've hit a wall in troubleshooting this issue. Has anyone had a similar experience? Can anyone confirm that they can newly domain join a machine running 10.9 to AD (10.8 computers domain join as expected)? Lastly, if anyone has experience troubleshooting this type of issue, a fresh set of eyes would be amazing.

Thank you so much!
Nikki Lewandowski
Canisius High School Technology Department
lewandowski@canisiushigh.org

The error that we're receiving is:
"Unable to add server.
Node name wasn't found.
(2000)"

Our opendirectoryd.log advanced logging shows a Kerberos auth issue where it bounces from one DC to another and then times out (screenshots of this log are available if needed).

Various potential causes that we've looked into:

+DNS
Resolves addresses forward and backwards, internal and external. Dig shows correct servers and resolutions.
Attempted from different subnets, with manually added host/ptr records - same errors.
Attempted to bind via separate DCs, using FQDNs and IPs - same errors.

+NTP
All DCs pointing to same authoritative host, which is pointed to external pool. Client Mac time set manually within 60sec of time server - same errors.
Client Mac pointed directly to time server (ntpd & System Preferences) - same errors.
Client Mac pointed directly to external pool - same errors.
Time drift well within 300sec (Kerberos limit) on all occasions. No known issues on any domain systems.
Reset SMC&PRAM multiple times for good measure (someone claimed this helped every unit on their network)
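
The skew check described above can be sketched as a comparison of epoch seconds, which also catches a wrong date that a glance at the clock would miss. This is only a sketch: `skew_ok` is a hypothetical helper name, the 300-second default mirrors the Kerberos limit quoted above, and how you obtain the time server's epoch is left to your environment.

```shell
# skew_ok LOCAL_EPOCH REMOTE_EPOCH [LIMIT_SECONDS]
# Returns 0 when the absolute difference between the two timestamps is
# within the limit (default 300 seconds), 1 otherwise.
skew_ok() {
    local_epoch=$1
    remote_epoch=$2
    limit=${3:-300}
    diff=$((local_epoch - remote_epoch))
    # Take the absolute value so it does not matter which clock is ahead.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$limit" ]
}
```

Usage would look something like `skew_ok "$(date +%s)" "$dc_epoch" || echo "clock skew too large"`, where `dc_epoch` is whatever value you read from the authoritative time source.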

+DHCP
Attempted with DHCP reservations, and temporary leases, on subnets that are known to work for PCs and 10.8 Mac bindings - same errors.
IPv6 turned off per-interface via CLI - common issue - same errors
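
For anyone repeating the IPv6 step, turning it off on every network service can be sketched roughly as below. `networksetup -listallnetworkservices` and `networksetup -setv6off` are the stock macOS commands; the helper names and the print-only default (`APPLY` unset) are assumptions made for illustration.

```shell
# service_names: read `networksetup -listallnetworkservices` output on stdin
# and print bare service names. Drops the explanatory header line and the
# leading asterisk that marks a disabled service.
service_names() {
    grep -v 'denotes that a network service is disabled' | sed 's/^\*//'
}

# disable_ipv6_all: prints the commands by default; set APPLY=1 to run them.
disable_ipv6_all() {
    networksetup -listallnetworkservices | service_names |
    while IFS= read -r svc; do
        [ -n "$svc" ] || continue
        if [ "${APPLY:-0}" = "1" ]; then
            networksetup -setv6off "$svc"
        else
            echo "would run: networksetup -setv6off \"$svc\""
        fi
    done
}
```

Run `disable_ipv6_all` once to review the list, then `APPLY=1 disable_ipv6_all` as root if it looks right.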

+CAs
Some expired certificates have been reissued, but old ones not deleted from DC. Possible breadcrumb?

+ADUC
Some weirdness in how OS X/Casper handles LocalHostName, HostName, ComputerName and NetBIOS after imaging. Noted that names are not always persistent.
Manually modified all ‘names’ to match existing ADUC entries - same errors.
Deleted existing ADUC computer objects - same errors.
Changed names to ‘computernameA’ with and without existing ADUC objects - same errors.
All names <15 characters - same errors.
Included explicitly correct container path - same errors
Double checked object OUs and tried different containers - same errors.
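
The name checks in this list can be sketched as a small comparison helper. `names_ok` and `lower` are hypothetical names; the 15-character default reflects the NetBIOS computer name limit, and comparing only the first label of HostName is an assumption about how the names are expected to line up.

```shell
# lower: lowercase stdin so the comparison ignores case.
lower() { tr '[:upper:]' '[:lower:]'; }

# names_ok COMPUTER_NAME HOST_NAME LOCAL_HOST_NAME [MAX_LEN]
# Returns 0 when all three names are non-empty, no longer than MAX_LEN
# (default 15), and identical ignoring case. HostName may be an FQDN, so
# only its first label is compared.
names_ok() {
    computer=$(printf '%s' "$1" | lower)
    host=$(printf '%s' "${2%%.*}" | lower)
    localhost=$(printf '%s' "$3" | lower)
    max=${4:-15}
    for n in "$computer" "$host" "$localhost"; do
        [ -n "$n" ] || return 1
        [ "${#n}" -le "$max" ] || return 1
    done
    [ "$computer" = "$host" ] && [ "$host" = "$localhost" ]
}
```

On a client this could be fed from `scutil --get ComputerName`, `scutil --get HostName` and `scutil --get LocalHostName`; an unset HostName makes the check fail.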

+Authentication
Domain Admin accounts unsuccessful on 10.9 machines; successful on other machines.

Our conclusion thus far has been that there is something preventing Kerberos authentication during the bind process (both via GUI and terminal). Commonly it seems to be a time-related or IPv6-related issue, but I think I have ruled those out in multiple test cases. Possibly SSL is broken in some way?

23 REPLIES

Janowski
New Contributor II

Three out of four times we've run into this, it's been time related. In the last case we made a very similar list, and it turned out the person had managed to get their DATE set to a whole day in the future. It took us a bit to notice, since the time and everything else looked spot on.

pblake
Contributor III

I agree. I would add a script to update network time settings before binding.

Lewandowski
New Contributor III

Thanks for the thoughts folks, but as noted in the troubleshooting steps we've done extensive testing with our NTP. We've also manually set a computer to the exact time as the authoritative NTP and received the same error.

What other causes have you run into? Please also comment on whether you can bind Mavericks installs, as that's the only place we've seen trouble.

JPDyson
Valued Contributor

We've bound a few hundred Macs running Mavericks to AD, so it's something in the environment or process I'm sure (there was a change between the OS versions that exposed it, that's all).

Do you still have the problem if you create the computer object before you do the bind?

stevewood
Honored Contributor II

I've seen the exact same error, and like @Lewandowski I verified time/date/DNS everything. I started seeing the problem after 10.9.3, I believe.

I've had some luck getting around the error by restarting the DC and restarting the computer. We've also done some work in AD with the replication settings in AD Sites & Services, making sure all of the proper subnets are entered in.

scottb
Honored Contributor

If you look up the Macs in Active Roles, are any "disabled" or in an "Archive OU"? We have seen problems with time offsets and with the two conditions above. When a Mac doesn't bind, it's almost always one of these three things for us.

alanmcseveney
New Contributor

We are in the middle of upgrading our machines to Mavericks now, and none have failed to bind to AD that I've seen. We are only 165 machines deep so far, but none have failed to bind. We have about 15 domain controllers spread across seven sites, in a single AD domain. 10.9.2, 10.9.3 and 10.9.4 have all bound to our domain reliably.

Do any 10.9 machines bind, or none at all?
Do your client machines have forward and reverse DNS at binding time? (Not saying this should be necessary, but all our machines do, and they have joined where yours haven't.)
Certificates could be an issue.
Do you get the same result with a clean OS X with nothing else done to it, no config profiles, etc.?
Does the string you are entering into the Active Directory Domain field in Directory Utility resolve in DNS to an A record, and does that A record have a reverse entry? I HAVE to use a DNS name here. The NetBIOS domain name that works on our Windows machines will not suffice for binding our Macs, nor will a DNS domain unless the root of that domain resolves to a DC. That said, the same has been true in my environment since Snow Leopard.

tnielsen
Valued Contributor

Did you upgrade the computers to 10.9 or do a clean wipe/reimage?

I suggest blowing away the /Library/Preferences/SystemConfiguration folder on one of the computers, rebooting, and then trying to bind it.

Lewandowski
New Contributor III

Our Computer objects are correctly grouped and containered.
Forward and reverse lookups work fine at all times.
Domain field and resolution work fine.

We've tried machines that were upgraded to 10.9, fresh installs after wiping the hard drive, and a brand new, just-shipped machine. All result in the same error.

It's just a really strange issue. :/

Thanks to everyone who has offered their set of eyes on this one!

endor-moon
Contributor II

Using the same time server as the AD server solved the problem for me. They were using time-a.nist.gov and I was using time.apple.com on the Mac systems. Not sure why this should solve the issue but it did. Cheers...

calumhunter
Valued Contributor

10.9 and 10.10 also will not bind to AD if they are in a site that has read-only domain controllers specified for it.
In 10.8, the Mac would search the list of DCs to find a writeable DC. 10.9 and 10.10 instead use the weighted RODC for the site, cannot create the computer record, and the bind fails, usually with a 10001 error.

bentoms
Release Candidate Programs Tester

@endor-moon I have some information about NTP & AD here: https://macmule.com/2013/12/14/how-to-check-your-active-directory-domains-time/

etippett
Contributor II

@Lewandowski: Nikki, did you ever come up with a solution for this? We had no problems with Mavericks, but now are experiencing very similar issues with Yosemite. Binding fails maybe 20% of the time, and Macs randomly lose the ability to authenticate users (trying to id a username returns "no such user" and trying to browse the domain with dscl results in "DS Error: -14009 (eDSUnknownNodeName)"). The computers cycle back and forth between working and this non-working state for seemingly no reason. I've set up a smart group in Casper with email notifications to detect computers that are experiencing this issue, and several have bounced in and out of the group.

Sometimes this is fixed by flushing the DNS cache and/or killing opendirectoryd. Other times nothing seems to fix it and then it will seem to fix itself.

We did add some domain controllers recently but those are the only AD changes I know of. If you found a solution to your problems, I wonder if it might help me as I've already looked at the items you outlined in your troubleshooting steps.

Thanks!
Eric

Lewandowski
New Contributor III

We found a workaround...

For us, we deleted the following two directories: /Library/Preferences/DirectoryServices and /Library/Preferences/OpenDirectory, then rebooted.

After the reboot we bound through the dsconfigad command, being sure to use -preferred, and were able to bind successfully.

At this point, we bind to AD with this workflow on Mavericks and up, but cannot bind through the GUI prompts.

Hope it helps!
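
The cleanup step of this workaround could be sketched as follows. The two paths are the ones named in the post; `reset_directory_prefs`, the optional root prefix and the print-only default are assumptions added so the step can be reviewed before anything is deleted.

```shell
# reset_directory_prefs [ROOT]
# Removes the two directory-service preference folders named in the post.
# Prints what it would remove by default; set APPLY=1 to actually delete.
# ROOT is an optional prefix (for example a mounted volume); empty means /.
reset_directory_prefs() {
    root=${1:-}
    for dir in /Library/Preferences/DirectoryServices \
               /Library/Preferences/OpenDirectory; do
        if [ "${APPLY:-0}" = "1" ]; then
            rm -rf "$root$dir"
        else
            echo "would remove: $root$dir"
        fi
    done
}
```

Run as root with `APPLY=1 reset_directory_prefs`, then reboot before attempting the bind, as described above.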

nessts
Valued Contributor II

Just a guess, but start by verifying DNS. Use a command similar to this, replacing my.ad.com with your AD domain name:
nslookup -type=srv _ldap._tcp.my.ad.com
Verify that all of those addresses exist and are reachable via ping. Then do nslookup my.ad.com and verify that all of those are pingable.
Make sure time servers are set to my.ad.com.
If you still have flakiness, get networking involved and have them check firewall logs for what is getting denied. I have a client where everything was working great for a couple of months. Then somebody got the great idea to pre-add some domain controllers into DNS and add some other new domain controllers on new subnets, and all of a sudden binding started taking a long time and usernames could not be resolved, because the authentication plugin would get hold of one of the DCs that was not online yet or one of the ones with firewall issues, and everything breaks real quick that way. If an old DC has been decommissioned improperly and a record is hanging around, it can cause these random problems on the Mac side and never affect the Windows clients.

nessts
Valued Contributor II

Or for the more unix minded out there

dig _ldap._tcp.my.ad.com -t srv

and

dig my.ad.com

are probably the more advanced way to look up the same information...

etippett
Contributor II

@Lewandowski Thanks for the quick response! On the one system I've tried so far, that's unfortunately not working. Were you having bind issues with freshly imaged machines as well? I'm surprised that clearing out those files on a computer that has been imaged did anything, since they should have been empty to start with. What is your bind script like? Any specific options with dsconfigad? What are you using for the preferred server? A GC or just a DC?

@nessts We've verified DNS as outlined in this Apple KB article. Everything checks out. Time server is verified on the client, but I'll check on the domain controllers too. I'm seeing issues joining to multiple domain controllers, so I doubt they're all out of whack, but it's worth a look. I also need to talk to my AD guy about any decommissions that have happened. I know we've added some DCs but I don't think any have been decomm'ed. I also don't think firewall issues would play into this, as this is all on our internal network and I don't believe there are any firewalls between the devices. I'll double-check, though.

Thanks!

Lewandowski
New Contributor III

@etippett Trust me when I say we were more surprised than you... If we run the script/terminal command without deleting those folders first, it fails. We're thankful, but still not entirely sure why it works. We first found the issue on freshly imaged machines, but were able to replicate on machines that were upgraded since being bound to AD if we unbound them and attempted to rebind.

We preferred our primary DC. We did specify it by its DNS entry rather than its IP. I don't think we tested to confirm if that mattered.

Full command: dsconfigad -add domain.com -username administrator -password -preferred primarydc.domain.com -mobile enable -mobileconfirm disable -groups "administrators"
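
Wrapped in a function, the same invocation might look like the sketch below. The flags are the ones from the command above; `bind_to_ad` and the `DSCONFIGAD` override are hypothetical, and `-password` is deliberately left out, on the assumption (per the dsconfigad man page) that the tool then prompts for it instead of having it stored in a script.

```shell
# bind_to_ad DOMAIN ADMIN_USER PREFERRED_DC
# Runs dsconfigad with the options from the post. DSCONFIGAD can be set to
# another command for a dry run, for example DSCONFIGAD=echo.
bind_to_ad() {
    domain=$1
    user=$2
    dc=$3
    "${DSCONFIGAD:-dsconfigad}" -add "$domain" -username "$user" \
        -preferred "$dc" \
        -mobile enable -mobileconfirm disable \
        -groups "administrators"
}
```

For example, `DSCONFIGAD=echo bind_to_ad domain.com administrator primarydc.domain.com` prints the arguments without touching the directory configuration.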

calumhunter
Valued Contributor

The -preferred flag only takes effect once the machines are bound.

-preferred server
              Use the specified server for all Directory lookups and authentications.  If the server is no longer available, it will fail-over to other servers.

You cannot specify which DC to bind to with dsconfigad or any other built-in tool.

I believe that you can specify which DC to bind to with Centrify, though.

I would definitely verify DNS as per that kbase article and go through the results with your AD/DNS team to ensure all the results are valid and those machines are active and not decommed or unresponsive.

You mention imaged machines... how did you create the images? From never-booted DMGs, like AutoDMG or a COSXIP package?
If you used a golden master method, did you scrub the local KDC?

etippett
Contributor II

@calumhunter Golden master, but created/imaged with DeployStudio, which scrubs the local KDC automatically. I would think if there was an issue in the image itself, we'd be seeing this issue on all systems, but at this point it's a limited set and seems totally random. Definitely a good thought, though, and I'll try it just to make sure.

@Lewandowski Sheesh that is weird. Thanks for sharing your bind command. I'll give it a shot.

etippett
Contributor II

@calumhunter Do you have the command to reset localKDC on Yosemite? Everything I'm finding online seems to not apply (files don't exist)

etippett
Contributor II

So we ended up discovering a stale DNS record for a domain controller that had been moved to a new IP, as well as a DC that was having replication and other issues. After resolving those, it seems like things are improving. I set up a policy to run on any computer that enters my "broken bind" smart group; it flushes DNS (killall -HUP mDNSResponder) and restarts directory services (killall opendirectoryd). So far all computers that have checked in to run this have then fallen out of the smart group. Now to see if it stays that way!
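
A policy script along these lines could be sketched as below. The two `killall` commands are the ones quoted above; `user_resolves`, `run` and `repair_bind` are hypothetical helper names, using `id` as the health check is an assumption based on the "no such user" symptom, and the script only prints the commands unless `APPLY=1` is set.

```shell
# user_resolves USERNAME: true when the account can be looked up.
user_resolves() {
    id "$1" >/dev/null 2>&1
}

# run CMD...: execute only when APPLY=1, otherwise print the command.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# repair_bind USERNAME: if a known directory account no longer resolves,
# flush the DNS cache and restart directory services.
repair_bind() {
    if user_resolves "$1"; then
        echo "lookup ok"
        return 0
    fi
    echo "lookup failed for $1"
    run killall -HUP mDNSResponder
    run killall opendirectoryd
}
```

Pass it an account that exists only in AD, so that a local user cannot mask a broken bind.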

Thanks everyone for your input. It's looking like I'm going to have a better weekend thanks to you! :)

Eric

macninja_IO
New Contributor III

@Lewandowski Did you ever find a consistently working solution?

/Michael