Posted on 10-09-2018 02:11 AM
Hi,
I'm experiencing a very unusual issue. At first it was only me and I put it down to my laptop being ditzy, but now one of my users is experiencing the problem too which has got me spooked. Here's the deal:
We have multiple sites; I'll use two as illustrative examples, site A and site B, starting with site B as this is where we are seeing the problem.
At site B we have a Server 2012R2 AD server. Affected users can log into their machines sometimes, but at other times the current password will not be accepted; instead, the previous password for that user's account will be accepted. The issue is experienced when waking the laptop from sleep (our devices are set by policy to prompt for a password when waking from sleep). The issue seems to get worse as the affected user spends more time at site B, but I have no measurable metric against which to confirm this. It also seems possible that if you type your password very soon after waking you can get in, but if you wait a few seconds, or for the wireless network connection to become established (not sure which), you won't be able to get past the login screen with the correct password. Further, if the affected user starts up their device at site B, only the 'wrong' password will be accepted and the OS will require the keychain to be updated. Keychain will ask for the 'previous' (i.e. correct) password which, when entered, will update it to the wrong password. Once the wrong password has taken effect, it is also the one required to authorise changes to device settings (e.g. changing the time zone), as well as for login.
At site A we have a Server 2008R2 AD server. Users can log in to their device without issue at this site. If the affected user's keychain has been updated as described above, the process can be reversed from site B.
Note that, at site B, Windows 7-based devices do not suffer the issue; if the affected user logs into a Windows 7-based device they can do so with the correct password even if the macOS-based device is experiencing the issue. The issue will present itself on multiple macOS-based devices; the constants seem to be the site, the user and the device OS.
The unaffected users vastly outnumber the affected ones: only 2 known cases vs 200 working without issue.
Further, changing the password through System Preferences on the local device is not possible at site B: it gives an error message and refuses to proceed. There are no issues at site A, even when changing the password to the same string that was attempted without success at site B.
Also, network services requiring login can be accessed from any site, including B, using the correct password. Only the local device is affected by the issue, even when the hosted service's server is using a Server 2012R2 AD for authentication.
Lastly, the issue has also been observed at another site, site C, which also has 2012R2 as AD server. All servers are in the same domain and a comparison of affected user objects on the 2012R2 and 2008R2 server does not show any discrepancy.
The affected laptops are configured as follows:
macOS Sierra and High Sierra (issue observed on both)
FileVault enabled
Device bound to Active Directory (Create mobile account at login)
Issue happens only at sites with Server 2012R2 as AD server
Kerberos & RADIUS authentication is performed at login (network payload in Jamf Pro config profile)
Can anyone help identify the cause of the above issue?
Posted on 10-10-2018 07:15 AM
@kidtrebor Hi,
This probably means there are issues with AD binding on these machines. The password is updated in AD but on the client machine the old password is effective. When the domain controller is not reachable, the user is able to log in with the old cached password.
Try to remove the binding and bind again, log out, make sure the machine is connected to the network and log in with the actual AD credentials.
Also, do you have any read-only domain controllers in the forest?
Posted on 10-11-2018 07:23 AM
Hi rihardsp,
Thanks for the reply - I've re-bound the affected computer previously, to no avail. Further, if I unbind and try to rebind at site B, I just get an error saying Directory Utility can't find a domain controller.
Again, the issue is not observed at site A, so I don't think there is a straightforward bind issue - at site A I can log in with the correct password and change my password through system preferences. Move to site B and the issue arises.
Likewise I don't think there's a straightforward issue with the domain controller - Windows machines at site B can log in without issue. I'm also seeing the same problem at site C which, like site B, has a Server 2012R2 domain controller.
Interestingly, a colleague of mine suggested using netstat to check which DC my MacBook is authenticating against. Having done that a couple of times, I'm seeing that the machine is reporting a few different domain controllers, which might account for why I'm seeing unpredictable results.
Further - the FQDN of the domain controller reported by netstat is malformed. Let's say the FQDN of the DC is server.domain.net - netstat is reporting it as server.domai.ldap - or in more detail, the hostname is shown, the domain is truncated to 5 characters and ".ldap" is appended.
Is that normal for netstat?
Posted on 10-11-2018 07:34 AM
Are you preferring a particular DC or are you leaving that blank? Try leaving that blank and just bind to domain.net. Also, can you verify that you can ping both domain.net and server.domain.net? Are the responding IPs the same?
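If it helps, the name-resolution half of that check can be scripted so it's easy to repeat at each site. This is only a rough sketch, assuming a Python 3 interpreter is available on the Mac; `domain.net` and `server.domain.net` are the placeholder names used in this thread, and the `resolver` argument is a hypothetical hook I've added so the function can be exercised without touching DNS.

```python
import socket

def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # Name did not resolve at all from this network.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({entry[4][0] for entry in results})

def compare_resolution(domain, dc_host, resolver=socket.getaddrinfo):
    """Compare what the bare domain and a specific DC name resolve to."""
    domain_ips = resolve_ips(domain, resolver)
    dc_ips = resolve_ips(dc_host, resolver)
    return {
        "domain_ips": domain_ips,
        "dc_ips": dc_ips,
        # True only if the DC resolved and every one of its addresses
        # also appears in the answer for the bare domain name.
        "dc_in_domain_answer": bool(dc_ips)
        and all(ip in domain_ips for ip in dc_ips),
    }
```

Running `compare_resolution("domain.net", "server.domain.net")` at site A and again at site B would show whether the two sites hand back different sets of addresses. It doesn't replace the ping, since it says nothing about reachability.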
Posted on 10-12-2018 01:04 AM
Hi Ryan,
Yes, that field is blank - interestingly I've noticed through using netstat repeatedly that the laptop is sometimes referring to a domain controller at site A. I have a hunch that when it has picked this DC the issue will not present, but I'll need to gather some data on that first.
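To gather that data I'm thinking of something along these lines. It's a rough sketch, assuming a Python 3 interpreter is available and that the output comes from `netstat -an -p tcp`, which prints numeric addresses as `ip.port` in the macOS/BSD layout. The function names are my own, and it only counts established TCP connections to the standard Kerberos (88) and LDAP (389) ports.

```python
from collections import Counter

# Standard port assignments: 88 = Kerberos, 389 = LDAP.
DC_PORTS = {"88": "kerberos", "389": "ldap"}

def dc_connections(netstat_output):
    """Extract (remote_ip, service) pairs from `netstat -an -p tcp` text.

    Assumes the macOS/BSD column layout, where the foreign address is
    the fifth column and is written as ip.port.
    """
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        # The port is whatever follows the last dot of the address.
        ip, _, port = remote.rpartition(".")
        if state == "ESTABLISHED" and port in DC_PORTS:
            found.append((ip, DC_PORTS[port]))
    return found

def tally(samples):
    """Count how often each remote IP shows up across several samples."""
    counts = Counter()
    for sample in samples:
        counts.update(ip for ip, _ in dc_connections(sample))
    return counts
```

Each sample could be captured with `subprocess.run(["netstat", "-an", "-p", "tcp"], capture_output=True, text=True).stdout` at the moment of a login attempt, noting alongside it whether the correct password was accepted.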
I can also see that sometimes with netstat no active connection will be reported at all; I imagine this shouldn't be normal behaviour as, when waking from sleep, the OS should be checking in with the domain controller that the password is correct? Then again, this isn't a pure Microsoft environment we're talking about, so who knows?
If my theory is correct and it is not always doing that, I am left wondering what the trigger for checking in with the DC is. And also, if the times the machine reverts to the old password coincide with times when it can't reach a domain controller, why is the cached password an old one, rather than the up-to-date one?
Where would the cached credentials be stored in the Keychain? Is it possible to delete only that, so the cached version is flushed out? Or is there perhaps a way of forcing the entry stored in the keychain to flush itself?
Regards,
Robert
Posted on 10-12-2018 09:18 AM
By carefully matching netstat results with authentication logs on our domain controllers, I've confirmed that the affected machine is only having issues with the DC at site B (the 2012R2 box), where I can see a corresponding Kerberos pre-authentication failure event; when it authenticates against a DC at site A (2008R2) there is a corresponding authentication success event.
A Kerberos pre-authentication failure normally indicates a bad password. I appreciate this is becoming more of a Windows Server conversation, but I've never before seen two servers interpret the same credentials differently, particularly when it seems only macOS devices are affected. Does this tally with anyone else's experiences?
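For anyone wanting to repeat the matching, the approach can be sketched roughly as below. I did this by eye, so the code is illustrative rather than what I actually ran. The tuple layouts, the DC labels and the 5-second window are all assumptions of mine, and it presumes the Mac and the DCs have reasonably synchronised clocks.

```python
from datetime import datetime, timedelta

def correlate(observations, dc_events, window_seconds=5):
    """Pair client-side observations with nearby DC log events.

    observations: list of (timestamp, dc_name) noted on the Mac.
    dc_events:    list of (timestamp, dc_name, outcome) taken from the
                  DC security logs (hypothetical, already-parsed form).
    Returns one (timestamp, dc_name, [outcomes]) tuple per observation.
    """
    window = timedelta(seconds=window_seconds)
    matches = []
    for seen_at, dc in observations:
        # Keep events from the same DC that fall inside the time window.
        nearby = [
            outcome
            for event_at, event_dc, outcome in dc_events
            if event_dc == dc and abs(event_at - seen_at) <= window
        ]
        matches.append((seen_at, dc, nearby))
    return matches
```

An observation with an empty outcome list would suggest the login attempt never reached that DC at all, which is worth separating from a genuine pre-authentication failure.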
Regards,
Robert