Restarting the Jamf binary when Macs stop checking in

jamesandre
Contributor

Howdy... I’ve detected an issue where the Jamf binary stops checking in, sometimes for hours, days, weeks or even months. It shows up as Macs whose last Inventory Update was "x days" ago. It appears that the Jamf binary begins its check-in process but never completes it. Because that process is still running, the binary won't attempt another check-in until the original process has finished. If you attempt to check in manually, you will get an error similar to:

This policy trigger is already being run: root 88591 0.0 0.0 34245156 1048 ?? Ss 21Jul22 0:03.85 /usr/local/jamf/bin/jamf policy -stopConsoleLogs -randomDelaySeconds 300
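To confirm that a Mac is in this state, the ps output can be filtered for the check-in and narrowed to entries whose elapsed time already includes a day count. This is a minimal sketch, assuming the check-in command line matches the one in the error above (it may vary between Jamf Pro versions); find_stuck_checkins is a hypothetical helper name, not part of Jamf.

```shell
#!/bin/bash

# Hypothetical helper: read `ps -ax -o user,pid,etime,args` output on stdin and
# print "PID ETIME" for Jamf check-ins that have been running for 1 day or more.
# ps shows etime as [[dd-]hh:]mm:ss, so a "-" only appears after a full day.
find_stuck_checkins() {
    grep "/usr/local/[j]amf/bin/jamf policy" | awk '$3 ~ /-/ { print $2, $3 }'
}
```

On an affected Mac, ps -ax -o user,pid,etime,args | find_stuck_checkins would be expected to print the PID from the error message together with its elapsed time.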

I suspect this issue is caused by a network interruption while a policy is running, or by a script within a policy that cannot complete (the softwareupdated process has been hanging on some versions of macOS Big Sur). There does not appear to be a timeout for the Jamf binary.
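If the hang comes from a script inside a policy, one option is to put a time limit on the command that might hang, so the policy (and with it the check-in) can still finish. The GNU timeout command may not be available on macOS, so this is a minimal pure-bash sketch; run_with_timeout is a hypothetical helper, not part of Jamf, and the softwareupdate call in the usage note is only an illustration.

```shell
#!/bin/bash

# Hypothetical helper: run a command, but send it SIGTERM if it is still
# running after the given number of seconds. Returns the command's exit
# status (128 + signal number if it was killed).
run_with_timeout() {
    local limit="$1"
    shift
    "$@" &
    local cmdPid=$!

    # Watchdog in a background subshell; its output goes to /dev/null so a
    # leftover sleep cannot hold the caller's stdout open.
    ( sleep "$limit" && kill -TERM "$cmdPid" 2>/dev/null ) >/dev/null 2>&1 &
    local watchdogPid=$!

    wait "$cmdPid"
    local status=$?

    # The command has finished (or was killed), so stop the watchdog.
    kill "$watchdogPid" 2>/dev/null
    wait "$watchdogPid" 2>/dev/null
    return "$status"
}
```

For example, run_with_timeout 3600 /usr/sbin/softwareupdate --list inside a policy script would give up on that command after an hour instead of leaving the check-in open indefinitely. Sending SIGTERM rather than SIGKILL gives the command a chance to clean up.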

Neither CasperCheck nor the new Jamf-Management-Framework-Redeploy API function resolves this issue. Killing the Jamf binary does resolve it, and the Mac can then check in with the Jamf server again. Since getting access to a Mac that isn't checking in can be difficult, I have made something I am calling "Jamf Restart", which borrows ideas and code from CasperCheck and AppProcessKiller.

Jamf Restart consists of:

  • A LaunchDaemon (/Library/LaunchDaemons/com.example.jamfRestart.plist) that runs the script once a day.
  • A script (/Library/Scripts/jamfRestart.sh) that checks if the process has been running for 1 day or more.
  • A log file (/var/log/jamfRestart.log) that captures the output of the script.
  • An Extension Attribute (Jamf Restart) that reads the log and displays if the Jamf binary has been killed.

The LaunchDaemon and script can be packaged and deployed via Jamf (a lot easier to do when all Macs are checking in). You can load the LaunchDaemon with the command:

/bin/launchctl load /Library/LaunchDaemons/com.example.jamfRestart.plist
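If you package the two files, a postinstall script along these lines could set the ownership and permissions launchd expects (it refuses to load a daemon plist that is not owned by root or is writable by group or others) and then load the daemon. This is a sketch under assumptions: it uses the example paths from this post, checks the plist with plutil -lint, and uses launchctl bootstrap, the newer equivalent of launchctl load. It only acts when run as root on macOS, as package postinstall scripts are.

```shell
#!/bin/bash

# Paths from the post above; adjust if you use your own reverse-DNS names.
daemonPlist="/Library/LaunchDaemons/com.example.jamfRestart.plist"
scriptPath="/Library/Scripts/jamfRestart.sh"

# Set the ownership and modes launchd expects for a LaunchDaemon.
fix_permissions() {
    chown root:wheel "$daemonPlist" "$scriptPath"
    chmod 644 "$daemonPlist"
    chmod 755 "$scriptPath"
}

# Validate the plist, unload any previously loaded copy, then load the new one.
load_daemon() {
    plutil -lint "$daemonPlist" || return 1
    launchctl bootout system "$daemonPlist" 2>/dev/null
    launchctl bootstrap system "$daemonPlist"
}

# Only act when running as root on macOS.
if [ "$(uname)" = "Darwin" ] && [ "$(id -u)" -eq 0 ]; then
    fix_permissions && load_daemon
fi
```

Packaging tools such as Jamf Composer let you attach a postinstall script like this; check the modes against your own packaging conventions.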

Once Jamf Restart has been deployed to a Mac, it checks whether the Jamf binary's check-in process has been running for more than 1 day and, if it has, kills the process. After the Jamf binary has been killed, the next scheduled check-in runs correctly. Any policy run from Jamf can reasonably be expected to complete within 1 day, so killing a process that has been running that long should not stop any policy that had a chance of completing successfully.

Jamf Restart will not fix the "Device Signature Error", which stops the Jamf binary from running; I have been testing the Jamf-Management-Framework-Redeploy API function for that.

Jamf Restart should ensure Macs keep checking into Jamf, and will allow you to identify which Macs have had issues checking in, so they can be investigated further.

The LaunchDaemon:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>com.example.jamfrestart</string>
	<key>ProgramArguments</key>
	<array>
		<string>/bin/bash</string>
		<string>/Library/Scripts/jamfRestart.sh</string>
	</array>
	<key>RunAtLoad</key>
	<false/>
	<key>StartInterval</key>
	<integer>86400</integer>
</dict>
</plist>

The Script:

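#!/bin/bash

# Command line of the recurring check-in started by the Jamf LaunchDaemon.
# The [j] stops grep from matching its own entry in the ps output.
jamfPattern="/usr/local/[j]amf/bin/jamf policy -stopConsoleLogs -randomDelaySeconds 300"

# ps prints elapsed time (etime) as [[dd-]hh:]mm:ss, so a "dd-" prefix only
# appears once a process has been running for at least 1 day.
processRuntime=$(ps -ax -o user,pid,etime,args | grep "$jamfPattern" | awk '$3 ~ /-/ { split($3, t, "-"); print t[1] }')
# PIDs of check-ins that have been running for 1 day or more (empty otherwise)
processCheck=$(ps -ax -o user,pid,etime,args | grep "$jamfPattern" | awk '$3 ~ /-/ { print $2 }')

logLocation="/var/log/jamfRestart.log"
scriptLogging(){
    DATE=$(date "+%Y-%m-%d %H:%M:%S")
    echo "$DATE  $1" >> "$logLocation"
}

if [ -z "${processRuntime}" ]; then
    scriptLogging "JamfBinary has not run for more than 1 day"
else
    scriptLogging "JamfBinary has run for ${processRuntime} days"
    scriptLogging "JamfBinary Process ID: ${processCheck}"
    scriptLogging "Quitting JamfBinary..."
    # launchd already runs this script as root, so sudo is not needed.
    # ${processCheck} is unquoted so that multiple PIDs are passed separately.
    kill -9 ${processCheck}
fi

exit 0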
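The script above only looks for the "dd-" day prefix, so its threshold is fixed at 1 day. If you wanted a different cut-off (for example 12 hours), one option is to convert the etime value to seconds first. A minimal sketch; etime_to_seconds and maxSeconds are hypothetical names that Jamf Restart itself does not use.

```shell
#!/bin/bash

# Hypothetical helper: convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    local etime="$1" days=0 hours=0 minutes seconds
    local -a parts
    # Split off the optional "dd-" day prefix
    if [[ "$etime" == *-* ]]; then
        days="${etime%%-*}"
        etime="${etime#*-}"
    fi
    # Split the rest on ":" into [hh:]mm:ss
    IFS=: read -r -a parts <<< "$etime"
    if [ "${#parts[@]}" -eq 3 ]; then
        hours="${parts[0]}"; minutes="${parts[1]}"; seconds="${parts[2]}"
    else
        minutes="${parts[0]}"; seconds="${parts[1]}"
    fi
    # 10# forces base 10 so values such as "08" are not read as octal
    echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$minutes * 60 + 10#$seconds ))
}

# Example threshold of 12 hours (hypothetical; the post uses 1 day).
maxSeconds=43200
```

In the main script, each process's etime could then be compared with [ "$(etime_to_seconds "$etime")" -gt "$maxSeconds" ] instead of checking for the "-" character.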

The Extension Attribute:

#!/bin/bash

# Look for a kill entry written by jamfRestart.sh in the last 10 lines of its log
jamfRestart=$(/usr/bin/tail -n 10 /var/log/jamfRestart.log 2>/dev/null | grep "JamfBinary has run for")

if [ -n "$jamfRestart" ]; then
   echo "<result>$jamfRestart</result>"
else
   echo "<result>No Restarts</result>"
fi

Hopefully this is of use to someone.

AVmcclint
Honored Contributor

I've been experiencing this for at least 5 months.  I can't wait to give this a try!

AJPinto
Valued Contributor II

I may be totally glossing over something, but why are we not just rebooting the devices regularly? Set a FileVault authenticated (FV auth) reboot policy to run daily or weekly if the devices are unattended. Or make an EA that reads uptime and reboot devices that have not rebooted in XYZ days. It just seems like a lot of hoops to jump through to bounce daemons and agents that restart anyway when you reboot a Mac.
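For the uptime approach, an Extension Attribute could report whole days since the last boot; with the EA's data type set to Integer, a Smart Group could then compare it against a limit. A sketch, assuming the macOS output format of sysctl -n kern.boottime ("{ sec = 1658400000, usec = 0 } ..."); boot_epoch and days_between are hypothetical helper names.

```shell
#!/bin/bash

# Hypothetical helper: pull the boot time (epoch seconds) out of the
# `sysctl -n kern.boottime` output, e.g. "{ sec = 1658400000, usec = 0 } Thu Jul 21 ..."
boot_epoch() {
    sed -n 's/^{ sec = \([0-9]*\),.*$/\1/p' <<< "$1"
}

# Hypothetical helper: whole days between two epoch times (later, earlier).
days_between() {
    echo $(( ($1 - $2) / 86400 ))
}

# Only query the system on macOS; print the result in Extension Attribute format.
if [ "$(uname)" = "Darwin" ]; then
    bootTime=$(boot_epoch "$(sysctl -n kern.boottime)")
    echo "<result>$(days_between "$(date +%s)" "$bootTime")</result>"
fi
```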

AVmcclint
Honored Contributor

The problem is that if the jamf process is stuck, then it won't ever check in to run any policies. Currently I have been reaching out to affected users (sometimes over 100 of them) and politely asking them to restart their computers. I only get about a 20% response rate with that. I like what @jamesandre has done here because it makes the computer fix itself. If we see repeated instances of it happening on a computer, then we can focus our support efforts on just those.

AJPinto
Valued Contributor II

Seems like a long way around to avoid forcing Macs to reboot on a preemptive schedule before any problems occur. I absolutely see the usefulness of something like this as a fallback, but the main solution should simply be not letting Macs get an uptime of more than a week or so.

On a side note, if SSH is enabled and you have a local admin account on the Macs that you have access to, you can just SSH to the devices and do whatever you need to. That may be helpful for the users who don't respond: just run sudo shutdown -r now on their devices. Not the nicest of solutions, but it's also not nice to ignore emails.

AVmcclint
Honored Contributor

The SSH solution is blocked by the fact that many users aren't on the local network. They are at home behind their NAT'd home routers. And if they are on the company LAN, they aren't checking in with Jamf to update their IP address. Getting users to restart their Macs is a never-ending battle. It doesn't help that the jamf process gets stuck from time to time. Implementing this will not be as simple as pushing the policy out to all computers. We will still need to reach out to users who are already in this state of non-check-in and get them to restart. But once that's done and they get the jamfRestart daemon and script, then my Forced Reboot policy can play a more active role in keeping all the systems functioning properly.

AVmcclint
Honored Contributor

I can say that most of the computers I am discovering this problem on have uptimes in excess of 30 days - some as high as 100 or more. I think the lowest uptime I've seen this affect was about a week.  I am proposing to my management that I implement a forced reboot policy that will tell (not ask) the user that the computer will restart in 2 hours when it is detected that their uptime is greater than [TBD] days. The weak link in that is if the jamf process is stuck, then that policy will never run. Automating the restart of the jamf process is a great idea that I wish Jamf would implement on their own.

SCCM
Contributor II

Why not just create an uptime script that runs locally? If the device has been up for x days, force a restart (which should fix your issue).

AVmcclint
Honored Contributor

I've considered that as well, but the threshold may not be etched in stone. This month, management wants to limit it to 30 days, next month they may change their minds and want to limit it to 14 days. It's a lot easier to change the script variable than it is to push out a new  script. And there's no easy way to know which version of the script a Mac may have on it at any given moment.  
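One way to keep the threshold adjustable is to run the check from a Jamf policy script and pass the limit as a script parameter: Jamf passes custom script parameters starting at $4, so changing the value in the policy changes the limit without pushing out a new script. A sketch under those assumptions; the 30-day default, the 120-minute delay and the helper names needs_restart and uptime_days are illustrative, and a real policy would also want to warn the user first.

```shell
#!/bin/bash

# Jamf passes custom policy script parameters as $4..$11; use $4 as the
# maximum uptime in days, defaulting to 30 if it is left empty.
maxUptimeDays="${4:-30}"

# Hypothetical helper: succeed if the uptime (days) is over the limit (days).
needs_restart() {
    [ "$1" -gt "$2" ]
}

# Hypothetical helper: whole days since boot, from `sysctl -n kern.boottime`.
uptime_days() {
    local bootTime
    bootTime=$(sysctl -n kern.boottime | sed -n 's/^{ sec = \([0-9]*\),.*$/\1/p')
    echo $(( ($(date +%s) - bootTime) / 86400 ))
}

if [ "$(uname)" = "Darwin" ] && [ "$(id -u)" -eq 0 ]; then
    currentUptime=$(uptime_days)
    if needs_restart "$currentUptime" "$maxUptimeDays"; then
        # Schedule a restart in 120 minutes (shutdown -r +minutes).
        shutdown -r +120
    fi
fi
```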

scottb
Honored Contributor

This is a bigger problem with "clients" not enforcing some sort of reboot/update schedule with hard dates.  I have Macs that have not rebooted for MONTHS!  Can we get client reps to let us do something?  No.  So then I just say "it's broken until you let us manage these Macs with common sense or you learn your peeps on proper computer use."

Nobody has the sack to say OK to any of it.  

jamesandre
Contributor

Most of the Macs won't be reachable via SSH. I'm not going to regularly check a Smart Group for Macs that haven't checked in, then try and connect to them via SSH, then kill the Jamf binary. That sounds like too much work, I'm going to automate it.

Forcing everyone to restart after an arbitrary amount of days is going to create an increase in the number of calls to the Helpdesk. I do not want people contacting the Helpdesk because they have a deadline or presentation and they want to cancel a forced restart, that's not fair on the Helpdesk or person using the Mac and I want to ensure a good relationship between the two.

The Mac is working fine, the binary is the part that is causing the issue for me (and not the person using the Mac). I'm also not going to force everyone to restart when it may only be 10% of Macs affected, I want to keep people happy.

By implementing this method rather than waiting to restart every 7 or 30 days, I can ensure the Jamf binary checks in at least every day. If I have a patch for a zero-day vulnerability, then I want the Macs to be checking in at least every day. As long as security updates are getting applied, I do not have an issue with long uptimes. It also gives me visibility over which Macs are experiencing an issue with the Jamf binary.

jamesandre
Contributor

And if the Jamf binary would just time out when a process runs too long, that would be great. Maybe it could self-heal when it gets a device signature error too. Then I could get on with managing Macs, rather than managing the thing that manages the Macs. 🤠

AVmcclint
Honored Contributor

@jamesandre I was going over the Extension Attribute code, and shouldn't the tail command look at the very last line instead of the last 10 lines?

tail -1 

instead of

tail -10

jamesandre
Contributor

I'm looking for the log entry "JamfBinary has run for X days" which won't be the last line in the log file. It should be in the last 10 lines of the log file depending on how often you run an Inventory Update. You could increase it if you want a better idea of history. 

I also have a Smart Group that looks for Jamf Restart Extension Attribute values like "JamfBinary has run for".
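If you would rather not depend on the entry falling within the last 10 lines, a variant of the Extension Attribute could search the whole log and report only the most recent kill. A sketch, assuming the log path and message text from the original script; latest_restart is a hypothetical helper name.

```shell
#!/bin/bash

# Hypothetical helper: print the most recent "JamfBinary has run for" entry
# in the given log, or "No Restarts" if there is none (or no log yet).
latest_restart() {
    local entry
    entry=$(grep "JamfBinary has run for" "$1" 2>/dev/null | tail -n 1)
    echo "${entry:-No Restarts}"
}

echo "<result>$(latest_restart /var/log/jamfRestart.log)</result>"
```

Because it always returns a single line, the same "like" Smart Group criterion should keep working.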

martindulguerov
New Contributor

Thanks for sharing, @jamesandre !

I'm just testing it in our environment, and will deploy it wider soon.

Just a heads up that there's a typo in your post:

This line:

A log file (/Library/Scripts/jamfRestart.sh) that captures that output of the script.

should be:

A log file (/var/log/jamfRestart.log) that captures that output of the script.

thanks again!

Thank you! I read it over a million times and never spotted it. 😿

christian_ortiz
New Contributor

This seems like a great idea
@jamesandre 
I'm trying to wrap my head around how this will all be doable in my environment. 
From the looks of it, we'll have to use Jamf Policy to push the launchdaemon using:
/bin/launchctl load /Library/LaunchDaemons/com.example.jamfRestart.plist

But, the problem is that anyone who's currently affected by a stuck jamf policy won't get this new policy. 
In terms of deployment, is there something simpler I'm not seeing here?

You are correct. That's the Catch-22. To work around it, I've been reaching out to the users of Macs that I can identify as not having checked in for a while. I let them know there's a technical glitch on their Mac that needs to be fixed by a restart. Once they restart, the policy runs and installs the LaunchDaemon. So far it seems to work OK.

Yeah, not the easiest. But generally a restart will allow the policy to install. I also set the policy to run on the "Network State Change" trigger, as this seems to run as a separate process.

cc_rider
New Contributor III

hi @jamesandre,

"Jamf Restart will not fix the "Device Signature Error” which stops the Jamf binary from running, I have been testing the Jamf-Management-Framework-Redeploy API function for that."

Did the API function fix the error?

Naisu
New Contributor II

I added a signed configuration profile, created in iMazing Profile Editor, so that users can't disable the LaunchDaemon. Thought I'd share.

Open iMazing: Search for "Service Management - Managed Login Items" 

Add label: com.example.jamfrestart

Save profile and sign it.

Upload to jamf and deploy. 

This makes sure users can't disable it.

MatG
Contributor III

@jamesandre 
is there a way to test this out on a Mac that is working fine?

cc_rider
New Contributor III

Did anyone test this solution somehow?

@jamesandre, also, in your LaunchDaemon the RunAtLoad key is set to false... Wasn't it supposed to be true?

<key>RunAtLoad</key>
	<false/>
	<key>StartInterval</key>

And the EA, what is it for? Is it just a sort of sanity check, to see if the whole solution worked?
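On the RunAtLoad question: with RunAtLoad set to false and a StartInterval of 86400, launchd first runs the job about a day after it is loaded. To check the setup on a healthy Mac without waiting, you can confirm the job is loaded, ask launchd to run it now, and then read the log. A sketch, assuming the label and log path from the original post; service_target and run_now are hypothetical helper names, and on a Mac that is checking in normally you would expect the "has not run for more than 1 day" entry.

```shell
#!/bin/bash

# Label and log path from the original post's plist and script.
jobLabel="com.example.jamfrestart"
logFile="/var/log/jamfRestart.log"

# Hypothetical helper: the launchd service target for a system daemon.
service_target() {
    echo "system/$1"
}

# Hypothetical helper: run the daemon immediately and show the newest log lines.
# Needs root on macOS.
run_now() {
    launchctl print "$(service_target "$jobLabel")" >/dev/null || return 1
    launchctl kickstart "$(service_target "$jobLabel")"
    # Give the script a moment to write its log entry.
    sleep 2
    tail -n 3 "$logFile"
}
```

Running sudo launchctl kickstart system/com.example.jamfrestart by hand does the same as run_now, without printing the log.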