SSH and Screen Sharing Issues in Big Sur and Monterey

elsmith
Contributor II

I've already put in a ticket with Apple and perused the macadmins Slack, but I thought someone here might have had the same issue or know of a fix...

We have a fleet of about 3800 machines. Most run Catalina, but we have about 800 running Big Sur and we're currently testing out Monterey (about 10 machines there). We're seeing a lot of problems with both SSH and Screen Sharing (also through ARD) where, if the machine is running Big Sur or Monterey and it sits for a day or two (like over a weekend) without being used remotely, it is no longer accessible via SSH or VNC/Screen Sharing/ARD. Rebooting the machine fixes this, but it's hard to do that since 1) the machine is onsite and the user is typically trying to get in from offsite and 2) SSH doesn't work. Currently we have a policy in Jamf that will reboot the machine for us and that normally fixes it, but now *users* are experiencing it and putting in tickets. This problem only occurs when the target is a Monterey or Big Sur machine - Catalina never has any problems. For example, connecting from Big Sur or Monterey to Catalina always works, but Catalina to Big Sur or Monterey, Big Sur to Big Sur, or Monterey to Monterey will have issues.

We thought it might be a power setting, so we've set everything to keep the machines from going to sleep. It also seems that sleep is not the issue, because you can have someone wake it up by hitting spacebar or jiggling the mouse and it *still* won't allow the remote connection. It's also not a permissions thing because it never reaches the point where it's asking for the login information - it just acts like it's offline (sits for a few minutes and then times out).
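One way to confirm from another machine that the services themselves have stopped answering (rather than the Mac being fully offline) is to probe the TCP ports directly. The sketch below is not from the thread: it assumes `nc` is available on the machine doing the checking, that SSH is on its default port 22 and Screen Sharing/VNC on 5900, and the function names are made up for illustration.

```shell
#!/bin/sh
# Probe one TCP port; succeeds only if something accepts the connection.
# -z: connect without sending data, -w 5: give up after 5 seconds.
port_open() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Report the state of SSH (22) and Screen Sharing/VNC (5900) on a host.
# Assumes both services use their default ports.
check_remote_access() {
    host="$1"
    for entry in "ssh:22" "screensharing:5900"; do
        name="${entry%%:*}"
        port="${entry##*:}"
        if port_open "$host" "$port"; then
            echo "$name ($port): reachable"
        else
            echo "$name ($port): NOT reachable"
        fi
    done
}
```

For example, `check_remote_access mac-pro-01.example.com` (a hypothetical hostname) prints one line per service. If both ports fail while the machine is still checking in to Jamf, the host is up and only the remote-access services are unresponsive.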

If someone onsite can access it, they can reboot it and fix it, but it's a hard reboot at that point.

The most frustrating part of all of this is that the machine will allow access all week, sit for the weekend, and not work on Monday. It rarely happens during the week. It also seems to happen if the machine is not accessed via SSH or Screen Share for a few days, even if someone has actually been physically using the machine... like if I work on site in front of my Big Sur Mac Pro all week and then leave on Thursday without rebooting, there's a 50% chance I won't be able to access it from home on Friday.

Has anyone else seen this or have any suggestions?

24 REPLIES

mschroder
Valued Contributor

I only have a few machines to connect to using SSH or Screen Sharing, but I don't have this problem connecting after a weekend. For devices that have a dynamic IP it is essential they don't go to sleep, otherwise they would lose their lease, but you have already covered the sleep part. Did you have somebody log in at the console, or just jiggle the mouse to get the login window? I wonder at which stage it gets a new DHCP lease. When somebody logs in, what are the Sharing settings? Are these bound to AD or all local accounts?

Everything is DHCP but they usually get the same IP over and over again. It's funny you ask about someone logging in or just moving the mouse... we've tried both. In fact, right after I wrote the post, I walked to another machine and could no longer remote back to the one I was using. We did find that stopping and restarting the services for these works instead of a restart so that's the new way to go, but there has to be something we're missing ☹️ All these machines are bound to AD, but the results are the same for both local and AD/Mobile accounts (mostly because I don't think it even tries to authenticate). It really just acts like Screen Sharing or SSH are not turned on in the Sharing System Preferences screen.

nycnewman
New Contributor III

Seeing the same issue myself. Trying to figure out what's going on.

Lincolnep
New Contributor III

We had the same issue and nailed it down to an 802.1X problem: the Macs were failing their authentication too many times and were then being moved to a locked-down VLAN with no network access.

Apparently this is the standard way 802.1X systems handle multiple failed auth attempts.

Hope that can help some people out.

scottwertz
New Contributor

Did anyone else come up with a solution for this?  Specifically, when Screen Sharing stops working, is there a command to run that will get it back online?  Going on-prem to reboot them is not working out so well.

Actually, we ended up writing a script that users can click in Self Service. The script then verifies they are the owner of the machine and that it's actually checking in... and if all things come back "OK" then it creates a policy for that machine only that restarts the services. I can post more info on how we're doing it in a few minutes... it seems to be working out pretty well, but it's still frustrating that Apple claims there is no problem.

launchctl unload /System/Library/LaunchDaemons/ssh.plist
launchctl unload /System/Library/LaunchDaemons/com.apple.screensharing.plist
launchctl load -w /System/Library/LaunchDaemons/ssh.plist
launchctl load -w /System/Library/LaunchDaemons/com.apple.screensharing.plist

We do it "on demand" which is why we have the other part that creates a policy for that one machine when the user wants to do it.
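For anyone who wants to reuse those four commands in a policy script, here is a minimal sketch that wraps them in functions. The function names are made up for illustration, and it assumes the script runs as root (which Jamf policy scripts do).

```shell
#!/bin/sh
# Cycle one system launch daemon by unloading and reloading its plist.
# Must run as root.
restart_daemon() {
    plist="/System/Library/LaunchDaemons/$1.plist"
    launchctl unload "$plist"
    launchctl load -w "$plist"
}

# Restart both remote-access services: SSH first, then Screen Sharing.
restart_remote_access() {
    restart_daemon ssh
    restart_daemon com.apple.screensharing
}
```

Calling `restart_remote_access` at the end of the script runs the same unload/load pairs as above, one service at a time.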

nycnewman
New Contributor III

Currently still debugging the cause, but I have created a script to restart the services from the command line. For our use case, we also restart the device every night to clear the issue (due to a separate issue with a vagrant utility that occasionally fails).

For SSH, you need to run:

launchctl unload /System/Library/LaunchDaemons/ssh.plist
launchctl load -w /System/Library/LaunchDaemons/ssh.plist

Please post if you do find a cause... this is driving us nuts, too!

elsmith
Contributor II

@nycnewman @scottwertz - do either of you use Carbon Black or Symantec? Apple responded to us wanting us to disable the network content filters of both of those pieces of software... so I was curious if it was common between those of us having issues.

We do in fact have Symantec Endpoint Protection, but we're working hard to ditch that in favor of Microsoft Defender.  Did Apple confirm that that causes a problem or was it just a suggestion for troubleshooting purposes?

It was the standard Apple response of "did you try with these things disabled" so I think they are trying to narrow down all possibilities. I did indicate to them that we would try on a few test machines, but that it is not a viable fix since our Cyber department requires both products for all network-connected machines.

Quick update: We did not see any real evidence that Carbon Black or SEP network content filters were doing anything to the connection, but we did notice in further testing that every machine we had issues with had the Apple application firewall disabled. We pushed out a config profile that forces that to be enabled (and takes away the user's ability to turn it off) and the bank of machines we'd seen have the issue are now working correctly (and have been for over a week).

I'm not 100% sure this is actually going to permanently fix it, but it has been working for us so far. Just thought it might be helpful for others!

I see this is a somewhat old thread but I'm seeing similar issues for Mac minis that we make available to some remote users. They'll be fine until they're not. What's curious is that they're still checking into Jamf, so they're up, it's just that they're not reachable remotely.

I've not tried adjusting the settings on the Application Firewall and can look at that. Was that a fix for you long-term?

We actually never really got a long-term fix. For SSH, we actually set up a LaunchDaemon that restarts the SSH service every hour, which seems to have mostly fixed the SSH issue. Now I use that to SSH to the machine and force-restart it when VNC stops working. Occasionally, though, we still need someone to be at the machine and either restart screen sharing or power the thing off and back on.
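A rough sketch of what such an hourly LaunchDaemon could look like follows. This is an illustration, not the exact job described above: the label `com.example.restart-ssh` and the function name are placeholders, and the restart commands are the unload/load pair posted earlier in the thread. `StartInterval` is in seconds, so 3600 means once an hour.

```shell
#!/bin/sh
# Write a LaunchDaemon plist that cycles the SSH service every hour.
# The label "com.example.restart-ssh" is a placeholder, not from the thread.
write_restart_ssh_plist() {
    dest="$1"
    cat > "$dest" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.restart-ssh</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/sh</string>
        <string>-c</string>
        <string>/bin/launchctl unload /System/Library/LaunchDaemons/ssh.plist; /bin/launchctl load -w /System/Library/LaunchDaemons/ssh.plist</string>
    </array>
    <key>StartInterval</key>
    <integer>3600</integer>
</dict>
</plist>
EOF
}
```

To try it, run `write_restart_ssh_plist /Library/LaunchDaemons/com.example.restart-ssh.plist` as root, make sure the file is owned by root:wheel with mode 644, and load it with `launchctl load -w` on that path. A second plist with the screensharing daemon's path could cover Screen Sharing the same way.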

My Apple ticket was never resolved because we could not reproduce it "on demand" for Apple.

The Macs we're seeing affected are still on Monterey. Are yours on Ventura now? I assume that if they are you're still seeing this issue?

We see it on both. It *is* less frequent on Ventura, but it still happens. 

I also forgot to write in my last response - we actually turned the Application Firewall completely off and had no changes. We are running a bunch of other software that our Cyber folks deem "required" so something there might be causing it, but even our Cyber teams have experienced (and tried to fix) the issues, so we're pretty sure it's happening due to an OS setting and not a third-party app.

Thanks for the prompt replies! 

At a minimum it's good to know that it isn't just us. I was beginning to think it was just a subset of Macs in our environment because of where they're physically located (network and such). However, your post makes me think it might be more about the systems sitting idle w/o someone logged in. 

We have a group of Mac servers that are always logged into as they are processing queues of submitted jobs. They don't experience this problem to the best of my knowledge.

I was seeing that problem on machines that were constantly in use as well as those that sat with no user logged in.  It's just occurred to me that I haven't seen this problem in quite a while though, and I'm not sure when it stopped.  We have moved from SEP to Microsoft Defender, and upgraded all Macs to Ventura 13.3.1.  Certainly there have been other changes along the way.

We're currently testing out Microsoft Defender - maybe that is the key! When initially troubleshooting for Apple, though, we had uninstalled SEP and all other Cyber tools (EDR, pf firewall, etc) and still saw the issue.

I was actually hoping it was one of those things Apple will just silently fix in the background 🤣

Hello elsmith,

Did you manage to find a solution for this issue? I know this post is really old, but I was curious whether you found a fix or workaround. I'm seeing the same issue on my side, and it's quite frustrating to deal with.

Hey there! We had a workaround for a while where we made a policy to create another policy that would stop/start SSH and Screen Sharing services. The user could click that and then it would run the policy on whatever machine they typed in as the machine name on the next checkin.

Recently, we noticed no one was running the policy anymore, and we've checked around - it seems to have fixed itself (which took forever since it's been a problem since Big Sur was in beta...) but we're only running Ventura and Sonoma here.

We just scripted a periodic launchd job to cycle the service.

AnibalP
New Contributor

Having the same issue, with a twist:

The previous steps only worked on Intel machines.

I cannot get them to work on an M1 machine.