Windows 2008 R2 JSS Crashing (Casper 9.2)

joemamasmac
New Contributor III

Hello,

I recently migrated from Casper 8.7.1 (Mac OS 10.6) to Casper 9.2 (Windows 2008) with about 900 users. The migration and install went OK, apart from a few hiccups during the upgrade. Under 9.2, though, we are experiencing Tomcat issues that make my JSS unusable.

The server is a Windows 2008 R2 VM with 4 CPUs and 8GB of RAM. Usage is fairly flat, with the VM normally taking up 2GB of RAM. From time to time, though, Tomcat will climb to 80-100% CPU and monopolize all 4 cores. Eventually the JSS stops answering web requests, and the only solution is to restart Tomcat.

Has anyone else seen anything similar? Any suggestions for tweaking my instance? I have a ticket open with support and they told me there will be a new version out soon with some Windows fixes. Until then, I have been rebooting my JSS a few times a day when the issue crops up. Any tweaking suggestions would be appreciated.

Joe

8 REPLIES

Niels_Illem
New Contributor II

We had this problem too. We restarted Tomcat every morning, and during the day the CPU would slowly fill up to the max. Luckily ours is also a VM, so we could add an extra CPU.

Support had me delete all pending commands from the database; they gave me a SQL update command. After running the command I had to restart Tomcat, and since that restart CPU usage has been back to normal.

joemamasmac
New Contributor III

I was guessing this was an issue with the database. I don't suppose you could share the solution with me, could you? Since it is a VM I could snapshot the server and give it a try.

Joe

benleroy
New Contributor II

We had this issue as well, out of the blue, with Casper 9.2, though we were a new install on 9.0. We have around 6500 mobile devices and they were running great, and then suddenly our Tomcat spun wildly out of control. The cause was a Smart Group whose criteria had somehow become corrupted. If you have access to the DB you can try running this query:

select mobile_device_group_id, count(*) from smart_mobile_device_group_criteria group by mobile_device_group_id order by count(*) asc;

Most of our Smart Groups had 1-2 criteria, but one had been corrupted and held many thousands of criteria (all blank). The UI was hanging before we could delete the group, so we had to delete the group from the SQL command line. Make sure you have a good DB backup before you try any of this, and I would encourage you to do it with support if you are the least bit hesitant.
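
For anyone who wants to try the idea before touching a live database, here is a rough sketch of the same grouping check in Python. It runs against an in-memory SQLite table as a stand-in, not a real JSS database. The table name and the mobile_device_group_id column come from the query above; the search_field column, the sample rows, and the threshold of 100 are made up for illustration.

```python
import sqlite3

# Stand-in for the JSS database: an in-memory SQLite table that borrows the
# table name and group id column from the query in the post.
# The search_field column is hypothetical; the real schema will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE smart_mobile_device_group_criteria ("
    "mobile_device_group_id INTEGER, search_field TEXT)"
)

# Made-up sample data: groups 1 and 2 look healthy, group 3 is bloated
# with thousands of blank criteria rows.
rows = [(1, "Model"), (2, "Model"), (2, "OS Version")]
rows += [(3, "")] * 5000
conn.executemany(
    "INSERT INTO smart_mobile_device_group_criteria VALUES (?, ?)", rows
)


def suspicious_groups(conn, threshold=100):
    """Return (group_id, criteria_count) pairs whose count exceeds threshold."""
    cur = conn.execute(
        "SELECT mobile_device_group_id, COUNT(*) AS n "
        "FROM smart_mobile_device_group_criteria "
        "GROUP BY mobile_device_group_id "
        "HAVING COUNT(*) > ? "
        "ORDER BY n DESC",
        (threshold,),
    )
    return cur.fetchall()


flagged = suspicious_groups(conn)
print(flagged)
```

This only reads and counts; it does not delete anything. Any cleanup on the real database is still something to do with a backup in hand and ideally with support.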

Niels_Illem
New Contributor II

Hi Joe,

Just checked our server and CPU usage is climbing again, at a much slower rate than before, so the script isn't the final solution.

We're not doing mobile devices, but I will check the Smart Groups that are in the scopes of our Profiles.

Lincoln
Contributor

I had a similar issue about 18 months ago (not on 9.2, obviously) where the CPU use would climb until maxed out, then memory use would climb until maxed out, and Tomcat would crash. In the beginning this happened over the course of a couple of days, but it gradually got worse and worse until Tomcat would crash in as little as 30-45 minutes.

In the end it came down to some policies which we had set to the 'any' trigger and 'ongoing'. The policies were scoped to smart groups, and a recon at the end of a successful policy execution would see the machine drop out of the group. The problem appeared to be that machines were failing to report in properly and were getting stuck trying to submit data to the JSS, but were re-triggering the policies, and so I had a perfect storm.

Even just a handful of machines were sufficient to set the cycle off. To fix it, I went through all my policies with an 'any' or 'every XX' trigger and changed ongoing to once per day. Then I had to re-enrol all the machines (I used remote) to get them to dump their backlog of 'stuff' they were trying to do and ask the JSS for fresh instructions.
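
To show why that change matters, here is a toy model in Python. It is only a sketch of the loop described above, not how the JSS actually schedules policies. The function name, the 96 check-ins per day (one every 15 minutes), and the simple in-group flag are all assumptions made up for the illustration.

```python
def policy_runs(days, checkins_per_day, frequency, recon_succeeds):
    """Count how many times one policy runs on one machine in a toy model.

    The machine starts inside the smart group the policy is scoped to and
    only leaves the group after a recon succeeds at the end of a run.
    frequency is either "ongoing" or "once per day".
    """
    runs = 0
    in_group = True
    for _day in range(days):
        ran_today = False
        for _checkin in range(checkins_per_day):
            if not in_group:
                break  # machine dropped out of scope, nothing more to do
            if frequency == "once per day" and ran_today:
                continue  # already ran today, skip this check-in
            runs += 1
            ran_today = True
            if recon_succeeds:
                in_group = False  # recon reported in, machine leaves the group
    return runs


# Hypothetical numbers: one day, a check-in every 15 minutes
print(policy_runs(1, 96, "ongoing", recon_succeeds=True))
print(policy_runs(1, 96, "ongoing", recon_succeeds=False))
print(policy_runs(1, 96, "once per day", recon_succeeds=False))
```

In this model a healthy machine runs the policy once and leaves the group, a machine whose recon never lands re-runs an ongoing policy at every check-in, and once per day caps the damage at a single run per day.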

That sorted it and I have had no such problems since. I am now very careful about which triggers I combine 'ongoing' with. The problem was made worse by the fact that I am in Australia, so I had a narrow window each day for connecting with support and trying stuff. I would get up early and come in to screen share with the guys at JAMF at the end of their day. After trawling through logs and trying various things, including rebuilding the database and re-installing our JSS on a shiny new 2008R2 VM, none of which helped, I hit on the fact that when there were no machines on, the JSS was OK, but as soon as my Labs came on in the morning the problems started.

Maybe this will help. Maybe not.

Cheers

Lincoln

joemamasmac
New Contributor III

Hey everyone, thanks for all of the suggestions and feedback. I spoke with Jamf today and apparently there is an update that will be released soon that will focus on performance and issues with Windows servers. My account rep thought the release was imminent and would happen any day now.

Until then, I have gone to daily reboots each morning at 5AM. So far it has worked today, but I expect I will still have issues at least a few times and have to reboot the server. I did go through and clean up anything that was set to Ongoing and At Checkin, which improved performance. I will check my smart groups and clean up anything not linked to some of the older ones.

I will post after I get the latest version from Jamf and have it installed.

Joe

joemamasmac
New Contributor III

Hello all, just an update that 9.2.1 appears to have fixed most of our issues. There are still a few issues that I am working out with Jamf, but I wanted to let you know that 9.2.1 has helped.

Joe