Has anyone who has upgraded to 9.96 seen any bizarre behavior in their clustered environment?
Had some performance issues this morning after going from 9.93 to 9.96. Against the recommendation of Jamf support I increased my connection pools by large margins. It seems to have helped, but I'll be scaling it back down tonight.
What are you seeing @mahughe ?
All of my webapps behind the load balancer lose the ability to communicate about every thirty minutes, meaning you can't ping them or remote into them, and they can't ping out either. A reboot solves the issue and the timer begins again. I upgraded on Friday, and it went down yesterday with this going on.
The only issue I ran into was a corrupted log_actions table, which took forever to fix via the MyISAM command-line tools. After that, things started up fine. We have several Ubuntu-based webservers behind a load balancer and I haven't run into any problems with them since the upgrade on Sunday.
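For anyone who hits the same corruption, the repair was roughly along these lines; the paths assume a default Linux MySQL datadir and the usual jamfsoftware database name, and service names vary by install, so adjust for your setup:

```
# stop the services so the MyISAM files aren't in use, then repair in place
sudo systemctl stop tomcat8 mysql
sudo myisamchk --recover /var/lib/mysql/jamfsoftware/log_actions.MYI
sudo systemctl start mysql tomcat8
```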
What did you upgrade from? I had a similar issue when I installed 9.93. It turned out the 9.93 installer reset the Tomcat memory settings, so my JSS, running on a VM with 16 GB of RAM, was only able to use 256 MB. Look into your Catalina logs; that's where I found the out-of-memory error that pointed me in the right direction.
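If you want to rule that out quickly, something like this works on a stock Linux install (the log path and process layout vary by platform):

```
# did the installer clamp the heap? check what the java process actually got
ps aux | grep [t]omcat | grep -o -- '-Xmx[0-9]*[mg]'
# and look for the telltale error in the catalina logs
grep -i 'OutOfMemoryError' /usr/local/jss/tomcat/logs/catalina.out
```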
Upgraded from 9.93 to 9.96. Checked the memory settings first thing; no adjustments needed. Today we did adjust the Max Pool Size, which had been reset back to 90, bumping it to 1000. OS X devices can run recon, but the inventory info for the device is not displaying correctly. We can enroll iOS devices fine, but they don't get profiles, the Self Service web clip, or any of the apps set to install at enrollment.
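For reference, on our boxes that pool setting lives in the JSS's DataBase.xml; the layout below is from memory, so verify against your own copy and back it up before editing:

```xml
<!-- .../WEB-INF/xml/DataBase.xml (assumed layout; back it up first) -->
<DataBase>
    ...
    <MaxPoolSize>90</MaxPoolSize>  <!-- the 9.96 installer reset this on us -->
    ...
</DataBase>
```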
Repaired the database; optimizing it now.
Bumping to see if anyone else is having problems. We will be forced onto JSS 9.96 very soon, as users keep moving up to iOS 10 while iPads are off-site.
We are coming up from the 9.92.xxxxxxxxx maintenance release.
Thanks!
I went from the 9.92 MR to 9.96. Everything was fine until I added a few patch reporting titles. Then connections to my cluster went out of control and I saw random JSS slowness. I removed the patch titles and we're back to a known-good state. Ticket is in with JAMF.
Thanks for the info! I will approach patch reporting cautiously, although I was looking forward to trying it out.
We had success getting our environment back in good working order last Thursday evening after making several adjustments to pool sizes, MySQL, and more. At about 2 AM last night the webapps started dropping again, and now they are bouncing about every 30 minutes again.
Ironically, we also had performance issues today. Things were fine for a week; then today we saw lots of table-level locks and SQL queries stacking up. I increased connections on each JSS from 50 to 1000 and we're back in business. We'll see what JAMF ends up recommending I do.
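In case it saves someone a search, this is the quick way to confirm and bump that limit live; note the SET GLOBAL change doesn't survive a mysqld restart, so mirror it in my.cnf:

```
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"
mysql -u root -p -e "SET GLOBAL max_connections = 1000;"
```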
@cbrewer In our own troubleshooting we too bumped them up to 1000, and they still crashed, though not as quickly as at the default of 90. While working with a JAMF engineer to get things back up, I learned there is a direct correlation between the webapp connections (threads) and SQL connections, and how they impact each other: the total number of threads cannot exceed your SQL connections without crashing SQL, which is what we experienced when we bumped up from 90 to 1000.

Currently we are at 80, which we stress tested last Thursday evening, and all seemed well; everything was fine Friday and Monday. Early Tuesday morning something caused one of the webapps' Tomcat to crash, and the roller coaster was back on again today. Today was 7 days from when the issue began, so I started thinking about what might be running at that interval that would cause this, and at this point I'm still looking. I'll have another call with our STAM in the morning.
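A rough way to sanity-check that correlation on your own cluster, per the engineer's rule that the sum of Tomcat threads across all webapps has to stay under MySQL's connection cap (paths assume a Linux install):

```
# what MySQL will allow vs. the most it has ever been asked for
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"
mysql -u root -p -e "SHOW STATUS LIKE 'Max_used_connections';"
# compare against maxThreads in each webapp's server.xml
grep maxThreads /usr/local/jss/tomcat/conf/server.xml
```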
Is anyone else who is having performance issues with 9.96 in a clustered environment either not using patch reporting, or has anyone tried deleting the patch reporting titles and noticed an improvement?
Our issues went away after removing the 8 titles I had added. Wondering if that helps others (paging @cbrewer and @mahughe).
We also had major performance issues yesterday afternoon, and it cleared up after disabling the patch reporting. We'd also worked with JAMF over the past weeks cleaning up some complex smart group criteria, which they told us was exacerbating the problem. The improvement from the smart group cleanup wasn't anywhere near as dramatic as disabling patch reporting.
@mahughe As a general rule, if one is to modify those settings one should probably modify both MaxPoolSize and maxThreads.
maxThreads can be found in server.xml and should be 2.5 times the size of MaxPoolSize (so 'engineers' have told me in the past). I used to use 400 for my MaxPoolSize and 1000 for maxThreads. MySQL itself should allow a maximum of one more than the MaxPoolSize; in this instance it should allow 401 connections, as MySQL needs to be able to connect to itself even when running at the maximum number of external connections. How you change this depends on which OS you are hosting MySQL on.
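To make that concrete, with a MaxPoolSize of 400 the two settings would look something like this (the Connector is trimmed to the relevant attribute; your server.xml will have more):

```xml
<!-- Tomcat server.xml: maxThreads at ~2.5x the JSS MaxPoolSize -->
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           scheme="https" secure="true" maxThreads="1000" />
```

```
# my.cnf: one more than MaxPoolSize so MySQL can always reach itself
[mysqld]
max_connections = 401
```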
P.S. I was also notified by JAMF Software a few years back that Apple warned them against this practice for reliability reasons. In the end, working is working, but here's hoping the issue gets resolved.
We are cloud hosted and had to remove our patch reporting to get the JSS to remain functional.
Patch reporting took down our JSS, too.
Disabled Patch Reporting and all good.
JSS (Tomcat and DB) hosted on a single Mac mini running 10.11.6, with patch reporting enabled for 5 applications. At various times, when I've adjusted the criteria of some smart groups based on patch reporting, the mysqld process will spike to between 200 and 300%. It lasts from 1.5 to 3 hours. The JSS web interface is slow to process many of the pages, and sometimes newly imaged computers are not able to enroll during the spike. It resolves itself until I mess with the smart groups again. This happened with 9.93, but I haven't tried to reproduce it with 9.96. I have an open case with JAMF Support.
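When the spike hits, this is a quick way to see which query is pinning mysqld, and to cut it loose if enrollment is blocked; the Id is whatever your processlist actually shows, 12345 below is made up:

```
# find the long-running statement (Time column) behind the CPU spike
mysql -u root -p -e "SHOW FULL PROCESSLIST;"
# if a single runaway query is blocking enrollments, kill it by Id
mysql -u root -p -e "KILL 12345;"
```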
9.93 upgraded Tomcat to 8 by default, at least on the Windows side. I wonder if that is part of the issue. We ran into several issues after updating to 9.93 once school started and it was getting heavy use. So far things seem to be running better after many hours on the phone with JAMF support. We made so many tweaks and changes that I really don't know what fixed our issue, or if it is really fixed. We have had SQL crash occasionally as well. No patch reporting is being used at all here.
@Chris_Hafner I'm at 80 on the pool, 300 on the threads, and 1010 on SQL; these were all modified on a call with a JAMF engineer last Friday. I just got everything back up, drove back to the office, and it was down again in 30 minutes or close to it. We currently do not have patch reporting enabled.
@mahughe Interesting. I'm hoping that you're not saying that your JSS is down now?
@Chris_Hafner That's what I'm saying... down. Just got back onsite to put Humpty back on the wall after making a few suggested changes.
I turned on a policy for patching (not JAMF's patch reporting; I'm using AutoPkgr) before I left yesterday, and it seemed to kill MySQL overnight at a random time after midnight. I woke up to 150+ outage emails from the JSS going down/up behind the load balancer. Under Settings > Clustering, the connection counts were crazy even after I rebooted the 3 JSS servers this morning. I disabled the patching policy for now.
When I got in this morning, I rebooted the SQL server and restarted Tomcat on the 3 JSS servers, and now the connection counts seem normal again.
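For the record, the bounce order that cleared the stale counts for us; service names are from our boxes and the jss1/jss2/jss3 hostnames are placeholders, so substitute your own:

```
# database first, then each webapp in turn
sudo systemctl restart mysql
for host in jss1 jss2 jss3; do ssh "$host" sudo systemctl restart tomcat8; done
```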
Wow! What a pain! I hadn't made it around to really testing this one and was planning to do that next week. I guess I'll be watching this thread carefully! @mahughe what did you end up changing to get back up and running?
This morning when I arrived, one of the webapps was down without it killing the NIC, so I didn't get a notification of the port being down from missed monitoring pings. @Chris_Hafner Late in the day, based on input from my STAM, I changed a couple of policies that were running as ongoing and disabled one completely. Imaging worked through the evening and has still been working as of this morning. All webapps are up and seem to be happy at this time. This morning my colleague and I also found some remnant VPP user associations that needed to be removed, and we removed them.
Will keep this updated as new events arise.
@mahughe: What platform are your JSS servers on? Same question for MySQL. Also, are you on physical hardware or VMs?