
Has anyone who has upgraded to 9.96 seen any bizarre behavior in their clustered environment?

@andyinindy 3 web apps run on Mac minis (1 TB of storage, 16 GB RAM, with 8 GB for Tomcat). The Tomcat master runs on a quad-core Xserve with 32 GB RAM (half to Tomcat), MySQL runs on a quad-core Xserve with 32 GB RAM, and the master distribution point is now a Mac mini with the same specs as the web app minis.


Patch reporting seems to check for new titles quite often (as seen in the JAMFSoftwareServer.log). If you have a large database, there can be issues if those queries are not answered fast enough, and connections can pile up.



Did you check the mysql-slow.log and look for queries which take a long time or are actually blocking the whole server while executing?
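
If slow query logging is not already on, it can be enabled at runtime without editing my.cnf; a minimal sketch, assuming a privileged MySQL account (the 2-second threshold is just an example value):

-- Turn on the slow query log for the running server (reverts on restart)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;

-- Optionally also log queries that do not use any index at all
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Confirm where the log is being written
SHOW VARIABLES LIKE 'slow_query_log_file';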



I had a lockup problem on 9.81, analyzed the SQL logs, and added new SQL indexes; now everything is running very smoothly. The only problem is that you have to recreate the indexes every time the master JSS application restarts and does its database check, but they really help if you have a lot of extension attributes.
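
Since the JSS's database check can drop these again, it is worth checking which ones are missing before re-running the CREATE INDEX statements; a rough sketch against information_schema, assuming the default schema name jamfsoftware (adjust to your own):

-- List existing secondary indexes on the tables used by the indexes below
SELECT table_name, index_name,
       GROUP_CONCAT(column_name ORDER BY seq_in_index) AS index_columns
FROM information_schema.statistics
WHERE table_schema = 'jamfsoftware'
  AND table_name IN ('computer_groups', 'extension_attributes', 'extension_attribute_values',
                     'computers_denormalized', 'cached_packages', 'package_receipts')
  AND index_name <> 'PRIMARY'
GROUP BY table_name, index_name;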



For reference, those are the indexes for 9.81 (and the execution times for creating them, which are quite revealing).



Add MySQL indexes to the Casper database. This has to be done after every restart of the JSS master node:

CREATE INDEX `computer_group_name` ON `computer_groups` ( `computer_group_name` );
52 ms

CREATE INDEX `display_name` ON `extension_attributes` ( `display_name` );
173 ms

CREATE INDEX `extension_attribute_id` ON `extension_attribute_values` ( `extension_attribute_id` );
11.6 s

CREATE INDEX `is_managed` ON `computers_denormalized` ( `is_managed` );
506 ms

CREATE INDEX `last_contact_time_epoch` ON `computers_denormalized` ( `last_contact_time_epoch` );
536 ms

CREATE INDEX `last_report_date_epoch` ON `computers_denormalized` ( `last_report_date_epoch` );
525 ms

CREATE INDEX `operating_system_build` ON `computers_denormalized` ( `operating_system_build` );
542 ms

CREATE INDEX `operating_system_name` ON `computers_denormalized` ( `operating_system_name` );
565 ms

CREATE INDEX `operating_system_version` ON `computers_denormalized` ( `operating_system_version` );
592 ms

CREATE INDEX `package_file_name` ON `cached_packages` ( `package_file_name` );
103 ms

CREATE INDEX `package_name` ON `package_receipts` ( `package_name` );
60 s

CREATE INDEX `type` ON `package_receipts` ( `type` );
73.9 s


Really, why on earth would you not have an index on extension_attribute_values.extension_attribute_id, package_receipts.package_name, and package_receipts.type? Creating those indexes takes almost three minutes combined; imagine how long the queries take without them!
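
If you want to confirm that a given index is actually being used, EXPLAIN on a representative query is a quick check; a minimal sketch against one of the tables above (the literal 5 is just a placeholder ID):

-- Without the index this typically reports a full table scan (type: ALL);
-- with it, the new key should appear in the key column
EXPLAIN SELECT COUNT(*)
FROM extension_attribute_values
WHERE extension_attribute_id = 5;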


Patch reporting did me in too. It was the first thing I enabled after I upgraded from 9.92 supplemental to 9.96.
The next day, I hit max database connections on my three nodes, which were slamming our poor database server.
I should have keyed in on that detail as I was troubleshooting this issue with our TAM, who finally found a bug report regarding patch reporting on large databases.
If you had to deal with this for a few days, I highly recommend repairing your database after removing patch management. One of my policies got corrupted and started removing the Office suite from all 7,000 of our Macs.
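
For anyone hitting the max-connections wall, it helps to compare the live connection count against the configured limit while the problem is happening; a minimal sketch (read-only, nothing here changes settings):

-- Configured limit vs. connections currently open
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';

-- Which hosts/users hold the connections and what they are running
SHOW FULL PROCESSLIST;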


Today's update: so far so good since about 9:30 AM yesterday. Tuesday seems to be a trigger for some reason, so I'm hoping this weekend goes well and that we make it past Tuesday. If anything changes, I'll post what happened.


Survived the weekend and today without any issues with the JSS or its web apps. Tuesday has been the trigger day for its instability, so here's hoping we can get past it. Either way, I'll update this thread.


How are everyone's clustered environments doing?



Our issues got better when we turned off patch reporting, but it still goes down consistently if I push out a config profile to as few as ~900 devices.



It also goes down randomly a few times a week overnight around 12:08-12:13am. My guess is it's when iOS apps are updating.



It's getting really old trying to restart mysql/tomcat on the servers to get everything back up.



Is anyone else still running into this? JAMF may next want to move our DB from the D drive over to the C drive; I hate to do this if it's not going to help.


@CasperSally MySQL is stopping two to three times a week for us. We aren't finding anything as to why; it just stops. We've reinstalled MySQL twice now. We are clustered, but I don't think that matters; I left the DMZ server off for a week and it still went down twice. Ours goes down at random times, sometimes 8 AM, 3 PM, or 1 AM, not while it's stressed or apps are updating. Patch reporting has never been turned on.
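
For anyone else seeing mysqld stop like this, the MySQL error log is usually the first place to look; a minimal sketch to find where it is being written, run from any MySQL client while the server is up (on Windows the default is a .err file in the data directory):

-- Show where the error log and data directory live
SHOW VARIABLES LIKE 'log_error';
SHOW VARIABLES LIKE 'datadir';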


@Nick_Gooch Is MySQL hosted on Windows? If so, the C drive? Thanks - this sounds very much like our issue, so I want to send it to our TAM in case they don't know our cases are so similar.


I'm seeing something similar. Both the JSS and MySQL are running on a Windows server on the C drive, and I had to restart the services twice this weekend since upgrading to 9.96.



I tried doing a full inventory request on all of our devices and that seemed to kill it this morning, which was never a problem in the past.


thanks @Emmert - this is great info.



Our cluster just crashed again, the second time today. I get alerts from the load balancer and then check JSS Settings > Clustering, and the connection counts are always high; usually I can't get the website to load on at least one of the JSSs behind the load balancer.



This is getting really old.


@CasperSally Windows, C drive for both the JSS and MySQL. Does MySQL just stop running for you when it goes down? That's what we are seeing. And it ALWAYS happens when I am out of the office.


@Nick_Gooch MySQL has never stopped for me. For us, I get an email from the load balancer that one of the child JSSs is down. I check the connections in settings and they're out of whack, and usually one or both of the child servers' web interfaces are down.



To get everything back up, I usually have to restart mysql on our db server and tomcat on our parent and 2 child servers. About half the time restarting tomcat doesn't work on one of the child servers and I have to reboot it.



For us, 75+% of the time it happens overnight around 12 AM. Our logs flush at 1 AM and our backup happens at 2 AM, so it's not either of those. The other 25% of the time it happens when I try to do anything that involves APNS (pushing a config profile to 900 devices crashes it every time, a few minutes after the push). It just went down again, and I suspect it was because our iOS tech was changing app settings so they don't all try updating at midnight.


We are not a clustered environment; we have a single Ubuntu JSS server with a separate Ubuntu MySQL DB server. We have also been seeing some of these issues since 9.96: very slow JSS web performance, and if we make a change to a config profile, the server load (as seen in top) spikes to 50 or more and the JSS stops responding. I have to wait until the load drops under 20 before things get back to normal.



Have NOT seen the JSS just crash though.


My Windows JSSs (two behind a LB) started going down 2-3x a week (Tomcat hanging) a couple of weeks ago. I just upgraded to 9.96 and changed some memory settings last week, and it's no better. Our Windows admin has started some process trace work to figure out why. Is everyone having issues on Windows? I'm wondering if it was an MS patch causing the problems... I'm working on a plan to convert these boxes to Linux...


Hey Todd-
Unfortunately, it's not just a Windows issue. We started having crashing issues with 9.96 on RHEL 6. The issues didn't appear in test, but our test environment doesn't have the same load as production and isn't (yet) clustered. We tried increasing the thread pools per JAMF Support, but it didn't resolve anything. In the end, disabling patch reporting brought our JSS environment back to stability (knock on wood).


Would a memcache server help at all in this instance?



https://www.jamf.com/jamf-nation/articles/428/caching-configuration



https://www.jamf.com/jamf-nation/discussions/20755/memcached


Jamf Support told me not to touch memcached yet when I asked them.



They did just give me an edit to my my.ini file (added the line table_open_cache=6000). After adding it and restarting MySQL and all my Tomcat instances, I was able to send out an MDM command and my servers didn't crash.



First time since going to 9.96, I think. I'd check with your TAMs, but maybe this could help some of you.
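
For reference, the same change can be tested at runtime before committing it to my.ini; a minimal sketch, assuming MySQL 5.6 or later, where table_open_cache is a dynamic variable:

-- Current value, plus how often tables have had to be re-opened
-- (a fast-growing Opened_tables usually means the cache is too small)
SHOW VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL STATUS LIKE 'Opened_tables';

-- Raise it on the fly; keep table_open_cache=6000 in my.ini so it survives restarts
SET GLOBAL table_open_cache = 6000;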


Last week this started to occur again on Mon/Tue while I was at JNUC, and then nothing until today. Different web apps have been bouncing throughout the day since this morning.



I'm hoping 9.97 will take care of this issue.


Back to the bouncing web apps again this morning. I rebooted them all after they were down all night, and one has already gone down. Now, when I say it went down... it's still up, but unable to communicate across the network at all. Can't ping it, can't SSH into it; all you can do is power cycle it...


One thing I have noticed about these incidents is that they often happen after the weekend, primarily on Mon/Tues. FWIW.


I am having the same issues as most everyone else on this thread. I have 5 JSSs, 4 behind a LB, all Windows Server 2008 VMs with 16 GB RAM and 4 cores each. My database server is a physical machine with 16 GB RAM and 16 cores, running Windows Server 2008. I have worked with my TAM to adjust threads, connections, pools, and anything else we could think of, with no success. We set up a memcached server and converted the database engine from MyISAM to InnoDB for all tables except 2, and still no significant improvement. I am about at my wits' end with this problem. I would just revert to 9.92, but we needed 9.96 for iOS 10 support.
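
For anyone else weighing the MyISAM-to-InnoDB conversion, the remaining MyISAM tables can be listed and converted one at a time; a rough sketch, assuming the default schema name jamfsoftware (adjust to your own), and note that each ALTER rebuilds the table and can take a while on large ones:

-- List tables still using MyISAM ('jamfsoftware' is an assumed schema name)
SELECT table_name, engine, table_rows
FROM information_schema.tables
WHERE table_schema = 'jamfsoftware'
  AND engine = 'MyISAM';

-- Convert one table at a time, ideally during a quiet window
ALTER TABLE `computers_denormalized` ENGINE = InnoDB;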


I was advised today by my STAM that Java 8u101 is possibly a commonality in these issues and to update to the latest version, which is 8u111. I finished that up a bit ago and will be monitoring it closely.


I am running Java 8u91; I will look into going to 8u111.


It appeared that the Java update was going to work, but then all of the web apps eventually crashed again.



Back to the drawing board...


@mahughe @m.donovan have you tried the table_open_cache setting in my.ini? I'd check with your TAM first, but we've been up ever since making this change.