All of my webapps behind the load balancer lose the ability to communicate about every thirty minutes: you can't ping them or remote into them, and they can't ping out either. A reboot resolves the issue and the timer starts again. I upgraded on Friday; things went down yesterday and this has been going on since.
The only issue I ran into was a corrupted log_actions table, which took forever to fix with the MyISAM command-line tools. After that, things started up fine. We have several Ubuntu-based webservers behind a load balancer and I haven't run into any problems with them since the upgrade on Sunday.
What did you upgrade from? I had a similar issue when I installed 9.93. It turned out the 9.93 installer reset the Tomcat memory settings, so my JSS, running on a VM with 16 GB of RAM, could only use 256 MB. Look in your Catalina logs; that's where I found the out-of-memory error that pointed me in the right direction.
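For reference, this is roughly what I put back afterwards. A sketch only: the path and values are assumptions for a default Linux Tomcat layout, and the JSS installer may manage memory elsewhere (e.g. the service wrapper on Windows), so check your own install.

```shell
# Sketch: restore the Tomcat heap in $CATALINA_HOME/bin/setenv.sh
# (path is an assumption; 256m is what the installer fell back to,
# so size -Xmx for your box's RAM instead).
CATALINA_OPTS="-Xms1024m -Xmx8192m"
export CATALINA_OPTS
echo "$CATALINA_OPTS"
```

After editing, restart Tomcat and watch catalina.out for `java.lang.OutOfMemoryError` lines to confirm the old limit really was the culprit.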
Upgraded from 9.93 to 9.96. Checked the memory settings first thing; no adjustments needed. Today we did adjust the Max Pool Size, which had been reset back to 90, bumping it to 1000. OS X devices can run recon, but the inventory info for them isn't displaying correctly. iOS devices can enroll fine, but they don't get profiles, the Self Service web clip, or any of the apps set to install at enrollment. Repaired the database; optimizing it now.
Bumping to see if anyone else is having problems. We will be forced onto JSS 9.96 very soon, as users keep moving up to iOS 10 while iPads are off-site.
We are coming up from 9.92.xxxxxxxxx maintenance release.
I went from the 9.92 MR to 9.96. Everything was fine until I added a few patch reporting titles. Then connections to my cluster went out of control, with random JSS slowness. Removed the patch titles and we're back to a known-good state. Ticket is in with JAMF.
We got our environment back into good working order last Thursday evening by making several adjustments to pool sizes, MySQL, and more. At about 2 AM last night the webapps started dropping again, and now they are bouncing about every 30 minutes again.
Ironically, we also had performance issues today. Things were fine for a week, then today we saw lots of table-level locks and SQL queries stacking up. I increased connections on each JSS from 50 to 1000 and we're back in business. We'll see what JAMF ends up recommending I do.
@cbrewer In our own troubleshooting we also bumped them up to 1000, and they still crashed, though not as quickly as at the default of 90. While working with a JAMF engineer to get things back up, I learned there is a direct correlation between webapp connections (threads) and SQL connections: the total number of threads cannot exceed your SQL connection limit without crashing SQL, which is what we experienced when we bumped up from 90 to 1000. Currently we are at 80, which we stress-tested last Thursday evening and all seemed well; everything was fine Friday and Monday. Early Tuesday morning, something caused Tomcat on one of the webapps to crash, and the roller coaster was back on again today. Today was 7 days from when the issue first began, so I've started thinking about what might run at that interval that could cause this; at this point I'm still looking. I'll have another call with our STAM in the morning.
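To put that correlation into numbers, here's a back-of-the-envelope check. The helper and figures are purely illustrative (three nodes, the old default pool of 90, and a MySQL limit of MaxPoolSize + 1 per the advice elsewhere in this thread), not anything from JAMF:

```shell
# Hypothetical sizing check: total webapp threads across the cluster
# must stay below MySQL's max_connections, or MySQL falls over as
# described above. Usage: check <nodes> <pool_size> <mysql_max_conns>
check() {
  if [ $(( $1 * $2 )) -lt "$3" ]; then echo safe; else echo unsafe; fi
}
check 3 90 301     # three pools at the old default of 90 fit
check 3 1000 301   # three pools bumped to 1000 blow far past the limit
```

Which is consistent with what we saw: the 90-to-1000 bump crashed SQL, while small pools held.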
Anyone else having performance issues with 9.96 in a clustered environment who either isn't using patch reporting, or has tried deleting the patch reporting titles and noticed improvement?
We also had major performance issues yesterday afternoon, and it cleared up after disabling the patch reporting. We'd also worked with JAMF over the past weeks cleaning up some complex smart group criteria, which they told us was exacerbating the problem. The improvement from the smart group cleanup wasn't anywhere near as dramatic as disabling patch reporting.
@mahughe As a general rule, if one is to modify those settings one should probably modify both MaxPoolSize and maxThreads.
maxThreads can be found in server.xml and should be 2.5 times the size of MaxPoolSize (so engineers have told me in the past). I used to use 400 for MaxPoolSize and 1000 for maxThreads. MySQL itself should allow a maximum of one more than MaxPoolSize; in this instance it should allow 401 connections, since MySQL needs to be able to connect to itself even when running at the maximum number of external connections. How you change this depends on which OS you are hosting MySQL from.
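For anyone hunting for where that lives, a sketch of the connector (the path and the surrounding attributes are assumptions for a default Tomcat layout; only maxThreads is the point here):

```xml
<!-- $CATALINA_HOME/conf/server.xml (path assumed): maxThreads on the
     HTTPS connector, ~2.5x the JSS MaxPoolSize (400 -> 1000 per the
     rule of thumb above). -->
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="1000" SSLEnabled="true" scheme="https" secure="true" />
```

And on the MySQL side, `max_connections = 401` in my.cnf (MaxPoolSize + 1).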
P.S. I was also told by JAMF Software a few years back that Apple warned them against this practice for reliability reasons. In the end, working is working, but here's hoping the issue gets resolved.
JSS (Tomcat and DB) hosted on a single Mac mini running 10.11.6, with Patch Reporting enabled for 5 applications. At various times, when I've adjusted the criteria of smart groups based on Patch Reporting, the mysqld process will spike to between 200% and 300% CPU. It lasts from 1.5 to 3 hours. The JSS web interface is slow to load many pages, and sometimes newly imaged computers are not able to enroll during the spike. It resolves itself until I touch the smart groups again. This happened with 9.93; I haven't tried to reproduce it with 9.96. I have an open case with JAMF Support.
9.93 upgraded Tomcat to 8 by default, at least on the Windows side; I wonder if that is part of the issue. We ran into several problems after updating to 9.93 once school started and it was getting heavy use. So far things seem to be running better after many hours on the phone with JAMF support. We made so many tweaks and changes that I really don't know what fixed our issue, or whether it's really fixed. We have had SQL crash occasionally as well. No patch reporting in use here at all.
I turned on a patching policy (not JAMF's patch stuff; I'm using AutoPkgr) before I left yesterday, and it seemed to kill MySQL overnight at some random time after midnight. I woke up to 150+ outage emails from the JSS going down/up behind the load balancer. Under Settings > Clustering, the connection counts were crazy even after I rebooted the 3 JSS servers this morning. I've disabled the patching policy for now.
When I got in this morning, I rebooted the SQL server and restarted Tomcat on the 3 JSS servers again, and now the connection counts look normal again.
This morning when I arrived, one of the webapps was down, but without it killing the NIC, so I didn't get a port-down notification from missed monitoring pings. @Chris_Hafner Late in the day, based on input from my STAM, I changed a couple of policies that had been running as Ongoing and disabled one completely. Imaging worked through the evening and has still been working as of this morning. All webapps are up and seem happy at this time. This morning a colleague and I also found some leftover VPP user associations that needed to be removed, and we removed them.
Will keep this updated as new events arise..
Patch reporting seems to check for new titles quite often (as seen in JAMFSoftwareServer.log). If you have a large database, queries may not be answered fast enough, and connections can pile up.
Did you check mysql-slow.log for queries that take a long time, or that actually block the whole server while executing?
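If the slow log isn't enabled yet, something like this in my.cnf turns it on. Option names are standard for MySQL 5.6+; the file location and log path vary by OS, so treat them as placeholders:

```ini
# my.cnf sketch (MySQL 5.6+ option names; paths are assumptions)
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 2   # seconds; lower it to catch more queries
log_queries_not_using_indexes = 1
```

Restart mysqld after the change, then watch the log during one of the slowdowns.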
I had a lockup problem on 9.81, analyzed the SQL logs, and added new SQL indexes; now everything runs very smoothly. The only catch is that you have to recreate the indexes every time the master JSS application restarts and does its database check, but they really help if you have a lot of extension attributes.
For reference, those are the indexes for 9.81 (and the execution times for creating them, which are quite revealing).
Add MySQL indexes to the Casper database. This has to be done after every restart of the JSS master node (creation time shown after each statement):

CREATE INDEX `computer_group_name` ON `computer_groups` ( `computer_group_name` ); -- 52 ms
CREATE INDEX `display_name` ON `extension_attributes` ( `display_name` ); -- 173 ms
CREATE INDEX `extension_attribute_id` ON `extension_attribute_values` ( `extension_attribute_id` ); -- 11.6 s
CREATE INDEX `is_managed` ON `computers_denormalized` ( `is_managed` ); -- 506 ms
CREATE INDEX `last_contact_time_epoch` ON `computers_denormalized` ( `last_contact_time_epoch` ); -- 536 ms
CREATE INDEX `last_report_date_epoch` ON `computers_denormalized` ( `last_report_date_epoch` ); -- 525 ms
CREATE INDEX `operating_system_build` ON `computers_denormalized` ( `operating_system_build` ); -- 542 ms
CREATE INDEX `operating_system_name` ON `computers_denormalized` ( `operating_system_name` ); -- 565 ms
CREATE INDEX `operating_system_version` ON `computers_denormalized` ( `operating_system_version` ); -- 592 ms
CREATE INDEX `package_file_name` ON `cached_packages` ( `package_file_name` ); -- 103 ms
CREATE INDEX `package_name` ON `package_receipts` ( `package_name` ); -- 60 s
CREATE INDEX `type` ON `package_receipts` ( `type` ); -- 73.9 s
Really, why on earth would you not have an index on extension_attribute_values.extension_attribute_id, package_receipts.package_name, and package_receipts.type? Creating those indexes takes almost 3 minutes; imagine how long the queries take without them!
Patch reporting did me in too. It was the first thing I enabled after I upgraded from 9.92 supplemental to 9.96.
The next day, I got max database connections on my three nodes which were slamming our poor database server.
I should have keyed on that detail as I was troubleshooting this issue with our TAM who finally found a bug report regarding patch reporting on large databases.
If you had to deal with this for a few days, I highly recommend repairing your database after removing patch management. One of my policies got corrupted and started removing the Office suite from all of our 7000 Macs.
Today's update: so far so good since about 9:30 AM yesterday. Tuesday seems to be a trigger for some reason, so I'm hoping this weekend is good and that we make it past Tuesday. If something changes, I'll post what happened.
Survived the weekend and today without any issues with the JSS or its webapps. Tuesday has been the trigger day for its instability; here's hoping we can get past it. Either way, I'll update this thread.
How are everyone's clustered environments doing?
Our issues got better when we turned off patch reporting, but the JSS still goes down consistently if I push a config profile to as few as ~900 devices.
It also goes down randomly a few times a week overnight around 12:08-12:13am. My guess is it's when iOS apps are updating.
It's getting really old trying to restart mysql/tomcat on the servers to get everything back up.
Anyone else still running into this? JAMF may want to move our DB from the D drive over to the C drive next; I hate to do that if it's not going to help.
@CasperSally MySQL is stopping for us about two to three times a week. We aren't finding anything as to why; it just stops. I've reinstalled MySQL twice now. We are clustered, but I don't think that matters; I left the DMZ server off for a week and it still went down twice. Ours goes down at random times (sometimes 8 AM, 3 PM, 1 AM), not while stressed or updating apps. Patch reporting has never been turned on.
I'm seeing something similar. Both the JSS and MySQL are running on a Windows server on the C drive, and I had to restart the services twice this weekend since upgrading to 9.96.
I tried doing a full inventory request on all of our devices and that seemed to kill it this morning, which was never a problem in the past.
thanks @Emmert - this is great info.
Our cluster just crashed again, the 2nd time today. I get alerts from the load balancer and then check JSS Settings > Clustering; the connection counts are always high, and usually at least one of the JSSs behind the load balancer won't load its web interface.
This is getting really old.
@Nick_Gooch SQL has never stopped for me. For us, I get an email from the load balancer that one of the child JSSs is down. I check the connections in Settings and they're out of whack, and usually one or both of the child servers' web interfaces is down.
To get everything back up, I usually have to restart mysql on our db server and tomcat on our parent and 2 child servers. About half the time restarting tomcat doesn't work on one of the child servers and I have to reboot it.
For us, 75+% of the time it happens overnight around 12 AM. Our logs flush at 1 AM and our backup runs at 2 AM, so it's not either of those. The other 25% of the time, it happens when I do anything involving APNs (pushing a config profile to 900 devices crashes it every time, a few minutes after the push). It just went down again, and I suspect it was because our iOS tech was changing app settings so they don't all try updating at midnight.
We are not a clustered environment: a single Ubuntu JSS server with a separate MySQL DB server, also on Ubuntu. We are also seeing some of these issues since 9.96: very slow JSS web performance, and if we make a change to a config profile, the server load (per top) spikes to 50 or more and the JSS stops responding. I have to wait until the load drops under 20, and then things get back to normal.
Have NOT seen the JSS just crash though.
My Windows JSSs (two behind a LB) started going down 2-3x a week (Tomcat hanging) starting a couple of weeks ago. I just upgraded to 9.96 and changed some memory settings last week, and it's no better. Our Windows admin has started some process-trace work to figure out why. Is everyone having issues on Windows? I'm wondering if an MS patch caused the problems... I'm working on a plan to convert these boxes to Linux...