JAMF Pro Windows Server is spiking at 100% CPU utilisation constantly

JAMFNoob
New Contributor III

I've noticed the only thing that helps is restarting it, after which it gets back to normal for a day or so before spiking again. Mysql.exe is the culprit, and when I had a look at the server logs while it was happening I found:

2022-09-01 12:18:32,527 [ERROR] [Thread-822 ] [GetBootstrapToken ] - Computer ComputerShell [ID=131, Name=ELM069] requested Bootstrap token but no token was found
2022-09-01 12:20:45,479 [INFO ] [duledPool-0] [rentProfileCleanupMonitor] - Running parent profile cleanup.
2022-09-01 12:20:46,156 [INFO ] [duledPool-4] [CsaTokenServiceImpl ] - (Token does not exist) No CSA enablement auth code found, so no additional processing is required
2022-09-01 12:23:03,562 [ERROR] [Thread-827 ] [GetBootstrapToken ] - Computer ComputerShell [ID=12, Name=eggs’s iMac (23)] requested Bootstrap token but no token was found
2022-09-01 12:24:17,443 [ERROR] [Thread-827 ] [GetBootstrapToken ] - Computer ComputerShell [ID=82, Name=EOMB010] requested Bootstrap token but no token was found
2022-09-01 12:26:36,056 [ERROR] [eralPool-20] [ApnsFeedbackConnection ] - IOException getting and entering feedback data:
javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:350)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:293)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:288)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369)
at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:422)
at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183)
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:171)
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1408)
at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1314)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:440)
at java.base/sun.security.ssl.SSLSocketImpl.ensureNegotiated(SSLSocketImpl.java:819)
at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:910)
at java.base/java.io.InputStream.read(InputStream.java:205)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765)
at com.jamfsoftware.jss.pushnotification.connection.ApnsFeedbackConnection.getFeedbackData(ApnsFeedbackConnection.java:34)
at com.jamfsoftware.jss.pushnotification.connection.ApnsFeedbackConnection.run(ApnsFeedbackConnection.java:88)
at org.springframework.security.concurrent.DelegatingSecurityContextRunnable.run(DelegatingSecurityContextRunnable.java:84)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
at java.base/sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:369)
at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:275)
at java.base/sun.security.validator.Validator.validate(Validator.java:264)
at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:222)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:129)
at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:638)
... 26 more
Caused by: java.security.cert.CertPathValidatorException: validity check failed
at java.base/sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:135)
at java.base/sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:237)
at java.base/sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:145)
at java.base/sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:84)
at java.base/java.security.cert.CertPathValidator.validate(CertPathValidator.java:309)
at java.base/sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:364)
... 32 more
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Fri Oct 01 06:29:19 NZDT 2021
at java.base/sun.security.x509.CertificateValidity.valid(CertificateValidity.java:277)
at java.base/sun.security.x509.X509CertImpl.checkValidity(X509CertImpl.java:675)
at java.base/sun.security.provider.certpath.BasicChecker.verifyValidity(BasicChecker.java:190)
at java.base/sun.security.provider.certpath.BasicChecker.check(BasicChecker.java:144)
at java.base/sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:125)
... 37 more
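
For anyone skimming the trace above: in a chained Java stack trace the last `Caused by:` line is the underlying failure. Here is a minimal Python sketch that pulls it out. The function name `root_cause` is made up for illustration, and the sample lines are copied from the log above. It only reports what the trace itself says (an expired certificate on the APNs feedback connection); nothing in this thread ties that error to the MySQL CPU load.

```python
import re

# Matches lines such as "Caused by: some.pkg.SomeException: message text"
CAUSED_BY = re.compile(r"^Caused by: (?P<exc>[\w.$]+): (?P<msg>.*)$")


def root_cause(trace_lines):
    """Return (exception class, message) from the last 'Caused by:' line, or None."""
    last = None
    for line in trace_lines:
        match = CAUSED_BY.match(line.strip())
        if match:
            last = (match.group("exc"), match.group("msg"))
    return last


# A few lines copied from the trace above
sample = [
    "javax.net.ssl.SSLHandshakeException: PKIX path validation failed: "
    "java.security.cert.CertPathValidatorException: validity check failed",
    "Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: "
    "java.security.cert.CertPathValidatorException: validity check failed",
    "Caused by: java.security.cert.CertPathValidatorException: validity check failed",
    "Caused by: java.security.cert.CertificateExpiredException: "
    "NotAfter: Fri Oct 01 06:29:19 NZDT 2021",
    "at java.base/sun.security.x509.CertificateValidity.valid(CertificateValidity.java:277)",
]

print(root_cause(sample))
```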

 

This is my first time managing an app running off MySQL - can anyone help me understand what is going wrong and what I can do to fix it?

 

Thanks

13 REPLIES

mstydel
Contributor

We've been dealing with this for over a year now, ever since installing Jamf Pro 10.26, I believe it was. Mysql.exe will peg the server at 100% CPU usage to the point where the server itself is barely usable, even just to check logs. I worked with support off and on for a month or two, and the only conclusion we could come to was that we have too many devices for a single server, but it still seems odd that it was fine until 10.26. After hours and hours of research I've tweaked the MySQL and Tomcat settings quite a bit, and we've given the VM it runs on more RAM and CPU cores, which has made it somewhat okay. Our settings are now way beyond what support has recommended in the past, yet performance is still frequently very slow, and sometimes it grinds to a halt and we have no choice but to stop and restart right then and there. So we restart frequently, and I always check the release notes for each new version hoping to find something about performance issues being fixed.

Also, we've found that opening a second tab or more (such as opening multiple devices in new tabs) will completely halt everything for about 30 minutes or until we restart. I haven't timed it exactly to see if it has something to do with the login session timing out. That has also been somewhat annoying, as there are times when it's just second nature to open stuff in separate tabs.

JAMFNoob
New Contributor III

Damn, I really hope this gets resolved in a future update! Ours is barely usable as is... how much RAM and how many CPU cores have you assigned to get it to a functional state, if you don't mind me asking?

Hi, I would also be interested in the amount of RAM and number of CPU cores (and their specs) you are using.
I remember that some of the presentations about load balancing more or less covered this.
Most of the time the guidance was one instance behind the load balancer per 1,000 to 3,000 devices.
The alternative is just giving the instance more power until the drive becomes the bottleneck xD

mstydel
Contributor

Currently we're at 28 GB of RAM and 6 CPU cores. Originally (probably going way back to when we started using a Windows server instead of an Xserve) we were at 2 cores, with the VM set to adjust RAM dynamically to whatever was needed (it usually settled around 12 GB total), and that's what we had when the 100% CPU usage issue started. We added 2 cores first and set it to a dedicated 18 or 20 GB of RAM to see if that helped, which it seemed to. Then we recently went up to 28 GB/6 cores.

Since we started upping the server config, our Tomcat settings are 2 GB min and 12 GB max for RAM (it seems to use all of it no matter what it's set to, or else that's our bottleneck right there), and the thread pool is 752. Our MySQL settings are now up to 512 MB max packet size, 301 max connections, 256 MB key buffer, and 8 GB buffer pool size. I did a lot of research to come up with those numbers: what is too little, what the defaults are, what isn't enough for a large number of devices on one server, what formulas to use to calculate them, etc.

We have 5150+ devices on one server, which is way too many for one, but these settings and "hardware" upgrades have helped quite a bit. We didn't have as many devices on the original single-server configuration, but we've since gone basically 1:1 with devices and added a lot more. I wouldn't recommend these settings to anyone, though; they could be completely over the top.

We do still hit 100% CPU usage probably once a week or more, depending on how much activity there is, but that didn't use to happen with ~5000 devices on one server with 2 CPU cores/~12 GB of RAM until a little over a year ago. What really doesn't make sense is that when MySQL is pushing the server to 99-100% CPU, I can run a SHOW FULL PROCESSLIST command and there's nothing going on.

foobarfoo
Contributor

Currently 16 cores, 32 GB RAM, approx. 50k devices. Load is high at times, so we might bump it to 32 cores soon. But yes, the JAMF code is quite inefficient, so you need lots of horsepower to run it. For those who are more DB-oriented: check the queries that JAMF runs and you will see that the MySQL query optimizer often opts for a full table scan even when there are indexes that could be used. If we execute the same query with FORCE INDEX, it completes in a fraction of the time the full table scan takes. But this might be more of a pointer for the JAMF devs.

 

But to answer the original question, in general there's not so much you can do if you have enough I/O. Just ensure that MySQL is allowed to use approx 75% of the available RAM, that the disk system can handle the load, and you should be good to go. :)

When you say MySQL is allowed ~75% of RAM, you mean the buffer pool size, correct?

Yes, on our 32GB DB server, we have the following config:

innodb_buffer_pool_size = 24G
innodb_file_per_table = ON
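
The 24G figure above is simply 75% of 32 GB. Here is a rough Python sketch of that arithmetic, assuming a dedicated database server as in the post above. `buffer_pool_setting` and its `fraction` parameter are hypothetical names, and 0.75 is this thread's rule of thumb, not an official Jamf or MySQL value.

```python
def buffer_pool_setting(total_ram_gb, fraction=0.75):
    """Return a my.cnf-style line sizing the InnoDB buffer pool as a share of RAM.

    Assumption: the machine is a dedicated MySQL server, so most of its
    memory can be handed to InnoDB.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    pool_gb = int(total_ram_gb * fraction)  # round down to whole gigabytes
    return f"innodb_buffer_pool_size = {pool_gb}G"


print(buffer_pool_setting(32))  # the 32 GB DB server from the post above
```

On a box that also runs Tomcat, the same fraction probably does not fit: for the 28 GB single server described earlier in the thread it would give 21G, which together with Tomcat's 12 GB maximum is more than the machine has.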

mstydel
Contributor

We just updated to 10.41, seems like we're constantly hitting 100% CPU and crawling server speeds now.  Wonderful.

McLeanSchool
New Contributor III

We had this problem and after lots of back and forth with support we found the issue.  Here's a copy of the email with the solution so you can reference it when you contact their support, but I would NOT try to run the command without talking to Jamf support since everyone's environment is different:

 

Thanks for following up! My name is Ben and I am a Senior Technical Support Engineer with Jamf - I'm helping out Kirsten with this case today.

Thanks for sending that screenshot in, that helped us narrow it down right away. This is almost certainly an issue with the apple_school_manager_sync_records table. There is an open known issue that explains this behavior. When the count on this table gets exceedingly high, SQL selects against it are not performant and cause high CPU utilization. This table is not flushed over time, and since it has been ~2 years since the table was implemented, we are seeing issues with databases that have high row counts in this table. We are fixing this by implementing a rolling flush on the table, which will be pushed in a future Jamf Pro release. For now, we have to flush this table and its normalizations once we exceed ~10k rows.

So all that to say, what I'd like to do here is run the workaround for that issue and see if MySQL CPU utilization ever spikes like this again over the next day. You are well above the ~10k records in this table. So, here is the approved workaround, please run through this, SQL commands are escaped by backticks (`) in the below:

1. Stop Tomcat

2. Back up the database

3. Run the following query and save the total count for reference:
`SELECT count(*) FROM apple_school_manager_sync_records;`

4. Run the following DELETE query:
`DELETE FROM apple_school_manager_sync_records WHERE
sync_id NOT IN (SELECT max(sync_id) FROM apple_school_manager_sync_records) AND
sync_id NOT IN (SELECT sync_id FROM apple_school_manager_active_syncs) AND
sync_id NOT IN (SELECT sync_id FROM apple_school_manager_sync_change_requests);`

5. Once complete, repeat step 3 and verify the count has decreased

6. Start Tomcat

Then let us know if you see the same behavior occurring later on. If so, grab a fresh screenshot of the processlist so we can see what's running at that time.

 

We heard back from the Sustaining team who reviewed this query, and they suggested we include a GROUP BY in order to account for multiple ASM instances. So the newly approved query for this would be the following:

DELETE
FROM apple_school_manager_sync_records
WHERE sync_id NOT IN (SELECT * FROM (SELECT MAX(sync_id) FROM apple_school_manager_sync_records GROUP BY instance_id) AS max_sync_id_per_instance)
AND sync_id NOT IN (SELECT sync_id FROM apple_school_manager_active_syncs)
AND sync_id NOT IN (SELECT sync_id FROM apple_school_manager_sync_change_requests);

Let us know how things are going, thanks!
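
To see what the GROUP BY changes, here is a small Python sketch that runs both DELETE statements from the email against a throwaway in-memory SQLite database. Everything about the toy data is an assumption: only the three table names and the `sync_id` and `instance_id` columns come from the email, the real Jamf tables have more columns, and SQLite is standing in for MySQL. It is meant for understanding the query, not as a replacement for the support-guided procedure above.

```python
import sqlite3

# First query from the email: keeps only the single newest sync overall.
ORIGINAL_DELETE = """
DELETE FROM apple_school_manager_sync_records WHERE
sync_id NOT IN (SELECT max(sync_id) FROM apple_school_manager_sync_records) AND
sync_id NOT IN (SELECT sync_id FROM apple_school_manager_active_syncs) AND
sync_id NOT IN (SELECT sync_id FROM apple_school_manager_sync_change_requests);
"""

# Revised query from the email: keeps the newest sync per ASM instance.
GROUPED_DELETE = """
DELETE
FROM apple_school_manager_sync_records
WHERE sync_id NOT IN (SELECT * FROM (SELECT MAX(sync_id) FROM apple_school_manager_sync_records GROUP BY instance_id) AS max_sync_id_per_instance)
AND sync_id NOT IN (SELECT sync_id FROM apple_school_manager_active_syncs)
AND sync_id NOT IN (SELECT sync_id FROM apple_school_manager_sync_change_requests);
"""


def build_toy_db():
    """Create a made-up, minimal version of the three tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE apple_school_manager_sync_records (sync_id INTEGER, instance_id INTEGER);
        CREATE TABLE apple_school_manager_active_syncs (sync_id INTEGER);
        CREATE TABLE apple_school_manager_sync_change_requests (sync_id INTEGER);
    """)
    # Instance 1 has syncs 1, 2, 3 and 6; instance 2 has syncs 4 and 5.
    conn.executemany(
        "INSERT INTO apple_school_manager_sync_records VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 1)],
    )
    conn.execute("INSERT INTO apple_school_manager_active_syncs VALUES (2)")
    conn.execute("INSERT INTO apple_school_manager_sync_change_requests VALUES (4)")
    return conn


def surviving_sync_ids(delete_sql):
    """Run one DELETE on a fresh toy database and list the sync_ids left behind."""
    conn = build_toy_db()
    conn.execute(delete_sql.strip())
    rows = conn.execute(
        "SELECT DISTINCT sync_id FROM apple_school_manager_sync_records ORDER BY sync_id"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print("original:", surviving_sync_ids(ORIGINAL_DELETE))
print("grouped: ", surviving_sync_ids(GROUPED_DELETE))
```

With this toy data the first query leaves syncs 2, 4 and 6, while the revised one also keeps sync 5, the newest sync of the second instance. That matches the reason given in the email for adding the GROUP BY.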

We see a lot of Apple School Manager entries in our "show processlist;" output for the database that always seem to be there. Thinking that was the issue, we've disabled most of the Apple School Manager stuff in an effort to stop it, as we don't use Managed Apple IDs or anything. I will open a ticket with them and see if that's our solution as well.

Just ran the DELETE command after talking with support; they didn't have me change anything. Our mysqld.exe process is the lowest I've ever seen it, that I can recall. It's idling around 0.5-1%, where previously 30% was doing pretty darn good. The overall CPU usage on the server hovers around 10-20% now. Hopefully this was the solution!

jhunter
New Contributor II

Also ran the DELETE command after contacting support. They said I was affected by PI110051, so maybe reference that number if you need to open a case. Mysql.exe is now using very little CPU compared to what it was using before.