Monitor users' RAM usage?

Chris_Hafner
Valued Contributor II

Perhaps this is an odd question, perhaps not. Is anyone using anything to monitor their users' RAM utilization? I'd like to be able to simply track the users who are constantly hitting the limit of their RAM. Thoughts? One could use top, of course, but I don't like that for several reasons (mostly overhead). What I'm really interested in is measuring how many times a user runs out of physical RAM and moves heavily over to swap. This seems like a very tricky thing to sort out in my head. I'm hoping that someone has a nice simple answer because I'm overthinking it!

1 ACCEPTED SOLUTION: ChrisL's third reply below ("We don't use an EA, or even the JSS to track this").

11 REPLIES

ChrisL
New Contributor III

We monitor RAM usage like that. The simplest way to know that swapping is happening is to look at the total number of pageouts in a given period of time.

If you look at the output of vm_stat, you'll see the total pageouts since the last boot.

If you just want something to trigger when any pageouts happen, just get that number every few minutes, and if it goes up from the last time, take some action.
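A minimal sketch of that trigger, in the same shell style as the scripts in this thread. It assumes the vm_stat line format "Pageouts: <count>." (counter as the last field, with a trailing period); the state file path and the on_new_pageouts action are hypothetical placeholders. The demo at the bottom feeds sample text instead of calling vm_stat, so it runs anywhere.

```shell
#!/bin/sh
# Hypothetical location for the count seen on the previous run
STATE_FILE="${STATE_FILE:-/tmp/pageouts_last}"

# Print the cumulative pageout counter from vm_stat-style text on stdin.
# vm_stat prints it as e.g. "Pageouts:            12345." (trailing period).
parse_pageouts() {
    awk 'tolower($1) == "pageouts:" { sub(/\.$/, "", $NF); print $NF }'
}

# Hypothetical action: swap in logging, a notification, etc.
on_new_pageouts() {
    echo "pageouts increased by $1"
}

# Compare the current count with the saved one, act if it rose, then save it.
# After a reboot the counter starts over, so a lower count just re-baselines.
check_pageouts() {
    current=$(parse_pageouts)
    if [ -s "$STATE_FILE" ]; then
        previous=$(cat "$STATE_FILE")
    else
        previous=$current    # first run: nothing to compare against yet
    fi
    if [ "$current" -gt "$previous" ]; then
        on_new_pageouts $((current - previous))
    fi
    echo "$current" > "$STATE_FILE"
}

# On a Mac you would run: /usr/bin/vm_stat | check_pageouts
# Demo with sample text so it runs anywhere:
STATE_FILE=$(mktemp)
printf 'Pageins: 500.\nPageouts: 100.\n' | check_pageouts   # first run, no output
printf 'Pageins: 520.\nPageouts: 130.\n' | check_pageouts   # prints "pageouts increased by 30"
rm -f "$STATE_FILE"
```

Run from launchd or cron every few minutes, anything that increments the counter between runs fires the action once.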

If you keep a history of that number every few minutes, you can use the data to find times when it increases rapidly, and perhaps isolate patterns of memory over-use.
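One way to find those rapid increases, sketched under assumptions: the history is a hypothetical file of "epoch_seconds,cumulative_pageouts" lines in time order, and the threshold (pageouts per minute) is a placeholder you would tune for your own fleet, not a recommended value. Intervals where the counter drops are treated as reboots and skipped.

```shell
#!/bin/sh
# Flag intervals where pageouts grew faster than a threshold.
# Input (stdin): one sample per line, "epoch_seconds,cumulative_pageouts".
# $1: threshold in pageouts per minute (hypothetical default 1000).
flag_bursts() {
    awk -F, -v limit="${1:-1000}" '
        NR > 1 {
            dt = $1 - prev_t
            dp = $2 - prev_p
            # A drop in the counter means a reboot reset it; skip that interval
            if (dt > 0 && dp >= 0) {
                rate = dp * 60 / dt
                if (rate > limit)
                    printf "%s-%s %.1f pageouts/min\n", prev_t, $1, rate
            }
        }
        { prev_t = $1; prev_p = $2 }
    '
}

# Demo: three samples 10 minutes apart; only the second interval is flagged
printf '1700000000,100\n1700000600,200\n1700001200,30200\n' | flag_bursts 1000
```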

The historical data can be very useful when a user calls up saying "I need more RAM" :-)

-Chris

jhbush
Valued Contributor II

This may be of some help...

#!/bin/sh
# vm_stat prints counters like "Pageouts:   12345." so strip the trailing period
pageIns=$(/usr/bin/vm_stat | /usr/bin/grep -i pageins | /usr/bin/awk '{print $NF}' | /usr/bin/sed 's/\.$//')
pageIns="<page_ins>$pageIns</page_ins>"
pageOuts=$(/usr/bin/vm_stat | /usr/bin/grep -i pageouts | /usr/bin/awk '{print $NF}' | /usr/bin/sed 's/\.$//')
pageOuts="<page_outs>$pageOuts</page_outs>"
echo "<result><stats>$pageIns$pageOuts</stats></result>"

Chris_Hafner
Valued Contributor II

Fantastic! What do you use as a benchmark for pageouts per time frame, and what method do you use to monitor that? I was thinking about writing an EA to track it but haven't decided where to store the counts... I'd be really interested to see what you're doing, if you don't mind sharing.

Chris_Hafner
Valued Contributor II

Thank you both! I'm trying that script out now.

ChrisL
New Contributor III

Here's basically what we do. It's part of a much larger data-gathering script that runs every 10 minutes. The data eventually ends up in a central database (currently not the JSS, but I'm working slowly on getting it integrated :-))

#!/bin/bash

# where to remember the data for next time
LAST_POUT_FILE=/var/log/last_pageout_check

# timestamp of the last boot
LAST_BOOT=$(sysctl -n kern.boottime | sed -E 's/^.*sec = ([0-9]+),.*/\1/')

# when are we?
NOW=`date +%s`

# here's the number of pageouts since boottime
CURR_POUTS=$(vm_stat | grep -i pageouts | sed -E 's/[^[:digit:]]*([[:digit:]]+).*/\1/')

# if the last pageout file exists and is newer than the last reboot
if [ -f "$LAST_POUT_FILE" ] && [ `stat -f '%m' "$LAST_POUT_FILE"` -gt $LAST_BOOT ] ; then

    # use the file and note the interval (in seconds) since it was recorded
    LAST_POUTS=$( < "$LAST_POUT_FILE" ) 
    POUTS_INTERVAL=$(( $NOW - `stat -f '%m' "$LAST_POUT_FILE"` ))

else
    # otherwise, the last count is zero, and the interval is seconds since boot time
    LAST_POUTS=0 
    POUTS_INTERVAL=$(( $NOW - $LAST_BOOT ))

fi 

# remember the number of current ones
echo $CURR_POUTS > "$LAST_POUT_FILE"

# the number of pageouts that happened during the interval
RECENT_POUTS=$(( $CURR_POUTS - $LAST_POUTS ))

# output comma-delimited data
echo "$NOW,$RECENT_POUTS,$POUTS_INTERVAL"

Chris_Hafner
Valued Contributor II

I've slightly modified jhbush's script to display only the pageout count as an EA:

#!/bin/sh
# Last field of the "Pageouts:" line, minus its trailing period
pageOuts=$(/usr/bin/vm_stat | /usr/bin/grep -i pageouts | /usr/bin/awk '{print $NF}' | /usr/bin/sed 's/\.$//')
echo "<result>$pageOuts</result>"

However, I'm really interested in the script ChrisL has submitted here. I'm going to start playing around with it to see if I can get it working as an EA. I just have to decide on the metric. This is some great info!

Chris_Hafner
Valued Contributor II

For anyone currently tracking pageouts: how do you measure frequency? By this I mean, do you compile daily pageout stats in a larger database? Do you average daily pageouts? I'd like to write something up as an EA, but I'm spending a lot of time trying to figure out how to measure this effectively. Thanks for all the help so far!

ChrisL
New Contributor III

Hi Chris,

We don't use an EA, or even the JSS to track this - the data is kept in another, homegrown database that we've had for years. Our older tool has some overlapping function with the JSS, and where possible I've tried to move such data into the JSS, but some of it is tricky.

We have the client machines take readings of pageouts, cpu load, and cpu usage every 10 minutes. The samples (like the output of the last line of my script above) are appended to a file on the local drive, then 4x/day they are submitted to the central server, where each sample becomes a record in a table. (well, three tables, one for pageouts, one for load, and one for usage). From that granular data, we can run reports to find the data for a machine, or find the machines that are eating all their RAM all the time, or just focus on certain times of the day.
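To illustrate the "find the machines that are eating all their RAM all the time" kind of report without a database, here is a flat-file sketch. It is not the homegrown system described above; it assumes the per-sample lines from the script earlier in the thread ("timestamp,recent_pageouts,interval_seconds") have been collected centrally with a hypothetical machine-name column prepended, and ranks machines by average pageouts per hour.

```shell
#!/bin/sh
# Rank machines by average pageouts per hour across all their samples.
# Input (stdin): "machine,timestamp,recent_pageouts,interval_seconds";
# the machine column is a hypothetical addition.
rank_machines() {
    awk -F, '
        { pouts[$1] += $3; secs[$1] += $4 }
        END {
            for (m in pouts)
                if (secs[m] > 0)
                    printf "%s %.1f\n", m, pouts[m] * 3600 / secs[m]
        }
    ' | sort -k2,2 -nr
}

# Demo with illustrative samples for two machines
printf '%s\n' \
    'mac-a,1700000600,50,600' \
    'mac-a,1700001200,10,600' \
    'mac-b,1700000600,900,600' \
    'mac-b,1700001200,1500,600' | rank_machines
```

Dividing total pageouts by total sampled time (rather than averaging per-sample rates) keeps machines with gaps in their samples from being over- or under-weighted.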

Since EAs are only gathered when a full recon is run, you'd either have to run recons every few minutes to get a reasonable granularity (not really an option), or you could have some other process running that appends samples to a file, then the EA could get the file contents into the JSS. But then you'd have to have other code that parses the EA contents into individual samples. And you might have to have special code to find and merge samples from other recons on the same computer.

As you've been thinking, you can also just have something on the client machines that gathers the samples, then the EA computes a daily average and stores it. That might be the best option for getting something usable quickly.
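A sketch of that daily-average EA, assuming the 10-minute sampler appends its "timestamp,recent_pageouts,interval_seconds" lines to a local file (the path below is hypothetical). The EA only summarizes the last 24 hours into a single pageouts-per-hour figure, so recon frequency no longer limits the granularity of the underlying samples.

```shell
#!/bin/sh
# Hypothetical file that the 10-minute sampler appends its
# "timestamp,recent_pageouts,interval_seconds" lines to.
SAMPLES="${SAMPLES:-/var/log/pageout_samples.csv}"

# Average pageouts per hour over samples from the last 24 hours.
# $1: current epoch time (passed in so the function is easy to test).
daily_rate() {
    awk -F, -v now="$1" '
        $1 >= now - 86400 { pouts += $2; secs += $3 }
        END {
            if (secs > 0) printf "%.1f\n", pouts * 3600 / secs
            else print "no data"
        }
    ' "$SAMPLES"
}

# Extension attribute output: Jamf records the text between the <result> tags
if [ -r "$SAMPLES" ]; then
    echo "<result>$(daily_rate "$(date +%s)")</result>"
else
    echo "<result>no data</result>"
fi
```

You would still want something (a LaunchDaemon, for example) to trim old lines from the sample file so it doesn't grow forever.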

I know several people have asked to be able to gather EAs separately from a full recon, and I submitted a feature request asking for better options for EA collection (https://jamfnation.jamfsoftware.com/featureRequest.html?id=686). Both of those would help with gathering this kind of data into the JSS.

Good Luck with it!
-Chris

Chris_Hafner
Valued Contributor II

Had my email notification off... in any event, this is pretty much how I figured that it would go. Thanks!

bearzooka
Contributor

Hey, reviving this old thread… I started collecting the pageouts with the script above, but I really don't know at what point we could consider that the device is "using too much RAM" or what level of pageouts is acceptable.

Could you provide any hints on this?

Thanks!

Chris_Hafner
Valued Contributor II

In our case, we use high pageout-over-time numbers as an indicator that we should at least talk to the user about their experience, to make sure we have them on the proper device. Also, given that we're MacBook Air/Pro heavy, we have increased the RAM in the machines we're buying across the board. It ups the resale value quite a bit as well.

In reality, we find the users who have 30+ tabs open on three different browsers.