Posted on 06-25-2014 04:00 PM
So when using SNMP, what do you do with the info? Well, a lot of that depends on the monitoring software you use, and if you use good software, you can get some really handy results from the same number, just by using it differently.
For example, take printers. Please. And then throw them in the ocean.
Since we can't do that, one of the things we end up caring about a lot is page counts and usage: how many pages has that printer printed, and when are its "busy times"? Both are necessary for good printer monitoring. You want the first because if you're leasing, you're often paying based on page counts, and having your own monitoring to check those numbers is just good practice. Also, honestly, most of the utilities you get from printer manufacturers are pants. They're overblown, kludgy, and a pain to use. The sad thing is, most of the time they're just using SNMP as well.
So let's have some fun. Pretend that, like me, you have Konica Minolta bizhubs. Okay, don't pretend too hard; there's a lot of pain in that, but that's what I have. You want to know not just how many pages the printer is printing, but when it's busy: when it's printing the most, when it's idle, and so on.
Getting the former is pretty easy. Getting the latter can be tricky unless you know the difference between a counter and a gauge in SNMP terms. The technical definitions are:
Counter
The Counter32 type represents a non-negative integer which monotonically increases until it reaches a maximum value of 2^32-1 (4294967295 decimal), when it wraps around and starts increasing again from zero.
Gauge
The Gauge32 type represents a non-negative integer, which may increase or decrease, but shall never exceed a maximum value. The maximum value can not be greater than 2^32-1 (4294967295 decimal). The value of a Gauge has its maximum value whenever the information being modeled is greater than or equal to that maximum value; if the information being modeled subsequently decreases below the maximum value, the Gauge also decreases.
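To make those definitions a bit more concrete, here's a toy model of the two types in Python. It's an illustration only, not an SNMP implementation: the class and method names are hypothetical, and the only behavior modeled is what the definitions above describe (the counter wraps after 2^32-1, the gauge stays pinned at its maximum and comes back down when the modeled value does, and neither goes negative).

```python
MAX32 = 2**32 - 1  # 4294967295, the largest value either type can hold


class Counter32:
    """Toy counter: only ever goes up, wrapping to zero after MAX32."""

    def __init__(self, value=0):
        self.value = value

    def increment(self, amount=1):
        # Arithmetic modulo 2**32 reproduces the wrap-around.
        self.value = (self.value + amount) % (MAX32 + 1)


class Gauge32:
    """Toy gauge: reports the current level, pinned to the range 0..MAX32."""

    def __init__(self, value=0):
        self.value = value

    def set(self, modeled):
        # Latch at the maximum instead of wrapping; never go below zero.
        self.value = max(0, min(modeled, MAX32))


counter = Counter32(MAX32)
counter.increment()
print(counter.value)  # 0: wrapped around

gauge = Gauge32()
gauge.set(MAX32 + 500)
print(gauge.value)  # 4294967295: pinned at the maximum
gauge.set(42)
print(gauge.value)  # 42: back down as soon as the modeled value drops
```

The key difference shows up at the top of the range: the counter starts over, while the gauge just sits at the ceiling.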
Wow, that's really helpful, and almost English. Here's a faster, reasonably accurate mental model. A counter is like an odometer. It only counts one way, it keeps its value until something increments it, and when it hits its max, it rolls back over to zero.
A gauge is more like a tachometer or a speedometer. It tells you what's going on at that moment. What happened five minutes ago has no inherent relationship to what's happening now, or to what will be happening in five minutes. It's a point-in-time measurement.
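The odometer rollover is also why you can't just subtract two counter readings and call it a day: if the counter wrapped between polls, the newer reading is smaller than the older one. A minimal sketch of a wrap-aware difference, assuming the counter wraps at most once between polls (`counter_delta` is a hypothetical helper, not part of any SNMP library):

```python
def counter_delta(previous, current, bits=32):
    """Increments between two polls of a counter that wraps after 2**bits - 1.

    If the newer reading is smaller than the older one, assume the counter
    wrapped exactly once and started counting again from zero.
    """
    if current >= previous:
        return current - previous
    return (2**bits - previous) + current


print(counter_delta(120_500, 121_250))   # 750: ordinary case
print(counter_delta(4_294_967_290, 10))  # 16: wrapped between polls
```

This is also how you'd get a billing period's page count for a leased printer: the delta between the readings at the start and end of the period. A counter reset looks exactly like a wrap, though, so an implausibly large delta deserves a second look.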
So, what happens if we take the same number, total page count, and treat it as both a counter and a gauge?
You get this:
A perfectly backwards-labeled example. (Really, the graph labeled "gauge" is the counter and the one labeled "counter" is the gauge. I shouldn't set up demos in a hurry. Derp.)
But if we look at the green, really-is-a-counter image, we get about what you'd expect: the number never goes down, it only goes up. If a LOT of pages had been printed today, you'd see a steeper incline.
Here, some detail:
Ignore the units for now. Cacti assumes everything is packets, and if you don't tweak the graph, things can look odd. I haven't been monitoring this particular OID long, so there's not a lot of data yet. Over time, though, the value will keep increasing until it wraps (and since SNMPv2c allows for 64-bit counters, that could take a while), and then it will restart at zero.
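To put "a while" in numbers: a counter starting at zero wraps after 2^32 increments for Counter32, or 2^64 for Counter64, so the time to wrap is just that divided by your print rate. A quick sketch, assuming a hypothetical busy printer doing 10,000 pages a day (`days_until_wrap` is an illustrative helper):

```python
def days_until_wrap(pages_per_day, bits):
    """Days for a counter starting at zero to pass its maximum and wrap."""
    return 2**bits / pages_per_day


for bits in (32, 64):
    days = days_until_wrap(10_000, bits)
    print(f"Counter{bits}: about {days / 365:,.0f} years at 10,000 pages/day")
```

At that assumed rate even a 32-bit page counter needs roughly 1,177 years to wrap, and a 64-bit one about five trillion years, so wraps matter far more for fast-moving counters such as interface byte counts than for page counts.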
The gauge, however, is showing us when the printer is busiest:
In most companies you'll see the same pattern: nothing until working hours, then bursts of activity. What this view can do, perhaps better than the counter, is show how busy that printer really is. If working hours are a solid blue bar in the gauge and the counter is a steady slope, you know you have a printer that's really being worked. Maybe it's time to check your load balancing, get another printer, or remind people to stop killing trees.
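Here's a rough sketch of how you get a "when is it busy" view out of the same total page count: difference consecutive polls so each value is the number of pages printed in that interval, then look for the biggest ones. RRDtool, which Cacti uses for storage, does something similar for COUNTER data sources (it stores the change divided by the elapsed seconds, a per-second rate), while GAUGE data sources are stored as read. This sketch works per interval rather than per second, the poll values are made up, and the function names are hypothetical.

```python
from datetime import datetime

COUNTER32_MODULUS = 2**32  # Counter32 readings wrap back to 0 at this value


def pages_per_interval(samples):
    """Convert (timestamp, cumulative_page_count) polls into (timestamp, pages) pairs.

    Each pair reports the pages printed since the previous poll, stamped with
    the time of the later poll. A drop in the reading is treated as a single
    Counter32 wrap.
    """
    intervals = []
    for (_, previous), (stamp, current) in zip(samples, samples[1:]):
        if current >= previous:
            pages = current - previous
        else:
            pages = COUNTER32_MODULUS - previous + current
        intervals.append((stamp, pages))
    return intervals


def busiest(intervals):
    """Return the (timestamp, pages) pair with the most pages printed."""
    return max(intervals, key=lambda pair: pair[1])


# Made-up hourly polls of a printer's total page count.
polls = [
    (datetime(2014, 6, 25, 7, 0), 120_000),
    (datetime(2014, 6, 25, 8, 0), 120_000),
    (datetime(2014, 6, 25, 9, 0), 120_140),
    (datetime(2014, 6, 25, 10, 0), 120_395),
    (datetime(2014, 6, 25, 11, 0), 120_460),
]

hourly = pages_per_interval(polls)
stamp, pages = busiest(hourly)
print(f"Busiest hour ended at {stamp:%H:%M} with {pages} pages")
```

With these made-up numbers, the hour ending at 10:00 wins with 255 pages, while the raw totals in `polls` just climb steadily.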
If you suddenly see bursts of activity during hours when no one should be around, that might be a sign of other problems you need to look into.
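If you already have per-interval page counts, spotting that kind of odd activity can be automated. A minimal sketch, assuming hypothetical business hours of 07:00 to 19:00 on weekdays; the hours, the function name, and the sample data are all made up for illustration:

```python
from datetime import datetime

# Hypothetical business hours (24-hour clock); adjust for your site.
OPEN_HOUR = 7
CLOSE_HOUR = 19


def after_hours(intervals, open_hour=OPEN_HOUR, close_hour=CLOSE_HOUR):
    """Return the (timestamp, pages) pairs that show printing outside business hours.

    Weekends (Saturday and Sunday) count as outside business hours.
    """
    flagged = []
    for stamp, pages in intervals:
        weekend = stamp.weekday() >= 5  # Monday is 0, Saturday is 5
        off_hours = not (open_hour <= stamp.hour < close_hour)
        if pages > 0 and (weekend or off_hours):
            flagged.append((stamp, pages))
    return flagged


# Made-up per-interval page counts.
activity = [
    (datetime(2014, 6, 25, 10, 0), 255),  # Wednesday morning: normal
    (datetime(2014, 6, 25, 23, 0), 40),   # Wednesday night: suspicious
    (datetime(2014, 6, 28, 11, 0), 12),   # Saturday: suspicious
    (datetime(2014, 6, 29, 3, 0), 0),     # Sunday night, but nothing printed
]

for stamp, pages in after_hours(activity):
    print(f"{stamp:%a %Y-%m-%d %H:%M}: {pages} pages")
```

Intervals with zero pages are skipped on purpose: a quiet printer at 3 a.m. is exactly what you want to see.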
The point is that if you play around a bit with how you use a given number from an OID, you can get a lot of useful information out of the same piece of data, just by displaying it differently.