Some simple statistics gathering for systems monitoring, graphicalized for your pleasure! Everyone likes graphics and they do help to make trends visible quickly. This short explanation should give some starting points and references on using Tobi Oetiker’s fine rrdtool to store and later display the data in question.

RRDTool is basically a way to store some recurring data, summarizing it over time, and keeping the data collection from growing beyond a known, preset bound. This is useful for its original intent: measuring router statistics over time and making nice graphs to show you in/out byte counts from your router interfaces. It has been extended by its users in several ways since its inception, mostly around the ability to graph other measured datapoints. One set of these datapoints could be, and is for this discussion, system statistics like network connection and connection-type data.

Looking at the sample graph of selected network statistics for this machine we can easily see trends and spikes in the data over time.

This may show us interesting usage information, or help capacity planning efforts in the future. It’s trivial to collect the data required to produce this graph, and even more trivial to store it over time. Storage is done in a normal RRD file (named network.rra here), which is created with the following rrdtool create command:

rrdtool create /location/to/store/data/network.rra \
--start now --step 300 \
DS:established:GAUGE:600:U:U \
DS:smtp:GAUGE:600:U:U \
DS:mysql:GAUGE:600:U:U \
DS:http:GAUGE:600:U:U \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:797

The best explanation of these arguments is really found in the rrdcreate manpage, which documents RRDTool’s create function.

This creates a single datastore for our data. The values we want to collect are:

  1. Established TCP sessions
  2. SMTP sessions
  3. MySQL sessions
  4. HTTP sessions

(obviously we could store more, just create more ‘DS’ (data source) entries)

This will store samples in the following reduced formats:

  1. 600 - 5 minute samples (50 hours)
  2. 700 - averages of 6 five minute samples (average 6 individual samples together, 14 days)
  3. 775 - averages of 24 five minute samples (average 24 individual samples together, 64 days)
  4. 797 - averages of 288 five minute samples (average 288 individual samples together, 797 days)
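
The retention figures above follow from multiplying the step by the number of samples averaged into each row and by the number of rows kept. A minimal sketch of that arithmetic, assuming the 300 second step from the create command (the retention_hours helper is a made-up name for illustration):

```shell
#!/bin/sh
# Retention of one RRA = step seconds x samples per row x rows kept.
STEP=300   # seconds, matches --step 300 in the create command

# retention_hours is a hypothetical helper, not part of rrdtool.
retention_hours() {
    # $1 = primary samples consolidated per row, $2 = rows kept
    echo $(( STEP * $1 * $2 / 3600 ))
}

# One "samples rows" pair per RRA line in the create command.
for rra in "1 600" "6 700" "24 775" "288 797"; do
    set -- $rra
    hours=$(retention_hours "$1" "$2")
    echo "RRA $1:$2 keeps ${hours} hours (about $(( hours / 24 )) days)"
done
```

The day counts are rounded down by the integer division, which is where the 14 and 64 day figures come from.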

This is all maintained in a file of roughly 100k in the end, which is very efficient and very easy to insert data into or pull data out of when required.

Creating the store is simple, as has been shown, and adding data to it is equally simple. A short shell script like:

# find out the following:
# established network connections
# mysql network connections
# smtp network connections
# http network connections
# required in following order:
# est, smtp, mysql, http
RRDTOOL=rrdtool   # assumes rrdtool is on the PATH
RRDFILE=/location/to/store/data/network.rra   # file from the create command
TIME=`perl -e 'print time'`
EST=`netstat -an | grep ESTAB | wc -l`
SMTP=`netstat -an | grep ESTAB | grep ":25" | wc -l`
MYSQL=`netstat -an | grep ESTAB | grep ":3306" | wc -l`
HTTP=`netstat -an | grep ESTAB | grep ":80" | wc -l`

${RRDTOOL} update ${RRDFILE} ${TIME}:${EST}:${SMTP}:${MYSQL}:${HTTP}

finds all the data points we need each time it runs. Suggested running method is, of course, cron:

*/5 * * * * /script/location/ > /tmp/network_collection.log 2>&1
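
One thing to watch in the collection script: a plain grep for ":25" or ":80" also matches ports such as 2525 or 8080. Here is a rough sketch of a stricter count, assuming Linux-style netstat -an output where the address columns look like host:port and the state is the sixth column. The count_port function and the sample lines are made up for illustration.

```shell
#!/bin/sh
# count_port is a hypothetical helper: it reads `netstat -an` style lines
# on stdin and counts ESTABLISHED rows whose local ($4) or foreign ($5)
# address ends in exactly :PORT.
count_port() {
    awk -v port="$1" '
        $6 == "ESTABLISHED" && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) { n++ }
        END { print n + 0 }
    '
}

# Made-up sample lines standing in for real netstat output.
sample='tcp 0 0 10.0.0.5:25 10.0.0.9:51234 ESTABLISHED
tcp 0 0 10.0.0.5:2525 10.0.0.9:51235 ESTABLISHED
tcp 0 0 10.0.0.5:80 10.0.0.7:40100 ESTABLISHED
tcp 0 0 10.0.0.5:8080 10.0.0.7:40101 ESTABLISHED
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN'

printf '%s\n' "$sample" | count_port 25   # 2525 is not counted
printf '%s\n' "$sample" | count_port 80   # 8080 and LISTEN are not counted
```

In the collection script this could be dropped in as SMTP=`netstat -an | count_port 25`, and likewise for the other ports.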

So, with the RRA and the collection script in place we are collecting data. Now we should graph it to make people happy, and get to the point of this post. Graphing can be pushed through some web-cgi interface, but why do that when you can just shove some script through cron and make graphs every 5 minutes? That’s what I thought too! A sample script to make a single graph is:

# Graph daily (for now) data on amazon stats
sleep 30
TIME=`perl -e '$t=localtime; print $t'`

for d in established smtp mysql http ; do
  # Daily
  ${RRDTOOL} graph ${RRDGRAF}-${d}-day.png --start -86400 \
    --title "Daily Network Stats - ${d} - at ${TIME}" \
    DEF:${d}=${RRDFILE}:${d}:AVERAGE \
    COMMENT:" ${d} Min Max Last Average\n" \
    AREA:${d}#0000FF:"${d} " \
    GPRINT:${d}:MIN:"%03.2lf " \
    GPRINT:${d}:MAX:"%03.2lf " \
    GPRINT:${d}:LAST:"%03.2lf " \
    GPRINT:${d}:AVERAGE:"%03.2lf "
done

This will make daily graphs of each DS in the RRA and output them somewhere we can view them (hopefully via some web service). The basics in this rrdtool graph command are:

  1. create a png file, starting ‘now’ minus 86400 seconds (1 day ago)
  2. make a title on the image
  3. have an explanation for the units/numbers
  4. setup an area graph (not a line, a filled area) for the DS
  5. print out the min/max/last/average of the graphed points
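
Since the RRA also keeps coarser averages for 14 and 64 days, the same command can be pointed at longer windows just by changing --start. The following is only a dry-run sketch that prints the commands rather than running them. The period_seconds helper, the /tmp/network output prefix, and the choice of 31 days for a month are all assumptions made for illustration.

```shell
#!/bin/sh
# Dry run: print one rrdtool graph command per DS and period.
RRDFILE=/location/to/store/data/network.rra   # path from the create command
RRDGRAF=/tmp/network                          # hypothetical output prefix

# period_seconds is a hypothetical helper mapping a name to a --start offset.
period_seconds() {
    case "$1" in
        day)   echo 86400 ;;
        week)  echo 604800 ;;
        month) echo 2678400 ;;   # 31 days, an arbitrary choice
        *)     return 1 ;;
    esac
}

for d in established smtp mysql http; do
    for p in day week month; do
        secs=$(period_seconds "$p")
        echo rrdtool graph "${RRDGRAF}-${d}-${p}.png" --start "-${secs}" \
            "DEF:${d}=${RRDFILE}:${d}:AVERAGE" "AREA:${d}#0000FF:${d}"
    done
done
```

Dropping the echo would run the commands for real, and the title, COMMENT and GPRINT arguments from the daily script can be appended in the same way.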

This gives us a sample like:

Pretty, eh? To make the more complex stacked image at the top of this post, something like:

# The http color (#FFFF00) is an arbitrary choice; any hex RGB value works.
${RRDTOOL} graph ${RRDGRAF}-stack.png --start -86400 \
--title "Daily Combined Stats - at ${TIME}" \
DEF:established=${RRDFILE}:established:AVERAGE \
DEF:smtp=${RRDFILE}:smtp:AVERAGE \
DEF:mysql=${RRDFILE}:mysql:AVERAGE \
DEF:http=${RRDFILE}:http:AVERAGE \
COMMENT:" Stat Min Max Last Average\n" \
AREA:established#00FF00:"EST ":STACK \
GPRINT:established:MIN:"%03.2lf " \
GPRINT:established:MAX:"%03.2lf " \
GPRINT:established:LAST:"%03.2lf " \
GPRINT:established:AVERAGE:"%03.2lf \n" \
AREA:smtp#FF0000:"SMTP ":STACK \
GPRINT:smtp:MIN:"%03.2lf " \
GPRINT:smtp:MAX:"%03.2lf " \
GPRINT:smtp:LAST:"%03.2lf " \
GPRINT:smtp:AVERAGE:"%03.2lf \n" \
AREA:mysql#0000FF:"MySQL ":STACK \
GPRINT:mysql:MIN:"%03.2lf " \
GPRINT:mysql:MAX:"%03.2lf " \
GPRINT:mysql:LAST:"%03.2lf " \
GPRINT:mysql:AVERAGE:"%03.2lf \n" \
AREA:http#FFFF00:"HTTP ":STACK \
GPRINT:http:MIN:"%03.2lf " \
GPRINT:http:MAX:"%03.2lf " \
GPRINT:http:LAST:"%03.2lf " \
GPRINT:http:AVERAGE:"%03.2lf "

would be used. This just maps each DEF (DS) into a single colored area and stacks them on the same image. Hopefully you find this interesting and can put it to some use. You could also sample the number of processes of a certain name:

  1. httpd (ps -ef | grep http | grep -v grep | wc -l )
  2. smtp (ps -ef | grep smtp | grep -v grep | wc -l )
  3. sh (ps -ef | grep sh | grep -v grep | wc -l )

and graph those over time. Graph the number of running processes over time (ps -ef | grep -v UID | wc -l)... the possibilities are endless. Graph the input/output traffic volume for your interface:

netstat -i | grep eth | grep -v no | awk '{print $4 " " $8}'
2880756 2732226
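
Note that these numbers are running totals, so stored as a GAUGE they would only ever climb. rrdtool has a COUNTER data source type for exactly this case, which turns successive totals into a per-second rate. The arithmetic behind it can be sketched by hand as below. The rate_per_second helper and the second sample value are made up for illustration, and counter wrap-around is ignored.

```shell
#!/bin/sh
# rate_per_second is a hypothetical helper showing what a COUNTER-type DS
# (for example DS:in:COUNTER:600:0:U) works out between two samples.
rate_per_second() {
    # $1 = previous total, $2 = current total, $3 = seconds between samples
    echo $(( ($2 - $1) / $3 ))
}

PREV=2880756   # first column of the sample output above
CURR=2883756   # made-up reading taken 300 seconds later
rate_per_second "$PREV" "$CURR" 300
```

With a COUNTER data source you would feed the raw totals to rrdtool update and let it do this subtraction itself.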

Go forth and graph things of interest.