Re-calculate a checksum for monit

Today I changed an init script that monit was checking with a simple rule:

check file tomcat_bin with path /usr/sbin/tomcat6
  if failed checksum then unmonitor

The documentation isn’t very clear about how to regenerate the checksums after a legitimate change. You must either reload monit’s configuration (with recent versions of monit):

# monit reload

or restart monit:

# service monit restart

Afterwards, you may need to put the resource back under monitoring:

# monit monitor tomcat_bin
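If you change monitored files often, the two steps above can be wrapped in a small function. This is only a sketch: refresh_monit_checksum, MONIT and MONIT_RELOAD_WAIT are names I made up, and the short pause after the reload is an assumption on my part (to give the daemon time to re-read its configuration), not something the monit documentation prescribes.

```shell
#!/bin/sh
# Hypothetical helper, not part of monit itself.
# MONIT can be overridden (for example MONIT=echo) to preview the commands.
MONIT=${MONIT:-monit}

refresh_monit_checksum() {
    resource=$1
    # Ask monit to re-read its configuration; stop here if that fails.
    $MONIT reload || return 1
    # Assumed grace period (seconds) before talking to the daemon again.
    sleep "${MONIT_RELOAD_WAIT:-2}"
    # Put the resource back under monitoring in case it was unmonitored.
    $MONIT monitor "$resource"
}

# Example, run as root:
# refresh_monit_checksum tomcat_bin
```

Setting MONIT=echo first is a cheap way to see what would be run without touching the daemon.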

Stuff I’m gonna look into: Graphite and collectd

In the eternal struggle against crappy monitoring and alerting systems, my next step will be to check collectd and Graphite.

The post that sparked my interest is: Graphite alerts with Monit. I will probably avoid Monit since it’s not flexible enough for what I have in mind, but it’s a good read.

Why collectd? Because I read that it works with Graphite and already provides lots of stats (CPU and memory usage, MySQL server stats, etc.).

Why Graphite? Because it provides a very easy way to store metrics. Basically, all you need to do is send a plain text message to Carbon (a component of Graphite) with the metric name, a numerical value and a UNIX epoch timestamp:

PORT=2003
SERVER=graphite.your.org
# Carbon's plaintext format: <metric path> <value> <UNIX timestamp>
echo "local.random.diceroll 4 $(date +%s)" | nc "${SERVER}" "${PORT}"
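Since the format is just "name value timestamp", it's easy to wrap in a helper and send several metrics over a single connection. A minimal sketch, assuming a Carbon listener on the plaintext port; carbon_line and the coinflip metric are hypothetical names of mine:

```shell
#!/bin/sh
# Hypothetical helper: prints one line in Carbon's plaintext format,
# "<metric path> <value> <timestamp>". The timestamp defaults to now.
carbon_line() {
    metric=$1
    value=$2
    timestamp=${3:-$(date +%s)}
    printf '%s %s %s\n' "$metric" "$value" "$timestamp"
}

# Example (needs a reachable Carbon listener, so it is left commented out):
# {
#     carbon_line local.random.diceroll 4
#     carbon_line local.random.coinflip 1
# } | nc "$SERVER" "$PORT"
```

Passing an explicit third argument lets you backfill a data point with an older timestamp instead of the current time.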

Collect your jaw from the floor and keep reading: not only does Graphite provide a web interface that lets you easily chart your metrics, you can also retrieve the data for arbitrary time ranges in structured formats! Wanna see the data for a specific metric in the last few minutes? Send a request like:

http://graphite-server/render?target=your.metric.name&from=-5m&format=json

And you’ll get the data in JSON format:

[{
  "target": "your.metric.name",
  "datapoints": [
    [1.0, 1311836008],
    [2.0, 1311836009],
    [3.0, 1311836010],
    [5.0, 1311836011],
    [6.0, 1311836012]
  ]
}]

It even works with wildcards, so if all your MySQL servers’ metrics are stored with the word mysql in their name, you can chart or retrieve the data by asking for:

http://graphite-server/render?target=your.metric.*.mysql.*&from=-5m&format=json
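One gotcha when trying these URLs from a shell: both * and & mean something to the shell, so the URL has to be quoted. Here is a small sketch that builds the render URL for you; render_url is a hypothetical helper name, and graphite-server is the same placeholder host used above:

```shell
#!/bin/sh
# Hypothetical helper: prints a render URL asking for JSON output.
# Arguments: server, target, and an optional "from" (defaults to -5m).
render_url() {
    server=$1
    target=$2
    from=${3:--5m}
    printf 'http://%s/render?target=%s&from=%s&format=json\n' "$server" "$target" "$from"
}

# Example (needs a reachable Graphite server, so it is left commented out).
# The single quotes keep the shell from expanding the wildcards:
# curl -s "$(render_url graphite-server 'your.metric.*.mysql.*')"
```

The helper does no URL encoding, so it only suits plain metric paths like the ones shown here.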

Graphite also provides powerful aggregation functions that you can use on your data.

If you, like me, are now interested in Graphite and Collectd, check these links: