How to display IOwait percentage in Prometheus

Prometheus has a few quirks, and dealing with CPU time is one of them. This article explains how to deal with CPU time, and these are the rules I made for my own Prometheus/Grafana dashboard:

avg by (instance) (irate(node_cpu{mode="iowait"}[1m])) * 100

This rule averages the iowait rate across all the CPUs of a system, grouped by instance, and scales it to a percentage.

avg by (instance) (irate(node_cpu{instance=~"hostname.*", mode="iowait"}[1m])) * 100

This rule is like the one above, except that it filters which systems are reported by matching the instance label against a hostname regex.
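To make the arithmetic behind these rules a bit more concrete, here is a rough Python sketch of what the first expression computes. The function names and the sample data are made up for illustration, and counter resets are ignored; this is not how Prometheus is implemented, just the same calculation by hand.

```python
def irate(samples):
    """Per-second rate computed from the last two (timestamp, counter) samples."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)


def iowait_percent(per_cpu_samples):
    """Average the per-CPU iowait rates and scale the result to a percentage."""
    rates = [irate(samples) for samples in per_cpu_samples.values()]
    return sum(rates) / len(rates) * 100


# Hypothetical samples: cumulative seconds spent in iowait, scraped 15 s apart.
samples = {
    "cpu0": [(0, 100.0), (15, 103.75)],  # 3.75 s of iowait in 15 s -> 0.25
    "cpu1": [(0, 200.0), (15, 200.0)],   # no iowait at all -> 0.0
}
print(iowait_percent(samples))  # 12.5
```

The counter is in seconds, so its per-second rate is already a fraction of time between 0 and 1, which is why multiplying by 100 gives a percentage.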

Hopefully this will be useful for someone out there :)

Stuff I’m gonna look into: Graphite and collectd

In the eternal struggle against crappy monitoring and alerting systems, my next step will be to check collectd and Graphite.

The post that sparked my interest is: Graphite alerts with Monit. I will probably avoid Monit since it’s not flexible enough for what I have in mind, but it’s a good read.

Why Collectd? Because I read that it works with Graphite and already provides lots of stats (CPU and mem usage, MySQL server stats, etc).

Why Graphite? Because it provides a very easy way to store metrics. Basically, all you need to do is send a plain text message to Carbon (a component of Graphite) with the metric name, a numerical value and a UNIX epoch timestamp:

PORT=2003
SERVER=graphite.your.org
echo "local.random.diceroll 4 `date +%s`" | nc ${SERVER} ${PORT};
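The same thing can be done from a script. Below is a minimal Python sketch of the plaintext protocol described above; the function names are mine, and the server name and port are just the placeholders from the shell example.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one plaintext line: metric name, value, UNIX epoch, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {timestamp}\n"


def send_metric(line, server="graphite.your.org", port=2003):
    """Open a TCP connection to Carbon and send a single metric line."""
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Only format the line here; sending needs a reachable Carbon instance.
print(format_metric("local.random.diceroll", 4, 1311836008), end="")
```

With a real Carbon listener in place, `send_metric(format_metric("local.random.diceroll", 4))` should be the equivalent of the `nc` one-liner.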

Collect your jaw from the floor and keep reading: not only does Graphite provide a web interface that lets you easily chart your metrics, you can also retrieve the data for arbitrary time ranges in structured formats! Wanna see the data for a specific metric over the last few minutes? Send a request like:

http://graphite-server/render?target=your.metric.name&from=-5m&format=json

And you’ll get the data in JSON format:

[{
  "target": "your.metric.name",
  "datapoints": [
    [1.0, 1311836008],
    [2.0, 1311836009],
    [3.0, 1311836010],
    [5.0, 1311836011],
    [6.0, 1311836012]
  ]
}]
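Each datapoint is a [value, timestamp] pair, so pulling numbers out of a response takes only a few lines. Here is a small Python sketch, assuming a response shaped like the one above; the helper name and the sample string (trimmed to the datapoints field) are made up, and the null entry stands for an interval with no data.

```python
import json


def latest_value(render_json):
    """Return (value, timestamp) of the newest non-null datapoint, or None."""
    series = json.loads(render_json)
    points = [
        (value, ts)
        for entry in series
        for value, ts in entry["datapoints"]
        if value is not None
    ]
    return max(points, key=lambda point: point[1]) if points else None


# Hypothetical response body with one empty interval at the end.
sample = (
    '[{"datapoints": [[1.0, 1311836008], [2.0, 1311836009], '
    '[null, 1311836010]]}]'
)
print(latest_value(sample))  # (2.0, 1311836009)
```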

It even works with wildcards, so if all your MySQL server metrics are stored with the word mysql in their name, you can chart them or retrieve their data by asking for:

http://graphite-server/render?target=your.metric.*.mysql.*&from=-5m&format=json
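If you build these URLs from a script, the standard library can do the query-string work. This is a small sketch with a made-up helper name; the base URL and the metric pattern are the placeholders used above, and `*` is kept unescaped only so the result reads like the example.

```python
from urllib.parse import urlencode


def render_url(base, target, since="-5m", fmt="json"):
    """Assemble a render URL for one target (wildcards allowed)."""
    query = urlencode({"target": target, "from": since, "format": fmt}, safe="*")
    return f"{base}/render?{query}"


print(render_url("http://graphite-server", "your.metric.*.mysql.*"))
```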

Graphite also provides powerful aggregation functions that you can use on your data.

If you, like me, are now interested in Graphite and Collectd, check these links:

Install a Sensu monitoring server

Sensu is a “monitoring framework”: it manages the timed execution of checks, the handlers for the data it gets from the agents, and the communication (over a RabbitMQ layer) between the agents and the server.

Since I found Sensu’s official documentation quite hard to follow, I took some time to write down the procedure I followed to get Sensu working on my setup (Ubuntu 12.04 Server on both the server and the clients).

Pre-requirements: SSL certificates

We want the agents to transfer data to the server over a secure channel, so we’ll configure RabbitMQ with SSL support. For that we need (self-signed) server and client certificates. We’ll create a Certificate Authority and the certs using the script provided by Joe Miller. For more info and a better understanding, you should really check the RabbitMQ documentation about SSL before deploying in production:

$ git clone git://github.com/joemiller/joemiller.me-intro-to-sensu.git
$ cd joemiller.me-intro-to-sensu/
$ ./ssl_certs.sh clean
$ ./ssl_certs.sh generate

This procedure will generate several items that we will need during the install:

  • server_cert.pem – we will copy this to /etc/rabbitmq/ssl/ on the server
  • server_key.pem – as above
  • testca/cacert.pem – as above
  • client_cert.pem – we will copy this to /etc/sensu/ssl/ on the client
  • client_key.pem – as above

Common tasks (on both the server and the clients alike)

Add Sensu repos to Ubuntu:

$ sudo -i
password: 
# echo "deb  http://repos.sensuapp.org/apt  sensu  main" > /etc/apt/sources.list.d/sensu.list
# wget -q http://repos.sensuapp.org/apt/pubkey.gpg -O- | sudo apt-key add -
OK
# apt-get update

Install the Sensu “omnibus” package:

# apt-get install sensu

On the server

Add RabbitMQ repos to Ubuntu:

$ sudo -i
password: 
# echo "deb  http://www.rabbitmq.com/debian/  testing  main" >/etc/apt/sources.list.d/rabbitmq.list
# wget -q http://www.rabbitmq.com/rabbitmq-signing-key-public.asc -O- | sudo apt-key add -
OK
# apt-get update

Install Erlang and RabbitMQ:

# apt-get -y install erlang-nox
(...)
# apt-get -y --allow-unauthenticated --force-yes install rabbitmq-server
(...)

Copy the server certificates (go back to ~/joemiller.me-intro-to-sensu/):

# cd joemiller.me-intro-to-sensu/
# mkdir -p /etc/rabbitmq/ssl
# cp server_key.pem server_cert.pem testca/cacert.pem /etc/rabbitmq/ssl/

Create RabbitMQ configuration file /etc/rabbitmq/rabbitmq.config:

[
  {rabbit, [
    {ssl_listeners, [5671]},
    {ssl_options, [{cacertfile,"/etc/rabbitmq/ssl/cacert.pem"},
                   {certfile,"/etc/rabbitmq/ssl/server_cert.pem"},
                   {keyfile,"/etc/rabbitmq/ssl/server_key.pem"},
                   {verify,verify_peer},
                   {fail_if_no_peer_cert,true}]}
  ]}
].

Set up RabbitMQ to start at boot and make sure it is started now:

# update-rc.d rabbitmq-server defaults
# service rabbitmq-server restart
(...)

Create a vhost and a user in RabbitMQ for Sensu:

# rabbitmqctl add_vhost /sensu
# rabbitmqctl add_user sensu YouRseCurE.PassWord
# rabbitmqctl set_permissions -p /sensu sensu ".*" ".*" ".*"

Install Redis:

# apt-get -y install redis-server
(...)

Set up Redis to start at boot and make sure it is started now:

# update-rc.d redis-server defaults
# service redis-server restart
(...)

Copy *client* SSL certificates (since Sensu will be a RabbitMQ client):

# cd joemiller.me-intro-to-sensu/
# mkdir -p /etc/sensu/ssl
# cp client_key.pem client_cert.pem /etc/sensu/ssl/

Create Sensu server config file /etc/sensu/config.json:

{
  "rabbitmq": {
    "ssl": {
      "private_key_file": "/etc/sensu/ssl/client_key.pem",
      "cert_chain_file": "/etc/sensu/ssl/client_cert.pem"
    },
    "host": "localhost",
    "port": 5671,
    "user": "sensu",
    "password": "YouRseCurE.PassWord",
    "vhost": "/sensu"
  },
  "redis": {
    "host": "localhost",
    "port": 6379
  },
  "api": {
    "host": "localhost",
    "port": 4567
  },
  "dashboard": {
    "port": 8080,
    "user": "admin",
    "password": "AnoTheRsecURe.passWORD"
  }
}
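A stray comma or a missing section in this file is easy to overlook, so a quick sanity check before starting the services can save some log reading. The Python sketch below is only that: the list of required keys mirrors the example file above and is not an official Sensu schema, and the helper name is made up.

```python
import json

# Sections and keys copied from the example config above (an assumption,
# not a schema published by Sensu).
REQUIRED = {
    "rabbitmq": ["host", "port", "user", "password", "vhost", "ssl"],
    "redis": ["host", "port"],
    "api": ["host", "port"],
}


def config_problems(text):
    """Return a list of problems found; an empty list means it looks complete."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = []
    for section, keys in REQUIRED.items():
        if section not in config:
            problems.append(f"missing section: {section}")
            continue
        problems += [
            f"missing key: {section}.{key}"
            for key in keys
            if key not in config[section]
        ]
    return problems


# Deliberately incomplete example, to show what gets reported.
example = '{"rabbitmq": {"host": "localhost", "port": 5671}, "redis": {"host": "localhost", "port": 6379}}'
for problem in config_problems(example):
    print(problem)
```

To check the real file you would pass it the contents of /etc/sensu/config.json instead of the example string.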

Set up the Sensu services to start at boot and make sure they are started now:

# for srv in sensu-server sensu-api sensu-client sensu-dashboard; do 
    update-rc.d $srv defaults
    service $srv restart
  done
(...)

Check /var/log/sensu/sensu-server.log to make sure everything is working nicely.

On the clients

You may need to copy SSL certs from the server, then install them like we already did on the server:

# cd joemiller.me-intro-to-sensu/
# mkdir -p /etc/sensu/ssl
# cp client_key.pem client_cert.pem /etc/sensu/ssl/

Create Sensu config file /etc/sensu/config.json (notice how we changed the host parameter):

{
  "rabbitmq": {
    "ssl": {
      "private_key_file": "/etc/sensu/ssl/client_key.pem",
      "cert_chain_file": "/etc/sensu/ssl/client_cert.pem"
    },
    "host": "your-server-hostname",
    "port": 5671,
    "user": "sensu",
    "password": "YouRseCurE.PassWord",
    "vhost": "/sensu"
  }
}

Also create a client config file (e.g. /etc/sensu/conf.d/client.json):

{
  "client": {
    "name": "sensu-client01",
    "address": "192.168.1.116",
    "subscriptions": [ "test" ]
  }
}
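With more than a couple of clients, writing this file by hand gets tedious. Here is a minimal Python sketch that generates it; the helper name is made up, and the values are the same example ones used above.

```python
import json


def client_config(name, address, subscriptions):
    """Render the client section as JSON text, ready to be written to disk."""
    config = {
        "client": {
            "name": name,
            "address": address,
            "subscriptions": list(subscriptions),
        }
    }
    return json.dumps(config, indent=2)


print(client_config("sensu-client01", "192.168.1.116", ["test"]))
```

Redirecting the output to /etc/sensu/conf.d/client.json on each client should give a file equivalent to the one shown above.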

Set up the Sensu client to start at boot and make sure it is started now:

# update-rc.d sensu-client defaults
# service sensu-client restart
(...)

Check /var/log/sensu/sensu-client.log to make sure everything is working nicely.

Now that everything is working, you can go back to the official wiki and learn how to add a check and add a handler.