Disk I/O errors on Adaptec ASR8805 raid controller

We have an Adaptec ASR8805 controller on one of the servers we manage. For various reasons we need to shrink a logical volume that sits on a RAID 6 logical device created and exposed by this controller, but we can't, because we're getting I/O errors:

Buffer I/O error on device dm-2, logical block 3330419721
sd 6:0:1:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:1:0: [sdb]  Sense Key : Hardware Error [current] 
sd 6:0:1:0: [sdb]  Add. Sense: Internal target failure
sd 6:0:1:0: [sdb] CDB: Read(16): 88 00 00 00 00 06 34 11 69 00 00 00 01 00 00 00
end_request: critical target error, dev sdb, sector 26643360000
sd 6:0:1:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

From what the controller is reporting, the RAID 6 is healthy, and all the physical drives' SMART information looks OK(ish).

It turns out that no background checking of the RAID 6 parity had been enabled, and that is probably the problem, as reported by this article.

To get a “quick” fix (it’s a 24T array), I started:

# arcconf task start 1 logicaldrive 1 verify_fix
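To keep an eye on the verify progress, the status of the running task can be queried with (same controller number as above):

# arcconf getstatus 1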

when it’ll be finished, I’ll enable the background check with:

# arcconf consistencycheck 1 on

I really hope this saves some time for a fellow admin out there :)

How to display IOwait percentage in Prometheus

Prometheus has a few quirks, and dealing with CPU time is one of them. This article explains how to handle CPU time; these are the rules I made for my own Prometheus/Grafana dashboard:

avg by (instance) (irate(node_cpu{mode="iowait"}[1m])) * 100

This rule averages the iowait percentage across all CPUs of the system, grouped by instance.

avg by (instance) (irate(node_cpu{instance=~"hostname.*",mode="iowait"}[1m])) * 100

This rule is like the one above, with the difference that it lets you filter which systems are reported, by hostname.
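If you want to test one of these expressions quickly outside Grafana, you can throw it at the Prometheus HTTP API with curl (replace prometheus:9090 with your own server's address and port):

$ curl -sG 'http://prometheus:9090/api/v1/query' \
    --data-urlencode 'query=avg by (instance) (irate(node_cpu{mode="iowait"}[1m])) * 100'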

Hopefully this will be useful for someone out there :)

How to run a Flask application in Docker

Flask is a nice web application framework for Python.

My example app.py looks like:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
  return 'Hello, World!'

According to the Flask documentation, to run the application we need to run FLASK_APP=app.py flask run. So our Dockerfile will run this command, and we'll pass an environment variable with the application name when we start the container:

FROM python:3-onbuild
EXPOSE 5000
CMD [ "python", "-m", "flask", "run", "--host=0.0.0.0" ]

The --host=0.0.0.0 parameter is necessary so that we will be able to connect to Flask from outside the Docker container.

Using the -onbuild version of the Python image is handy because it will automatically pick up a file named requirements.txt and install the Python modules listed in it, so go ahead and create that file in the same directory, containing the single line flask.
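For example:

$ echo "flask" > requirements.txt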

Now we can build our container:

docker build -t flaskapp .

This might take a while. When it ends, we’ll be able to run the container, passing the FLASK_APP environment variable:

docker run -it --rm --name flaskapp \
  -v "$PWD":/usr/src/app -w /usr/src/app \
  -e LANG=C.UTF-8 -e FLASK_APP=app.py \
  -p 5000:5000 flaskapp

As you can see I’m mounting the local directory $PWD to /usr/src/app in the container and setting the work directory there. I’m also passing the -p 5000:5000 parameter so that the container tcp port 5000 is available by connecting to my host machine port 5000.

You can test your app with your browser or with curl:

$ curl http://127.0.0.1:5000/
Hello, World!

I hope this will be useful to someone out there, have fun! :)

How to fix Fedora 25 dnf upgrade “certificate expired” failure

After logging into a Fedora 25 system that I hadn't logged into for a while, I ran dnf clean all ; dnf upgrade to update it, but I ran into this problem:

# dnf -vvvv -d 5 upgrade
cachedir: /var/cache/dnf
Loaded plugins: playground, builddep, config-manager, debuginfo-install, generate_completion_cache, needs-restarting, copr, protected_packages, noroot, download, Query, reposync
DNF version: 1.1.10
Cannot download 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f25&arch=x86_64': Cannot prepare internal mirrorlist: Curl error (60): Peer certificate cannot be authenticated with given CA certificates for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f25&arch=x86_64 [Peer's Certificate has expired.].
Error: Failed to synchronize cache for repo 'updates'

The certificates had expired and dnf refused to work. To update the certificates, I simply installed the ca-certificates, p11-kit, p11-kit-trust, openssl and openssl-libs packages from the distro updates directly with rpm.

# rpm -Uvh http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/fedora/linux/updates/25/x86_64/c/ca-certificates-2017.2.11-1.1.fc25.noarch.rpm \
    http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/fedora/linux/updates/25/x86_64/p/p11-kit-0.23.2-3.fc25.x86_64.rpm \
    http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/fedora/linux/updates/25/x86_64/p/p11-kit-trust-0.23.2-3.fc25.x86_64.rpm \
    http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/fedora/linux/updates/25/x86_64/o/openssl-1.0.2k-1.fc25.x86_64.rpm \
    http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/fedora/linux/updates/25/x86_64/o/openssl-libs-1.0.2k-1.fc25.x86_64.rpm
# dnf upgrade
Fedora 25 - x86_64 - Updates  
Fedora 25 - x86_64
Last metadata expiration check: 0:00:19 ago on Mon Apr 17 13:47:09 2017.
[...]

You might have to find the latest version by browsing around your favourite Fedora mirror (you can find the base URL in /etc/yum.repos.d/fedora-updates.repo).
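For example, to see which metalink/base URL your system is configured to use:

# grep -E '^(metalink|baseurl)' /etc/yum.repos.d/fedora-updates.repo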


PHP Sessions on MS Azure Redis service

I sure hope you’ll never end up facing this, but in case you do indeed have a PHP application on MS Azure, and you want to use MS Azure Redis service for the session backend, you’ll have to set your session.save_path to something like:

session.save_path='tcp://UNIQUENAME.redis.cache.windows.net:6379?auth=MSHATESYOU4FUNandPROFIT=&timeout=1&prefix=OHNOMS'

Easy enough, unless your auth key happens to contain a + symbol. In that case, your PHP session creation will fail with this error:

# php count.php
PHP Fatal error:  Uncaught exception 'RedisException' with message 'Failed to AUTH connection' in count.php:3
Stack trace:
#0 count.php(3): session_start()
#1 {main}
  thrown in count.php on line 3
PHP Fatal error:  Uncaught exception 'RedisException' with message 'Failed to AUTH connection' in [no active file]:0
Stack trace:
#0 {main}
  thrown in [no active file] on line 0

From redis-cli the authentication was working fine, so it took us a while to debug. It ended up being a problem with the + symbol, and the quickest solution in our case was to just regenerate the auth key so it didn't have a + in it, but I suspect (though I didn't test this) that URL-encoding the + as %2B might work as well.
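If regenerating the key is not an option, this one-liner prints the URL-encoded form of a key (the key here is obviously made up, and as I said I haven't verified that the encoded form actually works in session.save_path):

$ php -r 'echo urlencode("abc+def==") . PHP_EOL;'
abc%2Bdef%3D%3D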

How to quickly install and configure Splunk Enterprise

As you may have noticed, I'm not a huge fan of proprietary, closed-source software. And of course I ended up having to install Splunk for a client. So here are a few notes on what I did to get it working.

I started from this guide, with a few additions here and there.

Install the Splunk Server

First thing, you need to download the server. You have to register for it (proprietary software).

I got the 64-bit RPM for my CentOS 7 server and installed it with:

yum install splunk-*-linux-2.6-x86_64.rpm
/opt/splunk/bin/splunk --answer-yes --no-prompt --accept-license enable boot-start
/opt/splunk/bin/splunk --answer-yes --no-prompt --accept-license start

This will automatically accept the license and set up the Splunk Server to start at boot time.
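To double-check that the daemon actually came up, you can ask Splunk for its status:

/opt/splunk/bin/splunk status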

If everything worked correctly, you should be able to connect to your Splunk Server on:

url: http://your-server-name-or-ip:8000
user: admin
pass: changeme

If it doesn’t work, check if you have a firewall on your server machine and open port tcp/8000 if needed.

For more information on this step, I’ll referr you to the Fine Manual:

Configure the Splunk Server

The logical next step is to configure the Splunk Server to listen for incoming logs.

Assuming you didn’t change (yet) your Splunk Server user and password, you’ll need to run:

/opt/splunk/bin/splunk enable listen 9997 -auth admin:changeme
/opt/splunk/bin/splunk enable deploy-server -auth admin:changeme
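A quick way to verify that the receiving port is actually listening on the server:

ss -tlnp | grep 9997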

For more information on this step, check the official documentation.

Install the Splunk Universal Forwarder on clients

Now that the server side is configured, we need to setup a client to send some logs to it. Again, head off to the download page and grab the package you need.

For large-scale deployments you might want to read about how to use user-seed.conf, so you can pre-seed the installation's user and password (there's a small sketch right after the commands below). For this quick tutorial, we'll skip that and just run these commands:

yum -y install splunkforwarder-*-linux-2.6-x86_64.rpm
/opt/splunkforwarder/bin/splunk --answer-yes --no-prompt --accept-license enable boot-start
/opt/splunkforwarder/bin/splunk --answer-yes --no-prompt --accept-license start

Again, this will automatically accept the license and enable the forwarder at boot time.
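As mentioned above, if you do want to pre-seed the credentials instead, here is a minimal sketch of what user-seed.conf could look like (double-check the exact location and keys against the Splunk docs for your version; the file has to be in place before the first start):

cat > /opt/splunkforwarder/etc/system/local/user-seed.conf <<'EOF'
[user_info]
USERNAME = admin
PASSWORD = yournewpassword
EOF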

For more information about this step, see the Universal Forwarder documentation.

Configure the Universal Forwarder

Once the forwarder is installed, you’ll need to configure it to talk to your server.

Please note that the user and password I'm using here are those of the local Splunk forwarder, not the Splunk Server.

/opt/splunkforwarder/bin/splunk add forward-server splunk-server:9997 -auth admin:changeme
/opt/splunkforwarder/bin/splunk set deploy-poll splunk-server:8089 -auth admin:changeme
/opt/splunkforwarder/bin/splunk enable deploy-client -auth admin:changeme
/opt/splunkforwarder/bin/splunk add monitor /var/log/nginx/error.log
/opt/splunkforwarder/bin/splunk restart

In my case I added /var/log/nginx/error.log to the files that will be monitored and sent to the server.
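To double-check that the forwarder is actually talking to the server, you can list the configured forward-servers:

/opt/splunkforwarder/bin/splunk list forward-server -auth admin:changeme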

For more information about this step, check out the official documentation.

Accessing your logs on the Splunk Server

At this point you should be able to log in to your Splunk Server web interface, head to the "Search & Reporting" app, and search for your data. For example, I used a simple query:

source="/var/log/web/nginx/error.log"

to make sure the data from my log files was ending up in Splunk.