Monitoring ZFS pool status with Uptime Kuma - Chris Stretton

Chris Stretton

Jul 11, 2024

Monitoring ZFS pool status with Uptime Kuma

I use the excellent Uptime Kuma to monitor my home network and server. This tool lets you define monitors using probes such as TCP, HTTP(S) queries or good old-fashioned ICMP ping to determine a service’s status.

My home server uses the OpenZFS file system, which provides redundancy as well as a whole host of neat features. However, in the recent warm weather, the LSI HBA I use for my hard drives started throwing errors, which caused the ZFS pool to mark itself as degraded.

The hardware fix was simple: I increased the airflow in my server’s case by adding a new fan pointing directly at the HBA, and the errors stopped.

But I only discovered the problem by chance; nothing had alerted me to it. I wanted to be sure I would be alerted if this happened again.

Push Probes

One of the probes Kuma supports is “push”. This is essentially a webhook whose status you set by making a GET request to a URL containing a token and a status. If the webhook is not pinged within the target interval, the monitor is marked as DOWN.
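As a rough sketch of what such a request looks like, the helper below assembles a push URL from its parts. The function name build_push_url, the host kuma.example and the token abc123 are placeholders made up for illustration, and the status, msg and ping parameter names reflect the push URL that Kuma displays when you create a push monitor, so check them against your own instance.

```shell
#!/bin/bash
# Hypothetical helper: assemble an Uptime Kuma push URL.
# Arguments: base URL, push token, status (up/down), message, ping in ms.
build_push_url() {
    local base_url=$1 token=$2 status=$3 message=$4 ping=$5
    # Minimal encoding: only spaces and colons are escaped, which is
    # enough for short messages such as "storage: ONLINE".
    message=${message// /%20}
    message=${message//:/%3A}
    printf '%s/api/push/%s?status=%s&msg=%s&ping=%s\n' \
        "$base_url" "$token" "$status" "$message" "$ping"
}

# Placeholder values; nothing is sent over the network here.
build_push_url "https://kuma.example" "abc123" "up" "storage: ONLINE" 42

# To actually send the heartbeat, pass the URL to curl:
# curl --fail --no-progress-meter "$(build_push_url ...)"
```

Keeping the URL construction separate from the curl call makes it easy to inspect exactly what would be sent before pointing it at a real monitor.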

I wanted to leverage this to ensure ZFS pool status was pushed to Kuma.

To do this, I wrote the following script:

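#!/bin/bash
# Push the health of a ZFS pool to an Uptime Kuma push monitor.
if [ $# -lt 2 ]; then
    echo "Usage: $0 <push_token> <pool>" >&2
    exit 1
fi

push_token=$1; shift
pool=$1; shift
start_time=$(date -u +%s%3N)  # milliseconds since the epoch (GNU date)

# -H drops the header row, -o health prints only the health column
health=$(zpool list -H -o health "$pool")

# Anything other than ONLINE counts as down, including empty output
# when the zpool command itself fails
status="down"
if [ "$health" = "ONLINE" ]; then
    status="up"
fi

end_time=$(date -u +%s%3N)
duration=$((end_time - start_time))

# Kuma's push endpoint takes the message text in the msg parameter
output=$(curl --fail --no-progress-meter --retry 3 "https://my.kuma.url/api/push/$push_token?ping=$duration&status=$status&msg=$pool%3A%20$health" 2>&1)
if [ $? -ne 0 ]; then
    echo "Ping failed: $output" >&2
    exit 1
fi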

This script does a few things.

  1. Gets the current time in milliseconds.
  2. Determines the health of the ZFS pool using zpool list -H -o health <pool name>.
  3. Decides whether the status should be considered “UP” or “DOWN” based on whether the health of the pool is “ONLINE” or anything else (UNAVAIL, DEGRADED, etc.).
  4. Calculates the duration of the check by subtracting the start time from the current time.
  5. Uses curl to ping the Kuma URL with the probe token and data.
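The mapping in step 3 is the part that decides whether an alert fires, so it can be worth trying in isolation without waiting for a pool to degrade. The sketch below pulls that logic into a hypothetical health_to_status function (the name is an illustration, not part of the script above) and feeds it a few of the health values zpool can report.

```shell
#!/bin/bash
# Hypothetical helper: map a zpool health string to the status pushed to Kuma.
health_to_status() {
    case "$1" in
        ONLINE) echo "up" ;;
        *)      echo "down" ;;  # DEGRADED, FAULTED, UNAVAIL, empty output, etc.
    esac
}

# Dry run over sample health values; no pool is queried here.
for health in ONLINE DEGRADED FAULTED UNAVAIL ""; do
    printf '%-10s -> %s\n' "${health:-<empty>}" "$(health_to_status "$health")"
done
```

Treating everything except ONLINE as down means an empty result, for example from a mistyped pool name, also raises an alert rather than passing silently.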

Putting it together

I already have a user that runs monitoring tasks on that server, and the zpool list command is non-privileged, so in this instance I just placed the script in $HOME/bin for that user and added it to the user’s crontab to run every 10 minutes.

*/10 * * * * $HOME/bin/zfs_monitor mytoken storage

Obviously replace “mytoken” with your actual token from the Kuma probe and “storage” with the name of your ZFS pool.

You need to ensure that the heartbeat interval for your probe in Kuma matches the timing of your cron job, or it could trigger false alerts. For instance, I run the script every 10 minutes, so my heartbeat interval in Kuma is 600 seconds (10 minutes).
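If you pick a different schedule, the conversion is simple arithmetic. A minimal sketch, where cron_interval_minutes is an illustrative variable that should match the */N value in your crontab:

```shell
#!/bin/bash
# Convert the cron interval (in minutes) to the heartbeat value in seconds.
cron_interval_minutes=10
heartbeat_seconds=$((cron_interval_minutes * 60))
echo "Heartbeat interval for Kuma: ${heartbeat_seconds} seconds"
```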
