Linux SysAdmin & DevOps

Nagios Plugin – CPU temp using lm_sensors and pysensors

I like to monitor every aspect of my Linux servers, and one of those aspects is the CPU temperature. The most common way to read the CPU temperature on pretty much any Linux distribution is the lm_sensors package. It’s quite simple to install:

Debian/Ubuntu:

apt-get install lm-sensors

CentOS/RedHat/Fedora:

yum install lm_sensors

Suse/OpenSuse:

zypper install lm_sensors

For other distros, check the official documentation to find the default package manager.

I was able to find a few bash scripts based on lm_sensors, but most of them were only taking into consideration the temperature from a single core of the CPU. My approach is a little different. Most of my servers have 2 CPUs, so I would like to know the temperature of each of them. Also, if a server has, say, 4 CPUs, I want to be able to run the exact same script and get the temperature for each of the 4 CPUs.

Because I was not able to find anything to suit my needs, I decided to write my own Nagios plugin in Python, based on the lm_sensors package and the Python module pysensors. You can find the plugin here: check_cpu_temp.

As I said, you’ll need to install the lm_sensors package and the Python module pysensors (pip install pysensors).
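To give an idea of how per-CPU readings can be collected with pysensors, here is a minimal sketch. It is not the plugin’s actual code. The helper names read_temperatures and package_temperatures are made up for illustration, and the label prefixes assume the Intel coretemp driver, which labels the per-socket sensor "Physical id N" on older kernels and "Package id N" on newer ones. Other drivers use different labels, so treat the filter as an assumption to adapt.

```python
# Label prefixes of the per-socket sensor (assumption: Intel coretemp driver).
PACKAGE_PREFIXES = ("Physical id", "Package id")


def read_temperatures():
    """Return (chip, label, value) for every feature lm_sensors reports."""
    # PySensors is imported lazily so the pure helper below works without it.
    import sensors

    sensors.init()
    try:
        return [
            (str(chip), feature.label, feature.get_value())
            for chip in sensors.iter_detected_chips()
            for feature in chip
        ]
    finally:
        # Always release the libsensors resources, even on errors.
        sensors.cleanup()


def package_temperatures(readings):
    """Keep one temperature per physical CPU, in the order reported."""
    return [
        value
        for _chip, label, value in readings
        if label.startswith(PACKAGE_PREFIXES)
    ]
```

On a machine with lm_sensors configured, package_temperatures(read_temperatures()) should return one value per socket, whether the server has 2 CPUs or 4.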

As with most programs on Linux, running the plugin with -h (help) will display some basic usage instructions:

./check_cpu_temp -h

usage: check_cpu_temp [-h] [-w WARN] [-c CRIT]

Nagios plugin to check CPU(s) temperature(s)

optional arguments:
  -h, --help            show this help message and exit
  -w WARN, --warn WARN  Check temperature against a custom HIGH value
  -c CRIT, --crit CRIT  Check temperature against a custom CRIT value
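Help text in this shape is what Python’s argparse module generates, so the command-line interface can be sketched roughly as follows. This is an illustration rather than the plugin’s source: build_parser is a hypothetical name, and parsing the thresholds as floats is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the usage text shown above."""
    parser = argparse.ArgumentParser(
        prog="check_cpu_temp",
        description="Nagios plugin to check CPU(s) temperature(s)",
    )
    # Both thresholds are optional; None means "use the CPU's own defaults".
    parser.add_argument(
        "-w", "--warn", type=float,
        help="Check temperature against a custom HIGH value",
    )
    parser.add_argument(
        "-c", "--crit", type=float,
        help="Check temperature against a custom CRIT value",
    )
    return parser
```

Calling build_parser().parse_args(["-w", "50", "-c", "60"]) yields warn=50.0 and crit=60.0, while omitting both options leaves them as None.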

Then simply run the plugin:

./check_cpu_temp

The output should be something like this:

OK - CPU(s) temperature(s): 29°C 33°C; high=80.0; crit=96.0

By default, if you run the plugin with no arguments, the HIGH and CRIT temperatures are the default ones for your CPU(s). You can also set your own values with the -w (warning) and -c (critical) arguments. Assuming you want 50 degrees Celsius for HIGH and 60 degrees Celsius for CRIT, the syntax becomes this:

./check_cpu_temp -w 50 -c 60
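Nagios decides the service state from the plugin’s exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), so the thresholds have to be turned into one of those codes. The sketch below shows one way to do that. The function name evaluate, the use of >= for the comparison, and basing the state on the hottest CPU are assumptions for illustration, and the real plugin may behave differently.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(temps, high, crit):
    """Return (exit_code, message) for a list of CPU temperatures in Celsius."""
    if not temps:
        return UNKNOWN, "UNKNOWN - no CPU temperature readings found"

    # The hottest CPU decides the overall state (assumption).
    hottest = max(temps)
    if hottest >= crit:
        code = CRITICAL
    elif hottest >= high:
        code = WARNING
    else:
        code = OK

    readings = " ".join("{:.0f}°C".format(t) for t in temps)
    message = "{} - CPU(s) temperature(s): {}; high={}; crit={}".format(
        STATUS_NAMES[code], readings, high, crit
    )
    return code, message
```

A plugin built this way would print the message and then call sys.exit() with the code, which is what lets Nagios raise a warning or critical alert.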

I mostly use CentOS 7.x for my servers, so the example below is for CentOS. Note that passing arguments through NRPE ($ARG1$) only works when dont_blame_nrpe=1 is set in nrpe.cfg.

NRPE (nrpe.cfg):

command[check_cpu_temp]=/usr/bin/sudo /usr/lib64/nagios/plugins/check_cpu_temp $ARG1$

Nagios command definition (commands.cfg):

define command {
	command_name check_nrpe
	command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
}

Host service definition:

define service {
        use                             generic-service
        host_name                       srv1
        service_description             CPU Temperature
        check_command                   check_nrpe!check_cpu_temp!"-w 60 -c 75"
        notifications_enabled           1
}

Enjoy!