Nagios Plugin – check CPU temperature!
I like to monitor every aspect of my Linux servers, and one of these aspects is the CPU temperature. The most common way to find out your CPU temperature on pretty much any Linux distribution is to use the lm_sensors package. It’s quite easy to install:
– if your Linux distro is Debian based (Debian, Ubuntu etc.), run: apt-get install lm-sensors
– if your Linux distro is RedHat based (RedHat, CentOS, Fedora etc.), run: yum install lm_sensors
– if your Linux distro is SUSE based (SUSE, openSUSE etc.), run: zypper install lm_sensors
For other distros, check their official documentation to see which tool is used to install packages.
I was able to find a few bash scripts based on lm_sensors, but most of them were only taking into consideration the temperature of a single CPU core. My approach is a little different. Most of my servers have 2 CPUs, so I would like to know the temperature of each of them. Also, if the server has, let’s say, 4 CPUs, then I want to be able to run the exact same script and get the temperature for each of the 4 CPUs.
Because I was not able to find anything to suit my needs, I’ve decided to write my own Nagios plugin, in Python, based on the lm_sensors package and the Python module pysensors. You can find the plugin here: check_cpu_temp.
As I said, you’ll need to install the lm_sensors package and the Python module pysensors (pip install pysensors).
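To give an idea of how one reading per CPU can be collected, here is a minimal sketch. It is not the plugin's actual code. read_with_pysensors follows the usage shown in the PySensors README, while package_temps and PACKAGE_PREFIXES are hypothetical names, and the sketch assumes the coretemp driver labels the per-socket reading "Physical id N" or "Package id N".

```python
# -*- coding: utf-8 -*-


def read_with_pysensors():
    """Return (label, value) pairs for every feature lm_sensors reports.

    Mirrors the PySensors README usage; needs the pysensors module
    and a working lm_sensors setup on the host.
    """
    import sensors  # imported lazily so the helper below works without it

    sensors.init()
    try:
        return [(feature.label, feature.get_value())
                for chip in sensors.iter_detected_chips()
                for feature in chip]
    finally:
        sensors.cleanup()


# Assumption: per-socket readings carry one of these label prefixes.
PACKAGE_PREFIXES = ("Physical id", "Package id")


def package_temps(readings):
    """Keep one temperature per CPU package, sorted by label text."""
    selected = [(label, value) for label, value in readings
                if label.startswith(PACKAGE_PREFIXES)]
    return [value for _label, value in sorted(selected)]
```

On a real host, package_temps(read_with_pysensors()) would return one value per socket, however many sockets the server has. Labels differ between drivers and kernels, so the prefixes may need adjusting.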
As with most programs on Linux, running the plugin with -h (help) will display some basic usage instructions:
./check_cpu_temp -h
usage: check_cpu_temp [-h] [-w WARN] [-c CRIT]

Nagios plugin to check CPU(s) temperature(s)

optional arguments:
  -h, --help            show this help message and exit
  -w WARN, --warn WARN  Check temperature against a custom HIGH value
  -c CRIT, --crit CRIT  Check temperature against a custom CRIT value
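For reference, a command line like the one above can be declared with argparse roughly as follows. This is a sketch reconstructed from the help text, not the plugin's source. build_parser is a hypothetical name, and parsing the thresholds as floats is an assumption.

```python
import argparse


def build_parser():
    """Build a parser matching the options shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="check_cpu_temp",
        description="Nagios plugin to check CPU(s) temperature(s)")
    # Assumption: thresholds are parsed as floats; both default to None,
    # meaning "use the limits reported for the CPU itself".
    parser.add_argument("-w", "--warn", type=float,
                        help="Check temperature against a custom HIGH value")
    parser.add_argument("-c", "--crit", type=float,
                        help="Check temperature against a custom CRIT value")
    return parser
```

Calling build_parser().parse_args(["-w", "50", "-c", "60"]) gives an object with warn and crit attributes. Both are None when the options are left out.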
Then simply run the plugin:
./check_cpu_temp
The output should be something like this:
OK - CPU(s) temperature(s): 29°C 33°C; high=80.0; crit=96.0
By default, if you run the plugin with no arguments, the HIGH and CRIT temperatures are the ones reported for your CPU(s). You can also set your own values with the -w (warning) and -c (critical) arguments. Assuming you want 50 degrees Celsius for HIGH and 60 degrees Celsius for CRIT, the syntax becomes this:
./check_cpu_temp -w 50 -c 60
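The threshold check itself can be sketched like this. Nagios reads a plugin's state from its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL), so the function returns that code together with the status line. The name evaluate, the use of the hottest CPU as the deciding value, and the >= comparison are assumptions. The real plugin may decide differently.

```python
# -*- coding: utf-8 -*-

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def evaluate(temps, high, crit):
    """Return (exit_code, status_line) for a list of CPU temperatures."""
    hottest = max(temps)  # assumption: the hottest CPU decides the state
    if hottest >= crit:
        status = CRITICAL
    elif hottest >= high:
        status = WARNING
    else:
        status = OK
    readings = u" ".join(u"%d\u00b0C" % t for t in temps)
    message = u"%s - CPU(s) temperature(s): %s; high=%s; crit=%s" % (
        STATUS_NAMES[status], readings, high, crit)
    return status, message
```

A plugin built this way would print the message and then call sys.exit with the returned code, so that Nagios picks up the state.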
I mostly use CentOS 7.x for my servers, so the example below is for CentOS:
NRPE (nrpe.cfg):
command[check_cpu_temp]=/usr/bin/sudo /usr/lib64/nagios/plugins/check_cpu_temp $ARG1$
Nagios command definition (commands.cfg):
define command {
    command_name    check_cpu_temp
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$
}
Host service definition:
define service {
    use                     generic-service
    host_name               srv1
    service_description     CPU Temperature
    check_command           check_nrpe!check_cpu_temp!"-w 60 -c 75"
    notifications_enabled   1
}
Enjoy!
Hi,
I’m getting the error below when I run the python script on Centos 7
python temper.py
'ascii' codec can't encode character u'\xb0' in position 30: ordinal not in range(128)
Traceback (most recent call last):
File "temper.py", line 141, in
print("OK - " + output)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 30: ordinal not in range(128)
Are you sure you’re running it with Python 2.7.x and not Python 3? I’ll try to run the Python script on a clean CentOS 7.x install to double check. I’ll update the answer here as soon as I’ve done that.
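For what it’s worth, u'\xb0' in the traceback is the degree sign, and the 'ascii' codec suggests the output stream was treated as ASCII (which Python 2 does, for example, when stdout is piped). One possible workaround, given here as an untested sketch and not as the plugin's fix, is to encode the line yourself and fall back to UTF-8. encode_for_stream and emit are hypothetical helper names.

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_stream(text, encoding):
    """Encode text for a stream, falling back to UTF-8 when the
    stream's own encoding cannot represent every character."""
    try:
        return text.encode(encoding or "ascii")
    except UnicodeEncodeError:
        return text.encode("utf-8")


def emit(text, stream=None):
    """Write one line of plugin output as bytes."""
    if stream is None:
        stream = sys.stdout
    data = encode_for_stream(text + u"\n", getattr(stream, "encoding", None))
    # Python 3 text streams expose the byte stream as .buffer;
    # Python 2 file objects accept byte strings directly.
    raw = getattr(stream, "buffer", stream)
    raw.write(data)
```

With this, emit(u"OK - " + output) would replace the failing print call. Whether the degree sign then displays correctly depends on the Nagios web interface expecting UTF-8.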