Setting up pnp4nagios on Ubuntu 12.04 LTS Precise Pangolin

Prerequisites

nagios3 is installed.

Install

aptitude install pnp4nagios

Configuring Bulk Mode with NPCD

nagios.cfg

In /etc/nagios3/nagios.cfg, set process_performance_data=1. Use the sed command below or edit the file by hand:

grep process_performance_data /etc/nagios3/nagios.cfg
sed -i 's/process_performance_data=0/process_performance_data=1/' /etc/nagios3/nagios.cfg
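The sed command above only changes the line if it currently reads process_performance_data=0. As a sketch of a more general approach, the hypothetical helper below (not part of nagios3 or pnp4nagios) sets any KEY=VALUE option in a Nagios-style config file, rewriting an existing line or appending the option if it is missing. It assumes GNU sed (for -i), keys made of letters, digits and underscores, and values without "|" or "&".

```shell
# Hypothetical helper, not part of nagios3 or pnp4nagios.
# Set KEY=VALUE in a Nagios-style config file: rewrite an existing
# uncommented "KEY=..." line, or append the option if it is absent.
# Assumes GNU sed (-i) and that VALUE contains no "|" or "&".
set_cfg_option() {
  cfg_file=$1
  cfg_key=$2
  cfg_value=$3
  if grep -q "^${cfg_key}=" "$cfg_file"; then
    sed -i "s|^${cfg_key}=.*|${cfg_key}=${cfg_value}|" "$cfg_file"
  else
    printf '%s=%s\n' "$cfg_key" "$cfg_value" >> "$cfg_file"
  fi
}
```

For example: set_cfg_option /etc/nagios3/nagios.cfg process_performance_data 1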

Create the directory where Nagios will write the performance data files:

mkdir -p /var/spool/pnp4nagios/nagios
chown -R nagios:nagios /var/spool/pnp4nagios

Add the following to nagios.cfg so that Nagios writes service and host performance data to these files and runs the processing commands on them every 15 seconds:

#
# service performance data
#
service_perfdata_file=/var/spool/pnp4nagios/nagios/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=pnp-bulk-service
 
#
# host performance data
# 
host_perfdata_file=/var/spool/pnp4nagios/nagios/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=pnp-bulk-host
#
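Nagios expands the \t sequences in these templates to tab characters, so each line of service-perfdata or host-perfdata is a tab-separated list of KEY::VALUE fields. When inspecting the spool files by hand, a sketch like the hypothetical helper below splits one such line into KEY=VALUE pairs:

```shell
# Hypothetical helper: print the KEY::VALUE fields of one perfdata spool
# line as KEY=VALUE, one per line. Fields are separated by real tabs.
parse_perfdata_line() {
  printf '%s\n' "$1" | awk -F '\t' '{
    for (i = 1; i <= NF; i++) {
      sep = index($i, "::")        # first "::" splits key from value
      if (sep > 0)
        print substr($i, 1, sep - 1) "=" substr($i, sep + 2)
    }
  }'
}
```

For example: parse_perfdata_line "$(head -n 1 /var/spool/pnp4nagios/nagios/service-perfdata)". The file may be empty right after Nagios has processed it.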

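The file /etc/nagios3/conf.d/pnp4nagios.cfg, installed with the package, should already define the pnp-bulk-service and pnp-bulk-host commands. If it does not, paste the definitions below into that file. The path to process_perfdata.pl may differ; on source installs it is often under /usr/local/pnp4nagios/libexec. Note that pnp-bulk-service and pnp-bulk-host run process_perfdata.pl directly from Nagios (plain bulk mode). To have NPCD do the processing instead, set the two *_processing_command options to pnp-bulknpcd-service and pnp-bulknpcd-host, which move the spool files into /var/spool/pnp4nagios/npcd for NPCD to pick up.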

define command {
    command_name    pnp-synchronous-service
    command_line    /usr/bin/perl /usr/lib/pnp4nagios/libexec/process_perfdata.pl
}

define command {
    command_name    pnp-synchronous-host
    command_line    /usr/bin/perl /usr/lib/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}

##############################################################################

define command{
    command_name    pnp-bulk-service
    command_line    /usr/bin/perl /usr/lib/pnp4nagios/libexec/process_perfdata.pl --bulk=/var/spool/pnp4nagios/nagios/service-perfdata
}

define command{
    command_name    pnp-bulk-host
    command_line    /usr/bin/perl /usr/lib/pnp4nagios/libexec/process_perfdata.pl --bulk=/var/spool/pnp4nagios/nagios/host-perfdata
}

##############################################################################

define command{
    command_name    pnp-bulknpcd-service
    command_line    /bin/mv /var/spool/pnp4nagios/nagios/service-perfdata /var/spool/pnp4nagios/npcd/service-perfdata.$TIMET$
}

define command{
    command_name    pnp-bulknpcd-host
    command_line    /bin/mv /var/spool/pnp4nagios/nagios/host-perfdata /var/spool/pnp4nagios/npcd/host-perfdata.$TIMET$
}
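The pnp-bulknpcd-* commands do no processing themselves: they move the current spool file into NPCD's spool directory under a timestamped name, and NPCD later runs process_perfdata.pl on it. The sketch below mimics that step so it can be tried on scratch directories. The function name and arguments are hypothetical, and date +%s stands in for the $TIMET$ macro that Nagios substitutes. Unlike the plain /bin/mv, it does nothing when no spool file exists yet.

```shell
# Hypothetical helper mimicking pnp-bulknpcd-service / pnp-bulknpcd-host.
# $1 = Nagios spool dir, $2 = NPCD spool dir, $3 = spool file name.
# date +%s stands in for Nagios' $TIMET$ macro (seconds since the epoch).
rotate_perfdata() {
  src_dir=$1
  dst_dir=$2
  name=$3
  [ -f "$src_dir/$name" ] || return 0   # nothing collected yet
  mv "$src_dir/$name" "$dst_dir/$name.$(date +%s)"
}
```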

Configure pnp4nagios

  • Edit /etc/default/npcd
    • Set: RUN="yes" (use straight quotes; the shell does not treat curly quotes as quoting)
  • Check /etc/pnp4nagios/npcd.cfg (or /usr/local/pnp4nagios/etc/npcd.cfg on source installs)
    • Usually no change is required
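If you prefer to script the /etc/default/npcd change, one sed expression is enough. This sketch, wrapped in a hypothetical function so it can be tried on a copy of the file first, rewrites any existing RUN= line, whatever quotes it uses, to RUN="yes". It assumes GNU sed and that the file already contains a RUN= line.

```shell
# Hypothetical helper: set RUN="yes" in an npcd defaults file ($1).
# Replaces the whole RUN= line, so straight or curly quotes both work.
enable_npcd() {
  sed -i 's/^RUN=.*/RUN="yes"/' "$1"
}
```

For example: enable_npcd /etc/default/npcd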

Apache web server configuration

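Link the pnp4nagios web configuration into the Apache configuration. Ubuntu 12.04 ships Apache 2.2, which reads extra configuration from /etc/apache2/conf.d; the conf-enabled directory only exists with Apache 2.4 on later releases.

ln -s /etc/pnp4nagios/apache.conf /etc/apache2/conf.d/pnp4nagios.conf
service apache2 restart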
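If these notes are reused on newer releases, the include directory can be chosen automatically. A sketch, assuming the usual Debian/Ubuntu layouts (conf.d on Apache 2.2, conf-enabled on Apache 2.4); apache_conf_dir is a hypothetical helper:

```shell
# Hypothetical helper: print the directory Apache reads extra config from.
# Apache 2.4 layouts (later Ubuntu releases) have conf-enabled; the Apache
# 2.2 layout on Ubuntu 12.04 uses conf.d. $1 defaults to /etc/apache2.
apache_conf_dir() {
  apache_root=${1:-/etc/apache2}
  if [ -d "$apache_root/conf-enabled" ]; then
    echo "$apache_root/conf-enabled"
  else
    echo "$apache_root/conf.d"
  fi
}
```

For example: ln -s /etc/pnp4nagios/apache.conf "$(apache_conf_dir)/pnp4nagios.conf". On Apache 2.4 the more idiomatic route is to place the file in conf-available and enable it with a2enconf.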

Restart

Restart services

service nagios3 restart
service npcd start
service apache2 restart

Use

Optional

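In /etc/pnp4nagios/npcd.cfg, change log_type from syslog to file so that NPCD writes to the file named by its log_file setting instead of syslog (restart npcd afterwards):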

log_type = file
#log_type = syslog

Errors

A common error occurs when the performance data returned by a check changes in Nagios (for example, a data source is added or removed), so the existing RRD and XML files no longer match and have to be deleted. To find affected files, run:

grep -R "found extra data" /var/lib/pnp4nagios/perfdata/*/*.xml

For each file reported, delete both the .rrd file and the matching .xml file. For example, if the output is:

/var/lib/pnp4nagios/perfdata/server1/Disks.xml:    <TXT>/var/lib/pnp4nagios/perfdata/server1/Disks.rrd: found extra data on update argument: 613954</TXT>

then delete both files:

rm /var/lib/pnp4nagios/perfdata/server1/Disks.*

This also discards the historical data for that service.
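When many files are affected, the paths can be derived from the grep output instead of typed by hand. A sketch of a hypothetical helper that only prints the matching .rrd and .xml paths, leaving the actual deletion to you after reviewing the list:

```shell
# Hypothetical helper: read 'grep -R "found extra data"' output on stdin
# and print the .rrd and .xml file of each affected service.
# It only prints paths; nothing is deleted.
stale_rrd_files() {
  sed -n 's/\.xml:.*found extra data.*//p' | sort -u | while read -r base; do
    printf '%s.rrd\n%s.xml\n' "$base" "$base"
  done
}
```

For example: grep -R "found extra data" /var/lib/pnp4nagios/perfdata/*/*.xml | stale_rrd_files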

To check for other errors or messages as well, use this broader grep:

grep -R TXT /var/lib/pnp4nagios/perfdata/*/*.xml | grep -v successful

Cron job for error check

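Create /etc/cron.daily/pnp4nagios_check as below and make it executable (chmod +x /etc/cron.daily/pnp4nagios_check), because run-parts skips files that are not executable. The mailx command must be installed, for example from the bsd-mailx package.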

#!/bin/bash
# Mail any PNP4Nagios RRD update messages that do not report success.
PNP4LOC=/var/lib/pnp4nagios/perfdata
# Collect every <TXT> status line that is not a "successful" update
PNPERRORS=$(grep -R TXT "$PNP4LOC"/*/*.xml | grep -v successful)
if [ -n "$PNPERRORS" ]; then
  echo "$PNPERRORS" | mailx -s "PNP4Nagios Error" admin@example.org
fi
exit 0

Increasing RRD Resolution

The file /etc/pnp4nagios/rra.cfg defines the default RRD resolution. The default rolls up and aggregates the data quite quickly, and I prefer to keep more resolution over longer periods of time. The configurations below give finer-grained data over longer time periods. The settings only apply when an RRD file is created, so existing RRD files keep their old layout unless they are deleted and recreated (which loses their history).

On source installs the file is usually /usr/local/pnp4nagios/etc/rra.cfg; its location is set by the RRA_CFG option in /usr/local/pnp4nagios/etc/process_perfdata.cfg.

High Resolution

#
# Define the default RRA Step in seconds
# More Infos on
# http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
#
RRA_STEP=60
#
# PNP default RRA config
#
# you will get 6 MB of data per datasource
#
# 51840 entries with 1 minute step = 36 days
#
RRA:AVERAGE:0.5:1:51840
#
# 115200 entries with 5 minute step = 400 days
#
RRA:AVERAGE:0.5:5:115200
#
# 38400 entries with 30 minute step = 800 days
#
RRA:AVERAGE:0.5:30:38400
#
# 35040 entries with 60 minute step = 4 years
#
RRA:AVERAGE:0.5:60:35040

RRA:MAX:0.5:1:51840
RRA:MAX:0.5:5:115200
RRA:MAX:0.5:30:38400
RRA:MAX:0.5:60:35040

RRA:MIN:0.5:1:51840
RRA:MIN:0.5:5:115200
RRA:MIN:0.5:30:38400
RRA:MIN:0.5:60:35040

Medium Resolution

#
# Define the default RRA Step in seconds
# More Infos on
# http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
#
RRA_STEP=60
#
# PNP default RRA config
#
# you will get about 1.4 MB of data per datasource
#
# 11520 entries with 1 minute step = 8 days
#
RRA:AVERAGE:0.5:1:11520
#
# 11520 entries with 5 minute step = 40 days
#
RRA:AVERAGE:0.5:5:11520
#
# 19200 entries with 30 minute step = 400 days
#
RRA:AVERAGE:0.5:30:19200
#
# 17520 entries with 120 minute step = 4 years
#
RRA:AVERAGE:0.5:120:17520

RRA:MAX:0.5:1:11520
RRA:MAX:0.5:5:11520
RRA:MAX:0.5:30:19200
RRA:MAX:0.5:120:17520

RRA:MIN:0.5:1:11520
RRA:MIN:0.5:5:11520
RRA:MIN:0.5:30:19200
RRA:MIN:0.5:120:17520

Low Resolution

The default resolution you get on install.

#
# Define the default RRA Step in seconds
# More Infos on
# http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
#
RRA_STEP=60
#
# PNP default RRA config
#
# you will get 400kb of data per datasource
#
# 2880 entries with 1 minute step = 48 hours
#
RRA:AVERAGE:0.5:1:2880
#
# 2880 entries with 5 minute step = 10 days
#
RRA:AVERAGE:0.5:5:2880
#
# 4320 entries with 30 minute step = 90 days
#
RRA:AVERAGE:0.5:30:4320
#
# 5840 entries with 360 minute step = 4 years
#
RRA:AVERAGE:0.5:360:5840

RRA:MAX:0.5:1:2880
RRA:MAX:0.5:5:2880
RRA:MAX:0.5:30:4320
RRA:MAX:0.5:360:5840

RRA:MIN:0.5:1:2880
RRA:MIN:0.5:5:2880
RRA:MIN:0.5:30:4320
RRA:MIN:0.5:360:5840
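The retention figures in the comments above follow from rows × steps × RRA_STEP, and the size from roughly 8 bytes per stored value across the AVERAGE, MAX and MIN archives. To check them for your own rra.cfg, a small sketch (rra_summary is a hypothetical helper) prints the span of each RRA line and an estimated size, ignoring the small RRD header:

```shell
# Hypothetical helper: summarise the RRA lines of an rra.cfg file.
# Assumes RRA_STEP is in seconds (default 60) and that each stored value
# takes 8 bytes per data source; the small RRD header is ignored.
rra_summary() {
  awk -F: '
    BEGIN { step = 60 }
    /^RRA_STEP=/ { split($0, kv, "="); step = kv[2] }
    /^RRA:/ {
      rows += $5
      printf "%-8s %6d rows x %4d min = %7.1f days\n", $2, $5, $4 * step / 60, $5 * $4 * step / 86400
    }
    END { printf "total rows: %d, approx. %.1f MB per data source\n", rows, rows * 8 / 1000000 }
  ' "$1"
}
```

For example: rra_summary /etc/pnp4nagios/rra.cfg. For the high resolution set it should report 721440 rows, about 5.8 MB per data source.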

Performance Data Format

Format

'label'=value[UOM];[warn];[crit];[min];[max] 

Example plugin output with multiple data sources; the performance data follows the | character:

Access Count is OK. Response Time is OK. HTTP 2xx Count is OK. HTTP 3xx Count is OK. HTTP 4xx Count is OK. HTTP 5xx Count is OK. Access Count=23 Response Time=179357us HTTP 2xx Count=13 HTTP 3xx Count=10 HTTP 4xx Count=0 HTTP 5xx Count=0|'Access Count'=23;1500;1600;0 'Response Time'=179357us;250000;300000;0 'HTTP 2xx Count'=13;1500;1600;0 'HTTP 3xx Count'=10;350;400;0 'HTTP 4xx Count'=0;30;50;0 'HTTP 5xx Count'=0;10;15;0
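After the |, items are separated by spaces, and labels that contain spaces are wrapped in single quotes. To make the item format above concrete, the sketch below (a hypothetical helper) splits a single item into its parts. It does not split a whole perfdata string into items, and it does not handle labels that themselves contain = or escaped quotes.

```shell
# Hypothetical helper: split one perfdata item of the form
# 'label'=value[UOM];[warn];[crit];[min];[max] into its parts.
# Labels containing "=" or escaped quotes are not handled.
parse_perfdata_item() {
  printf '%s\n' "$1" | awk '{
    eq = index($0, "=")               # label ends at the first "="
    label = substr($0, 1, eq - 1)
    gsub(/^'\''|'\''$/, "", label)    # drop the surrounding quotes
    split(substr($0, eq + 1), f, ";") # value[UOM], warn, crit, min, max
    match(f[1], /^[-0-9.]+/)          # leading number is the value
    printf "label=%s\nvalue=%s\nuom=%s\n", label, substr(f[1], 1, RLENGTH), substr(f[1], RLENGTH + 1)
    printf "warn=%s\ncrit=%s\nmin=%s\nmax=%s\n", f[2], f[3], f[4], f[5]
  }'
}
```

For the 'Response Time' item from the example, parse_perfdata_item "'Response Time'=179357us;250000;300000;0" prints label=Response Time, value=179357, uom=us, warn=250000, crit=300000, min=0 and an empty max.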

Performance Data Custom Graphs

You can customize graphs by creating a custom PHP template and naming it appropriately: use the name of the Nagios check command with a .php extension. The default template shows the underlying check command name in the bottom right corner of the graph. The templates shipped with PNP are typically in /usr/local/pnp4nagios/share/templates.dist; custom templates belong in the sibling templates directory (/usr/local/pnp4nagios/share/templates), because files in templates.dist are overwritten on upgrades. Copy default.php to <command>.php and customize as required. Check the pnp4nagios Templates documentation for more information.
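Copying default.php can be wrapped in a hypothetical helper that refuses to overwrite an existing custom template. The share directory is passed in because its location is an assumption here; the Ubuntu package may install it somewhere other than /usr/local/pnp4nagios/share.

```shell
# Hypothetical helper: start a custom template for a check command by
# copying default.php from templates.dist into the templates directory.
# $1 = PNP share dir (e.g. /usr/local/pnp4nagios/share), $2 = command name.
new_pnp_template() {
  share_dir=$1
  cmd=$2
  mkdir -p "$share_dir/templates"
  if [ -e "$share_dir/templates/$cmd.php" ]; then
    echo "$share_dir/templates/$cmd.php already exists" >&2
    return 1
  fi
  cp "$share_dir/templates.dist/default.php" "$share_dir/templates/$cmd.php"
}
```

For example: new_pnp_template /usr/local/pnp4nagios/share check_http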

Also refer to the Custom Templates documentation to change which name PNP uses to select a template. PNP looks in the etc/check_commands directory (usually under /usr/local/pnp4nagios) for a file named <check_command>.cfg that controls which template is used. This is useful when several services share one Nagios command (such as check_nrpe) and you need a template per sub-command (such as check_nrpe_1arg!check_ls_memory_usage). In this case, create a file named after the check command (check_nrpe_1arg.cfg for this example, check_nrpe.cfg for services using check_nrpe) containing CUSTOM_TEMPLATE = 1, so that the sub-command name is used to pick the custom template.
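A sketch of creating such a file with a hypothetical helper. It only writes the CUSTOM_TEMPLATE = 1 setting described above; compare it with the sample files shipped in the check_commands directory before relying on it.

```shell
# Hypothetical helper: create <command>.cfg in a check_commands directory
# so PNP uses the first argument of the check command (for example
# check_ls_memory_usage in check_nrpe_1arg!check_ls_memory_usage)
# to choose the template. $1 = check_commands dir, $2 = command name.
write_template_override() {
  cc_dir=$1
  cc_cmd=$2
  mkdir -p "$cc_dir"
  printf '%s\n' '# Pick the template from argument 1 of the check command' \
    'CUSTOM_TEMPLATE = 1' > "$cc_dir/$cc_cmd.cfg"
}
```

For example: write_template_override /usr/local/pnp4nagios/etc/check_commands check_nrpe_1arg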

Other

