====== Setting up pnp4nagios on Ubuntu 12.04 LTS Precise Pangolin ======

===== Prerequisites =====

nagios3 is installed.

===== Install =====

  aptitude install pnp4nagios

===== Configuring Bulk Mode with NPCD =====

==== nagios.cfg ====

In ''/etc/nagios3/nagios.cfg'', set ''process_performance_data=1''. Check the current value with grep, then either run the sed command or edit the file by hand.

  grep process_performance_data /etc/nagios3/nagios.cfg
  sed -i 's/process_performance_data=0/process_performance_data=1/' /etc/nagios3/nagios.cfg

Create the directory where the performance data will be stored:

  mkdir -p /var/spool/pnp4nagios/nagios
  chown -R nagios:nagios /var/spool/pnp4nagios

Configure nagios.cfg to use this directory and the following files for storing the data:

  #
  # service performance data
  #
  service_perfdata_file=/var/spool/pnp4nagios/nagios/service-perfdata
  service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
  service_perfdata_file_mode=a
  service_perfdata_file_processing_interval=15
  service_perfdata_file_processing_command=pnp-bulk-service
  #
  # host performance data
  #
  host_perfdata_file=/var/spool/pnp4nagios/nagios/host-perfdata
  host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
  host_perfdata_file_mode=a
  host_perfdata_file_processing_interval=15
  host_perfdata_file_processing_command=pnp-bulk-host
  #

These settings use the ''pnp-bulk-*'' commands, with which Nagios runs ''process_perfdata.pl'' directly on the files. To hand the files to NPCD instead, point the two ''*_file_processing_command'' settings at the ''pnp-bulknpcd-*'' commands, which move the files into ''/var/spool/pnp4nagios/npcd''.

''conf.d/pnp4nagios.cfg'' should already contain the ''pnp-bulk-service'' and ''pnp-bulk-host'' commands (they come with the install). If not, paste the following into ''/etc/nagios3/pnp4nagios.cfg''. Note the path to ''process_perfdata.pl'': it may differ and is sometimes ''/usr/local/pnp4nagios/libexec''.
  define command {
    command_name    pnp-synchronous-service
    command_line    /usr/bin/perl /usr/lib/pnp4nagios/libexec/process_perfdata.pl
  }
  define command {
    command_name    pnp-synchronous-host
    command_line    /usr/bin/perl /usr/lib/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
  }
  ##############################################################################
  define command{
    command_name    pnp-bulk-service
    command_line    /usr/bin/perl /usr/lib/pnp4nagios/libexec/process_perfdata.pl --bulk=/var/spool/pnp4nagios/nagios/service-perfdata
  }
  define command{
    command_name    pnp-bulk-host
    command_line    /usr/bin/perl /usr/lib/pnp4nagios/libexec/process_perfdata.pl --bulk=/var/spool/pnp4nagios/nagios/host-perfdata
  }
  ##############################################################################
  define command{
    command_name    pnp-bulknpcd-service
    command_line    /bin/mv /var/spool/pnp4nagios/nagios/service-perfdata /var/spool/pnp4nagios/npcd/service-perfdata.$TIMET$
  }
  define command{
    command_name    pnp-bulknpcd-host
    command_line    /bin/mv /var/spool/pnp4nagios/nagios/host-perfdata /var/spool/pnp4nagios/npcd/host-perfdata.$TIMET$
  }

==== Config pnp4nagios ====

  * Edit ''/etc/default/npcd''
    * Update: ''RUN="yes"''
  * Validate: ''/etc/pnp4nagios/npcd.cfg'' or ''/usr/local/pnp4nagios/etc/npcd.cfg''
    * No change usually required

===== Apache web server configuration =====

Link the pnp4nagios web configuration into the Apache configuration:

  ln -s /etc/pnp4nagios/apache.conf /etc/apache2/conf-enabled/pnp4nagios.conf
  service apache2 restart

===== Restart =====

Restart the services:

  service nagios3 restart
  service npcd start
  service apache2 restart

===== Use =====

http://localhost/pnp4nagios/

===== Optional =====

In ''/etc/pnp4nagios/npcd.cfg'', change ''log_type'' from syslog to file:

  log_type = file
  #log_type = syslog

===== Errors =====

Errors commonly occur when monitoring parameters change in Nagios; the old XML definitions then have to be deleted.
To determine whether there are issues, run:

  grep -R "found extra data" /var/lib/pnp4nagios/perfdata/*/*.xml

For each ''.xml'' file reported, delete both the ''.rrd'' and the ''.xml'' file. For example, if the output is

  /var/lib/pnp4nagios/perfdata/server1/Disks.xml: /var/lib/pnp4nagios/perfdata/server1/Disks.rrd: found extra data on update argument: 613954

then delete as follows:

  rm /var/lib/pnp4nagios/perfdata/server1/Disks.*

Of course, this means you lose the historical data as well.

An alternative grep that also shows other errors and information:

  grep -R TXT /var/lib/pnp4nagios/perfdata/*/*.xml | grep -v successful

===== Cron job for error check =====

Create ''/etc/cron.daily/pnp4nagios_check'' as below and make it executable:

  #!/bin/bash
  #
  PNP4LOC=/var/lib/pnp4nagios/perfdata
  #
  # Count the status lines that do not report success
  PNPERRCNT=$(grep -R TXT "$PNP4LOC"/*/*.xml | grep -c -v successful)
  if [ "$PNPERRCNT" -gt 0 ]; then
    # Mail the offending lines to the administrator
    grep -R TXT "$PNP4LOC"/*/*.xml | grep -v successful | mailx -s "PNP4Nagios Error" admin@example.org
  fi
  #
  exit

===== Increasing RRD Resolution =====

''/etc/pnp4nagios/rra.cfg'' holds the default resolution. The default rolls up and aggregates the data quite quickly, and I prefer to keep more resolution over longer periods of time. The configurations in the following subsections make the resolution more fine-grained over longer time periods.
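The retention covered by an RRA line is the base step multiplied by the steps per row and the number of rows. The sketch below is a hypothetical helper (the function names ''rra_days'' and ''rra_report'' are not part of pnp4nagios) for checking the comments in an ''rra.cfg'' against its RRA lines. It assumes ''RRA_STEP'' is given in seconds and that each line has the form ''RRA:CF:xff:steps:rows''.

```shell
#!/bin/bash
# Retention in whole days covered by one RRA line.
# Arguments: RRA_STEP in seconds, steps per row, number of rows.
rra_days() {
    local step_seconds="$1" steps="$2" rows="$3"
    echo $(( step_seconds * steps * rows / 86400 ))
}

# Print the retention of every RRA line in a file such as rra.cfg.
# Arguments: RRA_STEP in seconds, path to the file.
rra_report() {
    local step_seconds="$1" file="$2"
    local line tag cf xff steps rows
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            RRA:*)
                # Split RRA:CF:xff:steps:rows on the colons.
                IFS=: read -r tag cf xff steps rows <<< "$line"
                echo "$cf steps=$steps rows=$rows days=$(rra_days "$step_seconds" "$steps" "$rows")"
                ;;
        esac
    done < "$file"
}
```

For example, ''rra_days 60 1 51840'' prints 36, and ''rra_report 60 /etc/pnp4nagios/rra.cfg'' lists the retention of each RRA line in the installed file.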
Sometimes this file is at ''/usr/local/pnp4nagios/etc/rra.cfg''. Sometimes it is referenced from ''/usr/local/pnp4nagios/etc/process_perfdata.cfg''.

==== High Resolution ====

  #
  # Define the default RRA Step in seconds
  # More Infos on
  # http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
  #
  RRA_STEP=60
  #
  # PNP default RRA config
  #
  # you will get 6 MB of data per datasource
  #
  # 51840 entries with 1 minute step = 36 days
  #
  RRA:AVERAGE:0.5:1:51840
  #
  # 115200 entries with 5 minute step = 400 days
  #
  RRA:AVERAGE:0.5:5:115200
  #
  # 38400 entries with 30 minute step = 800 days
  #
  RRA:AVERAGE:0.5:30:38400
  #
  # 35040 entries with 60 minute step = 4 years
  #
  RRA:AVERAGE:0.5:60:35040
  RRA:MAX:0.5:1:51840
  RRA:MAX:0.5:5:115200
  RRA:MAX:0.5:30:38400
  RRA:MAX:0.5:60:35040
  RRA:MIN:0.5:1:51840
  RRA:MIN:0.5:5:115200
  RRA:MIN:0.5:30:38400
  RRA:MIN:0.5:60:35040

==== Medium Resolution ====

  #
  # Define the default RRA Step in seconds
  # More Infos on
  # http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
  #
  RRA_STEP=60
  #
  # PNP default RRA config
  #
  # you will get 2.8 MB of data per datasource
  #
  # 11520 entries with 1 minute step = 8 days
  #
  RRA:AVERAGE:0.5:1:11520
  #
  # 11520 entries with 5 minute step = 40 days
  #
  RRA:AVERAGE:0.5:5:11520
  #
  # 19200 entries with 30 minute step = 400 days
  #
  RRA:AVERAGE:0.5:30:19200
  #
  # 17520 entries with 120 minute step = 4 years
  #
  RRA:AVERAGE:0.5:120:17520
  RRA:MAX:0.5:1:11520
  RRA:MAX:0.5:5:11520
  RRA:MAX:0.5:30:19200
  RRA:MAX:0.5:120:17520
  RRA:MIN:0.5:1:11520
  RRA:MIN:0.5:5:11520
  RRA:MIN:0.5:30:19200
  RRA:MIN:0.5:120:17520

==== Low Resolution ====

This is the default resolution you get on install.
  #
  # Define the default RRA Step in seconds
  # More Infos on
  # http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
  #
  RRA_STEP=60
  #
  # PNP default RRA config
  #
  # you will get 400kb of data per datasource
  #
  # 2880 entries with 1 minute step = 48 hours
  #
  RRA:AVERAGE:0.5:1:2880
  #
  # 2880 entries with 5 minute step = 10 days
  #
  RRA:AVERAGE:0.5:5:2880
  #
  # 4320 entries with 30 minute step = 90 days
  #
  RRA:AVERAGE:0.5:30:4320
  #
  # 5840 entries with 360 minute step = 4 years
  #
  RRA:AVERAGE:0.5:360:5840
  RRA:MAX:0.5:1:2880
  RRA:MAX:0.5:5:2880
  RRA:MAX:0.5:30:4320
  RRA:MAX:0.5:360:5840
  RRA:MIN:0.5:1:2880
  RRA:MIN:0.5:5:2880
  RRA:MIN:0.5:30:4320
  RRA:MIN:0.5:360:5840

===== Performance Data Format =====

Format:

  'label'=value[UOM];[warn];[crit];[min];[max]

Example showing multiple data sources:

  Access Count is OK. Response Time is OK. HTTP 2xx Count is OK. HTTP 3xx Count is OK. HTTP 4xx Count is OK. HTTP 5xx Count is OK. Access Count=23 Response Time=179357us HTTP 2xx Count=13 HTTP 3xx Count=10 HTTP 4xx Count=0 HTTP 5xx Count=0|'Access Count'=23;1500;1600;0 'Response Time'=179357us;250000;300000;0 'HTTP 2xx Count'=13;1500;1600;0 'HTTP 3xx Count'=10;350;400;0 'HTTP 4xx Count'=0;30;50;0 'HTTP 5xx Count'=0;10;15;0

===== Performance Data Custom Graphs =====

You can customize data graphs by creating a custom PHP template and naming it appropriately. The naming convention is to use the same name as the Nagios command, with a ''.php'' extension. The default template shows the underlying Nagios command name in the bottom right corner of the graph, which tells you the name to use.

Templates are typically located in ''/usr/local/pnp4nagios/share/templates.dist''. Copy ''default.php'' to a ''.php'' file named after the Nagios command and customize it as required. Check [[http://docs.pnp4nagios.org/pnp-0.4/tpl|pnp4nagios Templates]] for more information.

Also refer to [[http://docs.pnp4nagios.org/pnp-0.6/tpl_custom|Custom Templates]] to change the default behavior that decides which command name the template will use.
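The copy step described above can be sketched as a small shell function. This is only an illustration: ''new_pnp_template'' is a hypothetical name, and the target directory ''/usr/local/pnp4nagios/share/templates'' is an assumption about where custom templates live on a source install, so pass your own directories if your layout differs.

```shell
#!/bin/bash
# Hypothetical helper: start a custom template for one Nagios command
# by copying default.php to <command_name>.php.
# Arguments: command name, [directory holding default.php], [target directory].
new_pnp_template() {
    local command_name="$1"
    local dist_dir="${2:-/usr/local/pnp4nagios/share/templates.dist}"
    # Assumed location for custom templates; adjust to your install.
    local custom_dir="${3:-/usr/local/pnp4nagios/share/templates}"
    local target="$custom_dir/$command_name.php"

    if [ -z "$command_name" ]; then
        echo "usage: new_pnp_template COMMAND_NAME [DIST_DIR] [CUSTOM_DIR]" >&2
        return 2
    fi
    if [ ! -f "$dist_dir/default.php" ]; then
        echo "default.php not found in $dist_dir" >&2
        return 1
    fi
    # Refuse to clobber a template that was already customized.
    if [ -e "$target" ]; then
        echo "$target already exists, not overwriting" >&2
        return 1
    fi
    mkdir -p "$custom_dir" && cp "$dist_dir/default.php" "$target" && echo "$target"
}
```

Calling ''new_pnp_template check_http'' would create ''check_http.php'' in the target directory, ready to be edited; the command name should match the one shown in the bottom right corner of the default graph.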
PNP consults the config files (''.cfg'') in the ''etc/check_commands'' directory (usually under ''/usr/local/pnp4nagios'') to determine which template file to use for a command. This is useful when the Nagios command is the same for many services (as with ''check_nrpe'') and you need to customize per sub-command (such as ''check_nrpe_1arg!check_ls_memory_usage''). In this case, create a file ''check_nrpe.cfg'' containing ''CUSTOM_TEMPLATE = 1'' so that the sub-command name is used to select the custom template.

===== Other =====

  * [[pnp4nagios_averages|pnp4nagios extracting averages]]
  * [[pnp4nagios_graphs|pnp4nagios extracting graphs]]
  * [[http://docs.pnp4nagios.org/pnp-0.6/perfdata_format|Performance Data Format]]