Hello Ran,
This is Laurent DUFOUR (laurent.dufour@havas.com) from Paris, France.
I propose a small change to the handling of timeouts in this check-netapp-ng script, in order to avoid the message "Return code of 142 is out of bounds" from Nagios. We actually have to deal with several timeouts, as I explain below. Feel free to contact me if you need more explanation.
CLARIFICATION FOR TIMEOUTS
There are multiple timeouts we depend on:
Perl plugin timeout (utils.pm) --> $TIMEOUT --> originally 15 sec --> recommendation: raise it to 180
Net::SNMP timeout --> used in Net::SNMP->session --> originally 5 sec --> recommendation: raise it to 60
Beware that the maximum value is 60 seconds. If it is set higher, you get the error message "Can't create snmp session".
Do not forget that in Nagios you need to increase service_check_timeout to a value above $TIMEOUT_PLUGINS.
Nagios service check timeout (nagios.cfg) --> service_check_timeout=240 --> originally 30 sec
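The relationship between the three timeouts can be sketched as a small sanity check. This is only an illustration in Python (the plugin itself is Perl); the function name check_timeout_chain is hypothetical, and the rule that the SNMP timeout should stay below the plugin timeout is an assumption inferred from the recommended values (60 < 180 < 240), not something the script enforces.

```python
# Upper limit for the Net::SNMP timeout, as stated in the notes above.
SNMP_TIMEOUT_MAX = 60


def check_timeout_chain(snmp_timeout, plugin_timeout, service_check_timeout):
    """Return a list of problems found in the timeout settings (empty if none)."""
    problems = []
    if snmp_timeout > SNMP_TIMEOUT_MAX:
        problems.append("SNMP timeout exceeds the 60 second maximum")
    # Assumption: the inner SNMP timeout should fire before the plugin timeout.
    if snmp_timeout >= plugin_timeout:
        problems.append("SNMP timeout should be below the plugin timeout")
    # From the notes: service_check_timeout must be above the plugin timeout.
    if plugin_timeout >= service_check_timeout:
        problems.append("service_check_timeout should be above the plugin timeout")
    return problems


if __name__ == "__main__":
    # Recommended values from the notes: 60 / 180 / 240 seconds.
    print(check_timeout_chain(60, 180, 240))
```

With the recommended values the list is empty; raising the SNMP timeout above 60 or lowering service_check_timeout below the plugin timeout produces a message for each violated rule.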
The check for a valid session in _create_session was not working, because $sess was always filled with a hash value, even if no session had been created. I deleted this part and instead created the function check_oid_return, which returns an error if SNMP does not give back any value for the checked OID.
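The idea behind check_oid_return can be sketched as follows. This is a Python illustration only, not the Perl code from the script: the response is modelled as a dictionary mapping OIDs to values (or None when the request failed), and the return convention of a (value, error) pair is an assumption made for the example.

```python
def check_oid_return(response, oid):
    """Return (value, None) on success or (None, error message) on failure."""
    # A failed request is modelled here as None instead of a result table.
    if response is None:
        return None, "SNMP request failed: no response"
    value = response.get(oid)
    # A missing or empty value means the device gave nothing back for this OID.
    if value is None or value == "":
        return None, f"SNMP returned no value for OID {oid}"
    return value, None


if __name__ == "__main__":
    # 1.3.6.1.2.1.1.3.0 is sysUpTime, used here only as a placeholder OID.
    print(check_oid_return({"1.3.6.1.2.1.1.3.0": "12345"}, "1.3.6.1.2.1.1.3.0"))
    print(check_oid_return(None, "1.3.6.1.2.1.1.3.0"))
```

Checking the returned value rather than the session object avoids the original problem, where the session variable looked valid even though nothing could be read.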
Add read/write bytes performance data for disks and for FCP or ISCSI
in the FCPOPS and ISCSIOPS checks.
nagioscache files now contain only the necessary values
(for FCPOPS only FCP values, no NFS, CIFS or ISCSI).
All *OPS checks store the value in a global variable when
writing to the nagioscache file. Each value is now fetched only once,
so the output is more precise and uses fewer SNMP queries.
On big volumes (bigger than a 32bit counter = 4GB) values overflow to
negative. SNMP version 2c can transport 64bit values, so
it is not necessary to use the low and high 32bit parts of a 64bit number.
Fix DISKUSED on SNMPv1 when using more than 4GB of space.
New _ulong64() function for computing a 64bit number from its high
and low 32bit parts. This function computes the correct 64bit number
on 32bit operating systems.
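The arithmetic behind combining the two 32bit halves can be illustrated like this. It is a Python sketch, not the script's _ulong64(): Python integers have arbitrary precision, so it does not show how the Perl function copes with 32bit operating systems, only how negative (signed) halves are reinterpreted as unsigned before being joined.

```python
def ulong64(high, low):
    """Combine high and low 32bit parts into one unsigned 64bit number."""
    # Mask each half to 32 bits so that a counter reported as a negative
    # signed integer is reinterpreted as its unsigned value.
    high &= 0xFFFFFFFF
    low &= 0xFFFFFFFF
    return (high << 32) | low


if __name__ == "__main__":
    # A low part that overflowed to -1 is really 2**32 - 1.
    print(ulong64(0, -1))
    # High part 1, low part 0 is exactly 2**32.
    print(ulong64(1, 0))
```

The masking step is what corrects the negative values described above; the shift and OR then place the high part in the upper 32 bits.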
* Add ISCSIOPS and FCPOPS check types, similar to CIFSOPS
* New -V option sets the SNMP version (needs 2c for reading 64bit values)
* New -I option always returns the OK state (if you need only performance data)
* redesign help message and add examples
* nagioscache files are created only if the check needs caching
* each check has its own *.nagioscache file (when using multiple tests I got
inexact performance data values)
* change version to 1.2
A list of valid UOMs (units of measurement) is available here: https://nagios-plugins.org/doc/guidelines.html
Change percent to %.
Units such as "cifs ops/sec" or "nfs ops/sec" are invalid. In this case use no unit.
Invalid units can cause some problems in tools like pnp4nagios which expect a valid unit.
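The unit handling described above can be sketched as a small formatter. This is a Python illustration, not code from the script: the function name perfdata is hypothetical, and the set of valid units is transcribed from the guidelines as I understand them (no unit, s/us/ms, %, B/KB/MB/TB, c), so check it against the linked page before relying on it.

```python
# Assumption: valid UOMs as listed in the Nagios plugin guidelines.
VALID_UOMS = {"", "s", "us", "ms", "%", "B", "KB", "MB", "TB", "c"}


def perfdata(label, value, uom=""):
    """Format one performance data item as 'label'=value[UOM]."""
    if uom == "percent":
        # "percent" is not a valid unit; the guidelines use "%".
        uom = "%"
    elif uom not in VALID_UOMS:
        # Units such as "cifs ops/sec" are invalid, so emit no unit at all.
        uom = ""
    return f"'{label}'={value}{uom}"


if __name__ == "__main__":
    print(perfdata("cpu", 42, "percent"))
    print(perfdata("cifs_ops", 120, "cifs ops/sec"))
```

Dropping an unknown unit rather than passing it through keeps the output parseable by tools like pnp4nagios.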
To be able to check a few more things with your script, I added some new checks for my cluster of Palo Alto PA-5050 firewalls.
The checks that I added are:
model
ha
firmware
TCP Sessions
UDP Sessions
ICMP Sessions
Sessions