57 Commits
v1.2 ... v1.4

Author SHA1 Message Date
Markus Opolka
9c0c59d6c1 Fix unittests 2020-01-28 10:22:05 +01:00
Markus Opolka
e2fce71d5a Merge pull request #46 from bb-Ricardo/master
pull improvements from different branches together
2020-01-28 10:15:49 +01:00
Ricardo Bartels
47547951cf fixed minor bugs and added compatibility for RHEL/CentOS 7.x
* change ssl.PROTOCOL_TLS to ssl.PROTOCOL_SSLv23
* fixed bug that response var not passed outside try/except block
* fixed arrer in nagios.append_metrics()
2019-05-09 16:39:41 +02:00
Ricardo Bartels
7858382bbe Added default User-Agent header
* prevent errors for services which require this header (like Cloudflare WAF)
2019-05-09 15:53:59 +02:00
Ricardo Bartels
1173420803 updated README with current cli options 2019-05-09 15:48:29 +02:00
Ricardo Bartels
bcc36a6e95 added version information and improved help text 2019-05-09 15:44:33 +02:00
Ricardo Bartels
d98d0396b2 return more meaningful error message if parsing of data failed 2019-05-09 15:06:52 +02:00
Ricardo Bartels
8437c464e5 refine ssl insecure and client certificate options
* default TLS Protocols are now set to >= TLS1
* --cacert and --cert are no longer mandatory if option -s is used
* proper error messages if parsing of cert or key files fails
2019-05-09 14:55:25 +02:00
Ricardo Bartels
df2bbdbf51 Merge remote-tracking branch 'theicfire/master' into next-release 2019-05-09 13:38:37 +02:00
Ricardo Bartels
823fc275c9 fixed expansion on newly merged command line args 2019-05-09 13:18:34 +02:00
Ricardo Bartels
18b0898e72 Merge remote-tracking branch 'nrobert13/tg' into next-release 2019-05-09 12:39:58 +02:00
Ricardo Bartels
95318954bf fixed indentation and and print statements
* clean up from previous merges
2019-05-09 11:58:50 +02:00
Ricardo Bartels
8e469e3d98 Merge branch 'luban8' into next-release 2019-05-09 11:30:26 +02:00
Ricardo Bartels
29f8d892ee Merge branch 'ack-expand-array' into next-release 2019-05-09 11:17:51 +02:00
luban8
cbdb884dc7 Update README.md 2019-05-07 16:27:56 +02:00
luban8
3a108aef5e Update README.md 2019-05-07 16:25:00 +02:00
Martin Sura
81522fa9ab fix intedation 2019-05-07 16:23:48 +02:00
Martin Sura
27eaaf0842 Add unknown option 2019-05-07 16:15:31 +02:00
Chase Lambert
9dd6323b85 Better failure message for exact keys 2018-04-02 09:34:00 -04:00
Robert Nemeti
67136a4a2b add client ssl cert support 2018-02-15 17:04:04 +01:00
Robert Nemeti
d164a1250c add key,value non equality check, the opposite of the -q and -Q 2018-01-10 10:23:34 +01:00
Robert Nemeti
89f42c15a0 use python2.7 because on centos 6 (icinga) the default python is 2.6 and doesn;t have the required ssl libraries 2017-08-10 15:41:35 +02:00
Robert Nemeti
1e707a4b6a add repo and upstream info 2017-08-10 15:05:53 +02:00
Robert Nemeti
9656265439 print current value in the icinga message 2017-08-10 10:29:45 +02:00
Robert Nemeti
e463369671 added insecure argument for the ssl connections 2017-08-10 10:28:02 +02:00
Drew Kerrigan
357c2240ba Merge pull request #31 from thmshmm/master
fix unknown_message bug
2017-07-19 09:37:49 -07:00
Thomas Hamm
42d1e08037 fix unknown_message bug 2017-01-26 16:26:45 +00:00
Drew Kerrigan
4950225393 adding support for (*) to all flags 2016-07-19 12:43:16 -04:00
Drew Kerrigan
9be6a709a2 syntax cleanup 2016-07-19 11:02:28 -04:00
Drew Kerrigan
06fab10fe2 added (*) syntax 2016-07-18 17:22:41 -04:00
Drew Kerrigan
7bdc802c2d consistent tabs 2016-07-18 14:15:56 -04:00
Drew Kerrigan
ed7bc7175b Merge pull request #20 from berosek/master
Support for custom HTTP Headers added using -A parameter.
2016-02-23 22:11:57 +01:00
Drew Kerrigan
4180ec2066 Merge pull request #22 from artschwagerb/master
fix variable error
2016-02-23 22:10:47 +01:00
Brian Artschwager
a0d0773d1a fix variable error
NameError: global name 'critical_message' is not defined
2016-02-18 10:42:44 -05:00
Beri
fbebf05f76 Support for custom HTTP Headers added using -A parameter. 2016-01-08 16:21:00 +01:00
drewkerrigan
6f9048fc75 updating docs 2015-11-19 13:55:16 -05:00
drewkerrigan
5bb09cd362 updating docs 2015-11-19 13:55:07 -05:00
drewkerrigan
568fa6e4d0 updating docs 2015-11-19 13:54:53 -05:00
drewkerrigan
f63ac180b6 updating docs 2015-11-19 13:54:25 -05:00
drewkerrigan
070047cf55 updating docs 2015-11-19 13:52:54 -05:00
drewkerrigan
8adcf2ff07 updating docs 2015-11-19 13:49:24 -05:00
drewkerrigan
fb4e58b635 Adding support for -E, -Q, -w, -c, fixing threshold checking on -m, added UnitTest task, removed -l, -g 2015-11-18 23:07:50 -05:00
drewkerrigan
32a8884881 added ability to supply an alias for a key re: #10 2015-11-15 23:50:12 -05:00
drewkerrigan
2644151b5f adding header for nagios configuration 2015-10-05 12:40:52 -04:00
drewkerrigan
55b979f3e2 Merge branch 'master' of github.com:drewkerrigan/nagios-http-json 2015-10-05 10:36:46 -04:00
drewkerrigan
369d5115a3 extra debugging 2015-10-05 10:36:40 -04:00
Drew Kerrigan
ea7edf5d01 Merge pull request #17 from billmoritz/metrics-fix
Return metrics no matter the result
2015-08-31 12:55:17 -04:00
Bill Moritz
a4be4d42c6 Return metrics no matter the result 2015-08-31 11:36:27 -04:00
Drew Kerrigan
cb0a5927c2 Merge pull request #15 from billmoritz/http-post
Http post
2015-08-22 18:22:07 -04:00
Bill Moritz
42e75abcad Update README.md 2015-08-22 09:47:25 -04:00
Bill Moritz
3058176ba1 Add data argument
Add an option to HTTP POST data to the host.
2015-08-22 09:42:06 -04:00
Drew Kerrigan
e4334d0c4a Merge pull request #13 from nejec/master
Allow multiple key value specification
2015-07-29 11:45:04 -07:00
Jernej Porenta
b9d03c899f Allow multiple key value specification
Multiple key values can be specified by using colon delimiter.
2015-07-28 09:08:55 +02:00
Drew Kerrigan
15c5075cc1 Merge pull request #12 from MrOppermann/public-readme.me-and-script-help-are-different
synchronized readme.me and help of plugin regarding usage description
2015-07-13 17:22:42 -04:00
frederic.oppermann
1be1b2e5a2 synchronized readme.me and help of plugin regarding usage description 2015-07-13 17:05:27 +02:00
Drew Kerrigan
1772543ee3 Merge pull request #9 from invertigo/master
add arguments for http timeout and tcp port
2015-05-07 19:48:51 -04:00
root
fe2e830bf7 add arguments for http timeout and tcp port 2015-05-07 22:19:23 +00:00
4 changed files with 1262 additions and 424 deletions

405
README.md

@@ -2,12 +2,198 @@
This is a generic plugin for Nagios which checks json values from a given HTTP endpoint against argument specified rules and determines the status and performance data for that service.
### Installation ## Links
#### Requirements * [CLI Usage](#cli-usage)
* [Examples](#examples)
* [Riak Stats](docs/RIAK.md)
* [Docker](docs/DOCKER.md)
* [Nagios Installation](#nagios-installation)
* Nagios ## CLI Usage
* Python
Executing `./check_http_json.py -h` will yield the following details:
```
usage: check_http_json.py [-h] [-d] [-s] -H HOST [-k] [-V] [--cacert CACERT]
[--cert CERT] [--key KEY] [-P PORT] [-p PATH]
[-t TIMEOUT] [-B AUTH] [-D DATA] [-A HEADERS]
[-f SEPARATOR]
[-w [KEY_THRESHOLD_WARNING [KEY_THRESHOLD_WARNING ...]]]
[-c [KEY_THRESHOLD_CRITICAL [KEY_THRESHOLD_CRITICAL ...]]]
[-e [KEY_LIST [KEY_LIST ...]]]
[-E [KEY_LIST_CRITICAL [KEY_LIST_CRITICAL ...]]]
[-q [KEY_VALUE_LIST [KEY_VALUE_LIST ...]]]
[-Q [KEY_VALUE_LIST_CRITICAL [KEY_VALUE_LIST_CRITICAL ...]]]
[-u [KEY_VALUE_LIST_UNKNOWN [KEY_VALUE_LIST_UNKNOWN ...]]]
[-y [KEY_VALUE_LIST_NOT [KEY_VALUE_LIST_NOT ...]]]
[-Y [KEY_VALUE_LIST_NOT_CRITICAL [KEY_VALUE_LIST_NOT_CRITICAL ...]]]
[-m [METRIC_LIST [METRIC_LIST ...]]]
Check HTTP JSON Nagios Plugin
Generic Nagios plugin which checks json values from a given endpoint against
argument specified rules and determines the status and performance data for
that service.
Version: 1.4.0 (2019-05-09)
optional arguments:
-h, --help show this help message and exit
-d, --debug debug mode
-s, --ssl use TLS to connect to remote host
-H HOST, --host HOST remote host to query
-k, --insecure do not check server SSL certificate
-V, --version print version of this plugin
--cacert CACERT SSL CA certificate
--cert CERT SSL client certificate
--key KEY SSL client key ( if not bundled into the cert )
-P PORT, --port PORT TCP port
-p PATH, --path PATH Path
-t TIMEOUT, --timeout TIMEOUT
Connection timeout (seconds)
-B AUTH, --basic-auth AUTH
Basic auth string "username:password"
-D DATA, --data DATA The http payload to send as a POST
-A HEADERS, --headers HEADERS
The http headers in JSON format.
-f SEPARATOR, --field_separator SEPARATOR
JSON Field separator, defaults to "."; Select element
in an array with "(" ")"
-w [KEY_THRESHOLD_WARNING [KEY_THRESHOLD_WARNING ...]], --warning [KEY_THRESHOLD_WARNING [KEY_THRESHOLD_WARNING ...]]
Warning threshold for these values
(key1[>alias],WarnRange key2[>alias],WarnRange).
WarnRange is in the format [@]start:end, more
information at nagios-plugins.org/doc/guidelines.html.
-c [KEY_THRESHOLD_CRITICAL [KEY_THRESHOLD_CRITICAL ...]], --critical [KEY_THRESHOLD_CRITICAL [KEY_THRESHOLD_CRITICAL ...]]
Critical threshold for these values
(key1[>alias],CriticalRange
key2[>alias],CriticalRange. CriticalRange is in the
format [@]start:end, more information at nagios-
plugins.org/doc/guidelines.html.
-e [KEY_LIST [KEY_LIST ...]], --key_exists [KEY_LIST [KEY_LIST ...]]
Checks existence of these keys to determine status.
Return warning if key is not present.
-E [KEY_LIST_CRITICAL [KEY_LIST_CRITICAL ...]], --key_exists_critical [KEY_LIST_CRITICAL [KEY_LIST_CRITICAL ...]]
Same as -e but return critical if key is not present.
-q [KEY_VALUE_LIST [KEY_VALUE_LIST ...]], --key_equals [KEY_VALUE_LIST [KEY_VALUE_LIST ...]]
Checks equality of these keys and values
(key[>alias],value key2,value2) to determine status.
Multiple key values can be delimited with colon
(key,value1:value2). Return warning if equality check
fails
-Q [KEY_VALUE_LIST_CRITICAL [KEY_VALUE_LIST_CRITICAL ...]], --key_equals_critical [KEY_VALUE_LIST_CRITICAL [KEY_VALUE_LIST_CRITICAL ...]]
Same as -q but return critical if equality check
fails.
-u [KEY_VALUE_LIST_UNKNOWN [KEY_VALUE_LIST_UNKNOWN ...]], --key_equals_unknown [KEY_VALUE_LIST_UNKNOWN [KEY_VALUE_LIST_UNKNOWN ...]]
Same as -q but return unknown if equality check fails.
-y [KEY_VALUE_LIST_NOT [KEY_VALUE_LIST_NOT ...]], --key_not_equals [KEY_VALUE_LIST_NOT [KEY_VALUE_LIST_NOT ...]]
Checks equality of these keys and values
(key[>alias],value key2,value2) to determine status.
Multiple key values can be delimited with colon
(key,value1:value2). Return warning if equality check
succeeds
-Y [KEY_VALUE_LIST_NOT_CRITICAL [KEY_VALUE_LIST_NOT_CRITICAL ...]], --key_not_equals_critical [KEY_VALUE_LIST_NOT_CRITICAL [KEY_VALUE_LIST_NOT_CRITICAL ...]]
Same as -q but return critical if equality check
succeeds.
-m [METRIC_LIST [METRIC_LIST ...]], --key_metric [METRIC_LIST [METRIC_LIST ...]]
Gathers the values of these keys (key[>alias],
UnitOfMeasure,WarnRange,CriticalRange,Min,Max) for
Nagios performance data. More information about Range
format and units of measure for nagios can be found at
nagios-plugins.org/doc/guidelines.html Additional
formats for this parameter are: (key[>alias]),
(key[>alias],UnitOfMeasure),
(key[>alias],UnitOfMeasure,WarnRange, CriticalRange).
```
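The `-q` help text above describes rules of the form `key[>alias],value1:value2`. A minimal sketch of how such a rule could be evaluated is shown here, written for Python 3 and limited to top-level keys. The function names `split_alias` and `check_equals` are hypothetical and chosen for illustration; the failure message imitates the format used in the plugin source.

```python
def split_alias(spec):
    """Split 'key>alias' into (key, alias); without '>', the alias is the key."""
    parts = spec.split('>')
    if len(parts) == 2:
        return parts[0], parts[1]
    return spec, spec


def check_equals(data, rule):
    """Return '' if the rule passes, otherwise a failure message.

    `rule` looks like 'key[>alias],value1[:value2...]'.
    Assumption: only top-level keys of `data` are looked up in this sketch.
    """
    spec, expected = rule.split(',', 1)
    key, alias = split_alias(spec)
    # The JSON value is compared as a string against every allowed value.
    if key in data and str(data[key]) in expected.split(':'):
        return ''
    return " Key %s mismatch. %s != %s" % (alias, expected, data.get(key))


if __name__ == '__main__':
    print(repr(check_equals({"status": "green"}, "status>health,green:yellow")))
    print(repr(check_equals({"status": "red"}, "status>health,green:yellow")))
```

The alias only changes how the key is named in the message; the lookup still uses the original key.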
## Examples
### Key Naming
**Data for key** `value`:
{ "value": 1000 }
**Data for key** `capacity.value`:
{
"capacity": {
"value": 1000
}
}
**Data for key** `(0).capacity.value`:
[
{
"capacity": {
"value": 1000
}
}
]
**Data for separator** `-f _` **and key** `(0)_gauges_jvm.buffers.direct.capacity_value`:
[
{
"gauges": {
"jvm.buffers.direct.capacity": {
"value": 1000
}
}
}
]
**Data for keys** `ring_members(0)`, `ring_members(1)`, `ring_members(2)`:
{
"ring_members": [
"riak1@127.0.0.1",
"riak2@127.0.0.1",
"riak3@127.0.0.1"
]
}
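The key naming rules above can be sketched as a small resolver. This is an illustration written for Python 3, not the plugin's own `JsonHelper`: the name `resolve` and the regular expression are assumptions, and only the `(None, 'not_found')` sentinel is taken from the plugin source.

```python
import re

# A key part is an optional field name followed by zero or more "(n)" indexes.
_PART = re.compile(r'^([^()]*)((?:\(\d+\))*)$')
NOT_FOUND = (None, 'not_found')


def resolve(data, key, separator='.'):
    """Walk `data` along `key`, e.g. 'capacity.value' or 'ring_members(1)'."""
    current = data
    for part in key.split(separator):
        match = _PART.match(part)
        if match is None:
            return NOT_FOUND
        name, indexes = match.groups()
        if name:
            # A field name must address a key of a JSON object.
            if not isinstance(current, dict) or name not in current:
                return NOT_FOUND
            current = current[name]
        for index in re.findall(r'\((\d+)\)', indexes):
            # An "(n)" index must address an element of a JSON array.
            if not isinstance(current, list) or int(index) >= len(current):
                return NOT_FOUND
            current = current[int(index)]
    return current


if __name__ == '__main__':
    metrics = [{"gauges": {"jvm.buffers.direct.capacity": {"value": 1000}}}]
    print(resolve(metrics, "(0)_gauges_jvm.buffers.direct.capacity_value", "_"))
```

Choosing `_` as the separator is what lets a field name such as `jvm.buffers.direct.capacity` keep its dots.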
### Thresholds and Ranges
**Data**:
{ "metric": 1000 }
#### Relevant Commands
* **Warning:** `./check_http_json.py -H <host>:<port> -p <path> -w "metric,RANGE"`
* **Critical:** `./check_http_json.py -H <host>:<port> -p <path> -c "metric,RANGE"`
* **Metrics with Warning:** `./check_http_json.py -H <host>:<port> -p <path> -w "metric,RANGE"`
* **Metrics with Critical:**
./check_http_json.py -H <host>:<port> -p <path> -w "metric,,,RANGE"
./check_http_json.py -H <host>:<port> -p <path> -w "metric,,,,MIN,MAX"
#### Range Definitions
* **Format:** [@]START:END
* **Generates a Warning or Critical if...**
* **Value is less than 0 or greater than 1000:** `1000` or `0:1000`
* **Value is between 0 and 1000, inclusive:** `@1000` or `@0:1000`
* **Value is less than 1000:** `1000:`
* **Value is greater than 1000:** `~:1000`
* **Value is greater than or equal to 1000:** `@1000:`
More info about Nagios Range format and Units of Measure can be found at [https://nagios-plugins.org/doc/guidelines.html](https://nagios-plugins.org/doc/guidelines.html).
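The range format can be made concrete with a short sketch. It follows the Nagios plugin guidelines, where a plain range alerts when the value falls outside `START:END` and a leading `@` alerts when the value falls inside it. The function names `parse_range` and `alerts` are hypothetical, and the code is written for Python 3 rather than copied from the plugin.

```python
def parse_range(spec):
    """Parse '[@]START:END' into (invert, start, end) with float bounds."""
    invert = spec.startswith('@')
    if invert:
        spec = spec[1:]
    start, end = 0.0, float('inf')
    parts = spec.split(':')
    if len(parts) == 1:
        # A bare number N means the range 0:N.
        end = float(parts[0])
    else:
        # '~' stands for negative infinity; an empty START defaults to 0.
        start = float('-inf') if parts[0] == '~' else float(parts[0] or 0)
        if parts[1] != '':
            end = float(parts[1])
    return invert, start, end


def alerts(value, spec):
    """Return True if `value` should raise a warning/critical for `spec`."""
    invert, start, end = parse_range(spec)
    inside = start <= value <= end
    return inside if invert else not inside


if __name__ == '__main__':
    for spec in ('1000', '@0:1000', '1000:', '~:1000', '@1000:'):
        print(spec, alerts(1000, spec))
```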
#### Using Headers
* `./check_http_json.py -H <host>:<port> -p <path> -A '{"content-type": "application/json"}' -w "metric,RANGE"`
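The `-A` option takes the headers as a JSON object. A minimal sketch of how such a string could be turned into request headers is shown here. It uses Python 3's `urllib.request`, whereas the plugin itself targets Python 2.7 and `urllib2`; the function name `build_request` and the example URL are assumptions for illustration. Nothing is sent over the network.

```python
import json
import urllib.request


def build_request(url, headers_json=None, data=None):
    """Build (but do not send) a request with headers given as a JSON string."""
    body = data.encode('utf-8') if data is not None else None
    # urllib switches the method to POST when a body is supplied.
    request = urllib.request.Request(url, data=body)
    if headers_json:
        for name, value in json.loads(headers_json).items():
            request.add_header(name, value)
    return request


if __name__ == '__main__':
    req = build_request('http://localhost:8080/metrics',
                        '{"content-type": "application/json"}')
    print(req.get_method(), req.header_items())
```

Parsing the option with `json.loads` means a malformed header string fails early with a `ValueError` instead of producing a half-built request.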
## Nagios Installation
### Requirements
* Python 2.7
### Configuration
Assuming a standard installation of Nagios, the plugin can be executed from the machine that Nagios is running on.
@@ -35,222 +221,13 @@ Add the following command definition to your commands config (`commands.config`)
define command{ define command{
command_name <command_name> command_name <command_name>
command_line /usr/bin/python /usr/local/nagios/libexec/plugins/check_http_json.py -H <host>:<port> -p <path> [-e|-q|-l|-g <rules>] [-m <metrics>] command_line /usr/bin/python /usr/local/nagios/libexec/plugins/check_http_json.py -H <host>:<port> -p <path> [-e|-q|-w|-c <rules>] [-m <metrics>]
} }
``` ```
More info about options in Usage.
### CLI Usage
Executing `./check_http_json.py -h` will yield the following details:
```
usage: check_http_json.py [-h] -H HOST [-B AUTH] [-p PATH]
[-e [KEY_LIST [KEY_LIST ...]]]
[-q [KEY_VALUE_LIST [KEY_VALUE_LIST ...]]]
[-l [KEY_LTE_LIST [KEY_LTE_LIST ...]]]
[-g [KEY_GTE_LIST [KEY_GTE_LIST ...]]]
[-m [METRIC_LIST [METRIC_LIST ...]]] [-s]
[-f SEPARATOR] [-d]
Nagios plugin which checks json values from a given endpoint against argument
specified rules and determines the status and performance data for that
service
optional arguments:
-h, --help show this help message and exit
-H HOST, --host HOST Host.
-B AUTH, --basic-auth AUTH
Basic auth string "username:password"
-p PATH, --path PATH Path.
-e [KEY_LIST [KEY_LIST ...]], --key_exists [KEY_LIST [KEY_LIST ...]]
Checks existence of these keys to determine status.
-q [KEY_VALUE_LIST [KEY_VALUE_LIST ...]], --key_equals [KEY_VALUE_LIST [KEY_VALUE_LIST ...]]
Checks equality of these keys and values (key,value
key2,value2) to determine status.
-l [KEY_LTE_LIST [KEY_LTE_LIST ...]], --key_lte [KEY_LTE_LIST [KEY_LTE_LIST ...]]
Checks that these keys and values (key,value
key2,value2) are less than or equal to the returned
json value to determine status.
-g [KEY_GTE_LIST [KEY_GTE_LIST ...]], --key_gte [KEY_GTE_LIST [KEY_GTE_LIST ...]]
Checks that these keys and values (key,value
key2,value2) are greater than or equal to the returned
json value to determine status.
-m [METRIC_LIST [METRIC_LIST ...]], --key_metric [METRIC_LIST [METRIC_LIST ...]]
Gathers the values of these keys
(key,UnitOfMeasure,Min,Max,WarnRange,CriticalRange)
for Nagios performance data. More information about
Range format and units of measure for nagios can be
found at https://nagios-
plugins.org/doc/guidelines.html Additional formats for
this parameter are: (key), (key,UnitOfMeasure),
(key,UnitOfMeasure,Min,Max).
-s, --ssl HTTPS mode.
-f SEPARATOR, --field_separator SEPARATOR
Json Field separator, defaults to "." ; Select element
in an array with "(" ")"
-d, --debug Debug mode.
```
Access a specific JSON field by following this syntax: `alpha.beta.gamma(3).theta.omega(0)`
Dots are field separators (changeable), parentheses are for entering arrays.
If the root of the JSON data is itself an array like the following:
```
[
{ "gauges": { "jvm.buffers.direct.capacity": {"value": 215415}}}
]
```
The beginning of the key should start with ($index) as in this example:
```
./check_http_json.py -H localhost:8081 -p metrics --key_exists "(0)_gauges_jvm.buffers.direct.capacity_value" -f _
```
More info about Nagios Range format and Units of Measure can be found at [https://nagios-plugins.org/doc/guidelines.html](https://nagios-plugins.org/doc/guidelines.html).
### Docker Info Example Plugin
#### Description
Let's say we want to use `check_http_json.py` to read from Docker's `/info` HTTP API endpoint with the following parameters:
##### Connection information
* Host = 127.0.0.1:4243
* Path = /info
##### Rules for "aliveness"
* Verify that the key `Containers` exists in the outputted JSON
* Verify that the key `IPv4Forwarding` has a value of `1`
* Verify that the key `Debug` has a value less than or equal to `2`
* Verify that the key `Images` has a value greater than or equal to `1`
* If any of these criteria are not met, report a WARNING to Nagios
##### Gather Metrics
* Report value of the key `Containers` with a MinValue of 0 and a MaxValue of 1000 as performance data
* Report value of the key `Images` as performance data
* Report value of the key `NEventsListener` as performance data
* Report value of the key `NFd` as performance data
* Report value of the key `NGoroutines` as performance data
* Report value of the key `SwapLimit` as performance data
#### Service Definition
`localhost.cfg`
```
define service {
use local-service
host_name localhost
service_description Docker info status checker
check_command check_docker
}
```
#### Command Definition with Arguments
`commands.cfg`
```
define command{
command_name check_docker
command_line /usr/bin/python /usr/local/nagios/libexec/plugins/check_http_json.py -H 127.0.0.1:4243 -p info -e Containers -q IPv4Forwarding,1 -l Debug,2 -g Images,1 -m Containers,,0,1000 Images NEventsListener NFd NGoroutines SwapLimit
}
```
#### Sample Output
```
OK: Status OK.|'Containers'=1;0;1000 'Images'=11;0;0 'NEventsListener'=3;0;0 'NFd'=10;0;0 'NGoroutines'=14;0;0 'SwapLimit'=1;0;0
```
### Docker Container Monitor Example Plugin
`check_http_json.py` is generic enough to read and evaluate rules on any HTTP endpoint that returns JSON. In this example we'll get the status of a specific container using its ID, which can be found by using the list containers endpoint (`curl http://127.0.0.1:4243/containers/json?all=1`).
##### Connection information
* Host = 127.0.0.1:4243
* Path = /containers/2356e8ccb3de8308ccb16cf8f5d157bc85ded5c3d8327b0dfb11818222b6f615/json
##### Rules for "aliveness"
* Verify that the key `ID` exists and is equal to the value `2356e8ccb3de8308ccb16cf8f5d157bc85ded5c3d8327b0dfb11818222b6f615`
* Verify that the key `State.Running` has a value of `True`
#### Service Definition
`localhost.cfg`
```
define service {
use local-service
host_name localhost
service_description Docker container liveness check
check_command check_my_container
}
```
#### Command Definition with Arguments
`commands.cfg`
```
define command{
command_name check_my_container
command_line /usr/bin/python /usr/local/nagios/libexec/plugins/check_http_json.py -H 127.0.0.1:4243 -p /containers/2356e8ccb3de8308ccb16cf8f5d157bc85ded5c3d8327b0dfb11818222b6f615/json -q ID,2356e8ccb3de8308ccb16cf8f5d157bc85ded5c3d8327b0dfb11818222b6f615 State.Running,True
}
```
#### Sample Output
```
WARNING: Status check failed, reason: Value True for key State.Running did not match.
```
The plugin threw a warning because the Container ID I used on my system has the following State object:
```
u'State': {...
u'Running': False,
...
```
If I change the command to use the `-q` parameter `State.Running,False`, the output becomes:
```
OK: Status OK.
```
### Dropwizard / Fieldnames Containing '.' Example
Simply choose a separator to deal with data such as this:
```
{ "gauges": { "jvm.buffers.direct.capacity": {"value": 215415}}}
```
In this example I've chosen `_` to separate `gauges` from `jvm` and `capacity` from `value`. The CLI invocation then becomes:
```
./check_http_json.py -H localhost:8081 -p metrics --key_exists gauges_jvm.buffers.direct.capacity_value -f _
```
## License
Copyright 2014-2015 Drew Kerrigan.


@@ -1,47 +1,84 @@
#!/usr/bin/python #!/usr/bin/python2.7
plugin_description = \
""" """
Check HTTP JSON Nagios Plugin Check HTTP JSON Nagios Plugin
Generic Nagios plugin which checks json values from a given endpoint against argument specified rules Generic Nagios plugin which checks json values from a given endpoint against
and determines the status and performance data for that service. argument specified rules and determines the status and performance data for
that service.
""" """
import httplib, urllib, urllib2, base64 import urllib2
import base64
import json import json
import argparse import argparse
import sys
import ssl
from pprint import pprint from pprint import pprint
from urllib2 import HTTPError from urllib2 import HTTPError
from urllib2 import URLError from urllib2 import URLError
OK_CODE = 0
WARNING_CODE = 1
CRITICAL_CODE = 2
UNKNOWN_CODE = 3
__version__ = '1.4.0'
__version_date__ = '2019-05-09'
class NagiosHelper: class NagiosHelper:
"""Help with Nagios specific status string formatting.""" """Help with Nagios specific status string formatting."""
code = 0 message_prefixes = {OK_CODE: 'OK',
message_prefixes = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL', 3: 'UNKNOWN'} WARNING_CODE: 'WARNING',
message_text = '' CRITICAL_CODE: 'CRITICAL',
UNKNOWN_CODE: 'UNKNOWN'}
performance_data = '' performance_data = ''
warning_message = ''
critical_message = ''
unknown_message = ''
def getMessage(self): def getMessage(self):
"""Build a status-prefixed message with optional performance data generated externally""" """Build a status-prefixed message with optional performance data
text = "%s" % self.message_prefixes[self.code] generated externally"""
if self.message_text: text = "%s: Status %s." % (self.message_prefixes[self.getCode()],
text += ": %s" % self.message_text self.message_prefixes[self.getCode()])
text += self.warning_message
text += self.critical_message
text += self.unknown_message
if self.performance_data: if self.performance_data:
text += "|%s" % self.performance_data text += "|%s" % self.performance_data
return text return text
def setCodeAndMessage(self, code, text): def getCode(self):
self.code = code code = OK_CODE
self.message_text = text if (self.warning_message != ''):
code = WARNING_CODE
if (self.critical_message != ''):
code = CRITICAL_CODE
if (self.unknown_message != ''):
code = UNKNOWN_CODE
return code
def append_warning(self, warning_message):
self.warning_message += warning_message
def append_critical(self, critical_message):
self.critical_message += critical_message
def append_unknown(self, unknown_message):
self.unknown_message += unknown_message
def append_metrics(self, (performance_data,
warning_message, critical_message)):
self.performance_data += performance_data
self.append_warning(warning_message)
self.append_critical(critical_message)
def ok(self, text): self.setCodeAndMessage(0, text)
def warning(self, text): self.setCodeAndMessage(1, text)
def critical(self, text): self.setCodeAndMessage(2, text)
def unknown(self, text): self.setCodeAndMessage(3, text)
class JsonHelper: class JsonHelper:
"""Perform simple comparison operations against values in a given JSON dict""" """Perform simple comparison operations against values in a given
JSON dict"""
def __init__(self, json_data, separator): def __init__(self, json_data, separator):
self.data = json_data self.data = json_data
self.separator = separator self.separator = separator
@@ -59,11 +96,11 @@ class JsonHelper:
def getSubArrayElement(self, key, data): def getSubArrayElement(self, key, data):
subElemKey = key[:key.find(self.arrayOpener)] subElemKey = key[:key.find(self.arrayOpener)]
index = int(key[key.find(self.arrayOpener) + 1:key.find(self.arrayCloser)]) index = int(key[key.find(self.arrayOpener) +
1:key.find(self.arrayCloser)])
remainingKey = key[key.find(self.arrayCloser + self.separator) + 2:] remainingKey = key[key.find(self.arrayCloser + self.separator) + 2:]
if key.find(self.arrayCloser + self.separator) == -1: if key.find(self.arrayCloser + self.separator) == -1:
remainingKey = key[key.find(self.arrayCloser) + 1:] remainingKey = key[key.find(self.arrayCloser) + 1:]
if subElemKey in data: if subElemKey in data:
if index < len(data[subElemKey]): if index < len(data[subElemKey]):
return self.get(remainingKey, data[subElemKey][index]) return self.get(remainingKey, data[subElemKey][index])
@@ -75,21 +112,36 @@ class JsonHelper:
else: else:
return (None, 'not_found') return (None, 'not_found')
def equals(self, key, value): return self.exists(key) and str(self.get(key)) == value def equals(self, key, value):
def lte(self, key, value): return self.exists(key) and float(self.get(key)) <= float(value) return self.exists(key) and \
def gte(self, key, value): return self.exists(key) and float(self.get(key)) >= float(value) str(self.get(key)) in value.split(':')
def exists(self, key): return (self.get(key) != (None, 'not_found'))
def lte(self, key, value):
return self.exists(key) and float(self.get(key)) <= float(value)
def lt(self, key, value):
return self.exists(key) and float(self.get(key)) < float(value)
def gte(self, key, value):
return self.exists(key) and float(self.get(key)) >= float(value)
def gt(self, key, value):
return self.exists(key) and float(self.get(key)) > float(value)
def exists(self, key):
return (self.get(key) != (None, 'not_found'))
def get(self, key, temp_data=''): def get(self, key, temp_data=''):
"""Can navigate nested json keys with a dot format (Element.Key.NestedKey). Returns (None, 'not_found') if not found""" """Can navigate nested json keys with a dot format
(Element.Key.NestedKey). Returns (None, 'not_found') if not found"""
if temp_data: if temp_data:
data = temp_data data = temp_data
else: else:
data = self.data data = self.data
if len(key) <= 0: if len(key) <= 0:
return data return data
if key.find(self.separator) != -1 and \
if key.find(self.separator) != -1 and key.find(self.arrayOpener) != -1 : key.find(self.arrayOpener) != -1:
if key.find(self.separator) < key.find(self.arrayOpener): if key.find(self.separator) < key.find(self.arrayOpener):
return self.getSubElement(key, data) return self.getSubElement(key, data)
else: else:
@@ -106,158 +158,625 @@ class JsonHelper:
else: else:
return (None, 'not_found') return (None, 'not_found')
def expandKey(self, key, keys):
if '(*)' not in key:
keys.append(key)
return keys
subElemKey = ''
if key.find('(*)') > 0:
subElemKey = key[:key.find('(*)')-1]
remainingKey = key[key.find('(*)')+3:]
elemData = self.get(subElemKey)
if elemData == (None, 'not_found'):
keys.append(key)
return keys
if subElemKey != '':
subElemKey = subElemKey + '.'
for i in range(len(elemData)):
newKey = subElemKey + '(' + str(i) + ')' + remainingKey
newKeys = self.expandKey(newKey, [])
for j in newKeys:
keys.append(j)
return keys
def _getKeyAlias(original_key):
key = original_key
alias = original_key
if '>' in original_key:
keys = original_key.split('>')
if len(keys) == 2:
key, alias = keys
return key, alias
class JsonRuleProcessor: class JsonRuleProcessor:
"""Perform checks and gather values from a JSON dict given rules and metrics definitions""" """Perform checks and gather values from a JSON dict given rules
and metrics definitions"""
def __init__(self, json_data, rules_args): def __init__(self, json_data, rules_args):
self.data = json_data self.data = json_data
self.rules = rules_args self.rules = rules_args
separator = '.' separator = '.'
if self.rules.separator: separator = self.rules.separator if self.rules.separator:
separator = self.rules.separator
self.helper = JsonHelper(self.data, separator) self.helper = JsonHelper(self.data, separator)
debugPrint(rules_args.debug, "rules:%s" % rules_args)
debugPrint(rules_args.debug, "separator:%s" % separator) debugPrint(rules_args.debug, "separator:%s" % separator)
self.metric_list = self.expandKeys(self.rules.metric_list)
self.key_threshold_warning = self.expandKeys(
self.rules.key_threshold_warning)
self.key_threshold_critical = self.expandKeys(
self.rules.key_threshold_critical)
self.key_value_list = self.expandKeys(self.rules.key_value_list)
self.key_value_list_not = self.expandKeys(
self.rules.key_value_list_not)
self.key_list = self.expandKeys(self.rules.key_list)
self.key_value_list_critical = self.expandKeys(
self.rules.key_value_list_critical)
self.key_value_list_not_critical = self.expandKeys(
self.rules.key_value_list_not_critical)
self.key_list_critical = self.expandKeys(self.rules.key_list_critical)
self.key_value_list_unknown = self.expandKeys(
self.rules.key_value_list_unknown)
def isAlive(self): def expandKeys(self, src):
"""Return a tuple with liveness and reason for not liveness given existence, equality, and comparison rules""" if src is None:
reason = '' return
dest = []
for key in src:
newKeys = self.helper.expandKey(key, [])
for k in newKeys:
dest.append(k)
return dest
if self.rules.key_list != None: def checkExists(self, exists_list):
for k in self.rules.key_list: failure = ''
if (self.helper.exists(k) == False): for k in exists_list:
reason += " Key %s did not exist." % k key, alias = _getKeyAlias(k)
if (self.helper.exists(key) is False):
failure += " Key %s did not exist." % alias
return failure
if self.rules.key_value_list != None: def checkEquality(self, equality_list):
for kv in self.rules.key_value_list: failure = ''
for kv in equality_list:
k, v = kv.split(',') k, v = kv.split(',')
if (self.helper.equals(k, v) == False): key, alias = _getKeyAlias(k)
reason += " Value %s for key %s did not match." % (v, k) if (self.helper.equals(key, v) == False):
failure += " Key %s mismatch. %s != %s" % (alias, v,
self.helper.get(key))
return failure
if self.rules.key_lte_list != None: def checkNonEquality(self, equality_list):
for kv in self.rules.key_lte_list: failure = ''
for kv in equality_list:
k, v = kv.split(',') k, v = kv.split(',')
if (self.helper.lte(k, v) == False): key, alias = _getKeyAlias(k)
reason += " Value %s was not less than or equal to value for key %s." % (v, k) if (self.helper.equals(key, v) == True):
failure += " Key %s match found. %s == %s" % (alias, v,
self.helper.get(key))
return failure
if self.rules.key_gte_list != None: def checkThreshold(self, key, alias, r):
for kv in self.rules.key_gte_list: failure = ''
k, v = kv.split(',') invert = False
if (self.helper.gte(k, v) == False): start = 0
reason += " Value %s was not greater than or equal to value for key %s." % (v, k) end = 'infinity'
if r.startswith('@'):
invert = True
r = r[1:]
vals = r.split(':')
if len(vals) == 1:
end = vals[0]
if len(vals) == 2:
start = vals[0]
if vals[1] != '':
end = vals[1]
if(start == '~'):
if (invert and self.helper.lte(key, end)):
failure += " Value (%s) for key %s was less than or equal to %s." % \
(self.helper.get(key), alias, end)
elif (not invert and self.helper.gt(key, end)):
failure += " Value (%s) for key %s was greater than %s." % \
(self.helper.get(key), alias, end)
elif(end == 'infinity'):
if (invert and self.helper.gte(key, start)):
failure += " Value (%s) for key %s was greater than or equal to %s." % \
(self.helper.get(key), alias, start)
elif (not invert and self.helper.lt(key, start)):
failure += " Value (%s) for key %s was less than %s." % \
(self.helper.get(key), alias, start)
else:
if (invert and self.helper.gte(key, start) and
self.helper.lte(key, end)):
failure += " Value (%s) for key %s was inside the range %s:%s." % \
(self.helper.get(key), alias, start, end)
elif (not invert and (self.helper.lt(key, start) or
self.helper.gt(key, end))):
failure += " Value (%s) for key %s was outside the range %s:%s." % \
(self.helper.get(key), alias, start, end)
is_alive = (reason == '') return failure
return (is_alive, reason) def checkThresholds(self, threshold_list):
failure = ''
for threshold in threshold_list:
k, r = threshold.split(',')
key, alias = _getKeyAlias(k)
failure += self.checkThreshold(key, alias, r)
return failure
def getMetrics(self): def checkWarning(self):
"""Return a Nagios specific performance metrics string given keys and parameter definitions""" failure = ''
if self.key_threshold_warning is not None:
failure += self.checkThresholds(self.key_threshold_warning)
if self.key_value_list is not None:
failure += self.checkEquality(self.key_value_list)
if self.key_value_list_not is not None:
failure += self.checkNonEquality(self.key_value_list_not)
if self.key_list is not None:
failure += self.checkExists(self.key_list)
return failure
def checkCritical(self):
failure = ''
if self.key_threshold_critical is not None:
failure += self.checkThresholds(self.key_threshold_critical)
if self.key_value_list_critical is not None:
failure += self.checkEquality(self.key_value_list_critical)
if self.key_value_list_not_critical is not None:
failure += self.checkNonEquality(self.key_value_list_not_critical)
if self.key_list_critical is not None:
failure += self.checkExists(self.key_list_critical)
return failure
def checkUnknown(self):
unknown = ''
if self.key_value_list_unknown is not None:
unknown += self.checkEquality(self.key_value_list_unknown)
return unknown
def checkMetrics(self):
"""Return a Nagios specific performance metrics string given keys
and parameter definitions"""
metrics = '' metrics = ''
warning = ''
if self.rules.metric_list != None: critical = ''
for metric in self.rules.metric_list: if self.metric_list is not None:
for metric in self.metric_list:
key = metric key = metric
minimum = maximum = warn_range = crit_range = 0 minimum = maximum = warn_range = crit_range = None
uom = '' uom = ''
if ',' in metric: if ',' in metric:
vals = metric.split(',') vals = metric.split(',')
if len(vals) == 2: if len(vals) == 2:
key, uom = vals key, uom = vals
if len(vals) == 4: if len(vals) == 4:
key,uom,minimum,maximum = vals key, uom, warn_range, crit_range = vals
if len(vals) == 6: if len(vals) == 6:
key,uom,minimum,maximum,warn_range,crit_range = vals key, uom, warn_range, crit_range, \
minimum, maximum = vals
key, alias = _getKeyAlias(key)
if self.helper.exists(key): if self.helper.exists(key):
metrics += "'%s'=%s" % (key, self.helper.get(key)) metrics += "'%s'=%s" % (alias, self.helper.get(key))
if uom: metrics += uom if uom:
metrics += uom
if warn_range is not None:
warning += self.checkThreshold(key, alias, warn_range)
metrics += ";%s" % warn_range
if crit_range is not None:
critical += self.checkThreshold(key, alias, crit_range)
metrics += ";%s" % crit_range
if minimum is not None:
critical += self.checkThreshold(key, alias, minimum +
':')
metrics += ";%s" % minimum metrics += ";%s" % minimum
if maximum is not None:
critical += self.checkThreshold(key, alias, '~:' +
maximum)
metrics += ";%s" % maximum metrics += ";%s" % maximum
if warn_range: metrics += ";%s" % warn_range
if crit_range: metrics += ";%s" % crit_range
metrics += ' ' metrics += ' '
return ("%s" % metrics, warning, critical)
return "%s" % metrics
def parseArgs(): def parseArgs():
parser = argparse.ArgumentParser(description= parser = argparse.ArgumentParser(
'Nagios plugin which checks json values from a given endpoint against argument specified rules\ description = plugin_description + '\n\nVersion: %s (%s)'
and determines the status and performance data for that service') %(__version__, __version_date__),
formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument('-H', '--host', dest='host', required=True, help='Host.') parser.add_argument('-d', '--debug', action='store_true',
parser.add_argument('-B', '--basic-auth', dest='auth', required=False, help='Basic auth string "username:password"') help='debug mode')
parser.add_argument('-p', '--path', dest='path', help='Path.') parser.add_argument('-s', '--ssl', action='store_true',
help='use TLS to connect to remote host')
parser.add_argument('-H', '--host', dest='host',
required=not ('-V' in sys.argv or '--version' in sys.argv),
help='remote host to query')
parser.add_argument('-k', '--insecure', action='store_true',
help='do not check server SSL certificate')
parser.add_argument('-V', '--version', action='store_true',
help='print version of this plugin')
parser.add_argument('--cacert',
dest='cacert', help='SSL CA certificate')
parser.add_argument('--cert',
dest='cert', help='SSL client certificate')
parser.add_argument('--key', dest='key',
help='SSL client key ( if not bundled into the cert )')
parser.add_argument('-P', '--port', dest='port', help='TCP port')
parser.add_argument('-p', '--path', dest='path', help='Path')
parser.add_argument('-t', '--timeout', type=int,
help='Connection timeout (seconds)')
parser.add_argument('-B', '--basic-auth', dest='auth',
help='Basic auth string "username:password"')
parser.add_argument('-D', '--data', dest='data',
help='The http payload to send as a POST')
parser.add_argument('-A', '--headers', dest='headers',
help='The http headers in JSON format.')
parser.add_argument('-f', '--field_separator', dest='separator',
help='''JSON Field separator, defaults to ".";
Select element in an array with "(" ")"''')
parser.add_argument('-w', '--warning', dest='key_threshold_warning',
nargs='*',
help='''Warning threshold for these values
(key1[>alias],WarnRange key2[>alias],WarnRange).
WarnRange is in the format [@]start:end, more
information at
nagios-plugins.org/doc/guidelines.html.''')
parser.add_argument('-c', '--critical', dest='key_threshold_critical',
nargs='*',
help='''Critical threshold for these values
(key1[>alias],CriticalRange key2[>alias],CriticalRange).
CriticalRange is in the format [@]start:end, more
information at
nagios-plugins.org/doc/guidelines.html.''')
parser.add_argument('-e', '--key_exists', dest='key_list', nargs='*', parser.add_argument('-e', '--key_exists', dest='key_list', nargs='*',
help='Checks existence of these keys to determine status.') help='''Checks existence of these keys to determine
status. Return warning if key is not present.''')
parser.add_argument('-E', '--key_exists_critical',
dest='key_list_critical',
nargs='*',
help='''Same as -e but return critical if key is
not present.''')
parser.add_argument('-q', '--key_equals', dest='key_value_list', nargs='*', parser.add_argument('-q', '--key_equals', dest='key_value_list', nargs='*',
help='Checks equality of these keys and values (key,value key2,value2) to determine status.') help='''Checks equality of these keys and values
parser.add_argument('-l', '--key_lte', dest='key_lte_list', nargs='*', (key[>alias],value key2,value2) to determine status.
help='Checks that these keys and values (key,value key2,value2) are less than or equal to\ Multiple key values can be delimited with colon
the returned json value to determine status.') (key,value1:value2). Return warning if equality
parser.add_argument('-g', '--key_gte', dest='key_gte_list', nargs='*', check fails''')
help='Checks that these keys and values (key,value key2,value2) are greater than or equal to\ parser.add_argument('-Q', '--key_equals_critical',
the returned json value to determine status.') dest='key_value_list_critical', nargs='*',
help='''Same as -q but return critical if
equality check fails.''')
parser.add_argument('-u', '--key_equals_unknown',
dest='key_value_list_unknown', nargs='*',
help='''Same as -q but return unknown if
equality check fails.''')
parser.add_argument('-y', '--key_not_equals',
dest='key_value_list_not', nargs='*',
help='''Checks equality of these keys and values
(key[>alias],value key2,value2) to determine status.
Multiple key values can be delimited with colon
(key,value1:value2). Return warning if equality
check succeeds''')
parser.add_argument('-Y', '--key_not_equals_critical',
dest='key_value_list_not_critical', nargs='*',
help='''Same as -y but return critical if equality
check succeeds.''')
parser.add_argument('-m', '--key_metric', dest='metric_list', nargs='*', parser.add_argument('-m', '--key_metric', dest='metric_list', nargs='*',
help='Gathers the values of these keys (key,UnitOfMeasure,Min,Max,WarnRange,CriticalRange) for Nagios performance data.\ help='''Gathers the values of these keys (key[>alias],
More information about Range format and units of measure for nagios can be found at https://nagios-plugins.org/doc/guidelines.html\ UnitOfMeasure,WarnRange,CriticalRange,Min,Max) for
Additional formats for this parameter are: (key), (key,UnitOfMeasure), (key,UnitOfMeasure,Min,Max).') Nagios performance data. More information about Range
parser.add_argument('-s', '--ssl', action='store_true', help='HTTPS mode.') format and units of measure for nagios can be found at
parser.add_argument('-f', '--field_separator', dest='separator', help='Json Field separator, defaults to "." ; Select element in an array with "(" ")"') nagios-plugins.org/doc/guidelines.html
parser.add_argument('-d', '--debug', action='store_true', help='Debug mode.') Additional formats for this parameter are:
(key[>alias]), (key[>alias],UnitOfMeasure),
(key[>alias],UnitOfMeasure,WarnRange,
CriticalRange).''')
return parser.parse_args() return parser.parse_args()
 def debugPrint(debug_flag, message, pretty_flag=False):
     if debug_flag:
         if pretty_flag:
             pprint(message)
         else:
-            print message
+            print(message)
if __name__ == "__main__" and len(sys.argv) >= 2 and sys.argv[1] == 'UnitTest':
import unittest
class RulesHelper:
separator = '.'
debug = False
key_threshold_warning = None
key_value_list = None
key_value_list_not = None
key_list = None
key_threshold_critical = None
key_value_list_critical = None
key_value_list_not_critical = None
key_value_list_unknown = None
key_list_critical = None
metric_list = None
def dash_m(self, data):
self.metric_list = data
return self
def dash_e(self, data):
self.key_list = data
return self
def dash_E(self, data):
self.key_list_critical = data
return self
def dash_q(self, data):
self.key_value_list = data
return self
def dash_Q(self, data):
self.key_value_list_critical = data
return self
def dash_y(self, data):
self.key_value_list_not = data
return self
def dash_Y(self, data):
self.key_value_list_not_critical = data
return self
def dash_w(self, data):
self.key_threshold_warning = data
return self
def dash_c(self, data):
self.key_threshold_critical = data
return self
class UnitTest(unittest.TestCase):
rules = RulesHelper()
def check_data(self, args, jsondata, code):
data = json.loads(jsondata)
nagios = NagiosHelper()
processor = JsonRuleProcessor(data, args)
nagios.append_warning(processor.checkWarning())
nagios.append_critical(processor.checkCritical())
nagios.append_metrics(processor.checkMetrics())
self.assertEqual(code, nagios.getCode())
def test_metrics(self):
self.check_data(RulesHelper().dash_m(['metric,,1:4,1:5']),
'{"metric": 5}', WARNING_CODE)
self.check_data(RulesHelper().dash_m(['metric,,1:5,1:4']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_m(['metric,,1:5,1:5,6,10']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_m(['metric,,1:5,1:5,1,4']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_m(['metric,s,@1:4,@6:10,1,10']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_m(['(*).value,s,1:5,1:5']),
'[{"value": 5},{"value": 100}]', CRITICAL_CODE)
def test_exists(self):
self.check_data(RulesHelper().dash_e(['nothere']),
'{"metric": 5}', WARNING_CODE)
self.check_data(RulesHelper().dash_E(['nothere']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_e(['metric']),
'{"metric": 5}', OK_CODE)
def test_equality(self):
self.check_data(RulesHelper().dash_q(['metric,6']),
'{"metric": 5}', WARNING_CODE)
self.check_data(RulesHelper().dash_Q(['metric,6']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_q(['metric,5']),
'{"metric": 5}', OK_CODE)
def test_non_equality(self):
self.check_data(RulesHelper().dash_y(['metric,6']),
'{"metric": 6}', WARNING_CODE)
self.check_data(RulesHelper().dash_Y(['metric,6']),
'{"metric": 6}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_y(['metric,5']),
'{"metric": 6}', OK_CODE)
def test_warning_thresholds(self):
self.check_data(RulesHelper().dash_w(['metric,5']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_w(['metric,5:']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_w(['metric,~:5']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_w(['metric,1:5']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_w(['metric,@5']),
'{"metric": 6}', OK_CODE)
self.check_data(RulesHelper().dash_w(['metric,@5:']),
'{"metric": 4}', OK_CODE)
self.check_data(RulesHelper().dash_w(['metric,@~:5']),
'{"metric": 6}', OK_CODE)
self.check_data(RulesHelper().dash_w(['metric,@1:5']),
'{"metric": 6}', OK_CODE)
self.check_data(RulesHelper().dash_w(['metric,5']),
'{"metric": 6}', WARNING_CODE)
self.check_data(RulesHelper().dash_w(['metric,5:']),
'{"metric": 4}', WARNING_CODE)
self.check_data(RulesHelper().dash_w(['metric,~:5']),
'{"metric": 6}', WARNING_CODE)
self.check_data(RulesHelper().dash_w(['metric,1:5']),
'{"metric": 6}', WARNING_CODE)
self.check_data(RulesHelper().dash_w(['metric,@5']),
'{"metric": 5}', WARNING_CODE)
self.check_data(RulesHelper().dash_w(['metric,@5:']),
'{"metric": 5}', WARNING_CODE)
self.check_data(RulesHelper().dash_w(['metric,@~:5']),
'{"metric": 5}', WARNING_CODE)
self.check_data(RulesHelper().dash_w(['metric,@1:5']),
'{"metric": 5}', WARNING_CODE)
self.check_data(RulesHelper().dash_w(['(*).value,@1:5']),
'[{"value": 5},{"value": 1000}]', WARNING_CODE)
def test_critical_thresholds(self):
self.check_data(RulesHelper().dash_c(['metric,5']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_c(['metric,5:']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_c(['metric,~:5']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_c(['metric,1:5']),
'{"metric": 5}', OK_CODE)
self.check_data(RulesHelper().dash_c(['metric,@5']),
'{"metric": 6}', OK_CODE)
self.check_data(RulesHelper().dash_c(['metric,@5:']),
'{"metric": 4}', OK_CODE)
self.check_data(RulesHelper().dash_c(['metric,@~:5']),
'{"metric": 6}', OK_CODE)
self.check_data(RulesHelper().dash_c(['metric,@1:5']),
'{"metric": 6}', OK_CODE)
self.check_data(RulesHelper().dash_c(['metric,5']),
'{"metric": 6}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_c(['metric,5:']),
'{"metric": 4}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_c(['metric,~:5']),
'{"metric": 6}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_c(['metric,1:5']),
'{"metric": 6}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_c(['metric,@5']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_c(['metric,@5:']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_c(['metric,@~:5']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_c(['metric,@1:5']),
'{"metric": 5}', CRITICAL_CODE)
self.check_data(RulesHelper().dash_c(['(*).value,@1:5']),
'[{"value": 5},{"value": 1000}]', CRITICAL_CODE)
def test_separator(self):
rules = RulesHelper()
rules.separator = '_'
self.check_data(
rules.dash_q(
['(0)_gauges_jvm.buffers.direct.capacity(1)_value,1234']),
'''[{ "gauges": { "jvm.buffers.direct.capacity": [
{"value": 215415},{"value": 1234}]}}]''',
OK_CODE)
self.check_data(
rules.dash_q(
['(*)_gauges_jvm.buffers.direct.capacity(1)_value,1234']),
'''[{ "gauges": { "jvm.buffers.direct.capacity": [
{"value": 215415},{"value": 1234}]}},
{ "gauges": { "jvm.buffers.direct.capacity": [
{"value": 215415},{"value": 1235}]}}]''',
WARNING_CODE)
unittest.main()
exit(0)
"""Program entry point""" """Program entry point"""
if __name__ == "__main__": if __name__ == "__main__":
args = parseArgs() args = parseArgs()
nagios = NagiosHelper() nagios = NagiosHelper()
if args.version:
print('Version: %s - Date: %s' % (__version__, __version_date__))
exit(0)
if args.ssl: if args.ssl:
url = "https://%s" % args.host url = "https://%s" % args.host
context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.options |= ssl.OP_NO_SSLv2
context.options |= ssl.OP_NO_SSLv3
if args.insecure:
context.verify_mode = ssl.CERT_NONE
else:
context.verify_mode = ssl.CERT_OPTIONAL
if args.cacert:
try:
context.load_verify_locations(args.cacert)
except ssl.SSLError:
nagios.append_unknown(
''' Error loading SSL CA cert "%s"!'''
% args.cacert)
if args.cert:
try:
context.load_cert_chain(args.cert,keyfile=args.key)
except ssl.SSLError:
if args.key:
nagios.append_unknown(
''' Error loading SSL cert. Make sure key "%s" belongs to cert "%s"!'''
% (args.key, args.cert))
else:
nagios.append_unknown(
''' Error loading SSL cert. Make sure "%s" contains the key as well!'''
% (args.cert))
if nagios.getCode() != OK_CODE:
print(nagios.getMessage())
exit(nagios.getCode())
     else:
         url = "http://%s" % args.host
-    if args.path: url += "/%s" % args.path
+    if args.port:
+        url += ":%s" % args.port
+    if args.path:
+        url += "/%s" % args.path
     debugPrint(args.debug, "url:%s" % url)
json_data = ''
# Attempt to reach the endpoint
try: try:
req = urllib2.Request(url) req = urllib2.Request(url)
req.add_header("User-Agent", "check_http_json")
if args.auth: if args.auth:
base64str = base64.encodestring(args.auth).replace('\n', '') base64str = base64.encodestring(args.auth).replace('\n', '')
req.add_header('Authorization', 'Basic %s' % base64str) req.add_header('Authorization', 'Basic %s' % base64str)
response = urllib2.urlopen(req) if args.headers:
except HTTPError as e: headers = json.loads(args.headers)
nagios.unknown("HTTPError[%s], url:%s" % (str(e.code), url)) debugPrint(args.debug, "Headers:\n %s" % headers)
except URLError as e: for header in headers:
nagios.critical("URLError[%s], url:%s" % (str(e.reason), url)) req.add_header(header, headers[header])
if args.timeout and args.data:
response = urllib2.urlopen(req, timeout=args.timeout,
data=args.data, context=context)
elif args.timeout:
response = urllib2.urlopen(req, timeout=args.timeout,
context=context)
elif args.data:
response = urllib2.urlopen(req, data=args.data, context=context)
else: else:
jsondata = response.read() response = urllib2.urlopen(req, context=context)
data = json.loads(jsondata)
json_data = response.read()
except HTTPError as e:
nagios.append_unknown(" HTTPError[%s], url:%s" % (str(e.code), url))
except URLError as e:
nagios.append_critical(" URLError[%s], url:%s" % (str(e.reason), url))
try:
data = json.loads(json_data)
except ValueError as e:
nagios.append_unknown(" Parser error: %s" % str(e))
else:
         debugPrint(args.debug, 'json:')
         debugPrint(args.debug, data, True)
         # Apply rules to returned JSON data
         processor = JsonRuleProcessor(data, args)
-        is_alive, reason = processor.isAlive()
-        if is_alive:
-            # Rules all passed, attempt to get performance data
-            nagios.performance_data = processor.getMetrics()
-            nagios.ok("Status OK.")
-        else:
-            nagios.warning("Status check failed, reason:%s" % reason)
+        nagios.append_warning(processor.checkWarning())
+        nagios.append_critical(processor.checkCritical())
+        nagios.append_metrics(processor.checkMetrics())
+        nagios.append_unknown(processor.checkUnknown())
     # Print Nagios specific string and exit appropriately
-    print nagios.getMessage()
-    exit(nagios.code)
+    print(nagios.getMessage())
+    exit(nagios.getCode())
#EOF

115
docs/DOCKER.md Normal file

@@ -0,0 +1,115 @@
### Docker Info Example Plugin
#### Description
Let's say we want to use `check_http_json.py` to read from Docker's `/info` HTTP API endpoint with the following parameters:
##### Connection information
* Host = 127.0.0.1:4243
* Path = /info
##### Rules for "aliveness"
* Verify that the key `Containers` exists in the outputted JSON
* Verify that the key `IPv4Forwarding` has a value of `1`
* Verify that the key `Debug` has a value less than or equal to `2`
* Verify that the key `Images` has a value greater than or equal to `1`
* If any of these criteria are not met, report a WARNING to Nagios
##### Gather Metrics
* Report value of the key `Containers` with a MinValue of 0 and a MaxValue of 1000 as performance data
* Report value of the key `Images` as performance data
* Report value of the key `NEventsListener` as performance data
* Report value of the key `NFd` as performance data
* Report value of the key `NGoroutines` as performance data
* Report value of the key `SwapLimit` as performance data
#### Service Definition
`localhost.cfg`
```
define service {
use local-service
host_name localhost
service_description Docker info status checker
check_command check_docker
}
```
#### Command Definition with Arguments
`commands.cfg`
```
define command{
command_name check_docker
command_line /usr/bin/python /usr/local/nagios/libexec/plugins/check_http_json.py -H 127.0.0.1:4243 -p info -e Containers -q IPv4Forwarding,1 -w Debug,2:2 -c Images,1:1 -m Containers,0:250,0:500,0,1000 Images NEventsListener NFd NGoroutines SwapLimit
}
```
#### Sample Output
```
OK: Status OK.|'Containers'=1;0;1000 'Images'=11;0;0 'NEventsListener'=3;0;0 'NFd'=10;0;0 'NGoroutines'=14;0;0 'SwapLimit'=1;0;0
```
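The part of the sample output after the `|` is Nagios performance data. As a rough illustration of its shape, the sketch below splits such a line into labels, values, and the remaining semicolon-separated fields. `parse_perfdata` is a hypothetical helper written for this example; it is not part of `check_http_json.py`.

```python
import re

# One perfdata item looks like 'label'=value;field;field ...
# (hypothetical pattern, covering only the quoted-label form shown above)
PERF_ITEM = re.compile(r"'([^']+)'=([^;\s]+)((?:;[^;\s]*)*)")


def parse_perfdata(output):
    """Map each label to (value, [remaining fields]) for one plugin output line."""
    # Everything after the first '|' is performance data; empty if there is none.
    _, _, perf = output.partition('|')
    metrics = {}
    for label, value, rest in PERF_ITEM.findall(perf):
        # rest is '' or ';a;b;...'; drop the empty piece before the first ';'
        fields = rest.split(';')[1:] if rest else []
        metrics[label] = (value, fields)
    return metrics


line = "OK: Status OK.|'Containers'=1;0;1000 'Images'=11;0;0"
for label, (value, fields) in parse_perfdata(line).items():
    print(label, value, fields)
```

Values are kept as strings so that a unit of measure appended to the number is preserved.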
### Docker Container Monitor Example Plugin
`check_http_json.py` is generic enough to read and evaluate rules on any HTTP endpoint that returns JSON. In this example we'll get the status of a specific container using its ID, which can be found by using the list containers endpoint (`curl http://127.0.0.1:4243/containers/json?all=1`).
##### Connection information
* Host = 127.0.0.1:4243
* Path = /containers/2356e8ccb3de8308ccb16cf8f5d157bc85ded5c3d8327b0dfb11818222b6f615/json
##### Rules for "aliveness"
* Verify that the key `ID` exists and is equal to the value `2356e8ccb3de8308ccb16cf8f5d157bc85ded5c3d8327b0dfb11818222b6f615`
* Verify that the key `State.Running` has a value of `True`
#### Service Definition
`localhost.cfg`
```
define service {
use local-service
host_name localhost
service_description Docker container liveness check
check_command check_my_container
}
```
#### Command Definition with Arguments
`commands.cfg`
```
define command{
command_name check_my_container
command_line /usr/bin/python /usr/local/nagios/libexec/plugins/check_http_json.py -H 127.0.0.1:4243 -p /containers/2356e8ccb3de8308ccb16cf8f5d157bc85ded5c3d8327b0dfb11818222b6f615/json -q ID,2356e8ccb3de8308ccb16cf8f5d157bc85ded5c3d8327b0dfb11818222b6f615 State.Running,True
}
```
#### Sample Output
```
WARNING: Status check failed, reason: Value True for key State.Running did not match.
```
The plugin threw a warning because the Container ID I used on my system has the following State object:
```
u'State': {...
u'Running': False,
...
```
If I change the `-q` parameter to `State.Running,False`, the output becomes:
```
OK: Status OK.
```
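The mismatch above comes down to comparing the parsed JSON value with the text given on the command line. Here is a minimal sketch of such a comparison, assuming a numeric comparison is tried first with a string comparison as the fallback. The plugin's real `equals` helper is not shown here and may differ, and `value_matches` is a hypothetical name.

```python
import json


def value_matches(actual, expected):
    """Return True if `actual` equals any colon-separated candidate in `expected`."""
    candidates = expected.split(':')
    try:
        # Numeric comparison first, so that 1 and "1.0" are treated as equal.
        return float(actual) in [float(c) for c in candidates]
    except (TypeError, ValueError):
        # Fall back to the string form, e.g. the JSON value false -> "False".
        return str(actual) in candidates


state = json.loads('{"State": {"Running": false}}')
running = state["State"]["Running"]
print(value_matches(running, "True"))   # the -q State.Running,True case
print(value_matches(running, "False"))  # the -q State.Running,False case
```

Splitting on `:` mirrors the `key,value1:value2` form described in the `-q` help text.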

227
docs/RIAK.md Normal file

@@ -0,0 +1,227 @@
# Riak Stats Example
## Description
For this example we're going to use `check_http_json.py` as a pure CLI tool to read Riak's `/stats` endpoint.
## Connection information
* Host = 127.0.0.1:8098
* Path = /stats
## JSON Stats Data
* Full Riak HTTP Stats information can be found here: [http://docs.basho.com/riak/latest/dev/references/http/status/](http://docs.basho.com/riak/latest/dev/references/http/status/)
* Information related to specific interesting stats can be found here: [http://docs.basho.com/riak/latest/ops/running/stats-and-monitoring/](http://docs.basho.com/riak/latest/ops/running/stats-and-monitoring/)
## Connectivity Check
* `ring_members`: We can use an existence check to monitor the number of ring members
* `connected_nodes`: Similarly, we can check the number of nodes that are in communication with this node, but this list will be empty in a single-node cluster
#### Sample Command
For a single node dev "cluster", you might have a `ring_members` value like this:
```
"ring_members": [
"riak@127.0.0.1"
],
```
To validate that we have a single node, we can use this check:
```
$ ./check_http_json.py -H localhost -P 8098 -p stats -E "ring_members(0)"
OK: Status OK.
```
If we were expecting at least 2 nodes in the cluster, we would use this check:
```
$ ./check_http_json.py -H localhost -P 8098 -p stats -E "ring_members(1)"
CRITICAL: Status CRITICAL. Key ring_members(1) did not exist.
```
This fails because the cluster has only a single entry in `ring_members`. To get a warning instead of a critical for this check, use the lowercase `-e` flag:
```
$ ./check_http_json.py -H localhost -P 8098 -p stats -e "ring_members(1)"
WARNING: Status WARNING. Key ring_members(1) did not exist.
```
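The `ring_members(1)` syntax in these checks selects an array element by index, and nested keys are joined with the field separator (`.` by default). The sketch below shows one way such a path could be resolved. `lookup` is a hypothetical function written for this example, not the plugin's own helper, and it does not handle the `(*)` wildcard.

```python
import re

# Matches an optional key name followed by "(index)", e.g. "ring_members(0)" or "(0)"
_INDEX = re.compile(r'^(.*?)\((\d+)\)$')


def lookup(data, path, separator='.'):
    """Return (found, value) for a key path such as 'ring_members(0)'."""
    current = data
    for part in path.split(separator):
        match = _INDEX.match(part)
        if match:
            name, index = match.group(1), int(match.group(2))
        else:
            name, index = part, None
        if name:
            # Descend into an object by key name.
            if not isinstance(current, dict) or name not in current:
                return False, None
            current = current[name]
        if index is not None:
            # Descend into an array by position.
            if not isinstance(current, list) or index >= len(current):
                return False, None
            current = current[index]
    return True, current


stats = {"ring_members": ["riak@127.0.0.1"]}
print(lookup(stats, "ring_members(0)"))
print(lookup(stats, "ring_members(1)"))
```

Returning a `(found, value)` pair keeps a missing key distinct from a key whose value is `null` or `false`.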
## Gather Metrics
The thresholds for acceptable values for these metrics will vary from system to system. The following are the stats we'll be checking:
### Throughput Metrics:
* `node_gets`
* `node_puts`
* `vnode_counter_update`
* `vnode_set_update`
* `vnode_map_update`
* `search_query_throughput_one`
* `search_index_throughtput_one`
* `consistent_gets`
* `consistent_puts`
* `vnode_index_reads`
#### Sample Command
```
./check_http_json.py -H localhost -P 8098 -p stats -m \
"node_gets" \
"node_puts" \
"vnode_counter_update" \
"vnode_set_update" \
"vnode_map_update" \
"search_query_throughput_one" \
"search_index_throughtput_one" \
"consistent_gets" \
"consistent_puts" \
"vnode_index_reads"
```
#### Sample Output
```
OK: Status OK.|'node_gets'=0 'node_puts'=0 'vnode_counter_update'=0 'vnode_set_update'=0 'vnode_map_update'=0 'search_query_throughput_one'=0 'consistent_gets'=0 'consistent_puts'=0 'vnode_index_reads'=0
```
### Latency Metrics:
* `node_get_fsm_time_mean,_median,_95,_99,_100`
* `node_put_fsm_time_mean,_median,_95,_99,_100`
* `object_counter_merge_time_mean,_median,_95,_99,_100`
* `object_set_merge_time_mean,_median,_95,_99,_100`
* `object_map_merge_time_mean,_median,_95,_99,_100`
* `search_query_latency_median,_min,_95,_99,_999`
* `search_index_latency_median,_min,_95,_99,_999`
* `consistent_get_time_mean,_median,_95,_99,_100`
* `consistent_put_time_mean,_median,_95,_99,_100`
#### Sample Command
```
./check_http_json.py -H localhost -P 8098 -p stats -m \
"node_get_fsm_time_mean,,0:100,0:1000" \
"node_get_fsm_time_median,,0:100,0:1000" \
"node_get_fsm_time_95,,0:100,0:1000" \
"node_get_fsm_time_99,,0:100,0:1000" \
"node_get_fsm_time_100,,0:100,0:1000" \
"node_put_fsm_time_mean,,0:100,0:1000" \
"node_put_fsm_time_median,,0:100,0:1000" \
"node_put_fsm_time_95,,0:100,0:1000" \
"node_put_fsm_time_99,,0:100,0:1000" \
"node_put_fsm_time_100,,0:100,0:1000" \
"object_counter_merge_time_mean,,0:100,0:1000" \
"object_counter_merge_time_median,,0:100,0:1000" \
"object_counter_merge_time_95,,0:100,0:1000" \
"object_counter_merge_time_99,,0:100,0:1000" \
"object_counter_merge_time_100,,0:100,0:1000" \
"object_set_merge_time_mean,,0:100,0:1000" \
"object_set_merge_time_median,,0:100,0:1000" \
"object_set_merge_time_95,,0:100,0:1000" \
"object_set_merge_time_99,,0:100,0:1000" \
"object_set_merge_time_100,,0:100,0:1000" \
"object_map_merge_time_mean,,0:100,0:1000" \
"object_map_merge_time_median,,0:100,0:1000" \
"object_map_merge_time_95,,0:100,0:1000" \
"object_map_merge_time_99,,0:100,0:1000" \
"object_map_merge_time_100,,0:100,0:1000" \
"consistent_get_time_mean,,0:100,0:1000" \
"consistent_get_time_median,,0:100,0:1000" \
"consistent_get_time_95,,0:100,0:1000" \
"consistent_get_time_99,,0:100,0:1000" \
"consistent_get_time_100,,0:100,0:1000" \
"consistent_put_time_mean,,0:100,0:1000" \
"consistent_put_time_median,,0:100,0:1000" \
"consistent_put_time_95,,0:100,0:1000" \
"consistent_put_time_99,,0:100,0:1000" \
"consistent_put_time_100,,0:100,0:1000" \
"search_query_latency_median,,0:100,0:1000" \
"search_query_latency_min,,0:100,0:1000" \
"search_query_latency_95,,0:100,0:1000" \
"search_query_latency_99,,0:100,0:1000" \
"search_query_latency_999,,0:100,0:1000" \
"search_index_latency_median,,0:100,0:1000" \
"search_index_latency_min,,0:100,0:1000" \
"search_index_latency_95,,0:100,0:1000" \
"search_index_latency_99,,0:100,0:1000" \
"search_index_latency_999,,0:100,0:1000"
```
#### Sample Output
```
OK: Status OK.|'node_get_fsm_time_mean'=0;0:100;0:1000 'node_get_fsm_time_median'=0;0:100;0:1000 'node_get_fsm_time_95'=0;0:100;0:1000 'node_get_fsm_time_99'=0;0:100;0:1000 'node_get_fsm_time_100'=0;0:100;0:1000 'node_put_fsm_time_mean'=0;0:100;0:1000 'node_put_fsm_time_median'=0;0:100;0:1000 'node_put_fsm_time_95'=0;0:100;0:1000 'node_put_fsm_time_99'=0;0:100;0:1000 'node_put_fsm_time_100'=0;0:100;0:1000 'object_counter_merge_time_mean'=0;0:100;0:1000 'object_counter_merge_time_median'=0;0:100;0:1000 'object_counter_merge_time_95'=0;0:100;0:1000 'object_counter_merge_time_99'=0;0:100;0:1000 'object_counter_merge_time_100'=0;0:100;0:1000 'object_set_merge_time_mean'=0;0:100;0:1000 'object_set_merge_time_median'=0;0:100;0:1000 'object_set_merge_time_95'=0;0:100;0:1000 'object_set_merge_time_99'=0;0:100;0:1000 'object_set_merge_time_100'=0;0:100;0:1000 'object_map_merge_time_mean'=0;0:100;0:1000 'object_map_merge_time_median'=0;0:100;0:1000 'object_map_merge_time_95'=0;0:100;0:1000 'object_map_merge_time_99'=0;0:100;0:1000 'object_map_merge_time_100'=0;0:100;0:1000 'consistent_get_time_mean'=0;0:100;0:1000 'consistent_get_time_median'=0;0:100;0:1000 'consistent_get_time_95'=0;0:100;0:1000 'consistent_get_time_99'=0;0:100;0:1000 'consistent_get_time_100'=0;0:100;0:1000 'consistent_put_time_mean'=0;0:100;0:1000 'consistent_put_time_median'=0;0:100;0:1000 'consistent_put_time_95'=0;0:100;0:1000 'consistent_put_time_99'=0;0:100;0:1000 'consistent_put_time_100'=0;0:100;0:1000 'search_query_latency_median'=0;0:100;0:1000 'search_query_latency_min'=0;0:100;0:1000 'search_query_latency_95'=0;0:100;0:1000 'search_query_latency_99'=0;0:100;0:1000 'search_query_latency_999'=0;0:100;0:1000 'search_index_latency_median'=0;0:100;0:1000 'search_index_latency_min'=0;0:100;0:1000 'search_index_latency_95'=0;0:100;0:1000 'search_index_latency_99'=0;0:100;0:1000 'search_index_latency_999'=0;0:100;0:1000
```
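The `0:100,0:1000` pairs in the command are warning and critical ranges in the Nagios `[@]start:end` format. As a sketch of how such a range can be evaluated, assuming the semantics described in the Nagios plugin guidelines (a value outside the range raises an alert, and a leading `@` inverts that), the hypothetical helpers `parse_range` and `violates` below are illustrative only and not taken from the plugin.

```python
import math


def parse_range(spec):
    """Split a range string such as '@1:5' into (invert, start, end)."""
    invert = spec.startswith('@')
    if invert:
        spec = spec[1:]
    start, end = 0.0, math.inf          # defaults: from 0 up to infinity
    parts = spec.split(':')
    if len(parts) == 1:
        end = float(parts[0])           # "10" means 0..10
    else:
        if parts[0] == '~':
            start = -math.inf           # "~:10" means no lower bound
        elif parts[0] != '':
            start = float(parts[0])
        if parts[1] != '':
            end = float(parts[1])       # "10:" keeps end at infinity
    return invert, start, end


def violates(value, spec):
    """Return True if `value` should raise an alert for the given range."""
    invert, start, end = parse_range(spec)
    inside = start <= value <= end
    return inside if invert else not inside


for spec in ("0:100", "0:1000", "@0:100"):
    print(spec, violates(250, spec))
```

Keeping the parsing and the check separate makes it easy to reuse the same range for both the alert decision and the performance data string.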
### Erlang Resource Usage Metrics:
* `sys_process_count`
* `memory_processes`
* `memory_processes_used`
#### Sample Command
```
./check_http_json.py -H localhost -P 8098 -p stats -m \
"sys_process_count,,0:5000,0:10000" \
"memory_processes,,0:50000000,0:100000000" \
"memory_processes_used,,0:50000000,0:100000000"
```
#### Sample Output
```
OK: Status OK.|'sys_process_count'=1637;0:5000;0:10000 'memory_processes'=46481112;0:50000000;0:100000000 'memory_processes_used'=46476880;0:50000000;0:100000000
```
### General Riak Load / Health Metrics:
* `node_get_fsm_siblings_mean,_median,_95,_99,_100`
* `node_get_fsm_objsize_mean,_median,_95,_99,_100`
* `riak_search_vnodeq_mean,_median,_95,_99,_100`
* `search_index_fail_one`
* `pbc_active`
* `pbc_connects`
* `read_repairs`
* `list_fsm_active`
* `node_get_fsm_rejected`
* `node_put_fsm_rejected`
#### Sample Command
```
./check_http_json.py -H localhost -P 8098 -p stats -m \
"node_get_fsm_siblings_mean,,0:100,0:1000" \
"node_get_fsm_siblings_median,,0:100,0:1000" \
"node_get_fsm_siblings_95,,0:100,0:1000" \
"node_get_fsm_siblings_99,,0:100,0:1000" \
"node_get_fsm_siblings_100,,0:100,0:1000" \
"node_get_fsm_objsize_mean,,0:100,0:1000" \
"node_get_fsm_objsize_median,,0:100,0:1000" \
"node_get_fsm_objsize_95,,0:100,0:1000" \
"node_get_fsm_objsize_99,,0:100,0:1000" \
"node_get_fsm_objsize_100,,0:100,0:1000" \
"riak_search_vnodeq_mean,,0:100,0:1000" \
"riak_search_vnodeq_median,,0:100,0:1000" \
"riak_search_vnodeq_95,,0:100,0:1000" \
"riak_search_vnodeq_99,,0:100,0:1000" \
"riak_search_vnodeq_100,,0:100,0:1000" \
"search_index_fail_one,,0:100,0:1000" \
"pbc_active,,0:100,0:1000" \
"pbc_connects,,0:100,0:1000" \
"read_repairs,,0:100,0:1000" \
"list_fsm_active,,0:100,0:1000" \
"node_get_fsm_rejected,,0:100,0:1000" \
"node_put_fsm_rejected,,0:100,0:1000"
```
#### Sample Output
```
OK: Status OK.|'node_get_fsm_siblings_mean'=0;0:100;0:1000 'node_get_fsm_siblings_median'=0;0:100;0:1000 'node_get_fsm_siblings_95'=0;0:100;0:1000 'node_get_fsm_siblings_99'=0;0:100;0:1000 'node_get_fsm_siblings_100'=0;0:100;0:1000 'node_get_fsm_objsize_mean'=0;0:100;0:1000 'node_get_fsm_objsize_median'=0;0:100;0:1000 'node_get_fsm_objsize_95'=0;0:100;0:1000 'node_get_fsm_objsize_99'=0;0:100;0:1000 'node_get_fsm_objsize_100'=0;0:100;0:1000 'search_index_fail_one'=0;0:100;0:1000 'pbc_active'=0;0:100;0:1000 'pbc_connects'=0;0:100;0:1000 'read_repairs'=0;0:100;0:1000 'list_fsm_active'=0;0:100;0:1000 'node_get_fsm_rejected'=0;0:100;0:1000 'node_put_fsm_rejected'=0;0:100;0:1000
```