Prerequisites

NMIS version 8.3.18G or greater.

...

Basic Alerts

An alert is a custom event generated by testing the value of an OID or custom variable and producing a boolean result (true or false).  If the test returns true, an event is raised and runs through the escalation system; if it returns false, no alert is raised.  Later on, when a test that was returning true returns false again, the event is cleared.

...

Adding the alert also adds the information to the "Device Details" panel, so you get the last polled value displayed all the time.

Example

The following example shows the layout of an alert (here serialNum, taken from Model-CiscoRouter.nmis); it uses a string-based (stringwise) comparison:

...

Code Block
'cipSecGlobalActiveTunnels' => {
  'oid' => 'cipSecGlobalActiveTunnels',
  'title' => 'Global Active Tunnels',
  'alert' => {
    'test' => '$r == 0',
    'event' => 'No tunnels present',
    'level' => 'Critical'
  }
}
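
To make the raise/clear behaviour concrete, here is a minimal sketch in Perl. It is not NMIS code: the helper alert_is_true and the sequence of polled values are made up for illustration, and it only assumes that the 'test' string is evaluated with $r bound to the polled value, as the example above suggests.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of NMIS: evaluates an alert 'test'
# expression with $r bound to the polled value.
sub alert_is_true {
    my ($test, $r) = @_;
    my $result = eval $test;    # the test string refers to $r
    die "bad test expression: $@" if $@;
    return $result ? 1 : 0;
}

my $test   = '$r == 0';         # same test as in the example above
my $active = 0;                 # is the event currently raised?
my @log;

# Made-up sequence of polled values for cipSecGlobalActiveTunnels.
for my $polled (3, 0, 0, 2) {
    my $true = alert_is_true($test, $polled);
    if ($true && !$active) {
        $active = 1;
        push @log, "raise at $polled";   # test became true: raise the event
    }
    elsif (!$true && $active) {
        $active = 0;
        push @log, "clear at $polled";   # test is false again: clear the event
    }
}
print "$_\n" for @log;
```

The event is raised once when the value first hits 0, stays raised while the test remains true, and is cleared when the value becomes non-zero again.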

More Advanced Alerts

Alerts can also be created in the 'alerts' section of the model (if the model does not have that section, it can be added at the top level of the model, i.e. alongside '-common-' and 'system').  Alerts created in this section have the advantage of being able to use values from a whole section of data to determine whether the alert should be triggered.  A concrete example makes this clearer.

Code Block
%hash = (
 '-common-' => {
   # -- snip --
 },
 'system' => {
   # -- snip --
 },
 'storage' => {
   # -- snip --
 },
 'alerts' => {
   'services' => {
     'HighProcessMemoryUsage' => {
       'type' => 'test',
       'test' => 'CVAR1=hrSWRunPerfMem;$CVAR1 > 300000',
       'value' => 'CVAR1=hrSWRunPerfMem;$CVAR1 * 1',
       'unit' => 'KBytes',
       'element' => 'hrSWRunName',
       'event' => 'High Process Memory Usage',
       'level' => 'Warning'
     }
   }
 }
);

Let's break down the above example.

'services' => -- defines which section the values used for the alert are taken from.  In this case 'services' cannot be found in the model because it is a special section just for servers.  Normally you will not need to worry about special sections.
'HighProcessMemoryUsage' => -- creates a label/id for the alert.
'type' => 'test' -- means the alert will test a single condition.  The options are ['test', 'threshold-rising', 'threshold-falling'].
'test' => 'CVAR1=hrSWRunPerfMem;$CVAR1 > 300000', -- defines a custom variable and then uses that variable to perform a boolean test.  NOTE: in test mode only one custom variable can be used.
'value' => 'CVAR1=hrSWRunPerfMem;$CVAR1 * 1', -- the value that triggered the alert, which is displayed when the alert is shown in the GUI.
'unit' => 'KBytes', -- the unit that the above value will be displayed with.
'element' => 'hrSWRunName', -- the OID/value that identifies or describes the item with the problem.  In this case it shows the name of the process that has high memory usage.
'event' => 'High Process Memory Usage', -- the name of the event.
'level' => 'Warning' -- the level at which the event will be raised.  When using thresholding this is not used, as the threshold defines the level.
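
The following Perl sketch shows one way the 'CVAR1=name;expression' strings above could be interpreted. It is an illustration only, not how NMIS implements it: the helper eval_cvar_expr and the sample process row are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not NMIS code: splits 'CVAR1=name;...;expression',
# looks each name up in one row of polled data, and evaluates the expression.
sub eval_cvar_expr {
    my ($spec, $row) = @_;
    my @parts = split /;/, $spec;
    my $expr  = pop @parts;              # the last part is the expression
    my %v;                               # custom variable name => value
    for my $assign (@parts) {
        my ($var, $field) = $assign =~ /^\s*(CVAR\d*)=(\w+)\s*$/
            or die "bad assignment: $assign";
        die "no value for $field" unless defined $row->{$field};
        $v{$var} = $row->{$field};
    }
    $expr =~ s/\$(CVAR\d*)\b/\$v{$1}/g;  # rewrite $CVAR1 as $v{CVAR1}
    my $result = eval $expr;
    die "bad expression: $@" if $@;
    return $result;
}

# Made-up row from the 'services' section (one running process).
my $row = { hrSWRunName => 'mysqld', hrSWRunPerfMem => 412000 };

if (eval_cvar_expr('CVAR1=hrSWRunPerfMem;$CVAR1 > 300000', $row)) {   # 'test'
    my $value = eval_cvar_expr('CVAR1=hrSWRunPerfMem;$CVAR1 * 1', $row);  # 'value'
    printf "High Process Memory Usage: %s = %s KBytes\n",
        $row->{hrSWRunName}, $value;     # 'element', 'value' and 'unit'
}
```

With the sample row, the test is true, so the sketch reports the process name ('element') together with its memory figure ('value') and unit.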

Thresholds

While boolean tests are nice, it is often more useful to specify graded levels of acceptance instead of just on or off; thresholds allow us to do this.  Another example:

Code Block
'alerts' => {
  'storage' => {
    'HighDiskUsage' => {
      'type' => 'threshold-rising',
      'threshold' => {
        'Normal' => '70',
        'Warning' => '75',
        'Minor' => '80',
        'Major' => '95',
        'Critical' => '98',
        'Fatal' => '99',
      },
      'test' => '',
      'value' => 'CVAR1=hrStorageSize;CVAR2=hrStorageUsed;$CVAR2 / $CVAR1 * 100',
      'element' => 'hrStorageDescr',
      'unit' => '%',
      'event' => 'High Disk Usage',
      'level' => '',
      'control' => 'CVAR=hrStorageType;$CVAR =~ /Fixed Disk/',
    },
  }
}

I will just outline the differences here.

'type' => -- is set to 'threshold-rising'.  This means that 'test' is ignored; the value in 'value' is compared against the thresholds provided, and the result defines the level of the alert.

'threshold' => -- defines a set of threshold values.  The values in this hash must make sense when compared against the value defined below.
'value' => -- defines the single value that is compared against the thresholds to determine the level.  Two custom variables ($CVAR1/$CVAR2) may be defined and used to calculate the value.  They can hold any OID from within the 'storage' section defined in the model.
'test' => '' -- notice that this is blank.  It does not make sense to define it here, as a test yields a boolean and we want a non-boolean result.
'control' => -- defines a boolean; when it is true, the threshold is run against the specific item.  In this case only "Fixed Disk" items should match this alert.  If you look in the net-snmp model you will see another alert for Memory that defines different threshold values.

Also notice that 'level' is left blank; the threshold determines this value.
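
As a rough illustration of how 'control', 'value' and 'threshold' fit together, the Perl sketch below maps a computed percentage to a level. It is not NMIS code, and the mapping rule is an assumption: the hypothetical helper rising_level picks the most severe level whose threshold the value meets or exceeds, and treats anything below 'Warning' as Normal. The storage rows are made up.

```perl
use strict;
use warnings;

# Thresholds copied from the HighDiskUsage example above.
my %threshold = (
    Normal => 70, Warning => 75, Minor    => 80,
    Major  => 95, Critical => 98, Fatal   => 99,
);

# Hypothetical helper: ASSUMES a rising threshold selects the most severe
# level whose threshold the value meets or exceeds, otherwise Normal.
sub rising_level {
    my ($value, $thr) = @_;
    for my $level (qw(Fatal Critical Major Minor Warning)) {
        return $level if $value >= $thr->{$level};
    }
    return 'Normal';
}

# Made-up rows from the 'storage' section.
my @rows = (
    { hrStorageDescr => '/',               hrStorageType => 'Fixed Disk',
      hrStorageSize  => 1000,              hrStorageUsed => 820 },
    { hrStorageDescr => 'Physical memory', hrStorageType => 'Memory',
      hrStorageSize  => 1000,              hrStorageUsed => 990 },
    { hrStorageDescr => '/var',            hrStorageType => 'Fixed Disk',
      hrStorageSize  => 2000,              hrStorageUsed => 600 },
);

for my $row (@rows) {
    # 'control': only Fixed Disk items are checked by this alert
    next unless $row->{hrStorageType} =~ /Fixed Disk/;
    # 'value': $CVAR2 / $CVAR1 * 100
    my $pct = $row->{hrStorageUsed} / $row->{hrStorageSize} * 100;
    printf "%s: %.1f%% -> %s\n",
        $row->{hrStorageDescr}, $pct, rising_level($pct, \%threshold);
}
```

In this sketch the memory row is skipped by the control expression, so the different thresholds mentioned for the Memory alert would not interfere with the disk alert.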