...

The Baseline Tool now ships with the latest versions of opCharts for NMIS8 and NMIS9.

Why we need a Dynamic Baseline and Thresholding Tool

...

When a metric stays at the same level for an extended period, this is called a flatline. In that case, the standard deviation over the period is 0.

  • "threshold_period": "-60 minutes" # Default -15 min
  • "threshold_std_deviation": 0.001 # Or 0. It checks the standard deviation (stddev)
  • "threshold_exceeds": 2 # Or ignored. If not set, it will create an event every time it detects a flatline.
  • "threshold_level": "critical" # Or Major by default
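
The parameters above can be read as the following check. This is a minimal Python sketch, not the tool's actual implementation: the function names are hypothetical, and treating threshold_exceeds as a count of flatline detections that must be exceeded before an event is raised is an assumption made for illustration.

```python
from statistics import pstdev


def is_flatline(samples, threshold_std_deviation=0.001):
    """Return True when the samples in the period barely vary (hypothetical helper)."""
    if len(samples) < 2:
        return False  # not enough data points to judge
    return pstdev(samples) <= threshold_std_deviation


def should_raise_event(flatline_count, threshold_exceeds=None):
    """Assumed reading: with no threshold_exceeds, every detected flatline raises an event."""
    if threshold_exceeds is None:
        return flatline_count >= 1
    return flatline_count > threshold_exceeds
```

A metric polled over the last 60 minutes that always returns the same value gives a standard deviation of 0, so `is_flatline` reports it; a single differing sample is usually enough to push the deviation above 0.001.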

Flatline example: 

...

Flatline example with threshold exceed: 

Example:

Code Block
  "ifInErrors": {
    "baseline": "flatline",
    "active": "true",
    "metric": "ifInErrors",
    "type": "pkts_hc",
    "nodeModel": "CiscoRouter|CatalystIOS|CiscoNXOS",
    "use_index": "interface",
    "event": "Proactive Output Discards (flatline)",
    "indexed": "true",
    "threshold_std_deviation": 0.001,
    "threshold_period": "-60 minutes",
    "threshold_exceeds": "20"
  },

Simple Baseline

The simple baseline detects when the average over a selected period exceeds a threshold level.

...

Example:

Code Block
  "ifInErrors": {
    "baseline": "simplethreshold",
    "active": "true",
    "metric": "ifInErrors",
    "type": "pkts_hc",
    "nodeModel": "CiscoRouter|CatalystIOS|CiscoNXOS",
    "use_index": "interface",
    "event": "Proactive Output Discards (simplethreshold)",
    "indexed": "true",
    "threshold_period": "-120 minutes",
    "levels": {
      "Warning": 10,
      "Minor": 20,
      "Major": 30,
      "Critical": 40,
      "Fatal": 50
    }
  },

In the above graph, that would be a Fatal alert. 
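
The level selection can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the tool's code: the function name is hypothetical, and whether the tool compares with "greater than or equal" or strictly "greater than" is not stated in this document.

```python
LEVELS = {"Warning": 10, "Minor": 20, "Major": 30, "Critical": 40, "Fatal": 50}


def simple_threshold_level(samples, levels=LEVELS):
    """Return the highest level whose value the period average reaches, or None."""
    if not samples:
        return None
    average = sum(samples) / len(samples)
    matched = None
    # Walk the levels from lowest to highest value and keep the last one reached.
    for name, value in sorted(levels.items(), key=lambda item: item[1]):
        if average >= value:  # assumption: inclusive comparison
            matched = name
    return matched
```

With the levels from the example, an average of 55 over the period maps to Fatal, while an average of 5 raises nothing.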

...

Configuration of the baseline tool is done in the file /usr/local/omk/conf/Baseline.json; the default configuration should be installed when the tool is installed.
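
After editing the configuration by hand, it can help to confirm that the file still parses as JSON before the next run. The following is a small Python sketch using only the standard library; the helper name is hypothetical, and the assumption that the file is a single JSON object keyed by entry name is not confirmed by this document.

```python
import json


def list_baseline_entries(config_text):
    """Parse JSON text and return the sorted names of its top-level entries."""
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid syntax.
    config = json.loads(config_text)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(config)
```

To use it, read the configuration file named above into a string and pass it to `list_baseline_entries`; leftover Perl-style syntax such as `'key' => value` fails with an error that points at the offending position.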

...

Here is what the configuration file would look like; this example is a Same-Day Baseline:

Code Block
  "RouteNumber": {
    "active": "true",
    "metric": "RouteNumber",
    "type": "RouteNumber",
    "nodeModel": "CiscoRouter",
    "event": "Proactive Route Number Change",
    "indexed": "false",
    "threshold_exceeds": null,
    "threshold_period": "-5 minutes",
    "multiplier": 1,
    "weeks": 0,
    "hours": 8
  },

Multi-Day Dynamic Baseline Configuration Example

Another configuration option uses the BGP prefixes exchanged with BGP peers. This metric comes from systemHealth modelling, and this example is a multi-day baseline:

Code Block
  "cbgpAcceptedPrefix": {
    "active": "true",
    "metric": "cbgpAcceptedPrefix",
    "type": "bgpPrefix",
    "section": "bgpPrefix",
    "nodeModel": "CircuitMonitor|CiscoRouter",
    "event": "Proactive BGP Peer Prefix Change",
    "indexed": "true",
    "multiplier": 1,
    "weeks": 4,
    "hours": 1
  },

Delta Baseline Configuration Example

Currently delta baselines do not support multi-day, but the hours value can be very large if required.

Code Block
  "hrSystemProcesses": {
    "baseline": "delta",
    "active": "true",
    "metric": "hrSystemProcesses",
    "type": "Host_Health",
    "nodeModel": "net-snmp",
    "indexed": "false",
    "hours": 4,
    "threshold_period": "-15 minutes",
    "levels": {
      "Warning": 10,
      "Minor": 20,
      "Major": 30,
      "Critical": 40,
      "Fatal": 50
    }
  },

Delta Baseline for Output Packets Discarded Configuration Example

Currently delta baselines do not support multi-day, but the hours value can be very large if required.

Code Block
  "ifOutDiscards": {
    "baseline": "delta",
    "active": "true",
    "metric": "ifOutDiscards",
    "type": "pkts_hc",
    "use_index": "interface",
    "nodeModel": "CiscoRouter",
    "event": "Proactive Output Discards (Delta)",
    "indexed": "true",
    "hours": 1,
    "threshold_period": "-15 minutes",
    "levels": {
      "Warning": 1,
      "Minor": 2,
      "Major": 3,
      "Critical": 4,
      "Fatal": 7
    }
  },

Running the Baseline Tool

...

Code Block
/usr/local/omk/bin/baseline.exe act=run

There are some debug options to see a little more detail: debug=true, debug=2 and debug=3 are the current levels of verbosity.

...

Code Block
#
# this cron schedule runs the baseline system every 5 minutes.
#
#
# if you DON'T want any NMIS cron mails to go to root, 
# uncomment and adjust the next line
#MAILTO=prefered@domain.com
#
# m h dom month dow user command
#
# run the baseline every 5 minutes starting at 4 minutes offset from the hour.
4-59/5 * * * * root "/usr/local/omk/bin/baseline.exe" act=run > "/usr/local/omk/log/baseline.log" 2>&1
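
To see which minutes a field such as 4-59/5 selects, the range-with-step form can be expanded by hand. The Python sketch below is only an illustration of standard cron semantics for this one form; the function name is hypothetical and it does not handle lists, names, or `*`.

```python
def expand_minute_field(field):
    """Expand a cron minute field of the form 'start-end/step' into a list of minutes."""
    span, step = field.split("/")
    start, end = (int(part) for part in span.split("-"))
    # Cron ranges are inclusive, so the end value itself may be selected.
    return list(range(start, end + 1, int(step)))
```

For `4-59/5` this yields minutes 4, 9, 14 and so on up to 59, which is twelve runs per hour, one every 5 minutes.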

Using Group Regex and Cron for Parallel Processing.

...

Code Block
# run the baseline every 5 minutes starting at 3 and 4 minutes offset from the hour.
3-58/5 * * * * root /usr/local/omk/bin/baseline.exe act=run group_regex="Core|Dist" > /usr/local/omk/log/baseline1.log 2>&1
4-59/5 * * * * root /usr/local/omk/bin/baseline.exe act=run group_regex="Access" > /usr/local/omk/log/baseline2.log 2>&1
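
Before splitting the work across cron entries, it is worth checking that every group is picked up by one of the expressions. The following Python sketch is an assumption-laden illustration: the job labels and group names are made up, and it assumes the tool applies group_regex as an unanchored match against the group name, which this document does not confirm.

```python
import re

# Hypothetical job labels mapped to the group_regex values from the schedule above.
JOBS = {"baseline1": r"Core|Dist", "baseline2": r"Access"}


def assign_groups(groups, jobs=JOBS):
    """Map each job label to the groups its regex matches (unanchored search)."""
    return {
        job: [group for group in groups if re.search(pattern, group)]
        for job, pattern in jobs.items()
    }
```

Under these assumptions, a group named "Lab" would match neither expression and so would not be processed by either cron entry, which is the kind of gap this check is meant to reveal.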

...