...
The Baseline Tool now ships with the latest versions of opCharts for NMIS8 and NMIS9.
Why we need a Dynamic Baseline and Thresholding Tool
...
When a metric stays at the same level for an extended period, it is called a flatline. This means the standard deviation is 0.
- "threshold_period": "-60 minutes" # Default is -15 minutes.
- "threshold_std_deviation": 0.001 # Or 0. The standard deviation (stddev) to check against.
- "threshold_exceeds": 2 # Optional. If not set, an event is created every time a flatline is detected.
- "threshold_level": "critical" # Default is Major.
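The way these options interact can be sketched in Python. This is an illustration only, not the tool's actual implementation: the function names, the list-of-windows input, and the reading of `threshold_exceeds` as a count of consecutive flat runs that must be reached before an event is raised are all assumptions.

```python
from statistics import pstdev


def is_flatline(samples, threshold_std_deviation=0.001):
    """Return True when the samples barely vary over the period."""
    if len(samples) < 2:
        return False  # not enough data to judge
    return pstdev(samples) <= threshold_std_deviation


def should_raise_event(windows, threshold_std_deviation=0.001,
                       threshold_exceeds=None):
    """Decide whether the most recent run should raise an event.

    windows: one list of samples per baseline run, oldest first.
    Assumption: threshold_exceeds counts consecutive flat runs.
    """
    consecutive = 0
    for samples in windows:
        if is_flatline(samples, threshold_std_deviation):
            consecutive += 1
        else:
            consecutive = 0  # any variation resets the count
    if threshold_exceeds is None:
        # No limit set: every detected flatline raises an event.
        return consecutive >= 1
    return consecutive >= threshold_exceeds
```

With `threshold_exceeds` unset, a single flat window is enough; with `threshold_exceeds=2`, two flat windows in a row are needed under the assumption above.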
Flatline example:
...
Flatline example with threshold exceed:
Example:
Code Block |
---|
"ifInErrors": { "baseline": "flatline", "active": "true", "metric": "ifInErrors", "type": "pkts_hc", "nodeModel": "CiscoRouter|CatalystIOS|CiscoNXOS", "use_index": "interface", "event": "Proactive Output Discards (flatline)", "indexed": "true", "threshold_std_deviation": 0.001, "threshold_period": "-60 minutes", "threshold_exceeds": "20" }, |
Simple Baseline
The simple baseline detects when the average over a selected period rises above a threshold level.
...
Example:
Code Block |
---|
"ifInErrors": { "baseline": "simplethreshold", "active": "true", "metric": "ifInErrors", "type": "pkts_hc", "nodeModel": "CiscoRouter|CatalystIOS|CiscoNXOS", "use_index": "interface", "event": "Proactive Output Discards (simplethreshold)", "indexed": "true", "threshold_period": "-120 minutes", "levels": { "Warning": 10, "Minor": 20, "Major": 30, "Critical": 40, "Fatal": 50 } }, |
In the above graph, that would be a Fatal alert.
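The level selection can be sketched in Python using the level values from the configuration above. This is a minimal sketch, not the tool's code: the function name is hypothetical, and treating a level as breached when the average is greater than or equal to its value is an assumption.

```python
LEVELS = {"Warning": 10, "Minor": 20, "Major": 30, "Critical": 40, "Fatal": 50}


def simple_threshold_level(samples, levels=LEVELS):
    """Return the highest level reached by the period average, or None."""
    if not samples:
        return None
    average = sum(samples) / len(samples)
    # Collect every level whose value the average reaches (assumed >=).
    breached = [(value, name) for name, value in levels.items()
                if average >= value]
    # The most severe breached level is the one with the largest value.
    return max(breached)[1] if breached else None
```

For example, samples averaging 60 over the period would map to Fatal, while samples averaging 13 would map to Warning.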
...
The Baseline tool is configured in the file /usr/local/omk/conf/Baseline.json; a default configuration should be installed along with the tool.
...
Here is what the configuration file would look like; this example is a same-day baseline:
Code Block |
---|
"RouteNumber": { "active": "true", "metric": "RouteNumber", "type": "RouteNumber", "nodeModel": "CiscoRouter", "event": "Proactive Route Number Change", "indexed": "false", "threshold_exceeds": null, "threshold_period": "-5 minutes", "multiplier": 1, "weeks": 0, "hours": 8 }, |
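Because the configuration is JSON, a hand-edited entry can be checked for syntax errors before the tool runs. The sketch below is not part of the Baseline tool: the function name is hypothetical, and it assumes the entries sit in a JSON object keyed by baseline name, which is how the fragment above is wrapped here.

```python
import json

# The same-day example, wrapped in braces so it parses on its own.
ENTRY_TEXT = """
{
  "RouteNumber": {
    "active": "true",
    "metric": "RouteNumber",
    "type": "RouteNumber",
    "nodeModel": "CiscoRouter",
    "event": "Proactive Route Number Change",
    "indexed": "false",
    "threshold_exceeds": null,
    "threshold_period": "-5 minutes",
    "multiplier": 1,
    "weeks": 0,
    "hours": 8
  }
}
"""


def active_baselines(text):
    """Parse the text as JSON and return the names of active entries.

    Raises json.JSONDecodeError on invalid JSON, for example a
    trailing comma or a bare undef value.
    """
    config = json.loads(text)
    return sorted(name for name, entry in config.items()
                  if isinstance(entry, dict) and entry.get("active") == "true")


print(active_baselines(ENTRY_TEXT))
```

Note that `active` and `indexed` are the strings "true" and "false" in these examples, not JSON booleans, so the check compares against the string.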
Multi-Day Dynamic Baseline Configuration Example
Another configuration example uses the BGP prefixes exchanged with BGP peers. This metric comes from systemHealth modelling, and the example is a multi-day baseline:
Code Block |
---|
"cbgpAcceptedPrefix": { "active": "true", "metric": "cbgpAcceptedPrefix", "type": "bgpPrefix", "section": "bgpPrefix", "nodeModel": "CircuitMonitor|CiscoRouter", "event": "Proactive BGP Peer Prefix Change", "indexed": "true", "multiplier": 1, "weeks": 4, "hours": 1 }, |
Delta Baseline Configuration Example
Currently delta baselines do not support multi-day, but the hours value can be very large if required.
Code Block |
---|
"hrSystemProcesses": { "baseline": "delta", "active": "true", "metric": "hrSystemProcesses", "type": "Host_Health", "nodeModel": "net-snmp", "indexed": "false", "hours": 4, "threshold_period": "-15 minutes", "levels": { "Warning": 10, "Minor": 20, "Major": 30, "Critical": 40, "Fatal": 50 } }, |
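One plausible reading of this configuration can be sketched in Python. Everything about the calculation here is an assumption rather than the tool's documented behaviour: the sketch treats the delta as the percentage change between the average over the long window (`hours`) and the average over the recent window (`threshold_period`), and treats the `levels` values as percentages. The function names are hypothetical.

```python
LEVELS = {"Warning": 10, "Minor": 20, "Major": 30, "Critical": 40, "Fatal": 50}


def mean(values):
    return sum(values) / len(values)


def delta_level(baseline_samples, recent_samples, levels=LEVELS):
    """Return the level matching the percentage change, or None.

    baseline_samples: values from the long window (assumed: "hours").
    recent_samples: values from the short window (assumed: "threshold_period").
    """
    if not baseline_samples or not recent_samples:
        return None
    baseline_avg = mean(baseline_samples)
    if baseline_avg == 0:
        return None  # percentage change is undefined for a zero baseline
    change = abs(mean(recent_samples) - baseline_avg) / abs(baseline_avg) * 100.0
    breached = [(value, name) for name, value in levels.items()
                if change >= value]
    return max(breached)[1] if breached else None
```

Under these assumptions, a process count that averaged 100 over four hours and 125 over the last 15 minutes is a 25% change, which would map to Minor.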
Delta Baseline for Output Packets Discarded Configuration Example
Currently delta baselines do not support multi-day, but the hours value can be very large if required.
Code Block |
---|
"ifOutDiscards": { "baseline": "delta", "active": "true", "metric": "ifOutDiscards", "type": "pkts_hc", "use_index": "interface", "nodeModel": "CiscoRouter", "event": "Proactive Output Discards (Delta)", "indexed": "true", "hours": 1, "threshold_period": "-15 minutes", "levels": { "Warning": 1, "Minor": 2, "Major": 3, "Critical": 4, "Fatal": 7 } }, |
Running the Baseline Tool
...
Code Block |
---|
/usr/local/omk/bin/baseline.exe act=run |
Debug options show a little more detail: debug=true, debug=2 and debug=3 are the current levels of verbosity.
...
Code Block |
---|
# # this cron schedule runs the baseline system every 5 minutes. # # # if you DON'T want any NMIS cron mails to go to root, # uncomment and adjust the next line #MAILTO=prefered@domain.com # # m h dom month dow user command # # run the baseline every 5 minutes starting at 4 minutes offset from the hour. 4-59/5 * * * * root "/usr/local/omk/bin/baseline.exe" act=run > "/usr/local/omk/log/baseline.log" 2>&1 |
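The minute field `4-59/5` uses cron's range-with-step syntax. The short Python sketch below expands it to show exactly when the job fires; the helper name is hypothetical and it handles only the `start-end/step` form used here.

```python
def cron_minutes(field):
    """Expand a cron minute field of the form 'start-end/step'."""
    span, step = field.split("/")
    start, end = (int(part) for part in span.split("-"))
    # Cron ranges are inclusive, so the end value itself can match.
    return list(range(start, end + 1, int(step)))


print(cron_minutes("4-59/5"))
# [4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59]
```

The job therefore runs twelve times an hour, each run four minutes after a five-minute boundary.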
Using Group Regex and Cron for Parallel Processing
...
Code Block |
---|
# run the baseline every 5 minutes starting at 3 and 4 minutes offset from the hour. 3-58/5 * * * * root /usr/local/omk/bin/baseline.exe act=run group_regex="Core|Dist" > /usr/local/omk/log/baseline1.log 2>&1 4-59/5 * * * * root /usr/local/omk/bin/baseline.exe act=run group_regex="Access" > /usr/local/omk/log/baseline2.log 2>&1 |
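When splitting the work this way, it helps to confirm that every group is covered by exactly one cron entry. The Python sketch below checks that, assuming `group_regex` is applied as an unanchored regular expression against each node's group name (an assumption about the tool's matching). The group names are made up for illustration.

```python
import re


def check_partition(groups, regexes):
    """Return (unmatched, overlapping) group names for the given regexes."""
    compiled = [re.compile(pattern) for pattern in regexes]
    unmatched, overlapping = [], []
    for group in groups:
        hits = sum(1 for pattern in compiled if pattern.search(group))
        if hits == 0:
            unmatched.append(group)    # no cron entry processes this group
        elif hits > 1:
            overlapping.append(group)  # processed by more than one entry
    return unmatched, overlapping


# Hypothetical group names; the regexes are the ones from the cron entries.
groups = ["Core_Network", "Distribution", "Access_Branch", "DataCenter"]
print(check_partition(groups, ["Core|Dist", "Access"]))
# (['DataCenter'], [])
```

In this made-up example the DataCenter group matches neither expression, so under the stated assumption it would not be processed by either job and would need its own entry or a broader regex.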
...