...

The key point is that the baseline tool detects downward changes as well as upward changes, so you would also be alerted if your traffic dropped below the baseline range.
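To illustrate the two-sided check, here is a minimal Python sketch. It assumes the baseline range is the mean of the historical values plus or minus `multiplier` standard deviations; the function names, the sample data, and the choice of population standard deviation are illustrative assumptions, not the tool's actual implementation.

```python
from statistics import mean, pstdev

def baseline_band(history, multiplier=1.0):
    """Return (lower, upper) as mean -/+ multiplier * standard deviation (assumed model)."""
    centre = mean(history)
    spread = pstdev(history) * multiplier
    return centre - spread, centre + spread

def classify(current, history, multiplier=1.0):
    """Report whether the current value is above, below, or within the band."""
    lower, upper = baseline_band(history, multiplier)
    if current > upper:
        return "above"
    if current < lower:
        return "below"
    return "normal"

# Hypothetical traffic samples in Mbps.
history = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
print(classify(9.0, history))  # an upward change
print(classify(1.0, history))  # a downward change
print(classify(5.2, history))  # within the band
```

Both the upward and the downward excursion fall outside the band, so either one would be a candidate for an event.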

Establishing a Dynamic Baseline

Current Value

First I want to calculate my current value. I could use the last value collected, but depending on the stability of the metric this might cause false positives. As NMIS has always supported, using a larger threshold period when calculating the current value can give more relevant results.
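The effect of a larger threshold period can be sketched in a few lines of Python. This is only an illustration: the `(minute, value)` sample layout, the function name, and the 5-minute polling interval are assumptions, not how NMIS stores or retrieves RRD data.

```python
def current_value(samples, period_minutes, now):
    """Average the values collected in the window (now - period_minutes, now]."""
    window = [value for minute, value in samples
              if now - period_minutes < minute <= now]
    if not window:
        raise ValueError("no samples inside the threshold period")
    return sum(window) / len(window)

# Hypothetical 5-minute polls: a single spike on the most recent poll.
samples = [(0, 5.0), (5, 5.0), (10, 5.0), (15, 11.0)]

last_poll = current_value(samples, 5, now=15)    # only the most recent poll
averaged = current_value(samples, 15, now=15)    # the last three polls
print(last_poll, averaged)
```

With the 5-minute period the spike is the whole current value; with the 15-minute period it is damped by the two earlier polls, which is why a longer period tends to produce fewer false positives on unstable metrics.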

...

If this were a weekly pattern, the multi-day baseline would be a better option. If it happens more randomly, the same-day baseline would generate an initial event on the increase, the event would then clear as ~8Mbps became normal, and another alert would be generated when the value dropped again.

Delta Baseline

The delta baseline is only concerned with the amount of change in the baseline, for example

Working with the Dynamic Baseline and Thresholding Tool

...

| Configuration Option | Description | Example |
| baseline | Which type of baseline are we using, "dynamic" or "delta", the default is dynamic, if undefined, dynamic will be used. | delta |
| active | Is baselining this metric active or not, values are true or false | true |
| metric | Which NMIS data point or variable, equates to an RRD DS | RouteNumber |
| type | Which NMIS model section or metric | RouteNumber |
| section | What is the section name in the node info, just run it, otherwise the section must exist. | |
| nodeModel | This is a regex which defines which NMIS models should be matched | CiscoRouter |
| event | The name of the event to use, will default to Proactive Baseline type metric if none provided. | Proactive Route Number Change |
| indexed | Is this variable indexed or not | false |
| threshold_exceeds | Ignored if undef otherwise the value must ALSO exceed this threshold to raise an event | undef |
| threshold_period | How many minutes should the value to be baselined be averaged, e.g. -5 minutes is the last poll, -15 minutes would be the average of the last 15 minutes, -1 hour would be the last 60 minutes. | -5 minutes |
| multiplier | How many standard deviations to vary the baseline by. | 1 |
| weeks | The number of weeks to look back | 0 |
| hours | The number of hours to include in the baseline metrics | 8 |
| levels | The levels section is used by the delta baseline method to define when an amount of change will trigger an event and what level that event will be. | |

Same-Day Dynamic Baseline Configuration Example

Here is what the configuration file would look like. This example is a Same-Day Baseline:

Code Block
  'RouteNumber' => {
    'active' => 'true',
    'metric' => 'RouteNumber',
    'type' => 'RouteNumber',
    'nodeModel' => 'CiscoRouter',
    'event' => 'Proactive Route Number Change',
    'indexed' => 'false',
    'threshold_exceeds' => undef,
    'threshold_period' => "-5 minutes",
    'multiplier' => 1,
    'weeks' => 0,
    'hours' => 8,
  },

Multi-Day Dynamic Baseline Configuration Example

Another configuration example uses the BGP prefixes exchanged with BGP peers. This metric comes from systemHealth modelling, and this is a multi-day baseline:

Code Block
  'cbgpAcceptedPrefix' => {
    'active' => 'true',
    'metric' => 'cbgpAcceptedPrefix',
    'type' => 'bgpPrefix',
    'section' => 'bgpPrefix',
    'nodeModel' => 'CircuitMonitor|CiscoRouter',
    'event' => 'Proactive BGP Peer Prefix Change',
    'indexed' => 'true',
    'multiplier' => 1,
    'weeks' => 4,
    'hours' => 1,
  },
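The difference between the two examples (`weeks => 0, hours => 8` versus `weeks => 4, hours => 1`) can be pictured as a choice of time windows. The Python sketch below is one plausible reading and is an assumption throughout: it treats `weeks = 0` as "the preceding `hours` hours" and `weeks = N` as "the `hours`-long window ending at the same time of day on each of the previous N weeks". The tool's real window selection may differ.

```python
from datetime import datetime, timedelta

def baseline_windows(now, weeks, hours):
    """Return a list of (start, end) windows under the assumed interpretation."""
    span = timedelta(hours=hours)
    if weeks == 0:
        # Same-day: a single window immediately before now.
        return [(now - span, now)]
    # Multi-day: one window per previous week, ending at the same time of day.
    return [(now - timedelta(weeks=w) - span, now - timedelta(weeks=w))
            for w in range(1, weeks + 1)]

now = datetime(2024, 1, 29, 12, 0)  # hypothetical polling time
same_day = baseline_windows(now, weeks=0, hours=8)
multi_day = baseline_windows(now, weeks=4, hours=1)
for start, end in same_day + multi_day:
    print(start.isoformat(), "->", end.isoformat())
```

Under this reading the same-day baseline adapts quickly to a new normal, while the multi-day baseline compares against the same hour in earlier weeks and so follows weekly patterns.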

Delta Baseline Configuration Example

Currently delta baselines do not support multi-day, but the hours value can be very large if required.

Code Block
  'hrSystemProcesses' => {
    'baseline' => 'delta',
    'active' => 'true',
    'metric' => 'hrSystemProcesses',
    'type' => 'Host_Health',
    'nodeModel' => 'net-snmp',
    'indexed' => 'false',
    'hours' => 4,
    'levels' => {
      'Warning' => 10,
      'Minor' => 20,
      'Major' => 30,
      'Critical' => 40,
      'Fatal' => 50
    }
  },
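How a `levels` section could map an amount of change to an event level can be sketched as follows. This is a hedged illustration: the source does not state the unit of the thresholds or whether the comparison is inclusive, so the sketch takes the magnitude of the change as given, uses `>=`, and picks the highest level whose threshold is reached. The function name is hypothetical.

```python
# Thresholds copied from the hrSystemProcesses example above.
LEVELS = {"Warning": 10, "Minor": 20, "Major": 30, "Critical": 40, "Fatal": 50}

def delta_level(change, levels):
    """Return the highest level whose threshold is reached by |change|, or None."""
    magnitude = abs(change)  # assumed: increases and decreases are treated alike
    reached = [(threshold, name) for name, threshold in levels.items()
               if magnitude >= threshold]
    return max(reached)[1] if reached else None

print(delta_level(25, LEVELS))   # between the Minor and Major thresholds
print(delta_level(-45, LEVELS))  # a large decrease
print(delta_level(5, LEVELS))    # below every threshold, no event
```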

Running the Baseline Tool

...

Additional options will be added. Running the tool with no arguments lists the currently supported options.


Command Line options for Node and Group

To run the tool for only a subset of devices, use the node_regex and group_regex options. node_regex is useful for running the tool against a single node while testing new baseline configurations. group_regex is useful when you only need the baseline tool to run for some of your device groups.

Running for a couple of nodes using regular expressions.

Code Block
/usr/local/omk/bin/baseline.exe act=run node_regex="router1|server2"

Running for a couple of groups using regular expressions.

Code Block
/usr/local/omk/bin/baseline.exe act=run group_regex="HQ|Data Center|West Coast"
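When testing a pattern before passing it to node_regex or group_regex, it can help to see which names it would select. The Python sketch below assumes the tool applies the pattern as an unanchored match, which is the usual behaviour of a regex match but is not confirmed by this page; the node names and the helper function are hypothetical.

```python
import re

def select(names, pattern):
    """Return the names matched anywhere by the pattern (unanchored search)."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]

nodes = ["router1", "router10", "server2", "switch3"]  # hypothetical node names

print(select(nodes, "router1|server2"))       # also picks up router10
print(select(nodes, "^(router1|server2)$"))   # anchored: exact names only
```

If unanchored matching does apply, a pattern such as `router1` would also select `router10`, so anchoring with `^` and `$` is the safer choice when only exact names are wanted.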

Automatic Processing using Cron

The baseline tool should have created a cron.d configuration /etc/cron.d/baseline, which will contain the following.

Code Block
#
# this cron schedule runs the baseline system every 5 minutes.
#
#
# if you DON'T want any NMIS cron mails to go to root, 
# uncomment and adjust the next line
#MAILTO=prefered@domain.com
#
# m h dom month dow user command
#
# run the baseline every 5 minutes starting at 4 minutes offset from the hour.
4-59/5 * * * * root /usr/local/omk/bin/baseline.exe act=run > /usr/local/omk/log/baseline.log 2>&1
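The minute field `4-59/5` is standard cron range-with-step syntax. The short Python sketch below expands it to show exactly when the job fires; the helper is illustrative and only handles the `start-end/step` form used here.

```python
def expand_minutes(field):
    """Expand a cron minute field of the form 'start-end/step' into a list of minutes."""
    minute_range, _, step = field.partition("/")
    start, _, end = minute_range.partition("-")
    return list(range(int(start), int(end) + 1, int(step or 1)))

minutes = expand_minutes("4-59/5")
print(minutes)
```

The result is twelve runs per hour at minutes 4, 9, 14 and so on up to 59, that is, every 5 minutes offset by 4 minutes from the hour.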

Using Group Regex and Cron for Parallel Processing

The group regex option can be used to provide parallel processing if the baseline tool is taking longer than 5 minutes to run.  A simple example would be using the baseline tool for all core and distribution devices in one processing run and a second one for all access devices.

Code Block
# run the baseline every 5 minutes starting at 3 and 4 minutes offset from the hour.
3-58/5 * * * * root /usr/local/omk/bin/baseline.exe act=run group_regex="Core|Dist" > /usr/local/omk/log/baseline1.log 2>&1
4-59/5 * * * * root /usr/local/omk/bin/baseline.exe act=run group_regex="Access" > /usr/local/omk/log/baseline2.log 2>&1
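When splitting the work across several cron entries, it is worth checking that every group is handled by exactly one of them. The Python sketch below does that check under the same assumption of unanchored regex matching; the group list (including `DMZ`) and the helper name are hypothetical.

```python
import re

def coverage(groups, patterns):
    """Map each group to the list of patterns that match it."""
    return {group: [p for p in patterns if re.search(p, group)]
            for group in groups}

groups = ["Core", "Dist", "Access", "DMZ"]   # hypothetical group names
patterns = ["Core|Dist", "Access"]           # the two group_regex values above

result = coverage(groups, patterns)
unmatched = [g for g, hits in result.items() if not hits]
overlapping = [g for g, hits in result.items() if len(hits) > 1]
print("unmatched:", unmatched)
print("overlapping:", overlapping)
```

A group reported as unmatched would not be baselined by either run, and a group reported as overlapping would be processed twice.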