Comment: updated docs wrt autoacknowledge plus inhibit

 

opEvents does not just automatically suppress duplicate events (stateful or custom-matched); it can also create new events based on correlating recent event occurrences.
In opEvents versions 2.0.4 and newer you can use more fine-grained controls to deal with the triggering events, and from version 2.2 onwards the contents of synthetic events are configurable, too.

This page describes how to configure event correlation.

 

General Configuration

This event correlation and synthesis feature is configured in the same way as the duplicate suppression, namely by putting event creation rules into conf/EventRules.nmis.

...

  • an event name, which specifies the name of the newly created event,
  • a list of events (more precisely, their names), which are the events to consider for correlation,
  • a (minimum) count of events that have to be detected to trigger the rule,
  • an optional list of groupby clauses, which define whether the count is interpreted globally for all named events, or separately within smaller groups,
  • optional delayedaction and autoacknowledge clauses, which define how the triggering events should be handled,
  • an optional enrich clause, which adjusts the content of the newly created event,
  • from version 2.2 onwards, optional copy_first, copy_last, copy_highest and copy_groupby clauses, which further control the contents of the newly created event,
  • from version 2.2 onwards, an optional inhibit parameter, which disables correlation temporarily after a rule has fired,
  • and finally a window parameter, which defines the time window to examine.
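
As a rough illustration of the parts listed above, a minimal rule might look like the following sketch. The rule key '2', the rule name and the numbers are made up for this example, and all optional clauses are left out.

```perl
'2' => {
   name => "Multiple Nodes Down",  # hypothetical name of the newly created event
   events => [ "Node Down" ],      # names of the events to consider for correlation
   count => 5,                     # minimum number of matching events that triggers the rule
   window => 60,                   # time window to examine, in seconds
},
```

Without a groupby clause the count is interpreted globally for all named events.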

...

  1. If no copy_first, copy_last, copy_highest, copy_present or copy_groupby directives are present, then a backwards-compatible directive 'copy_last => [ qr// ]' is added.
  2. (opEvents 2.2 and newer only) copy_present is evaluated first. It specifies which event properties should be set from their first occurrence.
    This directive must contain explicit property names only, i.e. no regular expressions.
    opEvents checks all trigger events in chronological order, and when it finds an event that has a value for the desired property, it copies that value over and stops looking for that property. Any later events that might have the property as well do not contribute to the result.
    A rule like copy_present => [ 'alpha', 'beta' ] will pull the alpha and beta properties from wherever they are present for the first time, but independently of each other: a trigger event can contribute none, either or both properties.
  3. copy_first is evaluated next, and specifies which event properties should be copied over from the earliest trigger event.
    Each listed property is copied over; if the directive contains a regular expression (e.g. qr/cust.*/), then all properties with names matching the regular expression are copied.
  4. copy_last is checked next, and properties listed here are copied over from the most recent trigger event.
    The property copying does overwrite all properties that were set earlier (e.g. by copy_first).
  5. copy_highest is checked next, and its properties are sourced from the trigger event with the highest priority.
    Again, properties that were set earlier may be overwritten.
  6. copy_groupby controls whether any of the grouping property values should be saved in the new event.
    The format is different for this directive: It must be a list of property target names (or the word 'undef'), in the same order as the groupby directive.
    For each element in the groupby list, the value of the grouping property is saved as the target name in the new event, if a target name is available in the copy_groupby list.
    If no groupby is given for this rule, then a copy_groupby directive has no effect.
  7. Now the enrich clause is checked, and each of its property name - value pairs indicates which properties should be set to (or overwritten with) a particular static value.
  8. Now the node, stateful and element properties are automatically adjusted if required (see below for details).
  9. Finally, the event name is set to the rule name, certain undesirable properties are removed, an audit trail of triggering events is added (by adding the properties nodes and eventids), the event is marked as synthetic and is inserted into the database.

Please note that "earliest event" in steps 2, 3 and 4 refers to the event with the earliest event timestamp, which does not necessarily reflect its processing order. opEvents processes inputs mostly - but not always - in chronological order. If you have multiple 'earliest' events (all with the same timestamp) then their order is undefined and copy_first will pick a random event. The same caveat applies to the "most recent event".
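
The evaluation order described above can be illustrated with a rule that uses several copy directives at once. This is only a sketch: the rule key, the rule name and the property names alpha, beta, cust.* and details are placeholders chosen for the example, not properties that your events necessarily carry.

```perl
'2' => {
   name => "Repeated Service Trouble",    # hypothetical rule name
   events => [ "Service Down", "Service Degraded" ],
   window => 300,
   count => 4,
   groupby => [ 'node.name' ],
   copy_present => [ 'alpha', 'beta' ],   # explicit names only; each is taken from the first trigger event that has it
   copy_first => [ qr/cust.*/ ],          # all matching properties, from the earliest trigger event
   copy_last => [ 'details' ],            # from the most recent trigger event; overwrites earlier copies
   copy_highest => [ 'priority' ],        # from the trigger event with the highest priority
   copy_groupby => [ 'node' ],            # saves the node.name grouping value as 'node' in the new event
   enrich => { state => 'down' },         # static values, applied after all copy_* directives
},
```

Because enrich is applied after the copy directives, a property that appears both in a copy directive and in enrich ends up with the static enrich value.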

Automatic Event Node for Synthetic Events

If no copy_* or enrich clause has caused the node property to be set explicitly, then the global default node is used instead.

'Set explicitly' means a copy_* or enrich clause did include the node property, i.e. not if the node property copying happened because of a regular expression.

The global default node in opEvents 2.2 is configurable using the configuration item opevents_correlation_node, and it's normally called "global".
This virtual node is automatically (re)created if missing.

This behaviour is different from opEvents before 2.2, where all synthetic events were attached to the last trigger event's node. To emulate the old behaviour you have to change your correlation rules, so that they include the directive

copy_last => [qr//, 'node']

which causes a blanket copy of all properties from the last trigger event and an explicit copy of the node property (to disable the automatic event node choice).
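
To make the 'set explicitly' distinction more concrete, the following sketch contrasts three alternative clauses. They are meant to be read as separate options rather than used together in one rule, and the node name in the third one is purely hypothetical.

```perl
# Alternative 1: regular expression only. The node property is copied along
# with everything else, but this does not count as 'set explicitly', so the
# synthetic event is attached to the global default node.
copy_last => [ qr// ],

# Alternative 2: blanket copy plus an explicit 'node' entry. The synthetic
# event is attached to the node of the last trigger event.
copy_last => [ qr//, 'node' ],

# Alternative 3: a static node set via enrich, which also counts as explicit.
# "core-switch-1" is a made-up name; substitute whatever suits your setup.
enrich => { node => "core-switch-1" },
```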

Example Rule

Here is an example rule demonstrating the new directives:

Code Block
'1' => {
   name => "Very Sick Node",
   events => [ "Node Down", "SNMP Down", "Interface Down", "Service Down",
               "Service Degraded", "Interface Flap", "Node Flap", "WMI Down" ],
   window => 120,
   count => 3,
   groupby => [ 'node.name' ], # we want separate events for each node of course
   enrich => { stateful => "Very Sick Node", priority => 5, state => 'down', element => undef }, # new event is stateful only if stateful is set or copied by name
   copy_last => [ qr//, 'node' ], # can set from node here (all events share it)
   copy_groupby => [ 'node' ], # or from here; must set it explicitly somewhere, or the event goes to opevents_correlation_node
},

...

  • If your rule contains a delayedaction clause (with a numeric value), then all potentially triggering events will have their action processing delayed by the given number of seconds.
    This affects all events whose name is in the event list of your rule, no matter whether the limits for triggering a synthetic event have been met or not. The delayedaction value should therefore be set to a relatively small value.
  • If your rule has the property autoacknowledge set to "true" or 1, then all triggering events will be automatically acknowledged and all action processing for them will be aborted.
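
A rule combining both clauses could look roughly like this. The rule key, rule name and all numeric values are assumptions made for the sake of the example.

```perl
'2' => {
   name => "Multiple Nodes Down",   # hypothetical rule name
   events => [ "Node Down" ],
   count => 5,
   window => 60,
   delayedaction => 30,             # delays action processing for every "Node Down" event by 30 seconds
   autoacknowledge => 'true',       # triggering events are acknowledged and their action processing is aborted
},
```

Note that the delay applies to every "Node Down" event, including those that never contribute to a synthetic event, which is why a small value is advisable.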

...

The net effect is that the current events view would show only the new synthetic event as 'current' and all the underlying triggering events would be categorized as closed (and optionally acknowledged), and thus be mostly hidden. 

Synthetic Events and Storm Control

All synthesis rules are applied independently, thus a single event could be a trigger for multiple synthetic events. This is desirable for example for detecting both per-customer problems and global issues at the same time: a few problem events can trigger a customer-specific action, while the same events could be counted together with others for detecting and reacting to a major outage.

Great care has been taken to avoid event storms caused by synthetic events: When a synthesis rule fires because there were more than count matching events in the time window, then all the matching events are marked as consumed and will not be considered for any future synthesis for this rule. In other words, there is no overlap between successful synthesis time windows. If a rule does not trigger because there are fewer than count trigger events, then naturally these events remain potential triggers until the time window moves past them.

However, synthetic event creation currently happens immediately as soon as a sufficient number of triggers is detected: assuming a trigger of a minimum of 20 events in 60 seconds, receiving 100 events in that time frame will cause a new synthetic event for each batch of 20 triggers, i.e. five synthetic events.

Inhibiting Correlation (Version 2.2 and newer)

Version 2.2 provides a new capability for fine-tuning storm control: the inhibit timer.

If a correlation rule fires, and if that rule contains a numeric inhibit parameter greater than zero, then opEvents will temporarily disable the rule with its particular groupby context for that many seconds.

The primary application of this feature is to stop 'nuisance' repeat synthetics if a very large number of triggers arrives in a very short time frame: it lets you tell opEvents to generate at most one instance of a particular event every inhibit seconds.

Here is an example scenario: let's assume a rule for raising a 'Group Outage' event if 20 instances of a particular event are seen within a window of 60 seconds. A major outage happens, and 100 such trigger events for group A arrive within just a few seconds, and a further 25 triggers for group B.

  1. Without inhibit, after the first 20 events for group A you'll get one synthetic event for group A; another after the next 20 and so on.
    For group B, one synthetic event will be generated for the first 20; the remaining 5 are too few to trigger anything.
  2. With inhibit set to 40 seconds (for example), you'll get the very first group A synthetic event as before, but then no synthetic events for this rule and group A for the next 40 seconds.
    After that, correlation for group A resumes 'from scratch' and any events received from then onwards are counted and correlated as normal.
    For group B with its fewer triggers the inhibit behaviour doesn't change anything visibly, there's still just one synthetic event.
    Note that the inhibit timer for group A is totally independent of any inhibit for group B: inhibit applies to a particular rule and its full groupby context.
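
The 'Group Outage' scenario above could be expressed with a rule along the following lines. This is a sketch under assumptions: the rule key, the trigger event name and the groupby value 'node.group' (which presumes that your nodes carry a group property) are illustrative and not taken from a shipped configuration.

```perl
'3' => {
   name => "Group Outage",
   events => [ "Node Down" ],     # hypothetical trigger event
   window => 60,                  # look at the last 60 seconds
   count => 20,                   # 20 triggers raise one synthetic event
   groupby => [ 'node.group' ],   # assumption: count separately for each group
   inhibit => 40,                 # after firing, pause this rule for that group for 40 seconds
},
```

Removing the inhibit line gives the behaviour of case 1, where group A's 100 triggers produce a synthetic event for every 20 triggers.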

In opEvents version 2.2, the combination of the options autoacknowledge and inhibit does not acknowledge trigger events that occur during the inhibit period; only 'successful' triggers are acknowledged. This has been changed for greater consistency and better storm control in versions 2.2.1 and newer, where both successful triggers and any trigger events occurring during the inhibit period are acknowledged automatically.