
opEvents does not just automatically suppress duplicate events (stateful or custom-matched); it can also create new events based on correlating recent event occurrences.
In opEvents version 2.0.4 and newer you can use more fine-grained controls to deal with the triggering events, and from version 2.2 onwards the contents of synthetic events are configurable, too.

General Configuration

This event correlation and synthesis feature is configured in the same way as the duplicate suppression, namely by putting event creation rules into conf/EventRules.nmis.

...

  • an event name, which specifies the name of the newly created event,
  • a list of events (more precisely, their names), which are the events to consider for correlation,
  • a (minimum) count of events that have to be detected to trigger the rule,
  • an optional list of groupby clauses, which define whether the count is interpreted globally for all named events, or separately within smaller groups,
  • optional delayedaction and autoacknowledge clauses, which define how the triggering events should be handled,
  • an optional enrich clause, which adjusts the content of the newly created event,
  • from version 2.2 onwards, optional copy_first, copy_last, copy_highest and copy_groupby clauses, which further control the contents of the newly created event,
  • and finally a window parameter, which defines the time window to examine.

(If you compare suppression and synthesis rules closely, you'll see that the main difference is the lack of a suppress clause for synthesis rules, whereas the suppression rules don't have enrich or copy_* clauses.)

Here is an example rule:

...

This rule causes opEvents to look for Node Down and SNMP Down events in the last 60 seconds, separate them into per-customer groups (see grouping below); if it counts 5 or more such events in a group, then a new event called Customer Outage is created.
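A rule matching this description could look roughly like the sketch below. It is reconstructed from the description only and is not the page's original example: the rule key '1' is a placeholder, the grouping property node.customer is inferred from the phrase "per-customer groups", and the clause names follow the example rule shown later on this page.

```perl
# Sketch of one rule entry for conf/EventRules.nmis.
# The rule key '1' is a placeholder; the enclosing structure of the file is not shown.
'1' => {
   name    => "Customer Outage",              # name of the newly created event
   events  => [ "Node Down", "SNMP Down" ],   # trigger events, selected by name
   window  => 60,                             # look at the last 60 seconds
   count   => 5,                              # minimum number of events per group
   groupby => [ 'node.customer' ],            # count separately for each customer
},
```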

Grouping

If no groupby clause is present, then potential trigger events are selected solely by event name and event time (within the window), without any further scope limiting, i.e. globally across all nodes. For many common scenarios this may be too broad a selection; for example creating new events for a particular customer or service group only wouldn't be possible.

Grouping solves this problem: the set of potential triggering events is split into groups with matching property values and the count threshold is applied to those groups.

The groupby clause has the form of a list of node.X or event.Y property specifications (e.g. node.customer or node.group), which are used to group events into buckets for counting: only events that share the same values for all the listed grouping properties will be counted together.

For example, the groupby clause [ 'node.customer', 'event.priority' ] would cause this correlation rule to be applied independently for all combinations of customer and event priority. The clause given in the example block above will create a Customer Outage event for any individual customer with 5 outages in 60 seconds; without the groupby any 5 outages anywhere would cause a synthetic event to be created.

The groupby clause can make use of all common node properties which are listed here and the standard event properties which are documented on this page. Please note, however, that only event properties that were set during the event parsing stage are accessible when correlation is performed. For example, policy actions (e.g. tagging, script execution) can change an event, but they are performed after correlation, so their changes are not available for grouping.

Event Content and Enrichment

Before opEvents version 2.2, synthetic events were always cloned from the most recent triggering event, then given a new name from the synthesis rule name, and finally any static enrich clauses were evaluated. Synthetic events could not be stateful events, i.e. they were not subject to deduplication and could not be acknowledged (or 'closed') by any later 'opposite' event.

In version 2.2 this limitation has been removed, and much more precise control of the event content is possible.

Content Control Directives (Version 2.2 and newer)

When a synthesis rule creates a new event, the following steps are performed:

  1. If no copy_first, copy_last, copy_highest or copy_groupby directives are present, then a backwards-compatible directive 'copy_last => [ qr// ]' is added.
  2. copy_first is evaluated first, and specifies which event properties should be copied over from the earliest trigger event.
    Each listed property is copied over; if the directive contains a regular expression (e.g. qr/cust.*/), then all properties with names matching the regular expression are copied.
  3. copy_last is checked next, and properties listed here are copied over from the most recent trigger event.
    This copying overwrites any properties that were set earlier (by copy_first).
  4. copy_highest is checked next, and its properties are sourced from the trigger event with the highest priority.
    Again, properties set earlier may be overwritten.
  5. copy_groupby controls whether any of the grouping property values should be saved in the new event.
    The format is different for this directive: it must be a list of property target names (or the word 'undef'), in the same order as the groupby directive.
    For each element in the groupby list, the value of the grouping property is saved under the target name in the new event, if a target name is available in the copy_groupby list.
    If no groupby is given for this rule, then a copy_groupby directive has no effect.
  6. Now the enrich clause is checked, and each of its property name/value pairs indicates which property should be set to (or overwritten with) a particular static value.
  7. Now the node, stateful and element properties are automatically adjusted if required (see below for details).
  8. Finally, the event name is set to the rule name, certain undesirable properties are removed, an audit trail of triggering events is added (by adding the properties nodes and eventids), the event is marked as synthetic and is inserted into the database.
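The directives described in the steps above could be combined as in the following fragment. This is an illustrative sketch only: the target name 'customer' and the enrich property 'category' are hypothetical. Writing the placeholder as Perl's bare undef is an assumption about how "the word 'undef'" is meant to be spelled.

```perl
# Hypothetical fragment of a synthesis rule; only the content-related clauses are shown.
groupby      => [ 'node.customer', 'event.priority' ],
copy_first   => [ 'element' ],           # taken from the earliest trigger event
copy_last    => [ qr/cust.*/ ],          # all properties with matching names, from the most recent trigger event
copy_highest => [ 'priority' ],          # taken from the trigger event with the highest priority
copy_groupby => [ 'customer', undef ],   # save the node.customer value as 'customer', skip event.priority
enrich       => { category => 'outage' },   # static value, applied after all copy_* directives
```

Because no clause in this sketch names the node property explicitly, the resulting event would be attached to the global default node, as described in the next section.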

Automatic Event Node for Synthetic Events

If no copy_* or enrich clause has caused the node property to be set explicitly, then the global default node is used instead.

'Set explicitly' means that a copy_* or enrich clause names the node property directly; a node property that was copied only because it matched a regular expression does not count.

The global default node in opEvents 2.2 is configurable using the configuration item opevents_correlation_node, and it's normally called "global".
This virtual node is automatically (re)created if missing.

This behaviour is different from opEvents before 2.2, where all synthetic events were attached to the last trigger event's node. To emulate the old behaviour you have to change your correlation rules, so that they include the directive

copy_last => [qr//, 'node']

which causes a blanket copy of all properties from the last trigger event and an explicit copy of the node property (to disable the automatic event node choice).

Example Rule

Here is an example rule demonstrating the new directives:

Code Block
'1' => {
   name => "Very Sick Node",
   events => [ "Node Down", "SNMP Down", "Interface Down", "Service Down",
               "Service Degraded", "Interface Flap", "Node Flap", "WMI Down" ],
   window => 120,
   count => 3,
   groupby => [ 'node.name' ], # we want separate events for each node of course
   enrich => { stateful => "Very Sick Node", priority => 5, state => 'down', element => undef }, # new event is stateful only if stateful is set or copied by name
   copy_last => [ qr//, 'node' ], # node can be set from here (all trigger events share it)
   copy_groupby => [ 'node' ], # or from here; it must be set explicitly somewhere, or the event goes to opevents_correlation_node
},

Stateful Synthetic Events (Version 2.2 and newer)

By default, synthetic events are not stateful events, i.e. they are not subject to deduplication and they cannot be acknowledged (or 'closed') by any future 'opposite' event.

However, in 2.2 and newer it is possible to enable stateful handling for synthetic events:

  1. Your rule must explicitly set the stateful property.
    Copying with a regular expression in copy_* does not meet this requirement, and a stateful property copied that way is deleted before event creation.
  2. Your rule must ensure that a suitable state property value is present.
  3. Your rule should ensure that a suitable element property value is present, or opEvents will automatically create one from groupby information if that is available.
    As described in the documentation for Stateful Events, the combination of node, stateful and element properties must uniquely identify the stateful 'thing', and the value of the state property describes the new state.

The example rule above shows how a stateful 'very sick node' event can be created: the node name is set from the grouping criteria (i.e. all related triggers share the same node name), the stateful property is set with a static enrich clause, and there is no element, so at most one 'very sick node' stateful thing can exist for a single node.

If we wanted to acknowledge this event from a different correlation rule, we'd have to ensure that node, stateful and element properties with the same values are generated, but the state would have to be 'up' or 'closed' or 'ok'.
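As a sketch of such a closing rule for the 'Very Sick Node' example above: the rule key, the rule name and the trigger event names ("Node Up", "SNMP Up") are assumptions chosen for illustration. Only the node, stateful and element values need to match the original rule.

```perl
# Hypothetical counterpart to the "Very Sick Node" rule; names marked below are assumptions.
'2' => {
   name => "Node Recovered",                  # assumed rule name
   events => [ "Node Up", "SNMP Up" ],        # assumed 'up' event names
   window => 120,
   count => 1,
   groupby => [ 'node.name' ],                # same grouping as the 'down' rule
   enrich => { stateful => "Very Sick Node", state => 'up', element => undef }, # same stateful and element, opposite state
   copy_last => [ qr//, 'node' ],             # node is set explicitly, as in the 'down' rule
   copy_groupby => [ 'node' ],
},
```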

Here is another example, for a group-level stateful event:

Code Block
8 => {
    name=>'sick group',
    events=>["Service Down","SNMP Down", "Node Down"],
    groupby => [ 'node.group' ],
    window=>150,
    count=>3,
    enrich => { stateful => "sick group", state => "down" },  # node will be opevents_correlation_node, element will be group
    copy_last => [ qr// ],
},
9 => {
    name=>'happy group',
    events=>["Service Up", "some nice event" ],
    groupby => [ 'node.group' ],
    window=>300,
    count=>1,
    enrich => { stateful => "sick group", state => "up" },
    copy_last => [ qr// ],
},

Rule 8 specifies that three of the listed 'down' events in a single group should cause a new event that sets the 'sick group' state to down for this one group; the element property is auto-generated from the groupby data, and all such events are attached to the virtual node 'global'. Any repeat 'sick group' events would be statefully deduplicated. Because element is set to the group in question, every single group would have its own 'sick group' state.

As soon as a single positive event from the list in rule 9 arrives, the 'sick group' event is acknowledged and closed.

Event Processing for Synthetic Events

At this point the new event is inserted into the database, and is ready for further action processing. This action processing (e.g. escalation, mail notification, custom logging) is performed immediately.

Handling of the Triggering Events

Please note that this feature is only available in opEvents 2.0.4 and newer.

...

The net effect is that the current events view would show only the new synthetic event as 'current' and all the underlying triggering events would be categorized as closed (and optionally acknowledged), and thus be mostly hidden.


Synthetic Events and Storm Control

All synthesis rules are applied independently, thus a single event could be a trigger for multiple synthetic events. This is desirable for example for detecting both per-customer problems and global issues at the same time: a few problem events can trigger a customer-specific action, while the same events could be counted together with others for detecting and reacting to a major outage.

...

If a rule does not trigger because there are fewer than count trigger events, then naturally these events remain potential triggers until the time window moves past the events in question.