
opEvents provides two mechanisms to handle repeated event occurrences in a practical fashion, namely stateful event deduplication and programmable event suppression.

Stateful Deduplication

All events that are related to stateful entities (e.g. a node which can be in state up or down, an interface etc.) are automatically checked against the recent history of events and the known previous state of this entity. If the new event reports the same state as the already known one, then the new event is suppressed completely: no event record is created (except for raw logging, if that is enabled).

...

Related to that is the concept of a Flap, which in opEvents is defined as a sequence of state down and back up transitions within a short time frame. opEvents uses the configuration option state_flap_window to define this window, by default 90 seconds. In a flap situation, the up event is marked as a flap event, and its event name is changed to "<state entity> Flap"; it is also marked as associated with the previous down event, and any repeat events that don't convey a new state are suppressed.

This behaviour can be fine-tuned using the configuration option opevents_no_action_on_flap (default: true): when set to true, opEvents will automatically acknowledge the related down event and set the down event's action_required property to false. This causes any actions defined in policies for the down event to be stopped. If opevents_no_action_on_flap is false, then the down event is not modified and remains open when a flap is detected.
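The flap handling described above can be illustrated with a minimal Python sketch. This is not opEvents code: the Event class and the handle_up function are hypothetical names, and treating an up event that arrives exactly at the window limit as a flap is an assumption. Only the two option names and their defaults come from the text above.

```python
from dataclasses import dataclass

STATE_FLAP_WINDOW = 90    # seconds, documented default of state_flap_window
NO_ACTION_ON_FLAP = True  # documented default of opevents_no_action_on_flap


@dataclass
class Event:
    """Hypothetical, simplified event record used only for this sketch."""
    name: str
    time: float               # event time in seconds
    acknowledged: bool = False
    action_required: bool = True
    flap: bool = False


def handle_up(down: Event, up: Event, entity: str,
              window: float = STATE_FLAP_WINDOW,
              no_action_on_flap: bool = NO_ACTION_ON_FLAP) -> bool:
    """Mark `up` as a flap if it follows `down` within the window.

    Returns True if a flap was detected, False otherwise.
    """
    if up.time - down.time > window:
        return False  # too far apart: an ordinary down/up pair
    up.flap = True
    up.name = f"{entity} Flap"
    if no_action_on_flap:
        # Acknowledge the down event and stop its policy actions.
        down.acknowledged = True
        down.action_required = False
    return True
```

With the defaults, a down event at t=1000 followed by an up event at t=1030 is treated as a flap and the down event is acknowledged; with no_action_on_flap=False the down event is left untouched.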

Involved Event Properties

This section outlines certain internal details, mostly relevant if you are using a custom parser to feed opEvents.

Stateful event handling relies on three core event properties: stateful, element and state.

  • The stateful property indicates the type of state source, and is a free-form string.
    For example, if the event is related to an interface, stateful should be set to "Interface"; if it's about a service, the value "Service" would be most appropriate, etc.
    You may use any state source type you want in your parser rules, but avoid overloading already existing ones like "Node" and "Interface".
  • The element property indicates which (of potentially many) state sources the event relates to.
    For the state source type "Interface", a unique interface identifier should be used (i.e. the ifDescr).
    As above, your parser rules may capture or set the element to anything you desire, as long as the combination of node name, stateful and element is a suitably unique identifier for the particular stateful thing you're trying to track.
  • The state property indicates whether the observed state is "good" or "bad".
    opEvents treats the values up, ok, good, normal or closed as "good", anything else as "bad".
    This comparison is made case-insensitively, i.e. "Good" will work just as well as "OK".

For state tracking opEvents then combines the node name and the values of stateful and element into a lookup key, and associates that key with the state value.
Any repeat events with the same lookup key and the same state value are ignored.
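The lookup-key logic above can be sketched in a few lines of Python. This is an illustration only, not the opEvents implementation: StateTracker, should_suppress and is_good are hypothetical names, and comparing states by their good/bad classification (rather than by the raw state string) is an assumption.

```python
GOOD_STATES = {"up", "ok", "good", "normal", "closed"}


def is_good(state: str) -> bool:
    """Classify a state value case-insensitively; anything unlisted is bad."""
    return state.lower() in GOOD_STATES


class StateTracker:
    """Remembers the last known state per (node, stateful, element) key."""

    def __init__(self) -> None:
        self._known = {}

    def should_suppress(self, node: str, stateful: str,
                        element: str, state: str) -> bool:
        key = (node, stateful, element)
        new_state = is_good(state)
        if self._known.get(key) == new_state:
            return True   # same state as already known: repeat event
        self._known[key] = new_state
        return False      # first sighting or a real state change
```

For example, two consecutive "down" events for ("router1", "Interface", "eth0") result in the second one being suppressed, while a following "OK" event is a state change and passes through.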

Programmable Suppression

To provide fine-grained control of how to handle repeated events of any kind, opEvents also supports programmable event suppression. Using this facility the administrator can define flexible rules for when to suppress repeat events, based on the recent event history and some further refinement criteria. Please note, however, that programmable suppression is available only for classes or groups of events and cannot be enabled specifically for a single node only.

...

If the suppression clause contains no min parameter, then a minimum of 1 is assumed. If no max is present, then infinity is used. Both min and max include the current event, so a min of 2 will suppress the first repeat and any further repeats.
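The min and max defaults can be made concrete with a small Python sketch. The function name in_suppression_range is hypothetical, and the sketch assumes that an event is suppressed when the count of matching events (including the current one) lies between min and max inclusive; it does not model the other criteria of a suppression rule.

```python
import math
from typing import Optional


def in_suppression_range(count: int,
                         min_count: Optional[int] = None,
                         max_count: Optional[int] = None) -> bool:
    """Check whether `count` matching events (current one included)
    falls within the min/max thresholds of a suppression clause."""
    lower = 1 if min_count is None else min_count          # default min: 1
    upper = math.inf if max_count is None else max_count   # default max: infinity
    return lower <= count <= upper
```

With min_count=2, the original event (count 1) is not in range, but the first repeat (count 2) and all later ones are. Adding max_count=3 stops suppression again from the fourth event onwards.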

Delaying and Closing of Trigger Events

In opEvents 2.0.4 and newer, suppression rules can optionally specify a number for the delayedaction property, to delay all policy action processing for potential trigger events. If the criteria for suppression are met within the delay period, then all action processing will be aborted and skipped for these suppressed events. If the autoacknowledge property is also set, then the suppression includes not just aborting action processing but also marking the event as acknowledged.
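The following Python sketch illustrates the decision made for a delayed trigger event. It is a simplification under stated assumptions: TriggerEvent and resolve_delayed are hypothetical names, the delay is assumed to be measured in seconds, and the time at which the suppression criteria were met is simply passed in rather than computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriggerEvent:
    """Hypothetical, simplified trigger event for this sketch."""
    time: float
    acknowledged: bool = False
    actions_run: bool = False


def resolve_delayed(event: TriggerEvent,
                    suppression_met_at: Optional[float],
                    delayedaction: float,
                    autoacknowledge: bool = False) -> str:
    """Decide what happens to a trigger event once its delay has passed.

    `suppression_met_at` is the time the suppression criteria were met,
    or None if they were never met.
    """
    met_in_time = (suppression_met_at is not None
                   and suppression_met_at - event.time <= delayedaction)
    if met_in_time:
        if autoacknowledge:
            event.acknowledged = True
        return "aborted"      # action processing is skipped
    event.actions_run = True  # delay expired without suppression
    return "processed"
```

For instance, with a delay of 60 and criteria met 20 seconds after the event, action processing is aborted; if the criteria are met only after 120 seconds, or never, the actions run as usual.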

Grouping

If no groupby clause is present, then the set of matching events is counted directly, which may be too generic for many common scenarios. For example, suppressing events for a particular customer or service group wouldn't be possible. Grouping solves this problem: the set is split into groups with matching property values and the thresholds are applied to those groups.
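As a rough illustration of how grouping changes the counting, consider the Python sketch below. It is not the opEvents rule syntax: events are modelled as plain dictionaries, the property name "customer" and the function names are hypothetical, and the thresholds are assumed to be inclusive.

```python
import math
from collections import Counter


def group_counts(events, groupby):
    """Count matching events per group; a group is the tuple of the
    events' values for the properties listed in `groupby`."""
    return Counter(tuple(e.get(prop) for prop in groupby) for e in events)


def groups_in_range(events, groupby, min_count=1, max_count=math.inf):
    """Return the groups whose event count lies within the thresholds."""
    return {group
            for group, n in group_counts(events, groupby).items()
            if min_count <= n <= max_count}


events = [
    {"node": "r1", "customer": "A"},
    {"node": "r2", "customer": "A"},
    {"node": "r3", "customer": "B"},
]
# Grouped by customer with min_count=2, only customer A reaches the threshold.
matching = groups_in_range(events, ["customer"], min_count=2)
```

With an empty groupby list, all events fall into a single group, which corresponds to counting the matching set directly.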

...