Comment: added documentation for reorder protection

...

For state tracking, opEvents then combines the node name and the values of stateful and element into a lookup key, and associates that key with the state value.
Any repeat events with the same lookup key and the same state value are ignored.
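
The lookup described above can be illustrated with a minimal Python sketch. This is not opEvents code: the Event and StateTable names and the in-memory dictionary are assumptions made purely for illustration; only the key composition (node name, stateful, element) and the "same key, same state is ignored" rule come from the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event record; only these four properties matter here.
    node: str
    stateful: str
    element: str
    state: str


class StateTable:
    """Illustrative in-memory state table (an assumption, not the opEvents store)."""

    def __init__(self):
        self._states = {}  # lookup key -> last recorded state value

    def is_duplicate(self, event):
        # The lookup key combines node name, stateful and element.
        key = (event.node, event.stateful, event.element)
        if self._states.get(key) == event.state:
            return True  # same key, same state: a repeat event, ignored
        self._states[key] = event.state  # new key or changed state: record it
        return False


table = StateTable()
down = Event("router1", "interface", "eth0", "down")
up = Event("router1", "interface", "eth0", "up")
results = [table.is_duplicate(e) for e in (down, down, up)]
# results == [False, True, False]
```

The second 'down' event is the only one ignored, because it repeats both the lookup key and the state value.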

Stateful Deduplication, Forwarded Events and Reorder Protection

If you use an Event Action or Escalation Policy with create_remote_events to forward events to another opEvents server, you might occasionally find that the forwarded events arrive out of order, i.e. an earlier 'down' event might be received after the later 'up' event. This can happen because of network congestion, because action processing on the sending side is asynchronous and subject to process limits, or for similar reasons.

Out-of-order reception of stateful events can cause state desynchronisation at the receiving server: the up event is processed first and is therefore deduplicated and discarded, while the down event that arrives later causes a transition to state down which is never cleared.
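
The following Python sketch replays the two arrival orders to show the desynchronisation. It is a simplified model, not opEvents code: the replay function is hypothetical, and it assumes that the receiving server's last recorded state for the element is 'up' before the events arrive.

```python
def replay(arrivals, initial_state="up"):
    """Apply state values in arrival order using plain deduplication.

    Returns the final state and the list of discarded (duplicate) events.
    Hypothetical helper for illustration only.
    """
    state = initial_state
    discarded = []
    for new_state in arrivals:
        if new_state == state:
            discarded.append(new_state)  # repeat of the current state: dropped
        else:
            state = new_state            # genuine transition
    return state, discarded


# Sent order was down, then up.
in_order = replay(["down", "up"])    # ("up", [])
reordered = replay(["up", "down"])   # ("down", ["up"])
```

With the reordered arrival the 'up' event is dropped as a duplicate and the receiver is left in state down, although the sender's final state is up.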

opEvents versions 2.4.2 and newer provide a reorder protection mechanism to handle such out-of-order situations better, at the cost of temporarily delaying the processing of some forwarded stateful events.

To enable reorder protection, two steps are required:

  • set the configuration property state_reorder_window to a positive number of seconds (e.g. 30) on the receiving server,
  • and make sure that your forwarded events carry an authority property, which marks the event as originating from a remote authoritative source.

If both of these conditions are met, opEvents on the receiving server will temporarily postpone the processing of a forwarded stateful event if that event would otherwise be discarded by stateful deduplication.
This allows earlier but externally delayed related events to enter the processing queue in the correct sequence, provided they arrive within the configured time window after the postponed out-of-order event.

If a state-changing remote event does arrive within the time window configured by state_reorder_window, then the correct sequencing of transitions is restored and processing of postponed events resumes immediately. Otherwise, processing resumes after the time window elapses.
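
The postponement behaviour described above can be modelled roughly as follows. This is a toy sketch under stated assumptions, not the opEvents implementation: the ReorderProtector class, its method names, the explicit now timestamps and the in-memory queue are all hypothetical, and the window argument merely plays the role of state_reorder_window.

```python
class ReorderProtector:
    """Toy model of reorder protection (illustrative assumption only)."""

    def __init__(self, window):
        self.window = window   # seconds; stands in for state_reorder_window
        self.states = {}       # lookup key -> last known state
        self.postponed = []    # (deadline, key, state) awaiting processing
        self.log = []          # (outcome, key, state) in processing order

    def _process(self, key, state):
        # Ordinary stateful deduplication.
        if self.states.get(key) == state:
            self.log.append(("discarded", key, state))
        else:
            self.states[key] = state
            self.log.append(("applied", key, state))

    def _release(self, keep):
        # Process every postponed event for which keep(entry) is false.
        due = [p for p in self.postponed if not keep(p)]
        self.postponed = [p for p in self.postponed if keep(p)]
        for _deadline, key, state in due:
            self._process(key, state)

    def tick(self, now):
        # Window elapsed: resume processing of expired postponed events.
        self._release(lambda p: p[0] > now)

    def receive(self, key, state, authority, now):
        self.tick(now)
        duplicate = self.states.get(key) == state
        if duplicate and authority and self.window > 0:
            # Forwarded event that would be deduplicated: hold it back.
            self.postponed.append((now + self.window, key, state))
            return "postponed"
        self._process(key, state)
        if not duplicate:
            # State change arrived: resume postponed events for this key now.
            self._release(lambda p: p[1] != key)
        return "processed"


key = ("router1", "interface", "eth0")

# Case 1: the delayed 'down' arrives within the window.
rp = ReorderProtector(window=30)
rp.states[key] = "up"
first = rp.receive(key, "up", authority="remote1", now=100)     # "postponed"
second = rp.receive(key, "down", authority="remote1", now=105)  # "processed"
final_state = rp.states[key]                                    # "up"

# Case 2: nothing else arrives; the window simply elapses.
rp2 = ReorderProtector(window=30)
rp2.states[key] = "up"
rp2.receive(key, "up", authority="remote1", now=100)
rp2.tick(now=131)  # postponed 'up' is now processed and discarded as a repeat
```

In the first case the late 'down' is applied before the postponed 'up', so the receiver ends in state up. In the second case the postponed event is only delayed by the window and then deduplicated as usual, which is the processing delay the mechanism trades for correct ordering.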

The state_reorder_window should not be set too large, as larger values cause longer and undesirable event processing delays; a value of 10 to 30 seconds should suffice in most environments.

Programmable Suppression

To provide fine-grained control over how repeated events of any kind are handled, opEvents also supports programmable event suppression. Using this facility, the administrator can define flexible rules for when to suppress repeat events, based on the recent event history and some further refinement criteria. Please note, however, that programmable suppression is available only for classes or groups of events and cannot be enabled for a single node only.

...