...

This page demonstrates the practical application of event consolidation based on location. The same principles can be applied to other properties shared by events and nodes, such as Business Service, Application, Customer, Group, Interface, and many others. Properties can also be combined, which is especially useful for nodes in Data Centres.

Related Pages

Event Correlation - Highly recommended, a must read!

...

When managing a large network that is either geographically separated or has a topology in which 'fault domains' are easy to recognize, we would like to consolidate events to reduce network management noise and troubleshooting time. With this in mind, it is desirable to have a single alert that notifies us that site "X" is experiencing a problem, rather than many (10 to 500+) alerts from individual nodes. This not only cuts down on the noise, it also automates part of the troubleshooting process, enabling operations to home in on a common symptom and resolve the problem.

A simple example is a remote office with several managed nodes. Any problem with the WAN or with power would result in many events being seen in opEvents; enabling this feature reduces them to a single event.


What is a synthetic event?

...

There needs to be a common way to identify nodes, such as location, group, or business service. The common attribute is assigned when the node is provisioned in NMIS. For example, if it is determined that all the nodes at the San Jose data center can be grouped into a single fault domain, then they should all have the same location attribute, 'San_Jose_Data_Center'. This gives opEvents something to grip onto for correlation.
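To make the idea of a shared attribute concrete, here is a minimal Python sketch that groups nodes into fault domains by a common attribute. It is only an illustration: the node names, the dictionary layout, and the `fault_domains` helper are hypothetical and do not reflect how NMIS or opEvents store nodes.

```python
from collections import defaultdict

# Hypothetical inventory: names and the 'location' key are illustrative only.
nodes = [
    {"name": "sjc-core-1", "location": "San_Jose_Data_Center"},
    {"name": "sjc-access-7", "location": "San_Jose_Data_Center"},
    {"name": "nyc-edge-2", "location": "New_York_Office"},
    {"name": "lab-switch", "location": ""},  # no shared attribute assigned
]


def fault_domains(nodes, attribute="location"):
    """Group node names by the value of a shared attribute.

    Nodes with a missing or empty attribute are left out, because there is
    nothing for correlation to match on.
    """
    domains = defaultdict(list)
    for node in nodes:
        value = node.get(attribute)
        if value:
            domains[value].append(node["name"])
    return dict(domains)


domains = fault_domains(nodes)
```

The same helper works for any other shared property (group, business service, and so on) by passing a different `attribute`, provided every node in the fault domain carries exactly the same value.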

...

Event names that should be considered for consolidation. This is where we define which specific events should be part of the fault domain. For something network centric, 'Node Up/Down' is a logical choice.

...

This value is in seconds. Once all conditions of the rule match, the rule fires. The inhibit value is the time in seconds that must elapse before the rule is eligible to fire again. This prevents excessive 'synthetic events' during a major outage. Matching individual node events that are received during the inhibit period are associated with the previously fired synthetic event and are suppressed.

Count

This value represents the minimum number of matching events that must be received in order for the rule to fire.
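The interaction between the count and inhibit values can be sketched in a few lines of Python. This is not how opEvents implements its rules. The `ConsolidationRule` class, its field names, and the `window` parameter (how far back matching events are counted) are assumptions introduced purely for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ConsolidationRule:
    """Hypothetical rule model; field names are not opEvents settings."""
    event_names: set
    count: int      # minimum number of matching events before the rule fires
    inhibit: float  # seconds that must elapse before the rule may fire again
    window: float   # assumption: how far back matching events are counted (seconds)
    pending: dict = field(default_factory=lambda: defaultdict(deque))
    last_fired: dict = field(default_factory=dict)

    def process(self, location, event_name, timestamp):
        """Return 'ignored', 'pending', 'fired' or 'suppressed' for one node event."""
        if event_name not in self.event_names:
            return "ignored"
        fired_at = self.last_fired.get(location)
        if fired_at is not None and timestamp - fired_at < self.inhibit:
            # Inside the inhibit period: the event belongs to the synthetic
            # event that already fired, so it is suppressed.
            return "suppressed"
        queue = self.pending[location]
        queue.append(timestamp)
        # Drop matching events that are older than the counting window.
        while queue and timestamp - queue[0] > self.window:
            queue.popleft()
        if len(queue) >= self.count:
            self.last_fired[location] = timestamp
            queue.clear()
            return "fired"
        return "pending"


rule = ConsolidationRule(event_names={"Node Down"}, count=3, inhibit=300, window=60)
results = [
    rule.process("San_Jose_Data_Center", "Node Down", t)
    for t in (0, 5, 10, 20, 400)
]
print(results)
```

With these illustrative values, the third matching event fires the synthetic event, the event 10 seconds later falls inside the inhibit period and is suppressed, and the event at 400 seconds starts counting afresh.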

...