Demonstrate the practical application of event summarization based on location.
Event Correlation - Highly recommended, a must read!
Suppose you are tasked with managing a large network that is either geographically separated or has a topology in which 'fault domains' are easy to recognize. In that situation it is desirable to have a single alert that notifies us that site "X" is experiencing a problem, rather than many (10 to 500+) alerts from individual nodes. This not only cuts down on the noise, it also automates a component of the troubleshooting process, enabling operations to vector in on a common symptom and crush the problem.
What is a synthetic event?
A single event that summarizes many events is referred to as a 'synthetic event'.
There needs to be a common way to identify nodes, such as location, group, business service, etc. The common attribute is assigned when the node is provisioned in NMIS. For example, if it is determined that all the nodes at the San Jose data center can be grouped into a single fault domain, then they should all have the same location attribute of 'San_Jose_Data_Center'. This gives opEvents something to grip onto for correlation.
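The idea of a shared attribute defining a fault domain can be sketched in a few lines of Python. This is only an illustration: the node names, the second location, and the `fault_domains` helper are hypothetical, and in practice the attribute values come from the node records provisioned in NMIS.

```python
from collections import defaultdict

# Hypothetical node inventory; real attribute values are set when each
# node is provisioned in NMIS.
nodes = [
    {"name": "sjc-core-1", "location": "San_Jose_Data_Center"},
    {"name": "sjc-edge-1", "location": "San_Jose_Data_Center"},
    {"name": "nyc-core-1", "location": "New_York_Office"},
]


def fault_domains(nodes, attribute="location"):
    """Group node names by the value of a shared attribute."""
    domains = defaultdict(list)
    for node in nodes:
        domains[node[attribute]].append(node["name"])
    return dict(domains)


domains = fault_domains(nodes)
```

Every node that carries the same attribute value lands in the same group, which is why a consistently populated location field matters for correlation.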
Defining Synthetic Events
This feature is configured in EventRules.nmis, which is found in /usr/local/omk/conf. Here is an example of event summarization based on location.
This is the event name that will appear if the event summarization rule is triggered.
Event names that should be considered for summarization. This is where we define which specific events should be part of the fault domain. For something network centric, 'Node Up/Down' is a logical choice.
This value is in seconds. It defines the window of time in which the conditions must be met in order for the rule to fire.
This value is in seconds. It defines how long to wait before the rule is eligible to fire again, which prevents excessive 'synthetic events' during a major outage. Matching individual node events received during this inhibit period are expected to be associated with the previously fired synthetic event and suppressed, although this behavior has not been confirmed.
This value represents the minimum number of matching events that must be received in order for the rule to fire.
This, along with the 'Events' defined above, makes the hash for summarization. In this example we are focusing on location, so the value will be node.location. The location value is assigned when the node is provisioned in NMIS, and it is an important field to populate in order for this feature to work properly. Please reference the related links above to better understand all the functionality provided by this value.
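To make the interaction between these settings concrete, here is a minimal Python sketch of how a rule with a window, a minimum count, an inhibit period, and a grouping attribute could be evaluated. It is not the opEvents implementation. The `SummaryRule` class, its parameter names, the flat event dictionaries, the 'Site Outage' event name, and all numeric values are assumptions made for illustration, and the handling of events during the inhibit period is simplified to "do not fire again".

```python
from collections import defaultdict, deque


class SummaryRule:
    """Hypothetical model of one summarization rule (not opEvents code)."""

    def __init__(self, name, events, window, inhibit, count, groupby):
        self.name = name            # name given to the synthetic event
        self.events = set(events)   # event names eligible for summarization
        self.window = window        # seconds in which `count` events must arrive
        self.inhibit = inhibit      # seconds before the rule may fire again
        self.count = count          # minimum number of matching events
        self.groupby = groupby      # event attribute that defines the group
        self._recent = defaultdict(deque)  # group -> timestamps of matches
        self._last_fired = {}              # group -> time of last firing

    def process(self, event):
        """Return a synthetic event dict if the rule fires, else None."""
        if event["event"] not in self.events:
            return None
        group = event[self.groupby]
        now = event["time"]
        recent = self._recent[group]
        recent.append(now)
        # Discard matches that have fallen out of the time window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        # Still inside the inhibit period: do not fire again.
        last = self._last_fired.get(group)
        if last is not None and now - last < self.inhibit:
            return None
        if len(recent) >= self.count:
            self._last_fired[group] = now
            recent.clear()
            return {"event": self.name, self.groupby: group, "time": now}
        return None


rule = SummaryRule(
    name="Site Outage",          # hypothetical synthetic event name
    events=["Node Down"],
    window=60,
    inhibit=300,
    count=3,
    groupby="node.location",
)

# Four 'Node Down' events from the same location, ten seconds apart.
stream = [
    {"event": "Node Down", "node.location": "San_Jose_Data_Center", "time": t}
    for t in (0, 10, 20, 30)
]
fired = [s for s in (rule.process(e) for e in stream) if s is not None]
```

In this run the third event satisfies the count within the window, so one synthetic event is produced; the fourth event arrives inside the inhibit period and does not produce a second one.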