Node Health Report

The Node Health report display health-related attributes for all selected nodes for a given period. Attributes displayed are: Status, Device, Availability, Interface Availability, %CPU, 95th% CPU, Max %CPU, CPU Exc., %Mem Free, 95th% Mem Used, Max %Mem Used, %Mem Util, %IO/VIR Mem Free, 95th% IO Mem Used, Max %IO Mem Used, %IO/VIR Mem Util.

The report also includes two columns with the detected (abnormal) Conditions and the recommended Actions.

If you pass this report the option exceptions=true, then only nodes with exceptional conditions present are shown; the default is to show all nodes.

The formulas used for calculation of the reporting conditions can be tuned and adjusted by the user:

The section opreports_rules (in conf/opCommon.nmis in opReports 3.x, or opReports.nmis in version 2.x) defines the threshold values for the following conditions:

Device Availability = Condition: "Device has LOW or VERY LOW availability"
Action: Investigate causes for low availability
Formula used for Calculation:

Interface Availability = Condition: "Device has LOW or VERY LOW interface availability"
Action: Investigate causes for low interface availability
Formula used for Calculation:

CPU Utilisation = Condition: "Device has VERY HIGH, HIGH or MODERATE CPU utilisation"
Action: Investigate causes for CPU utilisation
Formula used for Calculation:

If the node has multiple CPUs then the utilisation measure is averaged over all CPUs.

 

CPU Exceptions
The count of times the CPU utilisation exceeded the "CPU Exception Threshold" of 20%. If the node has multiple CPUs then this is the sum of the exception counts of all CPUs.


Memory Utilisation = Condition: "Device has VERY LOW or LOW main memory free"
Action: Investigate causes for free low main memory
Formula used for Calculation:

IO or Virtual Memory Utilisation = Condition: "Device has VERY LOW or LOW IO or Virtual memory free"
Action: Investigate causes for low free IO or Virtual memory
Formula used for Calculation: