Node Health Report

The Node Health Report displays health-related attributes for all selected nodes for a given period. The attributes displayed are: Status, Device, Availability, Interface Availability, %CPU, 95th% CPU, Max %CPU, CPU Exc., %Mem Free, 95th% Mem Used, Max %Mem Used, %Mem Util, %IO/VIR Mem Free, 95th% IO Mem Used, Max %IO Mem Used, %IO/VIR Mem Util. As of version 3.1.4, when this report is exported to XLSX or CSV format, the following columns are also included: Group, %IO Mem Free.

The report also includes two columns with the detected (abnormal) Conditions and the recommended Actions.

...

The image below shows the output of a default Node Health Report, that is, one where exceptions=false. The full report can be viewed by downloading the ZIP file HERE.

[Image: Node Health Report output with exceptions=false]

To create a Node Health Report showing exceptions only, click the box that the arrow points to in the image below.

...

A Node Health Report for the same devices with exceptions=true looks similar to the image below. The full report can be viewed by downloading the ZIP file HERE.

[Image: Node Health Report output with exceptions=true]

The formulas used for calculation of the reporting conditions can be tuned and adjusted by the user:

The section opreports/opreport_rules (in conf/opCommon.nmis in opReports 3.x, or opReports.nmis in version 2.x) defines the threshold values for the following conditions:

...

If the node has multiple CPUs then the utilisation measure is averaged over all CPUs. 


CPU Exceptions
The count of times the CPU utilisation exceeded the "CPU Exception Threshold" of 20%. If the node has multiple CPUs then this is the sum of the exception counts of all CPUs.
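The two multi-CPU rules above (utilisation averaged over all CPUs, exception counts summed over all CPUs) can be illustrated with a minimal sketch. This is not the opReports implementation: the function name `cpu_summary`, the input layout, and the choice to average the per-CPU means are assumptions made for illustration only. "Exceeded" is read as strictly greater than the threshold.

```python
from statistics import mean

# Threshold quoted in the text above; in a real installation it is tunable.
CPU_EXCEPTION_THRESHOLD = 20.0  # percent


def cpu_summary(samples_by_cpu, threshold=CPU_EXCEPTION_THRESHOLD):
    """Summarise CPU samples for one node (hypothetical helper).

    samples_by_cpu maps a CPU name to a list of utilisation percentages
    collected over the reporting period.
    Returns (average utilisation over all CPUs, total exception count).
    """
    # Assumption: the node-level figure is the mean of the per-CPU means.
    per_cpu_means = [mean(s) for s in samples_by_cpu.values() if s]
    avg_util = mean(per_cpu_means) if per_cpu_means else None

    # Exception count: samples strictly above the threshold, summed over CPUs.
    exceptions = sum(
        sum(1 for value in samples if value > threshold)
        for samples in samples_by_cpu.values()
    )
    return avg_util, exceptions


if __name__ == "__main__":
    node = {"cpu0": [10, 30, 50], "cpu1": [20, 25, 15]}
    print(cpu_summary(node))
```

With the sample data, cpu0 contributes two exceptions and cpu1 contributes one, because a sample of exactly 20% does not exceed the threshold.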

...

  • Low free main memory: less than 25%


opReports 3.5.1 and newer include improvements to the memory-related fields in the Node Health Report.

/path/to/omk/conf/opCommon.json has a new setting, /opreports/on_invalid_hrcachemem_use_only_hrmem, with a default value of 0.

  • With /opreports/on_invalid_hrcachemem_use_only_hrmem=1 set in /path/to/omk/conf/opCommon.json,
    opReports attempts to detect situations where invalid hrCacheMemUsed and hrCacheMemSize
    values are being reported that would cause the memory-related fields in the Node Health Report to be negative.
    • In such a case (the memory-related fields in the Node Health Report would be negative),
      hrCacheMemUsed and hrCacheMemSize are not used in the calculation of the memory-related fields,
      and an entry to this effect is logged in opReports.log.
    • This issue has been observed in Docker instances where the hrCacheMemUsed and hrCacheMemSize values
      were those of the Docker host rather than those of the Docker instance itself.
  • With /opreports/on_invalid_hrcachemem_use_only_hrmem=0 set in /path/to/omk/conf/opCommon.json:
    • In such a case (the memory-related fields in the Node Health Report would be negative),
      the affected memory-related fields return N/A and an entry to this effect is logged in opReports.log.
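The effect of the two settings can be sketched as follows. This is a rough illustration only: the formula (used memory minus cache, as a percentage of total) is an assumed stand-in, since the actual opReports calculation is not given here, and the function name, parameter names, and log messages are all hypothetical. For brevity the sketch uses only the cache-used value, not hrCacheMemSize.

```python
def mem_used_percent(hr_mem_size, hr_mem_used, hr_cache_used,
                     use_only_hrmem, log=print):
    """Return a memory-used percentage, or "N/A" (hypothetical helper).

    use_only_hrmem mirrors on_invalid_hrcachemem_use_only_hrmem (0 or 1).
    The formula below is an assumption for illustration only.
    """
    # Assumed formula: memory in use, excluding cache, as a share of total.
    value = 100.0 * (hr_mem_used - hr_cache_used) / hr_mem_size
    if value >= 0:
        return value

    # A negative result suggests the cache values are invalid for this node.
    if use_only_hrmem == 1:
        log("invalid cache values: ignoring them in the calculation")
        return 100.0 * hr_mem_used / hr_mem_size

    log("invalid cache values: returning N/A")
    return "N/A"


if __name__ == "__main__":
    # Cache larger than used memory, as might happen inside a container.
    print(mem_used_percent(1000, 600, 800, use_only_hrmem=1))
    print(mem_used_percent(1000, 600, 800, use_only_hrmem=0))
```

Passing a callable as `log` keeps the sketch independent of any particular logging setup; a real deployment would write to opReports.log instead.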