
In this tutorial, we will walk you through creating a new threshold for the "System Load Avg" of a currently installed device on NMIS8.


Background Information

The Linux system load is a measurement of the computational work the system is performing. A completely idle computer has a load average of 0. Each running process either using or waiting for CPU resources adds 1 to the load average. So, if your system has a load of 5, five processes are either using or waiting for the CPU.

On its own, the load number doesn’t mean too much. A computer might have a load of 0 one split-second, and a load of 5 the next split-second as several processes use the CPU. Even if you could see the load at any given time, that number would be basically meaningless.

That’s why Unix-like systems don’t display the current load. Instead, they display load averages over the last 1, 5 and 15 minutes, which show how much work the computer has been performing over time.

It is important to clarify that, on Linux at least, the load average and CPU utilisation are two different things. Load average measures how many tasks are running or waiting to run over a period of time. On Linux this also counts tasks waiting for disk I/O, not just those waiting for CPU time. CPU utilisation measures how busy the CPU is right now. The most that a single CPU thread pegged at 100% for one minute can "contribute" to the 1 minute load average is 1. A 4 core CPU with hyper-threading (8 virtual cores), with all cores at 100% for 1 minute, would contribute 8 to the 1 minute load average.
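The following minimal Perl sketch makes the relationship between load and core count concrete. It parses a line in the format of Linux's /proc/loadavg, where the first three fields are the 1, 5 and 15 minute averages. It then divides an average by the number of logical CPUs, so a per-CPU value of about 1.0 roughly means the CPUs are saturated. The helper names `parse_loadavg` and `load_per_cpu` are hypothetical. On a real system you would read the line from /proc/loadavg and get the CPU count from, for example, `nproc`.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a line in /proc/loadavg format,
# e.g. "0.81 0.64 0.52 2/345 12345", into the 1, 5 and 15 minute averages.
sub parse_loadavg {
    my ($line) = @_;
    my ($one, $five, $fifteen) = split ' ', $line;   # split on any whitespace
    die "unexpected loadavg format: $line\n" unless defined $fifteen;
    return ($one, $five, $fifteen);
}

# Hypothetical helper: normalise a load average by the number of logical CPUs.
# A result around 1.0 means the CPUs are roughly fully busy.
sub load_per_cpu {
    my ($load, $cpus) = @_;
    die "CPU count must be positive\n" unless $cpus > 0;
    return $load / $cpus;
}

# Example: an 8 virtual core machine with a 5 minute load average of 4.00
my ($one, $five, $fifteen) = parse_loadavg("8.00 4.00 2.00 9/512 4242");
printf "5 min load per CPU on 8 logical cores: %.2f\n", load_per_cpu($five, 8);
```

In this example, the per-CPU value is 0.50, meaning the 8 cores were about half busy on average over the last 5 minutes.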

If you are not familiar with these concepts, the articles “Understanding Linux CPU Load” and “Linux Load Averages” explain them in detail.

2. Set up your environment


If you haven’t done so yet, download our latest Virtual Appliance from Opmantek.com or from here: http://dl-nmis.opmantek.com/Opmantek-191212c-VM-Debian9.11-64bit.ova

Import it into your favourite hypervisor, run the VM and get the IP address assigned to it.

For this tutorial, my NMIS8 environment uses 192.168.1.100. This IP address may be different on your setup.

NMIS adds the server where it is hosted as “localhost”, and the model used is “Net-snmp”. The model and its related files are usually stored in /usr/local/nmis8/models/.


Access the VM via SSH to edit the configuration files. The default credentials for the VM are:


Code Block
username: root
password: NM1$88


Warning
Important: always verify the syntax of the edited files by running perl -c filename after saving your changes.
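If you edit several files, a small wrapper can run the same `perl -c` check on each of them. The sketch below is not an NMIS tool. `check_syntax` is a hypothetical helper, and the script name in the usage comment is only illustrative. It uses `$^X`, the path of the running Perl interpreter, so every file is checked with the same Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: run "perl -c" on each file.
# Returns the list of files that failed the syntax check.
sub check_syntax {
    my @files = @_;
    my @failed;
    for my $file (@files) {
        # A list-form system() call bypasses the shell. perl -c prints
        # "syntax OK" or the errors to STDERR and exits non-zero on failure
        # (including when the file cannot be opened).
        my $rc = system($^X, '-c', $file);
        push @failed, $file if $rc != 0;
    }
    return @failed;
}

# Usage (hypothetical script name):
#   perl check_nmis_syntax.pl /usr/local/nmis8/models/Model-net-snmp.nmis ...
if (@ARGV) {
    my @bad = check_syntax(@ARGV);
    print(@bad ? "Syntax errors in: @bad\n" : "All files passed perl -c\n");
}
```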

3. Creating a new threshold

As mentioned before, we will create a threshold for the Linux “System Load” (load averages). We want to generate events when the current values exceed the threshold values.


First, we need to open the model, in this case /usr/local/nmis8/models/Model-net-snmp.nmis, and identify the item that we need to modify.

The item to modify is “laload” (Linux Average Load), which is under the “system → rrd” section.

In the “laload” item, we have to add the “threshold” entry as follows:


Code Block
'threshold' => 'laload_threshold', 


In this particular case, we have named the threshold “laload_threshold”. It is important to name it properly to avoid confusion while setting it up.

This naming is meant to make the threshold more noticeable and clear. However, it is not a rule.


Code Block
# /usr/local/nmis8/models/Model-net-snmp.nmis
--snip--
      'laload' => {
        'threshold' => 'laload_threshold',
        'graphtype' => 'laload',
        'snmp' => {
          'laLoad1' => {
            'oid' => 'laLoad.1',
            'option' => 'gauge,0:U'
          },
          'laLoad5' => {
            'oid' => 'laLoad.2',
            'option' => 'gauge,0:U'
          }
        }   
      },
--snip--




Next, we add the threshold values to /usr/local/nmis8/models/Common-threshold.nmis, using the name specified before. The event name must start with "Proactive".

Code Block
%hash = (
  'threshold' => {
    'name' => {
      'laload_threshold' => {
        'item' => 'laLoad5',
        'event' => 'Proactive System Load',
        'select' => {
          'default' => {
            'value' => {
              'fatal' => '5',
              'critical' => '2',
              'major' => '1',          
              'minor' => '0.8',
              'warning' => '0.7'
            }
          }
        }
      },
--snip--
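The threshold definition refers to names defined elsewhere, so a quick consistency check can catch typos before you run NMIS. The sketch below is hypothetical and not part of NMIS. It uses trimmed-down copies of the model and threshold structures shown above and checks three things:

- the model's 'threshold' name exists in the threshold table;
- the threshold's 'item' is one of the model's data sources under 'snmp';
- the event name starts with "Proactive".

`check_threshold_wiring` is an illustrative name.

```perl
use strict;
use warnings;

# Trimmed copies of the structures shown above (illustration only).
my %model_rrd = (
    laload => {
        threshold => 'laload_threshold',
        graphtype => 'laload',
        snmp      => {
            laLoad1 => { oid => 'laLoad.1', option => 'gauge,0:U' },
            laLoad5 => { oid => 'laLoad.2', option => 'gauge,0:U' },
        },
    },
);
my %thresholds = (
    laload_threshold => {
        item   => 'laLoad5',
        event  => 'Proactive System Load',
        select => { default => { value => {
            fatal => '5', critical => '2', major => '1',
            minor => '0.8', warning => '0.7',
        } } },
    },
);

# Hypothetical checker: returns a list of problems (empty if consistent).
sub check_threshold_wiring {
    my ($rrd, $thr) = @_;
    my @problems;
    for my $section (sort keys %$rrd) {
        my $name = $rrd->{$section}{threshold} or next;   # no threshold set
        my $def  = $thr->{$name};
        unless ($def) {
            push @problems, "$section: threshold '$name' is not defined";
            next;
        }
        my $item  = $def->{item}  // '';
        my $event = $def->{event} // '';
        push @problems, "$name: item '$item' is not a data source of '$section'"
            unless exists $rrd->{$section}{snmp}{$item};
        push @problems, "$name: event '$event' must start with 'Proactive'"
            unless $event =~ /^Proactive/;
    }
    return @problems;
}

my @problems = check_threshold_wiring(\%model_rrd, \%thresholds);
print(@problems ? join("\n", @problems) . "\n" : "threshold wiring looks consistent\n");
```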







Next, we add statistics extraction to /usr/local/nmis8/models/Common-stats.nmis, inside the “type” section:

- The key of the entry (“laload”) must be the name used in /usr/local/nmis8/models/Common-database.nmis, because the stats need to know which database to read.
- The data source name used in the DEF and PRINT lines (“laLoad5”) must match the one specified in the model inside the rrd section.


Code Block
%hash = (
  'stats' => {
    'type' => {
      'laload' => [
        'DEF:laLoad5=$database:laLoad5:AVERAGE',
        'PRINT:laLoad5:AVERAGE:laLoad5=%1.2lf'
      ],
--snip--
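The PRINT format string `laLoad5=%1.2lf` returns the averaged value as a `name=value` pair with two decimal places. As a rough illustration, the sketch below turns lines in that format into a hash. It is a hypothetical sketch, not NMIS's own code, and `parse_print_lines` and the sample lines are made up for the example. It assumes rrdtool prints "nan" or "-nan" when a period has no data.

```perl
use strict;
use warnings;

# Hypothetical helper: turn PRINT results such as "laLoad5=0.81"
# into a hash of data source name => numeric value (undef for NaN).
sub parse_print_lines {
    my @lines = @_;
    my %stats;
    for my $line (@lines) {
        my ($name, $value) = split /=/, $line, 2;
        next unless defined $value;                  # not a name=value pair
        $stats{$name} = $value =~ /^-?nan$/i ? undef : $value + 0;
    }
    return \%stats;
}

# Sample line in the format produced by 'PRINT:laLoad5:AVERAGE:laLoad5=%1.2lf'
my $stats = parse_print_lines('laLoad5=0.81');
printf "laLoad5 average: %.2f\n", $stats->{laLoad5};
```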




Once we have created the threshold, it is time to test it. The best way to check that it works as desired is to run /usr/local/nmis8/bin/nmis.pl with debugging enabled (debug=1; higher values such as debug=9 print more detail):


Code Block
$ /usr/local/nmis8/bin/nmis.pl type=threshold debug=1 node=localhost


The output shows that the threshold has been applied. In this case, the current value is 0.11 and the lowest value for the threshold is 50, so the level is Normal.

In another run, we modified the threshold values to make it trigger. The current value is 0.81 and the minimum value for the Major level is set to 0.3, so the threshold triggers.

In the last run, the current value is 3.64, which has a level of Critical because it has exceeded the critical threshold value of 2.


Info
Note: I have put the system under stress to achieve a high load value. If you need to force your threshold to trigger, use a lower value in your threshold configuration (e.g. 0.01).
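A simplified model of threshold evaluation reproduces the levels in the examples above. When higher values are worse, the reported level is the most severe one whose threshold value the current value has reached, and anything below the warning value is Normal. This sketch illustrates that assumption and is not the exact NMIS implementation. It ignores thresholds where lower values are worse, and using >= rather than > at the boundary is also an assumption. `threshold_level` is a hypothetical helper, and the values come from the laload_threshold definition in Common-threshold.nmis.

```perl
use strict;
use warnings;

# Threshold values from the 'laload_threshold' definition.
my %levels = (
    fatal    => 5,
    critical => 2,
    major    => 1,
    minor    => 0.8,
    warning  => 0.7,
);

# Hypothetical helper: return the most severe level whose threshold the value
# has reached, assuming higher values are worse. Below "warning" => "Normal".
sub threshold_level {
    my ($value, $t) = @_;
    for my $level (qw(fatal critical major minor warning)) {
        return ucfirst($level) if $value >= $t->{$level};
    }
    return 'Normal';
}

for my $value (0.11, 0.81, 3.64) {
    printf "%.2f => %s\n", $value, threshold_level($value, \%levels);
}
```

With these values, 0.11 is Normal, 0.81 is Minor and 3.64 is Critical, which matches the Critical level reported for 3.64. Lowering the warning value to 0.01, as the note suggests, makes even a small load reach Warning.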





Finally, if we go to the NMIS GUI and check the events for the node "localhost", we can see that the alert has been created and shows the details as expected.




We have now properly set up our threshold for "Linux System Load".