...

  • abort_after: From NMIS 8.6.8G there is a new command-line option, abort_after, that stops the main thread from running for too long and colliding with the next cron job. By default this parameter is 60 seconds, because the cron job is set to run every minute by default.

    This option must always be used together with mthread=true.


    Code Block
    nmis8/bin/nmis.pl type=collect abort_after=60 mthread=true ignore_running=true;
    


  • max_thread: The other important configuration option is max_thread, which keeps the number of children of the main process from growing too large. Considerations:
    • If the collect operation has many nodes to process, the number of children won't reach the limit instantly. While the main thread is forking, children complete their jobs and exit. The main process also waits for them to change state, so the number increases slowly.
    • NMIS can have more than one instance of the main process running, so the total number of children can be higher than max_threads, because the limit applies per instance only.
  • sort_due_nodes: By default, NMIS decides what to poll in a pseudo-random order. If your server is overloaded, this can leave some nodes never getting polled. For heavily loaded servers, enable sort_due_nodes by adding it to the NMIS configuration with the value set to 1.
  • Reference: NMIS 8 - Configuration Options for Server Performance Tuning
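
Before changing sort_due_nodes, it can help to check whether the option is already present in the configuration. The following is a minimal sketch, not an official NMIS tool: the helper name show_option is hypothetical, and the path /usr/local/nmis8/conf/Config.nmis is an assumption about a default NMIS 8 install that may differ on your server.

```shell
#!/bin/sh
# show_option: hypothetical helper, not part of NMIS.
# Prints every line of a configuration file that mentions the given option
# name (as a fixed string), prefixed with its line number.
# Returns non-zero when the option does not appear in the file.
#   $1 = path to the configuration file
#   $2 = option name to look for
show_option() {
    grep -nF -- "$2" "$1"
}

# Example call (assumed default path; adjust to your installation):
# show_option /usr/local/nmis8/conf/Config.nmis sort_due_nodes
```

If the command prints nothing, the option is not set in that file and the default pseudo-random order applies.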

...

Next we verify the data collection configuration for the devices by validating the collect, maxthreads and mthread parameters.

In the NMIS cron file we see the following:

...

Processes in a "D" or uninterruptible sleep state are usually waiting on I/O.


Code Block
[root@nmisslvcc5 log]# ps -auxf | egrep " D| Z"
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      1563  0.1  0.0      0     0 ?        D    Mar17  10:47  \_ [jbd2/dm-2-8]
root      1565  0.0  0.0      0     0 ?        D    Mar17   0:43  \_ [jbd2/dm-3-8]
root      1615  0.3  0.0      0     0 ?        D    Mar17  39:26  \_ [flush-253:2]
root      1853  0.0  0.0  29764   736 ?        D<sl Mar17   0:04 auditd
root     13417  0.6  0.8 565512 306812 ?       D    10:38   0:37  \_ opmantek.pl webserver             -
root     17833  9.8  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
root     17838 10.3  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
root     17842 10.6  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
apache   17856 91.0  0.2 205896 76212 ?        D    12:19   0:01  |   \_ /usr/bin/perl /usr/local/nmis8/
root     17898  0.0  0.0 103320   872 pts/5    S+   12:20   0:00  |       \_ egrep  D| Z
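
The ps -auxf form above triggers the procps syntax warning because it mixes option styles, and a plain text match on " D| Z" also catches the egrep process itself. A minimal sketch of an alternative that filters on the state column instead is shown below. It assumes a procps-style ps that supports the stat output column; the helper names filter_blocked and list_blocked are hypothetical.

```shell
#!/bin/sh
# filter_blocked: hypothetical helper.
# Reads lines of the form "PID STAT COMMAND..." on stdin and prints only
# those whose state (second field) starts with D (uninterruptible sleep)
# or Z (zombie / defunct).
filter_blocked() {
    awk '$2 ~ /^[DZ]/'
}

# list_blocked: hypothetical helper.
# Lists D and Z processes on the live system. The trailing "=" after each
# column name suppresses the header line.
list_blocked() {
    ps -e -o pid= -o stat= -o args= | filter_blocked
}

# Example call:
# list_blocked
```

Because the filter tests only the state field, a process whose command line merely contains " D" or " Z" is not reported.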

Test Disk I/O Performance With dd Command

...