I'm running the NMIS 8.6.7G appliance and trying to find out how to restart NMIS without restarting the entire VM. Searching turned up nothing. Thank you.
The process outlined below should resolve the CPU issue you're seeing and improve overall polling performance:
If you are installing NMIS 8.6.7G on a busy server, it is important to change the cron entry for NMIS.
The setting is found in /etc/cron.d/nmis and the default is this:
* * * * * root /usr/local/nmis8/bin/nmis.pl type=collect mthread=true ; /usr/local/nmis8/bin/nmis.pl type=services mthread=true
Opmantek recommend changing that to:
*/1 * * * * root /usr/local/nmis8/bin/nmis.pl type=collect mthread=true
*/2 * * * * root /usr/local/nmis8/bin/nmis.pl type=services mthread=true
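To see what the recommended entries change in practice, here is a minimal Python sketch that expands the minute field of each entry. It is not a full cron parser: the helper name `minutes_matching` is made up for illustration and it only handles the two forms used in this thread (`*` and `*/N`). Note that `*/1` matches the same minutes as `*`, so the real difference is that collect and services become separate jobs (in the default entry, the `;` makes services wait until collect has finished) and services runs every second minute.

```python
def minutes_matching(field):
    """Return the minutes (0-59) matched by a cron minute field.

    Illustrative helper only: supports "*" and "*/N", nothing else.
    """
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 60, step))
    raise ValueError(f"unsupported minute field: {field!r}")


collect_minutes = minutes_matching("*/1")   # type=collect entry
services_minutes = minutes_matching("*/2")  # type=services entry

print(len(collect_minutes), "collect runs per hour")    # 60
print(len(services_minutes), "services runs per hour")  # 30
print(minutes_matching("*/1") == minutes_matching("*"))  # True
```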
It is important to note that the NMIS polling engine was overhauled in NMIS 8.6.6 and NMIS 8.6.7 to improve how
parallel threads are handled and to keep polling up to date. When NMIS starts a poll every 1 minute, not all nodes will be polled in that run: it polls as many as it can in that time and leaves the others for the next poll cycle. The result is that polling (and the load on the server) is spread out over 5 minutes.
If you are not getting all nodes polled in 5 minutes, you will need more threads.
If you are polling many nodes every 1 minute, then you will need to size the server accordingly.
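As a rough way to reason about "more threads", the sketch below does a back-of-envelope estimate in Python. It is not how NMIS schedules polling internally; it only assumes that every node must be collected once per 5-minute cycle and that collect times average out. The function name `estimate_threads` and the node counts and 10-second average are hypothetical values chosen for illustration.

```python
import math


def estimate_threads(node_count, avg_collect_seconds, cycle_seconds=300):
    """Rough lower bound on parallel workers needed to poll every node once per cycle.

    Assumption: total work is node_count * avg_collect_seconds and it
    divides evenly across workers. Real polling will be less tidy.
    """
    total_work = node_count * avg_collect_seconds
    return max(1, math.ceil(total_work / cycle_seconds))


# Hypothetical inputs, not measurements from this thread.
print(estimate_threads(500, 10))    # 5000 / 300 = 16.67 -> 17
print(estimate_threads(2000, 10))   # 20000 / 300 = 66.67 -> 67
```

Treat the result as a starting point only; a few slow or unreachable devices can push the real requirement well above this figure.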
Thank you, I just made these changes!
The "metrics" section of the NMIS dashboard keeps showing a warning that CPU is over 60%, and when I run top I notice an nmis.pl process using a large chunk of CPU, up to 99-100%. Otherwise everything works fine. I was just trying to find a way to get the metrics section working again...
How much CPU and RAM do you have assigned to the server, what OS is installed, and how many devices are you polling?
Also, check System -> Host Diagnostics -> NMIS Runtime Graph. You want to make sure your Collect time is less than the polling cycle, so if you're polling every 5 minutes the Collect time needs to be < 300s.
Next, check Reports -> Current -> Collect/Update Time. First sort this by the Collect Time column and look for devices with the highest Collect times. Anything higher than 30s or so needs to be investigated: what's the latency, how many hops are there to the device, and is the device itself overloaded? After that, sort on the Update Time column. Are any devices not updating?
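The two checks above can be sketched in Python as follows. This is only an illustration of the thresholds mentioned (300s for the whole collect run, roughly 30s per device); the node names and times are hypothetical, and the data would have to be copied by hand from the report, since this sketch does not read anything from NMIS.

```python
SLOW_NODE_SECONDS = 30     # per-device threshold suggested above
POLL_CYCLE_SECONDS = 300   # 5-minute polling cycle


def collect_fits_cycle(total_collect_seconds, cycle_seconds=POLL_CYCLE_SECONDS):
    """True if the overall Collect time is strictly below the polling cycle."""
    return total_collect_seconds < cycle_seconds


def slow_nodes(collect_times, threshold=SLOW_NODE_SECONDS):
    """Return (name, seconds) pairs above the threshold, slowest first."""
    flagged = [(name, secs) for name, secs in collect_times.items() if secs > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample data, not taken from a real report.
sample = {"host-a": 4.2, "host-b": 45.0, "host-c": 120.0, "host-d": 12.5}

for name, seconds in slow_nodes(sample):
    print(f"{name}: {seconds:.2f}s")

print(collect_fits_cycle(250))  # True
```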
Collect Time = 31.33 seconds
I found one higher than 30s; it's 164+ seconds. It's a physical server running Docker. I will look at that system. Thanks.
Oh, and the virtual appliance is running CentOS 6 with 6 vCPUs and 10GB of memory, with about 130 systems in NMIS.
You should not be having ANY CPU issues with those specs and only 130 devices in NMIS. Has CentOS been patched and updated? It should be running 6.9, I think. How are storage space (df -h) and swap space on the server?
The one system with the high Collect time is under heavy load, which may be causing performance issues in NMIS. The OS is 6.9.
Filesystem Size Used Avail Use% Mounted on
16G 4.1G 11G 28% /
tmpfs 4.9G 0 4.9G 0% /dev/shm
/dev/sda1 477M 182M 270M 41% /boot
40G 23G 15G 60% /data
20G 2.6G 16G 14% /var
The processes NMIS uses to collect device fault and performance data are all started and stopped by a cron job located in /etc/cron.d/nmis
These processes should not need to be restarted as NMIS monitors their performance and kills processes that overrun collection time or have stopped responding.
What are you seeing that leads you to want to stop or restart these?