
Scaling NMIS polling is something many large enterprise and service provider customers need to deal with. One of the biggest issues with polling is devices that start responding too slowly, either because of network latency or congestion or because the devices themselves are overloaded. Depending on your configuration, a single slow or unresponsive device can substantially impede NMIS' progress.

NMIS has added configurable features to ensure that polling operations stay up to date and complete in a timely manner without overloading the polling server.


NMIS Poll Cycle Overview

NMIS performs two main operations periodically: a collect and an update. An update is really a node discovery; it determines what a node can do and how it should be managed. Updates are performed infrequently, usually just once a day. The collect operation, on the other hand, is the 'work horse' of NMIS, where the main SNMP and other protocol polling activity happens. Collects are performed every 5 minutes by default.

Depending on your configuration, the thresholding and summary processes either run with the main collect operation or run separately; details about that separation can be found in the article Scaling NMIS Polling.

...

To prevent individual nodes from holding up NMIS, a system to manage long-running processes was added. This feature was summarised in the release notes for 8.5.4G as follows:

NMIS now handles critical sections and long-running NMIS processes much better than before:

...

By default this is set to undef, and the feature is disabled if the value is undef or 0 (zero).

When enabled, this feature sets up a timeout (alarm) for each NMIS child process (in the documentation also often called a "thread"), which terminates the process if it remains stuck or runs for too long. This serves as a safeguard against nodes that are totally unresponsive or much too slow to respond, and which would otherwise keep the collect operation from completing.
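The alarm technique can be illustrated with a minimal Python sketch for a Unix system. NMIS itself is written in Perl, so this is not its implementation; `MAX_CHILD_RUNTIME`, `poll_node` and `run_child` are hypothetical names, and a `time.sleep` stands in for polling a slow device. The child arms an alarm only when the timeout is set to a non-zero value, and the default `SIGALRM` action kills it if the poll overruns.

```python
import os
import signal
import time

# Hypothetical stand-in for the max_child_runtime setting, in seconds.
# None or 0 disables the timeout, mirroring the "undef or 0" rule.
MAX_CHILD_RUNTIME = 2


def poll_node(delay):
    """Pretend to poll one node; `delay` simulates a slow or hung device."""
    time.sleep(delay)


def run_child(delay, max_runtime):
    """Fork a child that polls one node under an optional alarm timeout."""
    pid = os.fork()
    if pid == 0:
        code = 1  # used only if something in the child raises
        try:
            if max_runtime:
                # The default SIGALRM action terminates the process.
                signal.signal(signal.SIGALRM, signal.SIG_DFL)
                signal.alarm(int(max_runtime))
            poll_node(delay)
            code = 0
        finally:
            os._exit(code)
    # Parent: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by " + signal.Signals(os.WTERMSIG(status)).name
    return "exited with code %d" % os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_child(0, MAX_CHILD_RUNTIME))  # fast node finishes normally
    print(run_child(5, MAX_CHILD_RUNTIME))  # hung node, killed after ~2 s
```

Because the alarm lives in each child, one hung device costs at most the configured runtime instead of blocking the whole collect.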

NMIS Event "NMIS runtime exceeded"

By default, NMIS 8.5.4 and later monitors all the processes on the NMIS server, and when starting a new collect cycle (polling cycle) it checks whether any processes are still running from the last poll cycle. If it finds any "old" processes, it politely asks them to stop (kill "TERM"); the child processes receive this request, quickly complete what they are doing and die peacefully. NMIS then generates the event "NMIS runtime exceeded" for the NMIS server node and adds it to the NMIS Event Log, to inform you of the unexpectedly slow or delayed process.
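The "politely ask, then finish up" handshake can be sketched in Python on a Unix system. This illustrates the general SIGTERM technique only, not NMIS's Perl code; `child_work`, `start_and_terminate`, `UNITS` and `UNIT_SECONDS` are hypothetical. The handler merely sets a flag, so the node being polled when the signal arrives is completed before the loop exits (since Python 3.5, `time.sleep` resumes after a signal handler returns). A pipe ensures the handler is installed before the parent sends the signal.

```python
import os
import signal
import time

UNITS = 50           # hypothetical number of nodes one child would poll
UNIT_SECONDS = 0.05  # simulated time to poll one node


def child_work(ready_fd):
    """Poll UNITS pretend nodes, stopping between nodes once SIGTERM arrives."""
    stop = {"requested": False}

    def on_term(signum, frame):
        # Only record the request; the node currently being polled is finished.
        stop["requested"] = True

    signal.signal(signal.SIGTERM, on_term)
    os.write(ready_fd, b"R")  # tell the parent the handler is installed
    os.close(ready_fd)
    done = 0
    for _ in range(UNITS):
        if stop["requested"]:
            break
        time.sleep(UNIT_SECONDS)  # stand-in for polling one node
        done += 1
    return done


def start_and_terminate():
    """Fork a child, send it SIGTERM, and report how it exited."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        code = 255  # used only if child_work raises
        try:
            code = child_work(w)  # exit status = nodes completed (< 256)
        finally:
            os._exit(code)
    os.close(w)
    os.read(r, 1)  # block until the child's handler is in place
    os.close(r)
    os.kill(pid, signal.SIGTERM)  # the polite request, like kill "TERM"
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status), os.WEXITSTATUS(status)
```

Calling `start_and_terminate()` should report a normal exit with far fewer than `UNITS` nodes completed, which is the "die peacefully" behaviour described above rather than an abrupt kill.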

If you do not want NMIS to terminate old processes at all, add ignore_running=true to the command line in your cron setup, e.g. "nmis.pl type=collect ignorecollect ignore_running=true etc."
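The effect of ignore_running on the start-of-cycle check can be sketched as below. This is a hypothetical Python illustration, not NMIS code: it assumes the PIDs left over from the previous cycle are already known (how NMIS finds them is not covered here), and only the option name ignore_running comes from NMIS.

```python
import os
import signal
import subprocess
import sys


def start_collect_cycle(old_pids, ignore_running=False):
    """Hypothetical start-of-cycle check: ask leftover processes to stop.

    Returns the PIDs that were sent SIGTERM. With ignore_running=True
    nothing is signalled and old processes are left alone.
    """
    if ignore_running:
        return []
    signalled = []
    for pid in old_pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            continue  # the process already finished on its own
        signalled.append(pid)
    return signalled


if __name__ == "__main__":
    # A long-running stand-in for a child left over from the last cycle.
    old = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(start_collect_cycle([old.pid], ignore_running=True))  # []
    print(old.poll())                      # None: still running
    print(start_collect_cycle([old.pid]))  # a list holding old.pid
    print(old.wait(timeout=5))             # -15 (terminated by SIGTERM) on Linux
```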

If you like the feature but do not want the events to be generated, you can disable them with the configuration option disable_nmis_process_events=true, which is found in Config.nmis or can be modified using the NMIS Configuration GUI in the "System -> System Configuration" menu.

Please note that the max_child_runtime feature operates independently of ignore_running; that is, you can use either one, both, or neither of these features.