SNMP is a fairly complex protocol, and the fact that it primarily operates over UDP does not exactly help matters. As a consequence, there are a number of potential problems that affect NMIS' ability to collect information from SNMP agents efficiently and quickly.
snmp_timeout and snmp_retries
By default NMIS uses a 5-second SNMP timeout and retries once before it considers SNMP to have failed. These defaults work in 99% of circumstances, but some devices and/or networks need a longer timeout or more retries, so both settings can be increased. Remember, however, that when an agent does not respond, the polling process has to wait out the timeout on the initial request and again on every retry. With the defaults that is 5 + 5 = 10 seconds, and with 3 retries it would be 4 × 5 = 20 seconds before NMIS considers SNMP to be down.
For servers with many nodes, it is not recommended to let this total wait time exceed 20 seconds.
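The worst-case wait can be sketched as follows. This is a minimal Python illustration. It assumes that every attempt (the initial request plus each retry) waits for the full timeout, which is how we understand Net::SNMP's timeout and retries options to behave. The function name and the 20-second constant are illustrative only.

```python
def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Seconds spent before an unresponsive agent is declared down.

    Assumption: each attempt (the initial request plus every retry)
    waits for the full timeout.
    """
    if timeout_s <= 0 or retries < 0:
        raise ValueError("timeout must be positive and retries non-negative")
    return timeout_s * (retries + 1)


RECOMMENDED_LIMIT_S = 20  # guidance for servers with many nodes

for timeout, retries in [(5, 1), (5, 3), (10, 2)]:
    wait = worst_case_wait(timeout, retries)
    verdict = "ok" if wait <= RECOMMENDED_LIMIT_S else "too long for large installations"
    print(f"timeout={timeout}s retries={retries}: {wait}s ({verdict})")
```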
The primary tunable NMIS configuration setting for SNMP is snmp_max_msg_size, which controls how large a single SNMP packet may be.
This can be set as a system-wide default (in the System menu, under System Configuration), or as a per-host setting (in the Edit Node menu, under Advanced Options).
The default for snmp_max_msg_size is 1472 bytes, just below the 1500-byte packet limit of normal Ethernet. In LAN-only scenarios it is possible to increase this past 1500 bytes. This causes IP fragmentation and packet reassembly, but fragmentation is not a problem unless your LAN is saturated and starved for bandwidth. The benefit of a larger SNMP packet is that the data to be collected fits into fewer packets.
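The 1472-byte default follows from the 1500-byte Ethernet MTU minus the 20-byte IPv4 header and the 8-byte UDP header. The Python sketch below shows that arithmetic. It also estimates how many IP fragments a larger snmp_max_msg_size would produce. It assumes IPv4 without header options, and the function names are hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def max_unfragmented_payload(mtu: int = 1500) -> int:
    """Largest UDP payload (i.e. SNMP message) that fits in one IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def ip_fragments(snmp_msg_size: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one SNMP message over UDP."""
    # Each fragment carries at most (mtu - IP header) bytes of IP payload,
    # rounded down to a multiple of 8 because fragment offsets count in 8-byte units.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((snmp_msg_size + UDP_HEADER) / per_fragment)


print(max_unfragmented_payload())  # 1472
print(ip_fragments(1472))          # 1
print(ip_fragments(4000))          # 3
```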
The max_repetitions option was added in NMIS 8.5G. It controls how many SNMP variable bindings (for example, table entries) are packaged into a single SNMP response packet during bulk transfers. The name max_repetitions is a bit odd; it comes from the SNMP module that NMIS uses, Net::SNMP.
This option can only be set for specific hosts, and is not available for SNMP version 1.
Its primary purpose is to override Net::SNMP's heuristic for maximising the efficiency of bulk transfers. The heuristic tries to fit the maximum number of variable bindings into each packet. That number depends on the size of the bindings, which is unknown until the operation is attempted. Like any heuristic, this one can fail under certain circumstances. When large SNMP tables are collected, it may be necessary to reduce this setting to 10-20 (with the default packet size). We have observed this problem in a small number of situations. One example is collecting virtual machine information from VMware ESXi hosts, whose tables contain very long strings.
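To see why long strings call for a lower max_repetitions, here is a rough Python estimate of the largest value whose response still fits into one message. It assumes a table walk with a single repeating variable binding. The 100-byte message overhead and the per-binding sizes are illustrative guesses, not measured NMIS or Net::SNMP figures.

```python
def largest_fitting_max_repetitions(max_msg_size: int, avg_varbind_bytes: int,
                                    overhead_bytes: int = 100) -> int:
    """Largest max_repetitions whose bulk response should fit in one message.

    Assumes one repeating variable binding, so the response carries about
    max_repetitions bindings. overhead_bytes is a hypothetical allowance
    for the SNMP message and PDU headers.
    """
    if avg_varbind_bytes <= 0:
        raise ValueError("avg_varbind_bytes must be positive")
    usable = max_msg_size - overhead_bytes
    return max(0, usable // avg_varbind_bytes)


# Illustrative sizes: a short numeric column vs. long descriptive strings.
print(largest_fitting_max_repetitions(1472, avg_varbind_bytes=30))   # 45
print(largest_fitting_max_repetitions(1472, avg_varbind_bytes=120))  # 11
```

With these assumed sizes, short bindings allow roughly 45 repetitions per 1472-byte message, while 120-byte bindings allow only about 11. Both figures happen to fall within the ranges suggested on this page.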
If you see SNMP error messages in the logs similar to "SNMP ERROR (X) (Y) The message size exceeded the buffer maxMsgSize of N", set a lower max_repetitions value. Alternatively, increase snmp_max_msg_size if you're operating in a LAN-only scenario. Otherwise, a value of 40-50 minimizes the number of SNMP packets and thus speeds up collection. Leaving this option unset lets the Net::SNMP module guess a suitable value.
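To find out whether this error occurs, the logs can be searched for the message quoted above. The Python sketch below extracts the maxMsgSize value from matching lines. It relies only on the quoted message text. The sample line is made up, and how you obtain the log lines (file location, rotation) is left open.

```python
import re
from typing import Iterable, List

# Matches the tail of the quoted error; the rest of the log-line format
# (timestamps, node names, the X/Y fields) is deliberately not assumed.
MSG_SIZE_RE = re.compile(
    r"message size exceeded the buffer maxMsgSize of (\d+)", re.IGNORECASE
)


def message_size_errors(lines: Iterable[str]) -> List[int]:
    """Return the maxMsgSize value from every matching log line."""
    sizes = []
    for line in lines:
        match = MSG_SIZE_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


sample = [
    "SNMP ERROR (X) (Y) The message size exceeded the buffer maxMsgSize of 1472",
    "some unrelated log line",
]
print(message_size_errors(sample))  # [1472]
```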
There are two special values for max_repetitions:

- 0 makes NMIS use the default of the Net::SNMP Perl module, which appears to be 25.
- 1 disables efficient bulk transfer in favour of a slower but more robust transfer mechanism.
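The meaning of the different max_repetitions settings can be summarised in a small Python helper. It is a hypothetical illustration, not NMIS code. In particular, the integer snmp_version parameter is an assumption and not how NMIS represents the SNMP version.

```python
from typing import Optional


def describe_max_repetitions(value: Optional[int], snmp_version: int = 2) -> str:
    """Summarise what a node's max_repetitions setting means.

    snmp_version is a hypothetical integer (1, 2 or 3) used only for
    this illustration.
    """
    if snmp_version == 1:
        return "not applicable: the option is unavailable for SNMP version 1"
    if value is None:
        return "unset: Net::SNMP picks a value heuristically"
    if value < 0:
        raise ValueError("max_repetitions must not be negative")
    if value == 0:
        return "Net::SNMP library default (apparently 25)"
    if value == 1:
        return "bulk transfer disabled: slower but more robust"
    return f"up to {value} repetitions per bulk request"


for setting in (None, 0, 1, 40):
    print(setting, "->", describe_max_repetitions(setting))
```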
To set max_repetitions, add it to the node's entry in the Nodes.nmis file, or set it in the NMIS 8 GUI when editing the node.
New in NMIS 8.6: automatic max_repetitions reduction
As outlined in the NMIS 8 Release Notes, from version 8.6.0 onwards NMIS dynamically reduces the max_repetitions parameter if necessary.
If a "message size exceeded" error is encountered, the issue is logged and the current max_repetitions value is reduced by 25% before the request is retried. If that retry works, the updated value is used for the lifetime of the SNMP session, i.e. the remainder of this node's collect or update operation. Up to four reduce-and-retry iterations are performed before NMIS gives up on the request and returns an error.
If you have not set a max_repetitions value, the first retry uses the value 20.
Whenever such an automatic adjustment is attempted, NMIS logs a warning message similar to this example:
"WARNING (servername) SNMP get_table failed with message size exceeded, retrying with maxrepetitions reduced to 36"
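The reduce-and-retry behaviour can be simulated with the Python sketch below. It is not NMIS's implementation. The exception class, the fake agent, and rounding the reduced value down are assumptions. The 25% reduction, the four retries, and the fallback value of 20 come from the description above.

```python
import logging
from typing import Callable, List, Optional, Tuple

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")

DEFAULT_FIRST_RETRY = 20  # first retry value when max_repetitions is unset
MAX_RETRIES = 4           # up to four reduce-and-retry iterations


class MessageSizeExceeded(Exception):
    """Hypothetical stand-in for the SNMP 'message size exceeded' error."""


def reduce_by_quarter(value: int) -> int:
    # Reduce by 25%; rounding down and the floor of 1 are assumptions.
    return max(1, value * 3 // 4)


def retry_schedule(configured: Optional[int]) -> List[int]:
    """max_repetitions values used for the successive retries."""
    values: List[int] = []
    current = configured
    for _ in range(MAX_RETRIES):
        current = DEFAULT_FIRST_RETRY if current is None else reduce_by_quarter(current)
        values.append(current)
    return values


def get_table_with_backoff(
    request: Callable[[Optional[int]], dict], configured: Optional[int]
) -> Tuple[dict, Optional[int]]:
    """Call request(); on size errors, retry with smaller max_repetitions.

    Returns the result together with the value that worked, so a caller
    could keep using it for the rest of the session.
    """
    try:
        return request(configured), configured
    except MessageSizeExceeded:
        pass
    for value in retry_schedule(configured):
        logging.warning("get_table failed with message size exceeded, "
                        "retrying with maxrepetitions reduced to %d", value)
        try:
            return request(value), value
        except MessageSizeExceeded:
            continue
    raise MessageSizeExceeded(f"still failing after {MAX_RETRIES} retries")


def fake_request(max_reps: Optional[int]) -> dict:
    """Hypothetical agent whose responses only fit for at most 25 repetitions."""
    if max_reps is None or max_reps > 25:
        raise MessageSizeExceeded()
    return {"rows": max_reps}


print(retry_schedule(None))                      # [20, 15, 11, 8]
print(retry_schedule(48))                        # [36, 27, 20, 15]
print(get_table_with_backoff(fake_request, 48))  # ({'rows': 20}, 20)
```

With a configured value of 48, the first retry uses 36, which matches the example warning above.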