With NMIS, and network management in general, you occasionally run into odd products: devices that do not conform to SNMP standards and best practices.  They can be a pain to troubleshoot.  Here are some tips based on problems we have found.

SNMP Working, but not finding Interfaces with ifIndex

When running an NMIS update, e.g. nmis.pl type=update node=NODENAME debug=true, the debug output might show an "SNMP ERROR" line like the one below:

Code Block
01:35:35 getIntfInfo, Get Interface Info of node STRANGENODENAME, model TELDAT
01:36:23 checkResult, SNMP ERROR (STRANGENODENAME) (ifIndex) No response from remote host "STRANGENODENAME"
01:36:23 getIntfInfo, ERROR (STRANGENODENAME) on get interface index table
01:36:23 notify, Start of Notify
01:36:23 eventAdd, event added, node=STRANGENODENAME, event=SNMP Down, level=Critical, element=, details=SNMP error

This looks odd because SNMP is otherwise working, yet this very important operation fails.  The likely cause is that the device cannot cope with large SNMP responses.  Response size is controlled by a setting called max repetitions, which is how many variable bindings (OID/value pairs) the device is asked to return for each requested OID in a single GetBulk response.

To troubleshoot this, you might run an snmpwalk like this:

Code Block
NMIS# snmpwalk -v 2c -c COMMUNITYSTRING STRANGENODENAME ifIndex
IF-MIB::ifIndex.1 = INTEGER: 1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifIndex.3 = INTEGER: 3
IF-MIB::ifIndex.4 = INTEGER: 4
IF-MIB::ifIndex.5 = INTEGER: 5
IF-MIB::ifIndex.6 = INTEGER: 6
IF-MIB::ifIndex.7 = INTEGER: 7
IF-MIB::ifIndex.8 = INTEGER: 8
IF-MIB::ifIndex.9 = INTEGER: 9
IF-MIB::ifIndex.10 = INTEGER: 10
IF-MIB::ifIndex.11 = INTEGER: 11
IF-MIB::ifIndex.12 = INTEGER: 12


To see what is happening on the wire, capture the traffic with tcpdump using the command below.  Make sure you capture on the interface the packets are sent out of; check the route table on the server if it has multiple interfaces:

Code Block
tcpdump -i INTERFACE host 2.3.4.5


You would see this:

Code Block
01:31:37.093751 IP 1.2.3.4.48560 > 2.3.4.5.snmp:  C=COMMUNITYSTRING GetBulk(29)  N=0 M=10 interfaces.ifTable.ifEntry.ifIndex
01:31:37.115557 IP 2.3.4.5.snmp > 1.2.3.4.48560:  C=COMMUNITYSTRING GetResponse(185)  interfaces.ifTable.ifEntry.ifIndex.1=1 interfaces.ifTable.ifEntry.ifIndex.2=2 interfaces.ifTable.ifEntry.ifIndex.3=3 interfaces.ifTable.ifEntry.ifIndex.4=4 interfaces.ifTable.ifEntry.ifIndex.5=5 interfaces.ifTable.ifEntry.ifIndex.6=6 interfaces.ifTable.ifEntry.ifIndex.7=7 interfaces.ifTable.ifEntry.ifIndex.8=8 interfaces.ifTable.ifEntry.ifIndex.9=9 interfaces.ifTable.ifEntry.ifIndex.10=10
01:31:37.116194 IP 1.2.3.4.48560 > 2.3.4.5.snmp:  C=COMMUNITYSTRING GetBulk(30)  N=0 M=10 interfaces.ifTable.ifEntry.ifIndex.10
01:31:37.139792 IP 2.3.4.5.snmp > 1.2.3.4.48560:  C=COMMUNITYSTRING GetResponse(241)  interfaces.ifTable.ifEntry.ifIndex.11=11 interfaces.ifTable.ifEntry.ifIndex.12=12 interfaces.ifTable.ifEntry.ifDescr.1="ethernet0/0" interfaces.ifTable.ifEntry.ifDescr.2="ethernet0/1" interfaces.ifTable.ifEntry.ifDescr.3="serial0/0" interfaces.ifTable.ifEntry.ifDescr.4="bri0/0" interfaces.ifTable.ifEntry.ifDescr.5="x25-node" interfaces.ifTable.ifEntry.ifDescr.6="voip1/0" interfaces.ifTable.ifEntry.ifDescr.7="serial2/0" interfaces.ifTable.ifEntry.ifDescr.8="fr2"


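What is interesting here is the request GetBulk(29) N=0 M=10 interfaces.ifTable.ifEntry.ifIndex.  N is non-repeaters and M is max-repetitions, so the device is asked for at most 10 variable bindings per response.  The Net-SNMP command-line tools either default to 10 (snmpbulkwalk) or do not use bulk requests at all (snmpwalk sends GETNEXT requests).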
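
To make the trace above easier to read, here is a toy Python simulation of how an agent answers GetBulk requests during a walk of one column.  It is only a sketch: the names getbulk, bulk_walk and MIB are made up for illustration, the table holds just the values visible in the trace, and OID ordering is simplified to list order.

```python
# Toy model of a GetBulk walk. Not NMIS or Net-SNMP code.
# MIB holds only the rows visible in the tcpdump trace, in walk order.
MIB = [(f"ifIndex.{i}", i) for i in range(1, 13)] + [
    ("ifDescr.1", "ethernet0/0"), ("ifDescr.2", "ethernet0/1"),
    ("ifDescr.3", "serial0/0"), ("ifDescr.4", "bri0/0"),
    ("ifDescr.5", "x25-node"), ("ifDescr.6", "voip1/0"),
    ("ifDescr.7", "serial2/0"), ("ifDescr.8", "fr2"),
]


def getbulk(mib, start_oid, max_repetitions):
    """Return up to max_repetitions (oid, value) pairs that follow start_oid."""
    oids = [oid for oid, _ in mib]
    if start_oid in oids:
        first = oids.index(start_oid) + 1
    else:
        # A bare column name such as "ifIndex" sorts before its first instance.
        first = next(
            (i for i, oid in enumerate(oids) if oid.startswith(start_oid + ".")),
            len(oids),
        )
    return mib[first:first + max_repetitions]


def bulk_walk(mib, column, max_repetitions):
    """Walk one column with repeated GetBulk requests; return (rows, responses)."""
    rows, responses = [], []
    cursor = column
    while True:
        response = getbulk(mib, cursor, max_repetitions)
        responses.append(response)
        in_column = [(o, v) for o, v in response if o.startswith(column + ".")]
        rows.extend(in_column)
        # Stop once the agent ran past the column or has nothing left to send.
        if not response or len(in_column) < len(response):
            break
        cursor = response[-1][0]
    return rows, responses


for m in (10, 25):
    rows, responses = bulk_walk(MIB, "ifIndex", m)
    print(f"M={m}: {len(rows)} rows, response sizes {[len(r) for r in responses]}")
```

With M=10 the walk needs two responses of 10 variable bindings each, and the second one spills over into ifDescr, just as in the trace.  With a larger M the agent is asked to fit more into one response, which is the situation a fragile agent may not handle.  The toy does not model that failure itself.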

If you have not configured max repetitions in NMIS, you would see this:

Code Block
01:41:37.093751 IP 1.2.3.4.48560 > 2.3.4.5.snmp:  C=COMMUNITYSTRING GetBulk(29)  N=0 M=25 interfaces.ifTable.ifEntry.ifIndex
01:51:37.093751 IP 1.2.3.4.48560 > 2.3.4.5.snmp:  C=COMMUNITYSTRING GetBulk(29)  N=0 M=25 interfaces.ifTable.ifEntry.ifIndex

Then NMIS would give you the errors above.  Here the request uses a default of M=25, which appears to be set in the Perl SNMP libraries or somewhere even more obscure, and the device never answers it.
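
One way to confirm that the device only fails on larger values is to repeat the walk from the command line with an explicit max-repetitions.  The sketch below assumes the Net-SNMP snmpbulkwalk tool is installed and uses its -Cr option to set max-repetitions; the helper names build_bulkwalk_cmd and try_max_repetitions are hypothetical, and treating a non-zero exit status as "no answer" is an assumption.

```python
import shutil
import subprocess


def build_bulkwalk_cmd(host, community, max_repetitions, oid="ifIndex"):
    """Return the argv for an SNMPv2c bulk walk with an explicit max-repetitions."""
    return [
        "snmpbulkwalk",
        "-v", "2c",
        "-c", community,
        f"-Cr{max_repetitions}",  # max-repetitions field of each GETBULK PDU
        host,
        oid,
    ]


def try_max_repetitions(host, community, values=(25, 10)):
    """Run one walk per value; map each value to True if the walk succeeded."""
    if shutil.which("snmpbulkwalk") is None:
        raise RuntimeError("snmpbulkwalk (Net-SNMP) is not installed")
    results = {}
    for m in values:
        proc = subprocess.run(
            build_bulkwalk_cmd(host, community, m),
            capture_output=True,
            text=True,
        )
        # Assumption: a timeout or other failure shows up as a non-zero exit.
        results[m] = proc.returncode == 0
    return results


# Example call (needs a reachable device, so it is left commented out):
# print(try_max_repetitions("STRANGENODENAME", "COMMUNITYSTRING"))
```

If the walk fails at 25 and succeeds at 10, that points to the max repetitions setting rather than to SNMP reachability in general.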

The net result is that you will need to configure your NMIS node with:

'max_repetitions' => '10',

You can find more details about SNMP settings on the SNMP Tuning page.

snmpd returns "invalid(4)" process state (hrSWRunStatus) for process names containing spaces

When querying the hrSWRunStatus table via SNMP on a host running snmpd, it should generally return 1 (running) or 2 (runnable) for processes that are active.
However, if the process name contains a space, snmpd returns 4 (invalid) for the process state.
This appears to be because it reads /proc/$PID/stat, splits the line on spaces and takes the third element,
which would normally be the process state. When the process name contains a space, the third element is part of the name instead.
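
The parsing problem described above can be illustrated with a short Python sketch.  This is not snmpd's code (snmpd is written in C); the function names and the sample stat lines are made up, and the "robust" variant shows one common way to parse the line, relying on the process name being wrapped in parentheses in /proc/$PID/stat.

```python
def state_naive(stat_line):
    """Split on spaces and take the third field (breaks on names with spaces)."""
    return stat_line.split(" ")[2]


def state_robust(stat_line):
    """Take the first field after the last ')', which closes the process name."""
    after_name = stat_line[stat_line.rindex(")") + 1:]
    return after_name.split()[0]


# Hypothetical, shortened stat lines: pid (name) state ppid ...
plain = "1234 (sshd) S 1 1234"
spaced = "4321 (my daemon) S 1 4321"

for line in (plain, spaced):
    print(repr(line), "naive:", state_naive(line), "robust:", state_robust(line))
```

For the name with a space, the naive split returns a fragment of the name rather than a state letter, which is consistent with the state being reported as invalid.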

net-snmp version 5.7.2 is known to be affected:
https://bugzilla.redhat.com/show_bug.cgi?id=1782180

A consequence of this issue is that NMIS will report a monitored service as 'down' even though it is running, if the process name contains a space.