This page provides a troubleshooting and validation process for the SNMP configuration on servers where NMIS is installed.




What is SNMP?

SNMP stands for Simple Network Management Protocol and consists of three key components: managed devices, agents, and network management systems. The protocol is a set of standards for communicating with devices on a TCP/IP network. It is an application-level protocol designed to monitor network infrastructure, and it gives administrators device-centric visibility. SNMP monitoring is useful for anyone responsible for servers and network devices such as routers, hubs, switches, and UPS units.

...

There are a number of reasons why communication with a device may fail during discovery or be lost some time later. There are several things you can check to verify proper SNMP communication.
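One basic check is to query a standard OID from the command line of the NMIS server. The following is a minimal sketch, assuming the Net-SNMP command-line tools (snmpget) are installed and the device answers SNMP v2c. The helper name snmp_probe, the SNMPGET override variable, and the host and community values are placeholders for illustration, not part of NMIS.

```shell
#!/bin/sh
# Minimal sketch: query sysName.0 to confirm SNMP v2c communication.
# snmp_probe is a hypothetical helper; host and community are placeholders.

SYSNAME_OID="1.3.6.1.2.1.1.5.0"   # sysName.0 from SNMPv2-MIB

snmp_probe() {
    host="$1"
    community="$2"
    tool="${SNMPGET:-snmpget}"    # assumption: Net-SNMP snmpget is on the PATH

    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool not found; install the Net-SNMP utilities" >&2
        return 2
    fi

    # -t 2: two-second timeout, -r 1: one retry
    if "$tool" -v2c -c "$community" -t 2 -r 1 "$host" "$SYSNAME_OID"; then
        echo "SNMP OK: $host"
    else
        echo "SNMP FAILED: $host (check community, access lists, firewall, UDP 161)" >&2
        return 1
    fi
}

# Usage (placeholder values):
#   snmp_probe 192.0.2.10 public
```

A failure here points at the device, the network path, or the credentials rather than at NMIS itself, which helps narrow down where to look next.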


Device Troubleshooting Process




snmpd daemon status validation

If the snmpd daemon terminates as soon as it is invoked, the following are possible reasons for failure and probable solutions:

  • The reason the snmpd daemon terminated will be logged in the snmpd log file or the configured syslogd log file. Check the log file to see the FATAL error message.

    Solution: Correct the problem and restart the snmpd daemon.

  • The snmpd daemon must be invoked by the root user.

    Solution: Switch to the root user and restart the snmpd daemon.

  • The snmpd.conf file must be owned by the root user. The snmpd agent verifies the ownership of the configuration file. If the file is not owned by the root user, the snmpd agent terminates with a fatal error.

    Solution: Make sure you are the root user, change the ownership of the configuration file to the root user, and restart the snmpd daemon.

  • UDP port 161 is already bound. Make sure that the snmpd daemon is not already running. Run the ps -eaf | grep snmpd command to determine whether an snmpd daemon process is already executing. Only one snmpd agent can bind to UDP port 161.

    Solution: Either kill the existing snmpd agent or do not try to start up another snmpd daemon process.
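The checks above can be sketched as a small read-only script. This is a minimal sketch, assuming the configuration file lives at /etc/snmp/snmpd.conf (override with the SNMPD_CONF variable); the function names are hypothetical, and the script only reports findings without changing anything.

```shell
#!/bin/sh
# Minimal sketch of the snmpd startup checks listed above.
# The default configuration path is an assumption; override with SNMPD_CONF.

# Print the numeric UID that owns a file (third field of `ls -ldn`).
file_owner_uid() {
    ls -ldn "$1" | awk '{print $3}'
}

# Succeed only when the file is owned by root (UID 0).
owned_by_root() {
    [ "$(file_owner_uid "$1")" = "0" ]
}

# Succeed when an snmpd process is already running.
# The [s] bracket keeps grep from matching its own command line.
snmpd_running() {
    ps -eaf | grep '[s]nmpd' >/dev/null
}

# Print one line per check; nothing on the system is modified.
snmpd_preflight() {
    conf="${SNMPD_CONF:-/etc/snmp/snmpd.conf}"

    if [ "$(id -u)" -eq 0 ]; then
        echo "user: root"
    else
        echo "user: not root, switch to root before starting snmpd"
    fi

    if [ ! -e "$conf" ]; then
        echo "config: $conf not found"
    elif owned_by_root "$conf"; then
        echo "config: $conf owned by root"
    else
        echo "config: $conf not owned by root"
    fi

    if snmpd_running; then
        echo "process: snmpd already running, UDP port 161 is probably bound"
    else
        echo "process: no snmpd process found"
    fi
}

# Usage:
#   snmpd_preflight
```

If every line looks healthy and the daemon still exits, the FATAL message in the snmpd or syslog log file remains the most reliable source for the cause.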

NMIS server SNMP configuration

This tutorial shows how to configure SNMP so that the NMIS server itself can be monitored. It focuses on CentOS, as it is one of the most widespread server distributions. Except for the installation, the steps are similar on other distributions.

Configuration steps
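As a rough illustration of what such a configuration can look like, the sketch below generates a minimal read-only SNMP v2c configuration. It assumes the Net-SNMP packages from the standard CentOS repositories (net-snmp and net-snmp-utils) and the default /etc/snmp/snmpd.conf path; the helper name render_snmpd_conf and the community string are placeholders, and this is not the official NMIS procedure.

```shell
#!/bin/sh
# Minimal sketch, not the official NMIS procedure.
# render_snmpd_conf is a hypothetical helper; community and source are placeholders.

# Print a minimal read-only SNMP v2c configuration for Net-SNMP's snmpd.
render_snmpd_conf() {
    community="$1"   # read-only community string
    source="$2"      # address allowed to query, for example 127.0.0.1
    printf 'rocommunity %s %s\n' "$community" "$source"
}

# Typical usage on CentOS, run as root.
# The commands are left commented so that nothing changes by accident:
#   yum install -y net-snmp net-snmp-utils
#   cp -p /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.bak
#   render_snmpd_conf 'CHANGE_ME' 127.0.0.1 > /etc/snmp/snmpd.conf
#   systemctl enable snmpd
#   systemctl restart snmpd
#   snmpwalk -v2c -c 'CHANGE_ME' 127.0.0.1 system
```

Restricting the community to a single source address keeps the agent from answering arbitrary hosts; widen it only to the addresses that actually need to poll the server.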





  1. Identify the problem. The first step in troubleshooting a device issue is to identify the problem and to determine whether it occurs in NMIS8 or NMIS9.
    1. Add the product version and the servers, devices, and models involved to the support case.
  2. Determine what kind of problem you are observing. A device issue can have any of the following causes:
    1. Network performance: latency in the network, or layer 1, 2, and 3 issues.
    2. Device configuration: connectivity, SNMP configuration, and others.
    3. Server hardware requirements: high resource utilization on the server.
    4. Server configuration options: missing configuration items for server tuning.
    5. Disk performance: slow read/write times for the device collection.
  3. Gather information. Collect all the graphs, images, and behaviors that can explain what the problem is.
    1. Collect support tool files with The Opmantek Support Tool.
      1. Execute the collect command for the support tool:

        Code Block
        # General collection.
        /usr/local/nmis8/admin/support.pl action=collect
        
        # If the file is big, add the following parameter.
        /usr/local/nmis8/admin/support.pl action=collect maxzipsize=900000000
        
        # Device collection.
        /usr/local/nmis8/admin/support.pl action=collect node=<node_name>


    2. If you are using NMIS8, provide the /usr/local/nmis8/var files.
      1. Go to the /usr/local/nmis8/var directory and collect the following files:

        Code Block
        -rw-rw----   1 nmis   nmis    4292 Apr  5 18:26 <node_name>-node.json
        -rw-rw----   1 nmis   nmis    2695 Apr  5 18:26 <node_name>-view.json


      2. Capture the update and collect outputs and upload them to the support case:

        Code Block
        /usr/local/nmis8/bin/nmis.pl type=update node=<node_name> model=true debug=9 force=true > /tmp/node_name_update_$(hostname).log
        /usr/local/nmis8/bin/nmis.pl type=collect node=<node_name> model=true debug=9 force=true > /tmp/node_name_collect_$(hostname).log


    3. If you are using NMIS9, include the dump files.


      Code Block
      /usr/local/nmis9/admin/node_admin.pl act=dump {node=nodeX|uuid=nodeUUID} file=<MY PATH> everything=1


  4. Replicate the problem. If possible, define the steps needed to replicate it.
  5. Identify symptoms. At this point, you should be able to see a specific problem and its symptoms.
  6. Determine whether something has changed. It is important to verify this with your team; a good way to spot a change is to review the performance graphs for the devices and the server.
  7. Is it an individual problem? Verify whether this behavior occurs on a single device or server.
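The NMIS8 update and collect captures from step 3 can be wrapped in a small script so that each log name includes the actual node name. This is a minimal sketch, assuming the default NMIS8 install path used in the commands above; the helper names nmis_log_path and nmis_capture and the NMIS_BIN variable are hypothetical, and the log naming differs slightly from the original commands, which use the literal prefix node_name.

```shell
#!/bin/sh
# Minimal sketch: capture NMIS8 update/collect debug output for one node.
# Helper names are hypothetical; the nmis.pl options match the commands in step 3.

NMIS_BIN="${NMIS_BIN:-/usr/local/nmis8/bin/nmis.pl}"

# Build the log path for an operation (update or collect) and a node name.
nmis_log_path() {
    echo "/tmp/${2}_${1}_$(hostname).log"
}

# Run one operation with debug output redirected to the log, then print the log path.
nmis_capture() {
    op="$1"
    node="$2"
    log="$(nmis_log_path "$op" "$node")"
    "$NMIS_BIN" type="$op" node="$node" model=true debug=9 force=true > "$log"
    echo "$log"
}

# Usage (router1 is a placeholder node name):
#   nmis_capture update router1
#   nmis_capture collect router1
```

Both resulting log files can then be attached to the support case together with the support tool archive.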

...