
This page provides an NMIS device troubleshooting process for identifying bad collection behavior in NMIS8 and NMIS9. It is broken down into clear steps that anyone can follow to find out what is wrong with the device collection, including cases where there are gaps in graphs for the nodes managed by NMIS.


Network-Management-Information-System

Device Troubleshooting Process

...

  1. Identify the problem. The first step in troubleshooting a device issue is to identify the problem and to establish whether it occurs in NMIS8 or NMIS9.
    1. Add the product version and the servers/devices/models involved to the support case.
  2. Determine what kind of problem you are observing. A device issue can have the following causes:
    1. Network performance: latency in the network, or layer 1, 2, and 3 issues.
    2. Device configuration: connectivity, SNMP configuration, and others.
    3. Server hardware requirements: high resource utilization on the server.
    4. Server configuration options: missing configuration items for server tuning.
    5. Disk performance: slow read/write times for the device collection.
  3. Gather information. Collect all the graphs, images, and behaviors that can explain what the problem is.
    1. Collect the support tool files with the Opmantek Support Tool.
      1. Execute the collect command for the support tool:

        Code Block
        # General collection.
        /usr/local/nmis8/admin/support.pl action=collect
        
        # If the file is big, add the following parameter.
        /usr/local/nmis8/admin/support.pl action=collect maxzipsize=900000000
        
        # Collection for a single device.
        /usr/local/nmis8/admin/support.pl action=collect node=<node_name> maxzipsize=900000000


    2. If you are using NMIS8, provide the /usr/local/nmis8/var files.
      1. Go to the /usr/local/nmis8/var directory and collect the following files:

        Code Block
        -rw-rw----   1 nmis   nmis    4292 Apr  5 18:26 <node_name>-node.json
        -rw-rw----   1 nmis   nmis    2695 Apr  5 18:26 <node_name>-view.json


      2. Obtain the update and collect outputs and upload them to the support case:

        Code Block
        /usr/local/nmis8/bin/nmis.pl type=update node=<node_name> model=true debug=9 force=true > /tmp/node_name_update_$(hostname).log
        /usr/local/nmis8/bin/nmis.pl type=collect node=<node_name> model=true debug=9 force=true > /tmp/node_name_collect_$(hostname).log


    3. If you are using NMIS9, include the dump files.


      Code Block
      /usr/local/nmis9/admin/node_admin.pl act=dump {node=nodeX|uuid=nodeUUID} file=<MY PATH> everything=1


  4. Replicate the problem. If possible, define the steps needed to replicate the problem.
  5. Identify symptoms. At this point, you should be able to see a specific problem and what its symptoms are.
  6. Determine whether something has changed. It is important to verify with your team whether anything has changed; a good way to spot a change is to monitor the performance graphs for the devices and the server.
  7. Is it an individual problem? Verify whether this behavior is happening on a single device/server.
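
The NMIS8 items from step 3 can be gathered in one pass. The following is a minimal sketch and not part of NMIS: `gather_nmis8_node`, `NMIS_BASE`, and `OUT_DIR` are hypothetical names chosen for illustration, and the layout of the output directory is an assumption. It runs the same update and collect commands shown in step 3, copies the node and view files from the var directory if they exist, and packs everything into one archive.

```shell
#!/usr/bin/env bash
# Sketch: gather NMIS8 troubleshooting files for one node into a tarball.
# NMIS_BASE and OUT_DIR are illustrative variables, not NMIS settings.

gather_nmis8_node() {
    local node="$1"
    local base="${NMIS_BASE:-/usr/local/nmis8}"
    local out_dir="${OUT_DIR:-/tmp}"
    local host type f work

    if [ -z "$node" ]; then
        echo "usage: gather_nmis8_node <node_name>" >&2
        return 2
    fi
    if [ ! -x "$base/bin/nmis.pl" ]; then
        echo "nmis.pl not found or not executable under $base/bin" >&2
        return 1
    fi

    host="$(hostname 2>/dev/null || uname -n)"
    work="$out_dir/${node}_support_${host}"
    mkdir -p "$work" || return 1

    # Same update and collect invocations as in step 3, one log per run.
    for type in update collect; do
        "$base/bin/nmis.pl" type="$type" node="$node" model=true debug=9 force=true \
            > "$work/${node}_${type}_${host}.log"
    done

    # Copy the node and view files from var, if they are present.
    for f in "$base/var/${node}-node.json" "$base/var/${node}-view.json"; do
        if [ -f "$f" ]; then
            cp -p "$f" "$work/"
        fi
    done

    # Pack the working directory and print the archive path.
    tar -czf "$work.tar.gz" -C "$out_dir" "$(basename "$work")" || return 1
    echo "$work.tar.gz"
}
```

For example, `gather_nmis8_node <node_name>` prints the path of the resulting archive, which can then be attached to the support case together with the support tool output.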


Network performance - NMIS Server

...

This section focuses on reviewing and validating the overall status of the server. We will verify the historical behavior of the server's main metrics; it is important to review all the metrics related to good performance between the server and the devices.
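
When a current reading is useful next to the historical graphs, a one-off snapshot of the server can be taken from the command line. The following is a minimal sketch, assuming a Linux server: `server_snapshot` is a hypothetical helper, not an NMIS command, and the choice of load average, memory, and disk usage as metrics is an assumption for illustration. It does not replace the historical review described above.

```shell
#!/usr/bin/env bash
# Sketch: print a one-off snapshot of basic server metrics (Linux only).
# server_snapshot is an illustrative helper, not part of NMIS.

server_snapshot() {
    # Path whose filesystem usage is reported; defaults to the root filesystem.
    local path="${1:-/}"

    echo "timestamp: $(date '+%Y-%m-%d %H:%M:%S')"

    # 1, 5, and 15 minute load averages.
    if [ -r /proc/loadavg ]; then
        echo "load: $(cut -d' ' -f1-3 /proc/loadavg)"
    fi

    # Total and available memory in kB.
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:|^MemAvailable:/ {printf "%s %s kB\n", tolower($1), $2}' /proc/meminfo
    fi

    # Usage of the filesystem that holds the given path, in 1 KiB blocks.
    df -Pk "$path" | awk 'NR==2 {print "disk_used: " $5 " of " $2 " 1K-blocks (" $6 ")"}'
}
```

Calling `server_snapshot /usr/local/nmis8` reports the filesystem that holds the NMIS8 installation; saving the output with a timestamp makes it easier to compare against the graphs later.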

...