RESOURCES FOR TROUBLESHOOTING

ADDITIONAL RESOURCES

NMIS File Permissions

Lessons Learned from Support Cases

Does DNS function properly?

...

Code Block
### Check the local system's FQDN
[root@demo: ~]# hostname -f
demo.opmantek.com

### Can the local system resolve its own hostname?
[root@demo: ~]# dig +short demo.opmantek.com
192.168.88.44

### Can the system resolve other hosts?
[root@demo: ~]# dig +short freebsd.org
8.8.178.110

Why DNS is Important

NMIS/OMK applications expect DNS to work.  Managing individual /etc/hosts files does not scale.  opHA in particular is a module where this is critical.  If the customer does not have a local DNS server for internal hosts, consider running BIND on the NMIS Primary server; other NMIS/OMK servers can then use it as a name server.  This is not difficult to do and will save a lot of troubleshooting time going forward.
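
The manual checks above can be wrapped in a small function for repeated use. This is a minimal sketch, not an NMIS/OMK tool: the name check_dns is made up for illustration, it only looks at IPv4 A records, and it treats a missing reverse (PTR) record as a warning rather than a failure.

```shell
# check_dns [FQDN]
# Hypothetical helper: checks that a name resolves to an IPv4 address and
# reports the reverse (PTR) name for that address, if one exists.
# With no argument it checks the local system's own FQDN.
check_dns() {
    local fqdn addr ptr
    fqdn="${1:-$(hostname -f)}"

    # Forward lookup: keep only dotted-quad answers (skips CNAME targets).
    addr="$(dig +short "$fqdn" | grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1)"
    if [ -z "$addr" ]; then
        echo "FAIL: $fqdn does not resolve"
        return 1
    fi

    # Reverse lookup: reported, but a missing PTR is not treated as fatal.
    ptr="$(dig +short -x "$addr" | head -n 1)"
    if [ -z "$ptr" ]; then
        echo "WARN: $fqdn -> $addr (no PTR record)"
    else
        echo "OK: $fqdn -> $addr -> $ptr"
    fi
}
```

Run check_dns with no argument on each NMIS/OMK server, then again with the names of its peers, to confirm every server resolves every other server.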

...

  • Timestamps on events are not correct
  • Graph data is not correct
  • Transactions with other systems fail (e.g. cookies could already be expired at the time of issue.)
  • User logs in, then is kicked back to the login screen; the browser cookie is already expired because the difference between the server time and the workstation time exceeds the cookie lifespan.
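
The cookie case comes down to simple arithmetic, which the sketch below works through. It assumes the cookie expiry is set from the server clock and judged by the workstation clock; the function name skew_check and its arguments are illustrative only, and the actual cookie lifespan depends on the application configuration.

```shell
# skew_check SERVER_EPOCH CLIENT_EPOCH COOKIE_LIFESPAN_SECONDS
# Hypothetical helper: a cookie issued at SERVER_EPOCH expires at
# SERVER_EPOCH + LIFESPAN. If the workstation clock is already past that
# moment, the cookie is expired the instant it arrives.
skew_check() {
    local server="$1" client="$2" lifespan="$3" skew
    skew=$(( client - server ))
    if [ "$skew" -ge "$lifespan" ]; then
        echo "workstation is ${skew}s ahead, lifespan is ${lifespan}s: cookie expired on arrival"
        return 1
    fi
    echo "skew of ${skew}s is within the ${lifespan}s lifespan"
}
```

The two epoch values can be taken with date +%s on the server and on the workstation at the same moment.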

Perl Modules

If NMIS or OMK applications cannot locate a Perl module, it may be missing or it may have the wrong file permissions.  Also check the permissions on the directories that contain it.
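
One way to tell the two cases apart is to ask Perl to load the module and print the file it found. The function below is a sketch with a made-up name (check_perl_module); it should be run as the same user the application runs as, otherwise a permissions problem will not show up.

```shell
# check_perl_module MODULE        e.g. check_perl_module JSON::XS
# Hypothetical helper: reports whether Perl can load MODULE and, if so,
# which file it was loaded from.
check_perl_module() {
    local module="$1" path
    # Loading fails if the module is absent or cannot be read.
    if ! perl -M"$module" -e 1 2>/dev/null; then
        echo "MISSING OR UNREADABLE: $module"
        return 1
    fi
    # %INC maps "Some/Module.pm" to the full path of the file that was loaded.
    path="$(perl -M"$module" -e '($f = shift) =~ s{::}{/}g; print $INC{"$f.pm"}' "$module")"
    echo "OK: $module ($path)"
}
```

When the module loads, run ls -ld on the reported file and on each parent directory to review ownership and mode.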

NMIS Troubleshooting

Node Troubleshooting

Is the node reachable?

Ping it with a big echo request.

...


What does nmap think about it?

Code Block
[root@opmantek conf]# nmap 10.10.1.1

Starting Nmap 5.51 ( http://nmap.org ) at 2017-04-04 15:05 KST
Nmap scan report for 10.10.1.1
Host is up (0.011s latency).
Not shown: 998 closed ports
PORT   STATE SERVICE
22/tcp open  ssh
23/tcp open  telnet

Nmap done: 1 IP address (1 host up) scanned in 13.53 seconds
[root@opmantek conf]# 
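
Note that the scan above is nmap's default TCP scan, so it says nothing about SNMP, which normally listens on UDP port 161. The sketch below adds a UDP probe of that port; snmp_port_state is a made-up helper name, and the UDP scan (-sU) needs root privileges.

```shell
# snmp_port_state HOST
# Hypothetical helper: runs a UDP scan of port 161 only and prints the
# STATE column that nmap reports for it.
snmp_port_state() {
    local host="$1"
    nmap -sU -p 161 "$host" | awk '$1 == "161/udp" { print $2 }'
}
```

UDP results are less definitive than TCP results: a state of open|filtered means nmap received no reply, which can be a silent firewall as easily as a listening agent, so follow up with an SNMP query using the configured community.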

Node Not Present in GUI

Example Case: 

Suddenly the node cannot be found in the GUI.  When attempting to re-add the node to NMIS via the GUI we receive a 'node already exists' error.

Issue: 

Something in the node's stored configuration has become corrupt.  We need to purge NMIS of all configuration relevant to the node.

Actions: 

  • Open /usr/local/nmis8/conf/Nodes.nmis with an editor and delete the section for the problem node.
  • Remove the following files:
    • /usr/local/nmis8/var/<node-name>-node.json
    • /usr/local/nmis8/var/<node-name>-view.json
  • Re-add the problem node via the NMIS GUI
  • Run the following commands:
    • /usr/local/nmis8/bin/nmis.pl type=update node=<node-name> force=true
    • /usr/local/nmis8/bin/nmis.pl type=collect node=<node-name> force=true
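
The file removal and the update/collect runs can be scripted as sketched below; editing Nodes.nmis and re-adding the node in the GUI remain manual steps. The function names and the NMIS_BASE variable are illustrative, not part of NMIS. Both functions only print what they would do unless "run" is passed as the second argument.

```shell
# Base directory of the NMIS install; override if yours differs.
NMIS_BASE="${NMIS_BASE:-/usr/local/nmis8}"

# purge_node_files NODE [run]
# Use after deleting the node's section from Nodes.nmis.
purge_node_files() {
    local node="$1" mode="${2:-dry-run}" f
    for f in "$NMIS_BASE/var/$node-node.json" "$NMIS_BASE/var/$node-view.json"; do
        if [ "$mode" = "run" ]; then
            rm -f -- "$f"
        else
            echo "rm -f -- $f"
        fi
    done
}

# refresh_node NODE [run]
# Use after re-adding the node via the NMIS GUI.
refresh_node() {
    local node="$1" mode="${2:-dry-run}" op
    for op in update collect; do
        if [ "$mode" = "run" ]; then
            "$NMIS_BASE/bin/nmis.pl" type="$op" node="$node" force=true
        else
            echo "$NMIS_BASE/bin/nmis.pl type=$op node=$node force=true"
        fi
    done
}
```

Review the dry-run output first, for example purge_node_files asgard, and only then repeat the call with run appended.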

Verify

The problem node should now be functioning properly in the NMIS GUI.

Manual Update & Collect Actions

If a node isn't providing the data we think it should, looking at the debug output of a manual update & collect is sometimes helpful.  Redirect or tee the output to a file in order to review it later.

Code Block
[root@opmantek ~]# /usr/local/nmis8/bin/nmis.pl node=asgard debug=9 type=update > nodeUpdate.txt

-or-

[root@opmantek ~]# /usr/local/nmis8/bin/nmis.pl node=asgard debug=9 type=update | tee nodeUpdate.txt

###################

[root@opmantek ~]# /usr/local/nmis8/bin/nmis.pl node=asgard debug=9 type=collect > nodeCollect.txt

-or-

[root@opmantek ~]# /usr/local/nmis8/bin/nmis.pl node=asgard debug=9 type=collect | tee nodeCollect.txt

Email alerts

Contacts.nmis must have the correct DutyTime format.

External Authentication

conf/Config.nmis must list the auth_method entries in the proper order, and each method listed there must actually be provisioned.

If LDAP isn't working, tcpdump can be used to see the response code from the LDAP server.
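
A capture along these lines is one way to do that. It is a sketch only: capture_ldap is a made-up name, port 389 assumes unencrypted LDAP (with LDAPS or StartTLS the response code is not readable on the wire), and -i any is Linux-specific. Run it as root, then attempt a login while it is capturing.

```shell
# capture_ldap SERVER [PORT] [COUNT] [OUTFILE]
# Hypothetical helper: captures traffic to and from the LDAP server.
capture_ldap() {
    local server="$1" port="${2:-389}" count="${3:-50}" out="${4:-ldap.pcap}"
    # -i any : listen on all interfaces (Linux)
    # -nn    : do not resolve host names or port names
    # -s 0   : capture whole packets rather than truncating them
    # -c     : stop after COUNT packets
    # -w     : write the raw packets to OUTFILE for later decoding
    tcpdump -i any -nn -s 0 -c "$count" -w "$out" host "$server" and port "$port"
}
```

Opening the resulting file in Wireshark, or reading it with tshark -r ldap.pcap -Y ldap, shows the resultCode of each bindResponse; 49 (invalidCredentials), for instance, points to a wrong bind DN or password.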

Long collect times

Are we collecting many interfaces that are not necessary?

Check the node's view.json file for the number of interfaces and their types.  Look for attributes the unwanted interfaces have in common, such as interface type and description.  Use models or Config.nmis to disable collection for them.
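
A quick count can be taken with grep, as in the sketch below. It assumes each interface contributes exactly one key containing the string _ifDescr to the view file; that is an assumption about the file layout, so confirm it against a real <node-name>-view.json and pass a different string if needed. The function name count_interfaces is made up.

```shell
# count_interfaces VIEW_FILE [STRING]
# Hypothetical helper: counts occurrences of STRING in the view file.
# The default STRING (_ifDescr) is an assumed per-interface key suffix.
count_interfaces() {
    local file="$1" pattern="${2:-_ifDescr}"
    # grep -o prints each match on its own line, so wc -l counts matches
    # even when the whole JSON document is on a single line.
    { grep -o -F -- "$pattern" "$file" || true; } | wc -l | tr -d ' '
}
```

Passing an interface type or part of a description as the second argument gives a rough idea of how many interfaces a single filter rule would exclude.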

Syslog

When troubleshooting syslog issues, the following script will gather more rsyslog daemon information than the NMIS support tool.

getSyslogData.sh

snmptrapd

When troubleshooting snmptrapd issues, the following script will gather more snmptrapd daemon information than the NMIS support tool.

getSnmpTrapdInfo.sh

Models

When troubleshooting models, it is important to know whether every OID 'friendly name' referenced in the Model files has been defined in /usr/local/nmis8/mibs/nmis_mibs.oid.  Some Model files import or call other Model, Graph or Common files, so if an OID 'friendly name' has not been defined in nmis_mibs.oid it may not be obvious which model file is causing the problem.  To make validating friendly names easier, the script below has been provided.  It parses all the OID friendly names out of the model files and looks for them in nmis_mibs.oid.  If any are not found, the operator is notified.  At some point this script should be converted to Perl, which would make it much faster.

checkOid.sh
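
For readers without the attachment, the idea can be sketched as follows. This is not the checkOid.sh script itself. It assumes model files reference OIDs in the form 'oid' => 'friendlyName' and that nmis_mibs.oid lists each name in double quotes; both are assumptions about the file formats, and check_oid_names is a made-up function name.

```shell
# check_oid_names MODELS_DIR OID_FILE
# Hypothetical helper: lists friendly names used in model files that do
# not appear in the OID file. Returns 1 if any name is missing.
check_oid_names() {
    local models_dir="$1" oid_file="$2" name missing=0
    # Assumed model syntax: 'oid' => 'friendlyName'
    # Values starting with a digit are numeric OIDs and are skipped.
    for name in $(grep -rhoE "'oid'[[:space:]]*=>[[:space:]]*'[A-Za-z][A-Za-z0-9_.-]*'" "$models_dir" \
                  | sed -E "s/.*=>[[:space:]]*'([^']*)'/\1/" | sort -u); do
        # Assumed OID file syntax: "friendlyName" followed by the numeric OID.
        if ! grep -q -F -- "\"$name\"" "$oid_file"; then
            echo "NOT DEFINED: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

To find which file uses a reported name, follow up with grep -rl on the models directory for that name.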

...

OMK General

Node synchronization with NMIS

...