Lessons Learned from Support Cases

Lessons learned from support cases - common things to look for.

...

Does DNS function properly?

...

Code Block
### Check the local system's FQDN
screen [root@demo: ~]# hostname -f
demo.opmantek.com

### Can the local system resolve its own hostname?
screen [root@demo: ~]# dig +short demo.opmantek.com
192.168.88.44

### Can the system resolve other hosts?
screen [root@demo: ~]# dig +short freebsd.org
8.8.178.110
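The three checks above can be bundled into one pass. This is a minimal bash sketch, not an NMIS/OMK tool: `is_fqdn` and `check_dns` are hypothetical helper names, dig must be installed (bind-utils or dnsutils), and freebsd.org is simply the external name used above. Because dig queries DNS directly and ignores /etc/hosts, a PASS means the name really is in DNS.

```bash
# Hypothetical helpers (not part of NMIS/OMK); requires bash and dig.

# Succeed if the argument looks like a fully qualified name such as
# demo.opmantek.com (contains a dot, no empty labels) rather than "demo".
is_fqdn() {
  local name=${1%.}   # tolerate a trailing root dot
  [[ $name == *.* && $name != .* && $name != *..* ]]
}

# Print PASS/FAIL for each check; return non-zero if any check fails.
check_dns() {
  local external=${1:-freebsd.org} fqdn rc=0
  fqdn=$(hostname -f 2>/dev/null)

  if is_fqdn "$fqdn"; then
    echo "PASS: local FQDN is $fqdn"
  else
    echo "FAIL: hostname -f returned '$fqdn', not a fully qualified name"; rc=1
  fi

  # dig +short prints nothing when a name does not resolve.
  if [[ -n $fqdn && -n $(dig +short "$fqdn") ]]; then
    echo "PASS: $fqdn resolves in DNS"
  else
    echo "FAIL: $fqdn does not resolve in DNS"; rc=1
  fi

  if [[ -n $(dig +short "$external") ]]; then
    echo "PASS: $external resolves"
  else
    echo "FAIL: cannot resolve $external"; rc=1
  fi
  return "$rc"
}
```

Run `check_dns` (or `check_dns some.other.name`) on each NMIS/OMK server; any FAIL line points at the check to repeat by hand.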

Why DNS is Important

NMIS/OMK applications expect DNS to work.  Managing individual /etc/hosts files does not scale.  opHA is one module where this is particularly critical.  If the customer does not have a local DNS server for internal hosts, consider running BIND on the NMIS Primary server so that the other NMIS/OMK servers can use it as their name server.  This is not difficult to set up and will save a lot of troubleshooting time later.
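Once a name server is running on the NMIS Primary, each of the other servers should list it in its resolver configuration. A minimal check, assuming the classic /etc/resolv.conf format; `uses_nameserver` is a hypothetical helper. On hosts managed by systemd-resolved, /etc/resolv.conf usually points at the local stub 127.0.0.53 instead, so check `resolvectl status` there.

```bash
# Hypothetical helper: succeed if the resolver config lists the given name
# server. Defaults to /etc/resolv.conf; pass another file as the 2nd argument.
uses_nameserver() {
  local ip=$1 conf=${2:-/etc/resolv.conf}
  awk -v ip="$ip" '$1 == "nameserver" && $2 == ip { found = 1 }
                   END { exit !found }' "$conf"
}
```

For example, `uses_nameserver 192.168.88.44 || echo "not using the NMIS Primary for DNS"`, where 192.168.88.44 is the demo server's address from the dig output above; substitute your own NMIS Primary's address.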

Does the system have the correct time?  Is it synced with a time server?

Code Block
[nmis@demo var]$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+cachens2.onqnet 13.64.159.31     3 u  426 1024  377    4.845   -0.126   0.458
+ec2-13-54-31-22 54.252.165.245   3 u  352 1024  377   18.036    1.540   1.008
-node01.au.verbn 192.12.19.20     2 u  514 1024  377   18.966  -16.530   1.176
*ntp3.syrahost.c 218.100.43.70    2 u  422 1024  377   63.642   -1.172   0.852

[nmis@demo var]$ date -u
Thu Feb 16 22:33:31 UTC 2017
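In the ntpq table, the first character is the tally code, and `*` marks the peer ntpd is currently synchronized to; the offset column is in milliseconds. The bash sketch below automates that reading. It assumes ntpd's `ntpq -p` output (chrony-based systems would use `chronyc tracking` instead); `ntp_synced` is a hypothetical helper, and its 100 ms default limit is an arbitrary choice, not an NMIS requirement.

```bash
# Hypothetical helper: read `ntpq -p` output on stdin and succeed only if a
# peer is marked '*' and its offset is within the limit (milliseconds).
ntp_synced() {
  local max_ms=${1:-100}
  awk -v max="$max_ms" '
    /^\*/ {
      found = 1
      off = $9 + 0                  # offset column, milliseconds
      if (off < 0) off = -off
      printf "synced to %s, offset %s ms\n", substr($1, 2), $9
      if (off > max) bad = 1
    }
    END {
      if (!found) { print "no system peer (*) selected"; exit 1 }
      exit (bad ? 1 : 0)
    }'
}
```

Fed the ntpq output above, `ntpq -p | ntp_synced` prints `synced to ntp3.syrahost.c, offset -1.172 ms` and succeeds. If ntpd is not running, no `*` line appears and the check fails.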

...


Compare the system's UTC time with the actual UTC time.  A site such as https://time.is/UTC shows the current UTC time.
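Where a browser is not handy, a rough comparison can also be scripted against the Date header of any web server. This is a bash sketch with hypothetical helper names; it assumes curl and GNU date, and https://www.google.com is only an example URL. HTTP Date headers have one-second resolution, so this catches skew large enough to expire cookies but is no substitute for NTP.

```bash
# Hypothetical helpers; require curl and GNU date.

# Convert an HTTP "Date:" header line to seconds since the epoch.
http_date_to_epoch() {
  local value=${1#*: }        # drop the "Date: " prefix
  value=${value%$'\r'}        # HTTP header lines end in CRLF
  date -u -d "$value" +%s
}

# Print (local clock - reference clock) in whole seconds.
clock_skew() {
  local reference=$1 local_now=${2:-$(date -u +%s)}
  echo $(( local_now - reference ))
}

# Compare this host's clock with the Date header returned by a web server.
check_clock() {
  local url=${1:-https://www.google.com} header
  header=$(curl -sI "$url" | grep -i -m1 '^date:') ||
    { echo "no Date header from $url" >&2; return 1; }
  clock_skew "$(http_date_to_epoch "$header")"
}
```

`check_clock` prints the local clock minus the server's clock in seconds; a positive value means this host is ahead.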

...

  • Incorrect timestamps on events
  • Incorrect graph data
  • Failed transactions with other systems (e.g. a cookie may already have expired by the time it is issued)

NMIS Troubleshooting

Email alerts

Contacts.nmis must have the correct DutyTime format.

External Authentication

The auth_method settings in conf/Config.nmis must be in the proper order, and each method listed must also be provisioned.
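To review the configured order, the auth_method settings can be listed as they appear in the file. A bash sketch, assuming Config.nmis stores them as single-quoted Perl hash entries such as `'auth_method_1' => 'htpasswd',` (check your own file); `show_auth_methods` is a hypothetical helper, and empty methods are shown as `(empty)`.

```bash
# Hypothetical helper: print "key value" for each auth_method entry, in file
# order. Assumes lines like:  'auth_method_1' => 'htpasswd',
show_auth_methods() {
  local conf=${1:-conf/Config.nmis}
  # Splitting on single quotes: $2 = key, $3 = " => ", $4 = value.
  awk -F"'" '$2 ~ /^auth_method/ && $3 ~ /=>/ {
    printf "%s %s\n", $2, ($4 == "" ? "(empty)" : $4)
  }' "$conf"
}
```

Run it from the NMIS directory (or pass the full path to Config.nmis) and confirm that every non-empty method listed is actually set up.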

Long collect times

Are we collecting many interfaces that are not necessary?

Check the view.json file for the number of interfaces and their types.  Look for common patterns in interface type and description that identify interfaces you don't need, and use the models or Config.nmis to disable their collection.
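To get a quick count of interfaces per type, the values of a key can be tallied straight from the JSON. A crude bash sketch: it assumes the interface type is stored under a string-valued key such as `ifType` (the IF-MIB column name; check which key your view.json actually uses), it only matches string values, and it avoids jq so it runs on a minimal install. `tally_json_key` is a hypothetical helper.

```bash
# Hypothetical helper: count each string value of a JSON key, most common
# first. Regex-based, so it only handles "key": "string value" pairs.
tally_json_key() {
  local key=$1 file=$2
  grep -o "\"$key\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" "$file" |
    sed 's/^"[^"]*"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/' |
    sort | uniq -c | sort -rn
}
```

For example, `tally_json_key ifType /path/to/view.json | head` lists the most common interface types first; the same call with a description key (ifDescr, for instance) helps spot groups of interfaces worth excluding.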

opCharts Troubleshooting

  • A user logs in and is then sent back to the login screen: the browser cookie has already expired because the difference between the server time and the workstation time exceeds the cookie lifespan.

Perl Modules

If NMIS or OMK applications cannot locate a Perl module, the module may be missing or its file may have the wrong permissions.  Also check the permissions of the directories that contain it.
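One way to separate a missing module from a permissions problem is to try loading it as the same user the application runs as (for example the nmis user shown in the ntpq prompt above). This bash sketch uses `perl -MModule -e 1`, which fails if the module cannot be loaded, and `namei -l` (util-linux) to show the owner and mode of every component of the path. `module_to_path` and `check_perl_module` are hypothetical helpers, and Net::SNMP below is only an example module name.

```bash
# Hypothetical helpers; run them as the user the NMIS/OMK daemon runs as.

# Map a module name to its relative file path, e.g. Net::SNMP -> Net/SNMP.pm
module_to_path() {
  printf '%s.pm\n' "$(printf '%s' "$1" | sed 's#::#/#g')"
}

check_perl_module() {
  local mod=$1 rel dir
  rel=$(module_to_path "$mod")

  # perl -MModule -e 1 exits non-zero if the module cannot be loaded.
  if perl -M"$mod" -e 1 2>/dev/null; then
    echo "OK: $mod loads as $(id -un)"
    return 0
  fi
  echo "FAIL: $mod does not load as $(id -un)"

  # Look for the file in each @INC directory; if found, show the
  # permissions of every path component so an unreadable one stands out.
  for dir in $(perl -e 'print join(" ", @INC)'); do
    if [[ -e $dir/$rel ]]; then
      echo "found $dir/$rel; permissions along the path:"
      namei -l "$dir/$rel"
      return 1
    fi
  done
  echo "$rel not found in @INC: the module is probably not installed"
  return 1
}
```

Usage: `check_perl_module Net::SNMP`. If the file is found and the permissions look fine, run `perl -MNet::SNMP -e 1` directly to see Perl's own error message. If a directory is not traversable by this user, the file may not be found at all, so comparing with a run as root helps.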

OMK General

opCharts TopN

Use the following utility to troubleshoot why charts are (or are not) being populated into TopN; the command saves its debug output to topnDebug.txt.

Code Block
/usr/local/omk/bin/nmis_topn_export.exe debug=true timing=1 force=1 > topnDebug.txt

Node synchronization with NMIS

Customers generally trust the node data that NMIS learns dynamically and use it to update the node data of the OMK applications automatically.  It's a good idea to install a cron job that runs this synchronization periodically.  The following commands do this for opEvents and opConfig respectively; options in square brackets are optional.

Code Block
/usr/local/omk/bin/opevents-cli.exe act=import_from_nmis [overwrite=0/1] [setstate=0/1]

/usr/local/omk/bin/opconfig-cli.exe act=import_from_nmis [node=nodeX|nodes=nodeA,...] [overwrite=0/1]
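A cron job for this synchronization could look like the following bash sketch, which writes an /etc/cron.d entry running both imports nightly. Everything beyond the two commands is an assumption: the file name, the 02:30/02:35 schedule, running as root, and the log path (whose directory must exist) are hypothetical choices. The bracketed overwrite/setstate/node options are left out, so add them only once you know the behaviour you want.

```bash
# Hypothetical helper: write a cron.d entry that re-imports NMIS nodes into
# opEvents and opConfig every night. Schedule and log path are examples.
write_sync_cron() {
  local target=${1:-/etc/cron.d/omk-nmis-sync}
  local bin=/usr/local/omk/bin log=/usr/local/omk/log/nmis-sync.log
  cat > "$target" <<EOF
# m h dom mon dow user command
30 2 * * * root $bin/opevents-cli.exe act=import_from_nmis >> $log 2>&1
35 2 * * * root $bin/opconfig-cli.exe act=import_from_nmis >> $log 2>&1
EOF
  chmod 644 "$target"   # cron ignores cron.d files that are group/world-writable
}
```

Run `write_sync_cron` as root to install the entry, or `write_sync_cron /tmp/preview` to inspect the result first.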

Configuration Files

If you suspect that a particular configuration file is causing a problem, one way to isolate it is:

  • Back up the suspect configuration file
  • Copy the default configuration file from omk/install into omk/conf
  • Restart the associated daemons and test
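The first two steps can be scripted so the backup is never forgotten. A bash sketch, assuming the /usr/local/omk base directory used by the commands above (override it with OMK_BASE); `use_default_config` is a hypothetical helper and opCommon.nmis in the usage line is only an example file name. Restarting the daemons is left to you, since the service names depend on which OMK applications are installed.

```bash
# Hypothetical helper: replace conf/<name> with install/<name>, keeping a
# timestamped backup of the current file next to it.
use_default_config() {
  local name=$1 base=${OMK_BASE:-/usr/local/omk}
  local conf=$base/conf/$name default=$base/install/$name backup
  backup=$conf.$(date +%Y%m%d%H%M%S).bak

  [[ -f $default ]] || { echo "no default found at $default" >&2; return 1; }
  if [[ -f $conf ]]; then
    cp -p "$conf" "$backup" || return 1
    echo "backed up $conf to $backup"
  fi
  cp "$default" "$conf" && echo "installed default $default"
}
```

Usage: `use_default_config opCommon.nmis`, then restart and test. To undo, copy the .bak file back over the file in omk/conf.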