
Who watches the watcher? When you rely on a monitoring and management system, you need to know that the system itself is operational. NMIS supports many techniques for managing servers and applications, and, with some beneficial recursion, it can apply them to itself. You should also ensure that something else in your environment is watching NMIS.
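As a minimal sketch of such an external "watcher", the Python function below could run on a separate host and report whether the NMIS web GUI answers HTTP requests. Treating any 2xx/3xx response as healthy is an assumption, and `is_up` is a hypothetical helper, not part of NMIS; in practice its result would be fed into another monitoring or alerting system.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        # urlopen follows redirects and raises HTTPError (a URLError subclass)
        # for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout or an HTTP error status.
        return False
```

For example, `is_up("https://nmis.example.com/")` would check a hypothetical NMIS server; substitute the URL of your own Main Primary.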


Introduction

NMIS can be used to monitor many services, including the services it depends on itself. This can be useful for root cause analysis if NMIS has problems. Common services to monitor on Opmantek servers are listed below; for an up-to-date Services.nmis file, see https://github.com/Opmantek/nmis9/blob/nmis9_dev/conf-default/Services.nmis. In the last 12 months the team updated these services to support either HTTPS or HTTP and revised the service process checks.

The services currently shipped with NMIS9 are:

  • NMIS9 Workers
  • NMIS9 Scheduler
  • MongoDB
  • Opmantek Web Daemon
  • "OMK Stack HTTP" or "OMK Stack HTTPS"
  • SSL Expiry Check
  • opEvents Daemon
  • opConfig Daemon
  • opCharts Daemon

It is also good practice to monitor:

  • crond
  • syslog
  • ssh
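If these host services run under systemd, a minimal Python sketch like the one below can report whether each is active. The unit names are assumptions: crond, rsyslog and sshd are typical on RHEL-family systems, while Debian/Ubuntu use cron and ssh. The `is_active` parameter lets you swap in another check (for example, one that runs remotely).

```python
import subprocess
from typing import Callable, Dict, Iterable

# Hypothetical unit names (RHEL-family); Debian/Ubuntu use "cron" and "ssh".
HOST_UNITS = ["crond", "rsyslog", "sshd"]


def systemctl_is_active(unit: str) -> bool:
    """Return True if systemd reports `unit` as active."""
    try:
        # `systemctl is-active --quiet UNIT` exits 0 only when the unit is active.
        result = subprocess.run(["systemctl", "is-active", "--quiet", unit], check=False)
    except FileNotFoundError:
        # systemctl is not installed on this host.
        return False
    return result.returncode == 0


def check_units(units: Iterable[str],
                is_active: Callable[[str], bool] = systemctl_is_active) -> Dict[str, bool]:
    """Map each unit name to whether it is currently active."""
    return {unit: is_active(unit) for unit in units}
```

Calling `check_units(HOST_UNITS)` returns a dictionary such as `{"crond": True, ...}` that can be logged or alerted on.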

opHA Standard

When running multiple servers with opHA, common practice is to have the Main Primary monitor all the pollers and itself. If one is available, it is also recommended to have another primary act as the "watch the watcher", i.e. monitor the Main Primary and the pollers as well.

Accessing the Services List

Log into your NMIS GUI and navigate to the node you’re interested in. At the top of the node page, click “service list”.

This will bring up a list of the services that particular node is monitoring.

Configuring NMIS to monitor a service

Step 1

Log into your NMIS GUI and select System > System Configuration > NMIS Nodes (devices).

Step 2

Scroll down the list and find the node you wish to monitor services on.  Click “edit” in the actions column.

Step 3

Scroll down in the dialog that pops up until you reach the Services section, and select the services you wish to begin monitoring. You can select multiple services by holding Ctrl (Windows/Linux) or Command (Mac) while clicking.

Step 4

Click “Edit and Update Node”.  After the next polling cycle (usually about five minutes) you should see that the new services are being monitored.

Services Required for NMIS Modules

NMIS

NMIS requires the following services to run:

  • snmpd
  • mongod
  • omkd
  • nmisd
  • httpd or apache (or nginx)
  • crond

opCharts

opCharts requires the same services as NMIS, plus:

  • opchartsd

opEvents

opEvents requires the same services as NMIS, plus:

  • opeventsd

opConfig

opConfig requires the same services as NMIS, plus:

  • opconfigd

opFlow

opFlow requires the same services as NMIS, plus:

  • opflowd
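The requirements above follow a simple pattern: a common NMIS base plus one extra daemon per module. A small Python sketch can capture that as data and compute the full list for a given server. The service names come from the lists above; using "httpd" as a stand-in for the web server (which may be apache2 or nginx on your system) is an assumption.

```python
from typing import Dict, Iterable, List

# Services listed above as required by NMIS itself. "httpd" stands in for
# the web server, which may be apache2 or nginx on your system.
NMIS_BASE = ["snmpd", "mongod", "omkd", "nmisd", "httpd", "crond"]

# Extra daemon each module needs on top of the NMIS base services.
MODULE_EXTRAS: Dict[str, List[str]] = {
    "NMIS": [],
    "opCharts": ["opchartsd"],
    "opEvents": ["opeventsd"],
    "opConfig": ["opconfigd"],
    "opFlow": ["opflowd"],
}


def required_services(modules: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated services needed by `modules`."""
    needed = set(NMIS_BASE)
    for module in modules:
        needed.update(MODULE_EXTRAS[module])  # raises KeyError for unknown modules
    return sorted(needed)
```

For a server running opEvents and opConfig, `required_services(["opEvents", "opConfig"])` returns the NMIS base services plus `opconfigd` and `opeventsd`.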


Issues with Poll Interval

A problem occurs when a service's Poll Interval is initially set to less than 30m and is later changed to 30m or more.

Recreated problem

We recreated the problem on lab-primary1 by adding the SSH service (port 22) with a Poll Interval of 1m to the node testnode3.

https://lab-primary1.opmantek.net/cgi-nmis9/network.pl?act=network_service_view&node=testnode3&refresh=180&widget=false&cluster_id=84e2260c-bb7b-487d-8b5c-c4e9ebc20a65



The graphs start to show data in both NMIS and opCharts (from 15:35/15:40 until 15:55).


Then, the Poll Interval of the service is modified to 30m.


An RRD update is observed at 15:55.


After 30m, nothing is graphed for the configured Poll Interval, although an RRD update is observed at 16:22.



After waiting another 30m, the service is still not graphed, although a new update appears in the RRD.


The Poll Interval of the service is set back to 1m (at 16:55).


The charts recover, but a 1-hour gap remains.


New tests with a service monitored at 30m from the beginning:



The Poll Interval is changed to 1h, and graphing continues to work correctly:

Then changed to 5m:

Conclusion

If a service's Poll Interval of 1m, 2m, 10m or 15m is changed to 30m or more, the service graphs stop. If the Poll Interval is returned to its initial value (less than 30m), the graphs recover, but a gap remains for the period during which it was 30m or more.

If a service's Poll Interval of 30m is changed to 60m and then to 5m, the service graphs continue to work correctly.
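Until the underlying cause is fixed, the observed pattern can be turned into a simple pre-change check. This Python sketch only encodes the behaviour seen in these tests (a change from below 30m to 30m or more stops graphing); it does not explain the root cause, and the accepted interval format (an integer followed by `m` or `h`) and the function names are assumptions for illustration.

```python
import re

THRESHOLD_MINUTES = 30  # boundary observed in the tests above


def to_minutes(interval: str) -> int:
    """Convert a Poll Interval string such as '5m' or '1h' to minutes."""
    match = re.fullmatch(r"(\d+)([mh])", interval.strip())
    if not match:
        raise ValueError(f"unsupported interval: {interval!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 60 if unit == "h" else value


def change_hits_gap_issue(old: str, new: str) -> bool:
    """True if the change matches the pattern observed to stop graphing:
    from below 30m to 30m or more."""
    return to_minutes(old) < THRESHOLD_MINUTES <= to_minutes(new)
```

With this check, changing 1m to 30m is flagged, while 30m to 60m and then 60m to 5m (the second test sequence) is not.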

