...

NMIS can be used to monitor many services, including the services that it depends on itself. This can be useful in root cause analysis if NMIS has problems.

Common services to monitor for FirstWave NMIS servers are listed below. We also keep an up-to-date copy of the Services.nmis file (the mapped file for NMIS services), which can be found in our GitHub: https://github.com/Opmantek/nmis9/blob/nmis9_dev/conf-default/Services.nmis. The team updated these services in the last 12 months to cover using HTTPS or HTTP and updated service process checks.

The current services shipping with NMIS9 at this time are:

  • NMIS9 Workers
  • NMIS9 Scheduler
  • MongoDB
  • Opmantek Web Daemon
  • "OMK Stack HTTP" or "OMK Stack HTTPS"
  • SSL Expiry Check
  • opEvents Daemon
  • opConfig Daemon
  • opCharts Daemon

Also, it is good to monitor:

  • crond
  • syslog
  • ssh

opHA

...

When running multiple servers utilizing opHA, it is common practice to have the Main Primary monitor all the pollers and itself.

If available, it is also recommended to have a primary "watch the watcher", i.e. monitor the Main Primary and the pollers as well.

Accessing the Services List

In the NMIS GUI, navigate to the node you’re interested in.  At the top of the node, click “service list”.

...

Configuring NMIS to monitor a service

Step 1

In the NMIS GUI, click System, then System Configuration, then NMIS Nodes (devices).

Step 2

Scroll down the list and find the node you wish to monitor services on.  Click “edit” in the actions column.

Step 3

Scroll down in the widget that pops up until you reach the Services section and select the services you wish to begin monitoring.  Note that you can select multiple services by holding Ctrl/Control (PC/Linux) or Command (macOS).

Step 4

Click “Edit and Update Node”.  After the next polling cycle (usually about five minutes) you should see that the new services are being monitored.

...

NMIS requires the following services to run:

  • snmpd
  • mongod
  • omkd
  • nmis9d
  • httpd/apache (or nginx)
  • crond

...

opCharts

opCharts requires the same services as NMIS, with the addition of the below service:

  • opchartsd

...

opEvents

opEvents requires the same services as NMIS, with the addition of the below service:

  • opeventsd

...

opConfig

opConfig requires the same services as NMIS, with the addition of the below service:

  • opconfigd

...

opFlow

opFlow requires the same services as NMIS, with the addition of the below service:

  • opflowd
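The per-product requirements listed above can be expressed as a small lookup. The following is a minimal sketch, not part of NMIS: the function names are hypothetical, and it assumes the running process names match the names used on this page (on some distributions the web server process may be named differently).

```python
# Hypothetical helper; service names are taken from the lists on this page.
NMIS_CORE = ["snmpd", "mongod", "omkd", "nmis9d", "crond"]
# Any one of these satisfies the web server requirement.
WEB_SERVER_ALTERNATIVES = ["httpd", "apache", "nginx"]
# Extra daemon each product needs on top of the NMIS services.
PRODUCT_DAEMONS = {
    "opCharts": "opchartsd",
    "opEvents": "opeventsd",
    "opConfig": "opconfigd",
    "opFlow": "opflowd",
}


def required_services(products):
    """Return the NMIS core services plus one daemon per installed product."""
    required = list(NMIS_CORE)
    for product in products:
        required.append(PRODUCT_DAEMONS[product])
    return required


def missing_services(products, running):
    """Return required services that are absent from the running process names."""
    running = set(running)
    missing = [name for name in required_services(products) if name not in running]
    if not running.intersection(WEB_SERVER_ALTERNATIVES):
        missing.append("httpd/apache/nginx")
    return missing
```

For example, `missing_services(["opEvents"], names)` could be fed the process names collected from a server to list what still needs to be started.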

...

There is a problem when a service is initially configured with a Poll Interval of less than 30m and the interval is later changed to 30m or more.

Recreated problem

We recreated the problem on lab-primary1 by adding the SSH service (port22) with a 1m Poll Interval to the node testnode3.

https://lab-primary1.opmantek.net/cgi-nmis9/network.pl?act=network_service_view&node=testnode3&refresh=180&widget=false&cluster_id=84e2260c-bb7b-487d-8b5c-c4e9ebc20a65

The graphs start to show data in both NMIS and opCharts (from 15:35/15:40 to 15:55).

Then, the Poll Interval of the service is modified to 30m.

An RRD update is observed at 15:55.

After 30m, nothing is graphed for the configured Poll Interval, although an RRD update is observed at 16:22.

We waited another 30m. The service was still not graphed, but a new update did appear in the RRD.

The Poll Interval of the service is set back to 1m (at 16:55).

The charts are recovering, but there is already a 1-hour gap.

A new test was then run with a service monitored at a 30m Poll Interval from the beginning.

The Poll Interval is changed to 1h and graphing continues to work fine.

Now with 5m.

Conclusion

If the Poll Interval of a service that was initially 1m, 2m, 10m or 15m is changed to 30m or more, the service graphs stop updating. If the Poll Interval is returned to its initial value (less than 30m), the graphs recover, but a gap remains for the period when the interval was 30m or more.
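The conclusion can be turned into a simple pre-change check. The sketch below is illustrative only and is not an NMIS function: the names are hypothetical, and it assumes Poll Interval values are written as a number followed by `s`, `m`, `h` or `d`, as in the examples on this page (1m, 30m, 1h). It encodes only the behaviour observed in this test and says nothing about the underlying cause.

```python
import re

# Seconds per unit suffix (assumed format: digits followed by s, m, h or d).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
# Threshold observed in the lab test: 30 minutes.
THRESHOLD_SECONDS = 30 * 60


def interval_to_seconds(text):
    """Convert a value such as '1m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised poll interval: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def change_may_stop_graphs(old, new):
    """True when a service moves from below 30m to 30m or more."""
    return interval_to_seconds(old) < THRESHOLD_SECONDS <= interval_to_seconds(new)
```

For example, `change_may_stop_graphs("1m", "30m")` flags the change made in the test above, while a service that started at 30m and moved to 1h is not flagged.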

...
