Metrics not populating

Hello again,

Thanks for the help with my previous question.

I have a new issue that affects 5 out of 52 nodes.

SNMP polling works from the NMIS server when I use snmpwalk:

snmpwalk -v 2c -r 1 -c XYZ IP_ADDRESS

And if I use your script as such:

/usr/local/nmis8/bin/nmis.pl type=update node=Hypervisor01 type=collect debug=true

Then the graphs reflect that single data point, but nothing from any subsequent polls.

So it seems that whatever normally does the job of the above command (the scheduled poll) is perhaps not working?

Incidentally, I did attempt to fix permissions:

/usr/local/nmis8/admin/fixperms.pl

But the issue is still occurring.

4 of the nodes are ESXi hypervisors/hosts.

The other is a Windows Server 2012 R2 machine (there are other nodes with the same OS that generate data flawlessly).

I did try to search this log:

grep Hypervisor01 /usr/local/nmis8/logs/nmis.log

But I could not find anything I could use.

Suggestions on where to start, what to look for etc would be greatly appreciated.

Cheers,

Harry

4 answers

Mark Dueck
Jun 24, 2016
Regarding #2a: this is probably relevant and could be (at least part of) the issue.

22-Jun-2016 17:10:07,nmis.pl::runPing#1071<br>INFO (Hypervisor01) standard ping system using, no ping info of daemon fpingd

NMIS likes to use fping and fpingd.pl to get ping results. The log message says that no ping info from the fpingd daemon was found, so NMIS falls back to standard ping. This should work but could be causing some issues.
This link provides some info on NMIS and fping: NMIS8 and fping or just ping
Some things to verify:
The server has fping installed
nmis8/bin/fpingd.pl is executable
Is fpingd.pl already running (NMIS tries to start it)
Try running/restarting fpingd.pl, checking the logs to see if there are any issues
fpingd.pl should be writing the results to nmis8/var/nmis-fping.json (or possibly .nmis)
Info on running fpingd.pl can be found by running it with no arguments:

[root@crash-n-burn bin]# ./fpingd.pl
fpingd.pl Version 8.5.10G
Usage: fpingd.pl <restart|kill]=[true|false]>[debug=true|false] [logging=true|false] [conf=alt.config]
Command line options are:
restart=true - kill any running daemon(s) and restarts!
debug=true - print status to console and logfile
kill=true - kill any running daemon(s) and exit. Does not launch a new daemon!
logging=true - creates a log file 'fpingd.log' in the standard nmis log directory
conf=*.nmis - specify an alternative Conf.nmis file.
a new daemon is started ONLY with restart=true
default is no logging, no debug
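The checks listed above could be scripted roughly as follows. This is only a sketch: the function name check_fping_setup, the default base directory /usr/local/nmis8 (taken from the paths used in this thread), and the 5-minute freshness window are assumptions, not anything NMIS itself ships.

```shell
# Sketch of the fping checks listed above. Pass the NMIS base directory
# as $1 (assumed default: /usr/local/nmis8, the path used in this thread).
check_fping_setup() {
    base="${1:-/usr/local/nmis8}"

    # Is the fping binary available on PATH?
    if command -v fping >/dev/null 2>&1; then
        echo "ok: fping is installed"
    else
        echo "problem: fping not found on PATH"
    fi

    # Is fpingd.pl present and executable?
    if [ -x "$base/bin/fpingd.pl" ]; then
        echo "ok: fpingd.pl is executable"
    else
        echo "problem: $base/bin/fpingd.pl is missing or not executable"
    fi

    # Is a fpingd.pl process running? Skipped if pgrep is unavailable.
    if command -v pgrep >/dev/null 2>&1; then
        if pgrep -f fpingd.pl >/dev/null 2>&1; then
            echo "ok: a fpingd.pl process is running"
        else
            echo "problem: no fpingd.pl process found"
        fi
    fi

    # Has a results file been written recently? Both extensions mentioned
    # above are checked; the 5-minute window is an arbitrary choice.
    recent=""
    for f in "$base/var/nmis-fping.json" "$base/var/nmis-fping.nmis"; do
        if [ -f "$f" ] && [ -n "$(find "$f" -mmin -5 2>/dev/null)" ]; then
            recent="$f"
        fi
    done
    if [ -n "$recent" ]; then
        echo "ok: $recent updated in the last 5 minutes"
    else
        echo "problem: no recently updated nmis-fping results file under $base/var"
    fi
}
```

Calling check_fping_setup with no argument checks /usr/local/nmis8. Any line starting with "problem:" points at the item to look into first.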
Mark Dueck
Jun 24, 2016
Hi Harry,
There are a few things to check:
1. NMIS will only attempt to collect SNMP info from a node if the node configuration has collect set to true.
   a. Because collection works when run from the CLI, I don't believe this is the problem.
2. NMIS checks the ping results for the node. If there is no response from the most recent check (total ping loss), NMIS will not attempt to collect SNMP info from the device.
   a. This could be your issue. The logs and GUI would show something if this was happening.
   b. The affected nodes would probably appear as being down.
3. NMIS cannot write to the RRD file.
   a. Running fixperms.pl should have resolved this issue.
   b. NMIS will log if it cannot write to an RRD.
4. ESXi servers can have issues with SNMP message size. You may need to set the max_msg_size config option to a lower value. This issue would appear in the logs as well.
   a. This is well documented on this page: SNMP Tuning
   b. This could be your issue.
5. A thorough look through the logs almost always points to the answer. If you cannot find anything after a good search, you can run the support tool and email the output to us at support. Customers who purchase support get top priority.
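As a starting point for that log search, something like the following narrows nmis.log down to the lines for one node that are not routine INFO entries. It is a sketch only: the function name node_log_issues is made up, and filtering on the string INFO is an assumption based on the single log line quoted in this thread, so it may hide or keep more than intended.

```shell
# Print the last 50 log lines that mention a node and are not INFO entries.
# $1 = node name (matched case-insensitively as a fixed string)
# $2 = log file (assumed default: the nmis.log path used in this thread)
node_log_issues() {
    node="$1"
    logfile="${2:-/usr/local/nmis8/logs/nmis.log}"
    grep -iF -- "$node" "$logfile" | grep -v 'INFO' | tail -n 50
}
```

For example, node_log_issues Hypervisor01 lists whatever remains for that node once the INFO lines are dropped. An empty result means no such lines were found in that file, not that there is no problem.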
Harry Milanes
Jun 24, 2016
Hello and thank you!
1a: Correct, collect is set to true.
2a: Could this routine entry in the log be relevant?
22-Jun-2016 17:10:07,nmis.pl::runPing#1071<br>INFO (Hypervisor01) standard ping system using, no ping info of daemon fpingd
2b: The node insists it is UP.
3a: Permissions are correct, I checked them all.
3b: There are some routine write permissions errors, but not pertaining to that particular node - or any of the others that are not generating data.
4a+b: I have adjusted the SNMP Max Repetitions to 15 as per the guide (thanks for that). Reducing the message size to 1024 caused errors in the logs, but since no other size errors are reported it doesn't seem as if this is to blame?
5: I'll speak to my manager about this one. I do not have the authority.
Harry Milanes
Oct 17, 2016
Hi,

This is happening on a brand new server again.

I'm convinced it's a bug with the way NMIS calculates its metrics.

The trigger appears to be editing the metric weighting in nmis8/conf/Config.nmis.

You can see here, the KPI metrics are quite wrong, and the same thing happens when adding a brand new node:

I've done a reboot and tried looking into some cleanup scripts, even run my own watch job to manually update every 40 seconds (watch -n40 "./nmis/bin/nmis.pl type=update mthread=true maxthreads=20"), but the KPI values never settle down - though they do fluctuate.

Any suggestions? I'd rather not keep having to export nodes, rebuild the virtual appliance and import every few months.

Cheers,

Harry
Harry Milanes
Jun 24, 2016
OK Cool.

So I executed:

fpingd.pl restart=true

Which successfully restarted the fping daemon.
Since then the logs are clear of the error I was experiencing for the last hour.

On the Hypervisor01 node, Availability/Response/CPU/MEM/Interface/Disk are all being populated.
But still no graphical data, and nothing in the KPI graph either - which is weird since the above is populating successfully.

Oh and yes, nmis8/var/nmis-fping.nmis is indeed being populated.

Would a screenshot help?
http://www.tiikoni.com/tis/view/?id=d898b3b

Cheers,

Harry
Mark Dueck
Jun 24, 2016
Run an update to make sure that missing information isn't causing an issue now:
/usr/local/nmis8/bin/nmis.pl type=update mthread=true maxthreads=10
If that doesn't fix it, sadly I'm going to suggest a reboot. After that, email support@ with the support files from the support tool in /usr/local/nmis8/admin/support.pl
Harry Milanes
Jun 27, 2016
Thanks. I have rebooted a few times and it still won't populate graph data. I will submit these logs. Might changing the weighting in Config.nmis create this issue?
From 'weight_cpu' => '0.2',
To 'weight_cpu' => '0.4',
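To compare the weighting before and after an edit like this, a small helper can list every weight_* entry in Config.nmis together with a running total. This is a sketch under assumptions: the function name list_weights is made up, the pattern only matches entries written in the 'weight_cpu' => '0.2', form shown above, and nothing in this thread establishes what total (if any) NMIS expects.

```shell
# List 'weight_*' => 'N' entries from a Config.nmis file and sum the values.
# $1 = config file (assumed default: the Config.nmis path used in this thread)
list_weights() {
    conf="${1:-/usr/local/nmis8/conf/Config.nmis}"
    # Reduce each matching line to "name value", then print and total them.
    sed -n "s/^[[:space:]]*'\(weight_[A-Za-z_]*\)'[[:space:]]*=>[[:space:]]*'\([0-9.]*\)'.*/\1 \2/p" "$conf" \
        | awk '{ print; sum += $2 } END { printf "total %.2f\n", sum }'
}
```

Running it against a backup copy of the file and against the edited one shows exactly which weights changed and by how much the total moved.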
Harry Milanes
Jun 27, 2016
What else is interesting is that for servers without graph data, I can poll like this, for example:
watch -n10 'grep "hrCpuLoad" var/hypervisor01-node.json'
And the values all change. So the data is coming in. It just won't construct any graphs.
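Since the node file is updating but the graphs are not, one way to narrow it down is to check whether the RRD files themselves are being written. The sketch below lists .rrd files under a directory that have not been modified recently. The function name stale_rrds and the 15-minute default are assumptions, and the RRD directory has to be supplied (on an NMIS8 install it is assumed to be somewhere under /usr/local/nmis8/database, which this thread does not confirm).

```shell
# List .rrd files under a directory that were last modified more than
# N minutes ago.
# $1 = directory to search (required)
# $2 = age threshold in minutes (default 15, an arbitrary choice)
stale_rrds() {
    dir="$1"
    minutes="${2:-15}"
    find "$dir" -type f -name '*.rrd' -mmin +"$minutes" -print
}
```

Piping the output through grep -i hypervisor01 limits it to one node. If that node's files show up as stale while the JSON values keep changing, the data is arriving but not reaching the RRDs.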