Thanks for the help with my previous question.
I have a new issue that affects 5 out of 52 nodes.
It seems as if SNMP data can be polled by nmis if I use snmpwalk:
And if I use your script as such:
Then the graphs will reflect that little piece of data, but there are no subsequent polls.
So it seems that whatever does the job of the above is perhaps not working?
Incidentally, I did attempt to fix permissions:
But the issue is still occurring.
4 of the nodes are ESXi hypervisors/hosts.
The other is a Windows 2012 R2 server (there are other nodes with the same OS that generate data flawlessly).
I did try to search this log:
But I could not find anything I could use.
Suggestions on where to start, what to look for, etc. would be greatly appreciated.
#2a is probably relevant and could be (at least part of) the issue.
NMIS likes to use fping and fpingd.pl to get ping results. The log message is saying that they can't be found, so it reverts to using standard ping. This should work but could be causing some issues.
This link provides some info on NMIS and fping: NMIS8 and fping or just ping
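As a rough first check of the fallback described above, the sketch below reports whether an fping binary is on the PATH and whether a process matching fpingd.pl is running. The function name `check_fping` is made up for illustration, and the script only inspects the local system; it does not ask NMIS which ping method it is using.

```shell
#!/bin/sh
# Hypothetical helper: report whether fping and the fpingd.pl daemon look available.
check_fping() {
    # Is an fping binary on the PATH?
    if command -v fping >/dev/null 2>&1; then
        echo "fping binary: found at $(command -v fping)"
    else
        echo "fping binary: NOT found"
    fi
    # Is a process whose command line mentions fpingd.pl running?
    if pgrep -f fpingd.pl >/dev/null 2>&1; then
        echo "fpingd.pl: running"
    else
        echo "fpingd.pl: NOT running"
    fi
}

check_fping
```

If either line reports a problem, that would be consistent with the "standard ping" log entry, though it does not by itself explain the missing graphs.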
Some things to verify:
Info on running fpingd.pl can be found by running it with no arguments:
There are a few things to check:
Hello and thank you!
1a: Correct, collect is set to true.
2a: Could this routine entry in the log be relevant?
22-Jun-2016 17:10:07,nmis.pl::runPing#1071 INFO (Hypervisor01) standard ping system using, no ping info of daemon fpingd
2b: The node insists it is UP.
3a: Permissions are correct, I checked them all.
3b: There are some routine write permissions errors, but not pertaining to that particular node, or any of the others that are not generating data.
4a+b: I have adjusted the SNMP Max Repetitions to 15 as per the guide (thanks for that). Reducing the message size to 1024 caused errors in the logs, but since no other size errors are reported it doesn't seem as if this is to blame?
5: I'll speak to my manager about this one. I do not have the authority.
This is happening on a brand new server again.
I'm convinced it's a bug with the way NMIS calculates its metrics.
The cause seems to be editing the metric weighting in nmis8/conf/Config.nmis
You can see here, the KPI metrics are quite wrong, and the same thing happens when adding a brand new node:
I've done a reboot and tried looking into some cleanup scripts, even run my own watch job to manually update every 40 seconds (watch -n40 "./nmis/bin/nmis.pl type=update mthread=true maxthreads=20"), but the KPI values never settle down - though they do fluctuate.
Any suggestions? I'd rather not keep having to export nodes, rebuild the virtual appliance, and import every few months.
So I executed:
Which successfully restarted the fping daemon.
Since then the logs are clear of the error I was experiencing for the last hour.
On Hypervisor01 node, Availability/Response/CPU/MEM/Interface/Disk are all being populated.
But still no graphical data, and nothing in the KPI graph either - which is weird since the above is populating successfully.
Oh and yes, nmis8/var/nmis-fping.nmis is indeed being populated.
Would a screenshot help?
Run an update to make sure that missing information isn't causing an issue now:
/usr/local/nmis8/bin/nmis.pl type=update mthread=true maxthreads=10
If that doesn't fix it, sadly I'm going to suggest a reboot. And after that, emailing support@ with the support files from the support tool in /usr/local/admin/support.pl
Thanks. I have rebooted a few times and it still won't populate graph data. I will submit these logs. Might changing the weighting in Config.nmis create this issue?
From: 'weight_cpu' => '0.2',
To: 'weight_cpu' => '0.4',
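One way to sanity-check a weighting edit like this is to add up every `weight_*` value in the config. The sketch below assumes the weights are meant to total 1.0, which is not confirmed anywhere in this thread, and that the file lives under /usr/local/nmis8/conf. The function name `weight_sum` is made up for illustration.

```shell
#!/bin/sh
# Assumed default path; adjust for your install.
CONF="${CONF:-/usr/local/nmis8/conf/Config.nmis}"

weight_sum() {
    # Split on single quotes so that a line such as
    #   'weight_cpu' => '0.4',
    # yields the key in field 2 and the value in field 4.
    awk -F"'" '$2 ~ /^weight_/ { sum += $4 } END { printf "%.2f\n", sum }' "$1"
}

# Print the total only if the config file actually exists here.
if [ -f "$CONF" ]; then
    echo "Sum of weight_* values: $(weight_sum "$CONF")"
fi
```

If the total moved away from its previous value after the edit, lowering another weight to compensate may be worth trying before rebuilding the appliance.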
What else is interesting is that for servers without graph data, I can poll like this, for example:
watch -n10 'grep "hrCpuLoad" var/hypervisor01-node.json'
And the values all change. So the data is coming in. It just won't construct any graphs.
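Since NMIS draws its graphs from RRD files, one way to narrow this down might be to see whether those files are still being written. The sketch below lists `.rrd` files that have not been modified recently. The directory /usr/local/nmis8/database is an assumed default, the 15-minute threshold assumes a 5-minute poll cycle, and `stale_rrds` is a made-up name.

```shell
#!/bin/sh
# Assumed default RRD location; adjust for your install.
RRD_DIR="${RRD_DIR:-/usr/local/nmis8/database}"

stale_rrds() {
    # $1 = directory to search, $2 = age threshold in minutes.
    # Prints every .rrd file last modified more than $2 minutes ago.
    find "$1" -type f -name '*.rrd' -mmin +"$2" 2>/dev/null
}

# Only run against the real directory if it exists on this machine.
if [ -d "$RRD_DIR" ]; then
    stale_rrds "$RRD_DIR" 15
fi
```

If the files for the affected nodes show up as stale while the node JSON keeps changing, the problem is probably in the step that writes the RRDs rather than in SNMP collection. Running `rrdtool lastupdate` on one of the listed files would show the last values it stored.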