
Hello again,

 

Thanks for the help with my previous question.

 

I have a new issue that affects 5 out of 52 nodes.

 

It seems as if SNMP data can be polled by nmis if I use snmpwalk:

snmpwalk -v 2c -r 1 -c XYZ IP_ADDRESS

 

And if I use your script as such:

/usr/local/nmis8/bin/nmis.pl type=update node=Hypervisor01 type=collect debug=true

Then the graphs reflect that one piece of data, but nothing from any subsequent polls.

So it seems that whatever does the job of the above is perhaps not working?

 

Incidentally, I did attempt to fix permissions:

/usr/local/nmis8/admin/fixperms.pl

But the issue is still occurring.

 

4 of the nodes are ESXi hypervisors/hosts.

The other is a Windows 2012 R2 server (there are other nodes with the same OS that generate data flawlessly).

 

I did try to search this log:

grep Hypervisor01 /usr/local/nmis8/logs/nmis.log

 

But I could not find anything I could use.

 

Suggestions on where to start, what to look for etc would be greatly appreciated.

 

Cheers,

 

Harry


    4 answers

    1.  

      #2a is probably relevant and could be (at least part of) the issue.

      22-Jun-2016 17:10:07,nmis.pl::runPing#1071<br>INFO (Hypervisor01) standard ping system using, no ping info of daemon fpingd

      NMIS likes to use fping and fpingd.pl to get ping results. The log message says that no ping results were available from the fpingd daemon, so NMIS reverted to the standard ping. This should work but could be causing some issues.

      This link provides some info on NMIS and fping: NMIS8 and fping or just ping

      Some things to verify:

      1. The server has fping installed
      2. nmis8/bin/fpingd.pl is executable 
      3. Is fpingd.pl already running (NMIS tries to start it) 
      4. Try running/restarting fpingd.pl, checking the logs to see if there are any issues 
      5. fpingd.pl should be writing the results to nmis8/var/nmis-fping.json (or possibly .nmis) 
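The file-based checks in the list above can be sketched as a small shell function. This is only a rough sketch: the function name `check_fping_setup` is made up for illustration, the base directory is passed in rather than assumed, and the two result-file names are the ones mentioned in this thread. The process check relies on `pgrep`, which may not be present on every system.

```shell
#!/bin/sh
# Hypothetical helper: runs the checks from the list above against an
# NMIS base directory (for example /usr/local/nmis8).
check_fping_setup() {
    base="$1"

    # Check 1: fping must be installed and on the PATH.
    if command -v fping >/dev/null 2>&1; then
        echo "ok: fping is installed"
    else
        echo "problem: fping not found in PATH"
    fi

    # Check 2: fpingd.pl must exist and be executable.
    if [ -x "$base/bin/fpingd.pl" ]; then
        echo "ok: fpingd.pl is executable"
    else
        echo "problem: $base/bin/fpingd.pl is missing or not executable"
    fi

    # Check 3: is a daemon process visible? (needs pgrep)
    if command -v pgrep >/dev/null 2>&1 && pgrep -f 'fpingd\.pl' >/dev/null 2>&1; then
        echo "ok: an fpingd.pl process is running"
    else
        echo "problem: no fpingd.pl process seen (or pgrep is unavailable)"
    fi

    # Check 5: a non-empty results file should exist under var/.
    found=no
    for f in "$base/var/nmis-fping.json" "$base/var/nmis-fping.nmis"; do
        if [ -s "$f" ]; then
            echo "ok: results file $f exists"
            found=yes
        fi
    done
    [ "$found" = yes ] || echo "problem: no nmis-fping results file under $base/var"
}
```

Call it as `check_fping_setup /usr/local/nmis8`. Check 4 (restarting the daemon and reading its log) is left as a manual step, since it changes the running system.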

      Info on running fpingd.pl can be found by running it with no arguments:

      [root@crash-n-burn bin]# ./fpingd.pl 
      fpingd.pl Version 8.5.10G
      
      
      Usage: fpingd.pl <restart|kill]=[true|false]>[debug=true|false] [logging=true|false] [conf=alt.config]
      
      Command line options are:
       restart=true   - kill any running daemon(s) and restarts!
       debug=true     - print status to console and logfile
       kill=true      - kill any running daemon(s) and exit. Does not launch a new daemon!
       logging=true   - creates a log file 'fpingd.log' in the standard nmis log directory
       conf=*.nmis    - specify an alternative Conf.nmis file.
      
      a new daemon is started ONLY with restart=true
      default is no logging, no debug

       


      1.  

        Hi Harry,

        There are a few things to check:

        1. NMIS will only attempt to collect SNMP info from a node if the node configuration has collect set to true.
          1. Because collection works when run from the CLI, I don't believe this is the problem
        2. NMIS will check the ping results for the node. If there is no response from the most recent check (total ping loss), NMIS will not attempt to collect SNMP info from the device.
          1. This could be your issue. The logs and GUI would show something if this was happening
          2. The affected nodes would probably appear as being down
        3. NMIS cannot store the collected data if it cannot write to the RRD file
          1. Running fixperms.pl should have resolved this issue
          2. NMIS will log if it cannot write to an RRD 
        4. ESXi servers can have issues with SNMP message size. You may need to set the max_msg_size config option to a lower value. This issue would appear in the logs as well
          1. This is well documented on this page: SNMP Tuning
          2. This could be your issue.
        5. A thorough look through the logs almost always points to the answer. If you cannot find anything after a good search, you can run the support tool and email the output to us at support. Customers who purchase support get top priority.
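For the log search in point 5, a filter along these lines may help narrow things down. It is a sketch under assumptions: the function name `node_log_problems` is invented here, and it assumes the log level follows `<br>` as in the `<br>INFO` line quoted in this thread. It matches the node name case-insensitively, because the node appears as Hypervisor01 in the log but may be lowercase elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: show the most recent non-INFO log lines for one node.
#   $1 = log file (for example /usr/local/nmis8/logs/nmis.log)
#   $2 = node name
node_log_problems() {
    logfile="$1"
    node="$2"
    # -i: ignore case in the node name; -F: treat patterns as literal text.
    # The second grep drops routine INFO entries so other levels stand out.
    grep -iF -- "$node" "$logfile" | grep -vF '<br>INFO ' | tail -n 50
}
```

For example: `node_log_problems /usr/local/nmis8/logs/nmis.log Hypervisor01`. If this prints nothing, the node has only INFO entries in that log, which is itself useful to know.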

         

        1. Harry Milanes

          Hello and thank you!

          1a: Correct, collect is set to true.
          2a: Could this routine entry in the log be relevant?
          22-Jun-2016 17:10:07,nmis.pl::runPing#1071<br>INFO (Hypervisor01) standard ping system using, no ping info of daemon fpingd
          2b: The node insists it is UP.
          3a: Permissions are correct, I checked them all.
          3b: There are some routine write permissions errors, but not pertaining to that particular node - or any of the others that are not generating data.
          4a+b: I have adjusted the SNMP Max Repetitions to 15 as per the guide (thanks for that). Reducing the message size to 1024 caused errors in the logs, but since no other size errors are reported it doesn't seem as if this is to blame?
          5: I'll speak to my manager about this one. I do not have the authority.

      2.  

        Hi,

         

        This is happening on a brand new server again.

         

        I'm convinced it's a bug with the way NMIS calculates its metrics.

         

        The cause of which is editing the metric weighting in nmis8/conf/Config.nmis

         

        You can see here, the KPI metrics are quite wrong, and the same thing happens when adding a brand new node:

         

         

        I've done a reboot and tried looking into some cleanup scripts, even run my own watch job to manually update every 40 seconds (watch -n40 "./nmis/bin/nmis.pl type=update mthread=true maxthreads=20"), but the KPI values never settle down - though they do fluctuate.

         

        Any suggestions? I'd rather not keep having to export nodes, rebuild the virtual appliance and import every few months (smile)

         

        Cheers,

         

        Harry

        1.  

          OK Cool.

           

          So I executed:

          fpingd.pl restart=true

          Which successfully restarted the fping daemon.

          Since then the logs are clear of the error I was experiencing for the last hour.

           

          On Hypervisor01 node, Availability/Response/CPU/MEM/Interface/Disk are all being populated.

          But still no graphical data, and nothing in the KPI graph either - which is weird since the above is populating successfully.

           

          Oh and yes, nmis8/var/nmis-fping.nmis is indeed being populated.

           

          Would a screenshot help?

          http://www.tiikoni.com/tis/view/?id=d898b3b

           

          Cheers,

           

          Harry

          1. Mark Dueck

            Run an update to make sure that missing information isn't causing an issue now:
            /usr/local/nmis8/bin/nmis.pl type=update mthread=true maxthreads=10
            If that doesn't fix it, sadly I'm going to suggest a reboot. After that, email support@ with the support files from the support tool in /usr/local/admin/support.pl

          2. Harry Milanes

            Thanks. I have rebooted a few times and it still won't populate graph data. I will submit these logs. Might changing the weighting in Config.nmis create this issue?
            From: 'weight_cpu' => '0.2',
            To: 'weight_cpu' => '0.4',

          3. Harry Milanes

            What else is interesting is that for servers without graph data, I can poll like this for example:
            watch -n10 'grep "hrCpuLoad" var/hypervisor01-node.json'
            And the values all change. So the data is coming in. It just won't construct any graphs.
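Since the node file is changing but the graphs are not, one further thing worth checking is whether the RRD files behind the graphs are being written at all. The sketch below lists RRD files that have not been modified recently. The function name `stale_rrds` is made up for illustration, and the directory holding the RRD files is an assumption you need to supply for your install. It relies on `find -mmin`, which GNU and BSD find support but strict POSIX find does not.

```shell
#!/bin/sh
# Hypothetical helper: list *.rrd files not modified in the last N minutes.
#   $1 = directory containing the RRD files (location depends on the install)
#   $2 = age threshold in minutes (defaults to 15)
stale_rrds() {
    rrd_dir="$1"
    minutes="${2:-15}"
    # -mmin +N matches files whose last modification is more than N minutes ago.
    find "$rrd_dir" -type f -name '*.rrd' -mmin +"$minutes" -print
}
```

If the RRD files for the affected nodes show up here while those of healthy nodes do not, the collected values are not reaching the RRDs. Running `rrdtool lastupdate` on one of the listed files shows the timestamp and values of its last successful write.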
