
Some devices on the network have highly irregular reachability, which causes lots of ping-down events, emails, tickets, etc. How can we suppress these and stop the noise?


    1 answer


      Dave,

      This can happen on nodes with highly variable latency and/or connectivity. For a node to be declared down, it needs to have 100% ping loss; a ping is lost when the node fails to respond within the timeout period. By default NMIS sends 3 pings, and all 3 must fail. When using fping, this is done with an exponential back-off algorithm, so if fping says it's down, it's down.
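To make the rule above concrete, here is a minimal Python sketch of the "100% loss" decision. It is not NMIS or fping code; the function names and the representation of a lost ping as `None` are assumptions made purely for illustration.

```python
def ping_lost(rtt_ms, timeout_ms):
    """A ping counts as lost when no reply arrived within the timeout.

    rtt_ms is the round-trip time in milliseconds, or None if no reply
    was ever received (a hypothetical representation for this sketch).
    """
    return rtt_ms is None or rtt_ms > timeout_ms


def node_is_down(rtts_ms, timeout_ms):
    """Declare the node down only when every ping in the batch was lost."""
    return len(rtts_ms) > 0 and all(ping_lost(rtt, timeout_ms) for rtt in rtts_ms)


# A slow but reachable node: every reply takes longer than 300 ms.
slow_replies = [450, 800, 1200]
down_at_300 = node_is_down(slow_replies, timeout_ms=300)
down_at_5000 = node_is_down(slow_replies, timeout_ms=5000)
```

With these sample round-trip times the node is flagged down at a 300 ms timeout but not at 5000 ms, which is the kind of false alarm a longer timeout is meant to remove.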

      The first step would be to adjust the ping timeout to a higher value, which will likely resolve your issues. By default it is set to 300 milliseconds. The configuration options for this are:

          'fastping_timeout' => '5000',

          'ping_timeout' => '5000',

      I would set these to 5000 (milliseconds, i.e. 5 seconds); they are probably set to 300 right now.

      If you wanted to send 5 pings instead of 3, which would mean all 5 pings have to fail for the node to be considered down, you can change the following settings:

          'fastping_count' => '5',

          'ping_count' => '5',

      Change these to 5 if you like.  

      More information is available here: "NMIS8 and fping or just ping" and "Ping Timeout and NMIS, including fast ping - fping".

      Note: The downside of higher ping timeouts and counts is that during outages NMIS will take longer to declare nodes down and will fall a little further behind.
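As a rough illustration of that trade-off, the sketch below estimates how long a single unreachable node could take to exhaust all of its pings. It assumes the pings are sent one after another and each waits the full timeout; the `backoff` multiplier is an illustrative stand-in for fping's back-off behaviour, and the function name is hypothetical. How NMIS actually invokes fping may differ, so treat the numbers as an order-of-magnitude guide only.

```python
def worst_case_wait_ms(count, timeout_ms, backoff=1.0):
    """Total wait in milliseconds if every one of `count` pings times out.

    Each successive ping waits `backoff` times longer than the previous
    one; backoff=1.0 means a fixed timeout for every ping.
    """
    return sum(timeout_ms * backoff ** attempt for attempt in range(count))


# Default-style settings versus the larger values suggested above.
default_wait = worst_case_wait_ms(count=3, timeout_ms=300)    # 900.0 ms
tuned_wait = worst_case_wait_ms(count=5, timeout_ms=5000)     # 25000.0 ms

# The same default-style settings with an assumed 1.5x back-off per retry.
default_wait_backoff = worst_case_wait_ms(3, 300, backoff=1.5)  # 1425.0 ms
```

Under these assumptions, moving from 3 pings at 300 ms to 5 pings at 5000 ms raises the per-node worst case from under a second to about 25 seconds, before any back-off is applied.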

      Mark H

      1. Nick Day

        Are there any issues with fpingd keeping state about all the pings going on when the timeout is changed to 5 seconds vs. 300 ms?
