Dave,
This can happen on nodes with highly variable latency and/or intermittent connectivity. For a node to be declared down, it must show 100% ping loss; a ping counts as lost when no reply arrives within the timeout period. By default NMIS sends 3 pings, and all 3 must fail. When fping is used, this is done with an exponential back-off algorithm, so if fping says a node is down, it's down.
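The down rule described above can be sketched in a few lines of Python. This is a simplified model, not NMIS's actual code: the function names and the list-of-booleans input (one entry per ping sent, True if a reply arrived within the timeout) are hypothetical.

```python
from typing import Sequence

def loss_percent(replied: Sequence[bool]) -> float:
    """Percentage of pings that got no reply within the timeout."""
    if not replied:
        raise ValueError("at least one ping must be sent")
    lost = sum(1 for ok in replied if not ok)
    return 100.0 * lost / len(replied)

def is_node_down(replied: Sequence[bool]) -> bool:
    """Hypothetical helper: down only at 100% loss, i.e. every ping failed."""
    return loss_percent(replied) == 100.0

# Three pings (the default count): a single reply keeps the node up.
print(is_node_down([False, False, True]))   # 2 of 3 lost
print(is_node_down([False, False, False]))  # 3 of 3 lost
```

A single successful reply out of the three is enough to keep the node up, which is why a node that answers only sometimes can still flap between up and down.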
The first step is to raise the ping timeout, which defaults to 300 milliseconds; this will likely resolve your issues. The configuration options for this are:
'fastping_timeout' => '5000',
'ping_timeout' => '5000',
I would set both to 5000 (5000 milliseconds, i.e. 5 seconds); they are probably set to 300 right now.
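To see why a longer timeout helps, here is a minimal sketch that counts how many pings would be treated as lost at each timeout. The round-trip times are made-up values for a device on a slow, congested link, and `lost_pings` is a hypothetical helper, not part of NMIS.

```python
from typing import Optional, Sequence

def lost_pings(rtts_ms: Sequence[Optional[float]], timeout_ms: float) -> int:
    """Count pings treated as lost at a given timeout.

    Each entry is the round-trip time a reply would take (ms), or None if
    the device never answers. A reply slower than the timeout is lost too.
    """
    return sum(1 for rtt in rtts_ms if rtt is None or rtt > timeout_ms)

# Hypothetical RTTs (ms) for a slow but reachable device.
rtts = [450.0, 800.0, 1200.0]

for timeout in (300, 5000):
    lost = lost_pings(rtts, timeout)
    status = "down" if lost == len(rtts) else "up"
    print(f"timeout={timeout} ms: {lost}/{len(rtts)} lost -> {status}")
```

With a 300 ms timeout every reply arrives too late, so the device is declared down even though it is answering; with 5000 ms none of the replies are counted as lost.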
If you want NMIS to send 5 pings instead of 3, so that all 5 must fail before the node is considered down, change the following settings:
'fastping_count' => '5',
'ping_count' => '5',
Change these to 5 if you like.
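A rough way to judge the effect of a higher count: if each ping were lost independently with probability p, a node that is actually up would be falsely declared down with probability p to the power of the count. Independence is a simplifying assumption (loss on a bad link is often bursty, so the real improvement is usually smaller), and `false_down_probability` is a hypothetical name.

```python
def false_down_probability(per_ping_loss: float, count: int) -> float:
    """Chance that all `count` pings are lost on a node that is up,
    assuming each ping is lost independently with `per_ping_loss`."""
    if not 0.0 <= per_ping_loss <= 1.0:
        raise ValueError("per_ping_loss must be between 0 and 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    return per_ping_loss ** count

# A link that drops half of all pings, with 3 vs 5 pings per poll.
for count in (3, 5):
    p = false_down_probability(0.5, count)
    print(f"ping_count={count}: {p:.3%} chance of a false 'down'")
```

Under these assumptions, going from 3 to 5 pings on a link with 50% loss cuts the chance of a false "down" from 12.5% to about 3.1%.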
More information is available in the articles "NMIS8 and fping or just ping" and "Ping Timeout and NMIS, including fast ping - fping".
Note: the downside of higher ping timeouts and counts is that during outages NMIS will take longer to declare nodes down and will fall a little further behind.
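The cost in the note above can be estimated with a simplified model: the i-th ping waits timeout × backoff^i, and a node is only declared down after all of them expire. A back-off of 1.5 matches fping's documented default for its `-B` option; how NMIS actually maps `ping_timeout` and `ping_count` onto fping's options is an assumption here, inter-ping intervals are ignored, and `worst_case_detection_ms` is a hypothetical helper.

```python
def worst_case_detection_ms(timeout_ms: float, count: int,
                            backoff: float = 1.0) -> float:
    """Rough upper bound on the time to see `count` consecutive failures
    when the i-th ping waits timeout_ms * backoff**i.

    backoff=1.0 models a fixed timeout per ping; backoff=1.5 models an
    fping-style exponential back-off. Inter-ping intervals are ignored.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return sum(timeout_ms * backoff ** i for i in range(count))

for timeout, count in ((300, 3), (5000, 3), (5000, 5)):
    fixed = worst_case_detection_ms(timeout, count)
    backed_off = worst_case_detection_ms(timeout, count, backoff=1.5)
    print(f"timeout={timeout} ms, count={count}: "
          f"{fixed / 1000:.1f} s fixed, {backed_off / 1000:.1f} s with back-off")
```

In this model, 5000 ms with 5 pings can take tens of seconds per unreachable node before it is declared down, compared with roughly a second at the 300 ms, 3-ping defaults.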
Mark H
- Nick Day
Are there any issues with fpingd keeping state for all the outstanding pings when the timeout is changed to 5 seconds instead of 300 ms?
Some devices on the network have highly irregular reachability, which causes lots of ping-down events, emails, tickets, etc. How can we suppress these and stop the noise?