Overview

 

NMIS 8.6.6 introduces a new feature called 'polling failover' for monitoring hosts that are reachable by multiple/redundant paths. This page briefly describes this feature.

Best Current Practices and Alternatives

Most environments that deal with multi-homed nodes use dedicated 'loopback' addresses for monitoring, and let their routing infrastructure handle any failover, convergence and re-routing required.
This makes path changes transparent to NMIS and thus minimises the negative effects of such a change.

If that is not an option in your environment, then you may want to use NMIS' new feature for explicit polling failover.

Node Configuration for Polling Failover

NMIS normally communicates with a node exclusively using the configured host address/name.
With polling failover enabled, if this address becomes unreachable, NMIS attempts to switch over to the secondary host_backup address on the fly and continues polling as normal.

To enable this capability, add your host's secondary address/name in the node configuration dialog (as in the screenshot below) and run a type=update operation:

As long as at least one address remains reachable, NMIS will be able to poll the node.

Should the primary address become inaccessible, NMIS switches polling over to the fallback host_backup address and raises an event called 'Node Polling Failover'. The node is also flagged as being in 'degraded' state. This event is cleared if and when the primary address becomes reachable again.
While NMIS polling has fallen back to the secondary address, the node's status shows "Node Polling Failover" as one of the reasons for the degraded state, as in the screenshot below:

If all addresses of the node are unreachable, then NMIS flags the node as 'unreachable'.
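
The three outcomes described above can be summarised in a short sketch. This is not NMIS code (NMIS itself is written in Perl). The names classify_node and PollingState, and the label 'reachable' for the normal state, are hypothetical and chosen only for illustration. Only host, host_backup, 'degraded', 'unreachable' and the event name come from this page.

```python
from typing import NamedTuple, Optional

# Event name as given on this page.
FAILOVER_EVENT = "Node Polling Failover"


class PollingState(NamedTuple):
    status: str                  # 'reachable' (hypothetical label), 'degraded' or 'unreachable'
    poll_address: Optional[str]  # address used for polling, None if no address works
    event: Optional[str]         # event that would be active, if any


def classify_node(host: str, host_backup: Optional[str],
                  primary_ok: bool, backup_ok: bool) -> PollingState:
    """Illustrative mapping from address reachability to node state."""
    if primary_ok:
        # Primary works: normal polling, any failover event is cleared.
        return PollingState("reachable", host, None)
    if host_backup is not None and backup_ok:
        # Primary down, backup up: poll via the backup, node is degraded.
        return PollingState("degraded", host_backup, FAILOVER_EVENT)
    # No address works.
    return PollingState("unreachable", None, None)


if __name__ == "__main__":
    print(classify_node("192.0.2.1", "198.51.100.1", False, True))
```

The addresses in the usage line are documentation-range placeholders, not values from a real configuration.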

Known Limitations

  • NMIS currently pings both addresses in parallel.
    Changes of ping status do not cause events to be raised or cleared.
  • Only one set of Ping (Response Time, Packet Loss) statistics is recorded.
    The Ping statistics will switch transparently from primary to fallback address if and when the primary becomes unreachable.
  • Polling failover is not available for WMI data collection.
  • The polling failover decision is made anew for each collect or update operation, regardless of previous results and of which addresses are pingable.
    Whenever an SNMP connection needs to be opened, NMIS tries the primary address first, and if that fails, switches to the secondary.
    While the primary is down, this can introduce undesirable delays to a node's polling, because every operation first attempts the primary address. In return, it minimises the latency for switching back when the primary address becomes accessible again.
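
The last limitation is easier to see in a small sketch. The following is a simplified illustration, not the NMIS implementation: open_session and the connect callback are hypothetical names, and the callback stands in for whatever opens an SNMP connection. The point shown is that no state is carried between operations, so the primary address is always attempted first.

```python
from typing import Callable, List, Optional, Tuple


def open_session(host: str, host_backup: Optional[str],
                 connect: Callable[[str], bool]) -> Tuple[Optional[str], List[str]]:
    """Try the primary address first, then the backup.

    Returns the address that worked (or None) and the list of addresses
    attempted, in order. Nothing is remembered between calls.
    """
    attempted: List[str] = []
    for address in (host, host_backup):
        if address is None:
            continue  # no backup configured
        attempted.append(address)
        if connect(address):
            return address, attempted
    return None, attempted


if __name__ == "__main__":
    reachable = {"198.51.100.1"}  # only the backup answers at first

    def fake_connect(addr: str) -> bool:
        return addr in reachable

    # Primary down: every operation pays for a failed primary attempt.
    print(open_session("192.0.2.1", "198.51.100.1", fake_connect))

    # Primary back up: the very next operation uses it again.
    reachable.add("192.0.2.1")
    print(open_session("192.0.2.1", "198.51.100.1", fake_connect))
```

In this sketch the cost of the design is the extra failed attempt on each operation while the primary is down, and the benefit is that switching back requires no separate recovery check.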
