If you have a maintenance window or a scheduled outage for a device then you will likely want to suspend alerting for that device during that period.
NMIS has supported this for a long time, and NMIS 8.6.2 adds a number of new capabilities for scheduling such maintenance windows, either in advance or on a recurring basis.
Please note that the scope of a scheduled outage is the whole device, i.e., no alerting whatsoever takes place for any aspect of the device for the duration of the outage.
Outages do not affect the polling of a device; NMIS does keep track of the device's status and any collected information even during an outage, it just won't produce alerts.
Managing Outage Windows using the GUI
- In the NMIS GUI, select "Service Desk" -> "Alerts" -> "Outages".
- Select the device or devices involved.
- Set the start and end time of the outage window.
- If required, enter a reference number in the ticket number field.
You can also delete outages using the GUI. At this time, outages cannot be updated in the GUI.
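The information entered in the GUI amounts to a small record per outage: the selected nodes, a start and end time, and an optional ticket number. The Python sketch below models such a record and the resulting alerting decision. The class, field, and function names are hypothetical and do not reflect how NMIS stores outages internally; treating the end time as exclusive is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Outage:
    """Hypothetical model of a one-off outage as entered in the GUI."""
    nodes: list[str]      # device(s) selected in the GUI
    start: datetime       # start of the outage window
    end: datetime         # end of the outage window (assumed exclusive)
    ticket: str = ""      # optional reference/ticket number

    def __post_init__(self) -> None:
        # Reject windows that end before (or exactly when) they start.
        if self.end <= self.start:
            raise ValueError("outage end must be after its start")

    def is_current(self, node: str, now: datetime) -> bool:
        """True if `node` is covered by this outage at time `now`."""
        return node in self.nodes and self.start <= now < self.end


def should_alert(node: str, outages: list[Outage], now: datetime) -> bool:
    # Alerting is suppressed while any outage covering the node is current.
    # Polling is not affected by outages and is not modelled here.
    return not any(o.is_current(node, now) for o in outages)
```

For example, an outage for `router1` from 22:00 to 02:00 suppresses its alerts at 23:00, while alerts for other nodes, and for `router1` after 02:00, are still produced.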
How it works
- NMIS checks for current outages whenever alerts or escalations are processed; if an outage is current, alerting is suppressed.
- NMIS does not, however, suspend polling during an outage.
- While an outage is current and a node does not respond to ICMP or SNMP, the KPIs for that poll are set to U (unknown, which prevents them from contributing to averages). This way the overall reachability and availability results are not reduced for a node that has a planned outage. The KPIs included in this exclusion are:
- Furthermore, the state of nodes with current outages does not contribute to the overall health metrics and KPIs, which helps with reporting statistics, so your team's performance will look better!
- In version 8.6.2 and newer, the per-node status view displays current and scheduled/future outages prominently.
- In version 8.6.2 and newer, the events "Planned Outage Open" and "Planned Outage Closed" are raised for each polled device that enters or leaves an outage window.
- In version 8.6.2 and newer, NMIS tracks the state of outages per node and shows a translucent overlay on the node health graph for each outage.
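The KPI exclusion can be illustrated with a minimal Python sketch, assuming `None` stands in for NMIS's U value and using a generic reachability-style percentage as the example KPI. The function names are hypothetical; the point is only that an unknown sample is dropped from the average instead of being counted as zero.

```python
from statistics import mean
from typing import Optional

# In this sketch, None plays the role of NMIS's "U" (unknown) value.
KPI = Optional[float]


def record_kpi(value: float, node_down: bool, in_outage: bool) -> KPI:
    """Hypothetical per-poll KPI sample for one node."""
    if node_down and in_outage:
        # Failed poll during a planned outage: store unknown, not a penalty.
        return None
    return value


def average(samples: list[KPI]) -> Optional[float]:
    """Average only the known samples; all-unknown yields unknown."""
    known = [s for s in samples if s is not None]
    return mean(known) if known else None
```

With one failed poll out of four, the average stays at 100 when the failure falls inside an outage window, but drops to 75 when it does not.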
Recurring Outages and Flexible Selectors (8.6.2 and newer)
In version 8.6.2, it is possible to schedule outages with much more flexibility, but that requires the use of the command line tool outage_admin.pl (see next section).
NMIS supports four frequency types. Each has its own start and end format, following the conventions for opReports' period and frequency format.
Format for start and end:
- once: various formats. Any of our Supported Time Formats should work (for example "14:00 last monday"), but using the ISO8601 format is the most robust choice.
- Weekday (Wday): Wday is one of "Mon", "Tue" ... "Sun" (case-insensitive). Monday is considered first, Sunday last, and a window may wrap around the end of the week:
  Start: Sun 14:00, End: Wed 17:00 will cover Sun, Mon, Tue, Wed;
  Start: Fri 17:00, End: Mon 09:00 will cover Fri, Sat, Sun, Mon.
- Day of the month (D): D is the day of the month, 1..31. -D counts from the end of the month; -1 is the last day of the month, -2 the second to last, etc.
- Times of day: 24:00 means the end of the day and makes sense only as a period end. 00:00 means the beginning of the day. Leading zeros can be omitted.
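The wrap-around and counting rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not NMIS's own parser: the helper names are hypothetical, rejecting day numbers that do not exist in a given month is an assumption, and a weekly window that starts and ends on the same weekday is treated here as a single day.

```python
import calendar

# Monday is considered first and Sunday last.
WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def weekly_days(start_day: str, end_day: str) -> list[str]:
    """Days touched by a weekly window, wrapping past Sunday if needed.

    Assumption: start and end on the same weekday means a single day.
    """
    i = WEEKDAYS.index(start_day.lower())
    j = WEEKDAYS.index(end_day.lower())
    span = (j - i) % 7  # number of day boundaries crossed
    return [WEEKDAYS[(i + k) % 7] for k in range(span + 1)]


def resolve_month_day(d: int, year: int, month: int) -> int:
    """Map D (1..31, or -1 = last day, -2 = second to last...) to a date."""
    days_in_month = calendar.monthrange(year, month)[1]
    day = days_in_month + 1 + d if d < 0 else d
    if not 1 <= day <= days_in_month:
        # Assumption: days that do not exist in this month are rejected.
        raise ValueError(f"day {d} does not exist in {year}-{month:02d}")
    return day


def minutes_of_day(hhmm: str) -> int:
    """'H:MM' or 'HH:MM' to minutes since midnight.

    '00:00' is the beginning of the day; '24:00' (1440) is the end of the
    day and only makes sense as a period end.
    """
    hours, minutes = (int(part) for part in hhmm.split(":"))
    valid = 0 <= minutes < 60 and (0 <= hours < 24 or (hours == 24 and minutes == 0))
    if not valid:
        raise ValueError(f"invalid time of day: {hhmm}")
    return hours * 60 + minutes
```

For instance, `weekly_days("Fri", "Mon")` reproduces the Fri, Sat, Sun, Mon coverage from the second weekday example, and `resolve_month_day(-1, 2024, 2)` resolves to the 29th because February 2024 is a leap month.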
NMIS versions before 8.6.2 support only the selection of one or more nodes by name; in 8.6.2 you can use any node configuration property to determine which devices should be subject to your outage window.
Please note that the NMIS GUI does not expose any of these advanced selectors; you have to use outage_admin.pl to make use of them.
The selectors are given in the format outage.node.<propertyname>, and three comparison operations are supported:
- Explicit single value:
outage.node.group=MyGroupName would select nodes that belong to group MyGroupName.
- List of explicit values, given as individual array elements:
outage.node.nodeModel.0=net-snmp outage.node.nodeModel.1=RedBack would select nodes whose model is either net-snmp or RedBack.
- Regular expression:
outage.node.roleType=/^devel/ would select nodes whose configured roleType starts with "devel".
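A minimal Python sketch of how such selectors could be parsed from key=value arguments and matched against a node's configuration. The function names and the node dictionary are hypothetical. Requiring every selector to match (AND semantics) is an assumption, and the /.../ pattern is evaluated with Python's `re` module, whose syntax is close to, but not identical to, Perl's.

```python
import re
from typing import Union

# A selector is a single value, a list of values, or a compiled regex.
Selector = Union[str, list[str], re.Pattern]


def parse_selectors(args: list[str]) -> dict[str, Selector]:
    """Turn outage.node.<property>[.<n>]=<value> arguments into selectors."""
    selectors: dict[str, Selector] = {}
    lists: dict[str, dict[int, str]] = {}
    for arg in args:
        key, _, value = arg.partition("=")
        parts = key.split(".")
        if parts[:2] != ["outage", "node"] or len(parts) not in (3, 4):
            continue  # not a node selector; ignored in this sketch
        prop = parts[2]
        if len(parts) == 4:
            # outage.node.<property>.<n>=value: one element of a value list
            lists.setdefault(prop, {})[int(parts[3])] = value
        elif len(value) >= 2 and value.startswith("/") and value.endswith("/"):
            selectors[prop] = re.compile(value[1:-1])
        else:
            selectors[prop] = value
    # Assemble list elements in index order.
    for prop, elements in lists.items():
        selectors[prop] = [elements[i] for i in sorted(elements)]
    return selectors


def node_matches(node: dict[str, str], selectors: dict[str, Selector]) -> bool:
    """Assumed AND semantics: every selector must match the node."""
    for prop, want in selectors.items():
        have = node.get(prop)
        if have is None:
            return False
        if isinstance(want, re.Pattern):
            if not want.search(have):
                return False
        elif isinstance(want, list):
            if have not in want:
                return False
        elif have != want:
            return False
    return True
```

Arguments that are not node selectors (such as `act=create`) are simply skipped by this parser, so the same argument list can be passed through unchanged.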
Managing Outages from the Command Line (8.6.2 and newer)
NMIS version 8.6.2 introduces the tool admin/outage_admin.pl, which lets you perform all outage-related operations. Simply start it without arguments and you'll be given an overview of the supported operations, like this:
To see the required, possible and expected arguments for outage creation, run it with act=create but no creation arguments, like this: