...

We all want the best performance from a server, and to make the most of its physical resources the configuration has to be fine-tuned. This guide lists recommended parameters; they may not suit all cases, as server performance depends on many factors. 

Table of Contents

Related Articles

Opmantek Applications

The configurations in this article relate to Opmantek products. opCharts, opEvents, opConfig, opHA, opReports, ... all use the omkd daemon, which serves the frontend requests. In addition, opEvents, opCharts and opConfig have their own daemons. 

Before You Start

The first thing to do is to gather information about our system:

  • System Information: the NMIS and OMK support tool will give us all the information needed.
  • Monitor services: NMIS can monitor the processes involved - apache2, nmis9d, omkd and mongod - and provide useful information about CPU and memory usage, among other things. 

Number of processes

NMIS runs a daemon that periodically collects information from the nodes.

The number of workers is set in the parameter:

Code Block
nmisd_max_workers

The default is 10. 
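As a sketch, the worker count can be adjusted in the NMIS9 configuration file (the path and surrounding structure shown here are assumptions based on a default install):

```
# in conf/Config.nmis, relative to the NMIS9 install directory (assumed location)
'nmisd_max_workers' => 10,
```

Restart the nmis9d daemon after changing this value so the new worker count takes effect.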

Some approximate configurations:

Configuration items

In low-memory environments, lowering the number of omkd workers provides the biggest improvement in stability, even more than tuning mongod.conf does. The default value is 10, but in an environment with low user concurrency it can be decreased to 3-5.

...

OMK has the equivalent parameter:

Code Block
omkd_workers

Also setting omkd_max_requests will help the workers restart gracefully before they grow too large. 

Code Block
omkd_max_requests

Process size safety limiter: if a maximum is configured, it is >= 256 MB and the server is running Linux, then a process size check runs every 15 s and the worker is gracefully shut down if it exceeds that size.

Code Block
omkd_max_memory

Maximum number of concurrent connections per process; defaults to 1000:

Code Block
omkd_max_clients

The performance logs are really useful for debugging purposes, but they can also affect performance, so it is recommended to turn them off when they are not needed: 

Code Block
omkd_performance_logs => false
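Putting the omkd parameters above together, here is a sketch of how they might look in the OMK configuration (the file name, 'omkd' section placement, and the byte units for omkd_max_memory are assumptions; the values are illustrative, not recommendations for every server):

```
# in conf/opCommon.nmis (assumed location and structure)
'omkd' => {
  'omkd_workers' => 4,                     # fewer workers for low-memory servers
  'omkd_max_requests' => 500,              # restart workers gracefully after this many requests
  'omkd_max_memory' => 400 * 1024 * 1024,  # size cap, units assumed to be bytes; checked every 15 s on Linux
  'omkd_max_clients' => 1000,              # concurrent connection limit (the default)
  'omkd_performance_logs' => 'false',      # disable when not debugging
},
```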

MongoDB memory usage

In its default configuration, MongoDB will use the larger of either 256 MB or ½ of (RAM − 1 GB) for its cache size.
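To make the formula concrete, a small calculation (Python is used purely for illustration):

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache: the larger of 256 MB (0.25 GB) or half of (RAM - 1 GB)."""
    return max(0.25, (ram_gb - 1) / 2)

# A 16 GB server reserves 7.5 GB for the cache by default,
# while a 1 GB server falls back to the 256 MB floor:
print(default_wiredtiger_cache_gb(16))  # 7.5
print(default_wiredtiger_cache_gb(1))   # 0.25
```

This is why MongoDB can look surprisingly memory-hungry on large servers: the default cache grows with RAM unless it is capped explicitly.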

...

Here is some interesting information about how MongoDB reserves memory for its internal cache via WiredTiger, the underlying storage engine, along with some adjustments that can be made: https://dba.stackexchange.com/questions/148395/mongodb-using-too-much-memory
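For example, the cache can be capped explicitly in mongod.conf (the 1 GB figure is illustrative; size it to your server's workload):

```
# mongod.conf (YAML format)
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
```

Restart mongod after changing this setting for it to take effect.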

Server examples

Two servers are compared in this section.

  • The Master (Primary) has only one local node, but more than 400 poller nodes. The opHA process is what requires the most CPU and memory. 
  • The Poller has more than 500 nodes. The nmis process requires the most CPU and memory, as it polls information for all the nodes. 

Stressed system: poller-nine

System information:

Name              | Value              | Notes
nmisd_max_workers | 10                 | (nmis9 only)
omkd_workers      | 4                  |
omkd_max_requests | 500                |
Nodes             | 406                |
Active Nodes      | 507                |
OS                | Ubuntu 18.04.3 LTS |
role              | poller             |

This is how the server memory graphs look in a stressed system. We will focus on memory, as this is where the bottleneck is: 

The NMIS process remains stable, using no more than 120 MB, but the process was stopped - probably killed by the system due to high memory usage: 

TODO: How to check this 

...

Check the processes once nmis9d has been restarted:

Code Block
top

Healthy system: master-nine

System information:

Name              | Value
nmisd_max_workers | 5
omkd_workers      | 10
omkd_max_requests | undef
Nodes             | 2
Poller Nodes      | 536
OS                | Ubuntu 18.04.3 LTS
role              | master

...

Daemons graphs:

omk:

mongo:

NMIS 8

The main NMIS 8 process is called from different cron jobs to run different operations: collect, update, summary, master, ... 

For a collect or an update, the main thread creates forks to perform the operation requested. 

Configurations that affect performance

There are some important configuration options that affect performance:

...

This option always needs the option mthread=true as well. 

Code Block
nmis8/bin/nmis.pl type=collect abort_after=60 mthread=true ignore_running=true;
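This command is normally driven from cron; the following is a sketch of a typical crontab entry (the schedule, user field and install path are assumptions - check the cron entries on your own server):

```
# assumed example; NMIS 8 installs commonly run the collect every 5 minutes
*/5 * * * * root /usr/local/nmis8/bin/nmis.pl type=collect abort_after=60 mthread=true ignore_running=true
```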

Stressed system: Customer server UZH

System information:

Name                    | Value
nmisd_max_workers       | 50
nmisd_scheduler_cycle   | 30
nmisd_worker_cycle      | 10
nmisd_worker_max_cycles | 10

nmis9d is crashing with no error messages. 

Some server info: 

  • CentOS 7
  • 463 Nodes
  • Poller server
  • High IO Wait 


  • increased open files to 100’000

...

  • If the collect has a lot of nodes to process, the number of children won't reach the limit instantly. While the main thread is forking, some children complete their jobs and exit; also, the main process waits for them to change their state, so the number increases slowly.
  • NMIS can have more than one instance of the main process running, and the number of children can then be higher than max_threads, as the limit is only per instance. 
  • sort_due_nodes: By default, NMIS decides what to poll in a pseudo-random order. If your server is overloaded, you will likely see some nodes never getting polled. For heavily loaded servers, enable sort_due_nodes by adding it to the NMIS configuration with the value set to 1.
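As a sketch, enabling sort_due_nodes in the NMIS configuration looks like this (the file location is assumed from a default install):

```
# in nmis8/conf/Config.nmis (assumed location)
'sort_due_nodes' => 1,
```

With this set, due nodes are polled in a deterministic order, so the most overdue nodes are no longer starved by chance.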

Gaps in Graphs

If the server takes a long time to collect and cannot complete every operation, a useful tool is nmis8/admin/polling_summary. Here we can see how many nodes have a late collect, and a summary of nodes being collected and not collected. 

A symptom of an overloaded server can be the gaps in the graphs. 

This is an example of how these parameters can impact the performance of the server, on a server with 64 CPUs and more than 3700 nodes: 

...

totalPoll=3713 ontime=891 1x_late=1460 3x_late=41 12x_late=56 144x_late=1265

...

totalPoll=1229 ontime=998 no_snmp=14 demoted=0 1x_late=217 3x_late=0 12x_late=0 144x_late=0

...

Note that problems in the modelling that throw errors in the logs can also make the system slow. 

...