Recommended Configuration for Server Performance

Many factors determine the health of a server. Hardware capabilities (CPU, memory and disk) are important, but so is the server load: the number of devices (nodes to be polled, updated, audited, synchronised), the number of products (NMIS, OAE, opCharts, opHA, each running different processes) and the number of concurrent users.

We all want the best performance from a server, and making the most of its physical resources requires fine-grained configuration tuning. This guide lists recommended parameters; they may not suit every case, because server performance depends on many factors.

Before You Start

The first thing to do is gather information about our system:

System Information: the NMIS and OMK support tool provides all the information needed.
Monitor services: NMIS can monitor the processes involved (apache2, nmis9d, omkd and mongod) and report useful information about their CPU and memory usage, among other things. TODO How to monitor these services
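Until the NMIS-based monitoring steps are documented, a quick manual check of the memory used by these processes can be sketched in Python. This is a minimal sketch, not the NMIS mechanism: it reads the standard Linux /proc filesystem, and it assumes the kernel reports the processes under exactly the names given (the names below are taken from the list above and may differ on a real server, so confirm them with ps first).

```python
import os


def read_rss_kb(status_text):
    """Extract VmRSS (resident memory, in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def rss_by_process(names, proc_root="/proc"):
    """Sum resident memory (kB) per process name, matching on /proc/<pid>/comm."""
    totals = {name: 0 for name in names}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                comm = handle.read().strip()
            if comm in totals:
                with open(os.path.join(proc_root, entry, "status")) as handle:
                    totals[comm] += read_rss_kb(handle.read())
        except OSError:
            continue  # the process exited while we were reading it
    return totals


# Process names are assumptions; adjust them to what `ps -e` shows on your server.
# print(rss_by_process(["apache2", "nmis9d", "omkd", "mongod"]))
```

Running the commented call periodically (for example from cron) gives a rough picture of which service is growing over time.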

Number of processes

NMIS runs a daemon that periodically collects information from the nodes.

The number of workers is set in the parameter:

nmisd_max_workers

The default is 10.

OMK has the equivalent parameter:

omkd_workers

Setting omkd_max_requests as well helps the workers restart gracefully before they grow too large.

omkd_max_requests

MongoDB memory usage

MongoDB, in its default configuration, will use the larger of either 256 MB or ½ of (RAM – 1 GB) for its cache size.

The MongoDB cache size can be changed by adding the cacheSizeGB setting to the /etc/mongod.conf configuration file, as shown below.
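To see what this default means for a given server, the rule can be written as a small calculation. This is a minimal sketch of the formula stated above; the function name is illustrative only.

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """Default cache size in GB: the larger of 256 MB or half of (RAM - 1 GB)."""
    return max(0.25, 0.5 * (total_ram_gb - 1.0))


# Print the default cache size for a few common RAM sizes.
for ram in (1, 2, 4, 8, 16):
    print(f"{ram:>2} GB RAM -> {default_wiredtiger_cache_gb(ram):.2f} GB cache")
```

For example, a server with 4 GB of RAM gets a 1.5 GB cache by default, and one with 16 GB gets 7.5 GB. On a machine that also runs NMIS and OMK, that share may be more than you want to leave to MongoDB.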

storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
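After editing the file and restarting mongod, it is worth confirming that the new limit is in effect. The sketch below reads the "maximum bytes configured" value that MongoDB reports in its serverStatus output. It assumes the pymongo driver is installed; the connection URI is a placeholder, and a real NMIS/OMK installation will normally need credentials in it.

```python
def configured_cache_gb(server_status: dict) -> float:
    """Return the WiredTiger cache limit, in GB, from a serverStatus document."""
    max_bytes = server_status["wiredTiger"]["cache"]["maximum bytes configured"]
    return max_bytes / (1024 ** 3)


def fetch_server_status(uri: str = "mongodb://localhost:27017") -> dict:
    """Fetch serverStatus with pymongo. The URI is an assumption; add credentials as needed."""
    from pymongo import MongoClient  # lazy import: the helper above works without pymongo

    client = MongoClient(uri)
    try:
        return client.admin.command("serverStatus")
    finally:
        client.close()


# Example (requires a reachable mongod):
# print(configured_cache_gb(fetch_server_status()))
```

With cacheSizeGB: 1 in place, the reported value should be 1.0.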

The following discussion explains how MongoDB reserves memory for its internal cache and for WiredTiger, the underlying storage engine, and describes some adjustments that can be made: https://dba.stackexchange.com/questions/148395/mongodb-using-too-much-memory

Server examples

Two servers are compared in this section.

Master: has only one node of its own, but more than 400 poller nodes. The opHA process is what requires the most CPU and memory.
Poller: has more than 500 nodes. The nmis process requires the most CPU and memory, because it polls information for all the nodes.

Stressed system POLLER-NINE

System information:

Name	Value
nmisd_max_workers	10
omkd_workers	4
omkd_max_requests	500
Nodes	406
Active Nodes	507
OS	Ubuntu 18.04.3 LTS
role	poller

This is how the server memory graphs look in a stressed system. We will focus on memory, as that is where the bottleneck is:

The NMIS process remains stable and does not use more than 120 MB, yet the process was stopped - probably killed by the system due to high memory usage: TODO How to check this
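One way to check this, pending the documented procedure, is to look for messages from the Linux out-of-memory (OOM) killer in the kernel log. The sketch below is an assumption-laden helper, not an NMIS feature: it assumes the kernel log is available at /var/log/kern.log (the usual location on Ubuntu) and that kills are logged with the kernel's "Killed process <pid> (<name>)" wording.

```python
import re

# Matches kernel messages such as:
#   "Out of memory: Killed process 1234 (mongod) total-vm:..."
#   "Killed process 1234 (mongod) total-vm:..."
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a (pid, process_name) tuple for every OOM kill found in the lines."""
    kills = []
    for line in log_lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def scan_log_file(path="/var/log/kern.log"):
    """Scan a kernel log file; the default path is an assumption for Ubuntu."""
    with open(path, errors="replace") as handle:
        return find_oom_kills(handle)


# Example (usually needs read access to the log, e.g. via sudo):
# print(scan_log_file())
```

An empty result means no OOM kill was found in that file; older events may be in rotated logs such as kern.log.1.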