...

Code Block
[root@nmisslvcc5 log]# ps -auxf | egrep " D| Z"
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      1563  0.1  0.0      0     0 ?        D    Mar17  10:47  \_ [jbd2/dm-2-8]
root      1565  0.0  0.0      0     0 ?        D    Mar17   0:43  \_ [jbd2/dm-3-8]
root      1615  0.3  0.0      0     0 ?        D    Mar17  39:26  \_ [flush-253:2]
root      1853  0.0  0.0  29764   736 ?        D<sl Mar17   0:04 auditd
root     17898  0.0  0.0 103320   872 pts/5    S+   12:20   0:00  |       \_ egrep  D| Z
apache   17856 91.0  0.2 205896 76212 ?        D    12:19   0:01  |   \_ /usr/bin/perl /usr/local/nmis8/
root     13417  0.6  0.8 565512 306812 ?       D    10:38   0:37  \_ opmantek.pl webserver             -
root     17833  9.8  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
root     17838 10.3  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
root     17842 10.6  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
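
In this output, state D marks processes in uninterruptible sleep (typically blocked on disk I/O) and Z marks zombie (defunct) processes. The "bad syntax" warning appears because BSD-style ps options take no leading dash; the same filter runs cleanly as:

Code Block
# BSD-style ps options are written without the dash
ps auxf | egrep " D| Z"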

Test Disk I/O Performance With dd Command

The dd command is very sensitive to the parameters it is given, and misuse can cause serious problems on your server. OMK uses this command to measure server performance and latency; with it we can determine the disk's write and read speed.

...

OMK's base scale is as follows:

  • 0.0X s is correct (healthy).
  • 0.X s is a warning (there may be an issue).
  • X.0 s is critical (there is a problem).


Please note that a single 10M block (10,485,760 bytes) was written for this test; the write took 0.223301 seconds on this server, which corresponds to the reported throughput of 47 MB/s (10,485,760 bytes ÷ 0.223301 s ≈ 47 MB/s).

Where:

  • if=/dev/zero (if=/dev/input.file): The name of the input file you want dd to read from.
  • of=/data/omkTestFile (of=/path/to/output.file): The name of the output file you want dd to write the input.file to.
  • bs=10M (bs=block-size): Sets the size of the block you want dd to use. Note that Linux will need enough free RAM to hold the block. If your test system doesn't have enough RAM available, use a smaller value for bs (such as 64MB or 128MB); you can also run the test with 1, 2, or even 3 gigabyte blocks.
  • count=1 (count=number-of-blocks): The number of blocks you want dd to read.
  • oflag=dsync: Use synchronized I/O for data. Do not skip this option: it gets rid of caching and gives you accurate results.
  • conv=fdatasync: Again, this tells dd to require a complete "sync" once, right before it exits. This option is equivalent to oflag=dsync. The full command assembled from these parameters is sketched below.
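
Putting these parameters together, the test command looks like the following sketch (the hostname in the prompt is illustrative; the figures in the output are the ones quoted above):

Code Block
[root@opmantek ~]# dd if=/dev/zero of=/data/omkTestFile bs=10M count=1 oflag=dsync
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0.223301 s, 47.0 MB/s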


Polling summary

The Opmantek monitoring system includes the polling_summary tool. It helps us determine whether the server is taking too long to collect information from the nodes and failing to complete its operations; here we can see how many nodes are being collected late, along with a summary of collected and uncollected nodes.
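
As a minimal sketch (the path is an assumption based on a default NMIS8 install; adjust it to your environment), the tool can be invoked like this:

Code Block
# Assumed default NMIS8 location; adjust to your install
[root@opmantek ~]# /usr/local/nmis8/admin/polling_summary.pl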

...
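
The capture below was taken with iostat -xtc 3 5, which prints extended per-device statistics (-x), a timestamp on each report (-t), and CPU utilization (-c), sampled every 3 seconds for 5 reports. Watch the await and %util columns: sustained %util near 100% together with await in the hundreds or thousands of milliseconds means the device is saturated.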

Code Block
[root@opmantek ~]# iostat -xtc 3 5
Linux 2.6.32-754.28.1.el6.x86_64 (opmantek)     04/05/2021      _x86_64_        (8 CPU)

04/05/2021 09:23:40 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           12.47    0.00    0.73   10.53    0.00   86.72

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    4.50   35.50   148.00   452.00    15.00   110.98 4468.74  274.22 5000    0.60    100.00
sdb               0.00    42.50    0.00    6.50     0.00   392.00    60.31     0.13   20.00    0.00   20    0.34    92.12
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0    0.65    56.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0    0.86    10.50
dm-2              0.00     0.00    4.50   52.00   140.00   416.00     9.84   149.56 5229.59  274.22 5658    0.21    25.03
dm-3              0.00     0.00    0.00    0.50     0.00     4.00     8.00    66.00    0.00    0.00    0    0.45    14.40


04/05/2021 09:23:43 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           18.17    0.00    5.29    6.31    0.00   76.82

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    50.00    9.50   19.00   596.00   260.00    30.04   130.41 2569.47  283.11 3712     0.60    92.36
sdb               0.00    36.50    0.50   59.00     8.00   764.00    12.97    25.34  425.82   18.00  429     0.25    78.82
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0     0.23    92.45
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0     0.86    88.93
dm-2              0.00     0.00    8.00  163.50   440.00  1308.00    10.19   240.76  966.94  337.38  997     0.37    68.28
dm-3              0.00     0.00    0.00   33.00     0.00   264.00     8.00    48.31    0.00    0.00    0     0.18    12.75 

04/05/2021 09:23:46 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.50    0.00    1.21    11.37    0.00   75.56

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    9.50   18.00   268.00   220.00    17.75   112.91 1763.73  143.42 2618     0.85    100.00
sdb               0.00    10.00    2.00    1.50   112.00    92.00    58.29     0.01    3.86    6.25    0     0.94     97.54
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0     0.45     75.39
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0     0.78     24.96
dm-2              0.00     0.00   13.50   11.50   552.00    92.00    25.76   185.21 3029.96  101.85 6467     0.25     67.18 
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0     0.86     43.91

04/05/2021 09:23:49 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           12.10    0.00    7.21    9.17    0.00   87.92

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    55.50    7.00   44.00    92.00   488.00    11.37   110.52  929.20  139.86 1054     0.75    89.54
sdb               0.00    65.00    0.50   34.00     4.00   792.00    23.07     0.83   24.09    1.00   24     0.55    93.61
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0     0.14    99.99
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0     0.36    78.98
dm-2              0.00     0.00    7.00  242.50    84.00  1940.00     8.11   179.44  240.22  137.36  243     0.75    25.30
dm-3              0.00     0.00    0.00    5.00     0.00    40.00     8.00     1.30  305.90    0.00  305     0.23    45.12

04/05/2021 09:23:52 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.50    0.00    11.21    19.30    0.00   92.92

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.16   114.34    7.02  191.18   132.04  2444.27    13.00     3.60   18.18   81.41   15     0.14    99.99    
sdb               0.03   205.87    2.36   70.03    31.22  2207.55    30.92     5.81   80.25   53.76   81     0.94    97.54
dm-0              0.00     0.00    0.10    1.01    11.77     8.07    17.90     0.84  755.10   72.31  822     0.60    98.36
dm-1              0.00     0.00    0.09    0.13     0.74     1.03     8.00     0.22  985.66  153.25 1580     0.47    94.48
dm-2              0.00     0.00    9.25  575.59   129.18  4604.83     8.09     6.09    9.74   74.24    8     0.61    82.37
dm-3              0.00     0.00    0.12    4.74    21.57    37.89    12.24     2.52  518.00  131.58  527     0.23    93.15

[root@opmantek ~]#



In the iostat output above the disks are saturated: %util sits at or near 100% and await climbs into the thousands of milliseconds (for example, sda shows an await of 4468 ms at 100% utilization). This problem was solved by moving the VM to an environment with solid state disks (SSD). The client confirmed that the VM was running on mechanical disks (HDD), and a clone of the VM in a lab environment did not help, since it exhibited the same problem. Once the HDDs were replaced with solid state disks, the VM and the monitoring services stabilized, and RAM, CPU, and disk utilization returned to normal for the number of nodes the monitoring system is monitoring.