Understanding iostat with Linux software RAID

4 votes
1 answer
3317 views
I'm trying to understand what I see in iostat, specifically the differences between the output for md and sd devices.

I have a couple of quite large CentOS Linux servers, each with an E3-1230 CPU, 16 GB RAM and four 2 TB SATA disk drives. Most are JBOD, but one is configured with software RAID 1+0. The servers carry a very similar type and amount of load, but the %util figures I get from iostat on the software RAID server are much higher than on the others, and I'm trying to understand why. All servers are usually 80-90% idle with regard to CPU.

*Example of iostat on a server without RAID:*
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.26    0.19    1.15    2.55    0.00   86.84

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               2.48     9.45   10.45   13.08  1977.55  1494.06   147.50     2.37  100.61   3.86   9.08
sdc               4.38    24.11   13.25   20.69  1526.18  1289.87    82.97     1.40   41.14   3.94  13.36
sdd               0.06     1.28    1.43    2.50   324.67   587.49   232.32     0.45  113.73   2.77   1.09
sda               0.28     1.06    1.33    0.97   100.89    61.63    70.45     0.06   27.14   2.46   0.57
dm-0              0.00     0.00    0.17    0.24     4.49     1.96    15.96     0.01   18.09   3.38   0.14
dm-1              0.00     0.00    0.09    0.12     0.74     0.99     8.00     0.00    4.65   0.36   0.01
dm-2              0.00     0.00    1.49    3.34   324.67   587.49   188.75     0.45   93.64   2.25   1.09
dm-3              0.00     0.00   17.73   42.82  1526.17  1289.87    46.50     0.35    5.72   2.21  13.36
dm-4              0.00     0.00    0.11    0.03     0.88     0.79    12.17     0.00   19.48   0.87   0.01
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    1.17   1.17   0.00
dm-6              0.00     0.00   12.87   20.44  1976.66  1493.27   104.17     2.77   83.01   2.73   9.08
dm-7              0.00     0.00    1.36    1.58    95.65    58.68    52.52     0.09   29.20   1.55   0.46
*Example of iostat on a server with RAID 1+0:*
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.55    0.25    1.01    3.35    0.00   87.84

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb              42.21    31.78   18.47   59.18  8202.18  2040.94   131.91     2.07   26.65   4.02  31.20
sdc              44.93    27.92   18.96   55.88  8570.70  1978.15   140.94     2.21   29.48   4.60  34.45
sdd              45.75    28.69   14.52   55.10  8093.17  1978.16   144.66     0.21    2.95   3.94  27.42
sda              45.05    32.59   18.22   58.37  8471.04  2040.93   137.24     1.57   20.56   5.04  38.59
md1               0.00     0.00   18.17  162.73  3898.45  4013.90    43.74     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     4.89     0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.07    0.26     3.30     2.13    16.85     0.04  135.54  73.73   2.38
dm-1              0.00     0.00    0.25    0.22     2.04     1.79     8.00     0.24  500.99  11.64   0.56
dm-2              0.00     0.00   15.55  150.63  2136.73  1712.31    23.16     1.77   10.66   2.93  48.76
dm-3              0.00     0.00    2.31    2.37  1756.39  2297.67   867.42     2.30  492.30  13.08   6.11
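For reference, output like the two listings above comes from iostat's extended statistics mode (part of the sysstat package). A minimal capture sketch, assuming a 60-second sampling interval, which the question does not actually state:

# Report extended per-device statistics (-x) every 60 s, twice.
# The first report covers the time since boot; the second covers only
# the 60 s interval, which is usually the one worth reading.
iostat -x 60 2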
So my questions are:

1) Why is %util so much higher on the server with RAID than on the one without?
2) On the non-RAID server, the combined %util of the physical devices (sd*) is more or less the same as that of the combined LVM devices (dm-*). Why is that not the case on the RAID server?
3) Why do the software RAID devices (md*) appear virtually idle while the underlying physical devices (sd*) are busy? My first thought was that it might be caused by a RAID check, but /proc/mdstat shows all is well.

Edit: Apologies, I thought the question was clear, but there seems to be some confusion about it. The question is obviously not about the difference in %util between drives on one server, but about why the total/average %util on one server is so different from the other. I hope that clarifies any misunderstanding.
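Regarding question 3, a quick check sketch for ruling out a background resync and for looking at the raw counter iostat derives %util from. The device names sda and md1 are taken from the listings above; in /proc/diskstats, column 13 (the tenth statistics field after major, minor and name) is the cumulative "milliseconds spent doing I/Os", and %util is computed from the delta of that counter over the sampling interval:

# Confirm no rebuild/resync is in progress on the RAID server.
cat /proc/mdstat

# Compare the io_ticks counter (column 13) for a member disk and the md
# device; if md1 never accumulates io_ticks, iostat will report 0 %util
# for it regardless of actual activity.
awk '$3 ~ /^(sda|md1)$/ {print $3, $13}' /proc/diskstats

And since the comparison in the edit is about the per-server average rather than any single drive, a sketch of averaging %util across the physical disks. This assumes the output format shown above, with %util as the last column; a single iostat -x report (since-boot averages, like the listings) is piped straight into awk:

# Average the last column (%util) over the sd* device lines.
iostat -x | awk '/^sd/ {sum += $NF; n++} END {if (n) printf "avg %%util: %.2f\n", sum/n}'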
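It may also help, for question 2, to see how the dm-* devices stack on top of the sd* (or md*) devices on each server; lsblk prints that topology directly:

# Show the block device stacking (sd -> md -> dm on the RAID server,
# sd -> dm on the JBOD servers).
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT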
Asked by Dokbua (209 rep)
Jul 10, 2014, 02:00 PM
Last activity: May 22, 2020, 09:02 AM