I'm trying to understand what I see in `iostat`, specifically the differences between the output for md and sd devices.
I have a couple of quite large CentOS Linux servers, each with an Intel Xeon E3-1230 CPU, 16 GB RAM and four 2 TB SATA disk drives. Most are JBOD, but one is configured with software RAID 1+0. The servers have a very similar type and amount of load, but the `%util` figures I get with `iostat` on the software RAID one are much higher than on the others, and I'm trying to understand why. All servers are usually 80-90% idle with regard to CPU.
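For reference, extended per-device statistics like the ones below are what `iostat` reports in its `-x` mode; a minimal sketch of such an invocation (the 5-second interval is only illustrative, not necessarily what was used to capture the samples here):

```
# Extended device statistics (rrqm/s, avgqu-sz, %util, etc.), refreshed every 5 seconds
iostat -x 5
```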
*Example of `iostat` on a server without RAID:*
```
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.26    0.19    1.15    2.55    0.00   86.84

Device:  rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb        2.48     9.45   10.45   13.08  1977.55  1494.06   147.50     2.37  100.61   3.86   9.08
sdc        4.38    24.11   13.25   20.69  1526.18  1289.87    82.97     1.40   41.14   3.94  13.36
sdd        0.06     1.28    1.43    2.50   324.67   587.49   232.32     0.45  113.73   2.77   1.09
sda        0.28     1.06    1.33    0.97   100.89    61.63    70.45     0.06   27.14   2.46   0.57
dm-0       0.00     0.00    0.17    0.24     4.49     1.96    15.96     0.01   18.09   3.38   0.14
dm-1       0.00     0.00    0.09    0.12     0.74     0.99     8.00     0.00    4.65   0.36   0.01
dm-2       0.00     0.00    1.49    3.34   324.67   587.49   188.75     0.45   93.64   2.25   1.09
dm-3       0.00     0.00   17.73   42.82  1526.17  1289.87    46.50     0.35    5.72   2.21  13.36
dm-4       0.00     0.00    0.11    0.03     0.88     0.79    12.17     0.00   19.48   0.87   0.01
dm-5       0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    1.17   1.17   0.00
dm-6       0.00     0.00   12.87   20.44  1976.66  1493.27   104.17     2.77   83.01   2.73   9.08
dm-7       0.00     0.00    1.36    1.58    95.65    58.68    52.52     0.09   29.20   1.55   0.46
```

*Example of `iostat` on a server with RAID 1+0:*
```
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.55    0.25    1.01    3.35    0.00   87.84

Device:  rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb       42.21    31.78   18.47   59.18  8202.18  2040.94   131.91     2.07   26.65   4.02  31.20
sdc       44.93    27.92   18.96   55.88  8570.70  1978.15   140.94     2.21   29.48   4.60  34.45
sdd       45.75    28.69   14.52   55.10  8093.17  1978.16   144.66     0.21    2.95   3.94  27.42
sda       45.05    32.59   18.22   58.37  8471.04  2040.93   137.24     1.57   20.56   5.04  38.59
md1        0.00     0.00   18.17  162.73  3898.45  4013.90    43.74     0.00    0.00   0.00   0.00
md0        0.00     0.00    0.00    0.00     0.00     0.00     4.89     0.00    0.00   0.00   0.00
dm-0       0.00     0.00    0.07    0.26     3.30     2.13    16.85     0.04  135.54  73.73   2.38
dm-1       0.00     0.00    0.25    0.22     2.04     1.79     8.00     0.24  500.99  11.64   0.56
dm-2       0.00     0.00   15.55  150.63  2136.73  1712.31    23.16     1.77   10.66   2.93  48.76
dm-3       0.00     0.00    2.31    2.37  1756.39  2297.67   867.42     2.30  492.30  13.08   6.11
```

So my questions are:

1) Why is there such a relatively high `%util` on the server with RAID compared to the one without?

2) On the non-RAID server, the combined `%util` of the physical devices (`sd*`) is more or less the same as that of the combined LVM devices (`dm-*`). Why is that not the case on the RAID server?

3) Why do the software RAID devices (`md*`) appear virtually idle while the underlying physical devices (`sd*`) are busy? My first thought was that it might be caused by RAID checking, but `/proc/mdstat` shows all is well.
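For reference, a sketch of how one can confirm that no md check or resync is in progress (`md1` matches the array name in the output above; the `sync_action` file is an additional place to look, not something mentioned in the post itself):

```
# /proc/mdstat lists the arrays and shows a progress bar while a check/resync runs
cat /proc/mdstat
# Each array also exposes its current sync state; "idle" means nothing is running
cat /sys/block/md1/md/sync_action
```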
Edit: Apologies, I thought the question was clear, but there seems to be some confusion about it. The question is obviously not about the difference in `%util` between drives on a single server, but about why the total/average `%util` on one server is so different from the other. I hope that clarifies any misunderstanding.
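To make that concrete, the per-server figure being compared is roughly the average `%util` across the physical disks. A hypothetical one-liner (not part of the original setup; it assumes the sysstat `iostat` and an output layout like the samples above, where `%util` is the last column) to compute it from a single sample:

```
# Average %util over the sd* disks; note that iostat without an interval
# argument reports averages since boot rather than a recent sample
iostat -x | awk '/^sd/ { sum += $NF; n++ } END { if (n) printf "average %%util over %d disks: %.2f\n", n, sum/n }'
```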
Asked by Dokbua
(209 rep)
Jul 10, 2014, 02:00 PM
Last activity: May 22, 2020, 09:02 AM