Why is the size of my IO requests being limited to about 512K?
6 votes · 1 answer · 9277 views
I read `/dev/sda` using a 1MiB block size. Linux seems to limit the IO requests to 512KiB, i.e. an average request size of 512KiB. What is happening here? Is there a configuration option for this behaviour?
$ sudo dd iflag=direct if=/dev/sda bs=1M of=/dev/null status=progress
1545601024 bytes (1.5 GB, 1.4 GiB) copied, 10 s, 155 MB/s
1521+0 records in
1520+0 records out
...
While my `dd` command is running, `rareq-sz` is 512.
> rareq-sz
> The average size (in kilobytes) of the read requests that were issued to the device.
>
> -- [man iostat](http://man7.org/linux/man-pages/man1/iostat.1.html)
$ iostat -d -x 3
...
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 309.00 0.00 158149.33 0.00 0.00 0.00 0.00 0.00 5.24 0.00 1.42 511.81 0.00 1.11 34.27
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
...
The kernel version is `5.1.15-300.fc30.x86_64`. `max_sectors_kb` is 1280.
$ cd /sys/class/block/sda/queue
$ grep -H . max_sectors_kb max_hw_sectors_kb max_segments max_segment_size optimal_io_size logical_block_size chunk_sectors
max_sectors_kb:1280
max_hw_sectors_kb:32767
max_segments:168
max_segment_size:65536
optimal_io_size:0
logical_block_size:512
chunk_sectors:0
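For reference, these limits live under sysfs and `max_sectors_kb` is one of the writable ones; a rough sketch of inspecting and changing it (the value 1024 below is purely illustrative, and note the current 1280 KiB is already larger than the 512 KiB requests I am seeing):

```
# current per-request size limit, in KiB (same value as max_sectors_kb above)
$ cat /sys/class/block/sda/queue/max_sectors_kb
1280

# lower (or raise, up to max_hw_sectors_kb) the limit; 1024 is an arbitrary example
$ echo 1024 | sudo tee /sys/class/block/sda/queue/max_sectors_kb
```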
By default I use the BFQ I/O scheduler. I also repeated the test after `echo 0 | sudo tee wbt_lat_usec`, and then again after `echo mq-deadline | sudo tee scheduler`. The results remained the same.
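Spelled out with full paths (assuming the same `/sys/class/block/sda/queue` directory as above), what I ran was roughly:

```
$ cd /sys/class/block/sda/queue

# disable writeback throttling (WBT)
$ echo 0 | sudo tee wbt_lat_usec

# switch from bfq to mq-deadline, then confirm the active scheduler
$ echo mq-deadline | sudo tee scheduler
$ cat scheduler
```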
Apart from WBT, I used the default settings for both I/O schedulers. E.g. for `mq-deadline`, `iosched/read_expire` is 500, which is equivalent to half a second.
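The scheduler defaults can be listed the same way as the queue limits above; a small sketch, assuming `mq-deadline` is currently active on `sda`:

```
# list all mq-deadline tunables; read_expire is in milliseconds
$ grep -H . /sys/class/block/sda/queue/iosched/*
```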
During the last test (mq-deadline, WBT disabled), I ran `btrace /dev/sda`. It shows all the requests were split into two unequal parts:
8,0 0 3090 5.516361551 15201 Q R 6496256 + 2048 [dd]
8,0 0 3091 5.516370559 15201 X R 6496256 / 6497600 [dd]
8,0 0 3092 5.516374414 15201 G R 6496256 + 1344 [dd]
8,0 0 3093 5.516376502 15201 I R 6496256 + 1344 [dd]
8,0 0 3094 5.516388293 15201 G R 6497600 + 704 [dd]
8,0 0 3095 5.516388891 15201 I R 6497600 + 704 [dd]
8,0 0 3096 5.516400193 733 D R 6496256 + 1344 [kworker/0:1H]
8,0 0 3097 5.516427886 733 D R 6497600 + 704 [kworker/0:1H]
8,0 0 3098 5.521033332 0 C R 6496256 + 1344
8,0 0 3099 5.523001591 0 C R 6497600 + 704
> X -- split On [software] raid or device mapper setups, an incoming i/o may straddle a device or internal zone and needs to be chopped up into smaller pieces for service. This may indicate a performance problem due to a bad setup of that raid/dm device, but may also just be part of normal boundary conditions. dm is notably bad at this and will clone lots of i/o.
>
> -- [man blkparse](http://man7.org/linux/man-pages/man1/blkparse.1.html)
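For anyone reproducing the trace above: `btrace` is just the stock wrapper around `blktrace`/`blkparse`, and the split (X) events can be pulled out of its default output roughly like this (the `sda.trace` filename is arbitrary):

```
# btrace is shorthand for: blktrace -d /dev/sda -o - | blkparse -i -
$ sudo btrace /dev/sda > sda.trace

# keep only split events; the action code (Q, X, G, I, D, C, ...) is field 6
$ awk '$6 == "X"' sda.trace
```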
## Things to ignore in iostat
Ignore the `%util` number. It is broken in this version (see [this answer](https://unix.stackexchange.com/questions/517132/dd-is-running-at-full-speed-but-i-only-see-20-disk-utilization-why/517219#517219)).
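(To check which sysstat release the installed `iostat` comes from, `-V` prints the version:)

```
$ iostat -V
```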
I *thought* `aqu-sz` was also affected, [due to being based on `%util`](https://utcc.utoronto.ca/~cks/space/blog/linux/DiskIOStats). Although I thought that would make it about three times too large here (100/34.27).
Ignore the `svctm` number. "Warning! Do not trust this field any more. This field will be removed in a future sysstat version."
Asked by sourcejedi (53232 rep) on Jul 11, 2019, 10:51 AM
Last activity: Dec 18, 2019, 06:47 AM