
Why is the size of my IO requests being limited to about 512K?

6 votes
1 answer
9277 views
I read /dev/sda using a 1MiB block size. Linux seems to limit the IO requests to an average size of 512KiB. What is happening here? Is there a configuration option for this behaviour?
$ sudo dd iflag=direct if=/dev/sda bs=1M of=/dev/null status=progress
1545601024 bytes (1.5 GB, 1.4 GiB) copied, 10 s, 155 MB/s
1521+0 records in
1520+0 records out
...
While my dd command is running, rareq-sz is 512.

> rareq-sz
> The average size (in kilobytes) of the read requests that were issued to the device.
>
> -- [man iostat](http://man7.org/linux/man-pages/man1/iostat.1.html)
$ iostat -d -x 3
...
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda            309.00    0.00 158149.33      0.00     0.00     0.00   0.00   0.00    5.24    0.00   1.42   511.81     0.00   1.11  34.27
dm-0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-2             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-3             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
...
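As a sanity check on my own reading of the table (this is just arithmetic on the columns above, not anything iostat computes for you): the average read request size should be rkB/s divided by r/s, and that does reproduce the rareq-sz value.

```shell
# Recompute rareq-sz from the sda row above: rkB/s divided by r/s.
rkbs=158149.33   # rkB/s for sda
rs=309.00        # r/s for sda
awk -v rkbs="$rkbs" -v rs="$rs" 'BEGIN { printf "%.2f\n", rkbs / rs }'
# prints 511.81, matching the rareq-sz column
```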
The kernel version is 5.1.15-300.fc30.x86_64. max_sectors_kb is 1280.
$ cd /sys/class/block/sda/queue
$ grep -H . max_sectors_kb max_hw_sectors_kb max_segments max_segment_size optimal_io_size logical_block_size chunk_sectors
max_sectors_kb:1280
max_hw_sectors_kb:32767
max_segments:168
max_segment_size:65536
optimal_io_size:0
logical_block_size:512
chunk_sectors:0
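One consistency check I tried (this rests on my own assumption, not on anything documented above: that each 4KiB page of a non-physically-contiguous user buffer ends up as its own scatter-gather segment, so max_segments rather than max_sectors_kb becomes the binding limit):

```shell
# Assumption: one 4 KiB page per segment (worst case for a user buffer
# whose pages are not physically contiguous).
max_segments=168   # from /sys/class/block/sda/queue/max_segments
page_size=4096     # bytes; typical x86-64 page size
cap_bytes=$(( max_segments * page_size ))
echo "$(( cap_bytes / 1024 )) KiB = $(( cap_bytes / 512 )) sectors"
# prints: 672 KiB = 1344 sectors
```

Under that assumption the per-request cap would be 672KiB, well below max_sectors_kb (1280).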
By default I use the BFQ I/O scheduler. I also tried repeating the test after echo 0 | sudo tee wbt_lat_usec, and again after echo mq-deadline | sudo tee scheduler. The results remained the same. Apart from WBT, I used the default settings for both I/O schedulers. E.g. for mq-deadline, iosched/read_expire is 500, which is equivalent to half a second.

During the last test (mq-deadline, WBT disabled), I ran btrace /dev/sda. It shows all the requests were split into two unequal halves:
  8,0    0     3090     5.516361551 15201  Q   R 6496256 + 2048 [dd]
  8,0    0     3091     5.516370559 15201  X   R 6496256 / 6497600 [dd]
  8,0    0     3092     5.516374414 15201  G   R 6496256 + 1344 [dd]
  8,0    0     3093     5.516376502 15201  I   R 6496256 + 1344 [dd]
  8,0    0     3094     5.516388293 15201  G   R 6497600 + 704 [dd]
  8,0    0     3095     5.516388891 15201  I   R 6497600 + 704 [dd]
  8,0    0     3096     5.516400193   733  D   R 6496256 + 1344 [kworker/0:1H]
  8,0    0     3097     5.516427886   733  D   R 6497600 + 704 [kworker/0:1H]
  8,0    0     3098     5.521033332     0  C   R 6496256 + 1344
  8,0    0     3099     5.523001591     0  C   R 6497600 + 704
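Converting the sector counts from the trace (my own arithmetic; the "+ N" lengths in btrace output are in 512-byte sectors):

```shell
# The 2048-sector (1 MiB) dd read is split into 1344 + 704 sectors.
first=1344
second=704
echo "total:   $(( (first + second) * 512 / 1024 )) KiB"
echo "average: $(( (first + second) * 512 / 1024 / 2 )) KiB"
# prints: total:   1024 KiB
#         average: 512 KiB
```

So the two halves are 672KiB and 352KiB, and their average is exactly the 512KiB that iostat reports as rareq-sz.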
> X -- split
> On [software] raid or device mapper setups, an incoming i/o may straddle a device or internal zone and needs to be chopped up into smaller pieces for service. This may indicate a performance problem due to a bad setup of that raid/dm device, but may also just be part of normal boundary conditions. dm is notably bad at this and will clone lots of i/o.
>
> -- [man blkparse](http://man7.org/linux/man-pages/man1/blkparse.1.html)

## Things to ignore in iostat

Ignore the %util number. It is broken in this version. ([dd is running at full speed, but I only see 20% disk utilization. Why?](https://unix.stackexchange.com/questions/517132/dd-is-running-at-full-speed-but-i-only-see-20-disk-utilization-why/517219#517219))

I *thought* aqu-sz was also affected, [due to being based on %util](https://utcc.utoronto.ca/~cks/space/blog/linux/DiskIOStats). Although in that case I would expect it to be about three times too large here (100/34.27).

Ignore the svctm number. "Warning! Do not trust this field any more. This field will be removed in a future sysstat version."
Asked by sourcejedi (53232 rep)
Jul 11, 2019, 10:51 AM
Last activity: Dec 18, 2019, 06:47 AM