Extremely poor performance for ZFS 4k randwrite on NVMe compared to XFS?
3 votes · 0 answers · 1192 views
I've been a fan of ZFS for a long time and I use it on my home NAS, but while testing its viability for production workloads I've found its performance inconceivably bad compared with XFS on the same disks. I'm testing on an Intel P4510 8TB disk with fio 3.21, using these settings:
fio \
--name=xfs-fio \
--size=10G \
--group_reporting \
--time_based \
--runtime=300 \
--bs=4k \
--numjobs=64 \
--rw=randwrite \
--ioengine=sync \
--directory=/mnt/fio/
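(The XFS filesystem behind /mnt/fio isn't shown in the question; an assumed plain-default setup would look roughly like this. The device name is taken from the disk stats in the output below, and the stock mkfs options are a guess.)
# Assumed XFS setup -- not part of the original test description.
# /dev/nvme7n1 matches the disk stats below; mkfs.xfs defaults are a guess.
mkfs.xfs /dev/nvme7n1
mkdir -p /mnt/fio
mount /dev/nvme7n1 /mnt/fio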
Results look like this:
xfs-fio: (groupid=0, jobs=64): err= 0: pid=63: Mon Feb 1 21:46:44 2021
write: IOPS=189k, BW=738MiB/s (774MB/s)(216GiB/300056msec); 0 zone resets
clat (usec): min=2, max=2430.4k, avg=336.28, stdev=4745.39
lat (usec): min=2, max=2430.4k, avg=336.38, stdev=4745.40
clat percentiles (usec):
| 1.00th=[ 7], 5.00th=[ 10], 10.00th=[ 10], 20.00th=[ 11],
| 30.00th=[ 12], 40.00th=[ 14], 50.00th=[ 23], 60.00th=[ 35],
| 70.00th=[ 36], 80.00th=[ 37], 90.00th=[ 39], 95.00th=[ 40],
| 99.00th=[ 44], 99.50th=[ 8455], 99.90th=[ 66323], 99.95th=[ 70779],
| 99.99th=
bw ( KiB/s): min=95565, max=7139939, per=100.00%, avg=757400.32, stdev=21559.21, samples=38262
iops : min=23890, max=1784976, avg=189327.65, stdev=5389.87, samples=38262
lat (usec) : 4=0.03%, 10=13.41%, 20=36.22%, 50=49.56%, 100=0.12%
lat (usec) : 250=0.13%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.46%, 250=0.02%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2000=0.01%, >=2000=0.01%
cpu : usr=0.27%, sys=7.34%, ctx=793590, majf=0, minf=116620
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,56715776,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: bw=738MiB/s (774MB/s), 738MiB/s-738MiB/s (774MB/s-774MB/s), io=216GiB (232GB), run=300056-300056msec

Disk stats (read/write):
nvme7n1: ios=25/21951553, merge=0/173138, ticks=4/660308, in_queue=265520, util=21.39%
On ZFS, with this zpool create:
# zpool create -o ashift=13 -o autoreplace=on nvme6 /dev/nvme6n1
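(ashift=13 corresponds to 8 KiB sectors; as a quick sanity check of what the pool actually ended up with, zdb can report it. A minimal sketch, assuming zdb from the ZFS utilities is installed:)
# Confirm the ashift the pool was created with (12 = 4 KiB, 13 = 8 KiB sectors)
zdb -C nvme6 | grep ashift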
And this dataset create:
zfs create \
-o mountpoint=/mnt/nvme6 \
-o atime=off \
-o compression=lz4 \
-o dnodesize=auto \
-o primarycache=metadata \
-o recordsize=128k \
-o xattr=sa \
-o acltype=posixacl \
nvme6/test0
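For the ZFS run, the fio command isn't repeated in the question; presumably it was the same job pointed at the ZFS mountpoint, something like the following (the --name=zfs-fio is taken from the output below, everything else mirrors the XFS command above):
# Presumably the same fio job as the XFS run, retargeted at the ZFS dataset
fio --name=zfs-fio --size=10G --group_reporting --time_based --runtime=300 \
    --bs=4k --numjobs=64 --rw=randwrite --ioengine=sync --directory=/mnt/nvme6/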
The results look like this:
zfs-fio: (groupid=0, jobs=64): err= 0: pid=64: Mon Feb 1 23:00:41 2021
write: IOPS=28.3k, BW=110MiB/s (116MB/s)(32.3GiB/300004msec); 0 zone resets
clat (usec): min=7, max=314789, avg=2258.78, stdev=2509.17
lat (usec): min=7, max=314790, avg=2259.28, stdev=2509.22
clat percentiles (usec):
| 1.00th=[ 52], 5.00th=[ 70], 10.00th=[ 81], 20.00th=[ 106],
| 30.00th=[ 225], 40.00th=[ 1057], 50.00th=[ 1713], 60.00th=[ 2606],
| 70.00th=[ 3458], 80.00th=[ 4146], 90.00th=[ 4948], 95.00th=[ 5669],
| 99.00th=[ 8455], 99.50th=, 99.90th=, 99.95th=,
| 99.99th=
bw ( KiB/s): min=51047, max=455592, per=100.00%, avg=113196.01, stdev=702.99, samples=38272
iops : min=12761, max=113897, avg=28297.59, stdev=175.73, samples=38272
lat (usec) : 10=0.01%, 20=0.01%, 50=0.80%, 100=16.73%, 250=12.93%
lat (usec) : 500=2.45%, 750=2.97%, 1000=3.37%
lat (msec) : 2=14.91%, 4=23.92%, 10=21.20%, 20=0.50%, 50=0.19%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=0.31%, sys=7.39%, ctx=11163058, majf=0, minf=32449
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,8476060,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: bw=110MiB/s (116MB/s), 110MiB/s-110MiB/s (116MB/s-116MB/s), io=32.3GiB (34.7GB), run=300004-300004msec
XFS did 189k IOPS and ZFS did 28.3k IOPS - an 85% decrease - with an equivalent decrease in throughput. The CPUs are dual Xeon 6132s, and this machine's kernel is 4.15.0-62-generic, though I've seen the same effect on 5.x kernels as well.
Asked by Evan (171 rep) on Feb 3, 2021, 08:54 PM