How to measure the role of btrfs in SSD wear on my PC?
3 votes, 1 answer, 1920 views
I've used btrfs on my LUKS-encrypted partitions on Samsung EVO SSDs. The disks failed faster than expected. How can I assess whether e.g. ext4 would be more reliable on these disks for my usage, or which usage patterns are most likely to contribute to their wear?
## Background
After about 2 years of use as the root and home disk of my desktop PC, my Samsung SSD 870 EVO 500GB started failing, with hundreds of bad blocks and thousands of uncorrectable errors:
$ sudo smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.15-100.fc36.x86_64] (local build)
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 870 EVO 500GB
Firmware Version: SVT01B6Q
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 19378
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 69
177 Wear_Leveling_Count 0x0013 098 098 000 Pre-fail Always - 44
183 Runtime_Bad_Block 0x0013 065 065 010 Pre-fail Always - 200
187 Uncorrectable_Error_Cnt 0x0032 099 099 000 Old_age Always - 2696
235 POR_Recovery_Count 0x0012 099 099 000 Old_age Always - 59
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 83504703737
The disk was rather busy, but well under the [warranty limits](https://semiconductor.samsung.com/consumer-storage/support/warranty/) of 5 years or 300 TB written (TBW).
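For reference, SMART attribute 241 above converts to roughly 43 TB written, assuming the usual 512-byte LBA unit (a minimal sketch, not specific to btrfs):

```sh
# Rough bytes-written estimate from SMART attribute 241 (Total_LBAs_Written),
# assuming the drive counts LBAs in 512-byte units.
LBAS=$(sudo smartctl -A /dev/sda | awk '$1 == 241 {print $NF}')
echo "$LBAS * 512" | bc | numfmt --to=si --suffix=B
```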
Its predecessor was a Samsung 850 EVO 250 GB, and it was in a similar state after 5 years of use. Maybe the new disk was simply worse than the old one, but I started wondering whether there's a common factor.
One thing they share is that I installed Fedora on both, and Fedora has since switched to btrfs by default (at least for LUKS-encrypted installs) instead of ext4 (I believe the previous disk ran ext4 for most of its life). Fedora 38, for example, creates this layout by default:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 465,8G 0 disk
├─sda1 8:1 0 600M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 464,2G 0 part
└─luks- 253:0 0 464,2G 0 crypt /home
/
$ mount | grep luks
/dev/mapper/luks- on / type btrfs (rw,relatime,seclabel,compress=zstd:1,ssd,discard=async,space_cache=v2,subvolid=257,subvol=/root)
/dev/mapper/luks- on /home type btrfs (rw,relatime,seclabel,compress=zstd:1,ssd,discard=async,space_cache=v2,subvolid=256,subvol=/home)
$ mount | grep sda
/dev/sda2 on /boot type ext4 (rw,relatime,seclabel)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
A few years back the layout was similar, but without [compression](https://btrfs.readthedocs.io/en/latest/Compression.html) and some of the other btrfs mount options.
The btrfs documentation, after a discussion of CoW (copy-on-write), contains a number of [warnings about SSDs](https://btrfs.readthedocs.io/en/latest/Hardware.html#solid-state-drives-ssd):
> Writing “too much” distinct data (e.g. encrypted) may render the internal deduplication ineffective and lead to a lot of rewrites and increased wear of the memory cells.
> It’s not possible to reliably determine the expected lifetime of an SSD due to lack of information about how it works or due to lack of reliable stats provided by the device.
> Only users who consume 50 to 100% of the SSD’s actual lifetime writes need to be concerned by the write amplification of btrfs DUP metadata.
So it sounds like I've reached over 50% of the disks' actual lifetime writes despite writing an order of magnitude less than what the warranty promises, and I *should* worry about SSD wear with btrfs.
### Usage patterns
I now notice that you're supposed to run a [btrfs scrub](https://btrfs.readthedocs.io/en/latest/Scrub.html) monthly and also after events like power outages:
> The user is supposed to run it manually or via a periodic system service. The recommended period is a month but could be less.
I never did that; I assumed the most frequently accessed files would get checked (and repaired) on read anyway. The files of mine that got corrupted did tend to be on the older, less accessed side.
(It would be strange if this mattered, though. Another SSD of mine with luks+lvm+ext4 has over 10k power cycles according to SMART, and no issues whatsoever.)
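If scrubbing turns out to matter, running one manually would be easy enough; a sketch (the scrub works on a mounted filesystem and runs in the background):

```sh
# Start a scrub of the btrfs filesystem mounted at /
sudo btrfs scrub start /
# Check progress and the error counters once it's done
sudo btrfs scrub status /
# btrfs-progs may also ship a btrfs-scrub@.timer unit to automate the monthly run
```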
## Possible ideas
It's common for people to advise against running databases or other similarly write-intensive loads on btrfs.
I don't know how true or current that advice is, but I don't run any databases on my PC. On the other hand, the previous disk's most devastating failures were in a region used by Thunderbird for its message storage (a few GB worth of mbox files; I've since switched to Maildir). I wonder whether there are database-like loads on my PC that I could do without, or move to another filesystem. (I've already disabled baloo.)
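One thing I'm considering for such loads is disabling copy-on-write per directory, which should reduce write amplification at the cost of losing checksums and compression for those files. A sketch, using a hypothetical new directory for the Thunderbird profile (the +C flag only affects files created after it is set):

```sh
# Create a fresh directory with CoW disabled; files created in it inherit the flag.
# (~/.thunderbird-nocow is just a placeholder name.)
mkdir ~/.thunderbird-nocow
chattr +C ~/.thunderbird-nocow
lsattr -d ~/.thunderbird-nocow   # should list the 'C' (no-CoW) attribute
```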
Is there some benchmarking tool or utility that can tell which applications are producing the most writes (or most potential wear) on the filesystem?
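The closest I know of is the kernel's per-process I/O accounting, exposed by tools like pidstat or iotop; a sketch of what I could watch, though it only shows logical writes, not the drive-level amplification:

```sh
# Per-process disk reads/writes, refreshed every second (sysstat)
sudo pidstat -d 1
# Or accumulate totals interactively: -a = accumulated I/O, -o = only active processes
sudo iotop -ao
```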
Alternatively, is there a benchmarking tool or utility to stress-test a given filesystem/disk combination and see the impact on the disk's self-reported wear in each case?
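Failing a ready-made tool, I imagine one could combine fio with a before/after reading of SMART attribute 241, to compare how much the drive reports having written for the same logical workload on btrfs vs ext4. A rough sketch (the job parameters and the /mnt/test mount point are just placeholders, and attribute 241 may lag or include unrelated background writes):

```sh
# LBAs written (attribute 241) before the test
BEFORE=$(sudo smartctl -A /dev/sda | awk '$1 == 241 {print $NF}')

# Push a fixed amount of data through the filesystem mounted at /mnt/test
fio --name=wearprobe --directory=/mnt/test --rw=randwrite \
    --bs=4k --size=4G --end_fsync=1

# LBAs written after the test; the delta approximates drive-level writes
AFTER=$(sudo smartctl -A /dev/sda | awk '$1 == 241 {print $NF}')
echo "Drive wrote $(( (AFTER - BEFORE) * 512 / 1024 / 1024 )) MiB for 4 GiB of logical writes"
```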
Asked by Nemo
(938 rep)
Jun 5, 2023, 11:43 AM
Last activity: May 8, 2024, 10:45 AM