Offlining a ZFS pool speedily and safely as a monolithic whole?

4 votes
1 answer
4056 views
Much as the question says. Suppose I want the equivalent of a scripted "emergency button" for my FreeNAS pool: something I can click in a GUI or run from console/SSH, which very quickly closes everything that might be reading or writing to the pool, unmounts the file system, and - ideally - quiesces the disks or partitions it uses. I don't care about errors this causes for other software or remote connections, or about aborting long file transfers prematurely; I just want the pool offlined in the fastest way consistent with keeping it consistent, perhaps allowing a few seconds for pending writes to complete so the pool is in a consistent state for data purposes.

The options suggested by the ZFS commands don't look promising:

- `zpool offline` only works on individual devices, so there could be a race condition if writes happen while disks are removed one at a time.
- `zpool export` requires the `-f` option if the pool is in use, and carries a warning that `-f` can lose data.
- One could check every open file descriptor on the pool or its devices (thousands or hundreds of thousands of them?) and manually force-close each, but that could also hit race conditions, since it doesn't stop new fd's being created at the same time.
- I also shouldn't assume all ZFS activity goes through a known list of remote file-serving daemons that could be sent exit signals, because some file activity is likely to be local (cron/CLI/detached sessions).

So, looking at how best to offline an entire pool safely and quickly, `umount` looks like my best bet: it works at the file-system level and can offline an entire file system speedily as a monolithic unit, after which `zpool export` looks like it would then be able to actually finish and quiesce any internal activity safely without the `-f` option, keeping the data itself in a guaranteed consistent state.
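For concreteness, the "clean export first, force-unmount only as a fallback" idea could be sketched roughly like this. This is just an illustration, not a tested procedure: the function name is made up, the pool name is whatever yours is called, and it deliberately does not handle iSCSI zvol targets.

```shell
#!/bin/sh
# offline_pool: sketch of "try a clean export, fall back to force-unmount".
# Run as root. Hypothetical helper; adapt before relying on it.
offline_pool() {
    pool="$1"
    # A clean export unmounts the pool's datasets itself and quiesces the
    # pool; if nothing is busy, this is all that is needed.
    if zpool export "$pool"; then
        return 0
    fi
    # Pool busy: force-unmount every dataset, children before parents
    # (reverse lexicographic sort puts child datasets ahead of parents),
    # then retry the export WITHOUT -f to preserve on-disk consistency.
    zfs list -H -o name -r "$pool" | sort -r | while read -r ds; do
        zfs unmount -f "$ds"
    done
    zpool export "$pool"
}
```

The point of the structure is that `-f` is only ever applied at the file-system (unmount) layer, never at the `zpool export` layer.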
If there's raw disk activity going on (resilver or scrub), I guess that would resume or restart when the pool is later brought back online.

But even `umount` doesn't seem to do the whole job, because there could be iSCSI zvol targets in use as well. The data within those inherently can't be kept consistent, as the server doesn't know its structure, so the remote initiators will have to repair their data as best they can when they reconnect. I'm fine with that, but I'm not sure whether some kind of command to force-terminate or offline the targets is needed or best practice. (Note: force-terminating _connections_ has the same issues as closing individual fd's would.)

I'm aware that some kind of data loss or issue is bound to occur if the pool is abruptly kicked out of a read-write state while writes are happening. But as long as it doesn't lose consistency (at the ZFS pool and file-system level), that's fine - any in-use files or iSCSI targets being updated will have to take their chances on ending up in a ZFS-consistent but data-invalid state due to going offline partway through a write. That's unavoidable and not an issue for the question.

So what steps do I actually need, to offline an in-use pool as fast as possible consistent with guaranteed pool safety and consistency - and would manually unmounting an in-use ZFS file system (as part of a solution) be safe, or does it carry any risk of data damage?

**Update:** Mentioning this here in case someone else finds it useful. The accepted answer states that `export -f` may have issues with zvols (iSCSI etc.). Based on this hint, I found that the iSCSI handler used by FreeNAS can forcibly log out/terminate sessions, and has other useful subcommands which could be issued beforehand - see `man ctladm`. Whatever your zvols are used for, there's likely to be some command to end sessions on them.
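To make the `ctladm` part concrete: FreeBSD's `ctladm(8)` (the CTL frontend FreeNAS uses for iSCSI) has subcommands for listing and terminating sessions. A minimal sketch of tearing down all iSCSI sessions before offlining the pool - the function name is made up, and you should check your version's man page for exact subcommand behaviour:

```shell
#!/bin/sh
# teardown_iscsi: sketch of dropping all CTL iSCSI sessions (run as root).
# Hypothetical helper wrapping ctladm(8) subcommands.
teardown_iscsi() {
    # Show current sessions first (initiator, target, connection state).
    ctladm islist
    # Forcibly terminate every session (-a = all). Initiators will see
    # the connection drop and must recover/reconnect on their own.
    ctladm isterminate -a
}
```

Doing this before the unmount/export step means the zvols have no active consumers when the pool goes away, so the question of force-closing individual connections never arises.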
Asked by Stilez (1311 rep)
Jan 25, 2018, 10:56 AM
Last activity: Jan 3, 2025, 12:32 PM