I’m a FreeBSD guy who has had a long, serious, and very much monogamous relationship with ZFS. I experimented with Solaris 9 to learn about ZFS, adopted OpenSolaris (2008?) back in the “aughts” for my first ZFS server, transitioned my installations over to OpenIndiana after Oracle bought out Sun Microsystems, and then at some point switched to FreeBSD, which I found to be a better-designed OS once I had moved everything headless and was ready to completely bid the need for a desktop environment goodbye. But every once in a while I have to stand up a ZFS installation on Ubuntu, and then I spend a little too much time trying to remember how to do ZFS things that FreeBSD makes easy out-of-the-box. After doing that one time too many, I decided to put down my Linux-specific notes in a post for others (and myself) to reference in the future.
A fully functional ZFS setup following ZFS best practices and Linux/Ubuntu idiomatic approaches
This guide will focus mainly on the Linux sysadmin side of things; note that a basic understanding of ZFS concepts and principles is assumed, but I’ll do my best to provide a succinct summary of what we’re doing and why we’re doing it at each point.
Step 1: Installing ZFS
Unlike on FreeBSD, on Linux you need to manually install the ZFS kernel modules and userland tooling to bring in ZFS filesystem support and install the venerable `zfs` and `zpool` utils used to manage a ZFS installation. Canonical’s Ubuntu was, to my knowledge, the first to offer a pre-packaged ZFS option for Linux users (after gambling that Oracle wouldn’t sue them for violating the CDDL license if they included ZFS support in their repos), and I believe it’s still the most popular Linux distribution for ZFS users, so the specific command-line incantations below are for Ubuntu:
sudo apt install zfs-dkms
This will download, build, and install the ZFS kernel modules to match the version of the Linux kernel you’re currently running. Unlike most kernel modules,1 ZFS support isn’t built or distributed as part of the base kernel that Canonical maintains for its distributions; instead, you have to manually build and load the kernel module that provides ZFS support (but this is automated by the `.deb` installed with `apt`) – and this needs to be done each time you upgrade the kernel.2 All this really means is that installing `zfs-dkms` will take longer than installing most packages – expect the installation process to look like it’s stuck and be extra patient. Installing `zfs-dkms` will also pull in an automatic dependency on the userspace tools, `zfsutils-linux`, as well as other ZFS-related libraries and dependencies.
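If you want a quick sanity check that the module actually built and loaded before moving on, something like the following should do it (the version strings it reports will of course differ on your system):
# load the ZFS kernel module if it isn't loaded already
sudo modprobe zfs
# print the userland and kernel module versions to confirm they match
zfs version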
Step 2: Setting up your zpool
This part of the process is largely going to be the same regardless of which operating system you are using and is standard ZFS fare. You’ll need to identify the drives you wish to use in your zpool (the ZFS abstraction over the physical disks, arranged in the hierarchy/topology you desire) and use `sudo zpool create` to create your first zpool (traditionally named `tank`). The only thing of note here is that you should use a stable path to identify your disks, so instead of doing something like `sudo zpool create tank mirror /dev/sda /dev/sdb` to create a two-disk mirror zpool comprised of the two disks `/dev/sda` and `/dev/sdb`, you should instead use a different path to the same devices, such as via `/dev/disk/by-id/` or `/dev/disk/by-uuid/` (going with `by-id/` might make it easier to figure out which disk is which, as the contents of `by-uuid/` are all GUIDs).3 On Linux, `lsblk` is your friend here to list the disks attached to the system.
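For example, something like this will show each attached disk with enough identifying information to match it up against the stable paths (the columns chosen here are just a reasonable starting set):
# list attached disks with their size, model, and serial number
lsblk -o NAME,SIZE,MODEL,SERIAL
# see which stable by-id names point at which /dev/sdX devices
ls -l /dev/disk/by-id/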
# to create a mirror of the two volumes:
sudo zpool create -o ashift=12 tank mirror /dev/disk/by-id/scsi-0VOLUME_NAME_01 /dev/disk/by-id/scsi-0VOLUME_NAME_02
And you can verify that the operation has succeeded by using `zpool list` to see the list of zpools live on the system.
Most ZFS properties and features are configurable, can be set at any time,4 and are inherited from parent datasets. Let’s set some default properties that are good starting values (we can always change them or override them for specific child datasets at any time):
sudo zfs set compression=lz4 tank # or =zstd if you're on the very newest versions
sudo zfs set recordsize=1M tank # i/o optimization for storing content that rarely changes
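To confirm the values took (and to see whether each one is set locally or inherited), `zfs get` is the tool to reach for:
# show the properties we just set on the root dataset
zfs get compression,recordsize tank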
Step 3: Creating your ZFS datasets
The “zpool” is, as mentioned, the abstraction over the physical disks in your PC. Its closest analogue is a “smart disk” comprised of multiple physical disks arranged in some specific topology with certain striping/redundancy/parity managed by the lower-level `zpool` command. Just as a zpool is a virtual disk, a dataset is a “smart partition” used to break up your “disk” into multiple logical storage units. Unlike real partitions, ZFS datasets aren’t fixed in size; rather, they straddle the line between a partition and a folder. They can be nested (like a folder), but you can’t rename a file across datasets (like a separate filesystem/partition). You can snapshot them individually (or all together, atomically) for backup and cloning purposes (see below), and the dataset is the finest-grained level of control you have for turning on/off or re-configuring ZFS features and properties like the record size (changes only affect newly written data), automatic block-level compression, etc.
ZFS automatically creates a dataset for the root of the zpool (in this case we now have `/tank/` mounted and ready), but it’s generally not good practice to write directly to this dataset. Instead, you should create one or more child datasets where most content will go. We’ll just create one dataset for now:
sudo zfs create tank/data
and we can see all our datasets with `zfs list`, which shows them in their hierarchy/tree as configured.
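As a quick illustration of the per-dataset overrides mentioned above (these dataset names and values are purely hypothetical examples, not something the rest of this guide relies on), you could create children tuned differently from the pool-wide defaults:
# hypothetical: a dataset tuned for small random writes, overriding
# the 1M recordsize inherited from tank
sudo zfs create -o recordsize=16K tank/databases
# hypothetical: a dataset that opts out of compression entirely
sudo zfs create -o compression=off tank/scratch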
Step 4: Configuring automatic snapshots
One of the coolest and most important ZFS features is undoubtedly its instant, zero-cost snapshotting (enabled by its copy-on-write design). This lets you freeze an image of any dataset (or all of them) as it exists at any point in time, then restore back to it (or selectively copy files/data back, as needed) at any point in the future, regardless of any changes you’ve made. (You only pay the storage cost of data added or deleted thereafter.) ZFS snapshots can be made manually with `sudo zfs snap -r tank@snapshot_name_or_date` (which snapshots `tank` and all its child datasets, instantly) or `sudo zfs snap tank/data@snapshot_name` (which snapshots only the one `tank/data` dataset). But since they’re virtually free, why not go a step further and automatically take snapshots of the data on a schedule? That way you’re protected in case of inadvertent data loss, not just when you take a snapshot before manually performing a known, potentially destructive action.
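For completeness, getting data back out of a snapshot is just as painless. The snapshot and file names below are placeholders, and note that `zfs rollback` only targets the most recent snapshot unless you pass `-r` (which destroys any newer snapshots):
# roll the dataset back to a snapshot, discarding everything written since
sudo zfs rollback tank/data@before_upgrade
# or copy an individual file back out of the read-only .zfs/snapshot
# directory exposed under the dataset's mountpoint
cp /tank/data/.zfs/snapshot/before_upgrade/important_file.txt /tank/data/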
On Linux, the best way to automate these snapshots is with `zfs-auto-snapshot`, which we’ll install with `sudo apt install zfs-auto-snapshot`. It’ll automatically create new snapshots of the designated ZFS datasets every month/week/day/hour, and delete the oldest ones too, so you’re not paying the storage price forever.
After installing `zfs-auto-snapshot`, it’s time to choose which datasets we want to protect and how often we want to take the snapshots. Instead of using a configuration file, `zfs-auto-snapshot` uses ZFS properties to determine which datasets to include in each snapshot interval, and since ZFS properties are inherited by default, setting up snapshots on the root dataset automatically does the same for all child datasets.
Let’s enable a daily snapshot of the root volume (and all child datasets):
sudo zfs set com.sun:auto-snapshot:daily=true tank
You can repeat this, replacing `daily` with `monthly`, `weekly`, or `hourly`, to (additionally) opt into that frequency of snapshots. To exclude a dataset (and its children) from a particular schedule, you can e.g. use `sudo zfs set com.sun:auto-snapshot:daily=false tank/no_backups` to turn off daily snapshots for the `tank/no_backups` dataset (assuming it exists).
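To double-check which datasets ended up opted in or out, you can query the property across the whole tree and see whether each value is set locally or inherited:
# show the daily auto-snapshot property for tank and all of its children
zfs get -r com.sun:auto-snapshot:daily tank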
You can check if this is working (after waiting the prescribed amount of time) by checking to see what snapshots you have listed:
zfs list -t snap
Step 5: Automatic monthly ZFS scrubs on Linux with systemd
One thing that makes ZFS stand out compared to other filesystems like ext4 or even XFS is that it calculates and stores a checksum for each block of data you store on it. In the event of bitrot (the silent corruption of data already stored to disk), ZFS can a) flag that a file has been silently corrupted, and b) automatically restore a good copy from another disk or from parity (assuming your zpool topology provides redundancy).5 It does this automatically every time you read a file, but what if you have terabytes of data just sitting there, silently rotting away? How do you catch that corruption in time to fix it from a second copy on the zpool? The `zpool scrub tank` operation runs a low-priority scan in the background to detect and (hopefully) repair just that, but it needs to be scheduled (or run manually).
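Running one by hand (and checking on its progress) looks like this:
# start a scrub of the tank pool; it runs in the background at low priority
sudo zpool scrub tank
# check how far along it is and whether any errors were found or repaired
zpool status tank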
On FreeBSD, this would be accomplished with the help of a simple monthly periodic script, but on Linux (Ubuntu in particular) it’s not as simple. The idiomatic way of scheduling monthly work on Ubuntu is via systemd units (aka services) and timers. Unfortunately, this requires setting up two separate files, but the good news is that you can just copy and paste what I’ve provided below, only modifying the zpool name from `tank` to whatever you are using, as needed.
The first file we need to create is the actual systemd service, which is what is tasked with running the `zpool scrub` operation. Copy the following to `/etc/systemd/system/zfs-scrub.service`:
[Unit]
Description=ZFS scrub of the tank pool
[Service]
Type=oneshot
ExecStart=/usr/sbin/zpool scrub tank
And copy this timer file (which specifies when `zfs-scrub.service` is automatically run) to `/etc/systemd/system/zfs-scrub.timer`:
[Unit]
Description=Run ZFS scrub service monthly
[Timer]
OnCalendar=monthly
# Run if missed while machine was off
Persistent=true
# Add some randomization to start time to prevent thundering herd
RandomizedDelaySec=30m
AccuracySec=1h
[Install]
WantedBy=timers.target
Then execute the following to get systemd to see and activate the monthly scrub timer:
sudo systemctl daemon-reload
sudo systemctl enable --now zfs-scrub.timer
and verify that the timer has been started with the following:
systemctl list-timers zfs-scrub.timer
which should show you output along the lines of the following:
$ systemctl list-timers zfs-scrub.timer
NEXT LEFT LAST PASSED UNIT ACTIVATES
Wed 2025-10-01 00:23:31 UTC 2 weeks 6 days - - zfs-scrub.timer zfs-scrub.service
1 timers listed.
You can see if this is working by checking when the last scrub took place with `zpool status`:
$ zpool status
pool: tank
state: ONLINE
scan: scrub repaired 0B in 00:00:00 with 0 errors on Wed Sep 10 18:43:22 2025
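If you prefer to check from the systemd side instead, the scrub service's past runs are also visible in the journal:
# show log output from previous runs of the scrub service
journalctl -u zfs-scrub.service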
And with that, you’re all set!
1. Largely due to licensing restriction workarounds ↩
2. This too is *normally* taken care of by the package manager, provided all the packages have been correctly built and uploaded to the repository by the time you try to install a newer kernel version. There’s very little you have to do manually. ↩
3. You could use `/dev/disk/by-path/`, but that means if you physically swap disks around in their cages the references would become switched around, so it’s best not to. ↩
4. The most notable exception to this is the `ashift` property set above with `-o ashift=12`, which is a decent value for any SSD or 4K/512e HDD. ↩
5. Or assuming you are using `zfs set copies=2 tank` (or greater). ↩