What is ZFS? The File System other File Systems Aspire to

The file system to end all file systems is actually much more than that

What is the OpenZFS File System?
iXsystems, Inc.

File systems are what make organized data storage possible, and quite a few have appeared over the years: FAT32, exFAT, NTFS, Ext2/3/4, Btrfs, HFS, APFS, etc. However, there’s one file system that outshines them all: ZFS. But what exactly is ZFS?

What is ZFS?

In my mind’s eye, I see a slightly tipsy Sun Microsystems (the creators) software engineer doing their Inspector Clouseau impression when the idea for this file system to end all file systems was hatched, saying, “We have zee file sees-tem!” If you’re seeking a less fanciful explanation, how about ZFS being 128 bits instead of the usual 64 bits, making it capable of handling “Z”ettabytes of storage (one zettabyte is 1 billion terabytes). Zee where I’m going here?
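
To put the 128-bit figure in perspective, here is a back-of-the-envelope sketch of the arithmetic. It treats the bit width as a plain byte address space, which is a simplification of how ZFS actually accounts for storage, and the constant names are made up for illustration.

```python
# Decimal (SI) units: 1 TB = 10**12 bytes, 1 ZB = 10**21 bytes.
TERABYTE = 10**12
ZETTABYTE = 10**21

# One zettabyte expressed in terabytes.
tb_per_zb = ZETTABYTE // TERABYTE

# Simplifying assumption: treat the bit width as a raw byte address space.
limit_64 = 2**64    # bytes addressable with 64 bits
limit_128 = 2**128  # bytes addressable with 128 bits

print(f"1 ZB = {tb_per_zb:,} TB")
print(f"64-bit limit:  {limit_64 / ZETTABYTE:.4f} ZB")
print(f"128-bit limit: {limit_128 / ZETTABYTE:.3e} ZB")
```

Under that assumption, a 64-bit limit falls short of a single zettabyte, while a 128-bit limit is far beyond anything you could physically build.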

But while its capacity is huge, ZFS is equally about independent operation and data integrity. It acts as its own volume manager and RAID controller and offers file server functionality via Samba (SMB), NFS, iSCSI, etc. Features such as copy-on-write and Merkle trees of checksums protect against just about every conceivable type of data hazard. If your data is going to be used to navigate a rocket to Mars, ZFS is what you want.

Major Features of ZFS

The ZFS volume manager handles pools (zpools) of any type of storage device, normally SSDs and hard drives. It can tier them (fastest to slowest), and allows multiple mount points or, in ZFS parlance, data sets. Data sets are functionally similar to user shared folders in that they have their own settings for compression, security, access, and more.

Basic ZFS operational structure
Delphix

Here’s an example. You have 50TB of storage and you create data sets A, B, and C. These all share that same 50TB, so if you write 12TB of data to set A, sets A, B, and C will all show the same 38TB remaining. Data set A can be compressed, encrypted, and available to only a few people, while data sets B and C are uncompressed and available to everyone. Again, this all occurs within the same zpool.
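
The 50TB example can be modeled in a few lines. This is only a toy sketch: the Pool class and its method names are invented for illustration, and it ignores quotas, reservations, compression, and metadata overhead.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Toy stand-in for a zpool: one capacity shared by every data set."""
    capacity_tb: float
    used_tb: dict = field(default_factory=dict)  # data set name -> TB written

    def create_dataset(self, name: str) -> None:
        self.used_tb[name] = 0.0

    def write(self, name: str, tb: float) -> None:
        if tb > self.available_tb():
            raise ValueError("pool is full")
        self.used_tb[name] += tb

    def available_tb(self) -> float:
        # Free space belongs to the pool, so every data set sees the same number.
        return self.capacity_tb - sum(self.used_tb.values())


pool = Pool(capacity_tb=50)
for name in ("A", "B", "C"):
    pool.create_dataset(name)

pool.write("A", 12)

for name, used in pool.used_tb.items():
    print(f"{name}: used {used} TB, available {pool.available_tb()} TB")
```

Every data set reports 38TB available after the write, because none of them owns a fixed slice of the pool.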

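Fletcher checksums or SHA-2 (SHA-256) hashes are used to assure data integrity and are stored in a Merkle tree: each block’s checksum is kept in its parent block rather than alongside the data, so integrity is verified at every level. If a read turns up an error and a redundant copy exists, the good data is returned to the application and the bad copy is repaired where it was stored as well.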
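
Here is a minimal sketch of both ideas: verify-and-repair on read, and a hash tree whose root covers every block beneath it. It assumes a simple two-way mirror and uses SHA-256 throughout. MirroredBlock and merkle_root are hypothetical names, and the tree is a generic binary hash tree rather than ZFS’s actual on-disk layout.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 stands in for whichever checksum algorithm is configured.
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Toy two-way mirror: two copies of the data, checksum held by the parent."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        # In this model the checksum lives with the parent pointer, not the data.
        self.expected = checksum(data)

    def read(self) -> bytes:
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                # Self-heal: overwrite any copy that no longer matches.
                for i, other in enumerate(self.copies):
                    if checksum(bytes(other)) != self.expected:
                        self.copies[i] = bytearray(good)
                return good
        raise IOError("all copies failed checksum verification")


def merkle_root(blocks) -> str:
    """Combine per-block checksums pairwise until a single root hash remains."""
    level = [checksum(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [checksum((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


block = MirroredBlock(b"mission telemetry")
block.copies[0][0] ^= 0xFF  # simulate silent corruption on one disk
print(block.read())         # served from the good copy
print(bytes(block.copies[0]) == b"mission telemetry")  # bad copy was repaired

# Changing any one block changes the root hash.
print(merkle_root([b"a", b"b", b"c"]) != merkle_root([b"a", b"B", b"c"]))
```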

Copy-on-write is an important tool in ZFS’s pantheon of data protection features. Rather than overwriting data in place, it writes new data to an empty location and leaves the old data intact; the old blocks are only freed and reused at some future date. This means there’s a copy of the data still intact should a write fail for any reason: drive issues, power failure, DMA errors, etc. Not until a write has fully completed is the newly written data marked as live.
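
The ordering is the whole trick, and a toy model makes it easier to see. This sketch assumes a single pointer per file and a simulated mid-write failure. CowStore and its fail_midway flag are invented for illustration and are not how ZFS is implemented.

```python
class CowStore:
    """Toy copy-on-write store: existing blocks are never modified in place."""

    def __init__(self):
        self.blocks = {}    # block address -> bytes
        self.live = {}      # file name -> address of its current block
        self.next_addr = 0

    def write(self, name: str, data: bytes, fail_midway: bool = False) -> None:
        addr = self.next_addr  # always a fresh, unused location
        self.next_addr += 1
        if fail_midway:
            # Simulated crash: only part of the new block reaches the disk.
            self.blocks[addr] = data[: len(data) // 2]
            raise IOError("power lost during write")
        self.blocks[addr] = data
        # Only after the data is fully written does the pointer flip.
        self.live[name] = addr

    def read(self, name: str) -> bytes:
        return self.blocks[self.live[name]]


store = CowStore()
store.write("report.txt", b"version 1")

try:
    store.write("report.txt", b"version 2", fail_midway=True)
except IOError as err:
    print("write failed:", err)

# The pointer never moved, so the old data is still what readers see.
print(store.read("report.txt"))
```

Because the pointer flips last, a reader sees either the complete old version or the complete new one, never a half-written block.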

Copy on Write in ZFS
Delphix

Snapshots are a side benefit of copy-on-write. As old data isn’t overwritten, it can be marked for retention via a point-in-time description of the blocks in use, aka a snapshot. Snapshots can be copied offsite with all the data to function as true backups.
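
Here is a rough sketch of why snapshots come almost for free on a copy-on-write system: taking one copies only the map of blocks in use, and blocks are reclaimed only when nothing references them any longer. SnapshotStore and its methods are hypothetical names for illustration.

```python
class SnapshotStore:
    """Toy copy-on-write store with point-in-time snapshots."""

    def __init__(self):
        self.blocks = {}     # block address -> bytes
        self.live = {}       # file name -> address of its current block
        self.snapshots = {}  # snapshot name -> frozen copy of the live map
        self.next_addr = 0

    def write(self, name, data):
        # New data always lands in a fresh block; old blocks are left alone.
        self.blocks[self.next_addr] = data
        self.live[name] = self.next_addr
        self.next_addr += 1

    def snapshot(self, snap_name):
        # Only the block map is copied, not the data itself.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        mapping = self.snapshots[snap_name] if snap_name else self.live
        return self.blocks[mapping[name]]

    def reclaim(self):
        """Free blocks that neither the live map nor any snapshot references."""
        in_use = set(self.live.values())
        for mapping in self.snapshots.values():
            in_use.update(mapping.values())
        freed = [addr for addr in self.blocks if addr not in in_use]
        for addr in freed:
            del self.blocks[addr]
        return freed


store = SnapshotStore()
store.write("notes.txt", b"v1")
store.snapshot("monday")
store.write("notes.txt", b"v2")

print(store.read("notes.txt"))            # current version
print(store.read("notes.txt", "monday"))  # version held by the snapshot
print(store.reclaim())                    # nothing to free while the snapshot exists

del store.snapshots["monday"]
print(store.reclaim())                    # the old block can now be reused
```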

Hot spares (extra drives on standby) are supported. These are swapped in for a failed or failing drive as a temporary replacement until the actual replacement can be resilvered, that is, rebuilt or filled with the data the failed/failing drive contained.

De-duplication. This term might give you the idea that duplicates are being removed, but it actually means they aren’t being written in the first place. This is another advantage of creating and saving hashes for every block of data. If the system finds a familiar hash, it treats the incoming block as a duplicate and simply adds a reference to the existing copy; with the optional verify setting, it first compares the data byte for byte and throws away the incoming data only if it’s truly the same.
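
As a simplified sketch, de-duplication boils down to a table keyed by block hash. The DedupStore class below is hypothetical, uses SHA-256, and keeps everything in memory; a real implementation has to manage a far larger table and handle hash collisions rather than merely reporting them.

```python
import hashlib


class DedupStore:
    """Toy block store that skips writing blocks it has already seen."""

    def __init__(self, verify: bool = True):
        self.verify = verify
        self.table = {}     # block hash -> stored block
        self.refcount = {}  # block hash -> number of references
        self.writes = 0     # blocks physically written

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        existing = self.table.get(key)
        if existing is not None:
            # Familiar hash: optionally confirm the bytes really match.
            if self.verify and existing != data:
                # This sketch only reports a collision; it does not handle it.
                raise ValueError("hash collision: blocks differ")
            self.refcount[key] += 1  # add a reference, write nothing
            return key
        self.table[key] = data
        self.refcount[key] = 1
        self.writes += 1
        return key


store = DedupStore()
for block in (b"holiday.jpg", b"budget.xls", b"holiday.jpg"):
    store.write(block)

print("blocks submitted: 3, blocks written:", store.writes)
```

Three blocks go in but only two are physically written; the third just bumps a reference count.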

Purposeful duplication. You can specify that multiple copies of the data be written, in place of or along with RAID. Where possible, these copies are distributed across multiple physical devices or virtual devices (vdevs) such as a RAIDZ array.

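RAIDZ is more or less ZFS’s take on RAID 5, and comes in single-, double-, and triple-parity variants (RAIDZ1, RAIDZ2, and RAIDZ3) that can survive one, two, or three drive failures, respectively. Mirroring is also supported.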
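
The single-parity case is the easiest to illustrate: parity is the XOR of the data blocks, so any one missing block can be recomputed from the survivors. The sketch below assumes equal-sized blocks on three data drives plus one parity drive. It is not the actual RAIDZ layout, and the double- and triple-parity variants need additional math that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data drives (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Drive 1 dies: rebuild its block from the surviving data plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

This is also, loosely, what resilvering does at scale: the contents of the lost drive are recomputed from what remains and written to the replacement.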

ZFS also lets you replace smaller drives with larger drives and take advantage of the extra capacity. As long as you have one free spot for the new drive (bay, SATA port, etc.), the contents of the smaller drive are copied over, and then you remove the smaller drive. Note that in a mirror or RAIDZ array, the extra capacity becomes usable only once every drive in that array has been swapped for a larger one. This is possible because of the dynamic stripe sizing used by the system.
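
A rough rule of thumb shows why the upgrade pays off only at the end. This sketch assumes a RAIDZ array in which every drive counts as the size of the smallest member and parity costs the equivalent of one or more whole drives. It ignores metadata and padding overhead, and the function name is made up.

```python
def raidz_usable_tb(drive_sizes_tb, parity=1):
    """Approximate usable capacity: smallest drive size times the data drives."""
    return min(drive_sizes_tb) * (len(drive_sizes_tb) - parity)


drives = [4, 4, 4, 4]  # four hypothetical 4TB drives in a single-parity array
print(drives, "->", raidz_usable_tb(drives), "TB usable")

# Swap the drives for 8TB models one at a time.
for i in range(len(drives)):
    drives[i] = 8
    print(drives, "->", raidz_usable_tb(drives), "TB usable")
```

Usable capacity stays at 12TB through the first three swaps and jumps to 24TB only when the last 4TB drive is gone.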

ZFS can be administered using only a few simple, easily understood commands, chiefly zpool for pools and zfs for data sets. There are, of course, numerous options.
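
To give a feel for it, the short script below prints a handful of typical commands without running them, since the real thing needs root privileges and actual disks. The pool name, data set name, snapshot name, and device paths are all hypothetical, and exact options vary by platform and OpenZFS version, so treat this as a sketch and check the manual pages before using anything like it.

```python
import shlex

POOL = "tank"                     # hypothetical pool name
DISKS = ["/dev/sdb", "/dev/sdc"]  # hypothetical device paths
DATASET = f"{POOL}/projects"      # hypothetical data set name

commands = [
    ["zpool", "create", POOL, "mirror", *DISKS],   # build a mirrored pool
    ["zfs", "create", DATASET],                    # add a data set
    ["zfs", "set", "compression=lz4", DATASET],    # per-data-set setting
    ["zfs", "snapshot", f"{DATASET}@before-edit"], # point-in-time snapshot
    ["zpool", "status", POOL],                     # health report
]

for argv in commands:
    # Dry run: print the command line instead of executing it.
    print(shlex.join(argv))
```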

It would take a dozen articles to cover ZFS thoroughly, but those are some of the major highlights. To reiterate:

The Upsides of ZFS

Here are a few of the many upsides of ZFS.

  • 128 bits, scales to zettabytes
  • Functions as an advanced volume manager and file server
  • Tiered caching
  • Extraordinary data integrity features
  • Hot spares
  • Expandable pools

The Downsides of ZFS

As with everything, there are a few downsides as well.

  • Processor and memory intensive
  • Overkill for most end users
  • Write performance not ideal for sustained high-bandwidth tasks

When to Use ZFS

ZFS, as OpenZFS, is available for everything from FreeBSD to several flavors of Linux, to macOS and Windows. Its most salient deployment in the consumer market is in FreeNAS. ZFS can be used with one or more drives, but its capabilities are best suited for multi-drive scenarios and large data sets.

The relatively slow write performance (depending on setup) might make it less suitable for bandwidth-intensive applications such as high-resolution video recording or editing. If you need a place to store that video safely after it’s edited, then you want ZFS.