Saturday, August 29, 2009

Expanding a ZFS pool

A question that comes up among ZFS users: is it possible to expand a pool?

The standard answer, per the ZFS manual, is to add more devices or vdevs (raidz, mirrors). While this does give you more space, it has a downside: you need more hard disks, which in turn take up more physical space. Space in commodity hardware is not something we have plenty of, so the better approach is to replace the existing hard drives.
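For comparison, growing a pool the "standard" way by adding another raidz vdev looks roughly like this (the pool and device names here are only placeholders, not part of my setup):

# zpool add tank raidz da7 da8 da9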

The ZFS documentation vaguely mentions that you can replace the drives within a vdev one by one, thereby indirectly expanding it.

Until now I have never needed to expand a vdev, since getting more space has always gone hand in hand with upgrading the entire server. But now that data growth has outpaced the lifetime of the server, I had to solve this issue.

The server I use for storage today has two pools, each with a single raidz vdev. The first goal was to upgrade the biggest pool, which consisted of 6 x 750GB hard drives; the new drives were 1.5TB.

The key point is that the pool sizes a vdev by the smallest device in it, so what you need to do is replace the disks one by one.
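In practice the whole procedure is this one command, repeated for every disk in the vdev (shown here with placeholders; the actual commands used in the test below follow the same pattern):

# zpool replace <pool> <old-device> <new-device>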

To be absolutely sure about the method and the result, I first ran a quick test on a small install: a VMware instance with 6 virtual disks.

da1: 1024MB (2097152 512 byte sectors: 64H 32S/T 1024C)
da2: 1024MB (2097152 512 byte sectors: 64H 32S/T 1024C)
da3: 1024MB (2097152 512 byte sectors: 64H 32S/T 1024C)
da4: 8192MB (16777216 512 byte sectors: 255H 63S/T 1044C)
da5: 8192MB (16777216 512 byte sectors: 255H 63S/T 1044C)
da6: 8192MB (16777216 512 byte sectors: 255H 63S/T 1044C)
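These are the kernel probe messages for the virtual disks; on FreeBSD you can also list the attached disks afterwards with camcontrol, for example:

# camcontrol devlist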

# zpool create test raidz da{1,2,3}
# zpool status
pool: test
state: ONLINE
scrub: none requested
config:

NAME          STATE     READ WRITE CKSUM
test          ONLINE       0     0     0
  raidz1      ONLINE       0     0     0
    da1       ONLINE       0     0     0
    da2       ONLINE       0     0     0
    da3       ONLINE       0     0     0

errors: No known data errors
# zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test  2.98G   141K  2.98G     0%  ONLINE  -

As you can see, the pool test is created as usual with the 3 x 1GB disks, and zpool list reports roughly 3 GB of raw capacity.
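Note that zpool list shows the raw size of the pool, parity included; if you want the usable space after raidz1 parity (roughly 2 GB here), zfs list on the pool reports that instead:

# zfs list test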

Now we replace all the disks with the 8 GB ones.

# zpool replace test da1 da4
# zpool replace test da2 da5
# zpool replace test da3 da6

Note: you must wait for the resilver process to complete for each drive before continuing with the next one.
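If you want to script the whole replacement round, a minimal sketch like the one below can do it by simply polling zpool status between replacements. The pool and device names match this test setup, and the exact "resilver in progress" wording may differ between ZFS versions, so verify it against your own zpool status output first.

#!/bin/sh
# Minimal sketch: replace each old disk with its new counterpart and
# wait for the resilver to finish before starting the next replacement.
# Assumes the status line contains "resilver in progress" while active.
POOL=test
set -- "da1 da4" "da2 da5" "da3 da6"
for pair in "$@"; do
        zpool replace $POOL $pair
        while zpool status $POOL | grep -q "resilver in progress"; do
                sleep 60
        done
done
echo "all disks in $POOL replaced"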

While replacing, zpool status reports this:
# zpool status
pool: test
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Sat Aug 29 19:52:20 2009
config:

NAME          STATE     READ WRITE CKSUM
test          ONLINE       0     0     0
  raidz1      ONLINE       0     0     0
    da4       ONLINE       0     0     0  30.5K resilvered
    da5       ONLINE       0     0     0  50K resilvered
    da3       ONLINE       0     0     0  29.5K resilvered

errors: No known data errors

The resilver process is very quick here, since no data was stored in the pool. This is a major benefit of ZFS: it only resilvers the blocks that actually hold data, whereas in a hardware RAID implementation a replacement always results in a full XOR recalculation across the entire disk.

There is still one drive left to replace, and you can see that the total space is still unaffected by the first two drives having been replaced.

# zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test  2.98G   165K  2.98G     0%  ONLINE  -

After the last drive has been replaced, the capacity has increased:

# zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test  24.0G   168K  24.0G     0%  ONLINE  -

Compared with hardware and dedicated solutions (SAN), the approach is exactly the same. There are actually not many implementations that let you expand a RAID in this manner; most solutions are based on moving a volume to a bigger RAID set or spanning the volume across several RAID sets.

Many have complained that ZFS cannot resize a raidz vdev (for example by adding a single disk to it), but they should keep in mind that very few implementations can do this. Resizing requires heavy recalculation and major logistical work, since the data is already stored in a specific pattern with checksums.

Again, ZFS shows us the future of filesystems. Its flexibility, ease of use and end-to-end integrity leave most other filesystems in the dust.

Note for FreeBSD: the version tested here is ZFS v13. You must reboot the server when all the replacements are done in order for ZFS to detect the increased capacity. If there are better methods, please comment.
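One thing that is reported to work instead of a full reboot, though I have not verified it on v13 myself, is to export and re-import the pool so that ZFS re-reads the device sizes (later ZFS versions also gained an autoexpand pool property for this):

# zpool export test
# zpool import test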
