Nearline backup, using cheap storage and ZFS via FUSE
I wanted an automated backup system for my PC. Normally I copy my stuff to a Blu-ray RW each week, which is annoying because it requires me to do something, something that I can automate.
I recently bought an 880 GB USB2 drive for a really cheap price. I did not want another power brick, hence the USB-powered drive; there's also no need for speed for this project.
One requirement is that the drive stores a lot of my backups for a long time. Given that the data being backed up will only change a little, it got me thinking about data deduplication. Currently ZFS is the only stable free-ish (CDDL) file system that supports data deduplication. The CDDL currently conflicts with the GPL, so there is no in-kernel support for ZFS on Linux; instead ZFS is supported through FUSE.
Normally I avoid FUSE; the whole idea of a file system in user space annoys me, but anyway... For the sole purpose of backups we shall let it be.
Let's get ZFS (you may need to unmask this package):
# emerge -av sys-fs/zfs-fuse
Let's get rid of any partitions, as I will use the whole disk:
# fdisk /dev/sdb
Hit 'd' to delete the partition (repeat for each one), then 'w' to write the changes and exit.
I also wanted the system to leave my USB drive alone, i.e. don't auto-anything! To achieve this I made use of HAL.
First let's find out what HAL calls the USB2 drive I have:
# hal-find-by-property --key block.device --string /dev/sdb
For me this returned :
/org/freedesktop/Hal/devices/volume_part_1_size_888183152128
/org/freedesktop/Hal/devices/storage_serial_Seagate_FreeAgent_Go_2GE94FPL_0_0
I want to match a string based on file system type, so I selected the HAL object:
/org/freedesktop/Hal/devices/volume_part_1_size_888183152128
Let's query the HAL object for a string we can match:
# lshal --show /org/freedesktop/Hal/devices/volume_part_1_size_888183152128
I selected the key "volume.fstype" as I do not want HAL to touch any ZFS file systems. You could pick other keys; it's up to you and your needs.
Now let's make a local HAL policy for this:
# nano /etc/hal/fdi/policy/10-zfs-fuse.fdi
Enter the following into the file:
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="volume.fstype" string="zfs">
      <merge key="volume.ignore" type="bool">true</merge>
    </match>
  </device>
</deviceinfo>
Restart the HAL daemon after saving the file:
# /etc/init.d/hald restart
As we are using a USB drive there's a potential that it may receive random dev nodes, say sdc or sdb. Let's use udev to abstract that away to something we can always use as a device path.
I created the following local udev rule, which creates a symlink from my known, fixed path to the real device node. The ':=' assignment also makes the symlink final, so any later udev rules that match cannot change it.
# nano /etc/udev/rules.d/10-local.rules
I put the following in this file:
KERNEL=="sd*", SUBSYSTEMS=="block", ATTR{size}=="1734732719", SYMLINK:="dsk/zfs/backups"
In short: match any sd device in the block subsystem whose size matches, then symlink it to /dev/dsk/zfs/backups and make that symlink final. I picked size as the attribute to match, but you can use any attribute from the device, or from its parent devices, if you need to be more exact in your match.
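After plugging the drive in, a quick way to confirm the rule fired is to look at the symlink itself. The following is only a sketch: check_link is a hypothetical helper name of mine, not part of udev, and it assumes GNU readlink is available.

```shell
# Report where a udev-created symlink points, or fail if it is missing.
# check_link is a hypothetical helper name, not a udev tool.
check_link() {
    link="$1"
    if [ ! -L "$link" ]; then
        echo "missing: $link" >&2
        return 1
    fi
    # readlink -f resolves the symlink to the real device node (e.g. /dev/sdb)
    echo "$link -> $(readlink -f "$link")"
}
```

Calling it as "check_link /dev/dsk/zfs/backups" should print the real device node the link currently points at, or complain that the link is missing.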
I used the following command to find a set of attributes to match against:
# udevadm info -a -p /sys/block/sdb/
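The size attribute is worth a sanity check. As far as I know sysfs reports the size in 512-byte sectors, so the byte count in the HAL volume name above (888183152128) should divide down to the value used in the udev rule. A small sketch of that arithmetic, where bytes_to_sectors is just a hypothetical helper name:

```shell
# Convert a size in bytes to a count of 512-byte sectors,
# the unit assumed here for /sys/block/<dev>/size.
# bytes_to_sectors is a hypothetical helper name.
bytes_to_sectors() {
    echo $(( $1 / 512 ))
}

# Byte count taken from the HAL volume name earlier in this post.
bytes_to_sectors 888183152128
# prints 1734732719
```

You can compare the result with "cat /sys/block/sdb/size" on your own drive before putting the number into the rule.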
Now that the USB disk will do what I want, let's create a ZFS file system.
To create the initial pool and its top-level ZFS file system:
# zpool create backups /dev/dsk/zfs/backups
(In this example "backups" is the zpool name and "/dev/dsk/zfs/backups" is the dev node.)
Enable data dedup:
# zfs set dedup=on backups
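Once some backups have landed on the pool it is worth checking that dedup is actually on and how much it is saving. A minimal sketch, assuming your zfs-fuse build exposes the dedupratio pool property (dedup_report is a hypothetical helper name of mine):

```shell
# Show the dedup setting and the achieved dedup ratio for a pool.
# dedup_report is a hypothetical helper name; the pool name is its argument.
dedup_report() {
    pool="$1"
    # dedup is a file system property, dedupratio is a read-only pool property
    zfs get dedup "$pool" && zpool get dedupratio "$pool"
}
```

Run it as "dedup_report backups". A ratio close to 1.00x would suggest the backups share fewer identical blocks than I hoped.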
To remove the USB drive and unmount the ZFS file system we must export it first :
# zpool export backups
To reattach the USB storage we need to import the pool. As we are not using a standard device node path, we need to use the '-d' option:
# zpool import -d /dev/dsk/zfs/ backups
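Putting the import and export steps together, one backup cycle could look roughly like the sketch below. This is not the finished script, just an outline under assumptions: the use of rsync, the SRC and DEST paths, and the helper names run and backup_cycle are all mine, and DEST assumes the pool is mounted at its default location of /backups. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of one backup cycle: import the pool, copy data, export the pool.
# POOL and DEVDIR come from the steps above; SRC, DEST and rsync are assumptions.
POOL="backups"
DEVDIR="/dev/dsk/zfs/"
SRC="${SRC:-/home/}"        # hypothetical source directory
DEST="${DEST:-/$POOL/}"     # assumes the default mountpoint /<poolname>
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

backup_cycle() {
    run zpool import -d "$DEVDIR" "$POOL" || return 1
    run rsync -a "$SRC" "$DEST"
    status=$?
    # Always export, even if the copy failed, so the drive can be unplugged.
    run zpool export "$POOL" || return 1
    return "$status"
}
```

Calling backup_cycle with the default DRY_RUN=1 just lists the three commands; set DRY_RUN=0 once the paths are right for your machine.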
See the respective man pages for 'zfs' and 'zpool' for information on the above commands.
I will soon write a script that will take care of my backups and the use of ZFS.