I wanted an automated backup system for my PC. Normally I copy my stuff to a Blu-ray RW disc each week, which is annoying as it requires me to do something... something that I can automate.

I recently bought an 880GB USB2 drive for a really cheap price. I did not want another power brick, hence the USB-powered drive; also, there's no need for speed in this project.

One requirement is that the drive stores a lot of my backups for a long time. Given that the data being backed up will only change a little between runs, I started thinking about data deduplication. Currently ZFS is the only stable free-ish (CDDL) file system that supports data deduplication. The CDDL also currently conflicts with the GPL, so there is no Linux kernel support for ZFS. Instead, ZFS is supported through FUSE.

Normally I avoid FUSE; the whole idea of a file system in user space annoys me, but anyway... For the sole purpose of backups we shall let it be.

Let's get ZFS :

# emerge -av sys-fs/zfs-fuse (you may need to unmask this)

Let's get rid of any partitions, as I will use the whole disk :

# fdisk /dev/sdb
Hit 'd' to delete the partition, then 'w' to write the changes and exit.

I also wanted the system to leave my USB drive alone, i.e. don't auto-anything! To achieve this I made use of HAL.

First, let's find out what HAL calls the USB2 drive I have :

# hal-find-by-property --key block.device --string /dev/sdb

For me this returned :


I want to match a string based on file system type, so I selected the volume HAL object :

/org/freedesktop/Hal/devices/volume_part_1_size_888183152128

Let's query the HAL object for a string we can match :

# lshal --show /org/freedesktop/Hal/devices/volume_part_1_size_888183152128

I selected the string "volume.fstype" as I do not want HAL to touch any ZFS file systems; you could pick other strings, it's up to you and your needs.

Now let's make a local HAL policy for this :

# nano /etc/hal/fdi/policy/10-zfs-fuse.fdi

Enter the following into the file :

<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="volume.fstype" string="zfs">
      <merge key="volume.ignore" type="bool">true</merge>
    </match>
  </device>
</deviceinfo>

Restart the HAL daemon after saving the file :

# /etc/init.d/hald restart

As we are using a USB drive, there's a chance that it will receive a different dev node each time, say sdc or sdb. Let's use udev to abstract that away to something we can always use as a device path.

I created the following local udev rule, which creates a symlink from my known, fixed path to the real device node. The ':=' assignment also makes the symlink final, so later udev rules cannot change it.

# nano /etc/udev/rules.d/10-local.rules

I put the following in this file :

KERNEL=="sd*", SUBSYSTEMS=="block", ATTR{size}=="1734732719", SYMLINK:="dsk/zfs/backups"

In short, match any sd device in the block subsystem whose size matches, then symlink it to /dev/dsk/zfs/backups and make that assignment final so later rules cannot override it. I picked size as the attribute to match, but you can use any attribute of the device, or of its parent devices, if you need to be more exact in your match.
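
A quick way to check that the rule took effect after re-plugging the drive is to see where the symlink points. This is only a minimal sketch: show_target is a hypothetical helper name, and it assumes a readlink that supports '-f' (GNU coreutils does).

```shell
#!/bin/sh
# show_target is a hypothetical helper, not part of udev.
# It prints the real path a symlink resolves to, or fails if the
# given path is not a symlink at all.
show_target() {
    if [ -L "$1" ]; then
        readlink -f "$1"
    else
        echo "$1 is not a symlink" >&2
        return 1
    fi
}

# Usage on the real system (path taken from the rule above):
#   show_target /dev/dsk/zfs/backups
```

If the rule matched, the output should be the real device node the kernel handed out this time.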

I used the following command to find a set of attributes to match against :

# udevadm info -a -p /sys/block/sdb/
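
One thing to keep in mind when reading the size attribute: sysfs reports it in 512-byte sectors, not bytes. The small sketch below (sectors_to_bytes is a hypothetical helper name) converts the value used in the rule and shows it lines up with the byte count in the HAL object name queried earlier.

```shell
#!/bin/sh
# sectors_to_bytes is a hypothetical helper, not part of udev.
# The sysfs "size" attribute counts 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# The size matched in the udev rule:
sectors_to_bytes 1734732719
# prints 888183152128, the same number as in
# volume_part_1_size_888183152128
```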

Now that the USB disk will do what I want, let's create a ZFS file system.

To create the initial pool and its top level ZFS file system :

# zpool create backups /dev/dsk/zfs/backups

(In this example "backups" is the zpool name and "/dev/dsk/zfs/backups" is the dev node.)

Enable data dedup :

# zfs set dedup=on backups

To remove the USB drive we must first export the pool, which also unmounts the ZFS file system :

# zpool export backups

To reattach the USB storage we need to import the pool. As we are not using a standard device node path, we need to use the '-d' option :

# zpool import -d /dev/dsk/zfs/ backups
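
The import and export steps can be wrapped so the pool is only attached while something is using it. This is just a rough sketch under assumptions: the function names and the ZPOOL override variable are hypothetical, and only the two zpool commands themselves come from the steps above.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the two zpool commands shown above.
POOL="backups"            # pool name used in this post
DEV_DIR="/dev/dsk/zfs/"   # directory holding the udev symlink
ZPOOL="${ZPOOL:-zpool}"   # overridable, so the functions can be dry-run

attach_pool() {
    "$ZPOOL" import -d "$DEV_DIR" "$POOL"
}

detach_pool() {
    "$ZPOOL" export "$POOL"
}

# run_with_pool CMD...: import the pool, run CMD, then always export.
# Returns CMD's exit status, or 1 if the import or export fails.
run_with_pool() {
    attach_pool || return 1
    "$@"
    status=$?
    detach_pool || return 1
    return "$status"
}
```

For example, 'run_with_pool ls /backups' would list the pool's default mount point and export the pool again afterwards. Setting ZPOOL=echo before calling the functions prints the zpool commands instead of running them.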

See the respective man pages for 'zfs' and 'zpool' for information on the above commands.

I will soon write a script that will take care of my backups and the use of ZFS.