Imagine that we have two very large directories (tens or hundreds of GB) that we want to keep identical on two different systems. For example, think of a directory of professional documents that we want to synchronize between our home PC and our work PC, or a directory of family photos that we want to synchronize between our home PC and our mother’s PC.

The idea is that, after modifying files on one of the PCs, we copy only the changes to the other PC (which in principle is a small amount of data) instead of copying all the files every time (remember that we are talking about many GB of data).

If both systems are connected by a network (local or Internet), there are several possible solutions, but my preferred one would be rsync (see Backups with rsync), which is available on Linux, Mac OS X and Windows (via Cygwin).

We could also store the data in “the cloud” with Dropbox, Google Drive, SkyDrive, iCloud, etc. That way, when we modify files on one system, they are updated in the cloud and, later, on the other systems. But let’s say we handle 100 GB of data. Besides having to pay our cloud storage provider for at least that much space, the initial upload of 100 GB over an Internet connection with 1 Mbps of upstream bandwidth would take several days:

100,000,000,000 bytes / (1,000,000 bit/s / 8 bit/byte) = 800,000 s ≈ 9.26 days
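The same arithmetic can be reproduced with a few lines of Python. This is only a sketch: the function name `upload_time_seconds` is made up for illustration, and it assumes the link sustains its nominal rate with no protocol overhead, so a real upload would take longer.

```python
def upload_time_seconds(size_bytes, uplink_bits_per_second):
    """Time needed to push size_bytes through a link of the given raw bit rate."""
    bytes_per_second = uplink_bits_per_second / 8  # 8 bits per byte
    return size_bytes / bytes_per_second


seconds = upload_time_seconds(100_000_000_000, 1_000_000)  # 100 GB over 1 Mbps
days = seconds / 86_400  # 86,400 seconds in a day
print(f"{seconds:,.0f} s = {days:.2f} days")  # prints "800,000 s = 9.26 days"
```

Changing the second argument shows how sensitive the result is to the upstream bandwidth: at 10 Mbps the same upload drops to a bit under one day.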

If we have neither rsync nor cloud storage, and perhaps no network access at all, some people would copy all the files (hundreds of GB) to an external hard drive, and others would keep track of which files have changed in order to copy only those to a USB stick. Either way, the files still have to be copied again onto the target system…

Well, it turns out that rdiffdir is an excellent solution to this problem. rdiffdir is written in Python, is part of the duplicity application (used to back up directories), is based on rdiff and uses the librsync library. With rdiffdir we can easily create a file that contains only the changes, computed with the rsync algorithm, and apply those changes on the system that is out of date.
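To get a feel for the signature / delta / patch idea, here is a deliberately simplified Python sketch. It is not the real rsync algorithm and not how rdiffdir or librsync are implemented: rsync uses a rolling checksum so that matching blocks are found at any offset, whereas this toy version only compares blocks at fixed positions. The function names and the 4-byte block size are assumptions chosen for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size to keep the example readable; real tools use much larger blocks


def signature(old):
    """Map the hash of every fixed-size block of the old data to its block index."""
    sig = {}
    for i in range(0, len(old), BLOCK):
        digest = hashlib.sha256(old[i:i + BLOCK]).hexdigest()
        sig.setdefault(digest, i // BLOCK)
    return sig


def delta(sig, new):
    """Describe the new data as ("copy", block_index) or ("data", literal_bytes) steps."""
    steps = []
    for i in range(0, len(new), BLOCK):
        chunk = new[i:i + BLOCK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in sig:
            steps.append(("copy", sig[digest]))  # the other side already has this block
        else:
            steps.append(("data", chunk))  # only changed blocks travel as raw bytes
    return steps


def patch(old, steps):
    """Rebuild the new data from the old data plus the delta steps."""
    out = bytearray()
    for kind, value in steps:
        if kind == "copy":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)


old = b"AAAABBBBCCCC"  # contents on the out-of-date system
new = b"AAAAXXXXCCCC"  # contents on the up-to-date system
steps = delta(signature(old), new)  # needs only the signature, not the old data
print(steps)  # [('copy', 0), ('data', b'XXXX'), ('copy', 2)]
assert patch(old, steps) == new
```

The point to notice is that `delta` never sees the old data itself, only its signature. That is why a small signature file is enough to compute the changes on one machine and apply them on another.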

Let’s take the directory of professional documents that we want to synchronize between the home PC and the work PC as an example to see how we would do it with rdiffdir:

We start from a scenario in which the directories are perfectly synchronized in both systems. We just got to work, and the directory we want to sync contains the following:

Work ~ $ find directory/
directory/subdirectory1
directory/subdirectory1/fileA.txt
directory/subdirectory2
directory/subdirectory2/fileB.txt

Work ~ $ cat directory/subdirectory1/fileA.txt
Test A

Work ~ $ cat directory/subdirectory2/fileB.txt
Test B

Before starting to work, we generate a file with the checksums of the blocks of the files and with the information of the directories with rdiffdir:

Work ~ $ rdiffdir signature directory signature_$(date +%y%m%d).rdiffdir

And we start working: we edit files, add files and directories, delete files…

Work ~ $ echo "New line" >> directory/subdirectory1/fileA.txt

Work ~ $ rm -rf directory/subdirectory2/

Work ~ $ mkdir directory/subdirectory3

Work ~ $ echo "Test C" > directory/subdirectory3/fileC.txt

And at the end of the day, we take the signature file generated in the morning and use it to generate a file with the changes:

Work ~ $ rdiffdir delta signature_130208.rdiffdir directory changes_$(date +%y%m%d).rdiffdir

We go home, copy the change file, and apply it on the directory:

Home ~ $ rdiffdir patch directory changes_130208.rdiffdir

And we verify that, indeed, we have the changes of the day:

Home ~ $ find directory
directory/subdirectory1
directory/subdirectory1/fileA.txt
directory/subdirectory3
directory/subdirectory3/fileC.txt

Home ~ $ cat directory/subdirectory1/fileA.txt
Test A
New line

Home ~ $ cat directory/subdirectory3/fileC.txt
Test C

If we plan to make changes at home, we have to repeat the process there: generate a signature file before starting, and a changes file at the end.
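Since the same three commands are repeated every day, they can be wrapped in a small helper. The following Python sketch only builds the rdiffdir command lines used in this post, with the same date-stamped file names. The helper names are hypothetical, and `run` assumes rdiffdir is installed and on the PATH.

```python
import subprocess
from datetime import date


def stamp(day):
    """Same format as `date +%y%m%d`, e.g. 130208 for 8 February 2013."""
    return day.strftime("%y%m%d")


def signature_cmd(directory, day):
    """Before starting to work: record block checksums and directory information."""
    return ["rdiffdir", "signature", directory, f"signature_{stamp(day)}.rdiffdir"]


def delta_cmd(directory, day):
    """At the end of the day: write the changes made since that day's signature."""
    tag = stamp(day)
    return ["rdiffdir", "delta", f"signature_{tag}.rdiffdir", directory,
            f"changes_{tag}.rdiffdir"]


def patch_cmd(directory, day):
    """On the other system: apply that day's changes file to its copy."""
    return ["rdiffdir", "patch", directory, f"changes_{stamp(day)}.rdiffdir"]


def run(cmd):
    """Run one command and raise an exception if it fails."""
    subprocess.run(cmd, check=True)


day = date(2013, 2, 8)
print(" ".join(delta_cmd("directory", day)))
# rdiffdir delta signature_130208.rdiffdir directory changes_130208.rdiffdir
```

In daily use we would call `run(signature_cmd("directory", date.today()))` before starting, `run(delta_cmd(...))` when finishing, and `run(patch_cmd(...))` on the other machine after copying the changes file over.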

But won’t the collar end up costing more than the dog? How big is the signature file of a huge directory, and how long does it take to generate? Well, I just tried it on a directory with 22,000 files, 1,400 directories and 10 GB of data: it took about 6 minutes, and the signature file takes up about 130 MB, a suitable size to carry on a USB stick.

Duplicity is available as a package on some Linux distributions (Fedora, Ubuntu, Debian). On others, we can compile it. We can also use it on Windows under Cygwin: first install the python, librsync1 and librsync-devel packages; then download the source archive, unpack it, change into its directory from the Cygwin shell and run:

python setup.py install
