Look no further: the solution for your messy backups is here!

Sorry for the sensational title, but I’m quite happy: discovering bup saved me a lot of coding. I was planning to build a file-level deduplicating backup system similar to (or based on) my WolfDedup, but bup is a lot better than anything I could code.

What is bup? It’s a piece of software for backing up stuff.

It uses the git packfile format and it’s quite fast, with parts of it written in C for speed (the rest is Python). It has awesome support for big files like VM images or database dumps, since it uses a rolling checksum similar to rsync’s to find the differences in files and saves only the parts that changed.
From what I’ve seen it also beautifully handles files that are moved around. Think of renaming a directory: all the files inside now have a different “name” since the path changed, but bup doesn’t store a second copy of their content, which would double the space required for the backup.
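
To make the rename case concrete, here is a minimal sketch of the general idea behind content-addressed storage: identical bytes hash to the same value whatever their path is, so a store keyed by hash needs only one copy. This is an illustration only, not bup's actual internals (bup splits files into chunks and stores them in git packfiles); the directory and file names are made up for the example.

```shell
# Illustration only: the same content under two different paths
# produces the same SHA-1, so a hash-keyed store keeps a single copy.
workdir=$(mktemp -d)
mkdir -p "$workdir/old-name" "$workdir/new-name"
printf 'same content\n' > "$workdir/old-name/file.txt"
cp "$workdir/old-name/file.txt" "$workdir/new-name/file.txt"

# Hash the content only (read from stdin), so the path plays no role.
hash_old=$(sha1sum < "$workdir/old-name/file.txt" | cut -d' ' -f1)
hash_new=$(sha1sum < "$workdir/new-name/file.txt" | cut -d' ' -f1)

if [ "$hash_old" = "$hash_new" ]; then
    echo "identical content: only one copy needs to be stored"
fi
rm -rf "$workdir"
```
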

After some basic tests, I made a more serious test on some of my backup directories. I made a simple import-to-bup.sh script that cycled over the directories of my previous backups (simple rsync in the form myhost-YYYY-MM-DD) and backed them up with bup:

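#!/bin/bash

export BUP_DIR=/storage/bup-storage
HOST=myhost  # bup branch name and prefix of the old backup directories

for dir in "$HOST"-*; do
    # Rename each snapshot to one fixed name so every save uses the same path
    mv -v "$dir" "$HOST-tmpbup"
    bup index "$HOST-tmpbup"
    bup save -n "$HOST" "$HOST-tmpbup"
    mv -v "$HOST-tmpbup" "$dir"
done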

Then I initialized the repository and ran the script:

# BUP_DIR=/storage/bup-storage bup init
Initialized empty Git repository in /storage/bup-storage/
# ./import-to-bup.sh
`myhost-2011-10-08' -> `myhost-tmpbup'
Indexing: 59300, done.
Reading index: 59300, done.
bloom: creating from 1 file (147913 objects).                    
bloom: adding 1 file (148745 objects).                           
bloom: creating from 3 files (454326 objects).                   
bloom: adding 1 file (149533 objects).                             
bloom: adding 1 file (150231 objects).                             
bloom: creating from 6 files (909414 objects).                     
bloom: adding 1 file (142687 objects).                             
bloom: adding 1 file (144142 objects).                              
bloom: adding 1 file (147992 objects).                              
bloom: adding 1 file (163257 objects).                              
bloom: adding 1 file (141220 objects).                               
bloom: creating from 12 files (1795038 objects).                     
bloom: adding 1 file (144708 objects).                               
bloom: adding 1 file (145323 objects).                               
bloom: adding 1 file (152774 objects).                               
bloom: adding 1 file (146545 objects).                               
bloom: adding 1 file (131424 objects).                               
bloom: adding 1 file (176892 objects).                               
bloom: adding 1 file (164957 objects).                              
bloom: adding 1 file (165647 objects).                              
bloom: adding 1 file (184207 objects).                              
Saving: 100.00% (48863124/48863124k, 59300/59300 files), done.
bloom: adding 1 file (102730 objects).
`myhost-tmpbup' -> `myhost-2011-10-08'   
`myhost-2011-12-19' -> `myhost-tmpbup'
Indexing: 43254, done.
bup: merging indexes (69447/69447), done.
Reading index: 69303, done.
bloom: creating from 23 files (3457649 objects).                  
bloom: adding 1 file (132517 objects).                            
Saving: 100.00% (8421774/8421774k, 69303/69303 files), done.      
bloom: adding 1 file (91784 objects).
`myhost-tmpbup' -> `myhost-2011-12-19'  
`myhost-2012-04-30' -> `myhost-tmpbup'
Indexing: 32085, done.
bup: merging indexes (89381/89381), done.
Reading index: 89369, done.
Saving: 100.00% (27344424/27344424k, 89369/89369 files), done.      
bloom: adding 1 file (125398 objects).
`myhost-tmpbup' -> `myhost-2012-04-30'   
`myhost-2012-12-03' -> `myhost-tmpbup'
Indexing: 64464, done.
bup: merging indexes (122479/122479), done.
Reading index: 122298, done.
bloom: adding 1 file (134894 objects).                              
bloom: adding 1 file (132885 objects).                              
bloom: adding 1 file (132791 objects).                               
bloom: adding 1 file (132578 objects).                               
bloom: adding 1 file (132852 objects).                               
bloom: adding 1 file (132776 objects).                               
bloom: adding 1 file (131897 objects).                               
bloom: adding 1 file (132821 objects).                               
bloom: adding 1 file (200000 objects).                                
bloom: adding 1 file (200000 objects).                                
bloom: adding 1 file (200000 objects).                                
bloom: adding 1 file (171507 objects).                                
bloom: adding 1 file (134164 objects).                                
bloom: adding 1 file (132233 objects).                                
Saving: 100.00% (48117055/48117055k, 122298/122298 files), done.     
bloom: adding 1 file (44040 objects).
`myhost-tmpbup' -> `myhost-2012-12-03'
#

Check the space used:

# du -sh myhost-201*
47G     myhost-2011-10-08
8,1G    myhost-2011-12-19
27G     myhost-2012-04-30
46G     myhost-2012-12-03

# du -sh /storage/bup-storage
35G     /storage/bup-storage/
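
To put those numbers side by side, here is a small sketch that adds up the four rsync directories and compares the total with the bup repository. The sizes are copied by hand from the du output above (with 8,1G written as 8.1); nothing is measured again.

```shell
# Sizes in G copied from the du output above (8,1G written as 8.1).
total=$(printf '%s\n' 47 8.1 27 46 | LC_ALL=C awk '{ s += $1 } END { printf "%.1f", s }')
# Ratio between the plain rsync copies and the 35G bup repository.
ratio=$(LC_ALL=C awk -v t="$total" -v r=35 'BEGIN { printf "%.1f", t / r }')
echo "rsync copies: ${total}G, bup repository: 35G, ratio: ${ratio}x"
```

So roughly 128G of plain copies fit into 35G, which is between three and four times smaller.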

bup provides a simple text-mode, FTP-like client (bup ftp) for browsing your files, or you can FUSE-mount the backups (bup fuse) for even easier access.
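
As a rough sketch of what access could look like, the helpers below wrap bup ls, bup restore and bup fuse. The function names and the mount point argument are made up for this example, the branch name myhost comes from the import script above, and the exact path below a save depends on the path that was given to bup save, so check it with bup ls first.

```shell
export BUP_DIR=/storage/bup-storage

# List the saves recorded under the "myhost" branch.
list_saves() {
    bup ls /myhost
}

# Restore one path from the latest save into a target directory.
# $1 = path inside the save (as shown by bup ls), $2 = output directory
restore_latest() {
    bup restore -C "$2" "/myhost/latest/$1"
}

# Mount all backups read-only on a directory via FUSE.
# $1 = mount point (hypothetical, e.g. /mnt/bup); unmount with fusermount -u
mount_backups() {
    mkdir -p "$1" && bup fuse "$1"
}
```

For example, mount_backups /mnt/bup followed by ls /mnt/bup/myhost should show one directory per save plus a latest link.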

The software is very easy to install on the latest Ubuntu and quite easy to install on CentOS 6 (you’ll need the EPEL and RPMforge repos), so go check it out!
