BackupPC archive host B2

A fork of rtucker/backuppc-archive-s3.

This is a Python script that acts as an interface between BackupPC and Backblaze B2. It hooks into BackupPC's archive function to extract a tarball and split it into chunks, just like a normal archive. The chunks are then encrypted with gpg and uploaded to B2 using the B2 Python SDK.
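
For orientation, here is a minimal sketch of that pipeline using the b2sdk package. The bucket name, paths, passphrase, and keys below are placeholders, and the real script handles splitting, naming, and retries differently:

    # Sketch only: gpg-encrypt one split chunk and upload it to B2.
    import subprocess
    from b2sdk.v2 import InMemoryAccountInfo, B2Api

    chunk = "/var/lib/backuppc/archives3/host.0.tar.bz2.aaa"   # one split piece
    encrypted = chunk + ".gpg"

    # Symmetric gpg encryption, as the archive script does with its passphrase.
    subprocess.run(
        ["gpg", "--batch", "--yes", "--symmetric",
         "--passphrase", "hunter2", "--output", encrypted, chunk],
        check=True,
    )

    # Upload the encrypted chunk with the B2 Python SDK.
    b2_api = B2Api(InMemoryAccountInfo())
    b2_api.authorize_account("production", "<application_key_id>", "<application_key>")
    bucket = b2_api.get_bucket_by_name("backuppc-archive-example")
    bucket.upload_local_file(local_file=encrypted, file_name="host.0.tar.bz2.aaa.gpg")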

Installation

Install the prerequisites

You will need Python 3.9, the packages listed in requirements.txt, and a working BackupPC installation.

Download and install this script

Something like this seems like a good idea:

  cd /usr/local/src/
  git clone git://github.com/rtucker/backuppc-archive-s3.git

Then create a link from /usr/share/backuppc/bin/ to here:

  ln -s /usr/local/src/backuppc-archive-s3/BackupPC_archiveHost_s3 /usr/share/backuppc/bin/

Configure this script

Create a file in this directory called secrets.py, based upon the secrets.py.orig file. It should contain your AWS access and shared keys, and a passphrase that will be used to encrypt the tarballs.

  accesskey = 'ASDIASDVINASDVASsvblahblah'
  sharedkey = '889rv98rv8fmasmvasdvsdvasdv'
  gpgsymmetrickey = 'hunter2'
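
Since this fork talks to B2 rather than S3, the key fields may be named differently in this repository's secrets.py.orig; the B2-flavoured example below is purely hypothetical, so follow secrets.py.orig for the exact names it expects:

  accesskey = '<b2-application-key-id>'    # hypothetical: a B2 application key ID
  sharedkey = '<b2-application-key>'       # hypothetical: the matching application key
  gpgsymmetrickey = 'hunter2'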

Previously, you could use a speedfile to change the permitted upstream bandwidth on the fly. This was cantankerous and was ultimately dropped in September 2011. See tag stable-20110610 if you need this functionality (and open an issue to let me know!), or take a look at The Wonder Shaper to limit throughput on a system-wide level.

Configure BackupPC

From the BackupPC configuration interface, go to Edit Hosts and add a new host, archives3, that looks like the existing archive host. Save this, select the archives3 host, and then Edit Config for that host.

Change the settings on each tab as follows:

Xfer

 XferMethod:         archive
 ArchiveDest:        /var/lib/backuppc/archives3
 ArchiveComp:        bzip2
 ArchiveSplit:       500
 ArchiveClientCmd:   $Installdir/bin/BackupPC_archiveHost_s3 $tarCreatePath $splitpath $parpath $host $backupnumber $compression $compext $splitsize $archiveloc $parfile *

Backup Settings

 ClientTimeout:      720000

That should be just about it. Note that ArchiveDest is where the tarballs are staged before they are uploaded; it must have enough disk space for your archive! ArchiveSplit is the size of each tar file, in megabytes; you may want to adjust this for your needs. Also, ArchiveClientCmd is the default, except with _s3 appended to the script name.
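
For reference, BackupPC expands the $-variables in ArchiveClientCmd into positional arguments. A rough sketch of how a script like this might read them (the variable names here are illustrative, not the script's actual parser):

    # Illustrative only: the order mirrors the ArchiveClientCmd line above.
    import sys

    (tar_create_path, split_path, par_path, host, backup_number,
     compression, compression_ext, split_size, archive_loc, par_file) = sys.argv[1:11]

    # split_size comes from the ArchiveSplit setting and archive_loc from
    # ArchiveDest, which is why the Start Archive defaults mirror the Xfer tab.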

Use it

Go to the main page for the archives3 host and click Start Archive. To start with, just tick the box next to the smallest backup you have, then Archive selected hosts. Go with the defaults (which look suspiciously like what you set on the Xfer tab, do they not? :-) and then Start the Archive.

Watch syslog and hopefully everything will work. If it does not, there will be decent debugging output in the archive job's log, viewable via the BackupPC console.

backup-manager.py

There is a companion script, backup-manager.py, that can be used to see what's in the remote bucket. Run it with no arguments to get a listing of backups and their ages, or use the --help argument to see what it can do.

The "crown jewel" of this whole system is the script command, which produces a script that can be used to restore a backup. It uses S3's Query String Request Authentication mechanism to generate temporary URLs to download each file required to restore a backup.

Each night, from cron, I run a script:

    #!/bin/sh
    BACKUPMGR=/path/to/backup-manager.py

    # Delete all backups older than 30 days.
    $BACKUPMGR delete --age=30

    # Create restore scripts, valid for one week, for all of my computers
    cd /home/rtucker/Dropbox/RestoreScripts/
    $BACKUPMGR --expire=604800 --host=gandalf script > restore_gandalf.sh
    $BACKUPMGR --expire=604800 --host=witte script > restore_witte.sh
    # etc, etc

    # Output a list of what's on the server
    $BACKUPMGR

The output of this is mailed to me, so I always know what's going on!

FAQs

  • BackupPC is written in Perl. Why is this thing written in Python?

    I know Python much better than I know Perl, so I wrote it in Python. The good news is that BackupPC doesn't care, but it does mean this probably won't be part of the BackupPC main distribution any time soon.

  • Is this project dead?

    You could say that. A lot of my projects are one-off scripts that solve a very specific need I have, and I don't put too much thought into making them useful for other people. This script works for me and (sorta) meets my needs, so that's where it is.

  • What changed in September 2011?

    I got tired of seeing a square-wave pattern on my throughput graphs, and so I modified the system to use Python's multiprocessing library. It will now run GPG encryption jobs in the background, with as many CPUs as you have available, while transmitting files (roughly as sketched at the end of this README).

    This probably isn't a problem for anyone else, but my BackupPC server is slow (exactly one "Intel(R) Pentium(R) 4 CPU 1.70GHz") and is behind a very asymmetric cable modem connection.
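
As a rough illustration of the approach described in that last answer (not the script's actual code), encryption can be farmed out to a multiprocessing pool while the main process uploads each chunk as it finishes:

    # Sketch: encrypt chunks on all CPUs in the background, upload as each finishes.
    import multiprocessing
    import subprocess

    def encrypt(chunk):
        """gpg-encrypt one chunk and return the path of the encrypted file."""
        out = chunk + ".gpg"
        subprocess.run(
            ["gpg", "--batch", "--yes", "--symmetric",
             "--passphrase", "hunter2", "--output", out, chunk],
            check=True,
        )
        return out

    def upload(path):
        """Placeholder for the B2 upload step (see the sketch near the top)."""
        print("uploading", path)

    if __name__ == "__main__":
        chunks = ["piece.aaa", "piece.aab", "piece.aac"]   # placeholder split pieces
        with multiprocessing.Pool() as pool:               # one worker per CPU
            for encrypted in pool.imap(encrypt, chunks):   # upload while others encrypt
                upload(encrypted)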