How to do incremental backups with AWS?

There are plenty of ways to do backups; here are a few thoughts and options.

Incremental Backups

Ideally your backups should be incremental: after an initial full backup, each run stores only the changes since the previous backup. This lets you roll back to previous versions if you need to, and it can significantly reduce CPU, bandwidth usage, and sometimes storage. You can also look at differential backups, which are slightly different in that each one is a delta against the last full backup - for example, with a full backup on Sunday, Thursday's differential contains everything changed since Sunday, whereas Thursday's incremental contains only the changes since Wednesday.

On-Premise backups to AWS

The AWS Storage Gateway virtual appliance could be useful for you. Install the VM and allocate some disk. There are a few modes, and AWS renamed them not so long ago, but in essence it presents a local disk that's backed up to S3: in stored mode it keeps all your data on premise and asynchronously copies it to S3, while in cached mode S3 holds the primary copy and the on-premise disk acts as a cache.

Otherwise there are plenty of other options to get data into AWS.

AWS S3 Sync

I use the "aws s3 sync" command line utility to upload data to AWS S3 using the IA storage class, which keeps the backups safe on S3. You can upload the files you want backed up directly to S3, upload the repository of a backup program that's stored locally, or your backup software may support S3 natively. From the command line it looks like this:

aws s3 sync /path/to/backups s3://bucketname/backups --storage-class STANDARD_IA --delete --exclude ".sync/*"

You can turn on encryption and versioning within S3. Versioning keeps each version of a file separately - it's not incremental - so it can use more storage than incremental backup software that also does compression.
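
If you'd rather set those up from the command line than the console, something like this should work - "bucketname" is a placeholder for your bucket:

aws s3api put-bucket-versioning --bucket bucketname --versioning-configuration Status=Enabled
aws s3api put-bucket-encryption --bucket bucketname --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'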

Destination

S3 is a good place for backups. If your backups are fairly static you can use S3 Infrequent Access class storage, which is cheaper than standard storage class. You can use the S3 Glacier storage class if your backups are immutable (never change), but remember it takes hours to get those backups back.
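
You don't have to upload straight to Glacier either - a lifecycle rule can move objects there automatically once they've aged. A sketch, assuming a file called lifecycle.json containing the rule below (the 90 days and "backups/" prefix are just examples):

{
  "Rules": [
    {
      "ID": "archive-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [ { "Days": 90, "StorageClass": "GLACIER" } ]
    }
  ]
}

aws s3api put-bucket-lifecycle-configuration --bucket bucketname --lifecycle-configuration file://lifecycle.json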

Dropbox is a reasonable place for backups. I've used the Dropbox Uploader script. It's not as flexible though - it doesn't delete files that have been removed locally, which wastes disk space. This is how you use it:

/opt/Dropbox-Uploader/dropbox_uploader.sh -s -q upload /path/to/files /dropbox/path

**Borg Backup (no longer using)**

I back up my Linux server (which happens to be on AWS) using Borg Backup. This creates an incremental, de-duplicated backup on a local disk. It has retention policies which tell it how long to keep data - e.g. every night for a week, once a week for a month, monthly for a year. There are plenty of incremental backup programs you can use.
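
As a rough sketch of what a nightly Borg run looks like - the paths and retention numbers here are just examples, not my actual setup:

# one-off: create the repository
borg init --encryption=repokey /mnt/backup/repo

# nightly: create an archive, then apply the retention policy
borg create /mnt/backup/repo::'{hostname}-{now}' /home /etc
borg prune /mnt/backup/repo --keep-daily 7 --keep-weekly 4 --keep-monthly 12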

One thing I don't love about Borg Backup is that each time it runs it renames existing files. I think this behaviour changes for really large backups, but my 500MB backup definitely renames the file each night. You end up with hundreds of tiny files, plus one large new / renamed file each day. Because of that, if your backups are remote you'll probably use a lot more bandwidth than you'd expect. I stopped using Borg because of this.

Borg also supports remote repositories natively, accessed via SSH. You could have an EC2 instance with an EBS disk come up, sync to it, then shut the instance down - but EBS is a lot more expensive than S3, so it's not a great option.
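
If you did go that route, the remote repository uses the same commands pointed at an SSH path - user@backuphost here is a placeholder:

borg create user@backuphost:/mnt/backup/repo::'{hostname}-{now}' /home /etc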

**Restic Backup (now using for PC / web server backups)**

I've been using Restic Backup for a couple of years now. It aims to be easy, fast, verifiable, secure, efficient, and free. It works on most platforms (*nix, Windows, etc), and it's compiled to a single binary so installation is easy.

It's block based and efficient with disk / network. I've done multiple restore tests and it's always restored everything perfectly, so I'm happy with restic. It's under slow but steady development, and things like compression will be added in future - it does de-duplication but not compression as of August 2020.

Initialise the repository

set RESTIC_PASSWORD=abcdefg
restic_0.9.1_windows_amd64.exe init --repo x:\repository

Back up your data

set RESTIC_PASSWORD=abcdefg
restic_0.9.1_windows_amd64.exe --repo x:\repository backup c:\data --exclude c:\data\exclude

You can have Restic keep backups for configurable amounts of time - for example keep 7 daily backups, weekly backups for 8 weeks, and monthly backups for 24 months.

restic_0.9.1_windows_amd64.exe --repo x:\repository forget --keep-daily 7 --keep-weekly 8 --keep-monthly 24
restic_0.9.1_windows_amd64.exe --repo x:\repository prune
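
To run all of that nightly on Windows, you can put the backup, forget, and prune commands in a batch file and schedule it - a sketch, assuming you've saved them as c:\restic\backup.bat:

schtasks /create /tn "ResticBackup" /tr "c:\restic\backup.bat" /sc daily /st 01:00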

If you want Restic to back up to S3, you just define your keys and run the backup like this. One thing to consider: Restic may read data back from S3 even though it has a local cache, so the IA class could end up more expensive than standard in some cases - though I suspect those cases are rare.

Here's the basic S3 setup for Restic:

REM setup S3 (once)
set AWS_ACCESS_KEY_ID=ABCDEFGHIJK
set AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXX/AAA
set RESTIC_PASSWORD=abcdefg
restic_0.9.1_windows_amd64.exe --repo s3:s3.amazonaws.com/s3-bucket-name init

This is how you do the backup:

REM backup
restic_0.9.1_windows_amd64.exe --repo s3:s3.amazonaws.com/s3-bucket-name backup c:\data
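
It's worth testing your restores too. A minimal sketch - c:\restore is just an example target directory:

REM list snapshots, verify the repository, restore the most recent snapshot
restic_0.9.1_windows_amd64.exe --repo s3:s3.amazonaws.com/s3-bucket-name snapshots
restic_0.9.1_windows_amd64.exe --repo s3:s3.amazonaws.com/s3-bucket-name check
restic_0.9.1_windows_amd64.exe --repo s3:s3.amazonaws.com/s3-bucket-name restore latest --target c:\restore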