Backing Up Files to Amazon S3 with Rake
posted
Friday, June 06 at 04:09AM
Just a quick script I hacked together to backup file submissions for a project. Naturally, it could be cleaned up, and certain items could be set using sane defaults based on environmental variables. For example, I'm setting current_bucket to a hard-coded value based off the RAILS_ENV variable. One could instead pass this in doing something like BACKUP_BUCKET_PREFIX=myproject_ RAILS_ENV=production rake backup:all. Or something. Whatever.
require 'aws/s3'
############################
### HELPERS HERE ###########
############################
# Just litterring up my tasks a bit. =)
# I don't want to load these helpers unless I'm actually connecting, so let's wrap them in a module
module AWSHelpers
include AWS
def msg(msg, post_msg = nil)
print "#{msg}... "
yield if block_given?
puts "done. #{post_msg.nil? ? "" : "#{post_msg}."}"
end
def get_submission_paths
# We need the full path for accessing the file, and a key to reference it by
# in S3
files = []
msg("Finding local files") do
# abusing is just a string helper, explained below
abusing "#{RAILS_ROOT}/public" do |doc_root|
# Replace this with whatever path you're aiming to back up.
files = Dir["#{doc_root}/system/submissions/*/*"].collect{|f| {:path => f, :key => f[(doc_root.length + 1)..-1]}}
end
end
files
end
def set_current_bucket
# The name of the bucket
current_bucket = "projectname_backup_#{RAILS_ENV}"
msg("Setting current bucket to #{current_bucket}", "Current bucket set to #{current_bucket}") do
begin
S3::Bucket.find(current_bucket)
rescue
# Stupid AWS::S3 doesn't pass a named exception. Catch all.
begin
S3::Bucket.create(current_bucket, :access => :public_read)
rescue
raise "Couldn't get a bucket for storage. Sorry mate."
end
# Try to create it.
end
end
S3::Bucket.current_bucket = S3::S3Object.current_bucket = current_bucket
end
def get_current_objects
objs = []
msg("Getting current objects") do
objs = S3::Bucket.objects
end
objs
end
def list_new_files(file_list, current_objects)
key_list = current_objects.collect{|o| o.key }
file_list.reject{|f| key_list.include?(f[:key])}
end
def upload_files(file_list)
puts "Nothing to upload!" and return if file_list.empty?
file_list.each do |f|
msg("+ Uploading #{f[:path]}") do
S3::S3Object.store(f[:key], open(f[:path]))
end
end
end
end
############################
### TASKS HERE #############
############################
namespace :backup do
desc "Backup all files to S3"
task :all => :connect do
current_files = get_submission_paths
upload_files(current_files)
end
desc "Backup previously unbacked-up files to S3"
task :new => :connect do
current_files = get_submission_paths
current_objects = get_current_objects
new_files = list_new_files(current_files, current_objects)
upload_files(new_files)
end
desc "Backup a tarball of all files to S3"
task :tar => :connect do
puts "Mason has not implemented this yet."
end
task :connect => :environment do
include AWSHelpers
AWS::S3::Base.establish_connection!(
:access_key_id => 'your_key',
:secret_access_key => 'your_secret'
)
set_current_bucket
end
end
For an explanation of abusing see DRYing/Cleaning Up Your Code/Namespaces With 'using'
Requires the AWS::S3 library.