At precisely 30 seconds after 3:31pm today, Unix time read 1234567890 seconds since January 1st, 1970.
That is all. Well, that, and I’m a complete and utter dork.
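The conversion is easy to check. Here is a minimal Python sketch. It assumes the author was on US Pacific Standard Time (UTC-8), because the post doesn't say which time zone "3:31pm" refers to.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP = 1234567890

# Unix time counts seconds since 1970-01-01 00:00:00 UTC.
utc_time = datetime.fromtimestamp(TIMESTAMP, tz=timezone.utc)

# Assumption: the author was on US Pacific Standard Time (UTC-8) in February.
pst = timezone(timedelta(hours=-8), "PST")
local_time = utc_time.astimezone(pst)

print(utc_time.isoformat())    # 2009-02-13T23:31:30+00:00
print(local_time.isoformat())  # 2009-02-13T15:31:30-08:00
```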
I really love rsync. I’ll get to the specifics later, but first, the excessive backstory.
I’ve been doing a lot of backup scripting recently. Yes, there are tons of commercial apps out there, but none of them that I’ve looked into are a perfect match for all of our needs. I’ll eventually settle on one and it will probably replace 80% of my scripts, but plenty will remain.
One problem that I’ve encountered while doing the whole backup juggling bit is the ferocious rate of change in the nature of the data we’re archiving. Code I’d written a year ago was obsoleted 6 months ago by code that was obsoleted 3 months ago by code that I replaced a few weeks back that is being replaced by the code I’m writing right now.
Another one of the problems is that the sheer quantity of data involved is growing in a very uncontrolled way. Early last May (the oldest archive I have easy access to), a full archive of the entire system was barely 3gb in size. Today, it is closer to 60gb. ~20x growth over the last 9 months.
It’s been fun, if somewhat frustrating, dealing with all of the growth.
On January 20th, I needed to perform a long-deprecated sort of snapshot. The code that generated this sort of file no longer worked because so many things had changed. I wound up digging out old scripts from SVN and updating them to run against the new environment.
Because of the amount of data involved, this took a very long time. It didn’t help that the scripts consumed an unfair amount of system resources – I couldn’t run them with any meaningful priority during the day without crippling everyone else.
Lots of low-priority I/O later, I finally had a 54 GB tar file… in one of the three places I needed it.
The first transfer was simple: the hosts are on the same gigabit switch. Unfortunately, scp-ing that much data between two hosts at full gigabit speed noticeably hurts both systems involved. I had to throttle the transfer way down before it could run without visibly impacting performance:
# --partial keeps a partially transferred file; --bwlimit caps the rate in KB/s
rsync --partial --bwlimit=10000 -e "ssh -i ${RSA_KEYFILE}" "${LOCAL_FNAME}" "${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_FNAME}"
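To make the flags concrete, here is a minimal Python sketch that builds the same throttled rsync invocation as an argument list. `build_rsync_argv` is a hypothetical helper, and the paths, user, host, and key file are placeholders rather than anything from the original scripts. rsync reads `--bwlimit` as kilobytes (1024-byte units) per second, so 10000 is roughly 80 Mbit/s, under a tenth of a gigabit link. `--partial` keeps a partly transferred file, so a rerun can reuse what already arrived.

```python
import shlex


def build_rsync_argv(local_path, remote_user, remote_host, remote_path,
                     key_file, bwlimit_kbps=10000):
    """Build a throttled rsync push as an argument list (hypothetical helper)."""
    return [
        "rsync",
        "--partial",                              # keep partial files for a later resume
        f"--bwlimit={bwlimit_kbps}",              # rate cap, in KB/s
        "-e", f"ssh -i {shlex.quote(key_file)}",  # remote shell with a specific key
        local_path,
        f"{remote_user}@{remote_host}:{remote_path}",
    ]


# Placeholder values for illustration only.
argv = build_rsync_argv("/backups/archive.tar", "backup", "example-host",
                        "/srv/archive.tar", "/home/backup/.ssh/id_rsa")
print(shlex.join(argv))
# To run it for real: subprocess.run(argv, check=True)
```

The list can go straight to `subprocess.run(argv, check=True)` with no shell in between. That means nothing gets shell-expanded, so a missing `$` in front of a variable name can't quietly turn into a literal string.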
The second transfer… wasn’t so easy. I needed to move the file to my office without hurting everyone’s ability to work – and I couldn’t wait for a transfer throttled low enough not to cripple the T1.
We have a backup 6 Mbit/s DSL link that I only use for emergencies and testing. Even at a full 6 Mbit/s, the transfer would have taken the better part of a day (roughly 20 hours for 54 GB, before any overhead). Compressing the file took a while but brought it down to a much more manageable 24 GB (~11 hours over the DSL).
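For a rough sense of scale, transfer time at a nominal line rate is just size × 8 ÷ rate. The Python sketch below uses a hypothetical `transfer_hours` helper. It assumes decimal gigabytes and ignores TCP/HTTP overhead, so its figures are best-case lower bounds. For comparison, it also includes the T1's nominal 1.544 Mbit/s.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Best-case hours to move size_gb over a link, ignoring all overhead."""
    size_bits = size_gb * 1e9 * 8                  # decimal gigabytes -> bits
    seconds = size_bits / (link_mbit_per_s * 1e6)  # Mbit/s -> bit/s
    return seconds / 3600


links = {"DSL (6 Mbit/s)": 6.0, "T1 (1.544 Mbit/s)": 1.544}
for size_gb in (54, 24):  # uncompressed tar vs. gzipped archive
    for name, rate in links.items():
        print(f"{size_gb} GB over {name}: at least "
              f"{transfer_hours(size_gb, rate):.1f} h")
```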
The only remaining gotcha was that the DSL link can’t SSH through the firewall into the colo.
So… I started the transfer over HTTPS last night and went home.
This morning, it was finally time to decompress the monstrosity locally, but I had noticed a hiccup in DSL traffic overnight and figured I’d run a check first – just to make sure that HTTP resume had worked correctly.
ammon@scruffy:~$ gunzip --test archive_2009_01_20.tar.gz
gunzip: archive_2009_01_20.tar.gz: invalid compressed data--format violated
This was not good. I had a 24 GB file that was somehow corrupted… somewhere.
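`gunzip --test` decompresses the whole stream and checks gzip's built-in CRC-32 and length trailer, but it writes nothing out. Below is a rough Python equivalent, where `gzip_is_intact` is a hypothetical helper. Like `gunzip --test`, it can tell you *whether* the file is damaged but not *where*. That's because gzip's only integrity check is a single CRC-32 over the whole uncompressed member.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Stream-decompress a .gz file and report whether it is intact.

    Every member is fully decompressed and its CRC-32 and length trailer
    are verified by the gzip module; the output is simply discarded.
    Reading in chunks keeps memory use flat even for very large archives.
    """
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError includes gzip.BadGzipFile (bad header or CRC mismatch),
        # EOFError means a truncated stream, zlib.error a corrupt deflate block.
        return False
```

Usage would look like `gzip_is_intact("archive_2009_01_20.tar.gz")`, which returns `False` for a damaged archive.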
Since re-downloading the whole thing would cost me another full day… I had to find a way to repair the file in a reasonable amount of time. After some research and suggestion gathering, it looked like rsync could probably handle the task.
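rsync can help here because it compares the two copies piece by piece and sends only what differs. The sketch below is a deliberately simplified illustration of that idea, not rsync's actual algorithm. It hashes fixed, aligned blocks on both sides, whereas real rsync also uses a rolling weak checksum so it can match blocks at any offset. The helper names and the tiny block size are hypothetical. The aligned shortcut only works because a corrupted download of the same length lines up block for block with the original. With `--inplace`, rsync writes updated data straight into the existing file instead of building a temporary copy, which is the behaviour mimicked here.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for the demo; real tools use much larger blocks


def block_digests(data, block_size=BLOCK_SIZE):
    """Strong hash of each fixed-size block (the last block may be short)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def repair_in_place(damaged, good, block_size=BLOCK_SIZE):
    """Overwrite only the blocks of `damaged` whose hashes differ from `good`.

    In a real transfer the copies live on different hosts: the receiver
    sends its block digests and the sender replies with just the
    mismatched blocks. Assumes both copies have the same length.
    Returns the indices of the replaced blocks.
    """
    if len(damaged) != len(good):
        raise ValueError("this simplified sketch needs equal-length copies")
    replaced = []
    pairs = zip(block_digests(damaged, block_size),
                block_digests(good, block_size))
    for idx, (mine, theirs) in enumerate(pairs):
        if mine != theirs:
            start = idx * block_size
            damaged[start:start + block_size] = good[start:start + block_size]
            replaced.append(idx)
    return replaced


if __name__ == "__main__":
    good = b"the quick brown fox jumps"
    damaged = bytearray(good)
    damaged[9] = ord("X")                  # simulate one corrupted byte
    print(repair_in_place(damaged, good))  # [2]
    print(bytes(damaged) == good)          # True
```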
Assuming that I wouldn’t be using an unfair amount of bandwidth for this, I switched back to the T1 link so I could tunnel through SSH again.
ammon@scruffy:~$ rsync --checksum --inplace -e "ssh" wernstrom:/tmp/archive_2009_01_20.tar.gz archive_2009_01_20.tar.gz
sent 1280578 bytes received 1440757 bytes 2622.97 bytes/sec
total size is 25619572576 speedup is 9414.34
ammon@scruffy:~$ gunzip --test archive_2009_01_20.tar.gz
ammon@scruffy:~$
(Remember, this is Unix: no output implies success.)
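rsync's summary line is easy to sanity-check. Its "speedup" is the total file size divided by the bytes sent plus bytes received. The figures below are copied from the transcript. They show that about 2.7 MB crossed the wire, roughly 0.01% of the file, in about 17 minutes at the reported rate.

```python
# Figures copied from the rsync summary in the transcript.
sent = 1_280_578
received = 1_440_757
total_size = 25_619_572_576
rate_bytes_per_s = 2622.97

on_the_wire = sent + received
speedup = total_size / on_the_wire  # rsync's definition of "speedup"
fraction = on_the_wire / total_size
minutes = on_the_wire / rate_bytes_per_s / 60

print(f"{on_the_wire:,} bytes moved, speedup {speedup:.2f}")  # 2,721,335 bytes moved, speedup 9414.34
print(f"{fraction:.4%} of the file")                          # 0.0106% of the file
print(f"~{minutes:.1f} minutes of transfer")                  # ~17.3 minutes of transfer
```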
So, yeah. Rsync, I love it when you work.
It took some time and generated a lot of disk activity when the process started (both ends had to read through the entire file to checksum it), but it worked almost painlessly and only transferred the data I needed – thus leaving the shared network resource free for everyone else.
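Part of what lets rsync send so little is its weak "rolling" checksum, described in Tridgell and Mackerras's rsync technical report. The checksum can be updated in constant time as a window slides forward one byte, so every offset of the file can be checked cheaply against the receiver's block checksums. Below is a minimal Python sketch of that checksum, with a helper that finds a block at an arbitrary offset. The function names are hypothetical, and the real implementation differs in details: it packs `a` and `b` into one 32-bit value, keeps block checksums in a hash table, and confirms candidates with a strong hash rather than a byte comparison.

```python
from typing import Tuple

MOD = 1 << 16  # the weak checksum's two halves are kept modulo 2**16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """(a, b) for a block: a = sum of bytes, b = position-weighted sum."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide a length-n window forward by one byte in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def find_block(data: bytes, block: bytes) -> int:
    """First offset where `block` occurs in `data`, or -1.

    The cheap weak checksum is compared at every offset; only offsets
    whose checksum matches are confirmed with a full comparison.
    """
    n = len(block)
    if n == 0 or n > len(data):
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    for offset in range(len(data) - n + 1):
        if (a, b) == target and data[offset:offset + n] == block:
            return offset
        if offset + n < len(data):
            a, b = roll(a, b, data[offset], data[offset + n], n)
    return -1
```

Because the search doesn't depend on block alignment, inserted or deleted bytes earlier in a file don't stop later blocks from matching.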