How to copy a disk image of a server to Amazon AWS S3
Does anyone know how to copy a disk image of a whole Linode server to a bucket in Amazon AWS S3?
I have tried following all the articles in the library, logged support tickets, and spoken to a few people at Linode, and no one has been able to tell me how to do it successfully.
The latest I heard from Linode support was that it is not possible to do this since Amazon S3 does not support SSH.
Please note that I do not want to copy the data to my local machine before pushing it to S3. I just want to copy directly from Linode to S3.
Is this possible?
Can you please give me detailed instructions on how I would do it?
Thanks
Lance
14 Replies
You could boot into rescue mode and use s3cmd to push the disk image straight to S3. For a 20 GB image with the default network profile, this should take no less than an hour.
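Back-of-the-envelope, assuming the default outbound cap is 50 Mbit/s (that figure is an assumption on my part):

# rough transfer-time estimate, assuming a 50 Mbit/s outbound cap
# 20 GB is about 20 * 1024 * 8 Mbit; divide by 50 Mbit/s, then by 60 for minutes
echo $(( 20 * 1024 * 8 / 50 / 60 ))
# prints 54, i.e. roughly an hour once you add protocol overhead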
How do you use s3cmd to copy from Linode to S3? How would you issue this command? Are you sure this is possible to do in rescue mode?
thanks
blueprint
etckeeper, and/or chef
duplicity
tarsnap
For sending stuff to S3, I find it's easier to use whatever you normally use to send stuff to S3. For me, it's s3cmd, but there are probably others out there.
In the interest of science, I just deployed a fresh Linode (with a 10 GB disk image – I'm not made of money here, yo), booted up Rescue mode, and ssh'd to lish. Long story short, I couldn't make it work. My first attempt was to install s3cmd (apt-get update; apt-get install s3cmd; s3cmd --configure) and try to put the file. It returned immediately, having done nothing:
root@hvc0:~# s3cmd mb s3://awesome-bucket-of-science
Bucket 's3://awesome-bucket-of-science/' created
root@hvc0:~# s3cmd put /dev/xvda s3://awesome-bucket-of-science/disk.img
root@hvc0:~#
So I installed Boto 2.0 from the repository (apt-get install python-boto) and tried to upload that way. It, too, failed, but after doing much more:
root@hvc0:~# python
Python 2.7.2+ (default, Aug 16 2011, 07:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from boto.s3.connection import S3Connection
>>> conn = S3Connection('<aws access key>', '<aws secret key>')
>>> bucket = conn.create_bucket('awesome-bucket-of-science')
>>> from boto.s3.key import Key
>>> k = Key(bucket)
>>> k.key = 'disk.img'
>>> k.set_contents_from_filename('/dev/xvda')
... a long pause here ...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 713, in set_contents_from_filename
policy, md5, reduced_redundancy)
File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 653, in set_contents_from_file
self.send_file(fp, headers, cb, num_cb, query_args)
File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 535, in send_file
query_args=query_args)
File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 423, in make_request
override_num_retries=override_num_retries)
File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 618, in make_request
return self._mexe(http_request, sender, override_num_retries)
File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 584, in _mexe
raise e
socket.error: [Errno 32] Broken pipe</module></stdin></aws></aws>
I suspect it is dying when trying to find the mimetype and MD5 hash of /dev/xvda. So, I installed a newer version of Boto which has a set_contents_from_stream method to skip this:
root@hvc0:~# apt-get install python-pip
root@hvc0:~# pip install boto --upgrade
...
root@hvc0:~# python
>>> import boto
>>> conn = boto.connect_s3('<aws access key>', '<aws secret key>')
>>> bucket = conn.create_bucket('awesome-bucket-of-science')
>>> from boto.s3.key import Key
>>> k = Key(bucket)
>>> k.key = 'disk.img'
>>> fp = open('/dev/xvda', 'rb')
>>> k.set_contents_from_stream(fp)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 757, in set_contents_from_stream
% provider.get_provider_name())
boto.exception.BotoClientError: BotoClientError: s3 does not support chunked transfer</module></stdin></aws></aws>
So, nope. I think it can certainly be made to work, but I've spent an hour on this and couldn't get it to upload my /dev/xvda, so it's your turn to play around with it for a while! -rt
Thanks a lot for giving it a good shot. I've spent hours trying to get this to work and have not found a way to do it. Trust me, I tried very hard prior to putting a request on here.
Perhaps someone else may have a solution?
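One untested idea, sticking with the rescue-mode approach above instead of fighting boto: upload the device in fixed-size pieces with plain s3cmd, since it handled regular files fine. The bucket name, piece size, and temp path below are only illustrative, and rescue mode has very little scratch space, so the piece size has to fit wherever the temp file lives:

i=0
while true; do
    # copy the next 256 MiB of the device into a temp file
    dd if=/dev/xvda of=/tmp/piece bs=1M count=256 skip=$(( 256 * i ))
    # stop once we have read past the end of the device
    [ -s /tmp/piece ] || break
    s3cmd put /tmp/piece s3://awesome-bucket-of-science/disk.img.part$(printf '%03d' $i)
    i=$(( i + 1 ))
done
rm -f /tmp/piece

Restoring would mean fetching the pieces in order and writing each one back with dd and the matching seek offset.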
Data should already be getting backed up nightly (or more often depending on the data).
Starting a fresh VPS from scratch should be scripted out (or if a once in a while process, documented THOROUGHLY).
In the case you need to spin up a brand new Linode, it will be way faster to start a fresh VPS, run the setup scripts (or do so manually from your config documentation), and restore the data than it will be to prep a new VPS, set up the empty partition, and copy back a boatload (e.g. 20 GB) of image data.
I don't see why a proprietary image of Linode's VPS setup stored offsite is that much of an asset.
@Guspaz:
The proper approach from a Linode perspective is probably to store custom data (like a tarball of your web root, or a latest backup of the databases, or whatnot) somewhere that you can pull it down (like S3, or a "master" linode), and then write a stack script that gets the right packages and config settings going, then pulls down the tarball containing the necessary custom files; this is very simple to do.
When that's done spinning up a new linode is as simple as just creating a new linode and selecting the stackscript, wait a few minutes and poof, out pops a fully configured and ready-to-go linode.
There are, of course, other solutions (the cat often suggests Chef, I believe), but for a relatively simple setup, writing your own stack script is probably the easiest thing since it requires no infrastructure (since Linode provides it already).
The reason is that I want to shut down my linode for a while. I'm not using it now and I'm not sure if I'm going to. But in case I do want to have it back I want an easy way to store it and bring it back at some point down the road. I don't want to keep paying $20/mo if I'm not using it.
@Guspaz:
The proper approach from a Linode perspective is probably to store custom data (like a tarball of your web root, or a latest backup of the databases, or whatnot) somewhere that you can pull it down (like S3, or a "master" linode), and then write a stack script that gets the right packages and config settings going, then pulls down the tarball containing the necessary custom files; this is very simple to do.
When that's done spinning up a new linode is as simple as just creating a new linode and selecting the stackscript, wait a few minutes and poof, out pops a fully configured and ready-to-go linode.
There are, of course, other solutions (the cat often suggests Chef, I believe), but for a relatively simple setup, writing your own stack script is probably the easiest thing since it requires no infrastructure (since Linode provides it already).
I guess if I have no idea how to write a script, I'm at a loss. I have no background in sysadmin or programming. Just simple folk.
Make step-by-step documentation (fresh VPS, install Apache, install PHP, install ….)
Document the config setups (and make a copy of them to a thumb drive).
And of course your data should be backed up already (and document that process as well).
Then if you need to do a bare metal restore, you have the step by step process (with examples and copies on your thumb drive) to do so.
The key to this method is not to skip ANY step. What seems blatantly obvious at this point in time will be a muddled faint memory 6 weeks/months/etc from now.
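To make the stack script suggestion above concrete, the kind of script being described boils down to something like this sketch; the package list, bucket name, object name, paths, and credential variables are all invented for illustration:

#!/bin/bash
# hypothetical rebuild script: install the stack, then pull the site data back from S3
apt-get update
apt-get -y install apache2 php5 libapache2-mod-php5 s3cmd

# fetch the tarball that was stashed in S3 beforehand (bucket and key are made up);
# AWS_KEY / AWS_SECRET would come from wherever you keep credentials
s3cmd --access_key="$AWS_KEY" --secret_key="$AWS_SECRET" \
    get s3://my-config-bucket/webroot-latest.tar.gz /tmp/webroot-latest.tar.gz
tar -xzf /tmp/webroot-latest.tar.gz -C /var/www

service apache2 restart

Pair that with the step-by-step documentation above and a bare-metal rebuild takes minutes instead of shuttling a 20 GB image around.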
I was able to do this, though it's still hacky. After rebooting into rescue mode:
update-ca-certificates
apt install python3-pip
pip3 install s3cmd
s3cmd --configure
dd if=/dev/sda | s3cmd put - s3://bucket/linode.img
The important part being that since v1.5, it supports stdin via - as the file name.
https://serverfault.com/a/690328
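A variation on the same idea that cuts both the stored size and the transfer time is to compress on the way up. The bucket name is a placeholder, and the restore line assumes your s3cmd also accepts - as the destination for get (writing the download to stdout):

# compress while streaming up
dd if=/dev/sda | gzip | s3cmd put - s3://bucket/linode.img.gz

# restore later, if your s3cmd supports - as the get destination
s3cmd get s3://bucket/linode.img.gz - | gunzip | dd of=/dev/sda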
You can also verify the checksum, though it's a bit involved, because Amazon calculates it per part for multipart uploads. You need to pay attention to the part size (in my case 15 MB) and the number of parts (in my case 201).
for i in {0..200}; do echo $i; dd bs=1M count=15 skip=$((15*$i)) if=/dev/sda | md5sum | grep -o "^[^ ]*" >> checksums.txt; done
https://stackoverflow.com/a/19896823
Then compare the final checksum locally:
# xxd -r -p checksums.txt | md5sum
fa1c909f001e2ca5e21c64e51e0a7be6 -
To the one from Amazon (ignore the -201 on the end, that's the number of parts):
# s3cmd ls --list-md5 s3://bucket/linode.img
2019-04-03 22:18 3149922304 fa1c909f001e2ca5e21c64e51e0a7be6-201
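To avoid comparing the two values by eye, the check can be scripted; the awk field number matches the ls --list-md5 output format shown above, and the bucket and object names are the same placeholders:

local_md5=$(xxd -r -p checksums.txt | md5sum | awk '{print $1}')
remote_md5=$(s3cmd ls --list-md5 s3://bucket/linode.img | awk '{print $4}' | cut -d- -f1)
[ "$local_md5" = "$remote_md5" ] && echo "checksums match" || echo "MISMATCH"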