Shared disk / synced disk between Linodes
When a file is uploaded to one server, I'd like it to magically also be on the other server (for example when a user updates their profile picture).
How do I do that? NFS? Hadoop? GlusterFS? At the moment the web server shells out to rsync, which is nasty nasty nasty.
Thanks in advance.
14 Replies
@graham:
When a file is uploaded to one server, I'd like it to magically also be on the other server (for example when a user updates their profile picture).
If you're willing to accept some small latency, I like Unison for this - keeps two filesystem trees in sync, even bi-directionally. I typically set up a periodic (say 1-5 minutes depending on requirements) cron job that automatically syncs the two locations.
If uploads will always go to one server, you can use Unison in a purely mirrored approach, but if they might get updated on either server depending on which one handled the web request, Unison will figure out which needs to be updated.
You can certainly also trigger Unison from the web server so it only happens following a known update or to cut down on latency during an update.
It'll work fine directly over two filesystems, if you have the remote system mounted via NFS, for example, but can also run just fine over SSH.
– David
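For anyone who wants something concrete, here is a minimal sketch of the periodic Unison job described above, written in Python so it can be called from cron or from the web application after an upload. The paths and the host name `media.example.com` are made up for illustration, and the choice of `-prefer newer` for conflicts is an assumption, not something the post specifies.

```python
import subprocess

# Hypothetical locations; substitute your own paths and host.
LOCAL_ROOT = "/srv/uploads"
REMOTE_ROOT = "ssh://media.example.com//srv/uploads"


def unison_command(local_root, remote_root):
    """Build a non-interactive, two-way Unison invocation."""
    return [
        "unison", local_root, remote_root,
        "-batch",            # never prompt; suitable for cron
        "-times",            # propagate modification times
        "-prefer", "newer",  # assumption: on conflict keep the newer copy
    ]


def sync_once(local_root=LOCAL_ROOT, remote_root=REMOTE_ROOT):
    """Run one sync pass and return Unison's exit status."""
    return subprocess.run(unison_command(local_root, remote_root)).returncode
```

Calling `sync_once()` from a script that cron runs every minute or so gives the mirrored setup; calling it right after an upload handler finishes gives the triggered variant.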
I'd recommend Dropbox (LAN syncing is supported in the experimental builds), but a lot of people don't like using that for business purposes since the data is stored by a third party (Dropbox on Amazon S3).
That, and their product seems to be getting more bloated and slower… IMHO.
Still a great service if indeed you need to sync 25+ Gigs of data to multiple locations.
@Guspaz:
Unison is rsync-like (and in fact uses the rsync algorithm). The OP mentioned that he's not pleased with rsync as a means to synchronizing the two environments, presumably due to the delay between syncs and the overhead of having to scan everything each time.
Actually, it's not clear what was "nasty nasty nasty" - I assumed it was the need to shell out the rsync command from within the web server environment. I'd be surprised if it was the rsync algorithm itself, which is well suited to synchronization duties, and to be honest, something along its lines should be desired in any synchronization solution if only to cut down on bandwidth.
Yes, while Unison uses an implementation of the rsync algorithm as part of its transfer methodology, it's not really fair to equate the overall applications. For one big difference, Unison is bi-directional, while rsync (the utility) is uni-directional. Also, Unison can recognize file moves, and not send any file contents or deltas at all in that case. It also keeps a cached state, so runs across even very large filesystems with minimal changes can be very efficient.
– David
I have a pretty little graphic you can take a look at of our general setup.
rsync is great for eventual consistency, especially backups, whereas I would like near-realtime updates, under a second. The uploaded image needs to get to the media server before the next request comes in from the same user.
I will investigate unison, thanks for the recommendation.
On dedicated hardware I'd set up DRBD if it's a simple pair.
Otherwise it'd be NFS.
Only problem is with two servers, if one dies the other loses the data so it's not really replicated data, but merely copied.
For replicating the data in real time, the only thing I can think of is DRBD, which still should not be too intensive, as its load is only as heavy as the actual writes.
Dropbox is far faster, since it notices files as soon as they change and syncs them, but there's added latency while one machine syncs to Dropbox's service, and then back down to the other machine. Sync times between machines when a file changes are probably 1-5 seconds (on a reasonably sized file), but that seems too slow for your desires.
Looks like you're going to have to go with some sort of filesystem-based solution.
DRBD isn't what you want, and is unusable for you. It's network RAID-1, which is great for reliability, but only one machine can mount the virtual disk at any given time. So you can't use it to share data.
One of the concerns about various networked or replicated file systems seems to be that if one machine is down, then the other machine may not be able to access the file system. However, if your setup has the client machine relying on the server machine anyhow, that may not be relevant; the client couldn't do anything anyhow.
It's certainly not realtime, but just to add a data point, I can run Unison to sync two trees on machines at different physical locations (connected by an OpenVPN tunnel, which Unison runs ssh over) with sub-second run times when there are no changes. This is for a filesystem tree with about 6000 files and 8-9GB of data. I run Unison via a cron job that checks for an existing copy running, so you can run it at a small time interval without worrying about the occasional longer runtime needed when some big files change.
Mixing the two approaches would avoid any single point of failure dependency (such as with a common central filesystem), and provide a safety net if the real-time update got lost for some reason (temporary disconnection between servers), so that you'd know the two systems would eventually come back in sync once re-connected.
– David
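The "checks for an existing copy running" part of the cron job above could look roughly like this. It is only a sketch: it uses an advisory `flock` on a lock file, which is one common way to do it and not necessarily what the poster uses, and the lock path is a made-up example.

```python
import fcntl
import subprocess

LOCK_PATH = "/tmp/unison-sync.lock"  # hypothetical lock file location


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusively(command, lock_path=LOCK_PATH):
    """Run command unless a previous run still holds the lock.

    Returns the command's exit status, or None if this run was skipped.
    """
    lock = try_lock(lock_path)
    if lock is None:
        return None
    try:
        return subprocess.run(command).returncode
    finally:
        lock.close()  # closing the file releases the lock
```

With this wrapper the cron interval can be short: a run that starts while a long sync is still in progress simply returns `None` and exits instead of piling up.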
@bezerker:
On dedicated hardware I'd set up DRBD if it's a simple pair.
DRBD replicates disks at the block level. Just be aware that this means one system MUST be read-only unless you use a cluster-aware file system. So DRBD isn't a complete solution for what you want to do (write from both nodes). Commonly used cluster file systems are GlusterFS, GFS, and OCFS. NFS is NOT a cluster-aware file system.
@Vance:
Instead of using rsync to scan and sync an entire directory tree, you could use inotifywait in a shell script and just sync the file that's changed.
Or incron
http://ubuntuforums.org/showthread.php?t=249889
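A rough sketch of the inotifywait idea quoted above, in Python instead of a shell script: watch the upload directory and push only the file that changed. The directory, the remote target and the choice of events (`close_write`, `moved_to`) are assumptions for illustration, and it needs `inotifywait` (from inotify-tools) and `rsync` installed.

```python
import os
import subprocess

WATCH_DIR = "/srv/uploads"                 # hypothetical upload directory
REMOTE = "media.example.com:/srv/uploads"  # hypothetical rsync destination


def rsync_command(changed_path, watch_dir=WATCH_DIR, remote=REMOTE):
    """Build an rsync call that copies one file, keeping its relative path."""
    relative = os.path.relpath(changed_path, watch_dir)
    # With --relative, rsync recreates the part of the path after "/./"
    # underneath the remote directory.
    source = os.path.join(watch_dir, ".", relative)
    return ["rsync", "-az", "--relative", source, remote + "/"]


def watch_and_push(watch_dir=WATCH_DIR, remote=REMOTE):
    """Push each file as soon as it is fully written or moved into place."""
    watcher = subprocess.Popen(
        ["inotifywait", "-m", "-r", "-q",
         "-e", "close_write", "-e", "moved_to",
         "--format", "%w%f", watch_dir],
        stdout=subprocess.PIPE, text=True)
    # inotifywait prints one path per line; file names containing
    # newlines are not handled by this sketch.
    for line in watcher.stdout:
        subprocess.run(rsync_command(line.rstrip("\n"), watch_dir, remote))
```

Running `watch_and_push()` as a long-lived process keeps rsync out of the web request path, though whether it gets a file across in under a second depends on the SSH connection setup time between the two machines.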
It's working smoothly so far.
Additionally a nightly cron on each NFS client copies the contents of the NFS share to local disk, as a backup.
Thanks again for all the help and advice.
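For reference, the nightly copy from the NFS share to local disk mentioned above could be as small as the following. This is a guess at the shape of such a job, not the actual script: it assumes Python 3.8+ and that leaving files in the backup after they are deleted from the share is acceptable.

```python
import shutil


def backup_share(share_dir, backup_dir):
    """Copy everything under share_dir into backup_dir, overwriting older copies.

    Files deleted from the share are NOT removed from the backup.
    """
    return shutil.copytree(share_dir, backup_dir, dirs_exist_ok=True)
```

A nightly cron entry on each NFS client would then call something like `backup_share("/mnt/media", "/var/backups/media")`, where both paths are placeholders.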