Large Scale Deployments: What's your Linode recipe?

Hello all:

I have been approached by a client to put together a business plan for a large-scale service: a membership-only, teaching-oriented website that will rely heavily on protected video content. We will be developing on Drupal 7 and running Movie Masher as a video encoder/decoder server app on EC2 for backend processing.

I want to keep the non-video side of the project in Linode for many reasons too long to explain here, but I have never really set up something as large as what these people are asking for. 42k 30 second video clips, 10k memberships with an extra free second tier membership level of an additional 3.5k. And this is just for the first sex months only. They expect it to grow to 200k within one year with 120k videos. The videos will be protected, not public though.

After a few days trying to figure this out (starting with whether I should even be looking at Linode rather than colocated servers), I thought perhaps some of you may be running large-scale setups and would be kind enough to share the basics. How many nodes? Are you running the databases separately? Using NodeBalancers? How many? A CDN, and if so, which one? Storing at Linode or S3?

Money is not a problem, so knock yourselves out. This is how I see it in my head as of right now:

One Linode 2048 to run the front Drupal site.

One Linode 20GB to serve as storage for video files, OR an S3 account at Amazon.

One Linode 20GB to run the MySQL database.

2 or 3 NodeBalancers, one for each Linode used.

One EC2 at Amazon to run the MM server.

Not sure how to implement the CDN, or whether one is even needed, since none of this content will be public. I would imagine I would probably want one.

Or should I go the colocated route?

Anyway, any feedback would be much appreciated.

8 Replies

Personally, on any large project, we colocate. That way we have complete control over the hardware (CPU, RAM, storage) and we know exactly what is running on the box (no pesky virtual neighbors to make troubleshooting harder than it needs to be).

Hardware upgrades are a one-time investment instead of a monthly hit. Storage is what you need, not what's available. And all 8 CPU cores are all yours baby, sharing is for sissies (just kidding).

That way you can buy/build your own custom boxes and just shop for decent bandwidth.

Don't take this the wrong way, absolutely nothing wrong with Linode (they're the best VPS provider out there), but you'd be trying to make a large screwdriver into a jack hammer going the VPS route (in my opinion).

Downside is you really need to have warm spare hardware available (either in the data center or ready to ship out depending on your downtime tolerance).

Of course, the advantage of VPS systems is that they can be spun up or down quickly and cheaply, so you could always go the Linode route first, and then migrate to colocation when you start to outgrow their storage/bandwidth.

That would allow you to debug your business model without worrying about the hardware, and then move to phase 2 if and when you need to.

Or you could always check with your Magic 8 Ball and see what that says. Free advice (even from the fine folks here on Linode Forums) is still just free advice.

@kannary100:

Hello all:

I have been approached by a client to put together a business plan for a large-scale service: a membership-only, teaching-oriented website that will rely heavily on protected video content. We will be developing on Drupal 7 and running Movie Masher as a video encoder/decoder server app on EC2 for backend processing.

I want to keep the non-video side of the project in Linode for many reasons too long to explain here, but I have never really set up something as large as what these people are asking for. 42k 30 second video clips, 10k memberships with an extra free second tier membership level of an additional 3.5k. And this is just for the first sex months only. They expect it to grow to 200k within one year with 120k videos. The videos will be protected, not public though.

The odd thing is that Linode is much better suited to encoding videos than EC2, because far more CPU power is available at Linode relative to cost. I'm not sure why you'd want to run only the video encoding at Amazon, unless the cost savings from hourly billing (frequently scaling the number of EC2 nodes up and down) are so enormous as to outweigh the CPU advantage at Linode. Storage is much cheaper at Amazon, though.

@kannary100:

After a few days trying to figure this out (starting with whether I should even be looking at Linode rather than colocated servers), I thought perhaps some of you may be running large-scale setups and would be kind enough to share the basics. How many nodes? Are you running the databases separately? Using NodeBalancers? How many? A CDN, and if so, which one? Storing at Linode or S3?

Your large number of discrete clips sounds like a good candidate for S3, so long as you use it directly rather than trying to use some sort of POSIX filesystem layer. Linode is very high performance, but currently has a very high storage cost ($1.00/GB). S3 is cheaper for storage ($0.083 - $0.140 per GB depending on redundancy and volume, assuming under 50TB), but a bit more expensive for transfer ($0.12/GB for the first 10TB, $0.09/GB for the next 40TB; Linode is $0.10/GB generally, but is willing to offer discounts for bulk use).

Basically, comparing between the two, you'd need to figure out which is cheaper. There's also the complexity question: Linode servers would serve the data off a regular hard drive and web server, while S3 requires more coding on your part.
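To make the comparison concrete, here is a rough sketch of the arithmetic in Python using the per-GB prices quoted above. The 5 MB clip size and 2,000 GB of monthly transfer are invented placeholders, 1 TB is treated as 1,000 GB, and the Linode storage figure is treated as a monthly price, so plug in your own numbers before drawing conclusions.

```python
# Rough monthly cost comparison using the per-GB prices quoted in this thread.
# Workload numbers (clip size, monthly transfer) are assumptions for illustration.

def s3_transfer_cost(gb):
    """Tiered transfer: $0.12/GB for the first 10 TB, $0.09/GB for the next 40 TB."""
    if gb > 50_000:
        raise ValueError("prices beyond 50 TB are not covered here")
    first_tier = min(gb, 10_000)
    second_tier = max(gb - 10_000, 0)
    return first_tier * 0.12 + second_tier * 0.09

def s3_monthly_cost(storage_gb, transfer_gb, storage_price=0.14):
    """Storage at the top quoted rate ($0.14/GB) plus tiered transfer."""
    return storage_gb * storage_price + s3_transfer_cost(transfer_gb)

def linode_monthly_cost(storage_gb, transfer_gb):
    """Storage at $1.00/GB plus transfer at $0.10/GB, ignoring bulk discounts."""
    return storage_gb * 1.00 + transfer_gb * 0.10

clips = 42_000
clip_size_mb = 5                            # assumption: average encoded clip size
storage_gb = clips * clip_size_mb / 1000    # total stored video in GB
transfer_gb = 2_000                         # assumption: monthly outbound transfer

print(f"S3:     ${s3_monthly_cost(storage_gb, transfer_gb):,.2f}")
print(f"Linode: ${linode_monthly_cost(storage_gb, transfer_gb):,.2f}")
```

With a storage-heavy workload S3 tends to come out ahead; as transfer grows relative to storage, the gap narrows because Linode's per-GB transfer price is lower than S3's first tier.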

When pricing, remember that Linode pools bandwidth among all linodes and datacenters, so if you've got a 20GB MySQL box, that's adding 2000GB per month of bandwidth to the pool, but consuming none (data transfer within a Linode datacenter is free if using the private network, although between datacenters is not free). Of course, the 20GB linode is not priced for bandwidth efficiency, the bandwidth stops scaling linearly after the 4GB linode. But you can always try to work something out with Linode if you're going to be buying a lot of bandwidth.

@kannary100:

Money is not a problem, so knock yourselves out. This is how I see it in my head as of right now:

One Linode 2048 to run the front Drupal site.

One Linode 20GB to serve as storage for video files, OR an S3 account at Amazon.

One Linode 20GB to run the MySQL database.

2 or 3 NodeBalancers, one for each Linode used.

One EC2 at Amazon to run the MM server.

Not sure how to implement the CDN, or whether one is even needed, since none of this content will be public. I would imagine I would probably want one.

Or should I go the colocated route?

Anyway, any feedback would be much appreciated.

In terms of NodeBalancers, I think you've got the relationship backwards. One NodeBalancer sits in front of multiple Linodes, directing traffic between them. They only do web traffic, so it only makes sense to put NodeBalancers in front of things serving up web data (Drupal Linodes, video-serving Linodes). You'd have multiple Linodes running Drupal in a cluster, with a NodeBalancer to load balance traffic between them.
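As a toy illustration of that one-to-many relationship, here is a minimal round-robin sketch in Python. It is not how a NodeBalancer is actually implemented, and the class and backend names are made up; it only shows a single balancer spreading requests over several backends.

```python
from itertools import cycle

class RoundRobinBalancer:
    """One balancer in front of many backends, handing out requests in turn."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def route(self, request):
        # Pick the next backend in the rotation for this request.
        return next(self._rotation), request

# Hypothetical cluster of three Drupal Linodes behind one balancer.
balancer = RoundRobinBalancer(["drupal-1", "drupal-2", "drupal-3"])
for path in ["/", "/login", "/videos", "/account"]:
    backend, _ = balancer.route(path)
    print(f"{path} -> {backend}")
```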

Linode Staff

@Guspaz:

They only do web traffic

NodeBalancers can balance any TCP protocol (whether or not it makes sense to do so).

https://library.linode.com/linode-platform/nodebalancer-reference

-Chris

@caker:

NodeBalancers can balance any TCP protocol (whether or not it makes sense to do so).

That makes me curious: Will they do UDP as well? I'm not using UDP, just something I'm curious about in case I do.

@caker:

@Guspaz:

They only do web traffic

NodeBalancers can balance any TCP protocol (whether or not it makes sense to do so).

https://library.linode.com/linode-platform/nodebalancer-reference

-Chris

OK, sure, but you can't just put them in front of a random TCP app and expect them to work. You can do that with HTTP. For TCP, you'd need to verify that the app is compatible with that sort of connectivity.

Amazon S3 can generate unique time-limited URLs. That way, you can keep your video files non-public and only show them after your app has verified the user's credentials. I'm not sure whether this functionality integrates with their CloudFront CDN, though. (Of course there's nothing wrong with serving videos from S3 without using CloudFront. Just a little more latency. If you use your app to generate time-limited URLs, you're already sacrificing latency anyway.)
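To show the general idea behind time-limited URLs, here is a small sketch using only the Python standard library. It is not S3's actual signing scheme (for S3 you would use the pre-signed URL support in an AWS SDK); the secret key, paths, and function names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical shared secret

def _signature(path, expires):
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def sign_url(path, ttl_seconds, now=None):
    """Return the path with an expiry timestamp and an HMAC signature appended."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{path}?{query}"

def verify_url(url, now=None):
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(parsed.path, expires)
    return hmac.compare_digest(signature.encode(), expected.encode())

# The app checks the member's credentials first, then hands out a short-lived link.
link = sign_url("/videos/clip-001.mp4", ttl_seconds=300)
print(link, verify_url(link))
```

Because the path and expiry are both covered by the signature, a member cannot extend the lifetime or swap in a different clip without invalidating the link.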

I agree with @Guspaz that a Linode generally has better CPU power than an EC2 instance. But if you constantly use all of your CPU, your virtual neighbors might not be happy. Better use a huge Linode for video encoding. Or better yet, get your own 24-core server.

@kannary100:

And this is just for the first sex months only.

First sex months are always the best months. :wink:

@Guspaz:

OK, sure, but you can't just put them in front of a random TCP app and expect them to work. You can do that with HTTP. For TCP, you'd need to verify that the app is compatible with that sort of connectivity.

The same restriction applies to HTTP, as well. Just because the HTTP connections themselves are short-lived and stateless over time doesn't mean the application is going to be happy being load-balanced. Indeed, that just makes it worse sometimes. Picking a random arbitrary situation I've had in the past, load-balancing IMAP is easier to do than load-balancing webmail over HTTP.

(Having horrible flashbacks to mbox over NFS, afk)

@hoopycat:

The same restriction applies to HTTP, as well. Just because the HTTP connections themselves are short-lived and stateless over time doesn't mean the application is going to be happy being load-balanced.

Exactly. The apps must not rely on any resource outside of the HTTP request-response chain.

Things like session files in PHP (or any other platform), temporary or permanent file/image uploads, database locations (localhost vs remote host), etc… must be considered with load balancing.
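Here is a toy Python simulation of the session problem, assuming a simple round-robin balancer. The class and names are invented for illustration; the point is only that per-server session storage breaks as soon as consecutive requests land on different servers, while a shared store does not.

```python
from itertools import cycle

class AppServer:
    """A web server that keeps sessions either locally or in a shared store."""

    def __init__(self, name, session_store=None):
        self.name = name
        # Default: sessions live in this server's own memory,
        # much like PHP session files on a local disk.
        self.sessions = session_store if session_store is not None else {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)

def simulate(servers):
    """Log in on one server, then send the next request to the following one."""
    rotation = cycle(servers)
    next(rotation).login("abc123", "alice")
    return next(rotation).whoami("abc123")

# Each server has its own session storage: the second request loses the login.
local = simulate([AppServer("web-1"), AppServer("web-2")])

# Both servers share one store (standing in for a database or cache server).
shared_store = {}
shared = simulate([AppServer("web-1", shared_store), AppServer("web-2", shared_store)])

print("local sessions: ", local)
print("shared sessions:", shared)
```

The same reasoning applies to uploaded files and to anything configured as `localhost`: whatever one server writes must be visible to the others.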
