Alternatives to MySQL on Linode?
I am writing an API that'll be running on my Apache Linode instance. This API will not be calculation heavy, but it could be very read-heavy (many users requesting stored data). Since I have not written the database component of the project yet, I am looking at not using MySQL. There are several ports or improvements upon MySQL out there, and I was wondering what others have thought of the newer systems available.
For example: Percona ServerDrizzlebetterperformanceplentytestimonials
1) Ease of use for multiple applications: Whatever I end up using, I'd like to use for future projects as well. The current project will probably talk to the database with Perl, which I suspect can connect to any database, but large compatibility is a plus.
2) Ease of installation: I'm running Debian 6, and (for example) it appears Drizzle is not yet available through apt-get. None of my work is production-critical so I am okay playing with early versions of software, as long as it's not too painful to install and maintain.
3) Efficient use of memory: I am currently using a Linode 512, which also hosts a few other small programs. I will upgrade to a 1024 if my API ends up being very widely used, but I'd like to remain on a small host if possible.
4) Choosing a sane system: I'll be honest, I don't know enough about SQL to make an informed decision. For example. the benchmarks for Percona are more impressive, but it feels very "astroturfed" and it's hard to get solid information about it. And I've only looked at these two systems, I'm sure there are many more.
So, my open-ended question: which do I actually use?!
I only mentioned the two most obvious choices, but of late the SQL world seems to have exploded.
I would be interested to hear any thoughts or comments.
Thank you for your time
21 Replies
I'd also like to know a more scalable/faster mysql product to use.
Drizzle seems to be more for cloud use (and while linode is a cloud provider, one VPS does not a cloud make), and isn't updating from MySQL as it goes; they want to use MySQL only as a starting point. It's also not stable; they're still in "beta".
That leaves Percona and MariaDB. My understanding is that MariaDB ships with Percona's XtraDB storage engine (replaces InnoDB) anyhow, so that's not really an advantage either way. I'd personally lean towards MariaDB since it seems to want to hew closer to MySQL in terms of compatibility, but they're both drop-in replacements, so I can't really say which is better.
If not, you might want to take a look at PostgreSQL
Also, one word: caching. You don't want to do the same (read) query twice if you can avoid it. Any decent framework/ORM supports memcached(*), which is a great way to get key-value performance with relational consistency. -rt
(*) I'm using the word "decent" here like I'd use it in "Any decent RDBMS supports transactions", which is to imply that any RDBMS that doesn't support transactions is crap. (Hi there, defaults for MySQL pre-5.5)
@hoopycat:
Also, one word: caching. You don't want to do the same (read) query twice if you can avoid it. Any decent framework/ORM supports memcached(*), which is a great way to get key-value performance with relational consistency. -rt
I haven't played with memcached, but most RDBMS will do query caching if you do pass it the same read query twice; I'd tend to lean towards letting the RDBMS worry about caching rather than doing it by hand, but perhaps there are big performance improvements from using memcached rather than the database server's query cache?
@hoopycat:
(*) I'm using the word "decent" here like I'd use it in "Any decent RDBMS supports transactions", which is to imply that any RDBMS that doesn't support transactions is crap. (Hi there, defaults for MySQL pre-5.5)
Defaults or not, let's be fair, MySQL has supported transactions with InnoDB for over a decade. Then again, I frequently mock Apache for being completely out of touch with reality and choosing ludicrous httpd default settings, so default settings do have some weight.
On the MyISAM jab: Yes, I was hesitant to include it, but it's an important reflection on MySQL-as-a-project's decision-making process, and I did qualify it to reflect the change in 5.5. That, and Apache, have assured me that even blatantly dangerous and poorly-justified default settings in software won't get you sued out of existence.
HOWEVER, Pgsql relies a lot on filesystem cache and mmaps, so at least something is cached.
In my Python apps I can easily integrate Beaker caching framework with SQLAlchemy and basically have query-level caching.
@Azathoth:
Postgresql does not have query caching because such thing is theoretically impossible in the context of a transaction.
Umm, no, don't be silly. Transactions don't make query caching terribly more difficult, and a lot of transaction-supporting databases (SQL Server, MySQL, etc) do support it.
@Guspaz:
Umm, no, don't be silly. Transactions don't make query caching terribly more difficult, and a lot of transaction-supporting databases (SQL Server, MySQL, etc) do support it.
Well, I could be wrong of course, but as far as I know, PostgreSQL does not have query cache like MySQL for the same reason it can't cache the row count but has to count row by row – because each transaction can see different set of available rows. PostgreSQL's caching is done differently and relies on OS system cache.
Manual
PostgreSQL works differently. It just tells the OS to cache parts of the database file(s), and then reads from those files as usual. The OS decides whether to read from RAM or the hard drive. Unlike MySQL, PostgreSQL does not maintain a separate cache for itself, except for some things that make sense to keep in RAM at all times. This approach makes the caching system simpler and sometimes even faster, especially in write-heavy use cases. But you often need to fiddle with kernel parameters in order for PostgreSQL to cache things properly.
Or at least that's what I remember. Databases change all the time, so the caching behavior may be different in current versions.
@hybinet:
PostgreSQL works differently. It just tells the OS to cache parts of the database file(s), and then reads from those files as usual. The OS decides whether to read from RAM or the hard drive. Unlike MySQL, PostgreSQL does not maintain a separate cache for itself, except for some things that make sense to keep in RAM at all times. This approach makes the caching system simpler and sometimes even faster, especially in write-heavy use cases. But you often need to fiddle with kernel parameters in order for PostgreSQL to cache things properly.
That's just caching the disk access, not any of the data manipulation. A query cache doesn't cache the source data that is being accessed, it caches the results of the query. If there is math involved, or sorting, or even if you're just selecting a small amount of data from a large table, the query cache is going to save you all that work. A disk cache doesn't help with any of that.
@Guspaz:
That's just caching the disk access, not any of the data manipulation. A query cache doesn't cache the source data that is being accessed, it caches the results of the query. If there is math involved, or sorting, or even if you're just selecting a small amount of data from a large table, the query cache is going to save you all that work. A disk cache doesn't help with any of that.
You're absolutely right. But unless you want your database to return stale data, the database needs to keep track of a lot of things if it is going to maintain a cache for query results. MySQL does this in a rather rudimentary fashion: whenever anything changes in a table, invalidate all cache entries that are related to that table. Even this approach results in a non-negligible amount of overhead.
A disk cache, on the other hand, works fine even after a table changes, because the changed blocks are also cached by the OS. There is no need to invalidate any cache entries. So MySQL's query caching would be faster in read-heavy use cases, whereas PostgreSQL's approach would be faster in write-heavy use cases. (Of course, there are many other factors to DB performance, too.)
I don't think this characterization is too far off from the way the two databases are usually thought of. It seems, however, that there is now a program called PostgreSQL Query Cache
@Guspaz:
Transactions don't make query caching terribly more difficult
Can't speak to transactions, but MVCC, which both Innodb and Postgres use, do. See
The net of all of this is that caches some times help and sometimes hurt, they're hard to get right, and MySQL has query cache but Postgres does not. (There's a recent add-on query cache for Postgres but it's not well vetted at this point.)
@hybinet:
A disk cache, on the other hand, works fine even after a table changes, because the changed blocks are also cached by the OS. There is no need to invalidate any cache entries. So MySQL's query caching would be faster in read-heavy use cases, whereas PostgreSQL's approach would be faster in write-heavy use cases. (Of course, there are many other factors to DB performance, too.)
But MySQL is going to be using disk caching too. It may not tell the OS what parts to cache explicitly, but the OS is generally smart enough to say, this data is being accessed a lot, I should keep it in RAM.
I wrote a search engine a long time ago, and query caching was really handy. The data in the search engine wasn't refreshed all that often (well, we're still talking minutes, but there were a rather large number of queries between each DB refresh pass), and a large number of queries were very common searches. Certainly there was a bit of a long tail type thing going on, but in my case, query caching helped performance a ton (and I did compare).
Disk caching was also irrelevant in my case, since I was keeping the entire table in RAM (then called heap tables, now I think MySQL calls them memory tables). Basically, I did all my updates and inserts on a regular table, and after each set of inserts/updates (which happened in bursts ever so often), I'd nuke the heap table and replace it with a copy from disk. It was the first web programming I'd ever done (and the first time I'd touched a database in my life), so it was all pretty badly written, but it worked well enough at the time to handle a few hundred thousand hits a day on a crappy old celeron.
@Guspaz:
The data in the search engine wasn't refreshed all that often
@Guspaz:I was keeping the entire table in RAM.
Yep, that sounds like a case where the MySQL style of query caching would help a lot.
Guspaz was quite right that I had left out MariaDB; I believe I bumped into it after making this thread. The distinction between Percona and aren't very clear, and I get the feeling it's more a corporate offering than anything anyway. MariaDB is definitely a contender, then.
To answer hybinet's question about MySQL-compatibility: for this project, no, but in general, it'd be a plus. Main concern is that I intend to host some form of blog/CMS/forum software, and it seems everything is MySQL. I've heard many good things about PostgreSQL, but I have not researched it thoroughly. I haven't finalized the products I want to use that, hence my nervousness about breaking MySQL-compatibility. The memory of speed advantages would have to be significant (and maybe they are!).
To answer Ericson578's comment about nginx, one thing at a time eh?
The comments about caching are interesting, and I will have to research this topic more. Incidentally, further down the road I'll probably be making a thread about caching at the output/Apache level
(I'm on the waiting list.)
"Your application can instantly communicate with MemSQL using the MySQL protocol. The entire MySQL ecosystem remains at your disposal."
@ferodynamics:
memsql
Interesting. How is it different from using MySQL with the MEMORY storage engine? Would it keep my data safe if my server suddenly got Fremonted?
MemSQLNuoDB
Unfortunately, for the time being those are websites are for capital investment, not practical use :/
NuoDB at least has something, but it sounds a lot like MySQL Cluster
(I… think this is the nicest thing I've ever said about MySQL.)