Large Databases
I'm currently designing an online database and backend using PHP/MySQL where a client-side program uploads a large amount of data for each user. I'm rather new to MySQL programming, and am wondering about the best way to go about this. If some of you play Team Fortress 2 then you will understand; otherwise just assume each class is a different player type.
Basically I have 44 fields for every class, plus an ID and name for each, which comes to 46. With 10 classes that brings us to 460 fields, plus possibly a few more. So, I ask you this: would the best way to store this data be in one database that would be 460+ fields wide, or to store the data in 10 separate databases of 46 fields each? Note, when the page loads, all (or at least most) of the data will have to be retrieved.
Is it unusual for MySQL to have databases that may potentially have 100+ entries with 460+ fields each?
If anyone has a better way to go about this then I am willing to listen!
Thanks,
Smark
12 Replies
http://en.wikipedia.org/wiki/Database_normalization
Thus, have a table containing your list of users, but link that to a class table. The class table has the 10 classes of info (i.e. 10 records for each userID, each record with a different ID for each class).
If you want to extract information, you use a JOIN to link the data sets together.
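A minimal sketch of the layout deckert is describing, with hypothetical table and column names (kills/deaths stand in for whatever the real per-class fields are):

    CREATE TABLE users (
        user_id INT AUTO_INCREMENT PRIMARY KEY,
        name    VARCHAR(64)
    );

    CREATE TABLE class_stats (
        user_id  INT,   -- links back to users.user_id
        class_id INT,   -- a different ID for each of the 10 classes
        kills    INT,
        deaths   INT
        -- ...the rest of the per-class fields...
    );

    -- One query pulls a user's record together with all 10 class records
    SELECT u.name, c.class_id, c.kills, c.deaths
    FROM users u
    JOIN class_stats c ON c.user_id = u.user_id
    WHERE u.user_id = 1;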
–deckert
Thanks,
Smark
Are the ten sets of 46 fields the same for each class?
Can a user have more than one character?
@Smark:
Basically I have 44 fields for every class
Unlikely. You probably have 44 attributes for each class; an attribute isn't a field. I don't know Team Fortress, so I'll just use a pseudo-RPG for examples.
So a simple table could be:
classname,attributename,value
With this you'd have multiple rows for each class:
Fighter,Strength,10
Fighter,Speed,20
Fighter,Dexterity,15
etc etc etc. To get all the fighter attributes you'd do "select attributename,value from details where classname='Fighter'".
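The same example as actual MySQL (a sketch; the column types are assumptions):

    CREATE TABLE details (
        classname     VARCHAR(32),
        attributename VARCHAR(32),
        value         INT
    );

    INSERT INTO details (classname, attributename, value) VALUES
        ('Fighter', 'Strength',  10),
        ('Fighter', 'Speed',     20),
        ('Fighter', 'Dexterity', 15);

    SELECT attributename, value FROM details WHERE classname = 'Fighter';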
Now this is where you'd start to normalise data; you'd have one table listing classname and associating it with a number and maybe other info. So
1, Fighter, "This character has two handed swords as a feat"
2, Rogue, "Strong but clumsy"
3, Elf, "Tolkien is spinning in his grave"
You'd have another table listing attribute name and a number
1, Strength
2, Speed
3, Dexterity
Now your main table would be
1,1,10
1,2,20
1,3,15
2,1,25
2,2,5
2,3,10
3,1,5
3,2,30
3,3,40
Doesn't mean anything to humans, but this is a database, and with joins we've defined 3 classes and 3 attributes for each class.
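To turn those numeric rows back into something readable, you'd join the main table to the two lookup tables. A sketch, with assumed names for the three tables above:

    -- Assumed: classes(id, classname, info), attribs(id, attributename),
    -- main(class_id, attrib_id, value)
    SELECT c.classname, a.attributename, m.value
    FROM main m
    JOIN classes c ON c.id = m.class_id
    JOIN attribs a ON a.id = m.attrib_id
    ORDER BY c.classname, a.attributename;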
Your main table would be 440 rows long (your 10 classes with 44 attributes each).
This is a very small database.
But seriously, the problem with creating a gazillion fields is one of scalability, especially if there's a large number of attributes that could be selected from. What if character A's attributes are different from character B's? Maybe there's 20 in common, but the other 20-odd are different; now the table has to be 60+ fields wide, a lot of which are empty. It's just bad database design. Conversely, the design I described is scalable, extensible, normalized, maintainable, and all other good stuff.
(Actually, I am semi-serious about the perl hash; this is too small and too simple to use a database for; the overhead a database introduces is amazing)
@sweh:
If you want performance, then don't use a database at all. Since we're talking about tiny datasets, store them in plain text files and load them into memory.
Which is fine until you have two programs or instances accessing and modifying the data. At which point you either convert to a real DB or come up with your own half-assed locking and sharing scheme, which will make you crazy until you give up and convert to a real DB. And the performance thing is mostly irrelevant: the dataset, as described, is so small[1] that it will live in memory regardless of the implementation method. Let the OS and DB worry about caching and disk write-out; after all, a few people have worked on this problem. The footprint and setup of something like SQLite isn't all that big, if you're loading Perl anyway.
Steve
[1] Assuming that the 40+ fields aren't all multi-megabyte BLOBS, that is.
I ended up making a table something like this:
Name    ID  Class     Kills  Deaths
-------------------------------------
Smark   1   Scout     34     32
Smark   1   Medic     13     64
Smark   1   Heavy     76     12
Smark   1   Spy       24     9
Smark   1   Engineer  67     12
Smark   1   Demoman   12     53
Smark   1   Pyro      64     23
Smark   1   Soldier   12     12
Smark   1   Total     302    217
Reject  2   Scout     324    214
Reject  2   Medic     352    242
Reject  2   Heavy     894    252
Reject  2   Spy       124    25
Reject  2   Engineer  96     47
Reject  2   Demoman   121    256
Reject  2   Pyro      134    132
Reject  2   Soldier   324    231
Reject  2   Total     2369   1399
The table goes on and on with 10 entries per player, and it goes farther to the right with more fields; I just showed deaths and kills to give you the idea of how it worked…
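In MySQL terms that layout is roughly the following (column types are guesses):

    CREATE TABLE player_stats (
        Name   VARCHAR(32),
        ID     INT,
        Class  VARCHAR(16),
        Kills  INT,
        Deaths INT
        -- ...the remaining stat columns continue to the right...
    );

    -- Page load: fetch all 10 rows for one player at once
    SELECT * FROM player_stats WHERE ID = 1;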
@sweh:
If you want performance, then don't use a database at all.
Or learn how to use one? I do databases all day long, so I'll have to take some issue with your statement.
@sweh:
Since we're talking about tiny datasets, store them in plain text files and load them into memory.
If he's not worried about multiple processes accessing it, that should be ok.
@sweh:
But seriously, the problem with creating a gazillion fields is one of scalability, especially if there's a large number of attributes that could be selected from. What if character A's attributes are different to character B's?
Which was my original question. If A's attributes are different from B's, it would probably be a good idea to have a separate TABLE for each one, with a one-to-one relationship to the main TABLE. Then, a VIEW could join them all together, which is especially helpful for the common fields.
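A sketch of that arrangement; the class-specific columns here are hypothetical:

    CREATE TABLE main (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(32)
    );

    -- One-to-one with main (same id), holding class-specific fields
    CREATE TABLE scout_stats (
        id       INT PRIMARY KEY,
        kills    INT,
        deaths   INT,
        captures INT   -- hypothetical scout-only attribute
    );

    CREATE TABLE medic_stats (
        id      INT PRIMARY KEY,
        kills   INT,
        deaths  INT,
        healing INT    -- hypothetical medic-only attribute
    );

    -- A VIEW joins them back together, common fields and all
    CREATE VIEW all_stats AS
    SELECT m.id, m.name,
           s.kills AS scout_kills, s.deaths AS scout_deaths,
           d.kills AS medic_kills, d.deaths AS medic_deaths
    FROM main m
    LEFT JOIN scout_stats s ON s.id = m.id
    LEFT JOIN medic_stats d ON d.id = m.id;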
@sweh:
Conversely, the design I described is scalable, extensible, normalized, maintainable, and all other good stuff.
:-)
scalable - Not really. In practice, such a TABLE would probably kill performance, especially if any sort of aggregation was required.
extensible - If, and only if, only one value was required per entry.
normalized - Absolutely not. In this TABLE all values must be of the same data type.
maintainable - Let me guess, you're a coder who has never touched a database before, and figures as long as the statement can be written, the database is maintainable?
@sweh:
(Actually, I am semi-serious about the perl hash; this is too small and too simple to use a database for; the overhead a database introduces is amazing)
A database introduces overhead? I think you mean the connection adds it to the code. A DB connection adds no more to code than any library's API does.
Sweh, I don't mean to knock you here. It just sounds as if you don't touch databases too often, so even if the logic is correct, as far as databases are concerned, a long TABLE is definitely not the way to go.
Now, the OP wants to use MySQL, so I think all he wants is quick data access anyway and not a real database. But that is another story.
Yeah well. I've only got 12 years of commercial experience, mostly with Oracle (I used to DBA Oracle 7.1.6 and Oracle 8.1.3), recently with postgres. Complicated SQL still causes my brain to leak out of my ears, though, so I tend to do most logic in perl (which helps with portability as well). My current application consists of 50-odd tables, the longest of which is a couple of million rows long by 30 fields wide.
Your objections…

scalable - I was talking about scalability in terms of the number of attributes; a 460-field-wide table is bad.
extensible - More attributes can be added easily without modifying the schema (and the design can even take multiple rows per attribute, though that's not required in this problem).
normalized - Absolutely YES it is ("Name" and "Class" should both be lookups into secondary tables; I'm not sure you know what a normalized dataset actually is).
maintainable - Schema changes are bad when it comes to maintainability; my design didn't require any.

etc etc etc.
But, at heart, I'm a unix geek, and for data this small I'd still do it with text files.
But, shrug, each to their own. There's more than one way to skin a cat.
@Smark:
Thanks for the help everyone… After posting on a few other forums and talking with some people, it looks like my post misled everyone…
I ended up making a table something like this:
Name    ID  Class     Kills  Deaths
-------------------------------------
Smark   1   Scout     34     32
Smark   1   Medic     13     64
Smark   1   Heavy     76     12
Smark   1   Spy       24     9
Smark   1   Engineer  67     12
Smark   1   Demoman   12     53
Smark   1   Pyro      64     23
Smark   1   Soldier   12     12
Smark   1   Total     302    217
Reject  2   Scout     324    214
Reject  2   Medic     352    242
Reject  2   Heavy     894    252
Reject  2   Spy       124    25
Reject  2   Engineer  96     47
Reject  2   Demoman   121    256
Reject  2   Pyro      134    132
Reject  2   Soldier   324    231
Reject  2   Total     2369   1399
The table goes on and on with 10 entries per player, and it goes farther to the right with more fields; I just showed deaths and kills to give you the idea of how it worked…
I think you could still do with some work on that design: there is repetition, and repetition has repercussions down the line. For example, if a player changes their name you have to update every row that contains their name.
I would start with something like this.

One table for player info:
id
name

Another table to store kills info:
id
owner_id
class
kills
deaths

owner_id is a foreign key to id in the player table.
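In MySQL that would look something like this (column types are assumptions; the InnoDB engine is needed if you want the foreign key actually enforced):

    CREATE TABLE player (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(32)
    ) ENGINE=InnoDB;

    CREATE TABLE kills (
        id       INT AUTO_INCREMENT PRIMARY KEY,
        owner_id INT,
        class    VARCHAR(16),
        kills    INT,
        deaths   INT,
        FOREIGN KEY (owner_id) REFERENCES player (id)
    ) ENGINE=InnoDB;

    -- A name change now touches exactly one row:
    UPDATE player SET name = 'NewName' WHERE id = 1;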