thousands of files in one directory
I was thinking of creating multiple buckets to distribute the files based on hashing their id. But then, how many buckets do I need? Sub-buckets?
ReiserFS and ext3 both support b-tree searches. I read that ext3 supports around 10**20 files per directory, but I couldn't find any data for ReiserFS.
Anybody have experience doing this kind of thing?
5 Replies
Also, might be worth looking at pastebin's source code (
@funkytastic:
Anybody have experience doing this kind of thing?
I'd avoid extremes such as that. Even if the filesystem technically supports that many files in a single directory, various admin tools you may wish to use when working with that tree are likely to bog down, sometimes severely.
I'd certainly suggest sharding the set of files across one or more directory levels, depending on your expected scale. If you're in control of the filenames (say, assigning uuids or something), just create a few levels based on the initial characters. For example, with a uuid scheme, using 2-character directories (00-ff) with 2 levels gives you 256 x 256 = 65,536 leaf directories, so you can support a million files with an average leaf directory size of about 15, assuming even uuid distribution.
If you're only going to be in low hundreds of thousands, a single level of directories would still average only ~400 files in each leaf node per hundred thousand.
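To make the prefix scheme concrete, here is a minimal Python sketch. The function names (`shard_path`, `average_leaf_size`), the `/srv/images` root, and the sample id are made up for illustration; it assumes the ids are hex strings such as uuids.

```python
from pathlib import PurePosixPath


def shard_path(root, file_id, levels=2, width=2):
    """Return root/<aa>/<bb>/<file_id>, using the leading hex characters
    of file_id as directory names (hypothetical helper)."""
    # Drop the dashes of a uuid so the prefix consists of hex digits only.
    name = file_id.replace("-", "").lower()
    if len(name) < levels * width:
        raise ValueError("id is too short for the requested number of levels")
    parts = [name[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, file_id)


def average_leaf_size(total_files, levels=2, width=2):
    """Average files per leaf directory, assuming evenly distributed hex ids."""
    leaf_dirs = 16 ** (width * levels)  # 256 per level when width is 2
    return total_files / leaf_dirs


print(shard_path("/srv/images", "3f2a9c1e-0000"))  # /srv/images/3f/2a/3f2a9c1e-0000
print(round(average_leaf_size(1_000_000, levels=2), 1))  # 15.3
print(round(average_leaf_size(100_000, levels=1), 1))    # 390.6
```

The same arithmetic gives a rough answer to "how many buckets": pick the number of levels so that the expected file count divided by 256 per level stays in a range your tools handle comfortably.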
If you don't have control over the filenames, you may want to hash the filename and then use characters from the hash since otherwise common naming patterns could significantly skew the tree.
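A rough sketch of that hashing variant, again in Python. The helper name `hashed_shard_path` and the choice of SHA-1 are assumptions for the example; any hash with well-spread output would do, since it is only used to spread names across directories, not for security.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_shard_path(root, filename, levels=2, width=2):
    """Return root/<h1>/<h2>/<filename>, where the directory names come from
    the hex digest of the filename rather than the filename itself."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, filename)


# Names with a common prefix would all land in the same prefix directory,
# but their hashes spread them across the tree.
names = ["img_%05d.jpg" % n for n in range(10000)]
first_level = {hashed_shard_path("/srv/images", n).parts[-3] for n in names}
print(len(first_level), "first-level directories used")
```

The original filename is kept as the leaf name, so the path can always be recomputed from the name alone without a lookup table.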
– David
For scalability of mass image hosting, you would be better served pushing your images to Amazon S3 or Rackspace Cloud Files.
It's going to be cheaper for you in the long run when it comes to raw file storage (but potentially more for actual bandwidth), and Rackspace has a CDN built in, with no extra charge for bandwidth from Cloud Files to the CDN edge like there is with Amazon.
Amazon has better access controls.
This would be the more scalable way to do it, and the infrastructure is already there, so you don't have to reinvent it.