Max files in single directory

I'm running some nodes with Ubuntu 10.04, 32-bit, with an ext3 file system. Is there a limit to the number of files I can have in a single directory?

I've found conflicting information on this, so I thought I'd ask here. I know the limit isn't 32K files, because I've already got 136K files in one directory. I'm trying to figure out if there is any downside, or any limitation, to letting this continue to grow…

4 Replies

No, although sometimes dealing with the directory can get cumbersome and slow. Make sure you have the "dir_index" filesystem feature enabled, particularly if ls is starting to get quite sluggish.
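
If you're not sure whether it's enabled, something like this will tell you (the device name below is just a placeholder for whatever backs your filesystem):

```
# Check whether dir_index is enabled (replace /dev/sda1 with your device)
sudo tune2fs -l /dev/sda1 | grep dir_index

# Enable it if it's missing, then rebuild the hashed indexes for
# existing directories (the e2fsck -D pass needs the filesystem unmounted)
sudo tune2fs -O dir_index /dev/sda1
sudo e2fsck -fD /dev/sda1
```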

@gregr:

> I've found conflicting information on this, so I thought I'd ask here. I know the limit isn't 32K files, because I've already got 136K files in one directory. I'm trying to figure out if there is any downside, or any limitation, to letting this continue to grow…
You may have heard 32K mentioned because that's the limit in ext3 (ext4 raises it) on the number of sub-directories within a single directory, not on files.
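
If you want to see that limit for yourself, a throwaway test like this (run in a scratch directory on an ext3 mount) will stop with "Too many links" just under 32K:

```
# Throwaway test: keep creating subdirectories until ext3's
# link-count limit makes mkdir fail with "Too many links"
mkdir scratch && cd scratch
for i in $(seq 1 40000); do
    mkdir "d$i" || { echo "stopped at $i"; break; }
done
```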

There's no hard limit on files in a directory though, as long as you have free inodes on the filesystem. So it does depend on how much inode space was allocated when the filesystem was created, but that's usually more than enough for the practical number of files that would use up the actual data space. You can use "df -i" to see how things stand on the filesystem in question.
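
For example (the numbers and mount point here are illustrative, not from a real box):

```
$ df -i /var/data
Filesystem      Inodes   IUsed    IFree IUse% Mounted on
/dev/sda1      2621440  524288  2097152   20% /var/data
```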

As for downsides, it's mostly a question of performance at very large sizes, which in turn will depend on the applications being used and whether or not they become inefficient in processing very large numbers of files.

For myself, I don't tend to let single directories get into the hundreds of thousands, even if it works. In such cases, management can become more complex (getting efficient ls output, etc…). But there's normally a pretty easy way to slice up such storage.

For example, with such large sets of files there's usually some pattern to the naming, and if the distribution is reasonably even, you can create an extra level of sub-directory using the first character or two of the filename. A file with an arbitrary name is still trivial to locate (including its containing directory), but you divide things up into smaller chunks of files.
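
As a rough sketch of what that one-time migration can look like in bash (assuming filenames are at least two characters; the names here are made up):

```
# One-time migration: shard files into subdirectories named after
# the first two characters of each filename
for f in *; do
    [ -f "$f" ] || continue   # skip anything that isn't a regular file
    d="${f:0:2}"              # e.g. "ab" for "abc123.dat"
    mkdir -p "$d"
    mv -- "$f" "$d/"
done

# Lookup stays trivial: the path is derived from the name itself
name="abc123.dat"
cat "${name:0:2}/$name"
```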

-- David

If you've got 136k files in a single directory, you might want to be asking yourself if that should be living in a database instead.

Awesome - thanks for the info, guys.

> There's no hard limit on files in a directory though, as long as you have free inodes on the filesystem. So it does depend on how much inode space was allocated when the filesystem was created, but that's usually more than enough for the practical number of files that would use up the actual data space. You can use "df -i" to see how things stand on the filesystem in question.

Running df -i shows I'm using 20% of my available inodes, while I've used 50% of my disk space, so it seems there's no immediate danger of running out. Glad I know this now, though, so I can keep an eye on it. :-)

> As for downsides, it's mostly a question of performance at very large sizes, which in turn will depend on the applications being used and whether or not they become inefficient in processing very large numbers of files.

Makes sense. I do notice things are a little sluggish when trying to, say, auto-complete filenames from the shell, or do an ls of some sort; however, my application never tries to list the files - it always knows the exact filename it's looking for, and performance doesn't seem to be suffering.

> For example, with such large sets of files there's usually some pattern to the naming, and if the distribution is reasonably even, you can create an extra level of sub-directory using the first character or two of the filename. A file with an arbitrary name is still trivial to locate (including its containing directory), but you divide things up into smaller chunks of files.

Yep, that's actually what I'm planning to do, which also gives me natural partitions to move this out over multiple servers when it becomes necessary.

> If you've got 136k files in a single directory, you might want to be asking yourself if that should be living in a database instead.

It's actually done this way on purpose - it's essentially blob data; I always need an entire blob at a time, and it's not relational in any sense. In general, I've found relational databases to be the most expensive way (in terms of I/O and CPU) to store and access data like this, whereas in this particular case file system access is efficient and cheap.

Thanks everyone!
