Using Object or Block Storage for File Searches / Copying?
I'm trying to understand if I can use Linode Object or Block Storage for a very specific use case. I have tried looking through online documentation, including the Object Storage ebook, but am still not sure whether what I'm hoping to do is possible.
I have millions of pdf/doc/xls/etc files I am storing on a server. Right now, when I want to access certain files, I feed a list of partial file names into a Python script, which then finds the files and copies them into a new directory (e.g., "Search_results"). I then copy "Search_results" to my local machine so I can easily search and browse the file contents manually.
Would something like this be possible if I were to move all of these millions of files into Object storage? Basically, I need to be able to fetch specific lists of files, copy them into their own directory, and either download that directory or be able to search and browse the contents of that directory online. It seems like Object Storage is good for storing unstructured data and for accessing one file at a time, but I have no idea if I would be able to conduct file operations as described above. Any advice welcome!
1 Reply
✓ Best Answer
Hey there,
I'd say Object Storage might be the best solution for you here. It fits the massive amount of data you are working with, as you can have up to 50 million objects per data center and up to 50 TB of total storage. Also, since your files are unstructured, you don't need the filesystem structure provided by Block Storage.
It looks like the one caveat that makes Block Storage more appealing in this case is the ability to fetch lists of files rather than one file at a time. That said, I found a related Community Site post that gives some instruction on how to mount an S3 bucket onto a Linode using FUSE and s3fs:
How do I use s3fs on Linode Object Storage?
This way you'd be able to read Object Storage files on your system similarly to a normal directory. My impression given this other Community Site post is that wildcard matching would be supported in this case. I'm interested to hear how it would work out if you decided to test it, even with a smaller fraction of your files first.
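To give a rough idea of how your existing workflow might carry over once the bucket is mounted, here is a minimal sketch. It assumes the bucket is already mounted with s3fs at some path and that matching a partial name means a case-insensitive substring match on the file name. The function name `copy_matching_files` and the paths in the usage note are made up for illustration and are not taken from your script.

```python
import shutil
from pathlib import Path


def copy_matching_files(source_root, partial_names, dest_dir):
    """Copy files under source_root whose names contain any partial name.

    source_root is assumed to be an ordinary directory, for example the
    mount point of an s3fs-mounted bucket (an assumption, not a requirement).
    Returns the list of destination paths that were written.
    """
    source_root = Path(source_root).resolve()
    dest_dir = Path(dest_dir).resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Case-insensitive substring matching; empty strings are ignored
    # because they would match every file.
    needles = [name.lower() for name in partial_names if name]

    copied = []
    for path in source_root.rglob("*"):
        if not path.is_file():
            continue
        # Skip the results directory itself if it lives under source_root.
        if dest_dir in path.parents:
            continue
        if any(needle in path.name.lower() for needle in needles):
            # Note: files with the same name in different subdirectories
            # overwrite each other here, since the results are flattened.
            target = dest_dir / path.name
            shutil.copyfile(path, target)
            copied.append(target)
    return copied
```

With a hypothetical mount point of `/mnt/bucket`, a call might look like `copy_matching_files("/mnt/bucket", ["invoice_2019", "contract"], "Search_results")`. Walking millions of objects through a FUSE mount may be slow, which is another reason to try it on a small subset first.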