[Q&A Friday] How binaries are physically stored in the filesystem
Today we have a question from systemz, who asks where Nuxeo binaries are physically stored on the file system.
The component in charge of binary storage is the BinaryManager. There are several implementations (BinaryManagerClient, DefaultBinaryManager, SQLBinaryManager, XORBinaryManager), and an additional one, available on our Marketplace, stores binaries in an Amazon S3 bucket.
The default implementation is DefaultBinaryManager. It computes a digest of each binary's content and uses that digest to decide where the file is stored on the server's file system. For instance, say our file has the following digest: a75badefeb96972667306ac8f696143b
Its location will be nuxeo.data.dir/binaries/data/a7/5b/a75badefeb96972667306ac8f696143b where:
- nuxeo.data.dir is the folder corresponding to the nuxeo.data.dir property from nuxeo.conf
- binaries is the path defined in the binaryManager contribution
- data is the directory holding the actual binaries, spread across subdirectories
- a7 is a folder named after the first two characters of the digest
- 5b is a folder named after the third and fourth characters of the digest
- a75badefeb96972667306ac8f696143b is the file named after the digest
You can configure the folder depth and the algorithm used to compute the digest. The default algorithm is MD5 and the default depth is 2.
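To make the layout concrete, here is a minimal Python sketch, assuming the default settings (MD5, depth 2) and assuming that each additional depth level uses the next two characters of the digest. It hashes a file's content and maps a digest to its expected location. The helpers `compute_digest` and `binary_path` are hypothetical illustrations, not part of Nuxeo, and `nuxeo.data.dir` is only a placeholder for your actual data directory.

```python
import hashlib
from pathlib import Path


def compute_digest(path, algorithm="md5", chunk_size=1 << 16):
    """Hash a file's content in chunks and return the hex digest.

    Assumption: the store hashes the raw bytes with the configured algorithm (MD5 by default).
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large binaries never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def binary_path(data_root, digest, depth=2):
    """Return where a digest would live under data_root.

    Assumption: one folder per depth level, each named after the next two
    characters of the digest, with the file itself named after the full digest.
    """
    if len(digest) < 2 * depth:
        raise ValueError("digest too short for the requested depth")
    folders = [digest[2 * i : 2 * i + 2] for i in range(depth)]
    return Path(data_root, *folders, digest)
```

With the example digest, `binary_path("nuxeo.data.dir/binaries/data", "a75badefeb96972667306ac8f696143b")` yields `nuxeo.data.dir/binaries/data/a7/5b/a75badefeb96972667306ac8f696143b`, the same path as above. Spreading files across two-character folders keeps any single directory from growing too large.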
To change your BinaryManager implementation, as usual, you need to contribute to an extension point:
<repository name="default" factory="...">
  ...
</repository>
If you want more details about this, I suggest reading the installation documentation for the Amazon S3 Online Storage.
So, as you can see, files on disk are named only after their digest, which makes it really hard to tell which binary belongs to which document without using the Nuxeo API. I would discourage anyone from doing that. But in case you have to, here are a couple of suggestions on how to retrieve the digest computed by the binary store.
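Since every file under `data` is named after its digest, the digests computed by the binary store can also be read straight off the disk. Here is a minimal sketch, assuming the default DefaultBinaryManager layout and MD5; `list_stored_digests` is a hypothetical helper, and its optional check simply re-hashes each file to confirm that its name matches its content. It tells you which digests exist in the store, not which documents they belong to.

```python
import hashlib
from pathlib import Path


def list_stored_digests(data_root, algorithm="md5", verify=False):
    """Yield (digest, path) for every file found under data_root.

    Assumption: each stored file is named after the hex digest of its content.
    With verify=True, the content is re-hashed and a mismatch raises ValueError.
    """
    for path in sorted(Path(data_root).rglob("*")):
        if not path.is_file():
            continue  # skip the two-character folders themselves
        digest = path.name
        if verify:
            h = hashlib.new(algorithm)
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
            if h.hexdigest() != digest:
                raise ValueError(f"{path}: content does not match its name")
        yield digest, path
```

For example, `for digest, path in list_stored_digests("nuxeo.data.dir/binaries/data"): print(digest, path)` prints one line per stored binary, with `nuxeo.data.dir` again standing in for the real directory.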
Don't confuse this with the Blob digest metadata: that digest is different from the one computed by the binary store, and it is used by the unicity check API.
Again, this is a real pain, and I really encourage you to use the usual Nuxeo API :-)
Thanks for reading, see ya' on Monday!
Category: Product & Development