[Q&A Friday] How binaries are physically stored in the filesystem

How are binaries stored in Nuxeo?

Today we have a question from systemz who asks where Nuxeo binaries are physically stored in the file system.

The class that takes care of binary storage is the BinaryManager. There are several implementations (BinaryManagerClient, DefaultBinaryManager, SQLBinaryManager, XORBinaryManager). There is also one for Amazon S3 on our Marketplace. It offers the option of storing binaries in an Amazon S3 bucket.

The default implementation is DefaultBinaryManager. It computes the digest of the binary and stores the binary on the server’s file system at a location derived from that digest. Let’s say, for instance, that our file has the following digest: a75badefeb96972667306ac8f696143b

Its location will be nuxeo.data.dir/binaries/data/a7/5b/a75badefeb96972667306ac8f696143b where:

  • nuxeo.data.dir is the folder corresponding to the nuxeo.data.dir property from nuxeo.conf
  • binaries is the path defined in the binaryManager contribution
  • data is the hierarchy with the actual binaries in subdirectories
  • a7 is a folder named after the first two characters of the digest
  • 5b is a folder named after the third and fourth characters of the digest
  • a75badefeb96972667306ac8f696143b is the file named after the digest

You can configure both the depth of the folder hierarchy and the algorithm used to compute the digest. The default algorithm is MD5 and the default depth is 2.
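To make the layout described above concrete, here is a minimal Java sketch that hex-encodes a digest and derives the on-disk path for a given depth. It is not the DefaultBinaryManager code, only the same idea rebuilt with the JDK. The names BinaryPathSketch, hexDigest and pathFor, as well as the /var/lib/nuxeo/data directory, are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: hypothetical class, not part of Nuxeo.
class BinaryPathSketch {

    /** Hex-encodes the digest of the content using the named algorithm, e.g. "MD5". */
    static String hexDigest(byte[] content, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    /**
     * Builds dataDir/binaries/data/xx/yy/.../digest, where each of the
     * `depth` intermediate folders is the next two characters of the digest.
     */
    static Path pathFor(String dataDir, String digest, int depth) {
        Path path = Paths.get(dataDir, "binaries", "data");
        for (int i = 0; i < depth; i++) {
            path = path.resolve(digest.substring(2 * i, 2 * i + 2));
        }
        return path.resolve(digest);
    }

    public static void main(String[] args) {
        // Digest taken from the example in this post; the data dir is an assumption.
        String digest = "a75badefeb96972667306ac8f696143b";
        System.out.println(pathFor("/var/lib/nuxeo/data", digest, 2));

        // Computing a digest from raw bytes with the default algorithm.
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(pathFor("/var/lib/nuxeo/data", hexDigest(content, "MD5"), 2));
    }
}
```

With a depth of 3, the same digest would gain one more folder level (a7/5b/ad) before the file name.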

To change the implementation of your BinaryManager, you contribute to an extension point, as usual. Here the S3 implementation is selected:

  <extension target="org.nuxeo.ecm.core.repository.RepositoryService"
    point="repository">
    ...
    <repository name="default" factory="...">
      <repository name="default">
        <binaryManager class="org.nuxeo.ecm.core.storage.sql.S3BinaryManager"
         path="binaries"/>
        ...
      </repository>
    </repository>
    ...
  </extension>

If you want more details about this I suggest you read the installation documentation for the Amazon S3 Online Storage.

So, as you can see, it’s really hard to identify binaries on disk without going through the Nuxeo API, and I would discourage anyone from trying. But in case you have to, here is a suggestion on how to retrieve the digest computed by the binary store.

Don’t confuse this with the Blob digest metadata. That digest is different from the one computed by the binary store, and it’s used by the unicity check API.

To retrieve this digest, you have to use the Binary object, available only through an SQLBlob.
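The pattern looks roughly like the sketch below. To keep it runnable on its own, Blob, SQLBlob and Binary are tiny stand-ins declared inside the example rather than the real Nuxeo classes, and the accessor names getBinary() and getDigest() are assumptions to check against your Nuxeo version. The point is simply: test whether the blob is an SQLBlob, then go through its Binary to read the digest.

```java
import java.util.Optional;

// Illustrative only: the nested types are stand-ins, not the real Nuxeo classes.
class BinaryDigestSketch {

    /** Stand-in for Nuxeo's Blob interface. */
    interface Blob {}

    /** Stand-in for the Binary object held by the binary store. */
    static class Binary {
        private final String digest;

        Binary(String digest) {
            this.digest = digest;
        }

        String getDigest() { // assumed accessor name
            return digest;
        }
    }

    /** Stand-in for a blob backed by the binary store. */
    static class SQLBlob implements Blob {
        private final Binary binary;

        SQLBlob(Binary binary) {
            this.binary = binary;
        }

        Binary getBinary() { // assumed accessor name
            return binary;
        }
    }

    /** Returns the binary store digest if the blob is an SQLBlob, otherwise empty. */
    static Optional<String> storeDigest(Blob blob) {
        if (blob instanceof SQLBlob) {
            return Optional.of(((SQLBlob) blob).getBinary().getDigest());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Blob stored = new SQLBlob(new Binary("a75badefeb96972667306ac8f696143b"));
        Blob other = new Blob() {}; // any blob that is not backed by the binary store
        System.out.println(storeDigest(stored).orElse("no binary store digest"));
        System.out.println(storeDigest(other).orElse("no binary store digest"));
    }
}
```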

Again, this is a real pain, and I really encourage you to use the usual Nuxeo API :-)

Thanks for reading, see ya’ on Monday!

May 25th, 2012 at 3:46 pm

About Laurent Doguin

Laurent works as developer and community liaison at Nuxeo, a software company providing a full Enterprise Content Management Platform, open source, for any kind of content-driven application.