Hybrid File Storage Policies with the Nuxeo Platform


Fri 19 June 2015 By Alain Escaffre

The Nuxeo Platform 7.3 release is right around the corner! As always, we will introduce several new features and enhancements aimed at increasing the business value of content managed by the Nuxeo Platform. One such feature is the File Storage Dispatcher, a new component that determines where a file should be stored and then calls the appropriate binary manager. Until now, it was possible to configure only one binary manager per repository, but with Nuxeo Platform 7.3, it will be possible to use multiple binary managers!

Let’s take a look at some typical use cases for the File Storage Dispatcher to understand why it will be very useful.


  • Cost optimisation on storage hardware (HSM):


Content in a repository is rarely used uniformly. Some content is accessed frequently, while other content is archived (stored in your repository forever and seldom used). To make the best use of your storage resources, you might want different storage backends for different content: active content on a robust, reliable system with a fast network connection (Fibre Channel), and archived content on a NAS with a less redundant infrastructure. This is also known as Hierarchical Storage Management (HSM).


  • Security constraints:


We are getting into the habit of storing content in the cloud for good reasons: out-of-the-box scalability of content volume, low storage prices, and very little time spent on maintenance. Yet for legal or security reasons, you may have to keep hosting the storage for part of your content yourself. You also don’t want users to be aware of this; it has to be transparent to the business process.


  • Network optimizations / direct access to content:


Sometimes, you may want to store your content separately depending on its nature. For instance, video files can be large (multiple gigabytes for a single file) and may require a different storage location (one reachable from your streaming server, for example).

How It Works


Our binary store can now act as a dispatcher among multiple blob providers. With a simple configuration, the dispatcher uses custom and system properties of the document to express the conditions under which a blob is dispatched to one provider or another.

For instance, you can do this:

<extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
  <blobprovider name="videos">
    <class>org.nuxeo.ecm.core.blob.binary.DefaultBinaryManager</class>
    <property name="path">binaries-videos</property>
  </blobprovider>
  <blobdispatcher>
    <class>org.nuxeo.ecm.core.blob.DefaultBlobDispatcher</class>
    <property name="dc:format=video">videos</property>
    <property name="blob:mime-type=video/mp4">videos</property>
    <property name="default">default</property>
  </blobdispatcher>
</extension>

This stores any blob belonging to a document with dc:format=video, or any blob with MIME type video/mp4, in a separate binary manager that is configured to write to its own filesystem path.

If you want to mix local file system storage and cloud storage, you can use the S3 connector.

<?xml version="1.0"?>
<component name="org.nuxeo.ecm.platform.storage" version="1.0">
  <extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
    <blobprovider name="AWS">
      <class>org.nuxeo.ecm.core.storage.sql.S3BinaryManager</class>
    </blobprovider>
    <blobprovider name="localEncrypted">
      <class>org.nuxeo.ecm.core.blob.binary.DefaultBinaryManager</class>
    </blobprovider>
    <blobdispatcher>
      <class>org.nuxeo.ecm.core.blob.DefaultBlobDispatcher</class>
      <property name="dc:format=video">localEncrypted</property>
      <property name="blob:mime-type=video/mp4">localEncrypted</property>
      <property name="default">AWS</property>
    </blobdispatcher>
  </extension>
</component>

S3 and encryption parameters are configured in the nuxeo.conf file. As stated in the javadoc of DefaultBlobDispatcher, the “name” attribute of the “property” element is a list of comma-separated clauses, with each clause consisting of a property, an operator, and a value. The property can be a document xpath, ecm:repositoryName, or, to match the current blob being dispatched, blob:name, blob:mime-type, blob:encoding, blob:digest or blob:length. Comma-separated clauses are ANDed together. The special name default defines the default provider and must be present. Available operators between property and value are =, !=, <, and >.
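
As a rough illustration of the clause syntax described above, the fragment below combines several clauses in a single rule. The provider names large and small are hypothetical and would each need a matching blobprovider declaration, and the sketch assumes that blob:length is expressed in bytes. Note that a literal < has to be written as &lt; inside an XML attribute, whereas > can be left as-is.

```xml
<blobdispatcher>
  <class>org.nuxeo.ecm.core.blob.DefaultBlobDispatcher</class>
  <!-- Two clauses ANDed together: video documents whose blob is larger than 1 GiB (assuming bytes) -->
  <property name="dc:format=video,blob:length>1073741824">large</property>
  <!-- Blobs that are not MP4 and are smaller than 1 MiB; the less-than sign is escaped as &lt; -->
  <property name="blob:mime-type!=video/mp4,blob:length&lt;1048576">small</property>
  <!-- The default rule is mandatory and catches everything else -->
  <property name="default">default</property>
</blobdispatcher>
```

This element belongs inside the same extension element as in the examples above.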

Your dispatching rule can thus be based on a document metadata property such as “confidentiality”, on a lifecycle state, or on a mix of both, combined with the nature (MIME type) or size of the blob. If those values change, a listener will move the content from one provider to the other. It is also possible to write a custom BlobDispatcher and implement your own dispatching logic in Java.
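
A confidentiality-based rule of this kind might look like the sketch below, which keeps sensitive content on self-hosted storage and sends everything else to the cloud. The sec:confidentiality property and its value restricted, the component name, the path binaries-confidential and the provider names cloud and onPremise are all made up for illustration; the provider classes are the ones used in the earlier examples.

```xml
<?xml version="1.0"?>
<component name="com.example.storage.confidential" version="1.0">
  <extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
    <!-- Cloud storage for ordinary content -->
    <blobprovider name="cloud">
      <class>org.nuxeo.ecm.core.storage.sql.S3BinaryManager</class>
    </blobprovider>
    <!-- Self-hosted storage for sensitive content (hypothetical path) -->
    <blobprovider name="onPremise">
      <class>org.nuxeo.ecm.core.blob.binary.DefaultBinaryManager</class>
      <property name="path">binaries-confidential</property>
    </blobprovider>
    <blobdispatcher>
      <class>org.nuxeo.ecm.core.blob.DefaultBlobDispatcher</class>
      <!-- Hypothetical custom metadata property used as the dispatching condition -->
      <property name="sec:confidentiality=restricted">onPremise</property>
      <property name="default">cloud</property>
    </blobdispatcher>
  </extension>
</component>
```

Because the rule only looks at document metadata, users keep working with the document as usual and never see which backend holds the file.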

We are very excited about this new capability of the Nuxeo Platform content repository, and I am sure you will be too! In my next blog post, I will talk about centralised download, another new feature that we are implementing around file storage. So stay tuned!

 

