We recently looked at how the extensibility of the Nuxeo platform facilitated the use of MongoDB and GridFS for the back end storage of documents and binaries. Due to the modular nature of the platform, it’s just a matter of specifying a different implementation with an extension configuration.
Today, let’s focus on binary storage options and, more specifically, on a new feature that expands them further. The FilesystemBlobProvider, now included in the Nuxeo Platform, lets Nuxeo documents reference binaries that live outside the repository without copying them into Nuxeo storage, while keeping this completely transparent to the application and the user.
Architecture
First, an understanding of the Nuxeo binary storage architecture helps put this in context. A Nuxeo document is a composite of metadata and zero or more associated binaries (as illustrated in the architecture diagram below). On the Blob (binary) storage side of the repository, the Blob Store encompasses all operations for storing a binary and returning it further downstream based on an ID. One level more concrete, a Binary Manager implements a Pluggable Persistence Engine. Several implementations are available, such as AWS S3, an SQL database, or file storage with built-in AES encryption.
Read more about the Nuxeo on AWS architecture in this whitepaper.
Note the FileSystem BlobProvider in parallel with the default.
The most commonly used Binary Manager is the default one, which uses the file system. Even there, there are choices in how and where on the file system the Blob is stored and retrieved, and the BlobProvider implementation handles that. By default, during imports or uploads, binaries are renamed to a unique hash-based ID and copied into a Nuxeo-managed file system location with a simple flat structure. A document references its binary by that ID alone, which eliminates the need for a directory hierarchy to avoid name conflicts.
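The flat, ID-addressed layout described above can be sketched in plain Java. This is an illustration only, not the Nuxeo binary manager: the class `FlatDigestStore` and its methods are hypothetical, and MD5 is assumed as the digest algorithm.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a flat store keyed by content digest. */
final class FlatDigestStore {
    private final Path storeDir;

    FlatDigestStore(Path storeDir) {
        this.storeDir = storeDir;
    }

    /** Copies the file into the store under its digest and returns that digest as the ID. */
    String put(Path source) throws IOException {
        String id = digestOf(source);
        Path target = storeDir.resolve(id);
        // Identical content yields the same ID, so a second copy is unnecessary.
        if (!Files.exists(target)) {
            Files.copy(source, target);
        }
        return id;
    }

    /** Resolves an ID to its location; no directory hierarchy is needed. */
    Path get(String id) {
        return storeDir.resolve(id);
    }

    /** Streams the file through the digest and returns it as lowercase hex. */
    static String digestOf(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5"); // assumed algorithm
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading is enough; the stream updates the digest as a side effect.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Because the ID depends only on the content, two files with different names but identical bytes end up as a single entry in the store, and the original file name has to be kept in the document metadata instead.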
Use Cases
In some cases a copy is not desirable, so we provide another BlobProvider implementation that creates documents in a Nuxeo repository without copying the binary, keeping a reference to the original file instead. Rather than copying and renaming the file into the binary store, the provider stores a full link to its location. Documents are created normally and all of this is transparent to the user. Previews and thumbnails are generated from the original file and are stored in the default binary store. Users see no difference in behavior, except that writing or modifying the binary through the Nuxeo Platform can be prevented, as the use case dictates.
One such use case is large volumes or sizes of binaries that would take a long time to import but should still be referenced and indexed in the Nuxeo repository. For example, this can be the case with existing video content that uses an existing process of upload into a particular location. This could then be potentially edited and accessed directly for consumption outside of the Nuxeo repository with the storage medium optimized for it. Video content sometimes has such an external optimized workflow.
Another use case would be if there are existing external systems that need to read binaries directly from an existing binary location. You can index and use binaries in the Nuxeo Platform with all its capabilities, without having to potentially change the way existing systems rely on a location or organizational scheme to access binaries.
Usage
Now, let’s look at how to use FilesystemBlobProvider out of the box, and at what to expect from it in the future. There are a few different ways of importing files and creating documents in the Nuxeo Platform. The UI allows drag and drop, importing sets of files, or explicitly choosing a document type, each of which creates documents wrapping a binary. Some of these methods are meant to be automatic and therefore must rely on the one default BlobProvider type. We need a way to explicitly create certain documents using FilesystemBlobProvider instead of the default. The two methods are a Bulk Import configuration and an Operation definition.
The bulk import, provided by the Bulk Document Importer package, is a server operation. It’s kicked off by a REST call or configured through a UI, with a specified Document type, batch size, source and destination. As it’s a server-side process, you reference a location that the server can access by path. This is the best method for migrations or larger import processes.
To illustrate importing and creating documents with a FilesystemBlobProvider that references binaries directly, we wrote a custom importer for this example. A bulk import will now use a different DocumentModelFactory implementation, which creates documents with this FilesystemBlobProvider instead of the default, and the binaries in those documents are file system references. As you can see, this applies only to the documents you choose to import this way, and your repository can include a combination of different methods of binary storage. This doesn’t have to affect the whole server.
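As a rough sketch of the REST call that kicks off such an import, the following Java helper assembles a request URL. The `/site/fileImporter/run` path and the parameter names (`leafType`, `inputPath`, `targetPath`, `batchSize`, `nbThreads`) are assumptions based on the Bulk Document Importer as we remember it; check them against the documentation for your version. The class `ImportRequestSketch` is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles a bulk import request URL. */
final class ImportRequestSketch {

    static String buildRunUrl(String baseUrl, String leafType, String inputPath,
                              String targetPath, int batchSize, int nbThreads) {
        // Parameter names are assumptions; verify them for your importer version.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("leafType", leafType);      // Document type to create
        params.put("inputPath", inputPath);    // source path, as seen by the server
        params.put("targetPath", targetPath);  // destination in the repository
        params.put("batchSize", String.valueOf(batchSize));
        params.put("nbThreads", String.valueOf(nbThreads));

        StringBuilder url = new StringBuilder(baseUrl).append("/site/fileImporter/run?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            first = false;
            url.append(e.getKey()).append('=')
               .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }
}
```

Note that `inputPath` is resolved on the server, not on the machine issuing the request, which is why the location must be reachable by path from the server.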
In addition, or as an alternative, it’s just as easy to implement a custom Operation that takes the source path of the binaries and creates the document structure that wraps them. This Operation can then be called from a user action with input, through a REST call, or from other external applications. Those who are familiar with Nuxeo Studio or with development against the API will be able to take advantage of this nicely.
In either case, previews and thumbnails are stored in the default binary store and are not added to your source location. This does not affect the standard UI import, which is generally the preferred configuration: a UI import references files on the client’s machine, and the client won’t have the same path to the binaries as the server does. In the intended use cases described above, you are unlikely to need to modify the binary upon import, because that would require a copy to the Nuxeo Platform for temporary storage, thereby defeating the main purpose of this feature. If you expect modifications, you can build listeners to keep the thumbnails and previews synchronized with the externally referenced file.
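The core of such an Operation is a walk over the source tree that decides which document to create for each file or directory. The sketch below shows only that planning step in plain Java, with no Nuxeo API involved; `ImportPlanSketch`, `Entry` and `plan` are hypothetical names, and an actual Operation would create folders and blob-referencing documents from these entries.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: map a source directory tree to planned repository paths. */
final class ImportPlanSketch {

    /** One planned document: its repository path and the file it would reference. */
    static final class Entry {
        final String documentPath;
        final Path sourceFile;
        final boolean folder;

        Entry(String documentPath, Path sourceFile, boolean folder) {
            this.documentPath = documentPath;
            this.sourceFile = sourceFile;
            this.folder = folder;
        }
    }

    /** Walks sourceRoot and returns entries sorted so that parents precede children. */
    static List<Entry> plan(Path sourceRoot, String targetPath) throws IOException {
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            return walk
                .filter(p -> !p.equals(sourceRoot))
                .sorted()
                .map(p -> {
                    // Mirror the relative source path under the target repository path.
                    String relative = sourceRoot.relativize(p).toString().replace('\\', '/');
                    return new Entry(targetPath + "/" + relative, p, Files.isDirectory(p));
                })
                .collect(Collectors.toList());
        }
    }
}
```

Keeping the planning separate from document creation makes it easy to preview what an import would do before any document is created.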
Implementation Example
The following configuration was done to implement and deploy an importer using FilesystemBlobProvider.
First, you have to register the FilesystemBlobProvider with an ID and the source root location. You can deploy multiple configurations if, for example, you have multiple mapped drives. “fs” is the default ID that the example importer will use.
<extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
  <blobprovider name="fs">
    <class>org.nuxeo.ecm.core.blob.FilesystemBlobProvider</class>
    <property name="root">/Users</property>
    <property name="preventUserUpdate">true</property>
  </blobprovider>
</extension>
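One plausible reading of the `root` property is that every referenced file must resolve to a location under that directory. The sketch below shows such a containment check in plain Java. It is not the provider's actual code, and `RootCheckSketch` and `isUnderRoot` are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of a "key must live under the configured root" check. */
final class RootCheckSketch {

    static boolean isUnderRoot(String root, String key) {
        // Normalizing removes "." and ".." segments, so "/Users/../etc" cannot slip through.
        Path rootPath = Paths.get(root).toAbsolutePath().normalize();
        Path keyPath = Paths.get(key).toAbsolutePath().normalize();
        // Path.startsWith compares whole path elements, so "/Usersx" does not match "/Users".
        return keyPath.startsWith(rootPath);
    }
}
```

With the configuration above, `/Users/alice/video.mp4` would pass this check, while `/Users/../etc/passwd` would not.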
Now you extend DocumentModelFactory and implement a new Factory, FileSystemDocumentModelFactory in this case, which creates the document. Here we will show the override method. The key area is retrieving the BlobProvider by ID as defined above.
@Override
public DocumentModel createLeafNode(CoreSession session, DocumentModel parent, SourceNode node) {
    // The source file stays where it is; only its path is recorded.
    File file = node.getBlobHolder().getBlob().getFile();
    BlobInfo blobInfo = new BlobInfo();
    blobInfo.key = file.getAbsolutePath();
    // Look up the provider registered under the "fs" ID and create a blob that references the file.
    Blob blob = ((FilesystemBlobProvider) Framework.getService(BlobManager.class)
            .getBlobProvider("fs")).createBlob(blobInfo);
    // Create the document under the parent path, wrapping the referencing blob.
    DocumentModel doc = Framework.getLocalService(FileManager.class)
            .createDocumentFromBlob(session, blob, parent.getPathAsString(), true, file.getName());
    return doc;
}
Then you deploy and include an XML extension which will use this importer instead of the default one.
<extension target="org.nuxeo.ecm.platform.importer.service.DefaultImporterComponent" point="importerConfiguration">
  <importerConfig sourceNodeClass="org.nuxeo.ecm.platform.importer.source.FileSourceNode">
    <documentModelFactory documentModelFactoryClass="org.nuxeo.ecm.platform.importer.externalblob.factories.FileSystemDocumentModelFactory" />
  </importerConfig>
</extension>
Now your bulk imports use the FilesystemBlobProvider and don’t copy the source binary.
We plan on providing more of these pre-defined implementations as runtime choices and continue to increase configurability as well. Any feedback on the most common use cases and feature requests is welcome!