CMIS (Content Management Interoperability Services) is primarily an interoperability API - it is designed to be the common denominator between several big ECM vendors. The objective of CMIS is not to be a sharp and efficient API tailored for a specific solution. From a developer perspective, using such an API is definitely not fun. It’s pretty much like performing open-heart surgery with ski gloves - you might be able to make it (hopefully), but it really does not help!
At Nuxeo, we think that the API matters and that it should be sharp and efficient. This is especially true when building client-side applications that rely heavily on REST calls.
In an ideal world, as a developer you want to be able to:
- easily define a custom API that will do what you need
  - in a single call / network round trip
  - in a single transaction
- tell the server what data you want to retrieve
  - what attributes, schemas, adapters you need to have on the client side
This is the very reason why we designed a custom-tailored REST API for the Nuxeo Platform: the Automation API. However, sometimes CMIS is the best way to go, maybe because interoperability is key in your use cases, because CMIS is good enough for you, or simply because you don’t have a choice.
We have several people in the Nuxeo community who are using Nuxeo through CMIS. Knowing this, our goal is to find a way to make the usage of CMIS as easy as possible and, more importantly, as non-limiting as possible. The idea here is to see how CMIS can be used to leverage a custom application based on the Nuxeo Platform. When I say “leverage”, I don’t just mean basic data access interoperability, but taking advantage of the customizations that are deployed on the Nuxeo server side.
So, in short, this blog post is about me trying to convince you (and myself) that we can use CMIS in the best way without making it too hard. Let’s see if I can do that!
Complex Types
In Nuxeo, we use ComplexTypes (in the XSD meaning) to define multi-level structures in the metadata. For example, a Document can have a field of type ‘Addresses’. It can be a list of Addresses, each of which may be a structure with definitions for street name, zip code and other similar information.
From the XSD point of view, this would look like:
In the JSON world, an example could look like:
    doc.addresses = [
      { "streetNumber": "181", "streetName": "North 11th St", "zipCode": 11211 },
      { "streetNumber": "529", "streetName": "Court", "zipCode": 11231 }
    ]
This may look like a lot of detail, but when you want to model business objects, ComplexTypes are very useful.
CMIS does not support complex types: only scalars and multi-valued scalars are allowed! Several people complained about this problem - they wanted to leverage the structures they can define via Nuxeo Studio while still being able to rely on CMIS. Fortunately, one of them ended up being concerned enough by the problem to work on it, and submitted a Jira issue (NXP-14474) as well as some GitHub Pull Requests.
All of this has been merged as of our release of Nuxeo Platform 7.1, and complex properties can now be exposed as JSON-encoded strings:
- From the CMIS point of view, properties are still scalar, so we don’t break anything
- From the client application point of view, properties can easily be parsed and used as complex structures
This works for both reading and writing, and once you have enabled the cmisComplexProperties mode (org.nuxeo.cmis.enableComplexProperties=true), you can work with Nuxeo using CMIS, even if your content model contains complex structures.
Check out some code usage examples in the TestCmisBindingComplexProperties.java test file (both for reading and writing).
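To make the client side of this concrete, here is a minimal Python sketch of decoding and re-encoding such a property. It assumes the CMIS client hands you the property value as a plain string (or a list of strings for a multi-valued property); the helper names and the raw_addresses value are hypothetical, modeled on the addresses example above, and are not part of any Nuxeo or CMIS client library.

```python
import json


def decode_complex_property(value):
    """Turn a JSON-encoded CMIS string property into Python dicts/lists.

    Accepts a single string or a list of strings (multi-valued property).
    Which of the two a given server sends is an assumption of this sketch.
    """
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        return [json.loads(item) for item in value]
    return json.loads(value)


def encode_complex_property(structure):
    """Serialize a Python structure back into the string CMIS will carry."""
    return json.dumps(structure, separators=(",", ":"))


# Hypothetical raw property value, modeled on the addresses example above.
raw_addresses = (
    '[{"streetNumber":"181","streetName":"North 11th St","zipCode":11211},'
    '{"streetNumber":"529","streetName":"Court","zipCode":11231}]'
)

addresses = decode_complex_property(raw_addresses)
zip_codes = [address["zipCode"] for address in addresses]
```

Once decoded, the value behaves like any other nested structure on the client, and writing it back is just the reverse serialization before setting the CMIS property.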
Allow Access to Multiple Streams
Another limitation of the current CMIS model is that documents can only have one stream.
Inside Nuxeo, we don’t have this kind of limitation because we think that in a lot of cases it makes sense to have a document with multiple Blobs.
Let’s put it another way: would you be happy if Gmail had a hard limit of one attachment per message? Probably not! We believe this is also true for a lot of other types of content.
However, the CMIS model contains the notion of Renditions. Renditions are supposed to be “different views of the same content”, like a thumbnail view or a preview. But the technical definition is generic enough that we can use it to give access to whatever we want.
Since the Nuxeo Rendition system is pluggable, you can easily:
- Configure renditions that can be automatically computed by the system
  - Different views of a picture
  - Different formats of a video
  - Thumbnail of your document
- Contribute new renditions
  - Based on templates (using the template-rendition addon)
  - Based on Automation Chains
  - Based on custom code
The Renditions that you define on the Nuxeo side will be accessible via CMIS. This gives you a great opportunity to overcome the “one doc / one stream” limitation.
For example, let’s say you define an alternate.xsd schema that contains an additional blob field and that you decide to use it for your DocWith2Blobs Document type:
    <xs:schema targetNamespace="http://www.nuxeo.org/ecm/schemas/alternate/"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:nxs="http://www.nuxeo.org/ecm/schemas/alternate/">
    </xs:schema>
    <extension target="org.nuxeo.ecm.core.schema.TypeService" point="schema">
      <schema name="alternate" prefix="alt" src="schemas/alternate.xsd"></schema>
    </extension>
    <extension target="org.nuxeo.ecm.core.schema.TypeService" point="doctype">
      <doctype name="DocWith2Blobs" extends="File">
        <schema name="alternate"></schema>
      </doctype>
    </extension>
Now that we have a Document Type with several streams, let’s see how we can define a Rendition that makes the alternate stream accessible to CMIS. All you have to do is define a new Automation Chain that extracts the Blob from the input Document (see the YAML representation below):
    - Document.Pop
    - Blob.Get:
        xpath: "alternate:secondaryContent"
Then you can declare the new Rendition:
    <renditionDefinition name="alternate" enable="true">
      <operationChain>GetAlternateBlob</operationChain>
    </renditionDefinition>
You can then use the alternate rendition to access the secondary Blob from CMIS.
Complete sample code can be found in the nuxeo-renditions-cmis-sample GitHub repository.
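On the client side, picking the alternate rendition out of the list that CMIS returns for a document could look like the sketch below. It assumes each rendition has already been turned into a dictionary with the standard CMIS rendition fields (streamId, mimeType, kind, title). Which of those fields carries the Nuxeo rendition name is an assumption here, so the fields to compare are a parameter, and the sample data is entirely hypothetical.

```python
def find_rendition(renditions, name, fields=("title", "kind")):
    """Return the first rendition whose given fields equal `name`, else None.

    `renditions` is a list of dicts using CMIS rendition field names.
    The choice of fields to compare is an assumption of this sketch.
    """
    for rendition in renditions:
        for field in fields:
            if rendition.get(field) == name:
                return rendition
    return None


# Hypothetical rendition list for a DocWith2Blobs document.
sample_renditions = [
    {"streamId": "stream-1", "mimeType": "image/png",
     "kind": "cmis:thumbnail", "title": "thumbnail"},
    {"streamId": "stream-2", "mimeType": "application/pdf",
     "kind": "example:rendition", "title": "alternate"},
]

alternate = find_rendition(sample_renditions, "alternate")
```

The streamId of the matching entry is what you would then pass to the CMIS call that fetches a content stream, so that the server returns the secondary Blob rather than the main one.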
Custom Extractions and Conversions
The Nuxeo Platform provides several solutions to convert or render your content:
- Nuxeo Conversion Service
  - Office, pictures, video, PDF, etc. conversions
- Nuxeo Template Rendering
  - Render a document against a FreeMarker template, an MS Word template, or an Excel spreadsheet
You can even build Automation Chains that will assemble conversions, template rendering and other operations to generate or render content.
But to access this content via CMIS, you need a way to make it visible in a CMIS compliant way.
To do this, we can again use Renditions:
- Use Automation to build your rendering / conversion logic
- Associate your chain to a rendition
- Access your conversion via CMIS
Async Renditions
Everything discussed above works well as long as your rendition computation is fast enough. The problem with Renditions is that the CMIS API is synchronous: the server is supposed to give you the rendition immediately.
This can be an issue if your rendition takes a long time to generate, as in the following cases:
- Video or large picture conversion
- Complex computation on all the subtree
- Call to an external service
In these cases, you definitely don’t want the client to make a synchronous call because:
- It is bad for the client
  - The client may end up timing out
  - The client application will be stalled
- It is bad for your Nuxeo Server
  - It starts a long-running transaction
  - It consumes HTTP resources for no valid reason
This issue has already been raised in CMIS-883. The idea is to compute the rendition asynchronously and return something to the client telling it that the rendition is not available yet. Unlike what is explained in CMIS-883, returning a zero-length Rendition does not work well, since in most bindings the client side always receives -1 as the length (the stream is lazy loaded).
So, we used a little trick - we return a MIME type containing an additional parameter (;empty=true) when the rendition is not yet available. The asynchronous processing on the Nuxeo side relies on the WorkManager and on distributed caching using Redis.
The sample code is available in nuxeo-renditions-cmis-sample. Check it out!
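From the client’s point of view, the ;empty=true trick boils down to inspecting the MIME type and retrying later. The Python sketch below illustrates that logic only. The fetch callable is hypothetical: it stands for whatever your CMIS client uses to request the rendition, and is assumed to return a (mime_type, content) pair. The retry count and delay are arbitrary illustration values.

```python
import time


def parse_mime_type(value):
    """Split 'type/subtype; key=value' into (type, {key: value})."""
    parts = [part.strip() for part in value.split(";")]
    base = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, param_value = part.split("=", 1)
            params[key.strip().lower()] = param_value.strip().strip('"')
    return base, params


def is_rendition_pending(mime_type):
    """True when the MIME type carries the empty=true marker."""
    _, params = parse_mime_type(mime_type)
    return params.get("empty", "").lower() == "true"


def wait_for_rendition(fetch, max_attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call `fetch` until the rendition is ready or attempts run out.

    `fetch` is a hypothetical callable returning (mime_type, content).
    """
    for attempt in range(max_attempts):
        mime_type, content = fetch()
        if not is_rendition_pending(mime_type):
            return mime_type, content
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(
        "rendition still not available after %d attempts" % max_attempts
    )
```

Each individual call returns quickly, so the client is never stalled on a long HTTP request, and the server does the heavy work outside of the request that triggered it.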
So, at the end of the day, how was this journey in CMIS territory?
It was certainly better than I expected! Since it tries to support all vendors, CMIS is probably not the sharpest tool, but it does the job, and with some tricks you can leverage the Nuxeo Platform’s specific features and extensibility fairly easily.
NB: you can also leverage the Nuxeo Unit Test FeatureRunner system to deploy Nuxeo Services, CMIS Connectors and custom configuration in just a few lines!
To be honest, I would still prefer to use the native Automation REST API, but to reuse the analogy I used at the beginning, I have to admit that the patient is doing surprisingly well and this makes me hopeful for the recently started CMIS4DAM initiative!