CMIS: A Tale of Versioning
I’ve been working on a lot of CMIS content recently and made a bit of a shocking discovery. I wasn’t shocked insomuch as I’d found some amazing feature but, rather, that I had not yet been bitten (badly) by my lack of understanding! So I write this blog in part to coalesce my thoughts and in part to save others from potential future headaches.
This blog is all about the concept of “versioning” and what it means to Nuxeo vs what it means to CMIS. Nuxeo has a certain way of handling versioning that is compatible with CMIS, but can lead to confusion if you don’t know what’s going on.
The Problem: Where are my documents?!
Imagine you have a document type named My_Document_Type. You have created a bunch of new documents of this type. Imagine you run a CMIS query like this:
SELECT * FROM My_Document_Type
Tip: In CMIS each Nuxeo document type becomes a virtual table that you can query from.
Tip: CMISQL does not allow dashes (“-”) in table names. Underscore (“_”) works fine.
This query will most likely return ZERO results. Zip. Zilch. Nada! On the other hand, if you happen to know the ID of one of these documents, you can use the CMIS Object services command getObject to retrieve the document without issue. So what’s going on here?
(I know I said “most likely” return zero results; I’ll explain when this is not the case as well.)
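To make the contrast concrete, here is a minimal sketch of the two requests as they might look over the CMIS browser binding. The base URL, repository name, and document id are placeholders, and the `cmisselector`, `q`, and `objectId` parameter names reflect my reading of the CMIS 1.1 browser binding; treat them as assumptions and check them against your own server.

```python
from urllib.parse import urlencode

# Placeholder endpoints: assumptions for illustration, not taken from a real server.
REPOSITORY_URL = "http://localhost:8080/nuxeo/json/cmis/default"
ROOT_URL = REPOSITORY_URL + "/root"


def query_url(statement, search_all_versions=False):
    """Build a GET URL that runs a CMIS query against the repository URL."""
    params = {"cmisselector": "query", "q": statement}
    if search_all_versions:
        params["searchAllVersions"] = "true"
    return REPOSITORY_URL + "?" + urlencode(params)


def get_object_url(object_id):
    """Build a GET URL that fetches one object by id (the getObject service)."""
    params = {"cmisselector": "object", "objectId": object_id}
    return ROOT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The query goes through the versioning filter; getObject addresses one id directly.
    print(query_url("SELECT * FROM My_Document_Type"))
    print(get_object_url("hypothetical-live-document-id"))
```

The point of the sketch is only that these are two different services: the query is evaluated against versions, while getObject goes straight to whatever id you hand it.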
What’s going on?
By default any new document created in Nuxeo has the version “0.0”. In fact, it’s not a version at all. Nuxeo’s documentation is quite clear in its definition of a version:
An immutable, archived version of a document.
It goes on to say that versions are created when a document is “checked in”. But these versioning concepts (check out, check in, working copy) are easily forgotten when working with the Nuxeo Application because they’re hidden behind a nice UI. This is the key!
What you see in the Nuxeo Application when working with documents is a “working copy”; it’s often called the “Live Document” as well. It is NOT a version! That is why a batch of freshly created “0.0” documents is not returned by a CMIS query.
CMIS comes from a world that is focused around archived/versioned content. So a query only returns results that are versions (in fact only the “latest version”) by default.
Note: if you’ve already read the Nuxeo CMIS documentation you might be ready to point out that, “Hey Josh, there’s a ‘searchAllVersions’ option for CMIS queries...”; let me stop you right there because the story gets more complex if you use that option, but I’ll cover that too.
What’s the latest version?
The next step in my investigation of CMIS versioning was to make sure my documents had the Versionable facet so that I could...well...version them!
Ok I’m lying; the next step was actually to enable the Folderish facet because I’d already built an example where Folderish documents work fine with a CMIS query. I discovered that Folderish documents are indeed returned by a CMIS query without issue. Why? Because in Nuxeo (by default) Folderish documents are not versioned at all. They fall outside this concept of versioning. So in this case the query will return the desired documents even when they're brand new/"unversioned".
Back to the current investigation: I made sure my documents were Versionable and set about investigating what happens when I create versions of a document, and what happens when there are changes that are not checked in. For example, say I have a document that is currently version 2.1.
I can use Nuxeo's "Export as XML" feature to get the id of this document.
The name of the document is "This document has many versions" (creative, I know). Then I used the Apache Chemistry CMIS Workbench to query my server:
cmis:name LIKE '%versions%'
And the result:
As you can see, the cmis:objectId is completely different from the one I retrieved via Nuxeo. The id obtained from Nuxeo is the id of the Live Document. The id returned via CMIS is the id of the version of that document called "2.1". It just so happens that the Live Document and version 2.1 have the same content. What happens if they don't?
Tip: Remember that you can view and compare the different versions of a document via the History tab in Nuxeo.
So I modified "This document has many versions" without incrementing the version. In Nuxeo this manifests itself as a "+" after the version label:
Running the CMIS query again returns the same result. CMIS doesn't really care what's going on with the Live Document, it only deals with versions.
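The behavior described so far can be captured in a small toy model. This is not Nuxeo or CMIS code, just an in-memory sketch of the rules from this post: a default query sees only the latest version of each document plus Folderish documents, while `search_all_versions` sees everything, Live Documents included. The `Doc` class, its fields, and the sample ids are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """A toy record: either a Live Document or one archived version of it."""
    doc_id: str
    name: str
    series: str                   # shared by a Live Document and all of its versions
    label: Optional[str] = None   # version label such as "2.1"; None for a Live Document
    folderish: bool = False

    @property
    def is_version(self):
        return self.label is not None


def latest_version(docs, series):
    """Return the highest-numbered version in a series, or None if there is none."""
    versions = [d for d in docs if d.series == series and d.is_version]
    return max(versions, key=lambda d: tuple(int(p) for p in d.label.split(".")), default=None)


def query(docs, search_all_versions=False):
    """Mimic which documents a CMIS query would return in this toy model."""
    if search_all_versions:
        return list(docs)
    results = []
    for d in docs:
        if d.folderish:
            results.append(d)  # Folderish documents sit outside versioning
        elif d.is_version and d is latest_version(docs, d.series):
            results.append(d)  # only the latest version is visible by default
    return results


DOCS = [
    Doc("live-1", "This document has many versions", "s1"),
    Doc("v-1.0", "This document has many versions", "s1", "1.0"),
    Doc("v-2.0", "This document has many versions", "s1", "2.0"),
    Doc("v-2.1", "This document has many versions", "s1", "2.1"),
    Doc("live-2", "Brand new 0.0 document", "s2"),
    Doc("folder-1", "A folder", "s3", folderish=True),
]

if __name__ == "__main__":
    print([d.doc_id for d in query(DOCS)])
    print([d.doc_id for d in query(DOCS, search_all_versions=True)])
```

Notice that editing `live-1` changes nothing in the default result, and that `live-2` never shows up until it gets its first version.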
But I really want the Live Document!
The fact is, as a Nuxeo developer, I expect to have access to the Live Document. There are two ways to achieve this via CMIS.
Use the CMIS Object services to "browse" the repository. For example, getObject provides the ability to retrieve a document given its id. If I use the id of the Live Document, it works just fine: I get the content from the Live Document and I can even update it.
On the Nuxeo side, if I don't use anything from CMIS Versioning services, the modified Live Document will be denoted by the "+" as expected. Or I can use CMIS versioning to increment both the major and minor versions as needed.
Query with searchAllVersions
CMIS includes a query option called searchAllVersions that causes a query to return any version of the found documents (in Nuxeo's case, this includes the Live Documents). This option may be used with the following understanding:
- The versions of the document should never be modified, only the Live Document.
- EVERY version of the document that has ever existed will be returned in the query results, in addition to the Live Document. Here are the results of the previous query with searchAllVersions enabled:
You probably don't want every version of the document that has ever existed, so the query needs to be filtered. There are some Nuxeo-specific CMIS System Properties that can be used to identify the Live Document (see next section). But this means the CMIS client is no longer generic and will not work against all vendors. To be clear, this is not a "bad thing". It's just something important to accept and be aware of. CMIS cannot provide a generic solution for every problem and still be a meaningful standard.
Nuxeo CMIS System Properties
Nuxeo exposes several custom CMIS System Properties, documented in the Nuxeo CMIS documentation. Of particular interest in this context:
nuxeo:isVersion: used to distinguish between versions of a document and the Live Document; false for the Live Document.
nuxeo:isCheckedIn: for Live Documents, this indicates whether or not the document is "checked out" in Nuxeo. Use this to avoid conflicts.
Here is an updated query to get the checked-in, Live Document for "This document has many versions":
nuxeo:isVersion = false
nuxeo:isCheckedIn = true
cmis:name LIKE '%versions%'
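As a sketch, the three conditions above can be assembled into one complete statement like this. The `FROM cmis:document` clause is my assumption (it is the standard CMIS base document type; substitute your own type), and the helper names are hypothetical. The statement only has a chance of matching the Live Document when it is run with searchAllVersions enabled, as described in the previous section.

```python
def escape_literal(value):
    """Escape a string for use inside a single-quoted CMIS SQL literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def live_document_query(name_pattern, table="cmis:document"):
    """Build a query for checked-in Live Documents whose name matches a LIKE pattern.

    The default table is an assumption; pass your own document type instead.
    """
    conditions = [
        "nuxeo:isVersion = false",    # exclude archived versions
        "nuxeo:isCheckedIn = true",   # only Live Documents that are checked in
        "cmis:name LIKE '{}'".format(escape_literal(name_pattern)),
    ]
    return "SELECT * FROM {} WHERE {}".format(table, " AND ".join(conditions))


if __name__ == "__main__":
    print(live_document_query("%versions%"))
```

Keeping the Nuxeo-specific conditions in one helper also makes it obvious which part of the client is not portable to other CMIS vendors.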
Tip: The properties nuxeo:isVersion, nuxeo:isCheckedIn, cmis:isLatestMajorVersion, cmis:isLatestVersion, and cmis:versionLabel do not exist for Folderish documents. On the other hand these properties do exist for a document type that is NOT Versionable.
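Because those properties can be missing entirely, client code that inspects query results should not assume they are present. Here is a minimal sketch that labels a result row, where each row is a plain dict of property id to value; the row shape, the label strings, and the sample data are hypothetical.

```python
def classify(properties):
    """Label one CMIS result row, given as a dict of property id -> value."""
    if "nuxeo:isVersion" not in properties:
        # Per the tip above, Folderish documents do not carry these properties.
        return "no-versioning-properties"
    if properties["nuxeo:isVersion"]:
        return "version"
    if properties.get("nuxeo:isCheckedIn"):
        return "live-checked-in"
    return "live-checked-out"


# Hypothetical rows, shaped like simplified query results.
ROWS = [
    {"cmis:name": "A folder"},
    {"cmis:name": "This document has many versions",
     "nuxeo:isVersion": True, "cmis:versionLabel": "2.1"},
    {"cmis:name": "This document has many versions",
     "nuxeo:isVersion": False, "nuxeo:isCheckedIn": False},
]

if __name__ == "__main__":
    for row in ROWS:
        print(row["cmis:name"], "->", classify(row))
```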
What are the takeaways?
So what did I learn about versioning in CMIS vs Nuxeo?
One important step to avoid potential confusion is to make sure that you version everything. For me this was an unintentional solution: I implemented it just because I consider it a best practice, and later discovered it was part of the reason why my examples worked.
Document modified by a CMIS user? Make a version. Document modified by a Nuxeo user? Make a version. This kind of thing is easy to manage with Nuxeo Events and Automation and is a really good way to do things anyway even without the involvement of CMIS.
The point here is that you guarantee the latest version is in sync with the Live Document. The Live Document is never checked out. It solves the problem of read access via CMIS.
How to Update Content
If you want to update content via CMIS, you have two choices:
- Use the "browsing" services like Object, which work with paths or ids, thus giving you the Live Documents.
- Use a query with the searchAllVersions option, and the Nuxeo CMIS System Properties to filter the results.
Either solution leads to accessing the Live Documents and thus being able to make changes.
How you deal with the behavior of versioning when using CMIS really depends on the use case.
If the CMIS client only needs read access, and you manage versions where appropriate (remember, no “0.0” documents), you can safely use CMIS queries with no special tweaks.
If the CMIS client should only ever deal with versioned content anyway, and this can be quite common, then Nuxeo's handling of Live Documents is not relevant. It may not often be the case that you want an external system accessing unversioned content in Nuxeo.
It gets tricky if you want to update content via CMIS but hopefully you now have a better idea of how to handle that as well.