There have been questions about JCR vs CMIS recently (Is JCR Dead?), and I thought I’d expand a bit on why we at Nuxeo decided to drop support for JCR.
Nuxeo used JCR in the past with great success; we’ve been on the JSR-170 and JSR-283 Expert Groups, and JCR is a very nice spec. Several versions of our Nuxeo EP framework used Jackrabbit as a content store.
However, for some time now we have chosen not to rely on it exclusively: for more than two years, since Nuxeo 5.1.6, we’ve had an alternative storage engine, the Nuxeo Visible Content Store (VCS) (http://doc.nuxeo.com/display/NXDOC/VCS+Architecture). Since Nuxeo 5.4, VCS is the only content store, and it’s optimized for all the things we require of it.
The reason why we wanted an alternative storage, and why we now rely exclusively on it instead of maintaining costly compatibility with a second storage engine, is that Jackrabbit had a number of limitations:
- Opacity of the in-database storage. We wanted data stored in a SQL database to be real SQL tables with visible data. This is helpful for many things: imports, backups, debugging, etc. While the goal of JCR to be the “SQL of content” is noble, the reality is that all our customers want data to be in a SQL database, not in something they don’t know. We previously had the same problem with Zope and its ZODB, by the way. Serializing Java objects in database columns is really not our idea of clean storage.
- Limitations of the relational model. We wanted to be able to express arbitrary relational queries between objects, and the Jackrabbit query implementation, based on Lucene, really didn’t allow for that, or not with adequate performance (see also the Performance point below). There was a lot of work trying to optimize that in Jackrabbit, but I only saw it as a futile exercise in reinventing SQL JOINs and the decades of work behind the extremely intelligent planners and optimizers of successful SQL database engines.
- Performance. There were also unresolved performance problems with nodes containing a huge number of children, inherent in Jackrabbit’s way of storing children information. I haven’t kept up with the latest Jackrabbit releases but at the time this was killing us, and there was no simple fix available.
- Over-strict versioning model. The JCR versioning model is fixed by the spec, and does not allow, for instance, small changes of metadata on existing versions. Such changes were very useful to us, for example to update the number of readers of a doc.
- Needless dynamicity. An important credo of JCR is that you can have “free-form” content, and can add or remove properties on nodes at any time. While this is in theory a great idea, and may appeal to the WCM-inclined, it also has a large performance impact, and is really not something that any of our customers want in their databases. Customers want well-defined, fixed schemas. DBAs want to know beforehand what the tables will look like. Now, Nuxeo VCS can of course add or remove fields in its schemas, but it’s an administrative step, and is not done in the normal course of an application’s life.
But the clincher really was the first point, a proper SQL representation. At the lower level the VCS architecture really resembles the JCR concepts of nodes and properties, but it stores them cleanly in tables (see the VCS Architecture URL above for a description). For a time we considered overlaying a JCR API on top of it, but that has not proven necessary because there was little customer demand for it, and because we have a higher-level document abstraction that hides this anyway.
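To make the idea of nodes and properties living in plain, visible tables a bit more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (hierarchy, dublincore, parentid, and so on) and the sample rows are illustrative assumptions only, not the actual VCS schema; the VCS Architecture page describes the real layout. The point is simply that once nodes and their properties are ordinary rows, a question like “children of this folder created by this user” is a regular SQL JOIN that the database’s own planner can optimize.

```python
import sqlite3


def build_store():
    """Create an in-memory database with a hypothetical node/property layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One row per node: the tree structure is visible as plain columns.
        CREATE TABLE hierarchy (
            id          TEXT PRIMARY KEY,
            parentid    TEXT REFERENCES hierarchy(id),
            name        TEXT NOT NULL,
            primarytype TEXT NOT NULL
        );
        -- One table per schema: properties are typed columns, fixed up front.
        CREATE TABLE dublincore (
            id      TEXT PRIMARY KEY REFERENCES hierarchy(id),
            title   TEXT,
            creator TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO hierarchy VALUES (?, ?, ?, ?)",
        [
            ("1", None, "workspace", "Folder"),
            ("2", "1", "spec.pdf", "File"),
            ("3", "1", "notes.txt", "Note"),
        ],
    )
    conn.executemany(
        "INSERT INTO dublincore VALUES (?, ?, ?)",
        [
            ("1", "Workspace", "alice"),
            ("2", "Specification", "bob"),
            ("3", "Meeting notes", "alice"),
        ],
    )
    conn.commit()
    return conn


def children_by_creator(conn, parent_name, creator):
    """Return (name, title) of the direct children of a node, filtered by creator."""
    rows = conn.execute(
        """
        SELECT child.name, dc.title
        FROM hierarchy AS parent
        JOIN hierarchy AS child ON child.parentid = parent.id
        JOIN dublincore AS dc ON dc.id = child.id
        WHERE parent.name = ? AND dc.creator = ?
        ORDER BY child.name
        """,
        (parent_name, creator),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_store()
    print(children_by_creator(conn, "workspace", "alice"))
```

Because everything is a normal table, the same data can also be inspected, backed up, or bulk-imported with whatever SQL tooling a DBA already uses, which is the property the first point above is about.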
Now that CMIS is getting recognition, we believe it’s a better way to expose a document store than the Java Content Repository API. We are on the CMIS Technical Committee, and we now have a CMIS interface on top of Nuxeo (http://doc.nuxeo.com/display/NXDOC/CMIS+for+Nuxeo), about which we’re very happy.