
Why Nuxeo Dropped JCR

There have been questions about JCR vs CMIS recently (Is JCR Dead?), and I thought I'd expand a bit on why we at Nuxeo decided to drop support for JCR.

Nuxeo used JCR in the past with great success; we were on the JSR-170 and JSR-283 Expert Groups, and it is a very nice spec. Several versions of our Nuxeo EP framework used Jackrabbit as a content store.

However, we decided some time ago not to rely on it exclusively, and for more than two years, since Nuxeo 5.1.6, we have had an alternative storage engine, the Nuxeo Visible Content Store (VCS) (http://doc.nuxeo.com/display/NXDOC/VCS+Architecture). Since Nuxeo 5.4, VCS is the only content store, and it is optimized for everything we require of it.

The reason we wanted an alternative storage engine, and why we now rely exclusively on it instead of maintaining costly compatibility with a second one, is that Jackrabbit had a number of limitations:

  1. Opacity of the in-database storage. We wanted data stored in a SQL database to be real SQL tables with visible data. This helps with many things: imports, backups, debugging, and so on. While JCR's goal of being the "SQL of content" is noble, the reality is that all our customers want their data in a SQL database, not in something they don't know. We previously had the same problem with Zope and its ZODB, by the way. Serializing Java objects into database columns is really not our idea of clean storage.
  2. Limitations of the relational model. We wanted to be able to express arbitrary relational queries between objects, and the Jackrabbit query implementation, based on Lucene, really didn't allow for that, or at least not with adequate performance (see also point 3 below). A lot of work went into optimizing this in Jackrabbit, but I only saw it as a futile exercise in reinventing SQL JOINs and the decades of work behind the extremely intelligent planners and optimizers of successful SQL database engines. (A minimal sketch of such a relational query follows this list.)
  3. Performance. There were also unresolved performance problems with nodes containing a huge number of children, inherent in Jackrabbit's way of storing children information. I haven't kept up with the latest Jackrabbit releases but at the time this was killing us, and there was no simple fix available.
  4. Over-strict versioning model. The JCR versioning model is fixed by the spec and does not allow, for instance, the small metadata changes on versions that were very useful to us, such as updating the number of readers of a document.
  5. Needless dynamicity. An important credo of JCR is that you can have "free-form" content and can add or remove properties on nodes at any time. While this is a great idea in theory, and may appeal to the WCM-inclined, it has a large performance impact and is really not something any of our customers want in their databases. Customers want well-defined, fixed schemas. DBAs want to know beforehand what the tables will look like. Nuxeo VCS can of course add or remove fields in its schemas, but it's an administrative step, not something done in the normal course of an application's life.
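
To make points 1 and 2 concrete, here is a minimal sketch of the kind of relational query that visible tables make possible, using plain JDBC. The hierarchy and dublincore table names follow the VCS architecture page linked above, but treat the exact columns, connection URL, and credentials as illustrative assumptions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class VcsTablesDemo {
        public static void main(String[] args) throws Exception {
            // Plain JDBC against the same database Nuxeo writes to: the data
            // is ordinary rows, so a DBA can inspect, back up, or join it.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/nuxeo", "nuxeo", "password");
            // Join the hierarchy table (the tree structure) with the
            // dublincore per-schema table (the metadata) -- exactly the kind
            // of query that was hard to express efficiently over Jackrabbit.
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT h.name, dc.title, dc.modified"
                    + " FROM hierarchy h"
                    + " JOIN dublincore dc ON dc.id = h.id"
                    + " WHERE h.primarytype = ?");
            ps.setString(1, "File");
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString(1) + " | " + rs.getString(2)
                        + " | " + rs.getTimestamp(3));
            }
            rs.close();
            ps.close();
            conn.close();
        }
    }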

But the clincher really was the first point, a proper SQL representation. At the lower level the VCS architecture closely resembles the JCR concepts of nodes and properties, but it stores them cleanly in tables (see the above URL for a description). For a time we considered overlaying a JCR API on top of it, but that has not proven necessary, because there was little customer demand for it and because we have a higher-level document abstraction that hides this anyway (sketched below).
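
To give an idea of that higher-level abstraction, here is a minimal sketch using Nuxeo's CoreSession/DocumentModel API; the path, document type, and title are chosen for illustration:

    import org.nuxeo.ecm.core.api.CoreSession;
    import org.nuxeo.ecm.core.api.DocumentModel;

    public class DocumentAbstractionDemo {
        // The CoreSession is normally provided by the framework; how it is
        // obtained depends on the deployment, so it is a parameter here.
        public void createFile(CoreSession session) throws Exception {
            // Applications see documents, schemas, and properties -- never
            // the underlying VCS tables directly.
            DocumentModel doc = session.createDocumentModel("/", "my-file", "File");
            doc.setPropertyValue("dc:title", "Hello from the document API");
            doc = session.createDocument(doc);
            session.save();
        }
    }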

Now that CMIS is getting recognition, we believe it's a better way to expose a document store than the Java Content Repository API. We are on the CMIS Technical Committee, and we now have a CMIS interface on top of Nuxeo (http://doc.nuxeo.com/display/NXDOC/CMIS+for+Nuxeo), about which we're very happy.
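
For example, connecting to Nuxeo through CMIS with the Apache Chemistry OpenCMIS client library looks something like the following sketch; the AtomPub endpoint URL and the credentials are placeholders, so check the documentation linked above for the actual values:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.chemistry.opencmis.client.api.Folder;
    import org.apache.chemistry.opencmis.client.api.Session;
    import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
    import org.apache.chemistry.opencmis.commons.SessionParameter;
    import org.apache.chemistry.opencmis.commons.enums.BindingType;

    public class CmisClientDemo {
        public static void main(String[] args) {
            Map<String, String> params = new HashMap<String, String>();
            params.put(SessionParameter.USER, "Administrator");   // placeholder
            params.put(SessionParameter.PASSWORD, "secret");      // placeholder
            params.put(SessionParameter.ATOMPUB_URL,
                    "http://localhost:8080/nuxeo/atom/cmis");     // assumed endpoint
            params.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
            params.put(SessionParameter.REPOSITORY_ID, "default");

            // Any CMIS-compliant client can talk to the repository this way,
            // regardless of how the server stores its content internally.
            Session session = SessionFactoryImpl.newInstance().createSession(params);
            Folder root = session.getRootFolder();
            System.out.println("Root folder: " + root.getName());
        }
    }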

5 thoughts on “Why Nuxeo Dropped JCR”

  1. This is likely the best response opposing Jackrabbit/JCR to date.
    However, I have a few questions since this is a technical response.
    1) “our customers want data to be in a SQL database, not in something they don’t know” Misleading. You can persist to a SQL database with Jackrabbit. In addition, you can place metadata in a DB and the “document” in the file system.
    2) How are import/export, backup, and debugging operations easier with a database? I actually find them to be easier with Jackrabbit.
    3) Did you consider developing a new persistence manager? I find this to be one of the best features. Day took advantage of this by creating the TarPM.
    4) Does VCS support pluggable persistence layers? For example, I use Jackrabbit’s in-memory persistence for unit testing, the file system for local deployment, and the DB for client environments.
    5) What kind of relationships can be created in a DB that can’t be created in Jackrabbit/JCR? How does the join support not satisfy your needs?
    6) I agree with the performance with respect to numerous child nodes. However, I found that you generally don’t need that many. It is usually a taxonomy problem. Do you have a good example of when that many would be needed?
    7) I think I see your point with respect to the versioning.
    8) This is fairly misleading. You can create well defined node types in the same manner that you design a schema for a DB. Similar to my point in #1, only metadata is visible in a DB anyways. If your documents are there, they are still in a CLOB/BLOB.
    9) What is the performance impact of schemaless persistence? Not to mention the documents themselves are unstructured. To be honest, most user search is ‘full text’ search. Lucene is far better for facets and full text search.
    10) I can’t see how real time WCM would be possible with a fixed DB schema. Want to add an image to a product page? Let’s see if we can get approval from the DBA…
    One last note, I’m still not sold on why SQL is so important to you guys. Can you expand on that? The ability to access the content in SQL may be easy but it also allows you to bypass all the CMS features such as versioning and security.
    Finally, I’m surprised there was little customer demand for it. In my experience, once your CMS plays a role in the broader enterprise environment, integration becomes key. Integration is made possible by standards. For example, I’ve used JCR observation for integration with Jackrabbit, an ESB, and Amazon (a minimal observation listener sketch follows). Can’t do that with CMIS. In addition, you lose opportunities for reuse by not using standards. Back in my WCM days I created a number of utilities and libraries that I brought with me on projects where JCR was used.
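    For reference, registering such a JCR observation listener looks roughly like this sketch; the watched path and event type are illustrative:

        import javax.jcr.Session;
        import javax.jcr.observation.Event;
        import javax.jcr.observation.EventIterator;
        import javax.jcr.observation.EventListener;
        import javax.jcr.observation.ObservationManager;

        public class IntegrationListenerDemo {
            public static void register(Session session) throws Exception {
                ObservationManager om = session.getWorkspace().getObservationManager();
                // The listener could forward each event to an ESB, a message
                // queue, or an external service such as Amazon's.
                EventListener listener = new EventListener() {
                    public void onEvent(EventIterator events) {
                        while (events.hasNext()) {
                            Event event = events.nextEvent();
                            try {
                                System.out.println("Node added: " + event.getPath());
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                        }
                    }
                };
                om.addEventListener(listener,
                        Event.NODE_ADDED, // event type bitmask
                        "/content",       // watch below this path (illustrative)
                        true,             // isDeep: include the whole subtree
                        null, null,       // no uuid / node type filtering
                        false);           // noLocal: also receive our own changes
            }
        }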

  2. 1) You can persist data in SQL with Jackrabbit but have you looked at the content of the tables? Serialized Java objects are really an abuse of a SQL database. Contrast this with the clean tables we have: http://doc.nuxeo.com/display/NXDOC/VCS+Architecture .
    We also place documents in the filesystem in Nuxeo (in a manner similar to the FileDataStore in modern Jackrabbit), or in the DB if preferred by the customer.
    2) Import/export/backup/debugging are easier because they are already well-known operations for a SQL DBA, which is what customers have.
    3) I considered for a long time writing a new persistence manager, but in the end it wasn’t adequate for us. PersistenceManager uses concepts that are too fine-grained (an Item is too small), and reassembling them into SQL-level notions would be costly.
    Also, in the Jackrabbit architecture there is a high price you pay when dealing with the transient layer, which is a notion mostly irrelevant to us. The transient state between a SQL transaction’s begin and end is enough at the storage level.
    4) Yes VCS is pluggable. First, the SQL backend of VCS is pluggable to different databases (a single Dialect base class abstracts them along with some stored procedures: http://www.nuxeo.org/api/nuxeo/5.4/javadoc/org/nuxeo/ecm/core/storage/sql/jdbc/dialect/Dialect.html ). We use H2 for unit testing, which has an in-memory option.
    There’s also an internal “Mapper” SPI inside VCS that abstracts the storage: http://www.nuxeo.org/api/nuxeo/5.4/javadoc/org/nuxeo/ecm/core/storage/sql/Mapper.html . A Mapper deals with concepts of tables and rows. We already have a Mapper implementation that does remote HTTP calls to another server (MapperClient). I’ve been thinking for some time about doing a NOSQL-based implementation of it but didn’t yet find resources for that.
    5) The problem is not creating relationships, it’s having efficient JOIN queries on them, with an efficient query optimizer and planner. I’m sorry, but I don’t believe Jackrabbit can reimplement on top of Lucene the staggering amount of work that went into the query optimizers of PostgreSQL, Oracle, or SQL Server.
    6) There are workarounds for the Jackrabbit many-children problem, but they all involve doing in your application what should be hidden in the storage engine. Pushing the blame onto the application is not a solution; you’ll always have customers who import all their documents into a single folder and wonder why it’s slow when it doesn’t have to be.
    8) Yes you can set in stone your JCR schemas, but the Jackrabbit code is still designed for the case where you don’t, and this has its price, especially on the architecture of the low-level storage. (I’m not arguing about BLOBs here, we have the same solution as Jackrabbit.)
    9) Schemaless persistence in the end means that you’re not compatible with SQL concepts and must therefore reinvent a lot of what SQL provides (storage, indexing, queries, JOINs…). About searches being mostly fulltext, I beg to differ: that’s not what we observe in the ECM world, though maybe it’s true of WCM. For fulltext, VCS can reuse the database’s native fulltext engine, which may not be as flexible as Lucene but integrates with the rest of the relational model (see the query sketch at the end of this reply). SOLR on top of Nuxeo is still an option, of course.
    10) WCM is not our focus. But still, adding an image to a product page is just adding a relation between a page document and an image document. No need to change the schema for that.
    SQL is important to us because it’s the de facto storage for everything we see in our customers’ world and, again, it comes with a breadth of optimization and experience that cannot be reinvented in a few years. Being able to bypass the high-level features of the CMS is not a problem: first, that’s something only a DBA will be able to do, and second, while it’s harder, it’s still possible in Jackrabbit.
    For general integration we want to go beyond Java APIs. CMIS gives high-level standard connectors. Otherwise, for specific problems it’s very easy in Nuxeo to code a simple service and expose it with REST. Customers are OK with REST or JAX-WS APIs and rarely ask us about JCR, and when we tell them that, for instance, their .NET application will have to speak a non-JCR-based protocol anyway, they agree that there is no point in using JCR for heterogeneous integration.
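    To illustrate the fulltext point in 9): an NXQL fulltext query goes through the ordinary query API and is translated by VCS to the database’s native fulltext support, roughly like this sketch, where the search term and date predicate are illustrative:

        import org.nuxeo.ecm.core.api.CoreSession;
        import org.nuxeo.ecm.core.api.DocumentModel;
        import org.nuxeo.ecm.core.api.DocumentModelList;

        public class FulltextQueryDemo {
            public void search(CoreSession session) throws Exception {
                // NXQL: the fulltext match is delegated to the database's own
                // fulltext engine and combined with relational predicates.
                DocumentModelList docs = session.query(
                        "SELECT * FROM Document"
                        + " WHERE ecm:fulltext = 'annual report'"
                        + " AND dc:modified >= DATE '2010-01-01'");
                for (DocumentModel doc : docs) {
                    System.out.println(doc.getPathAsString());
                }
            }
        }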

  3. I think it should be clarified that this is about Jackrabbit, not JCR. JCR can be mapped nicely to SQL tables, and jcr2spi offers a nice shortcut, abstracting out big parts of the JCR API, such as the transient storage.

  4. Makes sense to me. The “mapper” SPI in particular sounds great. I think now it looks more like a JCR vs DB/SQL debate as opposed to a JCR vs CMIS debate. Along with the old remote service vs. internal component/module debate. I can understand why some folks would prefer a DB and SQL. Though I wonder how well that preference will hold up against the onslaught of NOSQL options. Who knows. Good times.

  5. Is JCR Dead? So What If It Is?
    JCR is an engineering standard that’s been around for a number of years. It’s low-level. It’s used by developers to build complex applications, usually on top of a content repository. Content management applications – like Hippo CMS – have been using JCR for years.
    And, that’s the point. JCR is a standard for developers. It’s not a standard that will help you reach out to your audience. It’s not a publishing standard in a format such that your visitors can consume it on their laptops, mobile phones or whatever device they want to use. It’s not a standard that’s going to help your Web content be “context-aware”.
    This is what REST APIs and other Web standards are for. Our view at Hippo is that we do JCR for CMS developers, and Web and REST for your audience.
    There is no “Holy Grail” of standards, or one standard to “rule them all”. It’s clear to us that the evolving world of standards moves continuously, and we believe that a WCMS needs to be flexible and open enough to move alongside them. And, because standards evolve, they have a life of their own. They come to life, evolve, and eventually they die. It’s a very healthy process. So, yes, JCR evolves and will one day die. Just like CMIS will evolve – and one day die.
    But the death of a standard doesn’t matter to our customers. Our customers want interoperability. Our customers want our systems to be able to communicate with each other and to be able to export content out of a CMS whenever they want to replace it with another system. Standards are good. They give you that interoperability. But open standards are like languages. It really doesn’t matter that they evolve. All that matters is that we can communicate.
    If you’re deciding on a content management system, you shouldn’t need to worry about the life and death of standards. What matters most is that you want a CMS that serves your audience. For that you need to look further than just whether any vendor slaps the JCR or CMIS sticker on their repository. Look for a CMS that has openness built into its DNA. A CMS should breathe open standards. Look for a CMS that thinks in terms of your audience and how to deliver content in the way they want to consume it – not how the developer decided it should be consumed. At the end of the day, an effective CMS not only stores content in a way that empowers the business to manage it; it delivers the content in a way that empowers the person at the very end of a CMS – the audience – to consume it.
    Arje Cahn, CTO, Hippo
