By now, most of you should have heard about CMIS, the upcoming specification that promises interoperability between many systems for common content management tasks. The CMIS specification is being driven by an OASIS Technical Committee and is currently still a draft; it is expected to be finalized in late 2009 or early 2010.

I won't detail here all that CMIS will bring; this has been covered extensively already and will be covered even more in the future. No, the purpose of this article is to present Chemistry.


Chemistry is a new Apache project for CMIS that started incubating recently ("incubation" is the term used in the Apache Software Foundation for young projects that still have to prove themselves). Chemistry's goal is to provide general-purpose libraries for interaction using CMIS between a server and a client. These libraries are mainly written in Java, but some JavaScript code has been added as well, and we're open to more.

Chemistry provides a high-level API so that a developer can manipulate objects like documents or folders and call simple methods on them without having to deal with the details of a specific low-level communication transport. In addition, Chemistry provides an SPI (Service Provider Interface) for backend developers, making it quite easy to use Chemistry to store documents in a project-specific manner.
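To make the API/SPI split concrete, here is a minimal sketch in Java. It is not Chemistry's actual API: every name in it (SketchSpi, InMemorySketchSpi, SketchFolder, SketchDocument) is hypothetical and chosen only for illustration. It assumes an SPI made of flat, id-based calls, and a thin object layer on top that gives the developer folders and documents to call methods on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical SPI: flat, id-based calls, the kind a backend would implement.
interface SketchSpi {
    String createDocument(String parentFolderId, String name);
    String getName(String objectId);
    List<String> getChildIds(String folderId);
}

// Hypothetical in-memory backend that implements only the SPI.
class InMemorySketchSpi implements SketchSpi {
    static final String ROOT_ID = "root";
    private final Map<String, String> names = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();
    private int nextId = 1;

    InMemorySketchSpi() {
        names.put(ROOT_ID, "");
        children.put(ROOT_ID, new ArrayList<>());
    }

    @Override
    public String createDocument(String parentFolderId, String name) {
        List<String> siblings = children.get(parentFolderId);
        if (siblings == null) {
            throw new IllegalArgumentException("Not a folder: " + parentFolderId);
        }
        String id = "doc-" + nextId++;
        names.put(id, name);
        siblings.add(id);
        return id;
    }

    @Override
    public String getName(String objectId) {
        String name = names.get(objectId);
        if (name == null) {
            throw new IllegalArgumentException("Unknown object: " + objectId);
        }
        return name;
    }

    @Override
    public List<String> getChildIds(String folderId) {
        List<String> ids = children.get(folderId);
        if (ids == null) {
            throw new IllegalArgumentException("Not a folder: " + folderId);
        }
        return new ArrayList<>(ids); // defensive copy
    }
}

// Hypothetical high-level API object: hides ids behind methods.
class SketchDocument {
    private final SketchSpi spi;
    private final String id;

    SketchDocument(SketchSpi spi, String id) {
        this.spi = spi;
        this.id = id;
    }

    String getName() {
        return spi.getName(id);
    }
}

// Hypothetical high-level API object: every method delegates to the SPI.
class SketchFolder {
    private final SketchSpi spi;
    private final String id;

    SketchFolder(SketchSpi spi, String id) {
        this.spi = spi;
        this.id = id;
    }

    SketchDocument newDocument(String name) {
        return new SketchDocument(spi, spi.createDocument(id, name));
    }

    List<SketchDocument> getChildren() {
        List<SketchDocument> docs = new ArrayList<>();
        for (String childId : spi.getChildIds(id)) {
            docs.add(new SketchDocument(spi, childId));
        }
        return docs;
    }
}

class ApiOverSpiDemo {
    public static void main(String[] args) {
        SketchFolder root = new SketchFolder(new InMemorySketchSpi(), InMemorySketchSpi.ROOT_ID);
        root.newDocument("report.txt");
        for (SketchDocument doc : root.getChildren()) {
            System.out.println(doc.getName());
        }
    }
}
```

In this sketch the application code only ever touches SketchFolder and SketchDocument, so the in-memory backend could be swapped for another SketchSpi implementation without changing the caller.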

Underlying this, Chemistry has implementations for the CMIS transports. CMIS specifies two mandatory transport protocol bindings (one extending AtomPub, for a lightweight RESTful HTTP interface, and another using SOAP for a WebService-based interface), and Chemistry will support both — and probably more in the future.

The current Chemistry code base has an initial version of the API/SPI together with some actual implementations around the AtomPub protocol. Already Chemistry can talk to itself (AtomPub client talking to AtomPub server) and store data in-memory (which is very handy for unit tests). Outside of the Apache code base, Nuxeo has also coded a backend to provide access to Nuxeo 5.2 repositories using Chemistry. Generic CMIS AtomPub clients like CMIS Explorer are able to see a Nuxeo repository through Chemistry for instance.

Chemistry Modules

The following modules will be available in Chemistry:

  • The APIs: a low-level SPI between a client and a server that mirrors the CMIS specification closely (it is expected that the SPI will be used when either the client or the server implements one of the HTTP protocols defined in CMIS), and a high-level API that wraps the SPI to provide more object-oriented notions of connections, folders and documents, and that hides the nitty-gritty details of the protocols.

  • A set of common Java utilities around CMIS, for instance a parser to turn CMIS SQL into an AST (Abstract Syntax Tree) that can be reused by different backends, or a generic in-memory implementation of the SPI and API for unit testing.

  • Four implementations of the SPI for the protocols defined by CMIS: an AtomPub server and client, and a SOAP server and client.

  • A generic implementation of the API-to-SPI wrapping, so that a third-party implementation of just the SPI can be plugged into the rest of the Chemistry framework. (Some of the four basic protocol implementations may also provide the full API when this is more efficient than using the generic wrapping.)

  • An implementation of the APIs as a JCR backend.

  • A set of generic tests for CMIS servers and clients, providing an unofficial TCK for CMIS.
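As an illustration of the parser module mentioned in the list above, the following is a toy sketch of turning a query string into an AST. It is not Chemistry's parser and covers only a tiny assumed subset of CMIS SQL: a column list or `*`, a single FROM type, and an optional WHERE clause with one equality test against a quoted string. The class names (SelectNode, EqualsNode, ToyCmisSqlParser) and the property names in the sample query are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// AST node for an equality predicate: <column> = '<literal>'.
final class EqualsNode {
    final String column;
    final String literal;

    EqualsNode(String column, String literal) {
        this.column = column;
        this.literal = literal;
    }
}

// AST root for: SELECT <columns> FROM <type> [WHERE <column> = '<literal>'].
final class SelectNode {
    final List<String> columns; // a single "*" entry for SELECT *
    final String from;
    final EqualsNode where;     // null when the query has no WHERE clause

    SelectNode(List<String> columns, String from, EqualsNode where) {
        this.columns = columns;
        this.from = from;
        this.where = where;
    }
}

final class ToyCmisSqlParser {
    // A token is a quoted string, an identifier (':' allowed), or one of * , =
    private static final Pattern TOKEN =
            Pattern.compile("\\s*('[^']*'|[A-Za-z_][A-Za-z0-9_:]*|[*,=])");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private ToyCmisSqlParser(String query) {
        Matcher m = TOKEN.matcher(query);
        int end = 0;
        while (end < query.length()) {
            m.region(end, query.length());
            if (!m.lookingAt()) {
                break;
            }
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!query.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("Unexpected input at offset " + end);
        }
    }

    static SelectNode parse(String query) {
        ToyCmisSqlParser p = new ToyCmisSqlParser(query);
        p.expect("SELECT");
        List<String> columns = new ArrayList<>();
        do {
            columns.add(p.next());
        } while (p.accept(","));
        p.expect("FROM");
        String from = p.next();
        EqualsNode where = null;
        if (p.accept("WHERE")) {
            String column = p.next();
            p.expect("=");
            String quoted = p.next();
            if (!quoted.startsWith("'")) {
                throw new IllegalArgumentException("Expected quoted string, got " + quoted);
            }
            where = new EqualsNode(column, quoted.substring(1, quoted.length() - 1));
        }
        if (p.pos < p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return new SelectNode(columns, from, where);
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of query");
        }
        return tokens.get(pos++);
    }

    // Consumes the next token if it matches (keywords are case-insensitive).
    private boolean accept(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(String expected) {
        if (!accept(expected)) {
            throw new IllegalArgumentException("Expected " + expected);
        }
    }
}

class ToyCmisSqlDemo {
    public static void main(String[] args) {
        SelectNode q = ToyCmisSqlParser.parse(
                "SELECT cmis:name FROM cmis:document WHERE cmis:name = 'report.txt'");
        System.out.println(q.columns + " from " + q.from
                + " where " + q.where.column + " = " + q.where.literal);
    }
}
```

The point of the sketch is the separation: a backend receives a SelectNode and decides for itself how to evaluate it, so the string parsing does not have to be rewritten per backend.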

In the future, we expect more implementations of the APIs to become available. For example, we envision new transports:

  • A WebDAV-based transport.

  • An HTTP-based transport less RESTish and more friendly to browsers and JavaScript.

And new backends:

  • A backend storing documents on the filesystem, with or without metadata.

  • A backend storing documents in the Google AppEngine Datastore.

  • A backend storing documents using Microsoft Windows SharePoint Services.

The Pieces of the Puzzle

As you can see, these modules will allow for wide interoperability between systems. Here's a graphical representation of the building blocks:

The User Application speaks the API:

[Figure: User App]

The API can be implemented in many ways. First, it could be a direct backend:

[Figure: API/SPI Backend]

Or, more commonly, the API will be implemented as a client binding for a specific protocol, SOAP or AtomPub:

[Figure: API/SPI to SOAP or AtomPub]

Each protocol speaks in its own way on the wire:

[Figure: SOAP and AtomPub]

And this is connected to a server that speaks the protocol as well:

[Figure: SOAP or AtomPub to SPI]

Finally, behind the server, a backend has to store the actual information somewhere:

[Figure: JCR or Nuxeo Backend]

Anyone is welcome to create new pieces, for instance new protocol bindings:

[Figure: Protocol Adapter]

Or new storage backends:

[Figure: Filesystem Backend]

Now let's see how the main pieces can be plugged together.

The simplest connection is between an application and a direct backend:

[Figure: User App to Nuxeo Backend]

If the backend only wants to deal with the SPI, its implementation can reuse the generic API-to-SPI wrapping to provide a full API experience:

[Figure: Generic API to SPI]

When talking through a wire protocol, we plug together a client and a server:

[Figure: Client Server AtomPub Adapter]

The end result is an application talking to a backend through a wire protocol:

[Figure: User App API/SPI to AtomPub to SPI JCR Backend]

Of course we can get creative and plug many more together:

[Figure: User App via SOAP and AtomPub to Filesystem Backend]
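The plugging described above can also be sketched in code. The following is a hedged illustration, not Chemistry code: it assumes a one-method SPI (PuzzleSpi), replaces the real SOAP and AtomPub bindings with an invented one-line text format ("GET_NAME <id>"), and uses an in-process function call in place of HTTP. All class names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical one-method SPI shared by every piece of the puzzle.
interface PuzzleSpi {
    String getObjectName(String objectId);
}

// Backend piece: stores the actual information (here, in a map).
class MapBackend implements PuzzleSpi {
    private final Map<String, String> names = new HashMap<>();

    MapBackend put(String id, String name) {
        names.put(id, name);
        return this;
    }

    @Override
    public String getObjectName(String objectId) {
        String name = names.get(objectId);
        if (name == null) {
            throw new IllegalArgumentException("Unknown object: " + objectId);
        }
        return name;
    }
}

// Server piece: decodes a wire request and calls the SPI behind it.
class ToyServer {
    private static final String GET_NAME = "GET_NAME ";
    private final PuzzleSpi backend;

    ToyServer(PuzzleSpi backend) {
        this.backend = backend;
    }

    String handle(String request) {
        if (!request.startsWith(GET_NAME)) {
            return "ERROR bad request";
        }
        return "OK " + backend.getObjectName(request.substring(GET_NAME.length()));
    }
}

// Client piece: implements the SPI by encoding each call onto the wire.
class ToyClient implements PuzzleSpi {
    private final Function<String, String> wire;

    ToyClient(Function<String, String> wire) {
        this.wire = wire;
    }

    @Override
    public String getObjectName(String objectId) {
        String response = wire.apply("GET_NAME " + objectId);
        if (!response.startsWith("OK ")) {
            throw new IllegalStateException(response);
        }
        return response.substring("OK ".length());
    }
}

class PuzzleDemo {
    public static void main(String[] args) {
        // Direct connection: the application talks straight to a backend.
        PuzzleSpi backend = new MapBackend().put("doc-1", "report.txt");
        System.out.println(backend.getObjectName("doc-1"));

        // Through a wire protocol: client -> server -> backend.
        PuzzleSpi remote = new ToyClient(new ToyServer(backend)::handle);
        System.out.println(remote.getObjectName("doc-1"));

        // Getting creative: client -> server -> client -> server -> backend.
        PuzzleSpi chained = new ToyClient(new ToyServer(remote)::handle);
        System.out.println(chained.getObjectName("doc-1"));
    }
}
```

Because the client piece and the backend piece implement the same interface, a server cannot tell whether it sits in front of real storage or in front of another client, which is what makes the longer chains possible.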


All of this is still a work in progress (even the spec!), but you should expect rapid changes in the available features in the coming months as the spec settles down, more code is written, more test cases are written, and more testing against third-party implementations is done.

If you're interested in helping, please join the list [email protected] by sending an empty email to [email protected].