Querying Complex Schemas in Nuxeo With NXQL


Fri 09 September 2011 By Florent Guillaume

From the start, as we think it's important for a real content management platform, Nuxeo has had the ability to send requests to the document repository using a SQL-like language that we call NXQL. This allows you to do things like:

SELECT * FROM Document WHERE dc:title = 'Some Report'

or:

SELECT * FROM Document WHERE dc:modified > DATE '2011-08-01'

or more complex:

SELECT * FROM Note WHERE dc:modified > TIMESTAMP '2011-08-01 12:43:38' AND ecm:fulltext = 'merger OR ipo' AND dc:lastContribtor IN ('bob', 'john') AND (m:priority BETWEEN 1 AND 5 OR m:level > 1000)

Of course you can mix all standard operators, add parentheses, OR, etc. More details about NXQL are available at http://doc.nuxeo.com/x/uItH.

But one limitation of NXQL used to be that, although Nuxeo can store and retrieve "deep" complex properties in its documents, there was no immediate way to use them in queries written in NXQL. We call complex properties those from documents described by an XML Schema that includes a <xs:simpleType> with a <xs:list>, or a <xs:complexType> with optionally a <xs:sequence>. Such schemas provide complex data modeling options to the developer, and are used a lot to store deeply structured objects in one document.

Starting with Nuxeo 5.4.3, the limitations on querying these complex schemas have been lifted, and you can now do queries on them!

Here are some sample queries that you could not make before Nuxeo 5.4.3, and are now possible:

The documents where the first subject is 'test':

SELECT * FROM Document WHERE dc:subjects/0 = 'test'

All the possible subjects starting with 'test':

SELECT DISTINCT dc:subjects/1 FROM Document WHERE dc:subjects/1 LIKE 'test%'

The files less that one KB:

SELECT * FROM File WHERE content/length < 1024

The files sorted by size:

SELECT * FROM File ORDER BY content/length DESC

The files with a given file name:

SELECT * FROM File WHERE content/name = 'report.doc'

The JPEG files, by filtering on the MIME type:

SELECT * FROM File WHERE content/mime-type = 'image/jpeg'

The documents where the first attached file is more than one KB:

SELECT * FROM Document WHERE files/0/file/length >= 1024

The documents with an attached text file and an attached file of length 0:

SELECT FROM Document WHERE files//file/name LIKE '%.txt' AND files/*/file/length = 0

The documents with an attached file of type text and of length 0:

SELECT FROM Document WHERE files/1/file/name LIKE '%.txt' AND files/*1/file/length = 0

Note the difference, the second request needs to correlate the two file instances in the two clauses, which is why we use a special n syntax to tell the query engine that this is the same one. The number after the is arbitrary, but all the uses of files/*n with a given n will refer to the same file.

Using these new query capabilities, developers will now have full access to their data model from NXQL, and will no more have to drop to ad-hoc hibernate queries to do this kind of query.

One cautionary note to end this post: these NXQL queries are turned into SQL queries that use a number of JOIN operations, and the execution of such queries has its price if your database is not tuned correctly. Please note that just because you can make queries does not mean they will automatically be efficient, ensuring this is the job of a data architect and a DBA — this is true for any system, not just Nuxeo.


Category: Product & Development