Paginated Query with JavaScript Automation


Tue 12 July 2016 By Thibaud Arguillere

JavaScript is a wonderful language and I love it. For real. I don’t care what people (even people way smarter than me) think about it. You can’t explain love, you just love! So, since Nuxeo supports JavaScript as an Automation, I am the happiest Nuxeo fellow on the planet.

From JavaScript Automation, you can call each and every operation available on the server, and today, let’s focus on a quite important one: Repository.Query. I am sure this operation is used quite often in almost every Nuxeo application. It is about querying the platform to find documents.

Let’s take an example. You have a Contact schema with miscellaneous information:

Document Schema: Contact


You now want to add a full_name field that is always available to avoid calculating it dynamically every time for display, mails, letters (using Nuxeo Template Rendering for instance), etc.

So, of course, you first start by adding listeners, Event Handlers, for the About to Create and Before Modification events. This is not a part of this article but I assume you did it. This way, new contacts and modified contacts will be updated.

Let’s say you also want to update all the existing ones. You want to query each and every Contact and calculate its contact:full_name field. This will be a one shot maintenance operation, run once the new version of your application is installed.

So, the typical script for this is something like this (spoiler alert: It is not the good way of doing it):

  • Query all Contact (but versions, of course)
  • Update them in a loop

Here is how such a script looks:

function run(input, params) {
  var nxql, docs;
  // Query
  nxql = "SELECT * FROM Contact WHERE ecm:isVersion = 0 AND ecm:isProxy = 0";
  docs = Repository.Query(null, { 'query': nxql } );

  // Loop and update
  docs.forEach(function(contact) {
    contact["contact:full_name"] = buildFullName(contact["contact:first_name"], contact["contact:middle_name"], contact["contact:last_name"]);
    contact = Document.Save(contact, {});
  });
}

function buildFullName(first, middle, last) {
  var full = "";

  if(first !== null && first !== "") {
    full = first;
  }

  if(middle !== null && middle !== "") {
    if(full === "") {
      full = middle;
    } else {
      full += " " + middle;
    }
  }

  if(last !== null && last !== "") {
    if(full === "") {
      full = last;
    } else {
      full += " " + last;
    }
  }

  return full;
}

So, here we are using Repository.Query with just the required query string parameter, which means it returns a maximum of 1,000 documents by default. You can probably see the potential issue here: If you have more than 1,000 contacts, then you have a problem because you are not updating your whole repository but only the first 1,000 contacts.

Notice that because we are talking about Nuxeo, which is a platform - and a quite wonderful one may I say - this default number can, of course! be configured. See this documentation.

This is where you must actually build a loop calling Repository.Query repeatedly until there are no more results. And looking at the optional parameters, we can see this lovely currentPageIndex parameter (Yes. I just wrote “lovely” about a parameter). Such a loop is quite easy to setup: Start at page zero, and increment the page number. You may be surprised by the way I pass the parameter (using "" + currentPage). I’ll be back on this right after:

function run(input, params) {
  var contacts, oneContact, currentPage, found, i;
  // Start at -1 so next first page will be page 0
  currentPage = -1;
  do {
    // Query next 1,000 Contact
    currentPage += 1;
    contacts = Repository.Query(null, {
      'query': "SELECT * FROM Contact WHERE ecm:isVersion = 0 AND ecm:isProxy = 0",
      'currentPageIndex': "" + currentPage
     }
    );
    found = contacts.length;

    if(found > 0) {
      // Update full_name for each of the documents of this page
      for(i = 0; i < found; ++i) {
        oneContact = contacts[i];
        oneContact["contact:full_name"] = buildFullName(oneContact["contact:first_name"], oneContact["contact:middle_name"], oneContact["contact:last_name"]);
        oneContact = Document.Save(oneContact, {});
      }
    }

  } while(found > 0);
}

So, quickly, about the "" + currentPage thing: There is a little bug with today’s version that prevents us to pass a JavaScript number to a parameter expecting a Java Integer. Until that is fixed, the workaround is cool enough: just make it a string.

Now everything probably works well, but only if you have just some thousands of Contact, because we have two potential problems to handle. Synchronicity and transaction.

Synchronicity: If you have more than a few thousand Contact, you sure don’t want to run this from a menu item in the UI, because it will take time (lock the UI) and may even timeout. So you’ll launch this using an asynchronous event.

Transaction: All changes will be committed to the database after the operation ends. Having a long running transaction with a lot of modifications is not good, whatever the underlying database is. This means you have to commit the changes for every n Contact updated. In the following example, we commit the transaction every 100 Contact modified.

As there is no available operation to commit a transaction, we must use the magical RunScript operation and inject some Java. The main principle is that the Nuxeo Platform is always expecting a transaction to be running, so we must commit the current transaction, then immediately start a new one, which is done with two single lines of code using the TransactionHelper class:

TransactionHelper.commitOrRollbackTransaction();
TransactionHelper.startTransaction();

Here is the full script. Same paginated query with transaction handling will give you your full script. The new stuff have been highlighted here:

function run(input, params) {

  var COMMIT_MODULO = 100;
  var COMMIT_SCRIPT =  "org.nuxeo.runtime.transaction.TransactionHelper.commitOrRollbackTransaction();" + "\norg.nuxeo.runtime.transaction.TransactionHelper.startTransaction();";
  var contacts, oneContact, currentPage, found, i, updateCount;
  // Start at -1 so next first page will be page 0
  currentPage = -1;
  updateCount = 0;
  do {
    // Query next 1,000 Contact
    currentPage += 1;
    contacts = Repository.Query(null, {
      'query': "SELECT * FROM Contact WHERE ecm:isVersion = 0 AND ecm:isProxy = 0",
      'currentPageIndex': "" + currentPage
     }
    );
    found = contacts.length;    
    if(found > 0) {
      // Update full_name for each of the documents of this page
      for(i = 0; i < found; ++i) {
        oneContact = contacts[i];
        oneContact["contact:full_name"] = buildFullName(oneContact["contact:first_name"], oneContact["contact:middle_name"], oneContact["contact:last_name"]);
    oneContact = Document.Save(oneContact, {});

        // Commit if needed
        updateCount += 1;
        if((updateCount % COMMIT_MODULO) === 0) {
          RunScript(null, {'script': COMMIT_SCRIPT});
        }
      }
    }
  } while(found > 0);
}

There are other ways of doing it. You could actually set the pageSize parameter to 100 and commit after each step of the loop. It will send more queries to the database. You could also ask for 10,000, so you reduce the number of queries for the database. You could also write a some complex code to start multi-threaded updates, making sure each thread handles a different Contact.


Tagged: Nuxeo Platform, Nuxeo Automation