Export Data with Content Automation

Thu 06 March 2014 By Alain Escaffre

At Nuxeo, we do internal reporting using a BI tool that fetches data from a PostgreSQL database used as the internal data warehouse. Many sources are filled in this database. Some of this information comes from Connect. In Connect, you find objects controlling subscribed online services, customers entities, studio projects, applications descriptions and a few other entities.

To fill the internal data warehouse with this data, I could have used the Nuxeo - Mule ESB connector, or integrated at a much lower level to the database. But this time, to export data from Connect, I wanted to play with a pure automation solution, that if not as simple as a BIG SQL join, at least only requires HTTP Port 80 communication between the two systems and uses a maintained API. Plus it allows for very easy reformatting of data, and is a good example to teach you a few things about using Automation!

I wrote an automation chain that flattens all this data into a CSV file. The file is then loaded in a document in the repository, and also sent by mail to people expecting it. The chain is launched using the Nuxeo scheduler. Every Sunday it sends an event in the Nuxeo Platform event bus, where an event handler listens to it and calls my CSV export chain. Then a Python script on the data warehouse server downloads the file that was loaded in the Nuxeo document. Once it has been updated, the script imports the content in the PostgreSQL database.

It is interesting to explain how the export chain works, so as to review two notions:

  • How to produce a CSV file using automation,

  • How the use of subchains can make your configuration easier to maintain and clearer to read.

Generating a CSV file with Automation

This relies on the use of the “Render document feed” operation, available in the Conversion category. This operation takes a list of documents as input, and produces a blob as output. It takes as a parameter an FTL (or MVEL) template name, and inject the documents collection in rendering context before rendering it. The collection is available under the “This” object. You can thus use the loop directive of FreeMarker to produce a CSV in your template:

title, description, creation date
<#list This as doc>

Let the previous sample be a template named “csv_file_generation”. The following chain:

Fetch > Query (query: select * from client)

Conversion > Render documents feed (template name: csv_file_generation, mime type: application/csv)

will produce a CSV file (you can append a “UI > Download” operation and bind it to a user action for testing it). More documentation on the topic can be found on our wiki pages.

That was a simple situation. In my case, I have several objects that are linked via document ids stored in properties, and I want to produce one single CSV file with columns of values of all the objects that are interesting for one customer. That means that instead of using Fetch > Query and iterate on the result, I need to build my collection differently.

Controlling your Chain Execution with Operations from the Execution Flow Category

I choAutomation chainse the following strategy:

  • Initialize a collection of maps, that will contain for one line of the required CSV one map computed in a subchain, using Execution Context > Set Variable (value: @{new java.util.HashMap()})

  • Select all the customers using Fetch > Query

  • For each customer call a “FetchCustomerMap” using Execution Flow > Run Document Chain, that will execute the subchain repetitively for each document.

  • Call the Render Document operation for producing the final CSV like we've seen earlier. As the collection of maps (that has been filled in the Run Document Chain step) is in the context of the automation chain, it is also available in the template, and that’s what I’ll use for looping and producing the CSV

  • Store the generated blob in a document of the repository, using Document >Set Blob

  • Send that file via Email using Notification > Send Email operation

We can now zoom on the FetchCustomerMap chain that is executed for each customer. Its role is to create one line of our CSV file. Since data is spread among different objects, we use some run chain operations to isolate the process for each object, so as to make configuration clearer and easier to maintain:

  • Creates a map object using Execution Context > Set Context variable

  • run a subchain to fetch contract data and set it in the map, using Execution flow > Run chain

  • run a subchain to fetch studio project data and set it in the map, using Execution flow > Run chain

  • run a subchain to fetch application data and set it in the map, using Execution flow > Run chain

  • Add the map to the collection object that was created in the parent chain.

You may have to control the transaction time-out of your sever if processing takes to much time, or run the loop in separate transaction at each step of the loop. That 's all for today!

Category: Product & Development
Tagged: How to, Nuxeo Automation