Configuring the Distributed Jobs System
In my previous blog post, we discussed the main principles behind the Nuxeo Distributed Jobs System. Today, let’s see how to configure dedicated task servers in practice, so that we have:
- Three Nuxeo servers answering the front-end service
- One Nuxeo server handling video conversions
- And one Nuxeo server handling the generation of PictureViews
Other default background jobs are not handled by dedicated servers in this example.
(In the schema, purple signifies asynchronous jobs.)
To configure this example, we need:
- Set up Redis, which stores and manages the persistent queue system
- Deploy the Nuxeo Platform on as many machines as needed (five in our example)
- Change the queue configuration on each node
In this article, we assume that the other parts are already configured (Database, Elasticsearch, Binary Store(s), etc.).
This is a fundamental piece of the Distributed Jobs System: the Nuxeo Platform needs to rely on persistent queues stored in a shared system that every Nuxeo server in the cluster can access. Current versions of the Nuxeo Platform (LTS 2015 and Fast Track 8.3 at the time this article is written) use Redis (2.8.x, 3.0.x) for this purpose.
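Enabling Redis is done in nuxeo.conf. Here is a minimal sketch for a single Redis instance; the hostname is hypothetical, and High Availability/sentinel setups use additional properties covered in the Redis configuration documentation:

```properties
# nuxeo.conf (adapt host and port to your environment)
nuxeo.redis.enabled=true
nuxeo.redis.host=redis.example.com
nuxeo.redis.port=6379
```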
So, following our video conversion example:
- A user uploads a new video
- The Nuxeo Platform creates the corresponding Nuxeo Document (with title, dublincore schema, etc.) and stores the file
- The Nuxeo Platform then queues a “VideoConversion” job (linked to the newly created document) in the Redis database
- The dedicated Video Conversion node will detect the work, pick it up, and transcode the video
Later, if the video file is modified, the same kind of process occurs: the Nuxeo Platform first detects that the file (the binary) really has changed, then queues the work in Redis, and one of the dedicated workers handles it.
This page explains how to configure Redis for the Nuxeo Platform, including usage of High Availability and sentinel hosts if you want to use them.
Nuxeo Cluster Deployment
In our example, we need five Nuxeo servers installed on five different machines, all sharing the same database and accessing the same Binary Store(s), which basically means we are building a cluster of Nuxeo servers. Such a configuration is explained and detailed in this documentation.
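On each of the five nodes, clustering is typically enabled in nuxeo.conf with a distinct identifier per node. A minimal sketch, assuming the repository clustering properties of this Nuxeo generation (values are examples):

```properties
# nuxeo.conf, on every node of the cluster
repository.clustering.enabled=true
# Unique per node, e.g. 1 to 5 in our five-node example
repository.clustering.id=1
```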
There are two important points we would like to highlight:
- Each and every Nuxeo server in the system should be installed with the exact same set of bundles: Same Studio project, same packages, etc. This makes the deployment easier (have a single Nuxeo server ready for every node) and ensures that business rules are applied everywhere with no error.
- The system must ensure that the background servers cannot be reached by client requests, i.e., typically by excluding them from the network load balancer.
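For instance, with an Nginx load balancer in front of the cluster, the exclusion is simply a matter of not listing the background nodes in the upstream pool (hostnames below are hypothetical):

```nginx
upstream nuxeo_frontend {
    # Only the three front-end nodes receive client traffic
    server nuxeo-front-1.example.com:8080;
    server nuxeo-front-2.example.com:8080;
    server nuxeo-front-3.example.com:8080;
    # The video and picture processing nodes are intentionally absent
}
```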
Configuration of Each Dedicated Node
So, we have our cluster of five Nuxeo servers. Each node must now be configured for background processing:
- If it is a front-end processing server, we must make sure it does not handle video or image processing
- Conversely, if it is a background processing server, we must make sure it handles only the jobs it is dedicated to (video or image processing)
We also tune the underlying tool that ultimately performs the job. For example, by default the Nuxeo Platform uses ffmpeg to process videos, so the machine will be tuned to maximize ffmpeg performance: number of CPUs, high clock frequency, fast I/O, etc.
As the Nuxeo Platform is a configurable platform, setting up each node is quite easy: it comes down to deploying a few lines of XML telling the Nuxeo Platform which jobs it can process. We basically contribute to the “queues” extension point, override the queues handling video conversion and image processing, and for each of them set its processing attribute to “true” or “false”, depending on which instance we are configuring. The XML must be saved in a file with the following requirements:
- It must declare a component (see the full example below)
- The name of the file must end with “-config.xml”
- The file must be put in nxserver/config
There is something important to understand about queues and asynchronous jobs configuration: The main idea (with the current version of the Nuxeo Platform) is to make sure a node does not handle a specific kind of task. By default, each and every node will handle each and every kind of jobs. To tune the system, you have to exclude some of these jobs in some nodes, and include them in others. This is how you make sure that the front-end service answers quickly to client requests, even if there are 20 huge videos to transcode at the same time.
In our case, the minimum setting for video conversions will be the following (full XML below):
- On every node but the video processing one:
<queue id="videoConversion" processing="false"/>
- Only on the video processing node:
<queue id="videoConversion" processing="true">. . .
In the first case (processing=”false”), the node will process every task but the video conversion. In the second case (processing=”true”), the node will process every task, including the video conversion.
Note: Omitting the processing attribute allows the processing (the default value is “true”).
Only Video Conversion Please
In our case, we want to make sure the video conversion node handles only video conversions. That means we have to disable each and every job except the video conversion. A list of default queues and their IDs can be found on explorer.nuxeo.com. For the current Fast Track version, it is here, and if you follow the link, you will notice we have quite a few of them (fulltext indexing, Elasticsearch indexing, changes in Collections, updates of ACEs, CSV importer, etc.). Thankfully, we don’t have to disable them one by one. There is a shortcut to disable each and every job: use “*” as the queue ID:
<queue id="*" processing="false"/>
The way it works is the following: First you disable the jobs, then you enable the ones you want.
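In XML terms, the pattern boils down to two queue contributions, in that order (a minimal sketch; the complete, deployable file is shown below):

```xml
<extension target="org.nuxeo.ecm.core.work.service" point="queues">
  <!-- 1. Deactivate every queue on this node... -->
  <queue id="*" processing="false"/>
  <!-- 2. ...then reactivate only the queue this node is dedicated to -->
  <queue id="videoConversion" processing="true"/>
</extension>
```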
Here is the full XML; say we name the file my-background-jobs-config.xml. Notice the following:
- It declares a component. Make sure the component name is unique in this instance.
- It uses the <require> tag, which ensures we override the default declaration (our custom changes are added/merged to the one already loaded, overriding the default behavior, and not the reverse). If you want to override other queues, you can find them on explorer.nuxeo.com.
- We redeclare the full contribution because we want to use 4 threads instead of 1 (the value of the default contribution): for instance, we set up video conversion on a high-performance machine with many powerful CPUs. We will be able to handle more videos in parallel on this machine (instead of adding more nodes, which can also be done, of course).
```xml
<?xml version="1.0"?>
<component name="my.settings.for.distributed.jobs">
  <!-- Require the component(s) we override -->
  <!-- Component deploying the video conversion -->
  <require>org.nuxeo.ecm.platform.video.workmanager</require>
  <!-- We override the WorkManager -->
  <extension target="org.nuxeo.ecm.core.work.service" point="queues">
    <!-- First, deactivate all jobs -->
    <queue id="*" processing="false"/>
    <!-- Then, activate and tune the videoConversion queue -->
    <queue id="videoConversion" processing="true">
      <maxThreads>4</maxThreads>
      <category>videoConversion</category>
    </queue>
  </extension>
</component>
```
Configuration of Other Instances
The XML for the three front-end servers must let the instance handle all jobs except video conversion and PictureViews generation:
```xml
<?xml version="1.0"?>
<component name="my.settings.for.distributed.jobs">
  <!-- Require the component(s) we override -->
  <!-- Component deploying the video conversion -->
  <require>org.nuxeo.ecm.platform.video.workmanager</require>
  <!-- Component deploying the PictureViews generation -->
  <require>org.nuxeo.ecm.platform.picture.workmanager</require>
  <!-- We override the WorkManager -->
  <extension target="org.nuxeo.ecm.core.work.service" point="queues">
    <!-- Deactivate Video conversion and PictureViews generation -->
    <queue id="videoConversion" processing="false"/>
    <queue id="pictureViewsGeneration" processing="false"/>
  </extension>
</component>
```
For the PictureViews generation node, the principle is exactly the same as for video conversion: disable all jobs except PictureViews generation. Here also, we allocate more threads and would probably tune the machine, making sure ImageMagick can use more than one CPU, etc.:
```xml
<?xml version="1.0"?>
<component name="my.settings.for.distributed.jobs">
  <!-- Require the component(s) we override -->
  <require>org.nuxeo.ecm.platform.picture.workmanager</require>
  <!-- We override the WorkManager -->
  <extension target="org.nuxeo.ecm.core.work.service" point="queues">
    <!-- First, deactivate all jobs -->
    <queue id="*" processing="false"/>
    <!-- Then, activate the pictureViewsGeneration queue -->
    <queue id="pictureViewsGeneration" processing="true">
      <maxThreads>4</maxThreads>
      <category>pictureViewsGeneration</category>
    </queue>
  </extension>
</component>
```
And we are done: the system described in the first schema of this article is now set up.
To build the distributed jobs system we wanted, we:
- Set up Redis to handle the queues
- Set up a Nuxeo Cluster
- Three nodes to handle client requests
- One node for video processing
- One node for image processing
- Deployed a custom my-background-jobs-config.xml file in nxserver/config, on each node, to activate/deactivate jobs
- Tuned the network load balancer to make sure client requests never reach the background processing servers
Scaling is easy. Each part can be scaled - it is just about adding a VM to the cluster and configuring its job processing:
- Need a second machine to process videos? Just add one and configure it exactly as we did above (deactivate all jobs, activate the video conversion one)
- Too many client requests? Add as many front-end processing nodes to the cluster as you need, making sure they handle neither video conversion nor image processing.
Here is a schema after scaling by adding:
- Two nodes for video processing only
- One node for image processing only
- Three nodes for front-end processing and background jobs other than video and image processing