Browse your Document Distribution with Kibana


Thu 14 January 2016 By Guillaume Renard

If you are an Administrator of the Nuxeo Platform, getting a quick overview of where Documents are located in the repository is probably one of your most common tasks. The good news is that this can be done easily! You can now leverage the nested aggregations provided by the Elasticsearch index used in the Nuxeo Platform to produce a sunburst pie chart representing the disk usage of your Nuxeo deployment within Kibana. I will walk you through the configuration and the steps needed to achieve this.

Disk Usage sunburst pie chart in Kibana

Configuration


Requirements:


  • Nuxeo platform 6.0 or greater, Nuxeo LTS 2015

  • Elasticsearch 1.4.4 - 1.7

  • Kibana 4.1.x


To build the above chart, we need to introduce an Elasticsearch transform in the mapping file. The Elasticsearch mapping file is located at $NUXEO_HOME/templates/common-base/nxserver/config/elasticsearch-config.xml.nxftl.

Add the following transform based on a groovy script:

 ...
"transform": {
"lang": "groovy",
"script": "def splitPath = [];splitPath = ctx._source['ecm:path'].split('/'); ctx._source['pathDepth'] =splitPath.length; for (i = 1; splitPath.length > i; i++) { ctx._source['pathLevel' + i] =splitPath[i] }"
},
"properties": {
....

Note that the transform definition goes at the same level as the mapping properties. Let’s take a closer look at the groovy script.

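def splitPath = [];
// ecm:path starts with '/', so index 0 holds an empty string
splitPath = ctx._source['ecm:path'].split('/');
ctx._source['pathDepth'] = splitPath.length;
// start at 1 to skip that empty first element
for (i = 1; splitPath.length > i; i++) {
    ctx._source['pathLevel' + i] = splitPath[i];
}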

The script is invoked each time a Nuxeo Document is indexed in Elasticsearch. It extracts the ecm:path string property (e.g. /default-domain/workspaces/aWorkspace/aFile) from the source of the Document. This path is then split on the '/' character in order to index each part of the path in a dedicated field called pathLevel1, pathLevel2, pathLevel3, etc. depending on the depth of the Document in the hierarchy.
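To make the effect of the transform concrete, here is a small Python sketch that mimics what the Groovy script computes for a single path. It is only an illustration, not code that runs inside Elasticsearch: the function name path_levels is made up for this example, and it assumes the path has no trailing slash.

```python
def path_levels(ecm_path):
    """Mimic the Groovy transform: split ecm:path and number each segment.

    Hypothetical helper for illustration only; assumes no trailing slash.
    """
    # The path starts with '/', so the first element of the split is ''
    parts = ecm_path.split("/")
    # Like the Groovy script, pathDepth counts that empty first element too
    fields = {"pathDepth": len(parts)}
    # Start at 1 to skip the empty first element
    for i in range(1, len(parts)):
        fields["pathLevel%d" % i] = parts[i]
    return fields


if __name__ == "__main__":
    for name, value in sorted(path_levels("/default-domain/workspaces/aWorkspace/aFile").items()):
        print(name, "=", value)
```

For the sample path, this yields pathLevel1 = default-domain up to pathLevel4 = aFile, and a pathDepth of 5 because the empty string before the leading slash is counted.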

Note that Groovy dynamic scripting is disabled by default since Elasticsearch v1.4.3, so you need to enable it in your $ELASTIC_HOME/config/elasticsearch.yml:

script.groovy.sandbox.enabled: true

Once this is done, you must re-index your documents into Elasticsearch so that the transform is applied and the pathLevel1, pathLevel2, pathLevel3, etc. fields are populated. To do so:


  • Go to the Admin Center of your Nuxeo deployment

  • Click on the Elasticsearch menu

  • Go to the Admin tab

  • Click Re-index repository


That’s it! Let’s now build a chart on top of these new fields.

Build the Chart with Kibana


We will now see how to design the chart in Kibana in a few steps:


  1. Go to the Visualize tab

  2. Create a Pie Chart by clicking Create pie chart


  3. Click From a new search

  4. The chart design assistant appears on the left-hand side panel:

    Split chart

    Click Split Slices

  5. Set up the nested aggregations on the different path level fields created by the Groovy transform script:

    Create buckets

    At the first level, select the Terms aggregation type on the pathLevel1 field. In the Size input, enter 0. This forces Elasticsearch to return exhaustive counts instead of only the top terms.

  6. Click Add sub-buckets and repeat step 5 as many times as you want/need depending on the maximum depth of your hierarchy. Each time, select a deeper pathLevelx field (e.g. pathLevel2, pathLevel3, etc.).

  7. Click the Apply changes button


  8. Enjoy your sunburst pie chart
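The nested buckets configured in steps 5 and 6 correspond to terms aggregations nested inside one another. The Python sketch below builds a search body of that shape, which can help when checking the result outside Kibana. It is not the exact request Kibana sends: the function name and the aggregation names (level1, level2, ...) are made up for this example, and the "size": 0 on terms relies on the Elasticsearch 1.x behaviour described in step 5.

```python
import json


def nested_path_aggs(max_level):
    """Build a search body with one terms aggregation per path level, nested.

    Hypothetical helper; aggregation names are arbitrary labels.
    """
    if max_level < 1:
        raise ValueError("max_level must be at least 1")
    aggs = None
    # Build from the deepest level outwards so each level wraps the next one
    for level in range(max_level, 0, -1):
        node = {"terms": {"field": "pathLevel%d" % level, "size": 0}}
        if aggs is not None:
            node["aggs"] = aggs
        aggs = {"level%d" % level: node}
    # Top-level "size": 0 skips the hits; only the buckets are of interest
    return {"size": 0, "aggs": aggs}


if __name__ == "__main__":
    print(json.dumps(nested_path_aggs(3), indent=2))
```

Each additional sub-bucket added in Kibana adds one more nested "aggs" entry on a deeper pathLevel field.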

Troubleshooting


Nothing appears? That can happen for one of these reasons:

By default, Kibana fetches entries created in the last 15 minutes. In the top right corner, click on Last 15 minutes and select an appropriate time range, for example Last 5 years.

If you can't see pathLevel1, pathLevel2, etc. in the field selector when defining your buckets in steps 5 and 6, you may need to reload the field list. Go to the Kibana Settings menu and click the orange button for the Nuxeo index.

Refresh fields

What About File Size?


The previous steps guide you to build a chart that shows how many Documents you have in each folder. In order to build a proper Disk Usage chart, you need to customize the nested aggregations a bit further.

Luckily, the Nuxeo Platform computes and stores the size of each Document (more exactly the size of their attached binaries) in the common:size Document property.

To create a Disk Usage chart, redo the previous steps and instead of selecting a Count aggregation in the metrics definition, select Sum on the common:size field.

Aggregation - sum

By the way, you may want to restrict the size computation to particular Documents. In the above screenshot, you can see that I added the following query clause:

-ecm\:mixinType:HiddenInNavigation -ecm\:currentLifeCycleState:deleted

to exclude the hidden Documents (typically system documents) and the deleted Documents (i.e. located in the trash bin).
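The Disk Usage variant can be sketched the same way: the buckets stay on the pathLevel fields, and a sum on common:size replaces the default count. The Python sketch below is an approximation under stated assumptions, not the request Kibana actually issues: the function name, the aggregation names (level1, total_size) and the optional query parameter are made up for this example.

```python
import json


def disk_usage_aggs(max_level, size_field="common:size", query=None):
    """Nested terms aggregations on pathLevelN, each with a sum of sizes.

    Hypothetical helper; 'query' is an optional Lucene query string.
    """
    if max_level < 1:
        raise ValueError("max_level must be at least 1")
    inner = {}
    for level in range(max_level, 0, -1):
        # Every bucket gets its own sum so each ring of the chart can be sized
        sub_aggs = {"total_size": {"sum": {"field": size_field}}}
        sub_aggs.update(inner)
        inner = {
            "level%d" % level: {
                "terms": {"field": "pathLevel%d" % level, "size": 0},
                "aggs": sub_aggs,
            }
        }
    body = {"size": 0, "aggs": inner}
    if query:
        # Restrict the documents taken into account, e.g. an exclusion clause
        body["query"] = {"query_string": {"query": query}}
    return body


if __name__ == "__main__":
    print(json.dumps(disk_usage_aggs(2), indent=2))
```

Passing a query string through the query parameter plays the same role as the clause typed in the Kibana search bar.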

Here’s the final result:

Disk usage size - Final sunburst pie chart


Tagged: Elasticsearch, How to, Nuxeo Platform 7.x, Analytics