As a user, I think it’s great to have information presented to me without having to look for it, and I am sure I am not the only one. Most internet companies realized a long time ago how powerful this can be as a user retention tool. After all, who hasn’t ended up spending hours watching videos on YouTube after searching for just one? Whatever the service, the same logic applies: if a user browsing an item gets suggestions for other “similar” items (similar products, songs, or videos, or “people who like this restaurant also like” ideas), engagement and retention become more likely.

That being said, here’s the most important question: can I build this kind of feature with the Nuxeo Platform and provide relevant content suggestions to my users? The answer, of course, is yes! It is one of the many use cases you can implement with the Elasticsearch API passthrough, which will be introduced in the upcoming Nuxeo Platform Fast Track release. The Elasticsearch query DSL includes the “more like this” query type, which finds documents similar to the ones passed as parameters. Its many parameters let you build queries that closely match your business definition of “like this”.

In the previous blog post about Elasticsearch, we built a simple web application which uses the API passthrough. This time we will build a native Nuxeo Platform widget to illustrate the content suggestion feature. The source code is available in the nuxeo-labs repository on GitHub. Building and using a widget is really easy, so here we will focus instead on how to build the right ES query to get relevant content.

So let’s run a few simple tests. For the purpose of this blog I used a Nuxeo repository which contains about 50 pictures and 50 PDF documents in a single folder. First, let’s try a query that leaves almost every parameter at its default (only min_term_freq is lowered to 1):

{query : {
   "bool": {
     "must": [{
          "more_like_this" : {
           "min_term_freq" : 1,
           "docs" : [{
             "_index" : "nuxeo",
             "_type" : "doc",
             "_id" : id}]}}]}}}

Interestingly, now when we browse a picture we get suggestions for other pictures, and when we browse a PDF document we get suggestions for other PDF documents. Not bad for a start!
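
Since `id` changes with every document the user browses, the first query can be wrapped in a small function. The following is only a sketch: the function name `buildBasicQuery` and the sample id are placeholders for illustration and are not taken from the widget's source code. The index name and type are the ones used in the query above.

```javascript
// Hypothetical helper: builds the basic "more like this" query for one document.
// The index name "nuxeo" and the type "doc" come from the query shown above.
function buildBasicQuery(id) {
  return {
    query: {
      bool: {
        must: [{
          more_like_this: {
            // Elasticsearch ignores terms that occur fewer than 2 times in the
            // source document by default; 1 keeps terms that occur only once.
            min_term_freq: 1,
            docs: [{ _index: "nuxeo", _type: "doc", _id: id }]
          }
        }]
      }
    }
  };
}

// Serialize to the JSON body that would be sent to Elasticsearch.
// "some-document-id" is a placeholder, not a real document id.
const body = JSON.stringify(buildBasicQuery("some-document-id"));
console.log(body);
```

Building the query as an object and serializing it with `JSON.stringify` also avoids the unbalanced braces that are easy to introduce when writing deeply nested queries by hand.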

PDF document suggestions

Picture document suggestions
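
To display suggestions like these, the widget has to turn the Elasticsearch response into a short list. Here is a minimal sketch, assuming the standard search response shape (`hits.hits`, each hit carrying `_id`, `_score` and `_source`). The function name `toSuggestions`, the `limit` parameter and the assumption that the title is exposed as `dc:title` in `_source` are mine, and the sample response is hand-written for illustration only.

```javascript
// Hypothetical helper: turns an Elasticsearch search response into suggestions.
function toSuggestions(response, limit = 5) {
  const hits = (response.hits && response.hits.hits) || [];
  // Hits arrive sorted by descending score, so the first ones are the closest.
  return hits.slice(0, limit).map(hit => ({
    id: hit._id,
    score: hit._score,
    // Assumption: the indexed document exposes its title as "dc:title".
    title: hit._source ? hit._source["dc:title"] : undefined
  }));
}

// Hand-written response, for illustration only.
const sample = {
  hits: {
    hits: [
      { _id: "doc-1", _score: 1.5, _source: { "dc:title": "Report A" } },
      { _id: "doc-2", _score: 0.7, _source: { "dc:title": "Report B" } }
    ]
  }
};

console.log(toSuggestions(sample, 1));
```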

Now, to go further and improve the relevance of the suggestions, we need a clear definition of what similar content is. Obviously this definition depends on the nature of the content and the purpose of the application. Let’s take a simple example and say that similar content is most likely content with a similar title. Translate that definition into the Elasticsearch query DSL and you get the following:

{query : {
        "bool": {
          "should": [{
               "more_like_this" : {
                "fields" : ["dc:title.fulltext"],
                "docs" : [{
                  "_index" : "nuxeo",
                  "_type" : "doc",
                  "_id" : id}],
                "min_term_freq" : 1,
                "min_word_length" : 5,
                "min_doc_freq" : 3,
                "boost" : 3
              }},{
              "more_like_this" : {
                "docs" : [{
                  "_index" : "nuxeo",
                  "_type" : "doc",
                  "_id" : id}]}}]}}}

If we go back to the same picture as in the previous example, we now only get suggestions from the same picture series.

Suggestions from the same picture series
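
The parameters of the title clause deserve a closer look. `min_term_freq`, `min_word_length` and `min_doc_freq` filter the terms of the source document before they are used for matching, and `boost` multiplies the weight of the clause in the final score. The sketch below rebuilds the title-weighted query as a JavaScript function with one comment per parameter. The helper names `moreLikeThis` and `buildTitleWeightedQuery` are hypothetical.

```javascript
// Hypothetical helper: one "more_like_this" clause for a given document id.
// `options` may carry `fields`, `boost`, `min_term_freq`, and so on.
function moreLikeThis(id, options = {}) {
  return {
    more_like_this: {
      ...options,
      docs: [{ _index: "nuxeo", _type: "doc", _id: id }]
    }
  };
}

// Hypothetical helper: the title-weighted query, as a function of the id.
function buildTitleWeightedQuery(id) {
  return {
    query: {
      bool: {
        should: [
          moreLikeThis(id, {
            fields: ["dc:title.fulltext"],
            min_term_freq: 1,   // keep terms that appear only once in the title
            min_word_length: 5, // ignore words shorter than 5 characters
            min_doc_freq: 3,    // ignore terms found in fewer than 3 documents
            boost: 3            // multiply the score of this clause by 3
          }),
          // Unrestricted clause, so documents with unrelated titles can still match.
          moreLikeThis(id)
        ]
      }
    }
  };
}

console.log(JSON.stringify(buildTitleWeightedQuery("some-document-id"), null, 2));
```

Because both clauses sit in a `should` array, a document can match either one; the title clause simply counts for more when it does match.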

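We can refine the definition further by adding clauses for other fields. In the following query, the title, the author and the tags each get their own boosted clause:

{query : {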
        "bool": {
          "should": [{
               "more_like_this" : {
                "fields" : ["dc:title.fulltext"],
                "docs" : [{
                  "_index" : "nuxeo",
                  "_type" : "doc",
                  "_id" : id}],
                "boost" : 2}
              },{
               "more_like_this" : {
                "fields" : ["dc:author"],
                "docs" : [{
                  "_index" : "nuxeo",
                  "_type" : "doc",
                  "_id" : id}],
                "boost" : 2}
              },{
               "more_like_this" : {
                "fields" : ["ecm:tags"],
                "docs" : [{
                  "_index" : "nuxeo",
                  "_type" : "doc",
                  "_id" : id}],
                "boost" : 2}
              },{
              "more_like_this" : {
                "docs" : [{
                  "_index" : "nuxeo",
                  "_type" : "doc",
                  "_id" : id}]}}]}}}

Until the next Fast Track, Nuxeo Platform 7.3, is released, you can try the API with a snapshot distribution. But in the end, don’t forget that what really matters in order to take advantage of this feature is your definition of “like this”.
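
One way to keep that definition easy to adjust is to express it as a list of weighted fields and generate the query from it, since the last query repeats the same clause for each field. This is a sketch under that assumption; `buildSuggestionQuery` and the shape of `weightedFields` are hypothetical, and the field names and boosts are the ones from the last query.

```javascript
// Hypothetical helper: builds a bool/should query from a list of weighted fields.
function buildSuggestionQuery(id, weightedFields) {
  const docs = [{ _index: "nuxeo", _type: "doc", _id: id }];
  // One boosted clause per field.
  const clauses = weightedFields.map(({ field, boost }) => ({
    more_like_this: { fields: [field], docs, boost }
  }));
  // Final clause with no field restriction, as in the queries above.
  clauses.push({ more_like_this: { docs } });
  return { query: { bool: { should: clauses } } };
}

// The definition of "like this" from the last query, expressed as data.
const query = buildSuggestionQuery("some-document-id", [
  { field: "dc:title.fulltext", boost: 2 },
  { field: "dc:author", boost: 2 },
  { field: "ecm:tags", boost: 2 }
]);

console.log(JSON.stringify(query, null, 2));
```

Changing the definition then comes down to editing the list: raising a boost, dropping a field, or adding a new one.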