One question we often get from customers, particularly multinational firms, is whether or not enterprise search in the Nuxeo Platform can handle content and queries in multiple languages. We are glad to answer yes, it definitely can handle multilingual search, and the default configuration already provides great results for virtually all of the world’s most common languages.
However, we can improve the user experience even further by taking advantage of the language-specific optimizations available in Elasticsearch. These include stemming, the process of reducing each word to its root so that, for example, a query using the singular form of a word also matches content that contains the plural form of the same word. The Nuxeo Platform makes this easy to set up with only a few configuration steps!
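To make the idea of stemming concrete, here is a deliberately tiny Python sketch. It is a toy plural-stripping rule written for this illustration only, not the algorithm used by the Elasticsearch language analyzers; the function names `toy_stem` and `matches` are hypothetical.

```python
def toy_stem(word: str) -> str:
    """Very rough Spanish-style plural stripping.

    Illustration only: this is NOT the stemmer Elasticsearch uses.
    """
    w = word.lower()
    vowels = "aeiouáéíóú"
    # "ciudades" -> "ciudad": drop "es" when it follows a consonant
    if len(w) > 4 and w.endswith("es") and w[-3] not in vowels:
        return w[:-2]
    # "bonitas" -> "bonita": drop a final "s" when it follows a vowel
    if len(w) > 3 and w.endswith("s") and w[-2] in vowels:
        return w[:-1]
    return w


def matches(query: str, content: str) -> bool:
    """True if every stemmed query term appears among the stemmed content terms."""
    content_stems = {toy_stem(term) for term in content.split()}
    return all(toy_stem(term) in content_stems for term in query.split())


# Singular and plural forms reduce to the same root...
print(toy_stem("ciudad"), toy_stem("ciudades"))
# ...so a singular query can match content written in the plural.
print(matches("ciudad bonita", "las ciudades bonitas del mundo"))
```

Because both the indexed text and the query go through the same reduction, the two forms meet on a common root. A real language analyzer applies far more complete rules than these two suffix checks.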
First, we need to update the Elasticsearch mapping as explained in our documentation. The idea is to create a new field for each language and assign the corresponding language analyzer to each one.
{
  ...
  "all_fr" : {
    "analyzer" : "french",
    "include_in_all" : false,
    "type" : "string"
  },
  "all_sp" : {
    "analyzer" : "spanish",
    "include_in_all" : false,
    "type" : "string"
  },
  ...
}
Here the content of the field all_fr will be analyzed by Elasticsearch using the default French analyzer. Now we need to configure Elasticsearch so that it copies the content of every document property that must be taken into account for search into our custom all_XY fields.
Let’s assume that the search scope must include the title, the description and the full-text extract. This translates into the following mapping in Elasticsearch:
{
  ...
  "dc:title" : {
    "type" : "string",
    "copy_to" : ["all_sp", "all_fr"]
  },
  "dc:description" : {
    "type" : "string",
    "copy_to" : ["all_sp", "all_fr"]
  },
  "ecm:binarytext" : {
    "type" : "string",
    "copy_to" : ["all_sp", "all_fr"]
  }
  ...
}
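When more languages or properties are involved, it can be less error-prone to generate these mapping entries than to edit them by hand. The following Python sketch builds the same per-language fields and copy_to entries shown above. The helper name `build_properties` and the two constants are hypothetical; only the field names, analyzers and property names come from this article, and the output is a fragment to merge into your existing mapping, not a complete one.

```python
import json

# Field name -> Elasticsearch language analyzer (values taken from the article).
LANGUAGE_ANALYZERS = {"all_fr": "french", "all_sp": "spanish"}

# Document properties that should be searchable (taken from the article).
SEARCHABLE_PROPERTIES = ["dc:title", "dc:description", "ecm:binarytext"]


def build_properties(analyzers, searchable):
    """Return mapping entries for the language fields and their source properties."""
    props = {}
    # One dedicated field per language, each with its own analyzer.
    for field, analyzer in analyzers.items():
        props[field] = {
            "analyzer": analyzer,
            "include_in_all": False,
            "type": "string",
        }
    # Every searchable property is copied into all language fields.
    targets = sorted(analyzers)
    for name in searchable:
        props[name] = {"type": "string", "copy_to": list(targets)}
    return props


fragment = build_properties(LANGUAGE_ANALYZERS, SEARCHABLE_PROPERTIES)
print(json.dumps(fragment, indent=2))
```

Adding a language then comes down to one new entry in `LANGUAGE_ANALYZERS`, and serializing with `json.dumps` guarantees syntactically valid JSON.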
Next, we’ll use Nuxeo Studio to configure a search view for each language by taking advantage of the Elasticsearch NXQL Hints feature. Introduced in Nuxeo 7.3, this feature lets you use the native Elasticsearch query language within Nuxeo content views! In Studio, we can configure several search views whose search filters specify the Elasticsearch field and analyzer that we want to use.
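For readers who want to see what such a hinted query might look like outside Studio, here is a small Python sketch that assembles an NXQL string. It assumes the hint syntax `/*+ES: INDEX(...) ANALYZER(...) OPERATOR(...) */` placed in front of a predicate, as we understand it from the Nuxeo documentation; please verify it against your Nuxeo version. The helper names, the `dc:title` placeholder field and the backslash escaping rule are assumptions made for this illustration.

```python
def escape_nxql(value: str) -> str:
    """Escape backslashes and single quotes for an NXQL string literal (assumed rule)."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def hinted_query(es_field, analyzer, text, nxql_field="dc:title", operator="match"):
    """Build an NXQL query whose predicate carries Elasticsearch hints.

    es_field   -- Elasticsearch field to search, e.g. "all_sp"
    analyzer   -- analyzer applied to the query text, e.g. "spanish"
    nxql_field -- placeholder NXQL field; the INDEX hint redirects the search
    """
    hint = f"/*+ES: INDEX({es_field}) ANALYZER({analyzer}) OPERATOR({operator}) */"
    return (
        f"SELECT * FROM Document WHERE {hint} "
        f"{nxql_field} = '{escape_nxql(text)}'"
    )


# One query per language-specific search view.
print(hinted_query("all_sp", "spanish", "ciudad bonita"))
print(hinted_query("all_fr", "french", "belle ville"))
```

Each search view differs only in the field and analyzer it passes, which is what makes one view per language cheap to maintain.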
That’s it for the configuration. We now have an application that provides several search views optimized for different languages! Let’s try it and see how well it handles singular and plural forms in Spanish by searching for content about the most beautiful cities in the world…in Spanish!
Spanish-optimized text search finds our sample matching document
Nuxeo has tightly integrated Elasticsearch within the Nuxeo Platform. Explore further how this greatly benefits Enterprise Content Management system/Digital Asset Management system users and expedites development of content-focused enterprise applications: