Introducing fise, the Open Source RESTful Semantic Engine


Mon 30 August 2010 By Olivier Grisel



Edit: fise is now known as the Stanbol Enhancer component of the Apache Stanbol incubating project.

As a member of the IKS European project, Nuxeo contributes to the development of an Open Source software project named fise. Its goal is to help bring new and trendy semantic features to CMSs by giving developers a stack of reusable HTTP semantic services to build upon.

 As such concepts might be new to some readers, the first part of this blog post is presented as a Q&A. 

What is a Semantic Engine?

A semantic engine is a software component that extracts the meaning of an electronic document in order to organize it as partially structured knowledge and not just as a piece of unstructured text content.

Current semantic engines can typically:

During the last couple of years, many such engines have been made available through web-based APIs such as Open Calais, Zemanta and Evri, just to name a few. However, to our knowledge, few such engines are distributed under an Open Source license so that they can be used offline, on your private IT infrastructure, with your sensitive data.

Why would I want to semantically annotate my content?

Linking content items to semantic entities and topics that are defined in open universal databases (such as DBpedia, Freebase or the NY Times database) allows many content-driven applications, such as public websites or private intranets, to share a common conceptual frame and improve findability and interoperability.

Publishers can leverage such technologies to build automatically updated entity hubs that aggregate resources of different types (documents, calendar events, persons, organizations, ...) related to a given semantic entity, which is identified by a disambiguated universal identifier that spans all applications.
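
To make the idea of an entity hub concrete, here is a minimal Python sketch. The content items and their annotations are made up for illustration, and the field names ("type", "title", "entities") are assumptions rather than anything fise defines. Only the DBpedia-style URIs reflect the kind of universal identifier discussed above.

```python
from collections import defaultdict

PARIS = "http://dbpedia.org/resource/Paris"
FRANCE = "http://dbpedia.org/resource/France"

# Hypothetical content items, each already annotated with the URIs of the
# entities it mentions (in practice a semantic engine would supply these).
annotated_items = [
    {"type": "document", "title": "City travel guide", "entities": [PARIS, FRANCE]},
    {"type": "calendar event", "title": "Developer meetup", "entities": [PARIS]},
    {"type": "document", "title": "Country report", "entities": [FRANCE]},
]


def build_entity_hubs(items):
    """Group content items by the entity URIs they mention."""
    hubs = defaultdict(list)
    for item in items:
        for uri in item["entities"]:
            hubs[uri].append(item)
    return dict(hubs)


hubs = build_entity_hubs(annotated_items)
for uri, related in hubs.items():
    print(uri, [item["title"] for item in related])
```

Because the key is a shared identifier rather than a free-text label, items of different types and from different applications end up in the same hub.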

If you are not yet convinced, please have a look at this BBC use case and this 3-minute video by the fine Freebase folks.

How to use fise?

Setting up a fise instance

You can test fise using the online demo, download a snapshot of the all-in-one executable jar launcher (67MB), or build your own instance from source. To run a local instance, launch the jar with a Java 6 virtual machine as follows:

 java -Xmx512M -jar eu.iksproject.fise.launchers.sling-0.9-20100802.jar

And point your browser to http://localhost:8080 instead of http://fise.demo.nuxeo.com in the following examples.

Overview of the fise web interface

Once the server is up and running, fise offers three HTTP endpoints: the engines endpoint, the store endpoint and the sparql endpoint.

 

 

The engines endpoint

Let us focus on the /engines endpoint. The view first lists the active registered analysis components and then asks for user input. Type an English sentence that mentions people (famous or not), organizations and places such as countries and cities. If you are lazy, just copy and paste an article from a public news feed such as Wikinews and submit your content with "Run engines". Depending on the registered engines and the length of your content, the processing time will typically vary from less than one second to around a minute.

 

Submitting text content to the /engines endpoint using the web interface



By default, fise launches three engines in turn:

 

Overview of the extracted entities in the submitted text 


 

Using the REST API

 

Up until now we have used the web user interface, which is meant for human beings who want to test the capabilities of the engines manually and navigate through the results using their browser. This is primarily a demo mode.

The second way to use fise is the RESTful API for machines (e.g. third-party ECM applications such as Nuxeo DM and Nuxeo DAM) that will use fise as an HTTP service to enhance the content of their documents. The detailed documentation of the REST API is available on a per-endpoint basis in the Web UI by clicking on the "REST API" link in the top right corner of the page:


Accessing the inline documentation for the REST API

 

Here is a sample call to the engines endpoint REST interface using the curl command line tool:

 

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
     --data "Fise can detect famous cities such as Paris." \
     http://fise.demo.nuxeo.com/engines/

Please note that the output of this call will be formatted as text/turtle (a standard RDF serialization format). Again this is not meant to be consumed by regular human beings but only by machines (or semantic web engineers). The list of available serialization formats is detailed in the inline documentation.
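
The same call can also be made from a program. Here is a minimal Python sketch that uses only the standard library. The function names, the default base URL and the timeout are illustrative choices, not part of fise. It mirrors the curl example: a POST to /engines/ with a text/plain body and an Accept header asking for text/turtle.

```python
import urllib.request

# Assumption: a local instance started with the launcher jar; use
# http://fise.demo.nuxeo.com to target the online demo instead.
FISE_BASE_URL = "http://localhost:8080"


def build_engines_request(text, base_url=FISE_BASE_URL, accept="text/turtle"):
    """Build (without sending) a POST request for the /engines/ endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/engines/",
        data=text.encode("utf-8"),
        headers={"Accept": accept, "Content-Type": "text/plain"},
        method="POST",
    )


def enhance(text, base_url=FISE_BASE_URL, timeout=60):
    """Send the text to a running instance and return the raw response body."""
    request = build_engines_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With an instance running, calling enhance("Fise can detect famous cities such as Paris.") should return the same Turtle document as the curl command. Building the request separately from sending it makes it easy to inspect the URL and headers without touching the network.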

The road ahead

To close this first blog post, here are a couple of improvements, in no particular order, that I would like to implement in the coming months.

Multi-lingual support

Right now the packaged engines can only deal with English text content. We plan to progressively add statistical models for other languages as well.

Relations extraction

Right now if you submit a sentence that starts with "United Kingdom prime minister David Cameron declared to the press..." you will get an output such as:

[Screenshot: fise output for the David Cameron sentence]

"David Cameron" is detected as a person but not recognized, since the fise index was built on a DBpedia dump extracted before his election. Furthermore, fise is currently not able to extract the relation between the entity "David Cameron" and the entity "United Kingdom". In future versions of fise we plan to extract the role "prime minister" that links the person to the country. This should be achievable by combining syntactic parsing with semantic alignment of English words with an ontology such as DBpedia.

Extracting relations between entities will help knowledge workers incrementally build large knowledge bases at a low cost. This can be very interesting for economic intelligence or data-driven journalism: imagine automatically building, from news feeds, the social networks of public figures and their relationships with business entities such as companies and financial institutions.

Integration with Nuxeo EP

Right now fise is a standalone HTTP service with a basic web interface mainly used for demo purposes. To make it really useful, some work is needed to integrate it with the Nuxeo platform so that Nuxeo DM, Nuxeo DAM and Nuxeo CMF users can benefit from a seamless semantic experience.


Category: Product & Development
Tagged: Apache, Java