Archive for the ‘Apache’ tag

Using HttpClient properly to avoid CLOSE_WAIT TCP connections

with 4 comments

As I discovered when debugging TCP connections stuck in the CLOSE_WAIT state for one of our customers, we were using HttpClient incorrectly. We’re not alone in this case, as you’ll find out if you google HttpClient CLOSE_WAIT, but it’s very non-intuitive. Even the official tutorial is wrong, so I’m describing the issue here.

Apache HttpClient is usually used like this in basic mode:

But this is not enough.

The issue is that releasing the connection makes it available again to the HttpClient instance but does not close it: HTTP 1.1 connections are persistent by default, so the same connection can be reused for further requests to the same host:port.

Even though the server may have decided to close its end of the connection, on our client side the connection is still open and will stay that way until an attempt to read from it is made (at which point the client will detect that …
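The half-closed state described above can be reproduced without HttpClient at all. The following is a minimal sketch using only `java.net` sockets on the loopback interface; the class and method names are hypothetical, and it illustrates the TCP behaviour rather than HttpClient's own connection handling. The "server" closes its end, the client socket still reports itself as open, and only a read reveals that the peer is gone.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {

    /**
     * Opens a loopback connection, lets the server side close it, and
     * returns what the client sees on its next read (-1 means end of stream).
     */
    public static int readAfterServerClose() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {

            // Server side: accept the connection, then close it right away.
            // Once the server's FIN arrives, the client socket sits in CLOSE_WAIT.
            server.accept().close();

            // The client socket object has not noticed anything yet.
            if (client.isClosed() || !client.isConnected()) {
                throw new IllegalStateException("client socket unexpectedly closed");
            }

            // Only an actual read detects that the peer has closed.
            InputStream in = client.getInputStream();
            return in.read();
        } // try-with-resources closes the client socket, which ends CLOSE_WAIT
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read() after server close returned " + readAfterServerClose());
    }
}
```

A pooled connection that is released but never read from again stays in the state shown between `accept().close()` and `in.read()`, which is why idle connections have to be closed explicitly by the client.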

February 26th, 2013 at 6:49 am

Seven reasons why Nuxeo uses Java for open source ECM awesomeness

with one comment

So today’s the 7th of July (7/7), and also, not so coincidentally, the day Oracle has chosen as the official day to launch Java 7.

We weren’t invited to the party, but let’s take a break anyway from our Tour de Nuxeo series to look at the 7 reasons why we’re happy to be using Java (Java 6, actually, we’re in no rush to adopt Java 7) for the Nuxeo Enterprise Content Management Platform.

7. Write once, run anywhere: it’s not a myth

We develop Nuxeo applications with confidence that the application will run on Linux, Windows and Mac OS.

Of course, we run integration tests on all three platforms just to be sure.

6. It’s fast and robust

It’s not the ’90s anymore. The JVM, with the integration of the HotSpot technology ten years ago, is now on par (i.e. only 10% to 2x slower on most …

A Berlin Buzzwords 2011 wrap up

without comments

Last week I had the opportunity to attend the second edition of the Berlin Buzzwords conference and then to participate in the Semantic / NLP hackathon hosted by Neofonie. Here is my personal executive digest. For a more comprehensive overview, most of the slides are online and the videos should follow at some point.

Part 1 – the conference


Berlin Buzzwords is a developer conference with a focus on scalable data processing, storage and search, mostly using Apache projects such as Hadoop, HBase, Solr & Lucene, Mahout and related projects. This second edition attracted more than 400 developers from all over the world, about a third of whom are Apache committers.

Hadoop MapReduce is no silver bullet

The idea that appealed to me the most across talks is that the MapReduce model is far from being an optimal way to do large-scale distributed …

Apache Chemistry Goes TLP

without comments

Just a few days ago Apache Chemistry became a top-level project (TLP) of the Apache Software Foundation (ASF). As a reminder, Chemistry provides client libraries implementing the CMIS spec in Java, Python, PHP and .NET, and a server-oriented library for Java. Learn more about this on the Chemistry page.

Before becoming a TLP, Chemistry was just an incubating project, guided in its growth by the Incubator, like all Apache projects when they begin life in the ASF. So what does the move to a TLP signify? Briefly:

  • the project's scope is now clearly established,
  • the community around it has been deemed large enough to sustain the project,
  • a Project Management Committee (PMC) has been formed to make internal decisions,
  • the PMC and other committers have become acquainted with the Apache Way of doing things,
  • successful releases of the software have been done.

Now that Chemistry is a TLP, it means a …

February 24th, 2011 at 3:07 pm

Mining Wikipedia with Hadoop and Pig for Natural Language Processing

with 7 comments

The context: semantic knowledge extraction from unstructured text

In a previous post we introduced fise, an open source semantic engine
now being incubated at the ASF under the new name Apache Stanbol.
Here is a 4-minute demo that
explains how such a semantic engine can be used by a document management system
such as Nuxeo DM to tag documents with entities instead of potentially ambiguous words:

The problem with the current implementation, which is based on
OpenNLP, is the lack of readily
available statistical models for Named Entity Recognition in languages such as French. Furthermore, the existing models are restricted to the detection of a few entity
classes (right now the English models can detect person, place and organization names).

To build such a model, developers have to teach or train the system
by applying a machine learning algorithm on an annotated corpus of data.
It is very …

January 11th, 2011 at 11:53 am

CMIS and Chemistry at ApacheCon 2010

without comments

ApacheCon 2010 starts next week in Atlanta, and if you want to know more about the CMIS standard from OASIS, Apache Chemistry, and OpenCMIS then you should come! On Wednesday, Nov 3 (at 2pm) I'll be presenting a talk on these topics. Here's the abstract:

Get your content under control with CMIS and Apache Chemistry

The CMIS standard provides an answer to most issues met by typical content-centric applications by offering a common model and a set of services for ECM interoperability. In this session we'll first provide an introduction to the CMIS services and bindings, then we'll offer a view of the landscape of the different ECM providers and clients implementing CMIS, and we'll finish with practical examples of the uses of OpenCMIS, the Apache Chemistry (Java) library, designed to help you easily write CMIS applications.

I hope to meet you there!…

October 25th, 2010 at 3:26 pm

Posted in Product & Development

Introducing fise, the Open Source RESTful Semantic Engine

with 5 comments


Edit: fise is now known as the Stanbol Enhancer component of the Apache Stanbol incubating project.

As a member of the IKS European project, Nuxeo contributes to the development of an open source software project named fise, whose goal is to help bring new and trendy semantic features to CMS by giving developers a stack of reusable HTTP semantic services to build upon.

As such concepts might be new to some readers, the first part of this blog post is presented as a Q&A.

What is a Semantic Engine?

A semantic engine is a software component that extracts the meaning of an electronic document to organize it as partially structured knowledge and not just as a piece of unstructured text content.

Current semantic engines can typically:

  • categorize documents (is this document written in English, Spanish, Chinese? is this an article that should be filed under the Business, Lifestyle, Technology categories?)

Apache Chemistry meeting wrap up

without comments

Last week a meeting took place in Munich between the main developers behind the two Java Chemistry projects (Chemistry and OpenCMIS). I can say that the meeting was a success and that the two codebases are now in the process of actively being merged!

People from Open Text (our host for this week), SAP, Alfresco and Nuxeo were present. There were lively discussions about many technical points, but following the Apache rules of conduct for such meetings, all points were summarized each day on the mailing list for wider visibility and input from the whole community (see the archives here). The remaining work for this merge will be logged in the Apache JIRA issue tracker, again to provide visibility.

Once the current code base is stabilized, which we hope will take no more than one or two weeks, we want to make a first 0.1-incubating release, in …

April 19th, 2010 at 5:15 pm

Nuxeo CMIS Update

without comments

Thanks to lots of progress in Apache Chemistry, to which Nuxeo is contributing, and through updated Nuxeo Chemistry bindings, the support for CMIS in Nuxeo is getting quite good.

For more practical info on using CMIS in Nuxeo, including download links, see

Note that our demo server has been updated as well.

Below are most of the new features available since the last release.

Better search

Fulltext search with CONTAINS() has been implemented so that you can do queries like:

SELECT cmis:name FROM cmis:document WHERE CONTAINS('foobar')

(The full scope of the fulltext search syntax, with ORing of words and negation, is not there yet.)

You can now also use the IN_TREE() and IN_FOLDER() predicates.
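As an illustration of the two predicates, here is a minimal sketch that builds the corresponding CMIS query strings in Java. The class name, method names and the folder id `abc` are hypothetical, and the backslash escaping of quotes follows my reading of the CMIS query syntax; how the resulting string is sent to the server (for instance through a CMIS client library) is left out.

```java
public class CmisQueries {

    /** Wraps a value in single quotes, backslash-escaping backslashes and quotes. */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Query for documents that are direct children of the given folder. */
    public static String inFolder(String folderId) {
        return "SELECT cmis:name FROM cmis:document WHERE IN_FOLDER(" + quote(folderId) + ")";
    }

    /** Query for documents anywhere below the given folder. */
    public static String inTree(String folderId) {
        return "SELECT cmis:name FROM cmis:document WHERE IN_TREE(" + quote(folderId) + ")";
    }

    public static void main(String[] args) {
        // "abc" stands in for a real folder object id.
        System.out.println(inFolder("abc"));
        System.out.println(inTree("abc"));
    }
}
```

IN_FOLDER() matches only the immediate children of the folder, while IN_TREE() matches all of its descendants.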

The SQL keywords are now case-insensitive as the spec requires, and complex boolean functions have been fixed.


A number of fundamental features from the CMIS domain model are now complete: object …

January 21st, 2010 at 6:53 pm

Posted in Product & Development

IKS Semantic Search Workshop Wrap-Up (Rome 2009)

without comments


Last week I had the pleasure of attending the second workshop
organized by the IKS project
in Rome. The goal of this four-year project is to
develop a software stack and a set of design guidelines to help CMS
developers leverage the promises of knowledge-oriented software and Linked Data.

In the following I will give a brief overview of some of the
discussions that happened during those four days and a summary
of the Scribo project I presented during the demo sessions on the
last day. More complete coverage of the event can be found on the event page
of the IKS wiki.

Materialized semantic indexes

Rupert Westenthaler from the Salzburg Research team is working
on a very interesting prototype to make CMS applications able to perform
fast complex graph queries on a knowledge base by materializing named
graph queries into flat Lucene indexes and tracking the knowledge …

November 18th, 2009 at 12:21 pm