Archive for the ‘Apache’ tag
As I discovered when debugging TCP connections stuck in the CLOSE_WAIT state for one of our customers, we were using HttpClient incorrectly. We’re not alone, as you’ll find out if you google HttpClient CLOSE_WAIT: the correct usage is far from intuitive. Even the official tutorial is wrong, so I’m describing the issue here.
HttpClient is usually used like this in basic mode:
But this is not enough.
The issue is that releasing the connection makes it available again to the HttpClient instance but does not close it, because HTTP 1.1 is used and the connection is kept alive so that further requests to the same host:port can reuse it.
Even though the server may have decided to close its end of the connection, on our client side the connection is still open and will stay that way until an attempt to read from it is made (at which point the client will detect that …
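The behaviour described above can be reproduced with plain sockets, independently of HttpClient. The following is a minimal sketch using only java.net; the class and method names (HalfCloseDemo, observe) are made up for illustration. A loopback server accepts a connection and closes it at once, while the client socket keeps reporting itself as open until a read returns end-of-stream.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class HalfCloseDemo {

    /**
     * Returns a two-element array:
     * [0] is 1 if the client socket still looked open after the server closed its end,
     * [1] is the value returned by the first read on the client side.
     */
    static int[] observe() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {

            // The server accepts the connection and closes its end right away.
            server.accept().close();

            // Nothing on the client side has noticed yet: the socket object
            // still reports itself as connected and not closed.
            boolean looksOpen = client.isConnected() && !client.isClosed();

            // Only an actual read reveals that the peer is gone (-1 = end of stream).
            int firstRead = client.getInputStream().read();

            return new int[] { looksOpen ? 1 : 0, firstRead };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = observe();
        System.out.println("looked open before read: " + (result[0] == 1));
        System.out.println("first read returned: " + result[1]);
    }
}
```

Between the server's close and the client's own close(), the operating system reports the client side of such a connection as CLOSE_WAIT. In this sketch the try-with-resources block closes the client socket, so the state does not linger.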
So today’s the 7th of July (7/7), and also, not so coincidentally, the day Oracle has chosen as the official day to launch Java 7.
We weren’t invited to the party, but let’s take a break anyway from our Tour de Nuxeo series to look at the 7 reasons why we’re happy to be using Java (Java 6, actually, we’re in no rush to adopt Java 7) for the Nuxeo Enterprise Content Management Platform.
7. Write once, run anywhere: it’s not a myth
We develop Nuxeo applications with confidence that the application will run on Linux, Windows and Mac OS.
Of course, we run integration tests on all three platforms just to be sure.
6. It’s fast and robust
It’s not the ’90s anymore. The JVM, with the integration of the HotSpot technology ten years ago, is now on par (i.e. only 10% to 2x slower on most …
Last week I had the opportunity to attend the second edition of the Berlin Buzzwords conference and then to participate in the Semantic / NLP hackathon hosted by Neofonie. Here is my personal executive digest. For a more comprehensive overview, most of the slides are online and the videos should follow at some point.
Part 1 – the conference
Berlin Buzzwords is a developer conference with a focus on scalable data processing, storage and search, mostly using Apache projects such as Hadoop, HBase, Solr & Lucene, Mahout and related projects. This second edition attracted more than 400 developers from all over the world, about a third of whom are Apache committers.
Hadoop MapReduce is no silver bullet
The idea that appealed to me the most across talks is that the MapReduce model is far from being an optimal way to do large scale distributed …
Just a few days ago Apache Chemistry became a top-level project (TLP) of the Apache Software Foundation (ASF). As a reminder, Chemistry provides client libraries implementing the CMIS spec in Java, Python, PHP and .NET, and a server-oriented library for Java. Learn more about this on the Chemistry page.
Before becoming a TLP, Chemistry was just an incubating project, guided in its growth by the Incubator, like all Apache projects when they begin life in the ASF. So what does the move to a TLP signify? Briefly:
- the project's scope is now clearly established,
- the community around it has been deemed large enough to sustain the project,
- a Project Management Committee (PMC) has been formed to make internal decisions,
- the PMC and other committers have become acquainted with the Apache Way of doing things,
- successful releases of the software have been made.
Now that Chemistry is a TLP, it means a …
The context: semantic knowledge extraction from unstructured text
In a previous post we introduced fise, an open source semantic engine
now being incubated at the ASF under the new name Apache Stanbol.
Here is a 4-minute demo that
explains how such a semantic engine can be used by a document management system
such as Nuxeo DM to tag documents with entities instead of potentially ambiguous words:
The problem with the current implementation, which is based on OpenNLP,
is the lack of readily available statistical models for Named Entity
Recognition in languages such as French. Furthermore, the existing models are restricted to the detection of a few entity
classes (right now the English models can detect people, place and organization names).
To build such a model, developers have to teach or train the system
by applying a machine learning algorithm on an annotated corpus of data.
It is very …
ApacheCon 2010 starts next week in Atlanta, and if you want to know more about the CMIS standard from OASIS, Apache Chemistry, and OpenCMIS then you should come! On Wednesday, Nov 3 (at 2pm) I'll be presenting a talk on these topics. Here's the abstract:
The CMIS standard provides an answer to most issues met by typical content-centric applications by offering a common model and a set of services for ECM interoperability. In this session we'll first provide an introduction to the CMIS services and bindings, then we'll offer a view of the landscape of the different ECM providers and clients implementing CMIS, and we'll finish with practical examples of the uses of OpenCMIS, the Apache Chemistry (Java) library, designed to help you easily write CMIS applications.
I hope to meet you there!…
Edit: fise is now known as the Stanbol Enhancer component of the Apache Stanbol incubating project.
As a member of the IKS European project, Nuxeo contributes to the development of an open source software project named fise, whose goal is to help bring new and trendy semantic features to CMS by giving developers a stack of reusable HTTP semantic services to build upon.
As such concepts might be new to some readers, the first part of this blog post is presented as a Q&A.
What is a Semantic Engine?
A semantic engine is a software component that extracts the meaning of an electronic document to organize it as partially structured knowledge and not just as a piece of unstructured text content.
Current semantic engines can typically:
- categorize documents (is this document written in English, Spanish, Chinese? is this an article that should be filed under the Business, Lifestyle, Technology categories?)
Last week a meeting took place in Munich between the main developers behind the two Java Chemistry projects (Chemistry and OpenCMIS). I can say that the meeting was a success and that the two codebases are now in the process of actively being merged!
People from Open Text (our host for this week), SAP, Alfresco and Nuxeo were present. There were lively discussions about many technical points, but following the Apache rules of conduct for such meetings, all points were summarized each day on the mailing list for larger visibility and input by the whole community (see the archives here). The remaining work to do for this merge will be logged in the Apache JIRA issue tracker, again to provide visibility.
Once the current code base is stabilized, which we hope will take no more than one or two weeks, we want to make a first 0.1-incubating release, in …
Thanks to lots of progress in Apache Chemistry, to which Nuxeo is contributing, and through updated Nuxeo Chemistry bindings, the support for CMIS in Nuxeo is getting quite good.
For more practical info on using CMIS in Nuxeo, including download links, see http://doc.nuxeo.org/xwiki/bin/view/Main/CMIS.
Note that our demo server at http://cmis.demo.nuxeo.org/ has been updated as well.
Below are most of the new features available since the last release.
Fulltext search with CONTAINS() has been implemented so that you can do queries like:
SELECT cmis:name FROM cmis:document WHERE CONTAINS('foobar')
(The full scope of the fulltext search syntax, with ORing of words and negation, is not there yet.)
You can now also use the IN_TREE() and IN_FOLDER() predicates.
The SQL keywords are now case-insensitive as the spec requires, and complex boolean functions have been fixed.
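As an illustration of these predicates, here is a small sketch that only builds the query strings; sending them to a repository is left out. The class name, the helper methods and the folder id are hypothetical, and the helper simply rejects values containing a quote or backslash rather than trying to escape them.

```java
class CmisQueryExamples {

    // Wraps a value in single quotes; values that would need escaping are rejected.
    static String quoted(String value) {
        if (value.indexOf('\'') >= 0 || value.indexOf('\\') >= 0) {
            throw new IllegalArgumentException("unsupported character in: " + value);
        }
        return "'" + value + "'";
    }

    // Fulltext search on a single word.
    static String fulltext(String word) {
        return "SELECT cmis:name FROM cmis:document WHERE CONTAINS(" + quoted(word) + ")";
    }

    // Documents that are direct children of the given folder.
    static String directChildren(String folderId) {
        return "SELECT cmis:name FROM cmis:document WHERE IN_FOLDER(" + quoted(folderId) + ")";
    }

    // Documents located anywhere below the given folder.
    static String descendants(String folderId) {
        return "SELECT cmis:name FROM cmis:document WHERE IN_TREE(" + quoted(folderId) + ")";
    }

    public static void main(String[] args) {
        String folderId = "some-folder-id"; // hypothetical id, for illustration only
        System.out.println(fulltext("foobar"));
        System.out.println(directChildren(folderId));
        System.out.println(descendants(folderId));
    }
}
```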
A number of fundamental features from the CMIS domain model are now complete: object …
Last week I had the pleasure to attend the second
workshop organized by the IKS
project in Rome. The goal of this 4-year project is to
develop a software stack and a set of design guidelines to help CMS
developers leverage the promises of knowledge oriented software and Linked Data.
In the following I will give a brief overview of some of the
discussions that happened during those four days and a summary
of the Scribo project I presented during the demo sessions on the
last day. A more complete coverage of the event can be found on the event page
of the IKS wiki.
Materialized semantic indexes
Rupert Westenthaler from the Salzburg Research team is working
on a very interesting prototype to make CMS applications able to perform
fast complex graph queries on a knowledge base by materializing named
graph queries into flat Lucene indexes and tracking the knowledge …