I attended Michael Salib's talk about Xapian, "Stupidity and laser cat toys: Indexing the US Patent Database with Xapian and Twisted".

Xapian is a probabilistic text search engine.
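
To give a rough idea of what "probabilistic" means here, this is a minimal pure-Python sketch of BM25, a well-known probabilistic ranking formula (as far as I know, Xapian's default weighting scheme is a BM25 variant). This is not Xapian's code or API: the function name `bm25_scores`, the parameter values and the three sample documents are all made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a basic BM25 formula.

    Hypothetical helper for illustration only; k1 and b are common
    textbook defaults, not values taken from Xapian.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency is dampened and normalised by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up sample documents, loosely inspired by the talk's title.
docs = [
    "laser pointer used as a cat toy",
    "method for indexing patent documents",
    "cat exercise apparatus using a laser",
]
scores = bm25_scores("laser cat", docs)
```

Documents that contain none of the query terms get a score of zero, and among the matching ones the shorter document ranks slightly higher because of the length normalisation.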

Michael used it to index the US Patent Database, which is pretty big indeed. He wrote a Python wrapper called Xapwrap, which you can get here:


Michael explained that Xapian was preferred to Lucene because it is easier to wrap in Python and provides faster queries and better precision.

I'm waiting for Michael to upload the slides to the EP site so that I can give more precise feedback on this.

More info on PyLucene here:
http://www.sauria.com/~twl/conferences/pycon2005/20050325/Pulling Java Lucene into Python.html (PyCon05 notes)

Feature-wise, Xapian has everything needed to run a scalable text engine (stemming based on Snowball, meta-indexes, etc.). It optionally uses Twisted's python.log for logging.

I have the feeling that Xapian would fit pretty well as an external indexer for Z3.

(Post originally written by Tarek Ziadé on the old Nuxeo blogs.)