Seen at EuroPython: Xapian text search engine


Thu 30 June 2005 By nuxeo

I attended Michael Salib's talk about Xapian, "Stupidity and laser cat toys: Indexing the US Patent Database with Xapian and Twisted".




Xapian is a probabilistic text search engine.
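To give a rough idea of what "probabilistic" means here, the snippet below is a minimal pure-Python sketch of BM25-style ranking, the family of weighting schemes Xapian is known for. It does not use Xapian or Xapwrap at all: the function name `bm25_scores`, the parameters `k1` and `b`, and the three sample documents are assumptions made up for illustration, and a real engine adds stemming, an on-disk index and much more.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    # Naive tokenisation (hypothetical): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturates, and long documents are penalised.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up sample documents, not taken from the talk.
docs = [
    "laser pointer used as a cat toy",
    "method for indexing patent documents",
    "cat food dispenser",
]
scores = bm25_scores("cat toy", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

The document matching both query terms ranks first, the one matching only "cat" gets a lower but non-zero score, and the unrelated one scores zero.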



Michael used it to index the US Patent Database, which is pretty big indeed. He wrote a Python wrapper called Xapwrap, which you can get here:


http://divmod.org/projects/xapwrap


Michael explained that Xapian was preferred to Lucene because it is easier to wrap in Python and provides faster queries and better precision.




I'm waiting for Michael to upload the slides to the EP site so that I can give more precise feedback on this.



More info on PyLucene here:
http://www.sauria.com/~twl/conferences/pycon2005/20050325/Pulling Java Lucene into Python.html (PyCon05 notes)




Feature-wise, Xapian has everything needed to run a scalable text engine (stemming based on Snowball, meta-indexes, etc.). It optionally uses Twisted's python.log for logging.





I have the feeling that Xapian would fit pretty well as an external indexer for z3.


(Post originally written by Tarek Ziadé on the old Nuxeo blogs.)


Category: Product & Development