Seen at EuroPython: Xapian text search engine
I attended Michael Salib's talk about Xapian, "Stupidity and laser cat toys: Indexing the US Patent Database with Xapian and Twisted".
Xapian is a probabilistic text search engine.
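To give a feel for what "probabilistic" ranking means, here is a toy sketch of BM25-style scoring in plain Python. It does not use Xapian or Xapwrap at all: the function name `bm25_rank`, the sample documents and the parameter defaults are my own assumptions for illustration, and the formula is the common textbook variant of BM25 (the weighting family Xapian uses by default), not Xapian's exact implementation.

```python
import math
from collections import Counter


def bm25_rank(docs, query, k1=1.2, b=0.75):
    """Return (score, doc_index) pairs, best match first (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    results = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse-document-frequency weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalised by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        results.append((score, idx))
    return sorted(results, key=lambda pair: pair[0], reverse=True)


# Made-up sample documents, not real patent data.
patents = [
    "laser pointer used as a cat toy",
    "method for exercising a cat with a laser",
    "improved mouse trap with spring",
]
ranking = bm25_rank(patents, "laser cat")
for score, idx in ranking:
    print(round(score, 3), patents[idx])
```

Documents that contain the query terms score above those that do not, and among matching documents the shorter one is favoured because of the length normalisation. A real engine adds stemming, an on-disk index and much more on top of this idea.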
Michael used Xapian to index the US Patent Database, which is pretty big indeed. He wrote a Python wrapper called Xapwrap, which you can get here:
http://divmod.org/projects/xapwrap
Michael explained that Xapian was preferred to Lucene because it was easier to wrap in Python and provided faster queries and better precision.
I'm waiting for Michael to upload the slides to the EuroPython site before giving more precise feedback on this.
More info on PyLucene here (PyCon05 notes): http://www.sauria.com/~twl/conferences/pycon2005/20050325/Pulling Java Lucene into Python.html
Feature-wise, Xapian has everything needed to run a scalable text engine (stemming based on Snowball, meta-indexes, etc.). It optionally uses Twisted's python.log for logging.
- Lucene features: http://lucene.apache.org/java/docs/features.html
- Xapian features: http://www.xapian.org/features.php
I have the feeling that Xapian would fit pretty well as an external indexer for z3.
(Post originally written by Tarek Ziadé on the old Nuxeo blogs.)