
Jun 30, 2005

Seen at EuroPython: Xapian text search engine

I attended Michael Salib's talk about Xapian, "Stupidity and laser cat toys: Indexing the US Patent Database with Xapian and Twisted".


Xapian is a probabilistic text search engine.
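To give an idea of what "probabilistic" means here, below is a toy sketch of BM25-style ranking, the family of weighting schemes Xapian is known for. This is plain Python and not Xapian's code or API: the function names, the sample documents and the `k1`/`b` values are all my own assumptions, picked only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real engine would also stem."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (k1 and b are assumed defaults)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms weigh more (this variant is always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalised by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical mini-corpus, made up for this example.
docs = [
    "laser pointer toy for exercising a cat",
    "method of indexing a patent database",
    "cat toy with a feather",
]
scores = bm25_scores("laser cat toy", docs)
# Document indexes ordered from best to worst match.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The document that matches all three query terms comes first, and the one that matches none gets a score of zero. A real engine does the same kind of scoring against an on-disk index instead of rescanning every document.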

Michael used Xapian to index the US Patent Database, which is pretty big indeed. He wrote a Python wrapper called Xapwrap, which you can get here:

http://divmod.org/projects/xapwrap

Michael explained that Xapian was preferred to Lucene because it is easier to wrap in Python and provided faster queries and better precision.


I'm waiting for Michael to upload the slides to the EP site so that I can give more precise feedback on this.

More info on PyLucene here: http://www.sauria.com/~twl/conferences/pycon2005/20050325/Pulling Java Lucene into Python.html (PyCon05 notes)


Feature-wise, Xapian has everything needed to run a scalable text search engine (stemming based on Snowball, meta-indexes, etc.). It optionally uses Twisted's python.log for logging.


I have the feeling that Xapian would fit pretty well as an external indexer for z3.

(Post originally written by Tarek Ziadé on the old Nuxeo blogs.)
