[OOo] New Thesaurus file format for OOo 2.0
The thesaurus file format will change from OOo version 1.x to 2.x.
The engine, myThes, has been developed by Kevin Hendricks (OOo lingucomponent project lead). A standalone version is available at http://lingucomponent.openoffice.org/thesaurus.html. The new format is based on WordNet from Princeton University (http://www.cogsci.princeton.edu/~wn/). The main changes introduced are:
This new format is incompatible with the old one, so existing thesauri will not work in OOo 2.0.

I'm working on a small program that translates the old thesauri to the new format. It is an OOo macro that reads the old data through the thesaurus API (mainly the com.sun.star.linguistic2.Thesaurus service available in OOo 1.1.x) and from the old .idx file, which is plain text. Once the data is transformed (the .dat file is created), the new .idx index file is generated using a Perl script Kevin wrote. The program is almost finished and will be released under a free licence so that other native-lang OOo projects can convert their own thesauri if needed.

Concerning morphological information (verb, noun, adjective...), which is currently missing for all entries, Myriam's work (see her blog) will be of great help in generating it.
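To make the two-step pipeline concrete (write the .dat data file, then generate the .idx index from it), here is a minimal Python sketch. It is not the conversion macro or Kevin's Perl script. It assumes the myThes layout as later distributed: the .dat starts with an encoding name, then each word has a "word|N" line followed by N meaning lines "(pos)|synonym|synonym..."; the .idx holds the encoding name, the entry count, then sorted "word|byte_offset" lines pointing into the .dat. Writing "-" when no part of speech is known is a hypothetical convention chosen here because, as noted above, that information is still missing; the function names and file names are also illustrative.

```python
import os
import tempfile


def write_dat(path, entries, encoding="UTF-8"):
    """Write {word: [(pos, [synonyms, ...]), ...]} and return each word's byte offset."""
    offsets = {}
    with open(path, "wb") as f:  # binary mode so tell() gives exact byte offsets
        f.write((encoding + "\n").encode("ascii"))  # first line: encoding name
        for word in sorted(entries):
            offsets[word] = f.tell()  # offset of this word's "word|N" header line
            meanings = entries[word]
            f.write(f"{word}|{len(meanings)}\n".encode(encoding))
            for pos, synonyms in meanings:
                # Hypothetical convention: "-" when no part of speech is known yet.
                tag = f"({pos})" if pos else "-"
                f.write(("|".join([tag] + list(synonyms)) + "\n").encode(encoding))
    return offsets


def write_idx(path, offsets, encoding="UTF-8"):
    """Write the index: encoding name, entry count, then sorted "word|offset" lines."""
    with open(path, "wb") as f:
        f.write(f"{encoding}\n{len(offsets)}\n".encode("ascii"))
        for word in sorted(offsets):
            f.write(f"{word}|{offsets[word]}\n".encode(encoding))


def lookup(dat_path, idx_path, word, encoding="UTF-8"):
    """Return [(pos_tag, [synonyms, ...]), ...] for word, or [] if it is absent."""
    with open(idx_path, "rb") as f:
        lines = f.read().decode(encoding).splitlines()
    index = dict(line.rsplit("|", 1) for line in lines[2:])  # skip encoding + count
    if word not in index:
        return []
    meanings = []
    with open(dat_path, "rb") as f:
        f.seek(int(index[word]))  # jump straight to the entry via the index
        count = int(f.readline().decode(encoding).rstrip("\n").rsplit("|", 1)[1])
        for _ in range(count):
            fields = f.readline().decode(encoding).rstrip("\n").split("|")
            meanings.append((fields[0], fields[1:]))
    return meanings


if __name__ == "__main__":
    entries = {
        "car": [("noun", ["automobile", "motorcar"])],
        "fast": [("adj", ["quick", "rapid"]), ("", ["firm"])],
    }
    with tempfile.TemporaryDirectory() as tmp:
        dat = os.path.join(tmp, "th_test.dat")
        idx = os.path.join(tmp, "th_test.idx")
        write_idx(idx, write_dat(dat, entries))
        print(lookup(dat, idx, "fast"))
        # prints [('(adj)', ['quick', 'rapid']), ('-', ['firm'])]
```

Keeping the index as byte offsets into the data file is what lets a lookup seek directly to one entry instead of scanning the whole thesaurus, which is why the .idx has to be regenerated whenever the .dat changes.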
Posted by Laurent Godard @ 03/03/2005 12:34 PM.
Categories: openoffice
All content is copyrighted by their author. CPSSkins is Copyright © 2003-2006 by Jean-Marc Orliaguet. CPS is Copyright © 2002-2006 by Nuxeo SAS.