04/30/2008
New Nuxeo Architecture Slides Published

I’ve just posted on SlideShare a set of slides about the Nuxeo architecture that we have used in recent presentations to customers and partners.

I hope you’ll enjoy them. There are more in the works.

Posted by adminsf @ 04/30/2008 06:37 PM. - Categories: java, nuxeo5 -  0 comments
02/28/2008
Nuxeo EP 5.1.3.2 released

We released Nuxeo EP 5.1.3.2 earlier this week. This is a maintenance release focused primarily on bug fixes and small improvements.

You can download it as

or:

The changelog for this release is available.

Minor but nonetheless noteworthy improvements include:

  • Nuxeo EP now works on a Java 6 JVM, which can lead to significant performance improvements (up to 100% for certain workloads, according to some internal benchmarks).

  • Full text indexing has been improved and made faster.

  • Some tweaks have been made to enable communication with the new “LiveEdit” plugin (soon to be released and announced) for Internet Explorer, Firefox and MS Word.

Posted by Stéfane Fermigier @ 02/28/2008 02:37 PM. - Categories: nuxeo5 -  0 comments
02/19/2008
Interview on ComputerWorld UK
I've been interviewed by Glyn Moody, one of the best journalists covering open source, for the ComputerWorld UK web site.
France is not a country many would associate with free software startups, but that's changing – not least because the French government is showing itself far more receptive to open source than its UK counterpart. One of the leading companies of this new Nouvelle Vague is Nuxeo, which was set up by Stefane Fermigier, now its CEO.
Link: Open Enterprise Interview: Stefane Fermigier.
Posted by Stéfane Fermigier @ 02/19/2008 09:57 AM. - Categories: nuxeo, nuxeo5 -  0 comments
02/18/2008
Nuxeo EP 5.1.3 and 5.1.3.1 released

Nuxeo EP 5.1.3 was released in January, with 235 enhancements over the previous version, Nuxeo EP 5.1.2, which had been released in October.

A bugfix release (5.1.3.1) followed a few weeks later, with 20 bug fixes and enhancements.

Download it now (107 MB).

Posted by Stéfane Fermigier @ 02/18/2008 07:28 PM. - Categories: ecm, nuxeo, nuxeo5 -  0 comments
01/15/2008
Upcoming Nuxeo 5.1.3 release + Updated roadmap

The Nuxeo EP 5.1.3 release is coming along nicely after a small delay due to the holiday break, and will be tagged in a couple of days.

The roadmap for the project in 2008 has been updated accordingly.

Feel free to discuss it on the mailing list or in the forum if you have questions or suggestions, or to supply missing information.

As can be seen, Nuxeo 5.1.3 will not be just a maintenance release, as we’ve been able to add new features by creating new plugins (e.g. WebDAV, portlets, SSO) thanks to a now quite stable infrastructure and API.

Nuxeo 5.1.3 will also be based on Nuxeo Runtime and Core 1.4, which feature some improvements while staying compatible with the previous version (1.3). The switch to Nuxeo Core 1.4 was done in December and has proved very stable.

After the 5.1.3 release, we’re going to focus on the 5.2 (trunk) work, so as to move to Seam 2 for the web platform. Most new features will still be developed as new components, and will be delivered with the next maintenance release (Nuxeo 5.1.4), with Nuxeo 5.2, or both, depending on technical feasibility and on customers’ needs or community contributions.

Posted by Stéfane Fermigier @ 01/15/2008 11:07 PM. - Categories: java, nuxeo5 -  0 comments
10/26/2007
Graphics, RichClient & Exhibitions
September was a productive month for sure; here are some of our latest creations.

First, Lise has been working on an extension of Nuxeo EP for mail management and it's pretty :-)
Nuxeo Courrier Logo

Nuxeo Courrier Actions

Nuxeo Courrier AddressBook


We have also been working on the look of Nuxeo RCP. First, we gave it a splash screen. Sun is still working on integrating the tabs, so I made the icons (most of them come from Tango or famfamfam, are a mix of both, or were drawn entirely from scratch as vector graphics). We even made Linux/Windows/OS X icons! The next step will be to theme the backgrounds and text. Here are some screenshots of the custom graphics in action.

Nuxeo RCP Splashscreen

Nuxeo RCP

Nuxeo RCP

Nuxeo RCP

Nuxeo RCP

Note that Nuxeo RCP and Courrier can be downloaded from the sandbox of the Nuxeo source code repository.


As you may know, Nuxeo was present at Le Forum Des Acteurs du Numérique in Paris, IFRA in Vienna and DocumationUK in London, so we made half a dozen banners (up to 2 meters long!). At these exhibitions we had three brand-new booklets (Corporate presentation, Nuxeo ECM Stack and Nuxeo Connect support). We also gave away "i luv ECM" badges, and they were a success! You can find some photos on our Nuxeo Flickr page, including some pictures of Steve Raby and the DOC. ^^
Nuxeo ECM wheel

Nuxeo Brochure

Nuxeo Bannerz

Nuxeo stand


Among other little things (we made nice signs for THE BUREAU and THE FRIGO offices; I'll show you when I get a digital camera), we started to theme the discussions section on nuxeo.org, the community site.

Nuxeo.org


Finally, all the graphic material used for communication was reused in the brand-new ECM Stack pages on nuxeo.com!

Nuxeo.com


I hope you all enjoy it. Stay tuned for more ECM-loving pictures :-)
++
Posted by Thibaut Soulcié @ 10/26/2007 07:42 PM. - Categories: apogee, ecm, nuxeo, nuxeo5, rich_client, web -  0 comments
09/12/2007
Nuxeo Runtime adds support for scripting languages

Bogdan added preliminary, yet powerful, scripting support to Nuxeo Runtime before leaving for a well-deserved vacation. This makes scripting available across the whole Nuxeo platform. Thanks to this new feature, you can easily use scripts from your custom components. This is useful for many use cases, such as dynamic rules (scripting languages as DSLs), easily modifiable behaviour, and lightweight yet powerful configuration and customization. Scripts have access to the whole API thanks to Java scripting integration (JSR 223).
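Because this support builds on the standard Java scripting API (JSR 223, package javax.script), a quick way to see which engines a given JVM can actually offer is to ask a ScriptEngineManager. The following is a minimal, generic sketch that uses only the JDK; it does not touch any Nuxeo API, and the class name ListScriptEngines is our own.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class ListScriptEngines {

    /** Returns "engineName (languageName languageVersion)" for every JSR 223 engine found. */
    public static List<String> describeEngines() {
        ScriptEngineManager manager = new ScriptEngineManager();
        List<String> descriptions = new ArrayList<>();
        for (ScriptEngineFactory factory : manager.getEngineFactories()) {
            descriptions.add(factory.getEngineName() + " ("
                    + factory.getLanguageName() + " " + factory.getLanguageVersion() + ")");
        }
        return descriptions;
    }

    public static void main(String[] args) {
        // The result depends on the JDK and on the engine jars on the classpath
        // (e.g. Groovy, Jython or JRuby each provide their own JSR 223 engine).
        for (String description : describeEngines()) {
            System.out.println(description);
        }
    }
}
```

An empty list simply means no engine jar is on the classpath; recent JDKs no longer bundle a JavaScript engine.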

Moreover, scripts can also be run remotely thanks to the Nuxeo Runtime command line. This allows you to write a script on your administration machine, launch it on the remote platform and get the result back. This makes scripting a killer feature for administration tasks (e.g. expiring content, bulk content modification, bulk refactoring of the content repository layout).

Last but not least, we are working on an interactive shell (using Python or Groovy) to interact with the Nuxeo platform.

Here is a quote from Bogdan’s email with more details:

Hi all,

I’ve just integrated scripting support through JSR 223 into Nuxeo. It was added as a new project, nuxeo-runtime-scripting, which is in SVN but has not yet been added to the nuxeo.ear build (nor to the runtime SVN module).

For now, only these scripting engines have been integrated:

  1. jexl

  2. jruby

  3. groovy

  4. jython

  5. js (Rhino)

If needed, I will add more later (PHP, for example). You can run scripts in Nuxeo in three different ways:

  1. Put the script inside the nuxeo.ear/script directory (you should define this directory through a runtime variable). Then, from Java code, you can do:

     Framework.getService(ScriptingService.class).eval("my_script.js");
    

    where my_script.js is the script path relative to the script directory

    Or you can use the JSR 223 API:

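     ScriptEngineManager factory = new ScriptEngineManager();
     ScriptEngine engine = factory.getEngineByName("js");
     // eval(String) runs its argument as source code, so pass a Reader
     // (java.io.FileReader) to execute the contents of a script file
     engine.eval(new FileReader("absolute/path/to/my_script.js"));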
    
  2. Leave the script inside your .jar and register it under a name as a script component. Then you can run the script as follows:

     Framework.getService(ScriptingService.class).getScript("myScript").eval();
    

    This method caches the compiled script, so it is only supported for languages that support compilation (in practice, all the engines that come with Nuxeo).

  3. Run a script remotely :)

    This can be used for debugging, testing or administration. You write a script locally, then run it against a remote Nuxeo EP server. The script is sent to the server and executed there, and the server returns the result (including STDOUT and STDERR) to the client.

    For security reasons, this feature can be disabled using a runtime property on the server.

    Here is an example of how you can run a remote script:

       ScriptingClient client = new ScriptingClient("localhost", 62474);
       URL src = RemoteTest.class.getClassLoader().getResource("test.js");
       RemoteScript script = client.loadScript(src);
       script.eval();
    

    For the following JS script (test.js):

       importPackage(java.lang);
       importPackage(org.nuxeo.runtime);
       importPackage(org.nuxeo.runtime.api);
       importPackage(org.nuxeo.runtime.model);
    
    
       runtime = Framework.getRuntime();
       name = runtime.getName();
       version = runtime.getVersion();
       desc = runtime.getDescription();
       println("Remote runtime: "+name+" v."+version);
       println(desc);
       println("---------------------------------------");
       println("Registered components:");
       println("---------------------------------------");
       regs = runtime.getComponentManager().getRegistrations();
       for (var i=0, size=regs.size(); i<size; i++){
           println(regs.get(i).getName());
       }
    

    The following will be printed on the STDOUT of the client:

      Remote runtime: OSGi NXRuntime v.1.4.0
      OSGi NXRuntime version 1.4.0
      ---------------------------------------
      Registered components:
      ---------------------------------------
      service:org.nuxeo.ecm.core.api.DocumentAdapterService
      service:org.nuxeo.ecm.core.repository.RepositoryService
      service:org.nuxeo.runtime.remoting.RemotingService
      service:org.nuxeo.runtime.EventService
      service:org.nuxeo.ecm.platform.login.LoginConfig
      ...
    

So this new feature can be used to write purely script-based Nuxeo components. In the future, I will also try to configure Tomcat to run scripts inside servlets, which would make it possible to write web pages for Nuxeo EP in PHP or other supported languages ;-)

Bogdan
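Bogdan's email says the server sends the script's STDOUT and STDERR back to the client. We don't know how nuxeo-runtime-scripting implements this; the sketch below only shows one standard JSR 223 way a server could capture a script's output, by giving each evaluation its own ScriptContext with in-memory writers. CapturingEvaluator and Result are hypothetical names, and a "js" engine is only present on JVMs that bundle one (Rhino in Java 6 and 7, Nashorn in Java 8 to 14).

```java
import java.io.StringWriter;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleScriptContext;

public class CapturingEvaluator {

    /** What a script printed and what it returned. */
    public static final class Result {
        public final Object value;
        public final String stdout;
        public final String stderr;

        Result(Object value, String stdout, String stderr) {
            this.value = value;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Evaluates source in a fresh context whose output writers are in-memory buffers. */
    public static Result eval(ScriptEngine engine, String source) throws ScriptException {
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        ScriptContext context = new SimpleScriptContext();
        context.setWriter(out);
        context.setErrorWriter(err);
        // Engine-specific bindings for the engine scope; a fresh context per call
        // keeps variables from leaking between runs
        context.setBindings(engine.createBindings(), ScriptContext.ENGINE_SCOPE);
        Object value = engine.eval(source, context);
        return new Result(value, out.toString(), err.toString());
    }

    public static void main(String[] args) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
        if (engine == null) {
            System.out.println("No JavaScript engine available on this JVM");
            return;
        }
        Result result = eval(engine, "print('hello from the script'); 6 * 7");
        System.out.println("stdout: " + result.stdout.trim());
        System.out.println("value: " + result.value);
    }
}
```

A remote-execution service would then serialize the Result fields back to the caller; that transport part is not shown.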

I think this opens up a wide range of new possibilities and makes it easier to create innovative and powerful ECM applications with the Nuxeo Platform (and not only ECM applications, since Nuxeo Runtime can be used to build any extensible application on the Java platform).

Stay Tuned!

EB.

Posted by Eric Barroca @ 09/12/2007 11:57 PM. - Categories: java, nuxeo -  0 comments
09/05/2007
Steve Raby joins Nuxeo as UK and Northern Europe Manager

As you might already have seen, Steve Raby has joined Nuxeo as Director for UK and Northern Europe and head of our London-based UK branch. We have already started doing some great work together, and I’m blogging about it to add a personal touch. :-)

Steve, a Sun and JBoss veteran, is a strong asset for our company and, of course, brings some new blood to our vision and management structure. The work we are doing is already productive and we recently signed our first large UK customer (you are going to read about that soon).

It’s also really interesting to experience a shared vision for the business and the same enjoyment of the Open Source model. The same customer service orientation. And the same faith in success. The beginning of a long story, I'm sure.

Welcome aboard, Steve! We are going to continue building a great and successful Open Source vendor… (okay, okay, you’ve already been here for two months :-) )

Stay Tuned!

EB.

PS: Read the full press release here

Posted by Eric Barroca @ 09/05/2007 03:50 PM. - Categories: nuxeo -  0 comments
09/04/2007
How to invoke method expressions with parameters in JSF?

The usual way of using EL expressions in JSF can seem a little too restrictive to those of us who are used to scripting languages.

For instance, if you'd like to display a bean property, you write a getter for it:

public class MyBean {

  String myProperty;

  public String getMyProperty() {
      return myProperty;
  }

}

Then you'll be able to write the following value expression in a template:

<h:outputText value="#{myBean.myProperty}" />

Now imagine that your bean has to perform a more complex task to retrieve the property, like calling a service, and has to pass parameters to it. Even though you can always pass the parameter using an "f:param" tag, the bean API will look rather awkward. The most natural way is to write a method that takes this parameter, and find a way to call it from the template.

For instance, we could have:

public String getMyProperty(String param) {
   // execute any function to get the result
   return function(param);
}

<h:outputText value="#{myBean.getMyProperty('foo')}" />

Sadly, there is no way to do that using "pure" JSF implementations.

That's where Facelets can be very handy. In a very nice blog post, Andrew Robinson explains how to pass method bindings to child components using the Facelets user tag system. I will explain how Nuxeo uses the same trick to invoke method expressions with parameters as if they were regular value expressions.

First, let's define the famous MethodValueExpression class, which behaves like a regular value expression but invokes a method expression when asked to resolve its value:

public class MethodValueExpression extends ValueExpression implements
        Externalizable {

    public MethodValueExpression(MethodExpression methodExpression,
            Class[] paramTypesClasses) {
        this.methodExpression = methodExpression;
        this.paramTypesClasses = paramTypesClasses;
    }

    ...

    @Override
    public Object getValue(ELContext context) {
        // invoke method instead of resolving value
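        try {
            return methodExpression.invoke(context, paramTypesClasses);
        } catch (ELException e) {
            // MethodExpression.invoke wraps failures of the target method in an
            // ELException; resolve to null in that case instead of catching Throwable
            return null;
        }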
    }

}

Nuxeo benefits from an extension to the EL provided by Seam: it makes it possible to use parameters on any method expression without having to configure parameter types. That's why the parameter type classes are never actually set in the Nuxeo code.

Once this is done, we can use Facelets meta rules to use this class instead of the generic one. This is done via a component handler:

public class GenericHtmlComponentHandler extends HtmlComponentHandler {

    ...

    protected MetaRuleset createMetaRuleset(Class type) {
        MetaRuleset m = super.createMetaRuleset(type);
        if (ValueHolder.class.isAssignableFrom(type)) {
            m.addRule(GenericValueHolderRule.Instance);
        }
        return m;
    }

}

This configuration tells Facelets to use the GenericValueHolderRule class when setting a component's attributes. This rule does little more than use our MethodValueExpression when appropriate, e.g. when parentheses are detected in the expression.
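The post does not show GenericValueHolderRule itself. Purely as an illustration of what "when appropriate" could mean, here is a hypothetical, self-contained heuristic: it treats an expression as a method call when an opening parenthesis directly follows an identifier character outside string literals. The class name and the exact rule are our assumptions, not Nuxeo's code.

```java
public final class MethodCallDetector {

    private MethodCallDetector() {
    }

    /**
     * Heuristic: true if a "#{...}" expression contains an opening parenthesis
     * directly after an identifier character, outside of string literals.
     */
    public static boolean looksLikeMethodCall(String expression) {
        if (expression == null || !expression.startsWith("#{") || !expression.endsWith("}")) {
            return false;
        }
        char quote = 0; // quote character of the string literal we are inside, or 0
        for (int i = 2; i < expression.length() - 1; i++) {
            char c = expression.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == quote) {
                    quote = 0; // end of the string literal
                }
            } else if (c == '\'' || c == '"') {
                quote = c; // start of a string literal
            } else if (c == '(' && Character.isJavaIdentifierPart(expression.charAt(i - 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeMethodCall("#{myBean.getMyProperty('foo')}")); // true
        System.out.println(looksLikeMethodCall("#{myBean.myProperty}"));           // false
        System.out.println(looksLikeMethodCall("#{prop == 'a(b)'}"));              // false
        System.out.println(looksLikeMethodCall("#{(a + b) > 2}"));                 // false
    }
}
```

Being a heuristic, it would misclassify EL operators written without a space before a parenthesis, such as not(x).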

We can configure tags to use this handler in a facelet taglib:

<tag>
  <tag-name>outputText</tag-name>
  <component>
    <component-type>javax.faces.HtmlOutputText</component-type>
    <renderer-type>javax.faces.Text</renderer-type>
    <handler-class>org.nuxeo.ecm.platform.ui.web.tag.handler.GenericHtmlComponentHandler</handler-class>
  </component>
</tag>

Note that there is no need to use an attribute name other than "value" (such as "genericValue"), as the last rule added to the MetaRuleset applies first and overrides the default behaviour.

The nxh taglib, using the namespace "http://nuxeo.org/nxweb/html", redefines all basic JSF HTML tags to use this handler.

We could add any number of attributes to be handled in the same way as "value": for instance, being able to write <nxh:outputText rendered="#{myBean.getProperty('foo')}" /> can be handy too.

Now, it can be a little painful to define a new taglib with this handler when reusing custom tag libraries. The Nuxeo tag library defines a new tag, "nxu:methodResult", which makes the result of the given expression available in the variable map:

<nxu:methodResult name="prop" value="#{myBean.getMyProperty('foo')}">
  <h:outputText value="foo" rendered="#{prop == 'bar'}" />
</nxu:methodResult>

The variable named "prop" is available inside the methodResult tag, just like a row variable inside an "h:dataTable" tag.

This behaviour is achieved using a specific tag handler that will use the MethodValueExpression presented above:

public class MethodResultTagHandler extends MetaTagHandler {

    private final TagAttribute name;

    private final TagAttribute value;

    public MethodResultTagHandler(TagConfig config) {
        super(config);
        name = getRequiredAttribute("name");
        value = getRequiredAttribute("value");
    }

    public void apply(FaceletContext ctx, UIComponent parent)
            throws IOException {
        String nameStr = name.getValue(ctx);
        // parameter types evaluation not needed using Seam
        Class[] paramTypesClasses = new Class[0];
        MethodExpression meth = value.getMethodExpression(ctx, Object.class,
                paramTypesClasses);
        ValueExpression ve = new MethodValueExpression(meth, paramTypesClasses);
        ctx.getVariableMapper().setVariable(nameStr, ve);
        this.nextHandler.apply(ctx, parent);
    }

}

This tag handler is linked to the methodResult tag in a taglib file:

<tag>
  <tag-name>methodResult</tag-name>
  <handler-class>org.nuxeo.ecm.platform.ui.web.tag.handler.MethodResultTagHandler</handler-class>
</tag>

Nice, huh?

Nuxeo Tag Library documentation: http://maven.nuxeo.org/nuxeo-platform-parent/nuxeo-platform-ui-web/tlddoc/.

The complete code mentioned above is available here:

Posted by Anahide Tchertchian @ 09/04/2007 04:13 PM. - Categories: nuxeo5 -  0 comments
09/03/2007
Nuxeo EP: the Service Oriented ECM Platform

If you’re one of those people who believe that SOA is more than a buzzword surrounded by hype, then this post might be worth reading. As you might guess, I’m one of them, and for real-world solutions. :-)

Nuxeo EP is built around two simple yet powerful concepts:

  • Services: a service is a component of the Nuxeo platform offering some feature to others (ok, so you do know what a service is! :-). From a technical point of view, in Nuxeo’s case a service is an OSGi bundle.

  • Extension points: a service might provide one or more extension points so that other services can contribute extensions to this point (to configure the service or extend it). Think Eclipse Equinox extension system ported to the server-side.

Basically, Nuxeo EP is a set of services that extend one another (plus a bunch of business-specific configuration files and UI) to offer a complete set of high-level ECM services, ready to be integrated into the Service Oriented Architecture of your Information System (IS).
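To make the two concepts concrete, here is a deliberately tiny, plain-Java sketch of a service that owns one extension point and receives contributions from other components. It is not the Nuxeo Runtime API: the class and method names (TypeService, registerContribution, the "documentTypes" point) are hypothetical, and it leaves out OSGi bundles and whatever registration machinery the real runtime uses.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, minimal model of a service that exposes a named extension point. */
public class ExtensionPointSketch {

    /** What a contributing component registers: a document type and its file extensions. */
    public static final class DocumentTypeContribution {
        final String typeName;
        final List<String> extensions;

        public DocumentTypeContribution(String typeName, List<String> extensions) {
            this.typeName = typeName;
            this.extensions = extensions;
        }
    }

    /** A service that owns the "documentTypes" extension point and uses its contributions. */
    public static final class TypeService {
        private final Map<String, DocumentTypeContribution> types = new LinkedHashMap<>();

        /** Called once for every contribution targeting one of this service's extension points. */
        public void registerContribution(String extensionPoint, Object contribution) {
            if (!"documentTypes".equals(extensionPoint)) {
                throw new IllegalArgumentException("Unknown extension point: " + extensionPoint);
            }
            DocumentTypeContribution c = (DocumentTypeContribution) contribution;
            // A later contribution with the same type name replaces the earlier one
            types.put(c.typeName, c);
        }

        /** Service API offered to other components: pick a document type for a file name. */
        public String typeForFile(String fileName) {
            for (DocumentTypeContribution c : types.values()) {
                for (String ext : c.extensions) {
                    if (fileName.endsWith("." + ext)) {
                        return c.typeName;
                    }
                }
            }
            return "File"; // fallback when no contribution matches
        }
    }

    public static void main(String[] args) {
        TypeService service = new TypeService();
        // Two independent components extend the same service without changing its code
        service.registerContribution("documentTypes",
                new DocumentTypeContribution("Picture", Arrays.asList("jpg", "png")));
        service.registerContribution("documentTypes",
                new DocumentTypeContribution("Spreadsheet", Arrays.asList("xls", "ods")));
        System.out.println(service.typeForFile("logo.png"));   // Picture
        System.out.println(service.typeForFile("budget.ods")); // Spreadsheet
        System.out.println(service.typeForFile("notes.txt"));  // File
    }
}
```

The point of the pattern is that the Picture and Spreadsheet contributions could live in separate modules: the service depends only on the contribution type, not on who contributes it.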

This is the future of ECM Platform providers and here is why… ;-)

One company, different needs

As you might know, or guess, a company’s departments can have very different needs related to content management.

Let’s take a few examples:

  • marketing people want an application to manage their pictures and videos so that they can quickly find and get the right picture to illustrate their new “fact sheet”.

  • the legal department wants a collaboration system to share and work on legal documents. They also want a document management system to store all kinds of legal documentation.

  • engineers want a full-blown ECM system to handle collaboration, document management for industrial documents (blueprints, specifications, operation manuals, etc.)

  • the accountants want a system to manage invoicing and payment processes which can track physical items (e.g. incoming paper invoices, acceptance papers, delivery proof, etc.) and interact with SAP where all numbers are stored

  • the QA people want their organizational diagrams and processes to be under version control and workflow. They also want the engineers to use a document management system to enforce audit and compliance on produced documents (specs, op. manuals, etc.).

Of course, I could add dozens of examples, and I’m sure you could too.

All those needs might require very different UIs, processes and business logic. But they still have some crucial common parts…

On to a Central Content Platform

Looking at those needs, besides their specifics, we can quickly define some common requirements:

  • Content storage: scalable and secure content storage for short term (e.g. press release) to long term storage (e.g. documents of specifications).

  • Flexible content model: to address all those needs, a flexible content model is required. Hence the content storage needs to be flexible enough to store any kind of content model.

  • Security and access control: all the managed content needs to be secured, and access controls have to be carefully applied. Hence the need for global security and access control.

  • Search: you need to search all this content. Hence the need for a flexible indexing and search engine (adaptable to different content models). If you can search all the managed content using one UI, even better.

  • Process management / workflow: to support everything from simple approval processes (specification draft) through to complex business processes (invoicing) or complex hierarchical approval processes (legal docs, specifications) you need an enterprise process engine, deeply integrated with your content repository.

  • Relation management: wouldn’t it be great to be able to track dependencies between specification documents, or between legal documentation and contracts? Or track links between pictures? Or maybe just track impacts between the specification and the operations manual? It might even be in the requirements! Well, to do this you need a powerful relation engine.

  • User Notification: People want to be able to subscribe to changes so that they are notified (via email or RSS feed) when documents change. Let’s set up a notification system with email and RSS support (and maybe IM or SMS).

  • Content Rendition: pictures need to be resized or cropped, Word documents have to be converted to PDF after approval (e.g. for distribution or long-term archiving), etc. You need an extensible content rendition system that allows you to define your renditions and maybe write your own rendition plugins.

  • Directories / Vocabularies: You need to manage lists of terms to populate lists of choices in metadata forms, workflow screens, etc. You might also want lists to come from your ERP system (e.g. project codes, imputation codes, customer list), some other applications or LDAP servers (customer lists, user lists, etc.). You need a flexible service to centrally manage lists (flat or hierarchical), stored in SQL or LDAP, and bind them to forms in your application.

  • Audit: last but not least, you want all actions performed in the applications by users or other applications to be logged. You also might want to create reports from that data. Hence the need for a central audit trail.

This is what we mean by a Central Content Platform: a unique place offering content-related services consumed by end-user applications. While end users might see and use very different applications and UIs, services and storage are centralized to dramatically reduce maintenance costs and improve maintainability. And it’s much easier to secure (high availability, physical protection, security audits, etc.) one central platform than each aspect of several different applications (with their own storage, language, platform, etc.).

One platform, many applications…

With Nuxeo Service Platform this pattern can become a reality. You can set up a scalable and reliable platform for ECM and make your business applications consume those services. Each application for end-users might be written in different languages, implement different paradigms, serve different users with different business needs.

Moreover, Nuxeo EP leverages standards and patterns to offer a wide range of communication systems. Java applications can use the Java remoting system (EJB3 Remoting / POJO Remoting) and get access to the native API.

You prefer .NET, Ruby or PHP? Go on! Nuxeo EP also offers a wide range of Web Services (SOAP or REST) which enable integration with virtually any software language or platform.

Need a workflow engine for your existing Spring-based application? Just embed Nuxeo Runtime in your contract management app, connect to your Nuxeo EP instance and integrate your app with the Nuxeo Workflow Service. Need advanced document storage with versioning and security? Just contribute your content type, store your documents in your Nuxeo EP’s Content Repository and access them through the API or directly from HTTP links. We will take care of all the complex document storage details such as access control, versioning, file streaming, transactions, etc. Need to add search? Plug your app into your Nuxeo EP’s Indexing and Search Service!

No more “one size fits all” ECM application

This is really what we see as the future of ECM. One application cannot fit all the content and information management needs of an organization straight out of the box. End users ask for better-adapted applications to improve their daily workflow. They require more security, ease of use, accountability, business focus… Why not avoid those “$10M, 3 years” burdens that made ERPs famous and deliver more to your users? More dynamic, more usable, more often, more complete…

ECM platforms should not be huge monolithic applications. The SOA pattern gives a golden opportunity to deliver great applications to your end users while keeping all the advantages of reusable and centralized software.

This is our real business. This means a lot to us. It’s available today. Try Nuxeo EP.

Still thinking Open Source cannot innovate?

Stay Tuned! ;-)

EB.

Posted by Eric Barroca @ 09/03/2007 03:03 PM. - Categories: ecm, java, nuxeo, nuxeo5 -  0 comments
08/30/2007
Nuxeo EP 5.1.0 GA aka "Memphis" released!

I know this might not be fresh news, but as there is no mention on blogs.nuxeo.com, here it is! :-)

Our core team hasn’t rested much this summer… Nuxeo 5.1.0 GA (codenamed Memphis) was tagged and released two weeks ago! This new release is a big step forward for our ECM platform. There are many new features and technical improvements in this release. And that’s great! :-) To get an overview of what’s new in this release, please see the “New and Noteworthy” document.

From a technical point of view, this release of Nuxeo EP relies on strong foundations: Nuxeo Runtime 1.3.2 and Nuxeo Core 1.3.2 were also released a few days before the full platform.

For developers and integrators, all artifacts have been published to our Maven repository. Maintenance fixes will be done in the 5.1 branch of our repository (which will roll out the 5.1.x series), while major development work on new features will happen in the trunk (5.2.x series).

A special note on the performance and scalability front: we have done a huge amount of work on this side and ran extensive performance tests (benchmark documents will be published in the coming days). But most importantly… the platform is fully scalable at the service level. This means that platform services can be spread across any number of servers, so you can tailor your deployment architecture to your application's needs. Plus, the platform’s build system (fully based on Maven) makes it easy to generate a nuxeo.ear for each machine of your multi-machine deployment infrastructure, depending on the services you select by configuring assemblies for each of your servers (e.g. nuxeo-search.ear for the search and indexing machine, nuxeo-core.ear for the content repository server, nuxeo-platform.ear for other services and nuxeo-web.ear for your web front-end).

Thanks a lot to the whole development team and the supporting community. This is a really good piece of software! On to the next version!… ;-)

To go further:

I hope to be able to blog more in the future to give more update on the software (new features, improvements, tips) and on the ECM in general. So much to say, so little time… ;-)

Stay Tuned!

EB.

Posted by Eric Barroca @ 08/30/2007 03:55 AM. - Categories: ecm, java, jboss, nuxeo, nuxeo5 -  0 comments
07/31/2007
Nuxeo 5.1 RC released - GA release scheduled for next week

We have been so busy the last couple of months working on customer projects that the 5.1 release has slipped a bit, but I’m happy to report that we have just released Nuxeo 5.1 RC.

The final release (Nuxeo 5.1.0.GA) will be made next week, and then we’ll spend the rest of August:

  • finishing and polishing the Nuxeo Book.

  • planning and starting the next iteration of Nuxeo (5.2, cf. the current roadmap).

  • working on new customer projects (including some that feature the Apogee Project, which has already seen a recent surge of activity).

  • getting some holidays :)

Posted by Stéfane Fermigier @ 07/31/2007 11:38 AM. - Categories: apogee, ecm, java, nuxeo, nuxeo5 -  0 comments
07/19/2007
The end of my work on French grammar checking
I have finished my first piece of work on LanguageTool: adapting the tool to French grammar checking. The following summary presents the outcome of this work.
You can download the report ( Mémoire) and the slides ( Soutenance). They are written in French.

Work on rules

As I explained previously, at the beginning of the French grammar checker project, Myriam Lechelt had worked on An Gramadóir and written many disambiguation and correction rules. Since An Gramadóir was limited and not really suited to French, it was abandoned.

During my work, I converted An Gramadóir's rules to LanguageTool. Thanks to Marcin Miłkowski, who implemented a disambiguator following my specifications, I could import disambiguation rules as well as correction rules. Moreover, I simplified them a lot and considerably reduced their number thanks to LanguageTool's XML rule language.

Then, I analysed a corpus of mistakes (V. Lucci and A. Millet, 1994, L'orthographe de tous les jours, enquête sur les pratiques orthographiques des français, Editions Champion) and extracted new grammar rules from it.

LanguageTool can detect the following kinds of mistakes:
  • phonetic proximity (confusion of homophones like ont and on, ça and sa, etc.)
  • mistakes in verb phrases (confusion between infinitive and past participle, conjugated form and past participle, etc.)
  • subject-verb agreement (personal pronoun or noun phrase with only a determiner and a noun)

Limits of the formalism

While working on the rules, I ran tests that showed me the limits of LanguageTool's formalism. Because it is based on rigid pattern matching, if the patterns described in the rules do not exactly match the text, the rules fail and some mistakes go undetected. Moreover, every wrong combination of words must be foreseen in order to describe it in the rules. This leads to a combinatorial explosion in the number of rules, especially for noun phrases.
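A back-of-the-envelope count shows how fast this grows. Under a simplified model assumed here for illustration (it is not taken from the report), each of the n words in a noun phrase can take one of four gender/number values (masculine or feminine, singular or plural), and only the four combinations in which all words share the same value are correct:

```latex
\[
  N_{\text{wrong}}(n) = 4^{n} - 4,
  \qquad
  N_{\text{wrong}}(3) = 64 - 4 = 60,
  \qquad
  N_{\text{wrong}}(4) = 256 - 4 = 252.
\]
```

Each additional adjective roughly quadruples the number of erroneous patterns to foresee, whereas a single agreement check over the whole phrase stays one test.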

The formalism also generates lots of false alarms, because of ambiguities or wrong tags. The same mistake can be detected several times by different rules. And when a word is wrong, it can cause false alarms on nearby words, since the rules are based on context.

New formalism

I have developed a new formalism to improve French grammar checking in LanguageTool. It is based on chunks and unification of feature structures (see An alternative with chunks and unification). I mix a contextual syntactic theory (chunks, Abney) and a generative syntactic theory (unification, Chomsky). This is not a typical combination, but it makes it possible to go further in grammar checking by delimiting an area of the sentence where all words must agree. It is then no longer necessary to describe all wrong combinations of words: instead of listing agreement mistakes, inconsistencies are detected within phrases.
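As a toy illustration of the idea, assume a chunker has already delimited a noun phrase and that each word carries only flat gender/number features. In that restricted case, unifying the feature structures reduces to intersecting the sets of values each word allows, and an empty result signals an agreement error. This is our own simplified sketch with hypothetical names, not LanguageTool's implementation; real feature structures are richer than a single set of values.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class AgreementByUnification {

    /** Gender/number values a word form can carry (simplified: no person, no other features). */
    public enum Agr { MASC_SG, FEM_SG, MASC_PL, FEM_PL }

    /** A word form together with every agreement value its possible analyses allow. */
    public static final class Token {
        final String form;
        final Set<Agr> features;

        public Token(String form, Set<Agr> features) {
            this.form = form;
            this.features = features;
        }
    }

    /**
     * Unifies the agreement features of all tokens in one chunk: the result keeps only the
     * values compatible with every token. An empty result signals an agreement error.
     */
    public static Set<Agr> unify(List<Token> chunk) {
        Set<Agr> result = EnumSet.allOf(Agr.class);
        for (Token token : chunk) {
            result.retainAll(token.features);
        }
        return result;
    }

    public static void main(String[] args) {
        Token les = new Token("les", EnumSet.of(Agr.MASC_PL, Agr.FEM_PL)); // ambiguous gender
        Token la = new Token("la", EnumSet.of(Agr.FEM_SG));
        Token petite = new Token("petite", EnumSet.of(Agr.FEM_SG));
        Token maison = new Token("maison", EnumSet.of(Agr.FEM_SG));
        Token maisons = new Token("maisons", EnumSet.of(Agr.FEM_PL));

        System.out.println(unify(Arrays.asList(la, petite, maison)));   // [FEM_SG]
        System.out.println(unify(Arrays.asList(les, petite, maisons))); // [] (agreement error)
    }
}
```

Note how the gender ambiguity of "les" needs no special rule: it is simply narrowed down, or ruled out, by the other words of the chunk.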

Conclusion

Thanks to my MPhil work, French grammar checking is now available for OpenOffice.org. But a lot of work remains: a tool compatible with the new formalism still has to be created, and a corpus of mistakes has to be built and analysed in order to write new grammar rules.

A new approach for grammar checking

To improve grammar checking, I am considering another method, which performs morphosyntactic analysis and grammar checking at the same time, as the sentence is read. This "left-right" method is based on the principle of latencies (Tesnière, 1959). By declaring what is expected after a word or a phrase, inconsistencies can be detected directly, instead of listing all possible mistakes.
This approach would also break the vicious circle of grammar checking: to detect mistakes, the tagging must be correct, but for the tagging to be correct, the text must not contain any mistakes...
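A toy version of the left-right idea: as each word is read, the checker compares it with what the previous word declared it expects, and reports an inconsistency as soon as that expectation is not met. The lexicon, categories and expectation table below are hypothetical, hand-written illustrations for a handful of French words; a real system would derive expectations from a lexicon and carry richer state (agreement features, open phrases) from word to word.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical lexicon: word -> category.
CATEGORY: Dict[str, str] = {
    "ils": "PRO_SUBJ_PL",
    "ont": "AUX_PLUR",
    "on": "PRO_SUBJ",
    "mangé": "V_PPA",
    "manger": "V_INF",
}

# Hypothetical expectations: after a word of category X,
# the next word must belong to one of these categories.
EXPECTS: Dict[str, Set[str]] = {
    "PRO_SUBJ_PL": {"AUX_PLUR", "V_PLUR"},
    "AUX_PLUR": {"V_PPA", "ADV"},
}

def check_left_right(words: List[str]) -> List[Tuple[int, str]]:
    """Read the sentence left to right and report (index, message) whenever
    a word does not satisfy the expectation declared by the word before it."""
    errors = []
    expected: Optional[Set[str]] = None  # no expectation at sentence start
    for i, word in enumerate(words):
        cat = CATEGORY.get(word, "UNKNOWN")
        if expected is not None and cat not in expected:
            errors.append((i, f"'{word}' ({cat}) not expected here"))
        # The current word now declares what may follow it (or nothing).
        expected = EXPECTS.get(cat)
    return errors

print(check_left_right(["ils", "ont", "mangé"]))  # []
print(check_left_right(["ils", "on", "mangé"]))
print(check_left_right(["ils", "ont", "manger"]))
```

The checker never lists mistakes; it only states what may follow each category. This toy version still trusts a single lexicon lookup per word, so on its own it does not escape the tagging/checking vicious circle; doing so would require keeping several candidate readings open while reading.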
Posted by Agnes Souque @ 07/19/2007 04:34 PM. - Categories: indesko, openoffice -  0 comments
07/14/2007
A botched interview with the Debian Project Leader in Le Monde

The daily newspaper Le Monde recently published an interview with the “Debian Project Leader”, Samuel Hocevar. Reading it annoyed me quite a bit: on the one hand, in my view it spreads a number of misconceptions about free software (“software by computer people for computer people”) rather than promoting it to a fairly broad audience, and on the other hand it does more to promote Wikipedia than free software.

Rather than criticising Samuel’s answers to the journalist’s questions point by point, I chose to write my own answers to the same questions.

What do you think of the growing success of Firefox and OpenOffice?

The Firefox web browser and the OpenOffice.org office suite share a number of characteristics that explain their current success with the general public and with some government agencies and companies:

  • They meet the two main generic needs of computer users: web access and office productivity.

  • They are cross-platform: they run on Windows (which remains the dominant platform on the market), on Mac OS and on Linux.

  • They are mature projects: Firefox derives from the code base of the Netscape browser developed in the 1990s, and OpenOffice.org from the StarOffice suite, whose development began in 1994.

  • In addition, these projects have a large workforce, made up in part of staff from major IT companies (IBM, Google, Sun, Novell…) that have a strategic interest in countering Microsoft’s hegemony on the desktop.

  • There is a very strong link between these two programs and the underlying open standards: web standards for Firefox, the ISO Open Document Format standard for OpenOffice.org. The intense lobbying campaign that Microsoft has waged for more than a year to have its own “Open XML” format standardised by ISO (against all common sense: why create a second standard when one already exists?), using methods that are far from consensual, shows how strategically important standards are in today’s IT industry.

There is, however, an important difference: Firefox’s main competitor is Internet Explorer (and, to a lesser extent, Safari on Mac OS), which is bundled as the default browser in Microsoft’s Windows systems and is therefore seen as free by users, whereas OpenOffice.org is mainly positioned against Microsoft’s Office suite, which is expensive:

  • To convince a significant share of users to install Firefox instead of the default browser, the Mozilla developers have to stand out through the quality and features of their product. Microsoft stopped all innovation on its browser once it believed it had won the “browser wars” against Netscape in the early 2000s; Mozilla, meanwhile, introduced dozens of innovations, such as tabbed browsing and popup blockers, which users liked so much that Microsoft was forced to copy them in IE 7.

  • In the case of OpenOffice.org, the difference is most often the price: Microsoft’s Office 2007 suite typically costs a small or medium-sized business between 500 and 950 euros per seat, which is of the same order of magnitude as the price of an entry-level or even mid-range office PC. That significant cost makes the free OpenOffice.org offering tempting.

Speaking of which, how do you see the future of free software for the general public?

The success of programs such as Firefox, OpenOffice.org and VLC (a video player, also cross-platform) shows that key programs, which can account for up to 80 or 90% of many users’ daily computer use, can be free software.

The success of a small number of general-purpose programs, along with a “long tail” of more specialised ones, on platforms such as Windows and Mac OS, helps educate the general public about the existence and quality of free software, and may lead some people to take an interest in Linux as a desktop operating system too. The main vendors of Linux-based operating systems (Red Hat, Novell, Mandriva, Ubuntu) are currently seeing demand in this sector from the general public and from mass retailers, and several signs, including the “flop” of the Vista launch, suggest that 2008 will be the year Linux takes off with part of the general public.

Another factor in the spread of free software is worth mentioning: software embedded in specialised hardware. In France, the Freebox, the NeufBox and many other “appliances” have shipped with a Linux kernel and a lot of free software for several years. In emerging countries, initiatives such as One Laptop Per Child are also a way of spreading free software to millions of users who have never used Windows and therefore have no preconceptions.

More and more private companies and government agencies are switching to free software, which Microsoft now regards as a “competitor”. Are there still obstacles to its widespread adoption?

In the enterprise software field, the situation is very different: no single player dominates the market outright; instead there are several dominant players: Microsoft, of course, but also SAP, IBM, Oracle, EMC, etc.

In this sector, mainly for server software, first at the infrastructure level (communication, databases, monitoring, etc.) and gradually at the application level (CRM, ERP, document management, ECM, etc.), free software has in some cases already become the norm (e.g. Apache) and in most others is growing rapidly.

A notable obstacle to this growth is Microsoft’s range of anti-competitive practices, denounced by the whole industry, which have earned it a great many lawsuits and a number of convictions.

Apart from that, it is more accurate to speak of inertia than of obstacles: decision-makers’ mindsets have to change, IT staff in the field have to be trained, and investments amortised over 5 to 10 years have to be replaced.

One thing is clear to everyone, however: free software has changed the way vendors develop software. They increasingly rely on free software “building blocks” (31% in 2006, probably more than 50% in 2007 or 2008), and they are also increasingly “opening up” their development models. Within 5 years, I am certain that most enterprise software vendors will have adopted, in one way or another, some of the methods of free software in their development.

How do you see the distribution of cultural content evolving?

This is a textbook case of disruptive innovation, following Clayton Christensen’s model:

  • The former dominant players, the “majors”, are clinging to their now-obsolete model and lobbying for drastic legislation in this area.

  • Part of the public adopted peer-to-peer sharing and content-upload sites too quickly, without taking on board the legal and moral limits of these practices: sharing free content and contributing to its development as a community is good; sharing content subject to usage restrictions beyond the private-copy exception is bad.

  • Lacking a clear vision on the issue, and under the influence of the lobbies of the past, the public authorities have passed very harsh laws (DADVSI) aimed at preserving the status quo.

  • A small number of players, starting with Apple, seized the technological disruption at the right moment and used it to reshape the content distribution value chain around their products (e.g. the iPod) and services (e.g. iTunes), thereby pulling off the “heist of the century” on the music industry, and soon on the film industry.

In my view, one of the major roles of a project like Wikipedia, which shows both the strength and the limits of the cooperative development model, is to educate the public about community content production and about usage rights for content: to show that several models are possible, each with its strengths, weaknesses and taboos, and that it is important to understand them in order to behave as responsible citizens.

Posted by Stéfane Fermigier @ 07/14/2007 06:38 PM. - Categories: indesko, linux, mozilla, openoffice -  0 comments
07/13/2007
Thesis and slides (French grammar checking)
Here are my research thesis and the slides from my defence.

The work I did is integrated into LanguageTool and can be used in OpenOffice.org as an extension. The file and the installation instructions are available on the LanguageTool website.

Other people have started writing new rules for French to extend LanguageTool’s coverage, so the rule base is evolving regularly.

There is still a lot of work to do...
Posted by Agnes Souque @ 07/13/2007 03:04 PM. - Categories: indesko, openoffice -  0 comments
All content is copyrighted by their author.
CPSSkins is Copyright © 2003-2006 by Jean-Marc Orliaguet. | CPS is Copyright © 2002-2006 by Nuxeo SAS.