Nuxeo - All posts

06/25/2009
The Promises of Modern Chemistry

By now, most of you should have heard about CMIS, the upcoming specification that promises interoperability between many systems for common content management tasks. The CMIS specification is being driven by an OASIS Technical Committee and is currently still a draft; it is expected to be finalized late 2009 or early 2010.

I won't detail here all that CMIS will bring; this has been covered extensively already and will be covered even more in the future. No, the purpose of this article is to present Chemistry.

Chemistry

Chemistry is a new Apache project for CMIS that started incubating recently ("incubation" is the term used in the Apache Software Foundation for young projects that still have to prove themselves). Chemistry's goal is to provide general-purpose libraries for interaction using CMIS between a server and a client. These libraries are mainly written in Java, but some JavaScript code has been added as well, and we're open to more.

Chemistry provides a high-level API so that a developer can manipulate objects like documents or folders and can call simple methods on them without having to deal with the details of a specific low-level communication transport. In addition to that, Chemistry also provides an SPI (Service Provider Interface) for backend developers, making it quite easy to use Chemistry to store documents in a project-specific manner.

Underlying this, Chemistry has implementations for the CMIS transports. CMIS specifies two mandatory transport protocol bindings (one extending AtomPub, for a lightweight RESTful HTTP interface, and another using SOAP for a WebService-based interface), and Chemistry will support both — and probably more in the future.

The current Chemistry code base has an initial version of the API/SPI together with some actual implementations around the AtomPub protocol. Already Chemistry can talk to itself (AtomPub client talking to AtomPub server) and store data in-memory (which is very handy for unit tests). Outside of the Apache code base, Nuxeo has also coded a backend to provide access to Nuxeo 5.2 repositories using Chemistry. Generic CMIS AtomPub clients like CMIS Explorer are able to see a Nuxeo repository through Chemistry for instance.

Chemistry Modules

The following modules will be available in Chemistry:

  • The APIs: a low-level SPI between a client and a server that mirrors the CMIS specification closely (it is expected that the SPI will be used when either the client or the server implements one of the HTTP protocols defined in CMIS), and a high-level API that wraps the SPI to provide more object-oriented notions of connections, folders and documents, and that hides the nitty-gritty details of the protocols.
  • A set of common Java utilities around CMIS, for instance a parser to turn CMIS SQL into an AST (Abstract Syntax Tree) that can be reused by different backends, or a generic in-memory implementation of the SPI and API for unit testing.
  • Four implementations of the SPI for the protocols defined by CMIS: an AtomPub server and client, and a SOAP server and client.
  • A generic implementation of the API-to-SPI wrapping, so that a third-party implementation of just the SPI can be plugged into the rest of the Chemistry framework. (Some of the four basic protocol implementations may also provide the full API when this is more efficient than using the generic wrapping.)
  • An implementation of the APIs as a JCR backend.
  • A set of generic tests for CMIS servers and clients, providing an unofficial TCK for CMIS.
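
To make the API/SPI split in the list above more concrete, here is a minimal sketch of the "generic API-to-SPI wrapping" idea. Every name in it (Spi, InMemorySpi, Folder, Document) is hypothetical and chosen for illustration only; these are not Chemistry's actual interfaces, which are still evolving along with the spec.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types are illustrative, not Chemistry's real interfaces.
class ApiOverSpiSketch {

    /** Low-level SPI: flat, id-based calls, close to what a wire protocol carries. */
    interface Spi {
        String createDocument(String folderId, String name);
        List<String> getChildIds(String folderId);
        String getName(String objectId);
    }

    /** In-memory SPI backend with a single folder whose id is "root". */
    static class InMemorySpi implements Spi {
        private final Map<String, String> names = new HashMap<>();
        private final List<String> rootChildren = new ArrayList<>();

        public String createDocument(String folderId, String name) {
            requireRoot(folderId);
            String id = "doc-" + (names.size() + 1);
            names.put(id, name);
            rootChildren.add(id);
            return id;
        }

        public List<String> getChildIds(String folderId) {
            requireRoot(folderId);
            return new ArrayList<>(rootChildren); // defensive copy
        }

        public String getName(String objectId) {
            return names.get(objectId);
        }

        private static void requireRoot(String folderId) {
            if (!"root".equals(folderId)) {
                throw new IllegalArgumentException("Unknown folder: " + folderId);
            }
        }
    }

    /** High-level API object: written once against Spi, reusable by any backend. */
    static class Folder {
        private final Spi spi;
        private final String id;

        Folder(Spi spi, String id) {
            this.spi = spi;
            this.id = id;
        }

        Document newDocument(String name) {
            return new Document(spi, spi.createDocument(id, name));
        }

        List<Document> getDocuments() {
            List<Document> documents = new ArrayList<>();
            for (String childId : spi.getChildIds(id)) {
                documents.add(new Document(spi, childId));
            }
            return documents;
        }
    }

    /** High-level view of a document; it only remembers its id and asks the SPI. */
    static class Document {
        private final Spi spi;
        private final String id;

        Document(Spi spi, String id) {
            this.spi = spi;
            this.id = id;
        }

        String getName() {
            return spi.getName(id);
        }
    }

    public static void main(String[] args) {
        Folder root = new Folder(new InMemorySpi(), "root");
        root.newDocument("report.txt");
        for (Document document : root.getDocuments()) {
            System.out.println(document.getName());
        }
    }
}
```

The point of the design is that Folder and Document never know which backend they talk to: a third party that implements only the three SPI methods gets the object-oriented layer for free.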

In the future, it is expected that more implementations of the APIs will be available, for example we envision new transports:

  • A WebDAV-based transport.
  • An HTTP-based transport less RESTish and more friendly to browsers and JavaScript.

And new backends:

  • A backend storing documents on the filesystem, with or without metadata.
  • A backend storing documents in the Google AppEngine Datastore.
  • A backend storing documents using Microsoft Windows SharePoint Services.

The Pieces of the Puzzle

As you can see, these modules will allow for wide interoperability between systems. Here's a graphical representation of the building blocks:

The User Application speaks the API:

The API can be implemented in many ways. First, it could be a direct backend:

Or, more commonly, the API will be implemented as a client binding for a specific protocol, SOAP or AtomPub:

Each protocol speaks in its own way on the wire:

And this is connected to a server that speaks the protocol as well:

Finally, behind the server, a backend has to store the actual information somewhere:

Anyone is welcome to create new pieces, for instance new protocol bindings:

Or new storage backends:

Now let's see how the main pieces can be plugged together.

The simplest connection is between an application and a direct backend:

If the backend only wants to deal with the SPI, its implementation can reuse the API-to-SPI to provide a full API experience:

When talking through a wire protocol, we plug together a client and a server:

The end result is an application talking to a backend through a wire protocol:

Of course we can get creative and plug many more together:

Development

All of this is still a work in progress (even the spec!), but you should expect rapid changes in the available features in the coming months as the spec settles down, more code is written, more test cases are written, and more testing against third-party implementations is done.

If you're interested in helping, please join the list chemistry-dev@incubator.apache.org by sending an empty email to chemistry-dev-subscribe@incubator.apache.org.

Posted by Florent Guillaume @ 06/25/2009 03:26 PM. - Categories: ecm, java, nuxeo, nuxeo5 -  0 comments
06/23/2009
Florent Guillaume on CMIS and Apache Chemistry

A few weeks ago I gave an interview to Irina Guseva of CMSWire. We touched on the strategic value of CMIS, the history of the Apache Chemistry project, partnerships, open source, future plans around CMIS, and more.

Chemistry has extremely ambitious plans. We believe that it can become the de facto bridge between most of the Java-based content-oriented products, allowing a very wide variety of back-ends and applications to be connected together. And actually Java is not the sole language that this project is targeting, as David Nuescheler is also working on a JavaScript library for CMIS. In the coming month you should see an exponential increase in the functionality that Chemistry provides...

You can read the full article at CMSWire.

Posted by Florent Guillaume @ 06/23/2009 03:30 PM. - Categories: ecm, java, nuxeo, nuxeo5 -  0 comments
05/31/2009
Already 10000 views for the Nuxeo DM 5.2 teaser

My short musical screencast about Nuxeo DM 5.2, which I made two weeks ago to showcase the new features of Nuxeo DM 5.2, has already received 10000 views on nuxeo.tv, our TV channel dedicated to screencasts, presentations and interviews about the Nuxeo products, technology and community.

If you haven't seen it already, here it is:

Then download Nuxeo DM 5.2.

Posted by adminsf @ 05/31/2009 04:31 PM. - Categories: nuxeo5 -  0 comments
04/17/2009
OEM testimonial on video - Nel Taurisson (SkinSoft)

Yesterday I attended a presentation of SkinMuseum, a museum collection management application developed by the young, innovative company SkinSoft.

Nel Taurisson, head of R&D at SkinSoft, agreed to answer my questions in a short video interview, which you will find below.

Interview transcript

Hello, I'm Nel Taurisson, and I'm in charge of R&D at SkinSoft.

SkinSoft is a company that develops applications based on Nuxeo. Our first application is a museum collection management application. After that we will move toward business applications: photo library, media library, library.

We have been working on research and development in this area for more than a year now. We chose Nuxeo as our foundation after looking all around at what we could use as an engine. We went with Nuxeo for its component architecture, for the overall architecture of the product, because it is a platform on which we can develop very well, and because there is a real community that works well and is responsive.

So there you go, it's a fine product and we try to build fine products with it.

Posted by adminsf @ 04/17/2009 08:33 AM. - Categories: java, nuxeo, nuxeo5 -  0 comments
03/09/2009
Java Management Interface, coming in 5.2 RC1

Credit Where It's Due

This blog post would not have been possible without the diligent and thoughtful assistance of Stéphane Lacoin. Stéphane gave me the big clues on how to configure the current Nuxeo 5.2 "head" to make all the JMX stuff "turn on" and also personally tracked down and squashed a number of bugs that made this article possible (including one in JBoss!). Somebody, please give that man a croissant!

Not Quite in 5.2.0.m4

This blog post is about a feature that did not quite make it into the 4th milestone release of Nuxeo 5.2. It is in the current source code build (or you can get it from the nightly snapshots) but I thought those that are waiting for the release candidate release might be interested to see what's "coming down the pike." As of the time of this writing, it is expected to be in the 5.2 RC1 release.

JMX: The Java Management Extensions

The Java Management Extensions, or JMX, have been a standard part of the Java platform since Java 1.5. These extensions allow a client application to manage and monitor a collection of devices, computers, and services. A simple example might be a three-tiered web application deployed on three separate servers. One would like all the monitoring feedback sent to a single place, often called the management client, where a human can look at the incoming data or the management client itself can process the data with some type of analysis procedure. The data from a three-tiered web application would include data from all the software layers such as database, application server, and application itself, but might also include reports from the network connecting the layers, and some type of hardware monitoring daemons as well. Once this has been analyzed, by a human, computer, or both, one may want to take some type of action to manage the objects being monitored; in the example of the three-tiered application one can easily imagine a scenario where you would want to begin an orderly shutdown procedure. Coordinating all the layers of software, on different machines, to do this gracefully is clearly a management problem!
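
For readers who have never written a managed object, here is a small sketch of what sits on the server side of that picture, using only the standard javax.management API. The RepositoryProbe class, its attributes, and the org.example.sketch object name are all made up for this example; they only imitate the kind of probe discussed in this post and are not Nuxeo's actual mbeans.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: RepositoryProbe is an illustration, not a Nuxeo class.
class JmxProbeSketch {

    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    /** Management interface: getters become attributes, other methods become operations. */
    public interface RepositoryProbeMBean {
        long getRunCount();   // attribute "RunCount"
        boolean isEnabled();  // attribute "Enabled"
        void disable();       // operation "disable"
    }

    /** The managed object; JMX pairs it with RepositoryProbeMBean by naming convention. */
    public static class RepositoryProbe implements RepositoryProbeMBean {
        private long runCount;
        private boolean enabled = true;

        /** Not part of the management interface, so a console cannot see or call it. */
        public synchronized void run() {
            if (enabled) {
                runCount++;
            }
        }

        public synchronized long getRunCount() { return runCount; }
        public synchronized boolean isEnabled() { return enabled; }
        public synchronized void disable() { enabled = false; }
    }

    /** Registers the probe with the JVM's platform MBean server under the given name. */
    static ObjectName register(RepositoryProbe probe, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            SERVER.registerMBean(probe, objectName);
            return objectName;
        } catch (Exception e) {
            throw new IllegalStateException("Could not register " + name, e);
        }
    }

    /** Reads an attribute by name, the way a management console would. */
    static Object read(ObjectName name, String attribute) {
        try {
            return SERVER.getAttribute(name, attribute);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read " + attribute, e);
        }
    }

    /** Invokes a no-argument operation by name. */
    static void invoke(ObjectName name, String operation) {
        try {
            SERVER.invoke(name, operation, new Object[0], new String[0]);
        } catch (Exception e) {
            throw new IllegalStateException("Could not invoke " + operation, e);
        }
    }

    public static void main(String[] args) {
        RepositoryProbe probe = new RepositoryProbe();
        ObjectName name = register(probe, "org.example.sketch:type=RepositoryProbe");
        probe.run();
        System.out.println("RunCount = " + read(name, "RunCount"));
        invoke(name, "disable");
        System.out.println("Enabled = " + read(name, "Enabled"));
    }
}
```

Once an object like this is registered, any JMX client attached to the JVM can read its attributes and call its operations without knowing the Java class behind it.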

Configuring Nuxeo

You need to install one extra bundle into your build of Nuxeo 5.2 to get this to work: the bundle is nuxeo-runtime-management. You should put the bundle in <nuxeo-install-dir>/server/default/deploy/nuxeo.ear/plugins. Nuxeo is smart enough to "pick up" this additional module and its features the next time you fire it up, just because it is present. So, go ahead and restart (or start) your server with the script bin/run.sh (or run.cmd for Windows).

Connecting JConsole to the running Nuxeo

With Nuxeo now running and exposing its management interfaces, you need to hook up a JMX client to see what's going on inside the Nuxeo system. You can do this with the program "jconsole" that is supplied with the Java development kit (at least if you got your JDK fairly recently). When you fire it up, you'll need to enter these three values:

???

The first one is the tricky one, obviously! The hostname of the machine running Nuxeo in this example is "localhost" and the Nuxeo server, at the time of this writing, exposes its Java management interface on port 2009. Normally, the Nuxeo server does not require a username and password (the second and third values), since the management service is "turned off" by default and you have to turn it on by installing bundles.
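
If you would rather connect from code than from jconsole, the standard javax.management.remote API can do the same job. The sketch below is only a guess at the connection details: it assumes the server is reachable through the common RMI connector form with the conventional "jmxrmi" JNDI name, which this post does not confirm for Nuxeo. The host and port are the ones mentioned above; adjust the URL if your server publishes a different one.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch under assumptions: the "jmxrmi" JNDI name is the JDK convention,
// not something confirmed for the Nuxeo management bundle.
class JmxRemoteSketch {

    /** Builds the usual RMI-connector service URL for the given host and port. */
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (java.net.MalformedURLException e) {
            throw new IllegalArgumentException("Bad host or port", e);
        }
    }

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = serviceUrl("localhost", 2009);
        // No credentials are passed, matching the default setup described above.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Passing null for both arguments lists every registered mbean.
            for (ObjectName name : connection.queryNames(null, null)) {
                System.out.println(name);
            }
        }
    }
}
```

Running main requires a server that is actually listening; without one, the connect call fails with an IOException.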

Getting information about the system state

Once you have connected, you will see the set of mbeans (management beans) that are exposed by Nuxeo:

???

When you drill down into a category you can get statistics about the objects in that category. In the previous screenshot, for example, if you click on the "metric" you will get information about the number of http sessions that have been created and destroyed while the nuxeo server is running. Also shown in the previous screen shot is the "probe" mbean. This mbean does a probe of the repository periodically (about every 30 seconds) and keeps track of the successes and failures. Here's what you would see if you drilled down into that item:

???

So, from this display you can see that the probe has been run 25 times without failures and the last probe took 111 milliseconds.

Running Methods

You can not only see data with the JMX interface to Nuxeo, but you can execute management functions as well. For example in the previous screen shot, if you click on the "Operations" button you will be presented with a list of methods that you can execute from the management interface. One of these is "disable" that you can use to turn off the probe behavior. When you run a method via the management interface, normally you get a message like this:

???

More fun, though, is to map the Nuxeo Runtime's inventory with the RuntimeInventory's factory object. You can click on that object (shown in the upper left of the probe screenshots) under category "nx" then category "Runtime" inventory. You will have your choice of a number of methods on that screen that give information about your system, but the most interesting one is "bindTree". The result of clicking this button to invoke that method is shown in this screenshot:

???

This takes all the bundles and components that the runtime knows about and makes them available via the JMX interface, so you can get some statistics about them, although in many cases this is quite minimal. You can, of course, unbind that tree of objects with "unbindTree" as shown in the screenshot above.

Getting audited isn't all bad...

Although it's getting perilously close to tax day for our friends in the United States (tax days come in September in France), we are going to bring up the subject of auditing anyway. Many of the actions that users perform when using Nuxeo are audited (assuming you have deployed the auditing bundle) and some summary statistics can be seen via the JMX interface. The following screenshot gives you a feeling for what types of actions can be seen via the NXAuditService category:

???

The JMX interfaces are a nice way to interact with enterprise software systems like Nuxeo and not only keep an eye on how they are running but also perform many basic management functions. This is going to be a standard feature of Nuxeo starting with the 5.2 GA release and you can expect that we will gradually be exposing more functionality through JMX as we go forward.

If you have questions or comments about Nuxeo and JMX, or this article, drop me a note at ismith [at] nuxeo [point] com. We would especially love to hear from folks who have specific needs for functions to be exposed by Nuxeo through the management interface.

Posted by Ian Smith @ 03/09/2009 06:01 PM. - Categories: ecm, nuxeo5 -  0 comments
03/05/2009
Two more soldiers down

Double your pleasure, Double your fun

This has been a busy week on the book front and general purpose Nuxeo chicanery. I've got a number of things cooking right now so that I can show off features of the platform. Both of these features are of the "should be available to the public soon" type, so I've had to actually garner time from the developers to get them to give me demos and let me capture screenshots. I hope both of those posts are available tomorrow. I have the text of one written and the other should not be a major hurdle.

I've completed two more chapters, one of them a fairly short one about XMap and Apache Commons Logging. I think it's only fair to admit that I really dislike Apache Commons Logging because it gives people the distinct idea that writing more logging frameworks is ok. I can't believe the Apache Foundation green-lighted a meta-framework for logging when they already had log4j. Oh well. The other part of the short chapter is about the XML un-pickling code we use, which is packaged up into the library XMap. We use this all over the place in Nuxeo to read in snippets of XML and turn them into Java objects.
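
If "un-pickling" XML is a new idea to you, the sketch below shows the bare concept with nothing but the JDK's DOM parser. It is deliberately not XMap and does not use XMap's API; the Widget class and the <widget> snippet are invented for the example. A library like XMap exists so that you don't have to write this kind of mapping code by hand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Concept sketch only: plain JDK DOM parsing, not the XMap library.
class XmlToObjectSketch {

    /** Hypothetical target object for a snippet like <widget name="..." type="..."/>. */
    static class Widget {
        final String name;
        final String type;

        Widget(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Parses a one-element XML snippet and copies its attributes into a Widget. */
    static Widget parse(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = document.getDocumentElement();
            return new Widget(root.getAttribute("name"), root.getAttribute("type"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable XML snippet", e);
        }
    }

    public static void main(String[] args) {
        Widget widget = parse("<widget name=\"title\" type=\"text\"/>");
        System.out.println(widget.name + " is a " + widget.type + " widget");
    }
}
```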

The more meaty chapter is about Users and Groups. I got a lot of help, particularly on understanding directories and the test infrastructure needed for UserManager, from Anahide T. Big props to her! In this lesson, we construct a new class that interacts with the UserManager to create a new Group. Although it doesn't do that much, there is quite a bit to working with the UserManager service because of the wide variety of situations in which Nuxeo can be deployed. We have to work with flat text files, databases of user records of all sorts, and LDAP to be good enterprise citizens. I think that folks that read this chapter will come away realizing, more than anything, that messing around with users and groups in a big organization is serious business!

As always, the whole text can be found at http://www.nuxeo.com/static/book-draft.

Posted by Ian Smith @ 03/05/2009 09:30 PM. - Categories: ecm, nuxeo5 -  0 comments
03/04/2009
Selenium and Ajax Requests

Following Lance Ivy's excellent post (http://codelevy.com/articles/2007/11/05/selenium-and-ajax-requests), here's an easy way to write Selenium tests for Ajax requests when you're not using Prototype directly, but using Ajax4JSF or RichFaces.

Add this to your user-extensions.js:

/**
 * Registers with the a4j library to record when an Ajax request
 * finishes.
 *
 * Call this after the most recent page load but before any Ajax requests.
 *
 * Once you've called this for a page, you should call waitForA4jRequest at
 * every opportunity, to make sure the A4jRequestFinished flag is consumed.
 */
Selenium.prototype.doWatchA4jRequests = function() {
  var testWindow = selenium.browserbot.getCurrentWindow();
  // workaround for Selenium IDE 1b2 bug, see
  // http://clearspace.openqa.org/message/46135
  if (testWindow.wrappedJSObject) {
      testWindow = testWindow.wrappedJSObject;
  }
  testWindow.A4J.AJAX.AddListener({
    onafterajax: function() {Selenium.A4jRequestFinished = true}
  });
}

/**
 * If you've set up with watchA4jRequests, this routine will wait until
 * an Ajax request has finished and then return.
 */
Selenium.prototype.doWaitForA4jRequest = function(timeout) {
  return Selenium.decorateFunctionWithTimeout(function() {
    if (Selenium.A4jRequestFinished) {
      Selenium.A4jRequestFinished = false;
      return true;
    }
    return false;
  }, timeout);
}

Selenium.A4jRequestFinished = false;

Instead of using pauses or waitForCondition (writing some esoteric javascript test to detect that the Ajax call ended), you can then write something as simple as:

<tr>
  <td>watchA4jRequests</td>
  <td></td>
  <td></td>
</tr>

... (command triggering the ajax call) ...

<tr>
  <td>waitForA4jRequest</td>
  <td>10000</td>
  <td></td>
</tr>

For Selenium beginners, the JavaScript code has to be placed in a file named user-extensions.js and passed to the Selenium Server with the command line option "-user-extensions <file>". When using Selenium IDE, it can be set in the options menu (don't forget to close the IDE and restart it for this to be taken into account).

Posted by Anahide Tchertchian @ 03/04/2009 07:12 PM. - Categories: java, web -  0 comments
Cross validation with jsf

I'm happy to have found an elegant (and easy) way to handle cross-validation of JSF components. The idea is to add a hidden input and bind it to a validator, passing the ids of the components to validate as attributes:

<h:inputHidden value="needed" validator="#{myBean.validatePassword}">
   <f:attribute name="firstPasswordInputId"
     value="#{layout.widgetMap['firstPassword'].id}" />
   <f:attribute name="secondPasswordInputId"
     value="#{layout.widgetMap['secondPassword'].id}" />
</h:inputHidden>

Note that here, component ids are retrieved from the layout widget ids, but any id bindings will do (taking example on what's done when adding a "for" attribute to a h:message tag for instance).
Note also that the hidden input has to be placed after the referenced components in the page, so that they have been validated, and had their local values set.
The "needed" value is just here because the tag requires its "value" attribute to be set on my box, even if it's useless for us.

The validation method can then retrieve the components values:

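// Imports needed by the backing bean class
import java.util.Map;
import javax.faces.application.FacesMessage;
import javax.faces.component.UIComponent;
import javax.faces.component.UIInput;
import javax.faces.context.FacesContext;
import javax.faces.validator.ValidatorException;

// Looks up the input whose id is stored in the given attribute of the anchor component
public Object retrieveInputComponentValue(UIComponent anchor, String componentId) {
    Map<String, Object> attributes = anchor.getAttributes();
    String inputId = (String) attributes.get(componentId);
    UIInput component = (UIInput) anchor.findComponent(inputId);
    return component.getLocalValue();
}

public void validatePassword(FacesContext context, UIComponent component, Object value) {
    Object firstPassword = retrieveInputComponentValue(component, "firstPasswordInputId");
    Object secondPassword = retrieveInputComponentValue(component, "secondPasswordInputId");
    // Null-safe comparison: an empty first field must not cause a NullPointerException
    boolean same = firstPassword == null ? secondPassword == null : firstPassword.equals(secondPassword);
    if (!same) {
        FacesMessage message = new FacesMessage(FacesMessage.SEVERITY_ERROR, "Password mismatch", null);
        throw new ValidatorException(message);
    }
}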

Note that all components have to be in the same container for this to work properly as is (UIComponent#findComponent method restricts its search to the nearest container). Of course, this retrieval method can be adapted to handle more tricky situations, or to handle other kinds of component types.

What's the improvement compared to what I've seen so far? Well, I've seen recommendations to use a hidden input too, but they retrieve the referenced components' values by binding the components to fields in the backing bean. So in that case, you had to add getters and setters for each component, and you had to do it for every set of components following the same validation criterion on a given page. Other recommendations involved writing a custom component handling several other ones, assuming dependent components will be rendered in the same part of the page... overkill, right?

In other words, here you can perform the same validation several times in the same page... with less code.

And it's prettier ;-)

Posted by Anahide Tchertchian @ 03/04/2009 05:58 PM. - Categories: java, web -  0 comments
02/18/2009
Final sprint before Nuxeo 5.2 (GA)

The final (GA) release of Nuxeo EP 5.2 should be ready in less than a month now. Here are some notes on how we plan to proceed with the final sprint until the release:

  • you will find below a list of the issues we’d like to address before the release (some of them might already have been addressed).

  • most of these issues have been assigned to teams or developers, who are working on them during this sprint.

  • the next release, 5.2 RC, is scheduled in two weeks (Friday 27th) and hopefully will be feature-complete (and fairly stable).

  • the final release will happen after some testing and any necessary bug fixing, hopefully less than a couple of weeks later.

  • the Jira (http://jira.nuxeo.org/browse/NXP) has been updated accordingly (almost: it’s not completely in sync with the list below; we will try a slightly different system when we start planning the next releases, to ensure that the Jira state accurately reflects the product and release backlogs).

Tasks we must do before GA

  • Ensure Java 6 compatibility for webservices:

    The build will remain in Java 5, but we will support deployment on a 1.6 JVM.

  • Events:

    There are some errors in the logs while processing some events. This seems to happen only for specific events (emptyDocumentModelCreated), but this still needs to be checked and fixed. We should also unit test all async listeners since it’s now possible.

  • Data Migration:

    We need to be able to migrate content from the JCR backend to the new VCS (SQL-based) backend.

  • WebServices:

    We still have the “old” web services based on JBossWS and JAX-RPC.

    These need to go away:

    • there are issues with Java 6
    • looks like we can not run JBossWS and Metro in the same box

    => We will migrate the 3 WS (Audit, Core, IndexingGateway) to Metro.

  • Visible Content Store (aka VCS, aka Visible SQL, aka SQL repository):

    • We will support H2, PostgreSQL, MySQL, Oracle, MS-SQL
    • All supported DBs must be tested in hudson.
  • Workflow:

    • Workflows need to be plugged back into the event and notification services.
  • Annotations: It must be possible to delete an annotation.

  • Packaging cleanup:

    We need to check the packaging :

    • remove deprecated/compat packages
    • remove cache packages

Tasks we should do before GA

  • UI cleanup on JSF WebApp:

    Some look & feel work will be done by GUnit team.

    We will also do some cleanup:

    • review order of tabs and naming
    • fix / re-enable tooltips
    • re-enable D&D inside the browser (it is only 50% usable in M4)
    • enable popup context menu on full rows
    • refactor summary screen
  • More FileManager plugins:

    • IO-Plugin
    • Bundle file mac
    • create mail addon
  • Picture book:

    • test, fix and package well
  • Finish WebWorkspace

  • REST API:

    • Check and document the existing REST API
  • Update and align sample project on 5.2 GA

  • Complete / update the Nuxeo Book and the Nuxeo tutorials

  • Extend audits views:

    We can easily provide for each workspace a timeline of what happens. As a default implementation there will be no security filtering on event logs.

Nice to have

  • Show case for NXThemes and WebWidgets:

    • it would be great to have a default usage for WebWidgets

    => use it for the default DashBoard

  • Mail Drop box:

    The idea is to reuse the scheduled email fetcher to feed an InBox associated to each workspace

    => Have an email address for each workspace

    => Add a mailto link to workspace summary

  • NxWSS and MOSS:

    Add a WSS URL to each workspace summary to enable direct access via WSS clients.

Dead for GA (will be done in 5.2.1)

  • GWT integration:

    Search Center will be the showcase for GWT integration.

  • Flex:

    We should release the connector and the samples, and communicate how to use them via some blogs.

  • DynSearch:

    • Replaced by Search Center
  • Thumbs management and nice web folder contents view

  • Smart Folder / Saved search

  • UserWorkspace improvements

Posted by Stéfane Fermigier @ 02/18/2009 10:46 PM. - Categories: nuxeo5 -  0 comments
I1N on I18N

Lost In Translations

I've updated the book, again. I've added a new chapter to the book on how to make a Nuxeo bundle behave correctly in different locales:

Chapter 7: Internationalization and Localization

This is a chapter that is near and dear to my heart as a beginning (or dare I say intermediate?) student of la langue française. One thing that I didn't really get into much in the chapter, although it's important to me these days, is how quickly you can process (or misprocess) things when the language is your own. Some of the applications on my desktop here in Paris are in French and I find myself glancing at things like menus and thinking it says "Close File" when really it's saying something totally different. The brain can really play tricks on you when things are buried way down there with all that reptile stuff!

I have now put together 40 some-odd pages of text and images for Java programmers and have yet to put a line of Java code in the book! I think, though, we are reaching the limit of that and the plans for the next chapters will include some Java chicanery, although we are not yet ready to graduate from XML entirely...

Posted by Ian Smith @ 02/18/2009 06:57 PM. - Categories: nuxeo5 -  0 comments
02/17/2009
Video: "10 reasons why Nuxeo is using GlassFish"

The presentation I gave at Sun on February 10, on the theme “10 reasons why Nuxeo is using GlassFish”, was filmed. The video has just been put online:

You can also still view the slides.

Posted by Stéfane Fermigier @ 02/17/2009 11:57 PM. - Categories: java, nuxeo, nuxeo5 -  0 comments
02/11/2009
"10 reasons why Nuxeo is using GlassFish" presentation

Sun launched GlassFish Portfolio yesterday.

During the pre-launch press conference in Paris, we were invited by Sun to present how we are integrating GlassFish in our open source ECM technology stack. (Many thanks to Alexis for the invitation.)

I outlined the top 10 reasons why GlassFish is a good match to both our open source business model and our technology needs:

  • 10: GlassFish Embedded
  • 9: Provisioning and administration (Update Center)
  • 8: Standard process (JCP), preview of cool technologies (JAX-RS, EJB 3.1 / Java EE 6)
  • 7: Interoperability (ex: Metro)
  • 6: Developers agility (short startup time, scripting support)
  • 5: Documentation, support
  • 4: Reference implementation of the Java EE 5 standard
  • 3: HK2 microkernel (modularity, OSGi, service orientation, dependency injection)
  • 2: Quality, enterprise-readiness, performance and scalability
  • 1: Momentum, open source community

The Slides are here:


Update: the video is also available (in French):

Posted by adminsf @ 02/11/2009 09:44 AM. - Categories: java, nuxeo, nuxeo5 -  0 comments
02/07/2009
Already Dead, Yet Alive, Operating Systems Crack Me Up...but...
Look, I get a laugh out of reading the announcements of new (or rebuilt old) operating systems. I am frankly pretty skeptical that the world needs a great many more operating systems now that it has 1 poor, big commercial, closed-source one, 1 good free, small-or-large, open source one, and 1 good, small-or-large, half-open-source-half-closed-source one. From a market standpoint, I'm not sure we need more since it's really unclear where the gap is that is going to be filled by something else that can grow into something better than any of these; the alternative of replace-all-at-once cannot even be accomplished by one of the largest, richest companies in the world and is, in my view, too much work to be useful.

So, this new Phantom OS would have been just another laugh-at-somebody-else's-expense... except something caught my eye.

The clever bit, at least to me when I read it, was that by targeting *only* virtual machines you can actually produce a situation where the underlying operating system doesn't matter much. This is clever and appeals to my sense of the macroeconomics (see above) since it allows you to get something small out the door, get some customers on it for some special/custom operations, and grow into something bigger.

I have somewhat reached this agnostic point already. Nuxeo is a large system, but it runs on basically anything that has a decent JVM. There are some "environmental" issues with Windows that make it a bit tougher to support than Mac or Linux (either 32 or 64), but from the standpoint of 99% of the development, the underlying OS doesn't matter at all. So, all of a sudden we have myriad gaps in the market! If the OS runs the JVM well (or the CLR, if you care) you can switch to it. That is the only thing that really matters to me anyway, and with the JCK it seems feasible that this could be guaranteed to a reasonable level of compatibility.

Once this is achieved--and it's not trivial--you can start differentiating your OS with extra Java packages that exploit super-cool-feature-X. Who cares what it is, it is something that *might* entice someone to stay on (get stuck with) the OS. Similarly, if you support some weirdo hacks in your OS that allows some JVM operation (or, better yet, a whole method or class) to run 10X faster, then you might be able to differentiate on performance. (This is risky, though, because the linux weasels will steal it in 10 mins! Look at the Tomcat native support if you need any more proof!)

This idea should have occurred to me before. It strikes me, having thought through this now, that it might be the sole reason anybody wants Solaris anymore.

Posted by Ian Smith @ 02/07/2009 03:28 AM. - Categories: java -  0 comments
02/06/2009
Presentation On "What Is ECM" by JM Pascal (in English)
Posted by Ian Smith @ 02/06/2009 06:31 PM. - Categories: ecm -  0 comments
Nuxeo 5.2 Milestone 4 Feature: Conversions and Previews

Exploiting OpenOffice

I decided to start off with the good stuff for this post. I couldn't really call this article, "How you really don't need Microsoft Office anymore because the open source tools are good enough" because I was afraid of Bill Gates. Or, to be more specific, I was afraid of his hired goons; I would guess that he has the best goons money can buy, and lots of 'em! Anyway, if you haven't been following the progress of OpenOffice, you should catch up on things; the product has come a long way since the ... cough, ahem ... "rough" builds of the early days. It runs really smoothly now, and being open source it's designed to allow other programs to leverage the great work that the OpenOffice (O-O) developers have done. (I am sure there is a climate-controlled cave somewhere for the poor slaves that toil away in anonymity on MS Office. I'm sorry, to those folks, for their working conditions, but the good news is that their company is becoming more like other companies now.)

The "leveraging the work of others" part is where this blog post connects with Nuxeo EP 5. OpenOffice (starting around version 2) exports a service that programs can connect to in order to use the imaging capabilities of O-O. The Nuxeo engineering folks did a number of nice things that use this capability--and we could only do this legally because we are open source too! Woo-hoo!

Seeing It In Action

I have mentioned the "Preview" tab that's now available in Nuxeo 5.2 milestone 4 in a previous blog post. One of the cool things that happens when you have O-O running on your system is that the MS Office formats "just work" in the Preview tab. O-O has to be running for this to work, so you should start it yourself if you want to play along with this post at home. So, to demonstrate, I've created a workspace with a few documents in it about cloud computing:

[Screenshot: the workspace listing three documents about cloud computing]

These are three files I found with Google as demo material... I won't vouch for the quality of their information! These are, from top to bottom, a PDF file, a PowerPoint presentation, and an MS Word file. Let's see what happens if we click on the Word file and then switch to the Preview tab:

[Screenshot: the MS Word document shown in the Preview tab]

So, what has happened here is that the MS Word document has been sent to O-O for conversion to HTML, then Nuxeo has rendered that converted version into a Preview pane. If you are a regular user of O-O, you are probably saying "Ho, hum, I've been doing that for years." Well, maybe you should revisit my previous post on annotating documents, huh? That little eye and the annotation service work for MS Word documents just as they do for images:

[Screenshot: an annotation on the previewed MS Word document]

As you would expect, the same holds true for the PowerPoint presentation--you can see it in the preview panel. Somewhat cooler is that you get some sensible controls to actually read the presentation as well. Here is a snapshot from the presentation:

[Screenshot: a slide from the PowerPoint presentation in the preview panel]

Note the extra controls at the top like "Continue" (perhaps should be "Next Slide") and "Last Page." This is a pretty nice preview for not only not running MS Office, but not paying for it at all!

PDF Magic

So, of course, I'm going to show you a screen snap of a preview of a PDF document. Before I do that, I should mention that the PDF imaging is actually not being provided by the O-O system that I have been raving about but by another Linux tool called pdftohtml that is part of the (very impressive) Poppler PDF imaging project. Ok, here it is, sans any gratuitous annotations:

[Screenshot: the PDF document shown in the Preview tab]

To return to my previous ranting about how cool O-O is, O-O does have a story to tell about PDF, but in the other direction. To generate a preview of PDF, as was done above, you need to render PDF and then figure out how to best display that in HTML. O-O is good at going to PDF from other formats, like those associated with MS Office. Thus, when you have OpenOffice available, you get a slightly different Summary tab as I am showing here:

[Screenshot: the Summary tab with the "Generate PDF" link]

The "Generate PDF" link will give you a PDF version of the document by rendering it with O-O and then sending it to your browser, without even needing a copy of MS Office! Sweet!

Internals Note

If you thought it was a little weird that you had to start up OpenOffice yourself to get these features, well, you are right. That was just to make it a bit easier to explain. In fact [extra coolness points here] there is a new part of Nuxeo 5.2 milestone 4 that manages a copy of OpenOffice for you, behind the scenes, if you configure your server to turn this on. Since OpenOffice can be driven from Java and Nuxeo is based on Java--and all three are open source--we actually use the OpenOffice code directly (rather than through a Unix pipe or something), which should give us much better reliability as we move toward the 5.2 GA release. There are a number of options about how you would like this to work, such as how many resources you are willing to dedicate to this slave version of O-O; they live in the file ooo-config.xml in the configuration directory.

I hope this gives you some hope that you don't have to keep paying the tax to run Office applications in the context of your content management system! If you have questions or comments about this article or anything else related to ECM or Nuxeo, drop them to me at ismith [at] nuxeo [point] com.

Post Scriptum

Secretly, over a period of about a year, I worked on a daily basis with folks at a Fortune 500 company that was MS Office-only. They never knew I was not running Office...it's not a drop-in replacement or 100% compatible, but it is good enough, now.

Posted by Ian Smith @ 02/06/2009 03:34 AM. - Categories: nuxeo5, openoffice -  0 comments
Learning Nuxeo (Book Draft): New Chapter: Running Nuxeo

Another day, another chapter

Today, I've updated the book site with chapter 3 (really the second meat-n-potatoes chapter). This chapter explains a bit about how to run the Nuxeo server and how the Nuxeo server is related to JBoss, and proposes some exercises for you to do to explore how to use Nuxeo. These exercises are probably "old hat" to folks with a lot of ECM experience, but if you are new to the genre you are likely to find them quite enlightening about how an organization can use the basic Nuxeo (and ECM) tech.

Great Feedback

After the website debuted yesterday, the chairman of Nuxeo, Stefane Fermigier, suggested that I check out this new tool called IntenseDebate. I have to say it was a good call! IntenseDebate is a tool for allowing you to get feedback about content on your website or blog. It's a bit better integrated if you put it on a blog, but some hackery allowed me to get it into our tech-writing loop, so we can have pages that take your feedback, let you vote on other people's comments, etc. Woot!

I hope you enjoy this installment and we'll be back tomorrow. You can, if you prefer, send me feedback the 20th century way by sending email to me: ismith [at] nuxeo [point] com.

Posted by Ian Smith @ 02/06/2009 12:29 AM. - Categories: ecm, nuxeo5 -  0 comments
02/05/2009
Learning Nuxeo (Book Draft)

Learning Nuxeo

I've been asked by some of the good folks here at Nuxeo to coordinate work on a new book about programming Nuxeo EP 5. Unlike the Nuxeo book, which is intended to be a reference to all the details of the platform, this book is intended to be more of a gentle introduction. This book will help readers gradually master the concepts in Nuxeo 5 (we hope the book will be ready by the time of Nuxeo EP 5.2 GA) and be able to develop software for the platform.

The first few chapters are gelling, and we've been looking for feedback. So, we've decided to gradually open up the book for comments from the Nuxeo community, or, I suppose, from any other community that can provide useful help!

My plan is to reveal a new chapter each workday or so, on the book draft's website, and perhaps try to highlight some of the good feedback I have gotten from the community about previous chapters.

Feedback Of The Day

In response to seeing an early draft of the book a couple of people asked "who is the book for and what's it about?" So, I added the prefatory material to try to give folks more of a "roadmap" to the book to help people see where it is going.

Please send your comments to ismith [at] nuxeo [point] com and I promise to try to incorporate them.

Posted by Ian Smith @ 02/05/2009 04:11 AM. - Categories: ecm, nuxeo5 -  0 comments
02/04/2009
New Feature of Nuxeo 5.2M4 - Annotations

Annotations may, in fact, rock!

If you haven't played with the 5.2 milestone 4 release yet, you should! There are some amazingly cool new things in there; I'm going to be trying to get through some of them this week--despite all my packing for my upcoming move!

If you aren't aware, there has been a work in progress with the W3C for some time to create a standard called Annotea. The idea of this standard is to have inter-operable annotations on web pages (really URLs). So, being an open-source and generally standards-following sort of company, we used that standard to build this new feature into Nuxeo 5.2. That said, we had to extend it in some pretty cool ways to make it work well within Nuxeo.

Previews

This should be the subject of a separate blog entry, but I'll put a bit here since you access annotations through the preview system. When you look at a document through the web UI now, you'll see a "preview" tab. This tab, based on how you configure the system, generates some HTML based on the content of the document--but without downloading it. What HTML it generates is configurable based on the document's mime type or on what "type" of document it is inside Nuxeo. Don't worry, it's all totally down with the document model so it can understand documents that have multiple underlying files, documents that supply their own previews inside the document itself, etc.

Anyway, let's try it out:

[Screenshot: the image of the oath of office shown in the preview tab]

In the image above, you see the first attempt by the new president of the United States to take the oath of office (more on that in a second!) on January 20th. You can see the first lady, the first kids, the president-elect Barack Obama, and the chief justice of the United States Supreme Court.

(You should also note that you can view the image and all its associated image metadata--like the data embedded in jpegs generated by digital cameras--through some of the other tabs above.)

You should notice two small icons left of the image. The first is the small "eye" in the upper left and the second is the small triangle just to the left of the bottom of the image. The little triangle is an "open and close" gizmo for closing or opening up the annotation content.

I'm not too good with the names of the first kids, so I'm going to put an annotation on the image to see if somebody can help me. I do this by selecting a region of the image with my mouse and I get this:

[Screenshot: a selected region of the image with the annotation comment box]

In the comment box I type my question and then click "Submit." The result is this:

[Screenshot: the image with the submitted annotation]

The "eye" control in the upper left turns off the highlighting of the regions in the image if you don't want to see the annotated regions.

So, I had some fun at the chief justice's expense. I added another annotation, just like before but this time to the face area of justice Roberts. Once I submitted this annotation, I then rolled my mouse over this "hot region" of the image and got this:

[Screenshot: the annotation shown when the mouse rolls over the annotated region]

Playing With Text

Up to this point, I've been talking about images, so we were really in an area not specified by the Annotea spec. Much better covered, at least by the Annotea folks, is text. Let's try doing something similar, this time with the text of the oath of office, which I've uploaded into Nuxeo. I uploaded flat text and got this in my preview tab:

[Screenshot: the text of the oath of office in the preview tab]

You may think you know where this is going... ok, you probably do. It works in a similar way to the image one:

[Screenshot: an annotation on the text of the oath of office]

By the way: if your instance of Nuxeo has a copy of OpenOffice installed and in use for format conversion, you can annotate a ton of different document formats...

Relations and Audit

So, you think you're a hot-shot because you guessed how it would work with text, huh?

But did you guess that it actually works, under the covers, using the Nuxeo relations system? It uses the mechanism of linking documents to link the annotations to the source document. If you want to download the file, you can get a "wad" that includes annotations and other associations.

Anybody out there a big fan of the Adobe PDF annotation tool (70 USD?) and want to comment on how it compares? Does Adobe have an annotation server? They have some text that references their "linkbase" but I couldn't find anything of substance...

I also have it on good authority (i.e. I talked to the developers!) that all the usual Nuxeo auditing capabilities will be available when this annotation system is shipped in the first Release Candidate.

Is this cool or what?
Posted by Ian Smith @ 02/04/2009 03:05 AM. - Categories: nuxeo5 -  0 comments
02/03/2009
A New Image: A Bit Closer To Turnkey

Improved Image For Amazon EC2 Users

This instance is built from the Nuxeo EP 5.2.0.m4 distribution. Great work to all the folks who contributed to it!

I have built a new version, for those that want to have something closer to a "turnkey" image that they can run on Amazon EC2. If you missed my previous blog post, then you may want to go back to it and make sure you have the proper background and understanding of Amazon EC2.

The new AMI is:

ami-12997e7b

The major improvements with this AMI over the previous one (besides running 5.2.0.m4):

  • The JBoss server has been locked down somewhat so it's closer to something you can use without too much installation hassle. The most dangerous security problems have been dealt with. See /usr/local/nuxeo-ep-5.2.0.m4/banned for the removed parts of JBoss and some more info. It's probably not production ready, but the worst offenders have been removed.
  • The Tomcat acceleration (libtcnative1.so) has been installed so that serving static files is basically the same speed as a native Apache.
  • Data in the repository is stored in Postgres.
  • There is proper configuration and a script to help you boot your instance so that you use the "big" disk that is mounted on /mnt on an Amazon instance.
  • The server is configured to run on port 80 and binds to all interfaces by default.

This is not to say it's totally ready. There is still some work to do before it's truly "turnkey":

  • Jena is not putting its data into Postgres. I tried a great many different things, but could not convince it to use Postgres, so I fell back to using Derby for annotations and relations.
  • There are no automatic backups to S3 to make sure the data is safe (but it seemed silly to bother until the problem with Jena + Postgres gets fixed).

What to do to use it

The image is 64 bit and I recommend running it on an m1.large instance.

Once your image is booted, log into it as root and in your home directory use this command:

./instance-start.sh

After that, it'll take a bit to start up and you can see your instance by hitting your server with a web browser on /nuxeo and have fun!

Be sure to change your Administrator password!

Future restarts

The instance-start script does some things besides just start the server. It initializes the instance filesystem and sets up some database configuration that is needed to run Nuxeo. After the first time, you should use the normal run and shutdown scripts in /usr/local/nuxeo-ep-5.2.0.m4/bin.

The script above tries to keep you from running it multiple times, because a second run will blow up: the databases and other such state already exist. If something goes wrong on the first run, it is probably easier to just destroy the instance and try again.
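The post doesn't show how instance-start.sh protects itself, so the following is only a minimal sketch of one common way to get "run once" behavior: a marker file written after initialization succeeds. The function name run_once and the marker name .instance-initialized are made up for illustration and are not taken from the AMI.

```python
import tempfile
from pathlib import Path


def run_once(marker: Path, initialize) -> bool:
    """Run `initialize` only if `marker` does not exist yet.

    Returns True when initialization ran, False when it was skipped.
    """
    if marker.exists():
        # Already initialized: leave existing databases and state alone.
        return False
    initialize()
    # Write the marker only after the work succeeded, so a failed
    # first run can be retried.
    marker.write_text("initialized\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / ".instance-initialized"  # hypothetical name
        for attempt in (1, 2):
            ran = run_once(marker, lambda: None)
            print(f"attempt {attempt}: initialization ran = {ran}")
```

A guard like this makes the second invocation a harmless no-op instead of a crash, which is the behavior you want from a first-boot script.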

Posted by Ian Smith @ 02/03/2009 04:39 AM. - Categories: linux -  0 comments
CMIS meeting notes
The first face-to-face meeting of the OASIS CMIS Technical Committee (TC) took place last week.

This first meeting was very productive and allowed very constructive discussions. Below I'll try to retrace the gist of the conversations around the topics I found most interesting; some of these were conversations I had with just one or two people, and some touch on related topics.

The outlook from these discussions, and from the scope of the spec itself, is very positive. I believe that within a year CMIS will start to actively redefine the world of content management systems, which will be an opportunity both for big vendors, who will see easier adoption of their solutions by customers concerned about lock-in or interoperability, and for smaller vendors, whose products will be able to take advantage of a much broader spectrum of connectors to third-party systems.

Schedule

The most important news at the end of these three days is that there is enormous support in the TC for CMIS 1.0 to be released as soon as reasonably possible, as it is felt by all that a simple and solid spec that can be implemented and used ASAP by everyone is paramount. Due to time constraints inherent in the OASIS standardization process, this is likely to be in late 2009 or early 2010 -- and that's if we can polish the current draft and fix the problems within something like two months!

Existing capabilities

During this meeting it was stressed many times that the goal of CMIS is not to define the semantics for new features that a repository could implement, but to provide access to existing features of existing repositories, so that they can interoperate.

This implies that complex features, non-standard features, or features that are common but implemented with a wide variety of semantics, have to be out of scope for CMIS 1.0. When features are exposed through CMIS, there is a duty to make sure that this can be done by almost everyone without rethinking the repository's architecture.

Retention & Hold

For those not familiar with the terms, a retention policy describes the rules according to which documents will be kept for a certain time and then archived or destroyed, and holds are typically put on documents for legal purposes, to prevent their destruction when companies are being sued or subpoenaed. A way to specify and discover documents that can have various holds put on them, or various retention policies, is critical to all the Record Management folks. It's not clear what can be standardized though, as there is a huge range of possible semantics for such policies. Note also that record management features are explicitly out of scope for CMIS 1.0 (for the very reason that variations between repositories are enormous).
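To make the two terms concrete, here is a deliberately tiny sketch of how they interact: a record may be destroyed only once its retention period has elapsed and no hold is in place. This is one possible semantics out of the many mentioned above, not anything defined by CMIS; the Record class and its fields are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Record:
    created: date
    retention_days: int                       # how long the record must be kept
    holds: set = field(default_factory=set)   # names of active legal holds


def may_destroy(record: Record, today: date) -> bool:
    """True only if retention has expired and no hold blocks destruction."""
    if record.holds:
        return False  # any hold wins over an expired retention period
    return today >= record.created + timedelta(days=record.retention_days)


if __name__ == "__main__":
    contract = Record(created=date(2009, 1, 1), retention_days=30)
    print(may_destroy(contract, date(2009, 1, 15)))  # still under retention
    contract.holds.add("pending lawsuit")
    print(may_destroy(contract, date(2009, 6, 1)))   # expired, but on hold
```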

Tagging

While initially it can be seen as a very simple concept, tagging is more than just the setting of a multi-valued property (MVP) on a document (à la Dublin Core "subjects"). A complete tagging solution can involve the following features:

  • Adding metadata to a tag: tag date and author are typical of the use case here. The tags are then seen as either a "rich" property, or as a relationship (carrying metadata) to a concept in a taxonomy.
  • Tagging an object on which one doesn't have write access: this is quite common and typical of the bookmarking community, think "del.icio.us". Of course to allow this you can't use a basic MVP as you don't have write access to the document being tagged.
  • Does tagging an object change its last-modification date? This may be constrained by the implementation of the tags.
  • Changing many tags at the same time. This is typical of the "tag normalization" use case, where a TagMaster has determined that two tags have to be merged. Some batch modification features may be useful in this case.
  • Querying tags for things like: most common tags, least used tags, tags most used together. Also querying for documents in relation to their tags, for instance documents sorted by total tag weight, documents with the most tags, documents most recently tagged. Or even the people who have added the most tags, etc.
  • Maintaining the tag cloud, or taxonomy: in a system with many tags, maintaining the tag cloud becomes a full-time job. Having relationships between similar tags, merging them, weighting them, is important.

For CMIS 1.0 this will be hard to standardize, but there's still some time left for something simple to be proposed by interested parties.
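As an illustration of why tags are more than a multi-valued property, the sketch below stores each tagging as its own small record (document, tag, author, date) kept apart from the document, which is what makes tagging without write access, tag queries and batch merging possible. None of this comes from the CMIS draft; the Tagging class and the helper functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Tagging:
    """One tag applied to one document, stored outside the document."""
    doc_id: str
    tag: str
    author: str
    when: datetime


def most_common_tags(taggings, n=3):
    """Return the n most used tags as (tag, count) pairs."""
    return Counter(t.tag for t in taggings).most_common(n)


def tags_used_together(taggings):
    """Count how often each pair of tags appears on the same document."""
    by_doc = {}
    for t in taggings:
        by_doc.setdefault(t.doc_id, set()).add(t.tag)
    pairs = Counter()
    for tags in by_doc.values():
        pairs.update(combinations(sorted(tags), 2))
    return pairs


def merge_tags(taggings, old, new):
    """Tag normalization: rewrite every use of `old` as `new`.

    Exact duplicates produced by the rewrite are dropped.
    """
    merged = {
        Tagging(t.doc_id, new if t.tag == old else t.tag, t.author, t.when)
        for t in taggings
    }
    return sorted(merged, key=lambda t: (t.doc_id, t.tag, t.author, t.when))


if __name__ == "__main__":
    now = datetime(2009, 2, 3)
    taggings = [
        Tagging("doc1", "cmis", "bob", now),
        Tagging("doc1", "ecm", "bob", now),
        Tagging("doc2", "cmis", "alice", now),
        Tagging("doc2", "ECM", "alice", now),
    ]
    print(most_common_tags(merge_tags(taggings, "ECM", "ecm")))
```

Because the taggings live outside the documents, adding one does not have to touch the document's last-modification date, which is one way to answer the question raised in the list above.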

Transactions

The fact that transaction capabilities are not mentioned in the spec was surprising to some. This is due to the fact that too many vendors don't support them. In addition, WS-Transaction can be used to get transactions spanning several requests when using the SOAP bindings, so repositories having them can still expose them in this manner.

Events and notifications

Having a CMIS repository notify the outside world would be very powerful, and has been mentioned as quite useful in the context of user email notification as well as unified search. However CMIS is a protocol-based spec, where a client sends commands to a server and receives answers, so there is no simple way in CMIS 1.0 to expose direct notification capabilities. Registering code that can be executed by the repository on certain events would be useful as well, but again CMIS is a language-neutral spec and cannot standardize this.

REST

While what we have today with the AtomPub bindings may not be by-the-book REST, we need a simple protocol that can be used by simple tools (and many scripting languages) to do simple access to the repository in a few lines, or even that can be used directly by JavaScript in a browser. The AtomPub bindings are here for this, and many clients can take advantage of them today, although the way some things are exposed may not be perfect (there was consensus on making sure that the bindings are as close as possible to the best practices of AtomPub). It was also noted that, for what it's worth, as a marketing term "REST" now carries a lot of weight and its presence in the spec has already been a significant factor in the adoption or interest (internal or not) in CMIS by various vendors.

There was discussion of using WebDAV, which would fit very well with the concept of navigating folders and finding documents, and already has many clients. The reason why this is not in the spec today instead of AtomPub is basically historical: there was no one in the group to push for WebDAV when the spec was initially created, and it seems that IBM is very pro-AtomPub :) As we all want a CMIS 1.0 soon, AtomPub won't be replaced, but there may be side work going on so that post-1.0 we can find ways for different repositories to expose their CMIS features through WebDAV in a compatible manner.

RI/TCK

It was not felt that a Reference Implementation (RI) would bring much, as it is expected that many vendors will have implementations of CMIS very soon, including several open source ones. In any case, it's not the job of an OASIS TC to write software. Regarding a Technology Compatibility Kit (TCK), most people agree that it would be nice to have something, either in an abstract format or as executable test cases. Here I feel that the ball is in the court of open source vendors; we can easily get together and pool our unit testing resources to turn them into a nice TCK. It won't be a deliverable of the OASIS CMIS TC though, and won't be normative, although conceivably the TC can formally approve a given version.

ACLs

Of all the points that really merit further work before a 1.0 version can be considered, ACLs ranked highest -- practically everyone agrees that ACLs should be in the spec in some form. However, ACLs are also one of the features that vary most between repositories, so common ground will be hard to find. Nevertheless, ACLs are crucial to some use cases. Unified search was the most frequently mentioned, but many people also have the simple use case of being able to inform a repository that a given document is now readable by Bob.

A simple way of being able to express positive ACLs (but not blocking), and to give a hint that a given ACL is inherited or not (whatever the meaning of "inherited"), would be a good step toward interoperability. If ACLs find their way into the spec, it is likely that the separate notion of a Policy will disappear.
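As a rough sketch of what such a minimal ACL model could look like (this is an illustration, not the TC's proposal): entries only ever grant a permission, there is no way to express a deny, and the inherited flag is purely informational. The names Ace, is_allowed and grant are invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ace:
    """A positive access control entry: `principal` has `permission`."""
    principal: str
    permission: str
    inherited: bool = False  # a hint only; it does not change evaluation


def is_allowed(acl, principals, permission):
    """True if any entry grants `permission` to one of `principals`.

    With positive-only ACLs there are no blocking entries to consider,
    so evaluation is a simple "any match" test.
    """
    return any(
        ace.permission == permission and ace.principal in principals
        for ace in acl
    )


def grant(acl, principal, permission):
    """Return a new ACL in which `principal` has `permission`."""
    if is_allowed(acl, {principal}, permission):
        return acl  # nothing to add
    return acl + [Ace(principal, permission)]


if __name__ == "__main__":
    acl = [Ace("members", "read", inherited=True)]
    acl = grant(acl, "bob", "read")  # "this document is now readable by Bob"
    print(is_allowed(acl, {"bob"}, "read"))
```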

Search

The use cases of Federated Search (an engine that, when queried, delegates the search to many repositories and then aggregates the results) and Unified Search (an engine that somehow crawls many repositories to build a database of what's in them, and can then be directly queried) have been discussed a lot, especially unified search as it impacts a number of other features.
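The difference between the two approaches can be sketched with a toy model. The Repository class below is a stand-in for any content repository and its search and crawl methods are hypothetical, not CMIS calls: federated search asks every repository at query time, while unified search answers from an index built during an earlier crawl.

```python
class Repository:
    """Toy repository mapping document ids to their text."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = dict(docs)

    def search(self, term):
        """Return (repository, doc_id) pairs whose text contains the word."""
        return [
            (self.name, doc_id)
            for doc_id, text in sorted(self.docs.items())
            if term.lower() in text.lower().split()
        ]

    def crawl(self):
        """Return every (doc_id, text) pair, as a crawler would see them."""
        return sorted(self.docs.items())


def federated_search(repos, term):
    """Delegate the query to each repository and aggregate the results."""
    hits = []
    for repo in repos:
        hits.extend(repo.search(term))
    return hits


class UnifiedIndex:
    """Index built by crawling; queries never touch the repositories."""

    def __init__(self):
        self.index = {}  # word -> set of (repository, doc_id)

    def crawl(self, repos):
        for repo in repos:
            for doc_id, text in repo.crawl():
                for word in text.lower().split():
                    self.index.setdefault(word, set()).add((repo.name, doc_id))

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


if __name__ == "__main__":
    a = Repository("a", {"1": "cloud computing basics"})
    b = Repository("b", {"9": "Cloud storage"})
    index = UnifiedIndex()
    index.crawl([a, b])
    print(federated_search([a, b], "cloud"))
    print(index.search("cloud"))
```

The trade-off is visible even at this scale: the federated query is always fresh but costs one call per repository, while the unified index answers locally but knows nothing about documents added since its last crawl.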

One feature needed is something allowing the discovery of permissions, to be able to serve search results without having to check with the repository for each document if access can be granted; this will presumably involve some kind of ACLs. Even if such permission discovery does not reflect the full security policy applicable to a document, it can still be useful to weed out some of the documents and improve the efficiency of the search. Another feature needed is something allowing the discovery of what has changed in the repository since a previous crawl; this can be done either through push/events (but as mentioned above this would be out of scope for CMIS 1.0), or through pull/polling/querying to retrieve some kind of journal of the last changes, including deleted documents; this feature is sometimes called an Event Journal or a Transaction Log, and the problem is to make it available efficiently outside the repository for the benefit of search engines.
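The pull variant of change discovery can be sketched as follows. It assumes a journal whose entries carry an increasing sequence number, so that a crawler only has to remember the last number it processed; the Change and Journal classes and the sync function are illustrative and not part of any CMIS draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int      # increasing position in the journal
    doc_id: str
    kind: str     # "created", "updated" or "deleted"


class Journal:
    """Append-only log of changes kept by the repository."""

    def __init__(self):
        self._entries = []

    def record(self, doc_id, kind):
        self._entries.append(Change(len(self._entries) + 1, doc_id, kind))

    def changes_since(self, seq):
        """Return all changes recorded after position `seq`."""
        return [c for c in self._entries if c.seq > seq]


def sync(index, journal, last_seq, fetch):
    """Apply the changes made since `last_seq` to the search index.

    `fetch(doc_id)` returns the current content of a document.
    Returns the new position to remember for the next poll.
    """
    for change in journal.changes_since(last_seq):
        if change.kind == "deleted":
            index.pop(change.doc_id, None)  # deletions must be visible too
        else:
            index[change.doc_id] = fetch(change.doc_id)
        last_seq = change.seq
    return last_seq


if __name__ == "__main__":
    store = {"doc1": "first version"}
    journal = Journal()
    journal.record("doc1", "created")
    index = {}
    position = sync(index, journal, 0, store.get)
    print(position, index)
```

Recording deletions in the journal matters: a crawler that only re-queries existing documents would never learn that an indexed document has disappeared.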

Next steps

For the TC the coming weeks will be busy, but we hope that very soon a new draft closer to 1.0 will be available to try to resolve some of the issues listed above (and a few others I skipped over). Expect news very soon! And, of course, any feedback to the TC will be reviewed carefully; please submit yours (the cmis-comment mailing-list is listed at the bottom of the CMIS committee page).
Posted by Florent Guillaume @ 02/03/2009 12:20 AM. - Categories: ecm, nuxeo5 -  0 comments
Last modified: 03/29/2007 08:08 PM

All content is copyrighted by their author.
CPSSkins is Copyright © 2003-2006 by Jean-Marc Orliaguet. | CPS is Copyright © 2002-2006 by Nuxeo SAS.