10/18/2006
What's the point of JCR?

Nuxeo is switching its ECM to Java, and we're using JCR for our document storage. JCR (Java Content Repository, standardized by JSR-170 and the upcoming JSR-283) is a young specification with a promising future. But what's its point, you may ask, when all existing content management systems already store content very well without it? Its goal is interoperability between vendors: people who write applications that need to store content get a single, unified API for doing so. All major content repository vendors are active in the JSR-283 expert group, and all are working on JCR bindings for their various proprietary repositories.

Of course a standardized and wildly successful way of manipulating content already existed before JCR: SQL. But SQL and JCR have a different focus:

  • SQL is a language; it manipulates rows and is geared toward generic relation manipulation,
  • JCR is a Java API; it manipulates nodes and is geared toward hierarchical manipulation (parent-children).

SQL and JCR have quite different underlying data models:

  • SQL's model is that of tables with fixed schemas, and relations between tables,
  • JCR's model is that of a tree of nodes with flexible schemas and with parent-children relations as the main focus — although other types of relations exist.
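To make the contrast with fixed-schema tables more concrete, here is a minimal sketch of the node model in plain Java. It is a toy in-memory structure, not the javax.jcr API: the names `TreeNode`, `addNode`, `resolve` and `TreeNodeDemo` are assumptions made up for the illustration. It only shows the two ideas above, namely that each node carries its own set of properties and that nodes are reached through parent-children paths.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model of a node tree. All names here are hypothetical;
// this is NOT the javax.jcr API, only an illustration of its data model.
class TreeNode {
    final String name;
    final TreeNode parent;
    // Children keyed by name, kept in insertion order.
    final Map<String, TreeNode> children = new LinkedHashMap<>();
    // Flexible schema: each node carries its own set of properties.
    final Map<String, Object> properties = new LinkedHashMap<>();

    TreeNode(String name, TreeNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Create a child under this node and return it.
    TreeNode addNode(String childName) {
        TreeNode child = new TreeNode(childName, this);
        children.put(childName, child);
        return child;
    }

    // Absolute path, built by walking up the parent chain.
    String path() {
        if (parent == null) {
            return "/";
        }
        String prefix = parent.path();
        return prefix.endsWith("/") ? prefix + name : prefix + "/" + name;
    }

    // Resolve a relative path such as "docs/report"; null if nothing is there.
    TreeNode resolve(String relativePath) {
        TreeNode current = this;
        for (String segment : relativePath.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}

class TreeNodeDemo {
    // Two sibling documents with unrelated property sets.
    static TreeNode buildSample() {
        TreeNode root = new TreeNode("", null);
        TreeNode docs = root.addNode("docs");
        TreeNode report = docs.addNode("report");
        report.properties.put("title", "Quarterly report");
        report.properties.put("pages", 12);
        TreeNode photo = docs.addNode("photo");
        photo.properties.put("width", 800);
        return root;
    }

    public static void main(String[] args) {
        TreeNode root = buildSample();
        TreeNode report = root.resolve("docs/report");
        System.out.println(report.path() + " " + report.properties);
    }
}
```

In a relational model, the report and the photo would each need a table (or a set of nullable columns) declared up front; here they simply sit next to each other in the same folder with different properties.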

JCR also offers higher level features than SQL, notably workspace and version management.

Many kinds of applications focus on arranging documents in folder hierarchies and on allowing a wide variety of structures for these documents. In this case, a storage model based on JCR is much better suited than one based on SQL.

It should be noted that many things in the computing world already are based on the notion of folder hierarchies storing arbitrary documents:

  • filesystems are a tree of (unstructured) documents,
  • revision control systems are based on filesystem concepts but add a lot of structure on top of them,
  • WebDAV (along with DAV and DELTAV) is a protocol that addresses documents using a path, and where documents have a flexible set of properties,
  • most proprietary content management systems are based on (or offer the notion of) organizing the content in a hierarchy,
  • Nuxeo's CPS is itself based on this classic model.

This is why JCR has emerged as a common and useful storage API for all these use cases.

For an ECM framework like Nuxeo 5, JCR interoperability can be seen from two different directions:

  • JCR provides an API that we can use to store our content, to make us vendor-independent and flexible regarding storage,
  • JCR provides an API through which we can expose our content, which makes our platform usable by any external system that understands it — we're ourselves a vendor providing JCR bindings.

For its initial release, Nuxeo 5 focuses on using JCR as its main storage implementation, and uses Jackrabbit to store most of its content. In the future, we'll also provide JCR bindings so that the high-level content we provide can be accessed directly from external applications using JCR too.

Posted by Florent Guillaume @ 10/18/2006 05:40 PM. - Categories: cps, ecm, java, nuxeo5 -  0 comments
Nuxeo 5: Unifying the content APIs

In a content management system, the actual data that the system or the users manipulate comes from many kinds of sources. Content can come from a JCR repository, or from a relational database, or from an LDAP directory, or from a semantic storage engine like Jena, or from any other kind of open or proprietary storage engine.

But fundamentally all these kinds of content, which I'll call "records", aren't very different:

  • a record can be created, viewed, modified, deleted,
  • a record can often be copied or moved,
  • a record obeys a schema that can be known to the system, which means that its individual fields are strictly typed,
  • when being viewed or modified, a record has a user interface that is based on forms, labels, widgets, depending on the schema,
  • records can be searched and a result set returned,
  • records can be listed in a compact form (search results, folder contents, user dashboard, workflow workitems, RDB table listing, user information browsing, etc.),
  • records have an identity (like a unique id) or a location (like a path), sometimes both.

One of the strengths of CPS is that it uses a common abstraction for many of these concepts, embodied in the CPSSchemas component. In Nuxeo 5 we want to go further and provide even more integration for all of them. The base components for these abstractions are NXCore and NXTypeManager.
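As an illustration of what a schema-driven "record" could look like, here is a minimal Java sketch. It is not the NXCore or NXTypeManager API: the class names `RecordSchema`, `ContentRecord` and `ContentRecordDemo`, and their methods, are assumptions invented for this example. The sketch only shows two of the properties listed above: a record has an identity, and its fields are typed by a schema the system knows about.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names throughout; this is not the NXCore or NXTypeManager API.
final class RecordSchema {
    final String name;
    // Field name -> Java type that values of this field must have.
    private final Map<String, Class<?>> fieldTypes = new LinkedHashMap<>();

    RecordSchema(String name) {
        this.name = name;
    }

    // Declare a typed field; returns this so declarations can be chained.
    RecordSchema field(String fieldName, Class<?> type) {
        fieldTypes.put(fieldName, type);
        return this;
    }

    Class<?> typeOf(String fieldName) {
        return fieldTypes.get(fieldName);
    }
}

final class ContentRecord {
    final String id;           // identity: a unique id or a path
    final RecordSchema schema;
    private final Map<String, Object> values = new LinkedHashMap<>();

    ContentRecord(String id, RecordSchema schema) {
        this.id = id;
        this.schema = schema;
    }

    // Reject fields the schema does not declare and values of the wrong type.
    void set(String fieldName, Object value) {
        Class<?> type = schema.typeOf(fieldName);
        if (type == null) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        if (value != null && !type.isInstance(value)) {
            throw new IllegalArgumentException(
                fieldName + " expects " + type.getSimpleName());
        }
        values.put(fieldName, value);
    }

    Object get(String fieldName) {
        return values.get(fieldName);
    }
}

class ContentRecordDemo {
    public static void main(String[] args) {
        RecordSchema user = new RecordSchema("user")
            .field("email", String.class);
        RecordSchema document = new RecordSchema("document")
            .field("title", String.class)
            .field("pages", Integer.class);

        // The same calls are used whatever the record's origin would be.
        ContentRecord entry = new ContentRecord("jdoe", user);
        entry.set("email", "jdoe@example.com");
        ContentRecord report = new ContentRecord("/docs/report", document);
        report.set("title", "Quarterly report");
        report.set("pages", 12);

        System.out.println(entry.get("email") + ", " + report.get("title"));
    }
}
```

Because the schema is data rather than code, a form generator or a listing view could in principle walk the declared fields without knowing whether the record came from a directory, a table or a repository.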

The reasons to strive for convergence are numerous:

  1. this merges into single concepts things that were previously kept separate only because of different implementation choices. For instance, an LDAP schema is not fundamentally different from an SQL schema (or from an XML Schema if one is interested in the relevant subset).
  2. this gives the programmers a common API for all data-related operations, which means more reusability. For instance changing an attribute in an LDAP entry doesn't have to be different from changing the title of a document or changing a value in an RDB row; processing a list of search results to display them in a table doesn't have to be different from processing the children of a folder to display the folder's contents.
  3. this gives the framework developers a way to optimize some operations because of commonalities in the underlying implementations. For instance you don't need three kinds of events dispatching for "LDAP entry modified", "RDB row modified" or "document modified".
  4. this gives the users a unified way of manipulating different kinds of data when there's really no need to have a different UI for them. When a user fills in a form, it's really the same process whether they're modifying their personal preferences, adding a keyword to a document, or changing a quantity in an RDB row.
  5. this allows very simple migrations between storage technologies when these are felt necessary. A customer could start with an LDAP database for its user base and later need to move it to an RDB table. User entries in an RDB table may need to be versioned and moved to a JCR storage. An application should survive all of these with only configuration changes and no code to rewrite.
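Points 2, 3 and 5 can be sketched together in a few lines of Java. Everything below is hypothetical: `RecordStore`, `InMemoryStore` and `StoreMigration` are names invented for this illustration, and both "backends" are the same in-memory stand-in rather than real LDAP or RDB connectors. The sketch assumes that every backend can be hidden behind one small interface, which is the premise of the list above rather than something demonstrated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical storage-neutral interface; a real backend would wrap
// LDAP, an RDB or a JCR repository behind the same three methods.
interface RecordStore {
    void put(String id, Map<String, Object> fields);
    Map<String, Object> get(String id);
    List<String> ids();
}

// In-memory stand-in used here for both "backends".
class InMemoryStore implements RecordStore {
    private final String label;
    private final Map<String, Map<String, Object>> rows = new LinkedHashMap<>();
    // One listener type for every backend: a single "record modified" event.
    private final Consumer<String> onModified;

    InMemoryStore(String label, Consumer<String> onModified) {
        this.label = label;
        this.onModified = onModified;
    }

    @Override
    public void put(String id, Map<String, Object> fields) {
        rows.put(id, new LinkedHashMap<>(fields));
        onModified.accept(label + ":" + id);
    }

    @Override
    public Map<String, Object> get(String id) {
        Map<String, Object> row = rows.get(id);
        return row == null ? null : new LinkedHashMap<>(row);
    }

    @Override
    public List<String> ids() {
        return new ArrayList<>(rows.keySet());
    }
}

class StoreMigration {
    // Copy every record using only the shared interface, so the same
    // code works for any pair of backends. Returns the number copied.
    static int migrate(RecordStore from, RecordStore to) {
        int copied = 0;
        for (String id : from.ids()) {
            to.put(id, from.get(id));
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        RecordStore directory = new InMemoryStore("ldap", events::add);
        RecordStore table = new InMemoryStore("rdb", events::add);

        directory.put("jdoe", Map.of("email", "jdoe@example.com"));
        int copied = migrate(directory, table);

        System.out.println(copied + " record(s) copied, events: " + events);
    }
}
```

The migration code never mentions a concrete backend, which is what would let such a move be driven by configuration. A real migration would also have to deal with schema differences and backend-specific features such as versioning, which this sketch leaves out.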

Note that this means JCR is in no way the primary storage model for Nuxeo 5; it's only the first one to be implemented. In the future, it will be possible to store documents in LDAP or an RDB. Once a suitable storage model is devised and implemented, you'll be able to apply workflow or versioning to RDB-based documents, for instance.

This convergence is quite exciting to us, and our goal is to allow people to build complex applications with Nuxeo 5 in a more straightforward manner.

Posted by Florent Guillaume @ 10/18/2006 12:58 PM. - Categories: cps, ecm, java, nuxeo5 -  0 comments
Last modified: 01/25/2005 03:18 PM

All content is copyrighted by their author.
CPSSkins is Copyright © 2003-2006 by Jean-Marc Orliaguet. | CPS is Copyright © 2002-2006 by Nuxeo SAS.