Object-Relational mapping in Zope

I've been interested in object-relational mapping in Zope for a while. I see this as the keystone of a proper enterprise-grade application server, and I've been investigating existing tools and what can be done with them. Here's a quick rundown.

ZSQL methods

Using ZSQL methods (a standard feature of Zope), you can have your application talk to an SQL database, but you're not really dealing with object-relational mapping, there are no transparently persistent objects. So let's skip it.

SQLObject / sqlos

SQLObject is a pure python mapper, and it is used in Zope 3 through sqlos. They provides declarative mapping for your classes where you can write for instance:

class Person(SQLOS):
    implements(IPerson)
    _columns = [StringCol("username", length=20, notNull=True),
                StringCol("firstname", length=20, notNull=True),
                StringCol("lastname", length=20, notNull=True)]

Then any instance of a Person will actually be stored in SQL behind your back. A unique id is generated for each Person and also stored in SQL. There are various facilities to provide relations between tables, and map them to lists in the python objects.

Archetypes + SQLStorage

SQLStorage is an Archetypes storage that uses SQL as a backend. You can write an Archetypes schema such as:

schema = BaseSchema + Schema((
    StringField("username", required=1, primary=1, searchable=1,
                storage=PostgreSQLStorage(),
                 index="FieldIndex",
                ),
    StringField("firstname", required=1,
                storage=PostgreSQLStorage(),
                ),
    StringField("lastname", required=1,
                storage=PostgreSQLStorage(),
                ),
    )) + TemplateMixin.schema

class PersonSQL(TemplateMixin, BaseContent):
    archetype_name = "Person SQL"
    schema = schema

SQLStorage relies on the Archetypes UIDs to uniquely identify objects. A Relation field can be used to have relations to other objects. Note that Archetypes objects stored through SQLStorage still have a presence in the ZODB, which means it's not a solution if you totally want to get rid of Data.fs bloat.

As you can see in the way things are declared, both SQLObject and Archetypes require you to specify in your code that you will use SQL for some objects. Things are not "transparent" from the programmer's point of view.

Ape

Ape (Adaptable Persistence Engine) is a framework to do object-relational mapping at a lower level than the above two solutions, because it works at the ZODB Storage level. Ape is used as a ZODB Mount Point inside Zope. In fact, Ape is not tied to SQL but is designed to do configurable and flexible mapping to anything; storing to the filesystem is another option provided by default.

Ape relies on the ZODB framework, and especially its classes for persistence and transaction management, backed by the storage APIs themselves. This means that from the programmer's point of view, there's no difference when using Ape than when using a FileStorage or a ZEO ClientStorage. Everything is transparent.

This is a big advantage for the programmer, but the downside is that the structure of the tables in the SQL database is chosen by the framework. However Ape's default SQL mapper already tries hard to provide data mapping in a natural way; for instance all properties are made available in a natural manner, object titles or containment relationship are also naturally expressed.

Hornet

Hornet is an SQL bridge (alpha software for now) that also works at the ZODB layer. In contrast to Ape, it is much more geared toward existing datasets, or regular SQL access to tables. It requires you to define schemas in code too, but once this is done object access is totally transparent, as in Ape. You have to use code like:

PySchema(fieldsets=(
    FieldSet("employee_type", label="Employee Type", widgets=(
        StringWidget("code", label="Employee Code"),
        StringWidget("description", label="Description"),
        )),
    ))

This is then used by the mount point and the underlying Hornet framework for a given table to decide how to map table lines to objects. (The schema is also used to autogenerate forms.)

I want more

To me Hornet and Ape are promising because I believe they integrate at the right level. Ape is better because it doesn't require explicit schema declaration, and that's very important when you have flexible objects where the users add new fields on document instances (which happens all the time in CPS using FlexibleTypeInformation-based content objects).

I believe a large part of the success of the infamous Rails has been that it makes it trivial to interface to an existing SQL database, provided it's regularly named. And I want to bring Zope to that level, but that's just going to be a side effect of a number of other goals.

First, I want to get rid of the catalog and move it to its proper place, which is in a relational database. The ZCatalog is a hack (a successful one I must admit) that tries to bring into the ZODB world something that has been working much better for ages in the relational world. The ZCatalog indexes use a truly gigantic number of small persistent objects to work. Querying an index trashes the ZODB cache which could be used for better things (to cache the objects the user is really accessing). Also the ZCatalog fails to provide any relational features, like JOIN queries. The sorting is weak (you cannot sort on something that's in a full-text index without adding another index). Finally, the "metadata" is duplicating lots information that can be found elsewhere.

If you store your data in SQL, then you can dump the ZCatalog implementation and delegate indexing and searching to SQL.

Second, I want to store blobs in the filesystem. This can be done at the application level by various products such as CPS DiskFileField, Archetypes ExternalStorage, chrism's Blob product, and others. It can also be done transparently at the ZODB storage level using Ape, and that's a much simpler way to do it.

Finally I want my application-specific data (document metadata, document hierarchy, versioning information, various customer specific record-like objects, user definitions, access rights, etc) to be stored in SQL because most of them really live naturally in SQL tables. And because I want to do aggregates on them, like sums and means.

To achieve this, I plan on using the most flexible (and underused) framework available, which is Ape. I'll write various classifiers, and mappers so that a typical CMF or CPS site can be mapped naturally to SQL. I'll also replace the catalog by an implementation that does SQL queries. This will not be a simple endeavour, but I feel this is the only way to get what I truly need in the end, without sacrifying any flexibility.

If Ape proves to hard to work with (because it imposes its own framework of mapping), I'll go the Hornet way of writing a storage directly, with flexible enough policies for the mapping to SQL or the filesystem (or LDAP for that matter).

It goes without saying that this will apply to Zope 3, because that's part of what I think an ECM system should provide. But there's no reason not to make it work in Zope 2, and that's probably where I'll start.

Update: I was asked how this would fit with the "do things in python" philosophy that Martijn has been rightly pushing. Ape is first a pure python library that provides a new database class for use with ZODB (which itself can be considered independent of Zope). Ape also provides Zope-specific mappers that know how to best deal with a number Zope-specific classes. Whenever possible, I'd like to keep to that organisation (also keep in mind that a good separation will make it reusable both by Zope 2 and Zope 3).

Important announcement: Join the Nuxeo team and contribute to the Nuxeo project! We have open positions in France and the UK for open source Java EE developers and sales engineers, both junior and senior.

Like this post? Share it:


Trackback Pings

Trackback URL for this entry:
http://blogs.nuxeo.com/sections/blogs/florent_guillaume/2005_08_11_object_relational/tbping
» Where Dejavu fits in the ORM cosmos from The Hand of FuManChu
Florent Guillaume has written a [good survey of his personal ORM options for Zope3](http://blogs.nuxeo.com/sections/blogs/florent_guillaume/2005_08_11_object_relational). I thought I'd take the opportunity to discuss [Dejavu](http://www.aminus.org/rbre...

Tracked on 08/11/2005 07:38 PM

Posted by Florent Guillaume @ 08/11/2005 05:45 PM. - Categories: cps, python, zope, zope3 -  0 comments

Nuxeo Bloggers: Log in!
Nuxeo - Indesko - Nuxeo 5 Project
All content is copyrighted by their author.
CPSSkins is Copyright © 2003-2006 by Jean-Marc Orliaguet. | CPS is Copyright © 2002-2006 by Nuxeo SAS.