Object-Relational mapping in Zope


Thu 11 August 2005 By Florent Guillaume

I've been interested in object-relational mapping in Zope for a
while. I see this as the keystone of a proper enterprise-grade
application server, and I've been investigating existing tools and what
can be done with them. Here's a quick rundown.


ZSQL methods


Using ZSQL methods (a standard feature of Zope), you can have your
application talk to an SQL database, but you're not really dealing with
object-relational mapping, there are no transparently persistent
objects. So let's skip it.


SQLObject / sqlos


SQLObject is a pure python
mapper, and it is used in Zope 3 through sqlos. They provides
declarative mapping for your classes where you can write for instance:


class Person(SQLOS):
implements(IPerson)
_columns = [StringCol("username", length=20, notNull=True),
StringCol("firstname", length=20, notNull=True),
StringCol("lastname", length=20, notNull=True)]

Then any instance of a Person will actually be stored in SQL behind
your back. A unique id is generated for each Person and also stored in
SQL. There are various facilities to provide relations between
tables, and map them to lists in the python objects.


Archetypes + SQLStorage


SQLStorage
is an Archetypes storage
that uses SQL as a backend. You can write an Archetypes schema such as:


schema = BaseSchema + Schema((
StringField("username", required=1, primary=1, searchable=1,
storage=PostgreSQLStorage(),
index="FieldIndex",
),
StringField("firstname", required=1,
storage=PostgreSQLStorage(),
),
StringField("lastname", required=1,
storage=PostgreSQLStorage(),
),
)) + TemplateMixin.schema

class PersonSQL(TemplateMixin, BaseContent):
archetype_name = "Person SQL"
schema = schema

SQLStorage relies on the Archetypes UIDs to uniquely identify
objects. A Relation field can be used to have relations to other objects.
Note that Archetypes objects stored through SQLStorage still have a
presence in the ZODB, which means it's not a solution if you totally
want to get rid of Data.fs bloat.


As you can see in the way things are declared, both SQLObject and
Archetypes require you to specify in your code that you will
use SQL for some objects. Things are not "transparent" from the
programmer's point of view.


Ape


Ape (Adaptable Persistence Engine) is a framework to
do object-relational mapping at a lower level than the above two
solutions, because it works at the ZODB Storage level. Ape is used as a
ZODB Mount Point inside Zope. In fact, Ape is not tied to SQL but is
designed to do configurable and flexible mapping to anything; storing to
the filesystem is another option provided by default.


Ape relies on the ZODB framework, and especially its classes for
persistence and transaction management, backed by the storage APIs
themselves. This means that from the programmer's point of view, there's
no difference when using Ape than when using a FileStorage or a ZEO
ClientStorage. Everything is transparent.


This is a big advantage for the programmer, but the downside is that
the structure of the tables in the SQL database is chosen by the
framework. However Ape's default SQL mapper already tries hard to
provide data mapping in a natural way; for instance all properties are
made available in a natural manner, object titles or containment
relationship are also naturally expressed.


Hornet


Hornet is an SQL
bridge (alpha software for now) that also works at the ZODB layer. In
contrast to Ape, it is much more geared toward existing datasets, or
regular SQL access to tables. It requires you to define schemas in code
too, but once this is done object access is totally transparent, as in
Ape. You have to use code like:


PySchema(fieldsets=(
FieldSet("employee_type", label="Employee Type", widgets=(
StringWidget("code", label="Employee Code"),
StringWidget("description", label="Description"),
)),
))

This is then used by the mount point and the underlying Hornet
framework for a given table to decide how to map table lines to objects.
(The schema is also used to autogenerate forms.)


I want more


To me Hornet and Ape are promising because I believe they integrate
at the right level. Ape is better because it doesn't require explicit
schema declaration, and that's very important when you have flexible
objects where the users add new fields on document instances (which
happens all the time in CPS using FlexibleTypeInformation-based content
objects).


I believe a large part of the success of the infamous Rails has been
that it makes it trivial to interface to an existing SQL database,
provided it's regularly named. And I want to bring Zope to that level,
but that's just going to be a side effect of a number of other goals.


First, I want to get rid of the catalog and move it to its proper
place, which is in a relational database. The ZCatalog is a hack (a
successful one I must admit) that tries to bring into the ZODB world
something that has been working much better for ages in the relational
world. The ZCatalog indexes use a truly gigantic number of small
persistent objects to work. Querying an index trashes the ZODB cache
which could be used for better things (to cache the objects the user is
really accessing). Also the ZCatalog fails to provide any relational
features, like JOIN queries. The sorting is weak (you cannot sort on
something that's in a full-text index without adding another index).
Finally, the "metadata" is duplicating lots information that can be
found elsewhere.


If you store your data in SQL, then you can dump the ZCatalog
implementation and delegate indexing and searching to SQL.


Second, I want to store blobs in the filesystem. This can be done at
the application level by various products such as CPS DiskFileField,
Archetypes ExternalStorage, chrism's Blob product, and others.
It can also be done transparently at the ZODB storage level using Ape,
and that's a much simpler way to do it.


Finally I want my application-specific data (document metadata,
document hierarchy, versioning information, various customer specific
record-like objects, user definitions, access rights, etc) to be stored
in SQL because most of them really live naturally in SQL tables. And
because I want to do aggregates on them, like sums and means.




To achieve this, I plan on using the most flexible (and underused)
framework available, which is Ape. I'll write various classifiers, and
mappers so that a typical CMF or CPS site can be mapped naturally to
SQL. I'll also replace the catalog by an implementation that does SQL
queries. This will not be a simple endeavour, but I feel this is the
only way to get what I truly need in the end, without sacrifying any
flexibility.


If Ape proves to hard to work with (because it imposes its own
framework of mapping), I'll go the Hornet way of writing a storage directly, with
flexible enough policies for the mapping to SQL or the filesystem (or LDAP for that
matter).


It goes without saying that this will apply to Zope 3, because
that's part of what I think an ECM system should provide. But there's no
reason not to make it work in Zope 2, and that's probably where I'll
start.


Update: I was asked how this would fit with the "do things in python" philosophy that Martijn has been rightly pushing. Ape is first a pure python library that provides a new database class for use with ZODB (which itself can be considered independent of Zope). Ape also provides Zope-specific mappers that know how to best deal with a number Zope-specific classes. Whenever possible, I'd like to keep to that organisation (also keep in mind that a good separation will make it reusable both by Zope 2 and Zope 3).


Category: Product & Development