Caching is hard

Fri 25 February 2005 By Florent Guillaume

Caching is a difficult problem, especially when entries need to
be correctly invalidated. Invalidation is difficult when
there may be several caches spread across ZODB connections
(this is the case for '_v_' attributes), or spread across ZEO
clients. Accessing a '_v_' attribute in another ZODB
connection (i.e., usually, a different thread) is not really
feasible, and directly accessing another ZEO client that may be
across the world is impossible.

Often there are mitigating factors that make the invalidation
not really necessary. For instance, if the caching is
only done for the duration of the current request, an
appropriate cleanup of a '_v_' attribute can be done at the
end of the request by indirectly using a destructor of the
request object. This is what is done by the CMF's
MemberDataTool to hold on to the wrapped member objects for
the duration of the request, but not longer.
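
The request-scoped cleanup described above can be sketched roughly as
follows. This is not the MemberDataTool code; FakeRequest, its _hold
and close methods, CleanupVolatile and MemberTool are hypothetical
stand-ins written for illustration, and the sketch relies on CPython's
reference counting to run the destructor as soon as the request drops
its held objects.

```python
class FakeRequest:
    """Hypothetical stand-in for a request that can hold objects."""

    def __init__(self):
        self._held = []

    def _hold(self, obj):
        # Keep obj alive for as long as the request lives.
        self._held.append(obj)

    def close(self):
        # Dropping the references lets the held objects be destroyed.
        self._held = []


class CleanupVolatile:
    """Removes a volatile attribute from its owner when destroyed."""

    def __init__(self, owner, name):
        self._owner = owner
        self._name = name

    def __del__(self):
        # The attribute may already be gone; that is fine.
        self._owner.__dict__.pop(self._name, None)


class MemberTool:
    """Hypothetical tool caching wrapped members for one request."""

    def wrap_member(self, request, member_id):
        cache = getattr(self, "_v_members", None)
        if cache is None:
            cache = self._v_members = {}
            # Register the cleanup once, when the cache is created.
            request._hold(CleanupVolatile(self, "_v_members"))
        if member_id not in cache:
            cache[member_id] = {"id": member_id}  # the "wrapped" member
        return cache[member_id]


tool = MemberTool()
request = FakeRequest()
first = tool.wrap_member(request, "bob")
second = tool.wrap_member(request, "bob")  # served from the cache
request.close()                            # cache is removed here
```

The cache never needs explicit invalidation because it cannot outlive
the request that created it.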

A much better way to do coarse-grained invalidation across
threads and clients is to do it indirectly, through the use of
something that can communicate with all these threads and
clients: a persistent object. The idea is that all cached
values are cached with a generation number, the generation at
the moment of their creation. When retrieving a value from the
cache, the generation it has is compared to the current
generation, and the cached value is deemed invalid if they
don't match. This generation number is stored in a persistent
object (that is therefore visible to all). When an
invalidation must be done, the persistent generation number is
incremented (to avoid ConflictError problems, a
conflict-resistant class such as BTrees.Length.Length should be used).
This solution works when coarse-grained invalidation is
suitable, and when invalidations don't happen too often.
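
A minimal sketch of the generation scheme, under some assumptions: the
GenerationCounter class below is an in-memory stand-in for the
persistent counter (in a real application it would be something like a
BTrees.Length.Length stored in the ZODB, which is likewise changed with
change(delta) and read by calling it), and GenerationCache and
invalidate_all are hypothetical names chosen for this example.

```python
class GenerationCounter:
    """In-memory stand-in for a persistent, conflict-resolving counter."""

    def __init__(self, value=0):
        self.value = value

    def change(self, delta):
        self.value += delta

    def __call__(self):
        return self.value


class GenerationCache:
    """Local cache (one per thread or client) with stamped entries."""

    def __init__(self, counter):
        self._counter = counter  # shared by all caches
        self._entries = {}       # key -> (generation, value)

    def set(self, key, value):
        # Stamp the value with the generation current at creation time.
        self._entries[key] = (self._counter(), value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        generation, value = entry
        if generation != self._counter():
            # Created before the last invalidation: treat as a miss.
            del self._entries[key]
            return default
        return value


def invalidate_all(counter):
    """Coarse-grained invalidation: bump the shared generation."""
    counter.change(1)


counter = GenerationCounter()
cache_a = GenerationCache(counter)  # e.g. one thread
cache_b = GenerationCache(counter)  # e.g. another thread or client
cache_a.set("title", "Old title")
cache_b.set("title", "Old title")
invalidate_all(counter)             # both caches now miss on "title"
```

Neither cache is touched directly by the invalidation; each one
notices the new generation the next time it is read.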

A bug was recently reported to me that involved caching, but
it turned out to be a different class of problem. The problem
was that, through a cache, a value holding references to
persistent objects was shared among threads. But it is illegal to use
an object from one ZODB connection in another. When threading
is involved (which it often is in Zope), problems arise. In
this bug, the object itself (a user) was not persisted, but it
held references to persistent objects (the user folder and a
directory). In the end I found it better to not store the user
object in the cache, but just a dictionary of information
needed to recreate the user object.

Category: Product & Development