[Chris Withers]
> This is with whatever ZODB ships with Zope 2.8.5...

Do:

    import ZODB
    print ZODB.__version__

to find out.

> I have a Stepper (zopectl run on steroids) job that deals with lots of
> big objects.

Can you quantify this?

> After processing each one, Stepper does a transaction.get().commit().

Note that "transaction.commit()" is a shortcut spelling.

> I thought this was enough to keep the object cache at a sane size,

It does not do cacheMinimize().  It tries to reduce the memory cache to the
target number of objects specified for that cache, which is not at all the
same as cache minimization (the latter aims for a target size of 0).
Whether that's "sane" or not depends on the product of:

    the cache's target number of objects

times:

    "the average" byte size of an object

ZODB has no say of its own about either of those.
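To put rough numbers on that product, here's a back-of-the-envelope sketch.
The 400 is the default cache_size shown in the session below; the per-object
sizes are purely hypothetical, and the function name is made up for
illustration:

```python
def estimated_cache_bytes(cache_size, avg_object_bytes):
    """Rough memory held by one cache once it has been trimmed to its target.

    cache_size: target number of non-ghost objects (the cache's setting).
    avg_object_bytes: ASSUMED average in-memory size of a non-ghost object.
    """
    return cache_size * avg_object_bytes

# Same default target of 400 objects, two hypothetical object sizes:
small = estimated_cache_bytes(400, 1024)             # ~1 KB objects
big = estimated_cache_bytes(400, 5 * 1024 * 1024)    # ~5 MB objects
print(small, big)                                    # 409600 2097152000
```

The same target that costs well under a megabyte with small objects costs
about 2 GiB if each object is around 5 MB.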

> however the job kept bombing out with MemoryErrors, and sure enough it
> was using 2 or 3 gigs of memory when that happened.
>
> I fiddled about with the gc module and found that, sure enough, objects
> were being kept in memory. At a guess, I inserted something close to the
> following:
>
> obj._p_jar.db().cacheMinimize()
>
> ...after each 5,000 objects were processed (there are 60,000 objects in
> total)
>
> Lo and behold, memory usage became sane.
>
> Why is this step necessary? I thought transaction.get().commit() every so
> often was enough to sort out the cache...
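The workaround described above has this general shape.  This is a sketch with
the ZODB calls passed in as plain callables, so it runs without a database;
in a real script `commit` might be transaction.commit and `minimize` a bound
db.cacheMinimize (with db obtained via obj._p_jar.db(), as above).  That wiring
and the function name are assumptions, not Stepper's actual code:

```python
def process_in_batches(objects, process, commit, minimize, minimize_every=5000):
    """Commit after every object; minimize the cache every `minimize_every`.

    process, commit, minimize: caller-supplied callables (hypothetical wiring).
    Returns the number of objects processed.
    """
    count = 0
    for obj in objects:
        process(obj)
        commit()            # the cache gc pass here only trims toward cache_size
        count += 1
        if count % minimize_every == 0:
            minimize()      # after the commit, aim for a cache size of 0
    return count

# Dry run with counters standing in for the real calls.
calls = {"process": 0, "commit": 0, "minimize": 0}

def counter(key):
    def bump(*args):
        calls[key] += 1
    return bump

n = process_in_batches(range(60000), counter("process"),
                       counter("commit"), counter("minimize"))
print(n, calls)  # 60000 {'process': 60000, 'commit': 60000, 'minimize': 12}
```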

See above.  For most people it works OK.  If `cn` is the Connection, then

    cn._cache.cache_size is the target number of non-ghost objects
    cn._cache.ringlen() is the current number of non-ghost objects

At a transaction boundary, a cache gc pass tries to make ringlen()
<= cache_size, and that's all.
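A toy model may make the difference between "trim to cache_size" and
"minimize" concrete.  This is NOT ZODB's cache implementation: the class and
method names are hypothetical, "ghostifying" is modeled as dropping an entry
from an LRU-ordered dict, and many details of the real cache are ignored.  The
numbers replay the session below:

```python
from collections import OrderedDict


class ToyCache:
    """Toy per-connection cache: non-ghost objects kept in LRU order."""

    def __init__(self, cache_size=400):
        self.cache_size = cache_size   # target number of non-ghost objects
        self.ring = OrderedDict()      # oid -> object, least recently used first

    def load(self, oid, obj):
        self.ring[oid] = obj
        self.ring.move_to_end(oid)     # most recently used goes to the end

    def ringlen(self):
        return len(self.ring)

    def gc_to_target(self):
        """Ghostify least-recently-used objects until ringlen() <= cache_size."""
        evicted = 0
        while len(self.ring) > self.cache_size:
            self.ring.popitem(last=False)
            evicted += 1
        return evicted

    def minimize(self):
        """Ghostify everything, i.e. aim for a target size of 0."""
        evicted = len(self.ring)
        self.ring.clear()
        return evicted


cache = ToyCache(cache_size=400)
for oid in range(67067):                   # stand-in for touching every node
    cache.load(oid, object())
print(cache.cache_size, cache.ringlen())   # 400 67067

evicted_gc = cache.gc_to_target()          # what the commit-time pass aims for
after_gc = cache.ringlen()
print(evicted_gc, after_gc)                # 66667 400

evicted_min = cache.minimize()             # what cacheMinimize() aims for
after_min = cache.ringlen()
print(evicted_min, after_min)              # 400 0
```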

For example, using all defaults:

>>> ZODB.__version__  # probably the version you're using
'3.4.2'

This loads a million-element OOBTree (the construction of which I won't show
here):

>>> len(t)
1000000

The number of non-ghost objects is then approximately 1e6/15 (the number of
leaf-node OOBuckets in that tree; there are more than that because of
non-leaf interior OOBTree nodes, but the leaf nodes account for the bulk of
it):

>>> cn._cache.cache_size, cn._cache.ringlen()
(400, 67067)

At a transaction boundary, a cache gc pass is run to try to reduce the
number of non-ghost objects to cache_size:

>>> transaction.commit()
>>> cn._cache.cache_size, cn._cache.ringlen()
(400, 400)

So it booted 67067 - 400 = 66667 non-ghost objects.

_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
