Re: [pylons-devel] Re: ZODB News

Mike Orr Tue, 29 Sep 2015 18:57:08 -0700

On Tue, Sep 29, 2015 at 6:40 AM, Bastian Kuberek <bkube...@gmail.com> wrote:
> Could you please provide some brief reasoning on why ZODB is so brilliant? I
> have learned about it a few years ago when getting into pyramid but I have
> never used or seen it being used. I would love to get some insight on why I
> would use it and what would be a good use case to use it.


I haven't used ZODB but I've used Durus, which is a simpler database
following the same concept. An object database allows you to store
hierarchical Python data structures naturally: objects containing
objects, and dicts containing dicts of objects. You don't have to
shoehorn them into unpythonic relational tables or key-value stores or
use a complicated ORM. You just open the database and get the root
object, which behaves like a dict, and everything is under it
following keys and subkeys. Pyramid's Traversal mechanism is designed
for it, where the tree is the actual database structure and each node
(aka context) is an object in the database.
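
To make the traversal idea concrete, here is a simplified sketch in plain Python. Nested dicts stand in for the database root (an assumption; a real Durus or ZODB root would be a persistent mapping). The hypothetical `traverse()` function only approximates what Pyramid does. It follows path segments through `__getitem__` until a lookup fails, then returns the last node reached (the context) plus the unused segments. Pyramid would treat the first of those segments as the view name.

```python
def traverse(root, path):
    """Hypothetical, simplified traversal (not Pyramid's own code).

    Walks the path segments through __getitem__ and stops at the first
    failed lookup. Returns (context, remaining_segments).
    """
    context = root
    segments = [s for s in path.split("/") if s]
    for i, name in enumerate(segments):
        try:
            context = context[name]
        except (KeyError, TypeError):  # missing key, or a non-container leaf
            return context, segments[i:]
    return context, []


# Nested dicts stand in for the object database's root (an assumption).
root = {"users": {"alice": {"bio": "Likes Python"}}}
context, rest = traverse(root, "/users/alice/edit")
print(rest)  # → ['edit']
```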

The main difference between ZODB and Durus, as I understand it, is that
ZODB has a thread-safe layer while Durus is for single-threaded programs
(or multithreaded ones with user-managed synchronization). Leaving that
layer out makes Durus's code significantly simpler. So I'll describe
Durus, which I know, but I think ZODB works similarly.
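
Here is a minimal sketch of what "user-managed synchronization" could look like for a single-threaded store. The `LockedStore` wrapper is hypothetical and is not part of Durus or ZODB. A plain dict stands in for the database. The point is only that every read-modify-write happens while holding one `threading.Lock`, so concurrent threads cannot interleave their updates.

```python
import threading


class LockedStore:
    """Hypothetical wrapper that serializes access to a single-threaded store."""

    def __init__(self, store):
        self._store = store              # e.g. a dict-like root object
        self._lock = threading.Lock()

    def update(self, key, func, default=None):
        """Read-modify-write one key while holding the lock."""
        with self._lock:
            self._store[key] = func(self._store.get(key, default))

    def get(self, key, default=None):
        with self._lock:
            return self._store.get(key, default)


shared = LockedStore({})


def worker():
    for _ in range(1000):
        shared.update("hits", lambda n: n + 1, default=0)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared.get("hits"))  # → 8000
```

A single global lock is the simplest scheme, but it serializes all access to the store.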

In Durus the root object is a PersistentDict. You can put ordinary
Python items into it, and they'll all be pickled together with the dict.
So later, if you ask for one item, the database has to unpickle the
whole dict to retrieve it or check whether it exists. But if you have a
class that subclasses Persistent, and you create an instance and put it
into the dict, then it will be pickled separately, so when you ask for
it again the database only has to unpickle that one object. So using
the database intelligently revolves around deciding which levels of
nested objects to make Persistent. Durus comes with PersistentDict,
PersistentList, and PersistentHash classes. A persistent dict or list
pickles all its nonpersistent items and stores references to its
persistent items. A PersistentHash is like a dict, but it pickles the
nonpersistent items in buckets of 16 (or whichever scale you choose).
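
The following toy store illustrates the pickling granularity described above using the standard library's `pickle` hooks (`Pickler.persistent_id` and `Unpickler.persistent_load`). Everything here (`ToyStore`, `Persistent`, `Ref`, `Item`, `Folder`) is hypothetical and much simpler than Durus or ZODB. Ordinary values are pickled inline with their parent record. Instances of the `Persistent` marker class get their own record and are replaced by an object id. Real object databases resolve such references transparently. Here a reference is an explicit `Ref` whose `.get()` unpickles the child on demand.

```python
import io
import pickle


class Persistent:
    """Hypothetical marker base: instances are stored as separate records."""
    _oid = None


class Ref:
    """Lazy reference to another record; it is unpickled only on .get()."""

    def __init__(self, store, oid):
        self.store, self.oid = store, oid

    def get(self):
        return self.store.load(self.oid)


class _RefPickler(pickle.Pickler):
    """Pickles one record, replacing nested Persistent objects by oids."""

    def __init__(self, file, store, record):
        super().__init__(file)
        self.store, self.record, self.children = store, record, []

    def persistent_id(self, obj):
        if isinstance(obj, Ref):
            return obj.oid                      # already stored elsewhere
        if isinstance(obj, Persistent) and obj is not self.record:
            self.children.append(obj)           # store it as its own record
            return self.store.assign_oid(obj)
        return None                             # pickle inline with the record


class _RefUnpickler(pickle.Unpickler):
    def __init__(self, file, store):
        super().__init__(file)
        self.store = store

    def persistent_load(self, pid):
        return Ref(self.store, pid)             # do not load the child yet


class ToyStore:
    """Toy object store (not Durus or ZODB code): one pickle per record."""

    def __init__(self):
        self.records = {}   # oid -> pickled bytes
        self.next_oid = 0
        self.loads = 0      # number of records unpickled so far

    def assign_oid(self, obj):
        if obj._oid is None:
            obj._oid = self.next_oid
            self.next_oid += 1
        return obj._oid

    def store(self, obj):
        """Pickle obj and every Persistent object reachable from it."""
        todo, done = [obj], set()
        while todo:
            current = todo.pop()
            oid = self.assign_oid(current)
            if oid in done:
                continue
            done.add(oid)
            buf = io.BytesIO()
            pickler = _RefPickler(buf, self, current)
            pickler.dump(current)
            self.records[oid] = buf.getvalue()
            todo.extend(pickler.children)
        return obj._oid

    def load(self, oid):
        self.loads += 1
        return _RefUnpickler(io.BytesIO(self.records[oid]), self).load()


class Item(Persistent):
    def __init__(self, payload):
        self.payload = payload


class Folder(Persistent):
    def __init__(self):
        self.children = {}   # plain dict: pickled inline with the Folder


store = ToyStore()
root = Folder()
root.children["a"] = Item("a" * 1000)
root.children["b"] = Item("b" * 1000)
root_oid = store.store(root)

fresh = store.load(root_oid)            # unpickles the root record only
item = fresh.children["b"].get()        # unpickles one more record
print(store.loads, len(store.records))  # → 2 3
```

Loading the root unpickles only its own record. The two `Item` payloads stay pickled until one of them is requested.
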
My application had three kinds of data (three subdicts), and the
largest one had 5000 objects, each with fifty attributes and a list of
large strings. I had to do a lot of searching through the objects as
well as retrieving individual ones. I compared the performance of
several persistence levels and found that PersistentHash used less
memory and was faster than using either persistent objects or a regular
dict, at least for that dataset.
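
The bucketing idea behind PersistentHash can be sketched as follows. This `BucketedHash` class is hypothetical and is not Durus's implementation. It assumes a fixed number of buckets, although the post's "16" could also be read as items per bucket. Each key is hashed to a bucket, and each bucket is pickled as one unit, so a lookup unpickles one bucket instead of the whole mapping.

```python
import pickle


class BucketedHash:
    """Hypothetical sketch (not Durus's PersistentHash): items are hashed
    into a fixed number of buckets and each bucket is pickled as a unit."""

    def __init__(self, n_buckets=16):
        self.buckets = [pickle.dumps({}) for _ in range(n_buckets)]
        self.unpickled = 0   # how many buckets have been unpickled

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def _load(self, i):
        self.unpickled += 1
        return pickle.loads(self.buckets[i])

    def __setitem__(self, key, value):
        i = self._index(key)
        bucket = self._load(i)
        bucket[key] = value
        self.buckets[i] = pickle.dumps(bucket)   # re-pickle only this bucket

    def __getitem__(self, key):
        return self._load(self._index(key))[key]

    def __contains__(self, key):
        return key in self._load(self._index(key))


h = BucketedHash(n_buckets=16)
for n in range(1000):
    h[n] = {"id": n, "text": "x" * 50}

h.unpickled = 0
value = h[42]                      # touches a single bucket
print(value["id"], h.unpickled)    # → 42 1
```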

The database is append-only for reliability, has commit/rollback, and
lets you undo the last several transactions. There's a pack routine
that rebuilds the database, squeezing out obsolete versions of objects.
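
Here is a toy model of the append-only design. Every commit appends new versions and never overwrites old ones. Undo discards the newest transaction, and pack rebuilds the log so that only the latest version of each object remains. `AppendOnlyStore` is a hypothetical in-memory sketch. The real Durus file format, undo rules, and pack routine differ in detail.

```python
import pickle


class AppendOnlyStore:
    """Toy append-only store (hypothetical, not the Durus file format)."""

    def __init__(self):
        self.log = []   # list of transactions; each is {oid: pickled bytes}

    def commit(self, changes):
        """Append one transaction; earlier versions are never overwritten."""
        self.log.append({oid: pickle.dumps(obj) for oid, obj in changes.items()})

    def load(self, oid):
        """Return the newest committed version of oid."""
        for txn in reversed(self.log):
            if oid in txn:
                return pickle.loads(txn[oid])
        raise KeyError(oid)

    def undo(self):
        """Discard the most recent transaction (possible only before packing)."""
        self.log.pop()

    def pack(self):
        """Rebuild the log, keeping only the newest version of each oid."""
        latest = {}
        for txn in self.log:
            latest.update(txn)
        self.log = [latest]

    def size(self):
        return sum(len(data) for txn in self.log for data in txn.values())


db = AppendOnlyStore()
for i in range(100):
    db.commit({"counter": i})   # each update appends a new version
db.undo()                       # roll back the last transaction (counter=99)
before = db.size()
db.pack()
after = db.size()
print(db.load("counter"), before > after)  # → 98 True
```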

I liked being able to store hierarchical data simply without having to
translate it into SQL data types and relationships and deal with a
complex ORM. But it has tradeoffs that are inherent in the database
design. I hated SQL because its syntax is so ancient and clunky, but
after I put my Durus database into production I realized that I missed
writing one-line queries. In SQL you can write ten words to count all
the rows that meet some condition, or to display the records in a
grid. In Durus you have to write a Python program with a 'for' loop
through the database. That got annoying for ad-hoc queries.
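
To illustrate the ad-hoc query point, the sketch below asks the same question twice. The first version is a single SQL statement against an in-memory SQLite database, using the standard `sqlite3` module. The second is a Python loop over a nested dict standing in for an object database root. The table, data, and `root` layout are made up for the example.

```python
import sqlite3

people = [("Ann", "Portland"), ("Bob", "Seattle"), ("Cy", "Portland")]

# SQL: the ad-hoc question is one statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)
sql_count = conn.execute(
    "SELECT COUNT(*) FROM people WHERE city = ?", ("Portland",)
).fetchone()[0]

# Object database: the same question is a loop over stored objects.
# A plain dict stands in for the database root here (an assumption).
root = {"people": {name: {"name": name, "city": city} for name, city in people}}
py_count = sum(1 for p in root["people"].values() if p["city"] == "Portland")

print(sql_count, py_count)  # → 2 2
```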

Another issue is that the hierarchical model works better for 1:many
relationships than many:many. In an object database you embed objects
in their natural parents, and then query them as
``root[parent][child]``. But if you have an address that's used in two
places, you can't embed it in both places without duplicating it. So
you have to embed a reference to it, and then manually follow the
reference.
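
The sketch below shows why embedding breaks down for shared data. It assumes, as described above for nonpersistent items, that each parent is pickled as its own unit. Pickling two parents that embed the same address yields two independent copies. The alternative stores the address once and keeps its key in each parent. `address_of()` is a hypothetical helper that follows that reference manually.

```python
import pickle

address = {"street": "1 Main St", "city": "Springfield"}
alice = {"name": "Alice", "address": address}
bob = {"name": "Bob", "address": address}

# Pickling each parent separately duplicates the embedded address.
alice2 = pickle.loads(pickle.dumps(alice))
bob2 = pickle.loads(pickle.dumps(bob))
alice2["address"]["street"] = "2 Oak Ave"
print(bob2["address"]["street"])  # → 1 Main St (the copies diverged)

# Alternative: store the address once and keep only its key in each parent.
root = {
    "addresses": {"addr1": {"street": "1 Main St", "city": "Springfield"}},
    "people": {
        "alice": {"name": "Alice", "address_id": "addr1"},
        "bob": {"name": "Bob", "address_id": "addr1"},
    },
}


def address_of(root, person_key):
    """Manually follow the stored reference from a person to its address."""
    return root["addresses"][root["people"][person_key]["address_id"]]


root["addresses"]["addr1"]["street"] = "2 Oak Ave"
print(address_of(root, "bob")["street"])  # → 2 Oak Ave
```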

A third issue is that its append-only mode is unsuitable for counters
or frequent field updates. Even the smallest field change requires
repickling the entire record. PostgreSQL actually works like that too,
because it stores records as immutable tuples, but PostgreSQL reuses
space from deleted tuples, while Durus just keeps appending until you
pack. Redis is probably the most useful for counters and updating
timestamps because it can change values in memory in place.
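
Here is a rough way to see the cost of repickling. The sketch below models a record with fifty fields plus a counter. It adds up how many bytes an append-only store would write if the whole record were pickled again after every increment. The record shape is an assumption loosely based on the dataset described earlier, and the exact byte counts depend on the pickle protocol.

```python
import pickle

# Hypothetical record: fifty distinct string fields plus a hit counter.
record = {f"field{i}": f"value-{i}-" + "x" * 20 for i in range(50)}
record["hits"] = 0

appended = 0   # bytes an append-only store would add
for _ in range(1000):
    record["hits"] += 1
    # The whole record is re-pickled just to store the new counter value.
    appended += len(pickle.dumps(record))

print(record["hits"], appended)
```

An in-place update, as the post says Redis can do, would instead rewrite only the counter itself.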

So it all comes down to the nature of your data and what you want to
do with it, whether a SQL database or object database or key-value
database is the most suitable for it.

-- 
Mike Orr <sluggos...@gmail.com>

-- 
You received this message because you are subscribed to the Google Groups 
"pylons-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to pylons-devel+unsubscr...@googlegroups.com.
To post to this group, send email to pylons-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/pylons-devel.
For more options, visit https://groups.google.com/d/optout.