On Tue, Sep 29, 2015 at 6:40 AM, Bastian Kuberek <bkube...@gmail.com> wrote:
> Could you please provide some brief reasoning on why ZODB is so brilliant? I
> have learned about it a few years ago when getting into pyramid but I have
> never used or seen it being used. I would love to get some insight on why I
> would use it and what would be a good use case to use it.

I haven't used ZODB but I've used Durus, which is a simpler database following the same concept. An object database allows you to store hierarchical Python data structures naturally: objects containing objects, and dicts containing dicts of objects. You don't have to shoehorn them into unpythonic relational tables or key-value stores, or use a complicated ORM. You just open the database and get the root object, which behaves like a dict, and everything sits under it, reachable through keys and subkeys. Pyramid's Traversal mechanism is designed for this: the tree is the actual database structure, and each node (aka context) is an object in the database.

The main difference between ZODB and Durus, as I understand it, is that ZODB has a thread-safe layer while Durus is for single-threaded programs (or programs with user-managed synchronization). Leaving out that layer makes the Durus code significantly simpler. So I'll describe Durus, which I know, but I think ZODB works similarly.

In Durus the root object is a PersistentDict. You can put ordinary Python items into it, and they'll all be pickled with the dict. So later, if you ask for one item, the database has to unpickle the whole dict to retrieve it or to check whether it exists. But if you have a class that subclasses Persistent, and you create an instance and put it into the dict, then the instance is pickled separately, so that when you ask for it again the database only has to unpickle that one object. So using the database intelligently revolves around deciding which levels of nested objects to make Persistent.

It comes with PersistentDict, PersistentList, and PersistentHash classes. A persistent dict or list pickles all nonpersistent items and stores references to the persistent items. A PersistentHash is like a dict, but it pickles the nonpersistent items in buckets of 16 (or whichever scale you choose).

My application had three kinds of data (three subdicts), and the largest one had 5000 objects, each with fifty attributes and a list of large strings.
I had to do a lot of searching through the objects as well as retrieving individual ones. I compared the performance of several persistence levels, and found that PersistentHash used less memory and was faster than using either persistent objects or a regular dict, at least for that dataset.

The database is append-only for reliability, has commit/rollback, and lets you undo the last several transactions. There's a pack routine to rebuild the database, squeezing out old versions of objects.

I liked being able to store hierarchical data simply, without having to translate it into SQL data types and relationships and deal with a complex ORM. But it has tradeoffs which are inherent in the database design. I hated SQL because it's such an ancient, clunky syntax, but after I put my Durus database into production I realized that I missed writing one-line queries. In SQL you can write just ten words to count all the rows where some condition holds, or to display the records in a grid. In Durus you have to write a Python program that does a 'for' loop through the database. That got annoying for ad-hoc queries.

Another issue is that the hierarchical model works better for 1:many relationships than for many:many. In an object database you embed objects in their natural parents, and then query them as ``root[parent][child]``. But if you have an address that's used in two places, you can't embed it in both places without duplicating it. So you have to embed a reference to it, and then manually follow the reference.

A third issue is that the append-only design is unsuitable for counters or frequent field updates. Even the littlest field change requires repickling the entire record. Actually, PostgreSQL works like that too, because it stores records as immutable tuples, but PostgreSQL reuses space from deleted tuples, while Durus just appends until you pack. Redis is probably the most useful for counters and updating timestamps because it can change memory in place.
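
As a rough sketch of the first two tradeoffs above, here is what the ad-hoc count and the shared-address lookup look like in plain Python. Ordinary dicts stand in for the database root, so no Durus or ZODB API is used; the keys, the sample data, and the helper names ``count_where`` and ``resolve_address`` are all made up for illustration.

```python
# Plain dicts stand in for the persistent root object. In Durus or ZODB the
# root would come from an open connection instead (not shown here).
root = {
    "addresses": {
        "a1": {"city": "Seattle"},
        "a2": {"city": "Portland"},
    },
    "customers": {
        # Each customer stores address *keys* (references), not copies.
        "alice": {"active": True, "billing": "a1", "shipping": "a1"},
        "bob": {"active": False, "billing": "a2", "shipping": "a1"},
        "carol": {"active": True, "billing": "a2", "shipping": "a2"},
    },
}


def count_where(records, predicate):
    """Loop equivalent of SELECT COUNT(*) FROM ... WHERE <predicate>."""
    return sum(1 for record in records.values() if predicate(record))


def resolve_address(root, customer, field):
    """Follow a stored key by hand to reach the shared address object."""
    return root["addresses"][customer[field]]


active = count_where(root["customers"], lambda c: c["active"])
bob_shipping = resolve_address(root, root["customers"]["bob"], "shipping")
print(active, bob_shipping["city"])
```

The count is only a few lines, but it is still a small program rather than a one-line query, and every place that uses an address has to know to go through the lookup step.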

So it all comes down to the nature of your data and what you want to do with it, whether a SQL database or object database or key-value database is the most suitable for it.

--
Mike Orr <sluggos...@gmail.com>