I've followed this thread with interest, since I have a Zope site with tens of millions of entries in BTrees. It scales well, but only with a fair number of tricks.
Roche Compaan wrote two great pieces on ZODB, Data.fs size and scalability: http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes and http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter

My own in-house product is similar to Google Analytics. To handle the volume I have to use a cascading BTree structure (a BTree of BTrees of BTrees), because BTrees do slow down the more items they contain. That is not a ZODB limitation or flaw; it is just how they work. The structure allows fast inserts, but it also allows aggregation of data: if the lowest level of BTrees stores hits for a particular hour, then the containing BTree always knows exactly how many hits were made that day. I update all parent BTrees as soon as an item is inserted, at a cost of O(1) per parent. These are all details, but every single one influenced my design.

What is important is that you cannot just use the ZCatalog to index tens of millions of items, since every index is a single BTree and will suffer the larger it gets. You must roll your own structure to fit your problem domain. Data warehousing is probably a good idea as well.

My problem domain allows me to defer inserts, so I have a queuerunner that commits larger transactions in batches. A few fat writes beat lots of small ones. This may of course not fit your model.

Familiarize yourself with TreeSets and the set operations the BTrees package provides (union, intersection and friends), since those tools form the backbone of cataloguing.

Rough sketches of the cascading structure, the batched queuerunner and the TreeSet operations follow below.

Hedley
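First, the cascading BTree. The class names and the day/hour keys are invented for the example and are not my actual product code; only the persistent and BTrees imports are real ZODB API:

    # Rough sketch of a cascading BTree: hour buckets inside day buckets,
    # with a running counter on every parent so daily totals are O(1).
    from persistent import Persistent
    from BTrees.OOBTree import OOBTree
    from BTrees.Length import Length   # conflict-reducing counter

    class Bucket(Persistent):
        """One aggregation level, e.g. a day holding 24 hour buckets."""
        def __init__(self):
            self.children = OOBTree()   # key -> child Bucket or raw hit
            self.count = Length()       # running total for this subtree

    class HitStore(Persistent):
        def __init__(self):
            self.days = OOBTree()       # 'YYYY-MM-DD' -> day Bucket

        def record_hit(self, day_key, hour_key, hit_id, hit):
            day = self.days.get(day_key)
            if day is None:
                day = self.days[day_key] = Bucket()
            hour = day.children.get(hour_key)
            if hour is None:
                hour = day.children[hour_key] = Bucket()
            hour.children[hit_id] = hit
            # Bump every parent counter as soon as the item is inserted:
            # O(1) per parent, so daily totals never need a full scan.
            hour.count.change(1)
            day.count.change(1)

        def hits_for_day(self, day_key):
            day = self.days.get(day_key)
            return day.count() if day is not None else 0

The Length counter is worth the extra object because it resolves its own conflicts; a plain integer attribute would work too, but it would conflict as soon as concurrent requests insert into the same bucket.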
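Second, the queuerunner idea stripped to its essence. The in-memory queue, the batch size and the function names are again just illustration:

    # Defer inserts and flush them in fat transactions instead of
    # committing once per hit.
    from collections import deque
    import transaction

    queue = deque()
    BATCH_SIZE = 1000

    def enqueue(day_key, hour_key, hit_id, hit):
        queue.append((day_key, hour_key, hit_id, hit))

    def run_queue(store):
        """Drain the queue, committing one transaction per batch."""
        while queue:
            for _ in range(min(BATCH_SIZE, len(queue))):
                store.record_hit(*queue.popleft())
            transaction.commit()   # one fat write instead of many small ones

The point is simply that one commit covers a thousand inserts instead of a thousand commits covering one insert each.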
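Third, TreeSets and set operations. The document ids are made up, but union and intersection are the real functions from the BTrees package:

    # TreeSets hold sorted integer document ids; union/intersection
    # combine them the way catalogue queries do.
    from BTrees.IIBTree import IITreeSet, union, intersection

    published = IITreeSet([1, 2, 3, 5, 8])    # docids in a "published" index
    tagged    = IITreeSet([2, 3, 13, 21])     # docids matching some keyword

    both   = intersection(published, tagged)  # AND-style query
    either = union(published, tagged)         # OR-style query

    print(list(both))    # [2, 3]
    print(list(either))  # [1, 2, 3, 5, 8, 13, 21]

This is essentially what the ZCatalog does under the hood when it combines the results of several indexes.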