[ZODB-Dev] ZODB 3.9/3.8 incompatibility (was Re: Data.fs size grows non-stop)

2009-12-09 Thread Jim Fulton
On Wed, Dec 9, 2009 at 12:06 PM, Marius Gedminas wrote:
...
> (Supporting both ZODB 3.8 and 3.9 is kinda tricky, but with some very
> ugly hacks I managed.)

This sounds like something that needs to be fixed. Can you share some of the
issues you ran into? (Or maybe file bug reports.)

In particular, I think it should be a goal that it isn't too hard to write
code that works with both ZODB 3.8 and 3.9.

Jim

-- 
Jim Fulton
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Marius Gedminas
On Wed, Dec 09, 2009 at 06:09:36PM +0200, Marius Gedminas wrote:
> On Wed, Dec 09, 2009 at 02:42:58PM +0100, Pedro Ferreira wrote:
> > Hello,
> > > Just zodbbrowser with no prefix:
> > >
> > >   http://pypi.python.org/pypi/zodbbrowser
> > >   https://launchpad.net/zodbbrowser
> > >
> > > It's a web-app: it can connect to your ZEO server so you can inspect the
> > > DB while it's being used.
> > >   
> > We tried this, but we currently get an error related with the general 
> > security policy for zope.app. Maybe we need to install Zope?
> 
> No, it should be self-contained.
> 
> I'm usually testing with the Zope 3.4 KGS.

I've just released zodbbrowser 0.6.1 which works with ZODB 3.9.x and
latest versions of other packages too.

(Supporting both ZODB 3.8 and 3.9 is kinda tricky, but with some very
ugly hacks I managed.)

Marius Gedminas
-- 
Any sufficiently advanced Operating System is indistinguishable from Linux.
-- Jim Dennis




Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Jim Fulton
On Wed, Dec 9, 2009 at 10:54 AM, Pedro Ferreira wrote:
...
> Well, at least we won't have to rewrite the whole bucket... but still, it
> would be much nicer to fragment the list
> in smaller chunks. We could use an OOBTree instead... but something less
> complex would suffice... any suggestions?

It's hard for people to suggest something without knowing your usage pattern.
Do you want something that is effectively a set? Is integer indexing important?
What are the items in the list?

> We are really considering replacing all our indexes with something more
> efficient and ZODB-friendly... do you think zope.catalog would be a good
> choice?

A catalog is simply an organization of indexes, typically implemented
with BTrees.

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Christian Theune
On 12/09/2009 04:58 PM, Pedro Ferreira wrote:
>
>> OOBTrees are complex?
>>
> No, I meant something list-like, rather than key-value.

zc.blist implements a BTree-based list.
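
To illustrate the general idea of a list broken into smaller pieces, here is a minimal sketch in plain Python. It is not how zc.blist is implemented, and the `ChunkedList` class and its `chunk_size` parameter are hypothetical names made up for this example. The point is only that, if each chunk were stored as its own database record, an append would rewrite one small chunk rather than the whole sequence.

```python
class ChunkedList:
    """Toy list split into fixed-size chunks (illustration only)."""

    def __init__(self, chunk_size=100):
        self.chunk_size = chunk_size
        self.chunks = [[]]

    def append(self, item):
        # Start a new chunk once the last one is full.
        if len(self.chunks[-1]) >= self.chunk_size:
            self.chunks.append([])
        self.chunks[-1].append(item)
        return len(self.chunks) - 1  # index of the only chunk that changed

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Every chunk except the last is full, so division finds the chunk.
        return self.chunks[index // self.chunk_size][index % self.chunk_size]


items = ChunkedList(chunk_size=100)
for n in range(250):
    touched = items.append(n)

print(len(items), len(items.chunks), touched, items[249])
```

Integer indexing stays cheap here only because the list is append-only; supporting inserts in the middle is where a tree-based structure earns its keep.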

-- 
Christian Theune · c...@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 0 · fax +49 345 1229889 1
Zope and Plone consulting and development



Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Marius Gedminas
On Wed, Dec 09, 2009 at 02:42:58PM +0100, Pedro Ferreira wrote:
> Hello,
> > Just zodbbrowser with no prefix:
> >
> >   http://pypi.python.org/pypi/zodbbrowser
> >   https://launchpad.net/zodbbrowser
> >
> > It's a web-app: it can connect to your ZEO server so you can inspect the
> > DB while it's being used.
> >   
> We tried this, but we currently get an error related with the general 
> security policy for zope.app. Maybe we need to install Zope?

No, it should be self-contained.

I'm usually testing with the Zope 3.4 KGS.  Easiest way to do that is to
check out the sources and use buildout:

  bzr get lp:zodbbrowser
  cd zodbbrowser
  python bootstrap.py
  bin/buildout
  bin/zodbbrowser --zeo localhost:1234 

I'll test it with the latest zope.* packages from PyPI and see if I can
reproduce the error.  Feel free to report a bug (and attach the
traceback you get) here: https://bugs.launchpad.net/zodbbrowser

> This would be a very handy tool.
> > I'd suggest dumping the last few transactions with one of the ZODB
> > scripts (fsdump.py perhaps) and seeing what objects get modified.
> >   
> That's what we've being doing, and we got some clues. We've modified 
> Jim's script in order to find out which OIDs are being rewritten, and 
> how much space they are taking, and this is a fragment of it:
> 
> OID class_name total_size percent_size n_pickles min_size avg_size max_size
> '\x00\x00\x00\x00%T\x89{' BTrees.OOBTree.OOBucket 17402831841 30% 8683 1977885 2004241 2026518
> '\x00\x00\x00\x00%T\x89|' BTrees.OOBTree.OOBucket 14204430890 24% 8683 1616904 1635889 1651956
> '\x00\x00\x00\x00\x04dUH' MaKaC.common.indexes.StatusIndex 11955954522 20% 28513 418230 419315 420294
> '\x00\x00\x00\x00%\xa0%\x7f' BTrees.OOBTree.OOBucket 3532998238 6% 11238 307112 314379 320647
> '\x00\x00\x00\x00%\xa0%\x80' BTrees.OOBTree.OOBucket 2193843302 3% 11238 190816 195216 199007
> '\x00\x00\x00\x00\x04\x8e\xb6\x04' BTrees.OOBTree.OOBucket 1728216003 3% 1953 880615 884903 887285
> [...]
> 
> As you can see, we have an OOBucket occupying more than 2MB (!) per 
> write.

Ouch.

> That's almost 17GB only considering the last 1M transactions of 
> the DB (we get ~3M transactions per week). We believe this bucket 
> belongs to some OOBTree-based index that we are using, whose values are 
> Python lists (maybe that was a bad choice to start with?). In any case, 
> how do OOBuckets work? Is it a simple key space segmentation strategy, 
> or are the values taken into account as well?
> Our theory is that an OOBTree simply divides the N keys in K buckets, 
> and doesn't care about the contents. So, since we are adding very large 
> lists as values, the tree remains unbalanced, and since new contents 
> will be added to this last bucket, each rewrite will imply the addition 
> of ~2MB to the file storage.
> Will the replacement of these lists with a persistent structure such as 
> a PersistentList solve the issue?

It should definitely help.  Right now, if you modify one of the lists in a
bucket, ZODB needs to append the whole 2 megs of bucket data to the Data.fs.
If you'd used a PersistentList, it would only need to append as much data as
that one persistent list contains.  Assuming there are ~30 lists of similar
size in a bucket, you'd get roughly 30x space savings.
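
A rough way to see the effect, using the standard-library pickle module as a stand-in for ZODB's record serialization. This is a sketch under assumptions: the 30-entry dict and the 2000-item lists are made up for illustration, and real bucket records are laid out differently. What it does model is that plain, non-persistent values are pickled inline with their container, while a persistent object gets a record of its own.

```python
import pickle

# Toy model of one bucket holding 30 keys whose values are plain lists.
# Plain values are pickled inline, so the record is roughly the sum of
# all 30 lists.
bucket = {day: list(range(2000)) for day in range(30)}
inline_record = len(pickle.dumps(bucket, protocol=2))

# If each list were its own record (as with a persistent object), changing
# one list would rewrite only that list's record.
separate_record = len(pickle.dumps(bucket[0], protocol=2))

print("whole bucket rewritten per change:", inline_record, "bytes")
print("single list rewritten per change: ", separate_record, "bytes")
print("ratio: about", round(inline_record / float(separate_record)))
```

The ratio tracks the number of similarly sized lists sharing the bucket, which is where the 30x estimate comes from.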

Are those lists long?  Are they modified often?  What do they contain?
I'm sure you can get better space efficiency by redesigning the data
structure.

Marius Gedminas
-- 
Dijkstra probably hates me
-- Linus Torvalds, in kernel/sched.c




Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Pedro Ferreira

> OOBTrees are complex?
>   
No, I meant something list-like, rather than key-value.


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Andreas Jung
On 09.12.09 16:54, Pedro Ferreira wrote:
> We could use an OOBTree instead... but something less
> complex would suffice... any suggestions?

OOBTrees are complex?

-aj


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Pedro Ferreira
Hello,

> In the case of BTrees, yes.  I assume your OOBuckets are used within
> OOBTrees. (?)

Yes. I was not sure whether OOBTrees behaved exactly like a BTree.

>> So, since we are adding very large
>> lists as values, the tree remains unbalanced,
>
> No, the trees tend to stay fairly well balanced wrt keys.

Well, I mean, they are balanced, but the contents are not distributed in
a homogeneous way... which is the expected behavior, since they don't
care about the values.

>> and since new contents
>> will be added to this last bucket,
>
> Why would the contents only be added to one bucket?

Because we're using this OOBTree as a date index, and each time we add
new contents they're added to the entry for the current day.
Currently, the bucket that contains the current day also contains some
days that themselves contain huge lists of contents. This will probably
change as the buckets split, but we want to correct this once and for all...

>> Will the replacement of these lists with a persistent structure such as
>> a PersistentList solve the issue?
>
> It might help, but if the lists are very large, you'll still have a problem
> because a persistent list is still stored in one database record.

Well, at least we won't have to rewrite the whole bucket... but still,
it would be much nicer to fragment the list into smaller chunks. We could
use an OOBTree instead... but something less complex would suffice... any
suggestions?

We are really considering replacing all our indexes with something more
efficient and ZODB-friendly... do you think zope.catalog would be a good
choice?

Thanks once again,

Pedro


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Jim Fulton
On Wed, Dec 9, 2009 at 8:42 AM, Pedro Ferreira wrote:
...
> We've modified
> Jim's script

Cool. Storage iterators are so simple and allow a wide variety of analyses.
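
For readers who want to try this kind of analysis, here is a minimal sketch of the aggregation step. It is not the script discussed in this thread. The `Record` tuple and `size_stats` function are hypothetical names, and the fake log stands in for a real storage so the example runs without ZODB. The only assumption about the input is that each transaction can be iterated to yield records with `oid` and `data` attributes.

```python
from collections import namedtuple

# Hypothetical stand-in for the data records a storage iterator yields.
Record = namedtuple("Record", "oid data")


def size_stats(transactions):
    """Aggregate pickle sizes per OID.

    Returns {oid: (total_size, n_pickles, min_size, avg_size, max_size)}.
    """
    sizes = {}
    for txn in transactions:
        for rec in txn:
            if rec.data is None:  # some records carry no pickle
                continue
            sizes.setdefault(rec.oid, []).append(len(rec.data))
    return {
        oid: (sum(s), len(s), min(s), sum(s) // len(s), max(s))
        for oid, s in sizes.items()
    }


# Fake log: one object is rewritten in each of three transactions.
log = [
    [Record(b"\x01", b"x" * 100), Record(b"\x02", b"y" * 10)],
    [Record(b"\x01", b"x" * 120)],
    [Record(b"\x01", b"x" * 140), Record(b"\x02", b"y" * 20)],
]
stats = size_stats(log)

# Largest total first, like the table quoted in this thread.
for oid, row in sorted(stats.items(), key=lambda kv: -kv[1][0]):
    print(repr(oid), *row)
```

Against a real database one would presumably pass something like `FileStorage('Data.fs', read_only=True).iterator()` as `transactions`. The columns of the quoted table fit this shape: 17402831841 // 8683 gives 2004241, the reported avg_size of the first row.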

> in order to find out which OIDs are being rewritten, and
> how much space they are taking, and this is a fragment of it:
>
> OID class_name total_size percent_size n_pickles min_size avg_size max_size
> '\x00\x00\x00\x00%T\x89{' BTrees.OOBTree.OOBucket 17402831841 30% 8683 1977885 2004241 2026518
> '\x00\x00\x00\x00%T\x89|' BTrees.OOBTree.OOBucket 14204430890 24% 8683 1616904 1635889 1651956
> '\x00\x00\x00\x00\x04dUH' MaKaC.common.indexes.StatusIndex 11955954522 20% 28513 418230 419315 420294
> '\x00\x00\x00\x00%\xa0%\x7f' BTrees.OOBTree.OOBucket 3532998238 6% 11238 307112 314379 320647
> '\x00\x00\x00\x00%\xa0%\x80' BTrees.OOBTree.OOBucket 2193843302 3% 11238 190816 195216 199007
> '\x00\x00\x00\x00\x04\x8e\xb6\x04' BTrees.OOBTree.OOBucket 1728216003 3% 1953 880615 884903 887285
> [...]
>
> As you can see, we have an OOBucket occupying more than 2MB (!) per
> write. That's almost 17GB only considering the last 1M transactions of
> the DB (we get ~3M transactions per week). We believe this bucket
> belongs to some OOBTree-based index that we are using, whose values are
> Python lists (maybe that was a bad choice to start with?). In any case,
> how do OOBuckets work?

Buckets themselves are essentially just sorted lists of key-value pairs.

> Is it a simple key space segmentation strategy,

In the case of BTrees, yes.  I assume your OOBuckets are used within
OOBTrees. (?)

> or are the values taken into account as well?

No.

> Our theory is that an OOBTree simply divides the N keys in K buckets,
> and doesn't care about the contents.

Right.

> So, since we are adding very large
> lists as values, the tree remains unbalanced,

No, the trees tend to stay fairly well balanced wrt keys.

> and since new contents
> will be added to this last bucket,

Why would the contents only be added to one bucket?


> each rewrite will imply the addition
> of ~2MB to the file storage.

That's definitely a problem.

> Will the replacement of these lists with a persistent structure such as
> a PersistentList solve the issue?

It might help, but if the lists are very large, you'll still have a problem
because a persistent list is still stored in one database record.

Jim

-- 
Jim Fulton


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Laurence Rowe
2009/12/9 Pedro Ferreira:
> Hello,
>> Just zodbbrowser with no prefix:
>>
>>   http://pypi.python.org/pypi/zodbbrowser
>>   https://launchpad.net/zodbbrowser
>>
>> It's a web-app: it can connect to your ZEO server so you can inspect the
>> DB while it's being used.
>>
> We tried this, but we currently get an error related with the general
> security policy for zope.app. Maybe we need to install Zope?
> This would be a very handy tool.
>> I'd suggest dumping the last few transactions with one of the ZODB
>> scripts (fsdump.py perhaps) and seeing what objects get modified.
>>
> That's what we've being doing, and we got some clues. We've modified
> Jim's script in order to find out which OIDs are being rewritten, and
> how much space they are taking, and this is a fragment of it:
>
> OID class_name total_size percent_size n_pickles min_size avg_size max_size
> '\x00\x00\x00\x00%T\x89{' BTrees.OOBTree.OOBucket 17402831841 30% 8683 1977885 2004241 2026518
> '\x00\x00\x00\x00%T\x89|' BTrees.OOBTree.OOBucket 14204430890 24% 8683 1616904 1635889 1651956
> '\x00\x00\x00\x00\x04dUH' MaKaC.common.indexes.StatusIndex 11955954522 20% 28513 418230 419315 420294
> '\x00\x00\x00\x00%\xa0%\x7f' BTrees.OOBTree.OOBucket 3532998238 6% 11238 307112 314379 320647
> '\x00\x00\x00\x00%\xa0%\x80' BTrees.OOBTree.OOBucket 2193843302 3% 11238 190816 195216 199007
> '\x00\x00\x00\x00\x04\x8e\xb6\x04' BTrees.OOBTree.OOBucket 1728216003 3% 1953 880615 884903 887285
> [...]
>
> As you can see, we have an OOBucket occupying more than 2MB (!) per
> write. That's almost 17GB only considering the last 1M transactions of
> the DB (we get ~3M transactions per week). We believe this bucket
> belongs to some OOBTree-based index that we are using, whose values are
> Python lists (maybe that was a bad choice to start with?). In any case,
> how do OOBuckets work? Is it a simple key space segmentation strategy,
> or are the values taken into account as well?
> Our theory is that an OOBTree simply divides the N keys in K buckets,
> and doesn't care about the contents. So, since we are adding very large
> lists as values, the tree remains unbalanced, and since new contents
> will be added to this last bucket, each rewrite will imply the addition
> of ~2MB to the file storage.

BTree buckets have no concept of the size of their contents; they
split when their number of keys reaches a threshold (30 for OOBTrees).
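
A toy model of that splitting rule, in plain Python. This is not the real BTrees code: the `insert` helper and the list-of-lists layout are hypothetical, and only the threshold of 30 comes from the text above. It shows that the bucket layout depends on the keys alone, and that keys arriving in increasing order (as in a date index) always land in the last bucket.

```python
import bisect

MAX_KEYS = 30  # split threshold quoted above for OOBTree buckets


def insert(buckets, key, value):
    """Insert (key, value) into a list of sorted buckets.

    A bucket is a sorted list of (key, value) pairs and splits on key
    count alone.
    """
    # First key of every bucket except the first; those are never empty.
    firsts = [b[0][0] for b in buckets[1:]]
    i = bisect.bisect_right(firsts, key)
    bucket = buckets[i]
    keys = [k for k, _ in bucket]
    pos = bisect.bisect_left(keys, key)
    if pos < len(bucket) and bucket[pos][0] == key:
        bucket[pos] = (key, value)        # existing key: replace the value
    else:
        bucket.insert(pos, (key, value))
    if len(bucket) > MAX_KEYS:            # value sizes are never consulted
        mid = len(bucket) // 2
        buckets[i:i + 1] = [bucket[:mid], bucket[mid:]]


small = [[]]
large = [[]]
for day in range(100):
    insert(small, day, [0])               # tiny values
    insert(large, day, [0] * 10000)       # huge values, same keys

print([len(b) for b in small])
print([len(b) for b in large])
```

Both runs print the same bucket sizes, so a bucket full of multi-megabyte values is split no earlier than one full of tiny ones.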

> Will the replacement of these lists with a persistent structure such as
> a PersistentList solve the issue?

The list would then be stored as a separate persistent object, so
changes to the bucket would not rewrite the entire list object. The
downside of this is that your application may become slower as reading
the contents of the index will incur additional object loads. Zope2's
ZCatalog stores index data as tuples in BTrees, but only a small
amount of metadata is stored (so the buckets are maybe 30-60KB). It
sounds like you are storing a large amount of metadata in the index,
or perhaps inadvertently indexing something. I've seen similar
problems caused by binary data ending up in a text index (where a
'word' ended up being several megabytes). Load the object to check whether
the problem is large values rather than large keys.

Laurence


Re: [ZODB-Dev] Data.fs size grows non-stop

2009-12-09 Thread Pedro Ferreira
Hello,
> Just zodbbrowser with no prefix:
>
>   http://pypi.python.org/pypi/zodbbrowser
>   https://launchpad.net/zodbbrowser
>
> It's a web-app: it can connect to your ZEO server so you can inspect the
> DB while it's being used.
>   
We tried this, but we currently get an error related to the general
security policy for zope.app. Maybe we need to install Zope?
This would be a very handy tool.
> I'd suggest dumping the last few transactions with one of the ZODB
> scripts (fsdump.py perhaps) and seeing what objects get modified.
>   
That's what we've been doing, and we got some clues. We've modified
Jim's script in order to find out which OIDs are being rewritten, and
how much space they are taking, and this is a fragment of the output:

OID class_name total_size percent_size n_pickles min_size avg_size max_size
'\x00\x00\x00\x00%T\x89{' BTrees.OOBTree.OOBucket 17402831841 30% 8683 1977885 2004241 2026518
'\x00\x00\x00\x00%T\x89|' BTrees.OOBTree.OOBucket 14204430890 24% 8683 1616904 1635889 1651956
'\x00\x00\x00\x00\x04dUH' MaKaC.common.indexes.StatusIndex 11955954522 20% 28513 418230 419315 420294
'\x00\x00\x00\x00%\xa0%\x7f' BTrees.OOBTree.OOBucket 3532998238 6% 11238 307112 314379 320647
'\x00\x00\x00\x00%\xa0%\x80' BTrees.OOBTree.OOBucket 2193843302 3% 11238 190816 195216 199007
'\x00\x00\x00\x00\x04\x8e\xb6\x04' BTrees.OOBTree.OOBucket 1728216003 3% 1953 880615 884903 887285
[...]

As you can see, we have an OOBucket occupying more than 2MB (!) per 
write. That's almost 17GB only considering the last 1M transactions of 
the DB (we get ~3M transactions per week). We believe this bucket 
belongs to some OOBTree-based index that we are using, whose values are 
Python lists (maybe that was a bad choice to start with?). In any case, 
how do OOBuckets work? Is it a simple key space segmentation strategy, 
or are the values taken into account as well?
Our theory is that an OOBTree simply divides the N keys into K buckets,
and doesn't care about the contents. So, since we are adding very large
lists as values, the tree remains unbalanced, and since new contents 
will be added to this last bucket, each rewrite will imply the addition 
of ~2MB to the file storage.
Will the replacement of these lists with a persistent structure such as 
a PersistentList solve the issue?

Thanks for all the help everyone,

Regards,

Pedro