Re: [Zope-dev] 27 million objects.

2001-04-27 Thread Erik Enge

On Thu, 26 Apr 2001, Dieter Maurer wrote:

 Are the imported objects CatalogAware?

They are.

 The old (pre 2.3.1) catalog implementation was known not to be very
 storage friendly. If a significant portion of the catalog indexes
 would be affected by imports, then you would see a quadratic storage
 increase.

I'm using Zope 2.3.1b1 so that shouldn't be a problem?


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] 27 million objects.

2001-04-27 Thread Erik Enge

On Thu, 26 Apr 2001, Chris McDonough wrote:

 This level of growth doesn't seem like a sane level of growth... what
 Zope version are you using?

Zope 2.3.1b1 
 
  Someone told me that ZEO and bulk-adding could be a thing to look at...
 
 Isn't bulk-adding what you're doing now?

It is, but I'm not using ZEO.





Re: [Zope-dev] 27 million objects.

2001-04-27 Thread Chris Withers

Erik Enge wrote:
 
  The old (pre 2.3.1) catalog implementation was known not to be very
  storage friendly. If a significant portion of the catalog indexes
  would be affected by imports, then you would see a quadratic storage
  increase.
 
 I'm using Zope 2.3.1b1 so that shouldn't be a problem?

Yes, it will be. I'd go for 2.3.2b2 and make sure you run the catalog
updater.
I'm just about to upload a product which will update all the ZCatalogs in a
ZODB.

cheers,

Chris




Re: [Zope-dev] 27 million objects.

2001-04-27 Thread Erik Enge

On Fri, 27 Apr 2001, Chris Withers wrote:

 Erik Enge wrote:
  
  I'm using Zope 2.3.1b1 so that shouldn't be a problem?
 
 Yes, it will be. [...]

So the bug in Zope 2.3.1b1 which makes the ZODB grow dramatically is
gone in Zope 2.3.2b2?






Re: [Zope-dev] 27 million objects.

2001-04-27 Thread Chris Withers

Erik Enge wrote:
 
 On Fri, 27 Apr 2001, Chris Withers wrote:
 
  Erik Enge wrote:
  
   I'm using Zope 2.3.1b1 so that shouldn't be a problem?
 
  Yes, it will be. [...]
 
 So the bug in Zope 2.3.1b1 which makes the ZODB grow dramatically is
 gone in Zope 2.3.2b2?

Not so much a bug as a complete rewrite of the data structures used by ZCatalog.
After the amount of work Chris M and Jim F put in, I don't think they'd be happy
having it described as 'just a bug' ;-)

cheers,

Chris




Re: [Zope-dev] 27 million objects.

2001-04-27 Thread Chris McDonough



 Erik Enge wrote:
 
  On Fri, 27 Apr 2001, Chris Withers wrote:
 
   Erik Enge wrote:
   
    I'm using Zope 2.3.1b1 so that shouldn't be a problem?
  
   Yes, it will be. [...]
 
  So the bug in Zope 2.3.1b1 which makes the ZODB grow dramatically is
  gone in Zope 2.3.2b2?

 Not so much a bug as a complete rewrite of the data structures used by
 ZCatalog. After the amount of work Chris M and Jim F put in, I don't think
 they'd be happy having it described as 'just a bug' ;-)

Bug is fine.  ;-)

Here's a FAQ about it:

http://www.zope.org/Members/mcdonc/HowTos/UpgradeToNewCatalog/index_html





Re: [Zope-dev] 27 million objects.

2001-04-26 Thread Erik Enge

On Thu, 5 Apr 2001, Michael R. Bernstein wrote:

 I'm trying to find out if there is a point where you start getting
 non-linear performance penalties for additional objects (storing,
 retrieving, or indexing).

I've just finished adding a somewhat small number of objects: 5000.
For every 1000th object, the Data.fs seemed to grow to about 900MB; that's
when things started going slow, in a non-linear fashion (this is more a
hunch than something I paid much attention to).

I paused the script (fancy Unix-command: ^Z) for every 1000th object,
packed the database (which shrunk to 19.5MB!  Hmpf.) and restarted the
script (again, fancy Unix-command: fg).  Then I was back to the same
speed as I initially had.

Does ZODB have a problem with big Data.fs files?  Not that I
know.  However, I do have a really fast SCSI-subsystem here so that
shouldn't be a big problem either.

I did some copying around with a couple of gigs, and it seems that my
hunch is right: ZODB does not have a problem with big Data.fs files, the
hardware does.

This could be caused indirectly by ZODB if it does too many operations on
the file, but I'm not too concerned about that.  I.e. a solution could be
to have ZODB play around with the Data.fs at a less frequent pace, or do
it in another fashion.  However, that's not really solving any problems,
unless ZODB is a total maniac with the filesystem.

I'm converting to ReiserFS this afternoon, maybe that will improve things
a bit.

Someone told me that ZEO and bulk-adding could be a thing to look at...





Re: [Zope-dev] 27 million objects.

2001-04-26 Thread Chris McDonough

Erik Enge wrote:
 
 On Thu, 5 Apr 2001, Michael R. Bernstein wrote:
 
  I'm trying to find out if there is a point where you start getting
  non-linear performance penalties for additional objects (storing,
  retrieving, or indexing).
 
 I've just finished adding a somewhat small number of objects: 5000.
 For every 1000th object, the Data.fs seemed to grow to about 900MB; that's
 when things started going slow, in a non-linear fashion (this is more a
 hunch than something I paid much attention to).
 
 I paused the script (fancy Unix-command: ^Z) for every 1000th object,
 packed the database (which shrunk to 19.5MB!  Hmpf.) and restarted the
 script (again, fancy Unix-command: fg).  Then I was back to the same
 speed as I initially had.

This level of growth doesn't seem like a sane level of growth... what
Zope version are you using?


 
 Does ZODB have a problem with big Data.fs files?  Not that I
 know.  However, I do have a really fast SCSI-subsystem here so that
 shouldn't be a big problem either.
 
 I did some copying around with a couple of gigs, and it seems that my
 hunch is right: ZODB does not have a problem with big Data.fs files, the
 hardware does.
 
 This could be caused indirectly by ZODB if it does too many operations on
 the file, but I'm not too concerned about that.  I.e. a solution could be
 to have ZODB play around with the Data.fs at a less frequent pace, or do
 it in another fashion.  However, that's not really solving any problems,
 unless ZODB is a total maniac with the filesystem.
 
 I'm converting to ReiserFS this afternoon, maybe that will improve things
 a bit.
 
 Someone told me that ZEO and bulk-adding could be a thing to look at...

Isn't bulk-adding what you're doing now?




Re: [Zope-dev] 27 million objects.

2001-04-26 Thread Dieter Maurer

Erik Enge writes:
  I've just finished adding a somewhat small number of objects: 5000.
  For every 1000th object, the Data.fs seemed to grow to about 900MB; that's
  when things started going slow, in a non-linear fashion (this is more a
  hunch than something I paid much attention to).
  
  I paused the script (fancy Unix-command: ^Z) for every 1000th object,
  packed the database (which shrunk to 19.5MB!  Hmpf.) and restarted the
  script (again, fancy Unix-command: fg).  Then I was back to the same
  speed as I initially had.
Are the imported objects CatalogAware?
This would imply that each object's import rewrites part of the catalog
indexes and lets your ZODB grow drastically.

The old (pre 2.3.1) catalog implementation was known not to
be very storage friendly. If a significant portion of the
catalog indexes would be affected by imports, then you
would see a quadratic storage increase.

The new BTree implementation in 2.3.1 might reduce the problem.
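
A toy model of the effect described above, assuming only that the storage
appends a new record on every write and discards superseded revisions when
packed. It is not the ZODB API: the AppendOnlyStore class, its method names
and the byte counts are invented for illustration. Each step stores one
small object and re-stores one index record that grows with the number of
objects, which is the situation a CatalogAware import would create if a
significant portion of the index were rewritten every time.

```python
class AppendOnlyStore:
    """Toy append-only store (hypothetical; not the ZODB API)."""

    def __init__(self):
        # (oid, data) records; existing records are never overwritten.
        self.log = []

    def store(self, oid, data):
        # Every write appends a new revision, even for an existing oid.
        self.log.append((oid, data))

    def size(self):
        # Total bytes held, counting every revision still in the log.
        return sum(len(data) for _, data in self.log)

    def pack(self):
        # Keep only the most recent revision of each oid.
        latest = {}
        for oid, data in self.log:
            latest[oid] = data
        self.log = list(latest.items())


store = AppendOnlyStore()
for k in range(1, 101):
    # One small new object per step ...
    store.store("obj%d" % k, b"x" * 10)
    # ... plus a rewrite of one index record that grows with the object count.
    store.store("index", b"i" * k)

size_before = store.size()
store.pack()
size_after = store.size()
print(size_before, size_after)
```

In this model the unpacked size grows with the square of the number of
objects, while the packed size grows only linearly, which is at least
consistent with a large file that shrinks drastically on pack.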


Dieter




Re: [Zope-dev] 27 million objects.

2001-04-09 Thread Erik Enge

On Thu, 5 Apr 2001, Michael R. Bernstein wrote:

 I'm trying to find out if there is a point where you start getting
 non-linear performance penalties for additional objects (storing,
 retrieving, or indexing).

I don't know, but I feel that is the case.  Actually, I know it is the
case, but I don't know what is causing it.  I know what isn't helping
though; CatalogAwareness.  I added 2000 objects with XML-RPC.  No other
queries were done during that period.  For each object about 70 DTML
Method/Documents were added.  The first couple of hundred went at a pace
of 2-3 seconds per object.  After that it started to get real slow, and
when I reached about 500 I was down to 5 seconds per object.  I killed
that script, rewrote it to only add 20-25 DTML Methods/Documents and
removed the CatalogAwareness and whoosh!  Under 1 second for each object
and it stayed like that for the entire 2000 objects.

The server is a 1GHz thingy with 1GB RAM.  It wasn't working too hard - it
seemed.
 
 Meanwhile Erik, what approach *did* your programmer take?

Well, the obviously more correct one. :)  He just made the files (that I
was going to index in a Catalog) stay on the filesystem and wrote some
nice regexps to do the searching I thought I needed the speed of the
Catalog to do (yeah, yeah, I'm a rookie).
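
A minimal sketch of that kind of approach, assuming plain text files under
one directory and a line-by-line regular expression scan. The function name
grep_files, the "*.txt" file layout and the returned tuple shape are all
assumptions made for the example; the actual code is not shown in this
thread.

```python
import re
from pathlib import Path


def grep_files(root, pattern, glob="*.txt"):
    """Return (file name, line number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    # Walk the directory tree in a stable order so results are repeatable.
    for path in sorted(Path(root).rglob(glob)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((path.name, line_no, line.rstrip("\n")))
    return hits
```

Every search reads every file, so this only pays off when the data set is
small enough to scan or when searches are rare compared to writes.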

Thanks Jim! :)  
 
 I'll look forward to it.

Ok, and you know what to do if you haven't heard from me and the year is
not 2001 any more ;)





Re: [Zope-dev] 27 million objects.

2001-04-09 Thread Chris Withers

Zopista wrote:
 
 Hear, hear. The cost of the incremental cataloguing is horrific. One of our
 developers when we started with Zope a year ago wrote a great script that
 imported a bunch, restarted zope, packed it, restarted it, imported a bunch
 more to get optimal performance. We don't do it like that any more ;)

Doesn't the new catalog/btree implementation fix this?

cheers,

Chris




Re: [Zope-dev] 27 million objects.

2001-04-09 Thread Andy McKay

As I said this was a year ago... but still incremental cataloging is very
expensive.
--
  Andy McKay.


- Original Message -
From: "Chris Withers" [EMAIL PROTECTED]
To: "Zopista" [EMAIL PROTECTED]
Cc: "Erik Enge" [EMAIL PROTECTED]; "Michael R. Bernstein"
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, April 09, 2001 7:31 AM
Subject: Re: [Zope-dev] 27 million objects.


 Zopista wrote:
 
  Hear, hear. The cost of the incremental cataloguing is horrific. One of
  our developers when we started with Zope a year ago wrote a great script
  that imported a bunch, restarted zope, packed it, restarted it, imported a
  bunch more to get optimal performance. We don't do it like that any more ;)

 Doesn't the new catalog/btree implementation fix this?

 cheers,

 Chris




Re: [Zope-dev] 27 million objects.

2001-04-09 Thread Andy McKay

Any cataloguing and un-cataloguing of an object is expensive, c'mon you are
changing all the indices, vocabulary and so on. You never notice it normally
for 1 - 10 things, but run an import script of 1 and catalog each object
as it gets added (rather than all of them at the end) and you'll notice the
difference. (This script was cataloguing 250,000 mail messages, one at a
time. Big no-no)
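
A rough sketch of why the two orders can differ so much, using a toy
inverted index rather than ZCatalog. The ToyIndex class and its methods are
invented for the example, and it assumes that storing a posting list
re-serialises the whole list, and that "all of them at the end" means
collecting the changes in memory and storing each touched list once.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index that counts posting-list entries written out."""

    def __init__(self):
        self.postings = defaultdict(list)  # word -> list of document ids
        self.entries_written = 0

    def _store(self, word):
        # Assumption: storing a posting list rewrites every entry in it.
        self.entries_written += len(self.postings[word])

    def catalog_one(self, doc_id, words):
        # Incremental: store each touched posting list straight away.
        for word in set(words):
            self.postings[word].append(doc_id)
            self._store(word)

    def catalog_many(self, docs):
        # Batch: update in memory first, then store each touched list once.
        touched = set()
        for doc_id, words in docs:
            for word in set(words):
                self.postings[word].append(doc_id)
                touched.add(word)
        for word in touched:
            self._store(word)


docs = [(i, ["subject", "mail"]) for i in range(1000)]

incremental = ToyIndex()
for doc_id, words in docs:
    incremental.catalog_one(doc_id, words)

batch = ToyIndex()
batch.catalog_many(docs)

print(incremental.entries_written, batch.entries_written)
```

Both indexes end up identical; only the amount of rewriting differs. How
close a real catalog comes to either extreme depends on how its index data
structures are split into separately stored pieces.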
--
  Andy McKay.


- Original Message -
From: "Chris Withers" [EMAIL PROTECTED]
To: "Andy McKay" [EMAIL PROTECTED]
Cc: "Erik Enge" [EMAIL PROTECTED]; "Michael R. Bernstein"
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, April 09, 2001 10:52 AM
Subject: Re: [Zope-dev] 27 million objects.


 Andy McKay wrote:
 
  As I said this was a year ago... but still incremental cataloging is
  very expensive.

 How come? I always thought this was one of Zope's strong points as opposed
 to, say, Lotus Notes' batch view building paradigm...

 cheers,

 Chris






Re: [Zope-dev] 27 million objects.

2001-04-09 Thread Chris Withers

Andy McKay wrote:
 
 Any cataloguing and un-cataloguing of an object is expensive, c'mon you are
 changing all the indices, vocabulary and so on. 

Yup...

 You never notice it normally
 for 1 - 10 things, but run an import script of 1 and catalog each object
 as it gets added (rather than all of them at the end) and you'll notice the
 difference. (This script was cataloguing 250,000 mail messages, one at a
 time. Big no-no)

Hmmm... I see your point now, and I think it's along the lines of "indexing
250,000 objects is slow, it's just when you take the performance hit".
Correct?

If so, then I don't really understand:
 Hear, hear. The cost of the incremental cataloguing is horrific.

How does this cost differ from non-incremental cataloguing?

Erm... actually... what is non-incremental cataloguing and how do you do it?

cheers,

Chris




Re: [Zope-dev] 27 million objects.

2001-04-09 Thread Michael R. Bernstein

Andy McKay wrote:
 
 Any cataloguing and un-cataloguing of an object is expensive, c'mon you are
 changing all the indices, vocabulary and so on. You never notice it normally
 for 1 - 10 things, but run an import script of 1 and catalog each object
 as it gets added (rather than all of them at the end) and you'll notice the
 difference. (This script was cataloguing 250,000 mail messages, one at a
 time. Big no-no)

Perhaps I expressed myself poorly.

What I am watching out for is evidence that adding,
indexing, reindexing, or retrieving *a single object* (or a
small set of objects), takes longer if there are more
objects stored/indexed already.

In other words, does the time to
store/index/reindex/retrieve an object change (for the
worse) depending on whether there are 10,000 objects,
100,000 objects or 10,000,000 objects stored/cataloged in
the ZODB/ZCatalog?
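
One way to look for such an effect is to time a fixed number of extra
inserts into containers that have been pre-filled to different sizes. The
harness below is only a sketch: time_inserts and its parameters are made-up
names, and a plain dict stands in for the real storage, so the numbers it
prints say nothing about ZODB or ZCatalog themselves.

```python
import time


def time_inserts(make_container, insert, sizes, probe=1000):
    """Return (size, seconds per insert) measured at each pre-filled size."""
    results = []
    for size in sizes:
        container = make_container()
        # Pre-fill the container to the size under test.
        for key in range(size):
            insert(container, key)
        # Time a fixed number of additional inserts at that size.
        start = time.perf_counter()
        for key in range(size, size + probe):
            insert(container, key)
        elapsed = time.perf_counter() - start
        results.append((size, elapsed / probe))
    return results


def set_key(container, key):
    # Stand-in for "store and index one object".
    container[key] = None


timings = time_inserts(dict, set_key, [1_000, 10_000, 100_000])
for size, per_insert in timings:
    print(size, per_insert)
```

If the per-insert time stays roughly flat as the size grows, the cost of
adding a single object is not getting worse; a steady climb would point at
a non-linear penalty worth investigating.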

Previously, the fact that searching performance suffered
depending on a combination of number of total objects and
the size of the result set (irrespective of the batch size,
apparently), came to light, and has apparently been fixed.
Now searching performance scales with the number of
cataloged objects.

So, are there any non-linear gotchas waiting for me?

Michael Bernstein.




Re: [Zope-dev] 27 million objects.

2001-04-09 Thread Andy McKay

 Andy McKay wrote:
 
  Any cataloguing and un-cataloguing of an object is expensive, c'mon you
  are changing all the indices, vocabulary and so on. You never notice it
  normally for 1 - 10 things, but run an import script of 1 and catalog each
  object as it gets added (rather than all of them at the end) and you'll
  notice the difference. (This script was cataloguing 250,000 mail messages,
  one at a time. Big no-no)

 Perhaps I expressed myself poorly.

Yeah I think me and Chris have wandered off the point. Since my experience
with a large catalog was a year ago I will shut up until I have something
valuable to add :)

Cheers.
--
  Andy McKay.





Re: [Zope-dev] 27 million objects.

2001-04-09 Thread Sin Hang Kin

Concerning ZCatalog:

1. Vocabulary searches require a full match; partial, truncated, or fuzzy
searching would be useful for searchers.
2. If we catalog everything in one catalog, then we need to filter the
results whenever we want to search one particular portion of the site.
3. I would like to suggest:

   Put catalogs in branches and have each one catalog only the objects it is
interested in. For example, if we put a catalog in every branch of the tree,
then each one catalogs only that branch.

   Catalogs can be set to propagate their collections to a higher level.
This can be done in the background. A search at a higher level will then
return: the results in that catalog which meet the search specification,
plus the matching results from the lower levels that the higher level does
not yet know about.

With such a setup we can, for example, arrange the catalogs in multiple
layers or have a catalog specialize in one section, with a propagation
hierarchy in which catalogs update their upper-level catalog in batches. The
user can search at one level and get all results from that level and below.

For example, on zope.org we can put a catalog at every document level, which
catalogs one document, plus one mother catalog which combines the catalogs
of every document, and one more level up to handle the whole Zope site. Then
we can search specifically at whatever level we want.

It seems very difficult for all the catalogs in the hierarchy to share their
storage, so there will be some storage inefficiency. But this can provide
the necessary scalability.
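
A small sketch of how such a hierarchy might behave, written as plain Python
rather than as a Zope product. BranchCatalog and its methods are
hypothetical names, the index is a simple word-to-paths mapping, and
propagation is a manual call instead of a background job.

```python
class BranchCatalog:
    """Toy catalog for one branch; may have a parent catalog above it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.index = {}    # word -> set of object paths known at this level
        self.pending = []  # entries not yet propagated to the parent
        if parent is not None:
            parent.children.append(self)

    def _add(self, path, words):
        for word in words:
            self.index.setdefault(word, set()).add(path)

    def catalog(self, path, words):
        # Index locally now; remember the entry for a later batch push.
        self._add(path, words)
        self.pending.append((path, tuple(words)))

    def propagate(self):
        # Batch-push pending entries one level up, then forget them here.
        if self.parent is not None:
            for path, words in self.pending:
                self.parent._add(path, words)
                self.parent.pending.append((path, words))
        self.pending = []

    def search(self, word):
        # Results known at this level, plus anything lower levels hold
        # that has not been propagated up yet.
        results = set(self.index.get(word, ()))
        for child in self.children:
            results |= child.search(word)
        return results


site = BranchCatalog("site")
docs = BranchCatalog("docs", parent=site)
docs.catalog("/docs/howto", ["zope", "catalog"])

found_before = site.search("zope")  # answered via the lower-level catalog
docs.propagate()
found_after = site.search("zope")   # now also present in the site catalog
```

Searching at the lower level never touches the upper index, and the upper
index is only rewritten when a batch is pushed up, at the cost of storing
each entry once per level.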



Rgs,

Kent Sin
-
kentsin.weblogs.com
kentsin.imeme.net





Re: [Zope-dev] 27 million objects.

2001-04-05 Thread Michael R. Bernstein

Erik Enge wrote:
 
 The programmer solving our problems with the post codes has solved it in a
 different way than what I would've done (his method is way superior), so
 we're not ending up adding all addresses as Zope Objects.

Oh well. Does anyone else have any setups that store truly
massive (50k, 100k, 1M, you know, *lots*) numbers of
objects? Preferably stored in a BTree of some sort
(ZPatterns Rack, BTree folder, etc.). The objects can be
simple ZClasses, or almost anything else. I'm trying to find
out if there is a point where you start getting non-linear
performance penalties for additional objects (storing,
retrieving, or indexing).

Meanwhile Erik, what approach *did* your programmer take?

 Therefore, I don't have any benchmark tests available.  We are going to
 transfer some 10GB of data at a later stage though (within a month), and
 that could result in some tests being done - if so, I'll send you an
 email.  :-)

I'll look forward to it.

Cheers,

Michael Bernstein.




Re: [Zope-dev] 27 million objects.

2001-04-05 Thread Andy McKay

I did try that but gave up when it got very unwieldy. Mind you that was
nearly a year ago before btree folders and I knew what the heck I was
actually doing in Zope.

Cheers.
--
  Andy McKay.


- Original Message -
From: "Michael R. Bernstein" [EMAIL PROTECTED]
To: "Erik Enge" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, April 05, 2001 7:24 AM
Subject: Re: [Zope-dev] 27 million objects.


 Erik Enge wrote:
 
  The programmer solving our problems with the post codes has solved it in a
  different way than what I would've done (his method is way superior), so
  we're not ending up adding all addresses as Zope Objects.

 Oh well. Does anyone else have any setups that store truly
 massive (50k, 100k, 1M, you know, *lots*) numbers of
 objects? Preferably stored in a BTree of some sort
 (ZPatterns Rack, BTree folder, etc.). The objects can be
 simple ZClasses, or almost anything else. I'm trying to find
 out if there is a point where you start getting non-linear
 performance penalties for additional objects (storing,
 retrieving, or indexing).

 Meanwhile Erik, what approach *did* your programmer take?

  Therefore, I don't have any benchmark tests available.  We are going to
  transfer some 10GB of data at a later stage though (within a month), and
  that could result in some tests being done - if so, I'll send you an
  email.  :-)

 I'll look forward to it.

 Cheers,

 Michael Bernstein.
