Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Morten W. Petersen
Hedley Roos wrote:
> I've followed this thread with interest since I have a Zope site with
> tens of millions of entries in BTrees. It scales well, but it requires
> many tricks to make it work.
>
> Roche Compaan wrote these great pieces on ZODB, Data.fs size and
> scalability at 
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
> and 
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter
>   

Thanks for those links, interesting stuff.  :-)

> My own in-house product is similar to GoogleAnalytics. I have to use a
> cascading BTree structure (a btree of btrees of btrees) to handle the
> volume. This is because BTrees do slow down the more items they
> contain. This is not a ZODB limitation or flaw - it is just how they
> work.
>   

Something like Google Analytics is something I'd be interested in too.
It wasn't the aim of this thread, but it's been bobbing around in my
head.  Is this something you're thinking of releasing, or is it
"too good/bad to share"?

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no

___
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Peter Bengtsson
For huge inserts like that, have you looked at the more modern
alternatives such as Tokyo Cabinet or MongoDB?
I heard about an experiment to transfer 20 million text blobs into a
Tokyo Cabinet. The first 10 million inserts were superfast but after
that it started to take up to a second to insert each item.
I'm not familiar with how good they are but I know they both have
indexing. And I'm confident they both have good Python APIs.
Or watch Bob Ippolito's PyCon 2009 talk on "Drop ACID".

2009/4/27 Hedley Roos :
> I've followed this thread with interest since I have a Zope site with
> tens of millions of entries in BTrees. It scales well, but it requires
> many tricks to make it work.
>
> Roche Compaan wrote these great pieces on ZODB, Data.fs size and
> scalability at 
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
> and 
> http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter
> .
>
> My own in-house product is similar to GoogleAnalytics. I have to use a
> cascading BTree structure (a btree of btrees of btrees) to handle the
> volume. This is because BTrees do slow down the more items they
> contain. This is not a ZODB limitation or flaw - it is just how they
> work.
>
> My structure allows for fast inserts, but they also allow aggregation
> of data. So if my lowest level of BTrees store hits for a particular
> hour in time then the containing BTree always knows exactly how many
> hits were made in a day. I update all parent BTrees as soon as an item
> is inserted. The cost of this operation is O(1) for every parent.
> These are all details but every single one influenced my design.
>
> What is important is that you cannot just use the ZCatalog to index
> tens of millions of items since every index is a single BTree and will
> thus suffer the larger it gets. So you must roll your own to fit your
> problem domain.
>
> Data warehousing is probably a good idea as well.
>
> My problem domain allows me to defer inserts, so I have a queuerunner
> that commits larger transactions in batches. This is better than lots
> of small writes. This may of course not fit your model.
>
> Familiarize yourself with TreeSets and set operations in Python (union
> etc.) since those tools form the backbone of catalogueing.
>
> Hedley
>



-- 
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Lennart Regebro
On Mon, Apr 27, 2009 at 17:57, Morten W. Petersen  wrote:
> OK.  Well, I'm concerned about how much a database would grow.  I'm
> thinking if I use one BTree for all the entries, would the database
> grow just a little or a lot when you start getting into the millions
> of entries when inserting one small item?

Growth is a problem only if you are going to modify these entries a lot.

> Mm.  Yes, Plone is a bit sluggish, that's why I want to write a purely
> Zope-based app.

Absolutely.

> Mm.  I guess I could be OK with one "index", it being the id/path of
> the object.  However, it would be nice to build for the future and
> include the ability to search all objects.  Maybe a combination of
> the two could work.

Yeah, for full text search you would definitely benefit from the full
text indexes that the catalog has.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Hedley Roos
I've followed this thread with interest since I have a Zope site with
tens of millions of entries in BTrees. It scales well, but it requires
many tricks to make it work.

Roche Compaan wrote these great pieces on ZODB, Data.fs size and
scalability at 
http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
and 
http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter
.

My own in-house product is similar to GoogleAnalytics. I have to use a
cascading BTree structure (a btree of btrees of btrees) to handle the
volume. This is because BTrees do slow down the more items they
contain. This is not a ZODB limitation or flaw - it is just how they
work.

My structure allows for fast inserts, but it also allows aggregation
of data. So if my lowest level of BTrees stores hits for a particular
hour in time then the containing BTree always knows exactly how many
hits were made in a day. I update all parent BTrees as soon as an item
is inserted. The cost of this operation is O(1) for every parent.
These are all details but every single one influenced my design.
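
A minimal sketch of the cascading idea described above, not the actual
in-house product. The Bucket class, record_hit() and the day/hour keys
are hypothetical names chosen for illustration. It assumes the BTrees
package is available and falls back to a plain dict so it still runs
without it. In a real ZODB application the nodes would presumably be
persistent objects.

```python
try:
    from BTrees.OOBTree import OOBTree  # real BTree, if the package is installed
except ImportError:
    OOBTree = dict  # assumption: plain dict stand-in so the sketch still runs


class Bucket:
    """One level of the cascade: a running total plus its children."""

    def __init__(self):
        self.count = 0
        self.children = OOBTree()


def record_hit(root, day, hour, hit_id, payload):
    """Insert one hit under root -> day -> hour and bump every parent count."""
    day_node = root.children.get(day)
    if day_node is None:
        day_node = root.children[day] = Bucket()
    hour_node = day_node.children.get(hour)
    if hour_node is None:
        hour_node = day_node.children[hour] = Bucket()
    hour_node.children[hit_id] = payload
    # One constant-time counter update per parent level.
    for node in (hour_node, day_node, root):
        node.count += 1


root = Bucket()
record_hit(root, "2009-04-27", 17, 1, "/index.html")
record_hit(root, "2009-04-27", 17, 2, "/about")
record_hit(root, "2009-04-27", 18, 3, "/index.html")
record_hit(root, "2009-04-28", 9, 4, "/contact")
print(root.children["2009-04-27"].count)  # hits for the whole day
```

Reading the total for a day is then a single attribute lookup on the
day-level node rather than a walk over all the hits below it.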

What is important is that you cannot just use the ZCatalog to index
tens of millions of items since every index is a single BTree and will
thus suffer the larger it gets. So you must roll your own to fit your
problem domain.

Data warehousing is probably a good idea as well.

My problem domain allows me to defer inserts, so I have a queuerunner
that commits larger transactions in batches. This is better than lots
of small writes. This may of course not fit your model.
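
A rough sketch of the deferred-insert idea, assuming a hypothetical
BatchingQueue class and an apply_batch callback that are not part of
Zope. In a ZODB setting the callback would do the inserts and then
commit once for the whole batch. Here it just appends to a list so the
example is self-contained.

```python
class BatchingQueue:
    """Collect deferred inserts and hand them over in large batches."""

    def __init__(self, apply_batch, batch_size=1000):
        self._apply_batch = apply_batch  # called once per batch
        self._batch_size = batch_size
        self._pending = []

    def put(self, item):
        """Queue one item; write out a batch once enough have piled up."""
        self._pending.append(item)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Write out whatever is pending as a single batch."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._apply_batch(batch)


committed = []  # stands in for the database; one entry per "transaction"
queue = BatchingQueue(committed.append, batch_size=3)
for hit in range(7):
    queue.put(hit)
queue.flush()  # push the final partial batch
print([len(batch) for batch in committed])  # prints [3, 3, 1]
```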

Familiarize yourself with TreeSets and set operations in Python (union
etc.) since those tools form the backbone of catalogueing.
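
As a small illustration of those set operations, here is a sketch using
IITreeSet, union and intersection from BTrees.IIBTree, with a fallback
to built-in sets if the package is not installed. The index names and
record ids are made up for the example.

```python
try:
    from BTrees.IIBTree import IITreeSet, intersection, union
except ImportError:
    # Assumption: built-in sets as a stand-in when BTrees is not installed.
    IITreeSet = set

    def intersection(a, b):
        return a & b

    def union(a, b):
        return a | b

# Hypothetical index contents: each entry is the set of matching record ids.
by_country_no = IITreeSet([1, 2, 5, 8])
by_country_se = IITreeSet([3, 4, 9])
by_browser_firefox = IITreeSet([2, 3, 5, 7])

# "country is NO or SE" AND "browser is Firefox"
nordic = union(by_country_no, by_country_se)
result = sorted(intersection(nordic, by_browser_firefox))
print(result)  # prints [2, 3, 5]
```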

Hedley


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Morten W. Petersen
Peter Bengtsson wrote:
> From experience I find that BTrees are very fast to write to and pick
> out items from. Even in the millions. (Never gone into the tens of
> millions or further)
> Also, when it comes to browsing stuff I find SQL faster and easier to
> work with. An added advantage of a RDBMS is that you get the indexing
> seamlessly built in (no need to bridge zbrain.getObject()) and it
> makes it easier to optimize and figure out which indexes help and
> which indexes slow you down which is something that is far from
> obvious with a ZCatalog approach.
>   

Right.  But wouldn't profiling indexes in Zope be as easy as wrapping the
index search method in a function that does time.time before and after
the search?  :-)
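
A minimal sketch of that wrapping idea. The search_index() function is
a hypothetical stand-in for an index's search method, and timed() is a
helper made up for this example. Neither is a Zope API.

```python
import time
from functools import wraps


def timed(func, log):
    """Wrap func so each call appends (name, elapsed seconds) to log."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the timing even if the search raises.
            log.append((func.__name__, time.time() - start))
    return wrapper


def search_index(term):
    """Hypothetical stand-in for an index's search method."""
    return [doc for doc in ("zope", "zodb", "plone") if term in doc]


timings = []
search_index = timed(search_index, timings)
print(search_index("zo"), timings)
```

For very short searches time.perf_counter() is usually the better
clock, since time.time() can have coarse resolution on some platforms.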

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no



Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Morten W. Petersen
Lennart Regebro wrote:
> On Sat, Apr 25, 2009 at 13:24, Morten W. Petersen wrote:
>   
>> So far, I've been contemplating disabling undo (if that's possible),
>> 
>
> I doubt that it would make a difference. The Undo functionality comes
> out of the database being log-based, and changing that would mean
> pretty much a complete rewrite.
>   

OK.  Well, I'm concerned about how much a database would grow.  I'm
thinking if I use one BTree for all the entries, would the database
grow just a little or a lot when you start getting into the millions
of entries when inserting one small item?

>> and using BTree structures, maybe segmenting objects into different groups
>> (folders) to further speed up lookups.
>> 
>
> Yes, in my experience putting small objects into BTree structures is
> quite fast. You may be talking about BTreeFolders, and in that case I
> don't know, I haven't done any sort of performance testing on those, I
> have used BTrees directly though, and that was fast. I haven't
> compared to SQL, but others have, and ZODB itself seems according to
> those tests quite fast. We know Plone slows everything down immensely
> in any case.
>
> I don't know if BTrees get slow when they get very big, so you would
> need to test that.
>   

Mm.  Yes, Plone is a bit sluggish, that's why I want to write a purely
Zope-based app.

Yeah, I'll have to try different storage strategies in the ZODB, to see
if a BTreeFolder containing BTrees in the [0-9|A-Z|a-z] ranges would do,
or if I need to partition it up further with BTreeFolders containing
BTreeFolders.
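
A rough sketch of that first-character partitioning, using plain dicts
as stand-ins for the BTreeFolder and BTrees. PartitionedStore and
partition_for() are hypothetical names, and the "_" partition for ids
that start with any other character is an assumption of this example.
Whether the split actually helps would have to be measured.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def partition_for(obj_id):
    """Pick the partition key: the first character, or '_' for anything else."""
    first = obj_id[:1]
    return first if first and first in ALPHABET else "_"


class PartitionedStore:
    """Outer mapping of partitions, each holding its own id -> object mapping."""

    def __init__(self):
        self._partitions = {}  # partition key -> {object id -> object}

    def insert(self, obj_id, obj):
        self._partitions.setdefault(partition_for(obj_id), {})[obj_id] = obj

    def lookup(self, obj_id):
        return self._partitions.get(partition_for(obj_id), {}).get(obj_id)

    def sizes(self):
        """Number of objects per partition, useful for spotting skew."""
        return {key: len(part) for key, part in self._partitions.items()}


store = PartitionedStore()
for name in ("alpha", "apple", "Zope", "42", "-x"):
    store.insert(name, {"title": name})
print(store.lookup("Zope"), store.sizes())
```

A further level (BTreeFolders containing BTreeFolders) would repeat the
same step on the second character.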

On the one hand I'm concerned about lookup speed, on the other about
speed of inserts and how much the entire database will grow inserting
a < 1 KB object.

>>  Should I consider using the ZCatalog for faster lookups?
>> 
>
> Maybe. You probably need to not only store the objects in BTrees, but
> also somehow have indexes. These you do by storing the values you want
> to search on in BTrees as well. The ZCatalog does this in a
> configurable way for you, so if you need configurability, yes. If not,
> it's probably faster to make your own indexes with your own BTrees.
>   

Mm.  I guess I could be OK with one "index", it being the id/path of
the object.  However, it would be nice to build for the future and
include the ability to search all objects.  Maybe a combination of
the two could work.

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no



Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Morten W. Petersen

> I suggest you experiment a bit. Create 100 million objects, and do
> some of the actions you are planning to do on them.
>   

Right.  I'm thinking of taking the time to try a simple SQL-based
implementation, as well as one in ZODB.  I need to learn more about
high-speed Zope programming, as well as keep my SQL skills up to
date.  :-)

-Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no



Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-27 Thread Peter Bengtsson
From experience I find that BTrees are very fast to write to and pick
out items from. Even in the millions. (Never gone into the tens of
millions or further)
Also, when it comes to browsing stuff I find SQL faster and easier to
work with. An added advantage of a RDBMS is that you get the indexing
seamlessly built in (no need to bridge zbrain.getObject()) and it
makes it easier to optimize and figure out which indexes help and
which indexes slow you down which is something that is far from
obvious with a ZCatalog approach.

2009/4/25 Morten W. Petersen :
> Hi,
>
> I'm considering building a large scale, but small in features site.  It
> will contain lots of small objects (millions, tens of millions, hundreds
> of millions), where each object has a couple of strings and maybe some
> other light attributes.
>
> So far, I've been contemplating disabling undo (if that's possible), and
> using BTree structures, maybe segmenting objects into different groups
> (folders) to further speed up lookups.  Scalability is also an issue,
> should I consider using RelStorage?  Should I consider using the ZCatalog
> for faster lookups?
>
> Has anyone else developed something similar?  Are there Zope product
> examples out there that fit the bill?
>
> -Morten
>
> --
> Morten W. Petersen
> Manager
> Nidelven IT Ltd
>
> Phone: +47 45 44 00 69
> Email: mor...@nidelven-it.no
>
>



-- 
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-26 Thread Lennart Regebro
On Sat, Apr 25, 2009 at 13:24, Morten W. Petersen  wrote:
> So far, I've been contemplating disabling undo (if that's possible),

I doubt that it would make a difference. The Undo functionality comes
out of the database being log-based, and changing that would mean
pretty much a complete rewrite.

> and using BTree structures, maybe segmenting objects into different groups
> (folders) to further speed up lookups.

Yes, in my experience putting small objects into BTree structures is
quite fast. You may be talking about BTreeFolders, and in that case I
don't know, I haven't done any sort of performance testing on those, I
have used BTrees directly though, and that was fast. I haven't
compared to SQL, but others have, and ZODB itself seems according to
those tests quite fast. We know Plone slows everything down immensely
in any case.

I don't know if BTrees get slow when they get very big, so you would
need to test that.

> Should I consider using the ZCatalog for faster lookups?

Maybe. You probably need to not only store the objects in BTrees, but
also somehow have indexes. These you do by storing the values you want
to search on in BTrees as well. The ZCatalog does this in a
configurable way for you, so if you need configurability, yes. If not,
it's probably faster to make your own indexes with your own BTrees.
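
To make "your own indexes" concrete, here is a minimal sketch of a
hand-rolled index for one attribute. It uses plain dicts and sets as
stand-ins. In ZODB the mappings would presumably be BTrees and the id
sets TreeSets. FieldIndex and its method names are hypothetical and
chosen only for this example.

```python
class FieldIndex:
    """Map each value of one attribute to the ids of objects that have it."""

    def __init__(self):
        self._forward = {}  # value -> set of object ids
        self._reverse = {}  # object id -> value, needed for reindexing

    def index_object(self, obj_id, value):
        """Add or update the entry for obj_id."""
        self.unindex_object(obj_id)
        self._forward.setdefault(value, set()).add(obj_id)
        self._reverse[obj_id] = value

    def unindex_object(self, obj_id):
        """Remove obj_id from the index if it is present."""
        if obj_id not in self._reverse:
            return
        value = self._reverse.pop(obj_id)
        ids = self._forward[value]
        ids.discard(obj_id)
        if not ids:
            del self._forward[value]

    def search(self, value):
        """Return the sorted ids of all objects indexed under value."""
        return sorted(self._forward.get(value, ()))


index = FieldIndex()
index.index_object(1, "red")
index.index_object(2, "blue")
index.index_object(3, "red")
index.index_object(1, "blue")  # reindex: object 1 changed colour
print(index.search("red"), index.search("blue"))  # prints [3] [1, 2]
```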

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64


Re: [Zope] Building a fast, scalable yet small Zope application

2009-04-25 Thread Andreas Jung
On 25.04.2009 at 13:24, Morten W. Petersen wrote:

> Hi,
>
> I'm considering building a large scale, but small in features site.  It
> will contain lots of small objects (millions, tens of millions, hundreds
> of millions), where each object has a couple of strings and maybe some
> other light attributes.
>
> So far, I've been contemplating disabling undo (if that's possible), and
> using BTree structures, maybe segmenting objects into different groups
> (folders) to further speed up lookups.  Scalability is also an issue,
> should I consider using RelStorage?  Should I consider using the ZCatalog
> for faster lookups?


This description is too thin to give any kind of hint, since it does
not contain any information about your data model etc. Did you consider
using an RDBMS? Anyway, you need to provide more information.

-aj
