How does Gemstone implement efficient querying or indexing?
[snip]
Okay, this sounds like an indexing framework built into the database
layer, something the ZODB doesn't have, but which of course has been
built on top of it with the catalog.
pretty much.
i haven't looked into the specific details of how they wire it all
together, but it comes down to this: gemstone is fullstack, whether
you are using the smalltalk, java or eventual ruby...
they write the vm, which has primitives to make their ops fast and has
built-in persistence, so you just don't think about it at all. in fact,
you have to ask for a class to not be persistent.
Fullstack has its advantages, though also disadvantages. It means they
need to reimplement compliant interpreters for any language they want
to support, and that's going to hurt their library support, as I
doubt arbitrary CPython extensions would work with a hypothetical
Python version of this.
indeed they would have to. disadvantage for them.
as an end user, i don't see that as really hurting me.
[snip]
there's an object store shared by all. there are multiple vm instances
running whatever code, and the entire thing can run across multiple
machines... need to scale? add more machines.
This is something ZEO also provides, as far as I can grasp from your
description.
indeed it does.
i'm still digging into it all, it's only been 3 weeks so i still have
a lot of the terminology wrong etc, but it really is a very cool
product. not having to think about the data store is just real nice.
it's all just objects and you don't have to change anything about how
you code for them unless you want to use indexes, and then the changes
are very minor.
I'd say that the ZODB by itself also doesn't put heavy requirements on
your code. The main things are subclassing from Persistent, and
setting _p_changed if you mutate non-persistent subobjects that you
still want to persist.
For indexing, a framework like Zope 3 requires zero changes to the
classes themselves.
pull it back out and there it is again, object pointers fully intact.
store in 2 different dictionaries, modify in one, blam! modified in
the other.
I'm not sure how this is different than using the root object to store
objects and ZEO?
if i have customer A who has order B
and i store customer A to customer dictionary
and order B to order dictionary
then later access order B from order dictionary, modify and update it,
does ZEO update the instance of order pointed to by customer A?
I can't get it to do it. My understanding is it can't. Well, it could,
but it isn't 'right out of the box' seamless.
ZEO should do just that. I understand you have an object A which has a
reference to B. You also have a dictionary that has a reference to A,
and a dictionary that has a reference to B. Both A and the dictionary
will be pointing to the same instance of B. (This holds if A and B are
both subclasses of Persistent. If not, they might both be serialized
separately; I'm not sure.)
If you do that in gemstone, there is only one copy of Order B, no
matter what variable in what dictionary you come at it from. And it's
drop-dead simple.
I looked at implementing that with zodb and moved along.
I'm confused. This has been the way the ZODB worked for a long time,
unless I'm really missing something in your description.
i tried to do this:
create customer that has order
so that i can have different extents type situations...
store customer in one dictionary.
store order in another.
if i pulled the order back out from the order dictionary and modified it,
then pulled the customer out, the customer's order was no longer in sync
with what came out of the order dictionary.
the reference was lost on serialization. original in-memory objects
were fine, those that came back out from zodb weren't.
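
A minimal sketch of one way the out-of-sync behavior described above can
arise. It is an illustration with the standard pickle module, not ZODB
itself, and it assumes (the thread does not say) that the customer and
order were plain objects rather than Persistent subclasses, held in two
separately stored containers. ZODB writes each persistent object as its
own pickle record and pickles a non-persistent object inline in
whichever record refers to it, so two records end up with two copies.
All names here are hypothetical.

```python
import pickle
from types import SimpleNamespace

# Plain (non-Persistent) stand-ins for the order and the customer.
order = SimpleNamespace(status="new")
customer = SimpleNamespace(order=order)

# Each container is serialized as its own record, so the order is
# written out twice: once inside the customer, once on its own.
customers_record = pickle.dumps({"A": customer})
orders_record = pickle.dumps({"B": order})

customers = pickle.loads(customers_record)
orders = pickle.loads(orders_record)

orders["B"].status = "shipped"
same_object = customers["A"].order is orders["B"]  # two distinct copies
customer_sees = customers["A"].order.status        # still the old value

# Serialized together in a single record, the shared reference survives.
both = pickle.loads(pickle.dumps({"customers": {"A": customer},
                                  "orders": {"B": order}}))
both["orders"]["B"].status = "shipped"
shared = both["customers"]["A"].order is both["orders"]["B"]
```

When the classes do subclass Persistent, each record holds a reference
to the other object's own record instead of an inline copy, which
matches the single-instance behavior described earlier in the thread.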
i'm going to quote the initial email i sent with the idea in general
and the followup i got.
i then tried it out to make sure i hadn't asked the question wrong,
and yeah...
what i wanted to do wasn't easily done.
the quotes:
The biggest concern I have is how to do the layout/storage so that
this slightly contrived example works:
Product has a brand.
There are many brands.
How do I store so that I can find all products simply and all brands
simply and also so that changes in a brand instance are reflected when
the product instance is deserialized. By 'simply' I mean that it
doesn't really work on our end to have to walk all Products looking
for unique brands. Should just be able to go directly in and get
said brands ( using keys() or similar call ).
If I create 'brand' and 'product' as btrees, then if i do something
like
some_product.brand.name = 'something entirely different'
and that brand already exists in 'brand', would it be updated? are
references maintained in that fashion?
do we have to handle manually on update and creation?
Note that we would just be using ZODB not Zope in this scenario.
Back references are not maintained automatically.
I'd identify two classic solutions to this sort of thing.
One is to make a custom mapping (using a BTree as the inner data
structure) that maintains back-references when objects are placed in
them or removed. zope.app(.container? .folder? I'd have to look) has
code that does this, along with firing events. For simple stories
like the one you describe here, that's what I'd probably recommend.
It works to the strengths of the ZODB, which particularly shines in
terms of readability when you just need to walk a tree of attributes
to get what you want.
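
A rough sketch of the first approach, under stated assumptions: the
Container class below is hypothetical and is not the zope.app code
mentioned above. It borrows the __parent__/__name__ naming from Zope's
location convention, uses a plain dict where a ZODB application would
use a BTree, and fires no events.

```python
class Container:
    """Dict-like container that keeps a back-reference on each item."""

    def __init__(self):
        # Stand-in for the inner BTree; a plain dict keeps this runnable
        # without ZODB installed.
        self._data = {}

    def __setitem__(self, name, obj):
        if name in self._data:
            del self[name]  # clear the old item's back-reference first
        obj.__parent__ = self
        obj.__name__ = name
        self._data[name] = obj

    def __getitem__(self, name):
        return self._data[name]

    def __delitem__(self, name):
        obj = self._data.pop(name)
        obj.__parent__ = None
        obj.__name__ = None

    def keys(self):
        return list(self._data)


class Brand:
    def __init__(self, title):
        self.title = title


class Product:
    def __init__(self, title, brand):
        self.title = title
        self.brand = brand


brands = Container()
products = Container()
acme = Brand("Acme")
brands["acme"] = acme
products["anvil"] = Product("Anvil", acme)

brand_keys = brands.keys()  # all brands, without walking the products
# Walk product -> brand -> the container that holds the brand.
found = products["anvil"].brand.__parent__ is brands

del brands["acme"]
detached = acme.__parent__ is None
```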
The other is to keep an external index, a la zc.extrinsicreference or
zc.relation.
zc.extrinsicreference does not have too many dependencies beyond ZODB,
and as long as zope.app.keyreference doesn't drag much along with it,
might be usable as a library. That said, it's also very simple, and
could be used as a model for you, even if you don't use it directly.
It would also be a reasonable choice for a simple situation like the
one you describe. It relies on events to update its data structures.
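
A small sketch of the external-index idea in general. This is not the
zc.extrinsicreference API; ReferenceIndex and its method names are
made up for illustration, it works on plain string keys rather than
key references, and the caller updates it by hand instead of through
events.

```python
from collections import defaultdict


class ReferenceIndex:
    """Keeps 'who points at this?' outside of the objects themselves."""

    def __init__(self):
        self._referrers = defaultdict(set)

    def add(self, target_key, referrer_key):
        self._referrers[target_key].add(referrer_key)

    def discard(self, target_key, referrer_key):
        refs = self._referrers.get(target_key)
        if refs is not None:
            refs.discard(referrer_key)
            if not refs:
                del self._referrers[target_key]  # drop empty entries

    def referrers(self, target_key):
        return sorted(self._referrers.get(target_key, ()))


index = ReferenceIndex()
index.add("acme", "anvil")           # product "anvil" uses brand "acme"
index.add("acme", "rocket-skates")
index.add("globex", "widget")

acme_products = index.referrers("acme")
index.discard("globex", "widget")
globex_products = index.referrers("globex")
```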
zc.relation is an almost-released revision of zc.relationship that
drastically reduces dependencies--actually, it has no additional
dependencies beyond ZODB, as you can see at
http://svn.zope.org/zc.relation/trunk/setup.py?view=markup
It's also a bit overwhelming and low-level: see the README:
http://svn.zope.org/zc.relation/trunk/src/zc/relation/README.txt?view=auto
It doesn't hook anything up for you: you set the relationship
catalog up and you arrange for it to be updated, via events or direct
messages. That said, if you need its power, it is well-tested and
would be a good choice for some jobs from at least some perspectives
(caveat read-or: I'm the author).
HTH
Gary
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev