Re: [Zope-dev] Catalog improvements

2001-11-29 Thread Chris Withers

Andreas Jung wrote:
 
 I think the MG software from the book 'Managing Gigabytes' is GPLed and
 currently released as mg-1.21. Judging by the book's table of contents, it
 seems to be a very detailed source on text processing and gives a lot of
 information about different index types. But I miss some explanation of
 current data structures like suffix arrays or suffix trees, which have
 several advantages for text processing compared to B-Trees.

Hmmm... looks like it's time to go buy a book :-)

cheers,

Chris

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Chris McDonough

Note that one way to get the effect of cached queries is to use a 
TopicIndex, which I believe either Andreas or Tres has implemented.  See 
http://dev.zope.org/Wikis/DevSite/Proposals/TopicIndexes.  I can't find 
the actual source code, though.  Maybe either Tres or Andreas knows 
where it is?

Wolfram Kerber wrote:
 Hi
 
 No, i wasn't aware of your product :-( , the only one i found was ZOQL by
 Stephan Richter, but that didn't help much. Well, now i have written an
 implementation that reuses some of the code in TextIndex (for parenthesis
 parsing and insertion of a default operator) and then saves the query in RPN
 format (so the Catalog doesn't need to think that hard when being queried).
 I have taken a look at your product, and i'd say a 'new' Catalog should have
 sort of QueryParser plugins that know how to turn string-queries (as yours)
 or SQL to native Catalog queries ...
 I've also contacted the authors of the two proposals, just wasn't sure
 whether i should start this off, since i have no experience as to how the
 fishbowl works and i'm expected to finish my current project sometime soon.
 
 
 Wolfram
 


-- 
Chris McDonough                    Zope Corporation
http://www.zope.org http://www.zope.com
Killing hundreds of birds with thousands of stones



Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Andreas Jung

TopicIndexes are currently available in the 'ajung-topicindex' branch and
are not yet part of the Zope core.

Andreas

- Original Message -
From: Chris McDonough [EMAIL PROTECTED]
To: Wolfram Kerber [EMAIL PROTECTED]
Cc: Casey Duncan [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, November 28, 2001 10:06
Subject: Re: [Zope-dev] Catalog improvements


 Note that one way to get the effect of cached queries is to use a
 TopicIndex, which I believe either Andreas or Tres has implemented.  See
 http://dev.zope.org/Wikis/DevSite/Proposals/TopicIndexes.  I can't find
 the actual source code, though.  Maybe either Tres or Andreas knows
 where it is?


Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Chris Withers

Matt Hamilton wrote:
 
 I would like in on that too :)  About a year or so ago I was working on a
 full-text indexing system for indexing several gigabytes of text (mailing
 list archives).  Most of it was written in C and uses quite a lot of cool
 algorithms from various information retrieval papers and books.  I have
 been hoping to have the time to take parts of it and work it into the new
 PluginIndex architecture.  The existing code uses BerkeleyDB files to hold
 the index structures, but I would like to use ZODB instead to give it a
 bit more modularity.

Hi Matt,

Are any of these algorithms publicly available? I'd be _very_ interested in them
:-)

Chris




Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Andreas Jung


- Original Message -
From: Chris Withers [EMAIL PROTECTED]
To: Matt Hamilton [EMAIL PROTECTED]
Cc: Casey Duncan [EMAIL PROTECTED]; Steve Alexander
[EMAIL PROTECTED]; Wolfram Kerber [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, November 28, 2001 09:27
Subject: Re: [Zope-dev] Catalog improvements


 Hi Matt,

 Are any of these algorithms publicly available? I'd be _very_ interested
 in them :-)


I think the MG software from the book 'Managing Gigabytes' is GPLed and
currently released as mg-1.21. Judging by the book's table of contents, it
seems to be a very detailed source on text processing and gives a lot of
information about different index types. But I miss some explanation of
current data structures like suffix arrays or suffix trees, which have
several advantages for text processing compared to B-Trees.

Andreas

-- 
Andreas Jung, Zope Corporation
EMail: [EMAIL PROTECTED]    http://www.zope.com
Python Powered    http://www.python.org
Makers of Zope    http://www.zope.org
Life is a fulltime occupation







Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Matt Hamilton

On Wed, 28 Nov 2001, Andreas Jung wrote:

 I think the MG software from the book 'Managing Gigabytes' is GPLed and
 currently released as mg-1.21. Judging by the book's table of contents, it
 seems to be a very detailed source on text processing and gives a lot of
 information about different index types. But I miss some explanation of
 current data structures like suffix arrays or suffix trees, which have
 several advantages for text processing compared to B-Trees.

Suffix Trees/Tries take up a *lot* of space.  But they are very fast, and
useful for searching for substrings.  The main gist of the stuff in
'Managing Gigabytes' is that it is possible to store an ascending list of
integers in a compressed form, such that on average each integer requires
only 4 bits to represent it.  This is obviously much more compact than a
straight list of 32- or 64-bit integers/longs (plus any overhead Python
adds to its built-in list type).  The other point is that you can read and
decode the lists very quickly (you don't need to decompress the entire
list first before reading it).  Also consecutive numbers only take 1 bit
of storage, this means that 'stopwords' that are normally omitted from
indexes due to their very high frequency (and hence bloat of the index)
can be stored very efficiently.
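
To make the idea concrete, here is a minimal sketch of one way to get those properties: store the gaps between consecutive ids and write each gap in Elias gamma code, where a gap of 1 costs exactly one bit. This is an illustration only, not the MG code or Matt's C routines. The function names are made up for the example, and the "bits" are kept as a Python string of '0' and '1' characters for readability rather than packed into bytes.

```python
def gamma_encode(n):
    """Return the Elias gamma code of n (n >= 1) as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    binary = bin(n)[2:]                       # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary   # unary length prefix, then value


def encode_postings(doc_ids):
    """Encode a strictly ascending list of positive ids as gamma-coded gaps."""
    bits = []
    previous = 0
    for doc_id in doc_ids:
        bits.append(gamma_encode(doc_id - previous))  # gap to the previous id
        previous = doc_id
    return "".join(bits)


def decode_postings(bits):
    """Yield ids one at a time; the reader never expands the whole list."""
    position = 0
    previous = 0
    while position < len(bits):
        zeros = 0
        while bits[position] == "0":          # count the unary length prefix
            zeros += 1
            position += 1
        gap = int(bits[position:position + zeros + 1], 2)
        position += zeros + 1
        previous += gap                        # undo the gap encoding
        yield previous
```

With this code a run of consecutive ids such as 1, 2, 3, 4 encodes to four bits in total, which is the effect described above for very frequent words. The average cost per id depends on how the gaps are distributed in the real index.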

One problem is that all of the research done in MG is based on much older
hardware than is currently available, and they make certain optimisations
that nowadays don't save much time.

-Matt

-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Catalog improvements

2001-11-28 Thread Andreas Jung


- Original Message -
From: Matt Hamilton [EMAIL PROTECTED]
To: Andreas Jung [EMAIL PROTECTED]
Cc: Chris Withers [EMAIL PROTECTED]; Casey Duncan
[EMAIL PROTECTED]; Steve Alexander [EMAIL PROTECTED]; Wolfram
Kerber [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, November 28, 2001 09:55
Subject: Re: [Zope-dev] Catalog improvements


 Suffix Trees/Tries take up a *lot* of space.  But they are very fast, and
 useful for searching for substrings.

Usually four times the amount of the data to be indexed ;-)
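
For readers unfamiliar with the structure, here is a small sketch of a suffix array and a substring search over it. One plausible reading of the "four times" figure is a suffix array that stores one 32-bit offset per byte of text, but that is an assumption on my part. The construction below is the naive one (sort all suffixes), which is fine for illustration and far too slow for gigabytes of text. Function names are invented for the example.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text, in sorted order."""
    return sorted(range(len(text)), key=lambda start: text[start:])


def find_substring(text, suffix_array, pattern):
    """Return every offset at which pattern occurs in text, ascending."""
    # Binary search for the first suffix whose prefix is >= pattern.
    low, high = 0, len(suffix_array)
    while low < high:
        mid = (low + high) // 2
        start = suffix_array[mid]
        if text[start:start + len(pattern)] < pattern:
            low = mid + 1
        else:
            high = mid
    # All suffixes that begin with pattern sit next to each other from here.
    matches = []
    while low < len(suffix_array) and text.startswith(pattern, suffix_array[low]):
        matches.append(suffix_array[low])
        low += 1
    return sorted(matches)
```

The search cost grows with the logarithm of the text length plus the number of hits, which is why these structures are attractive for substring queries despite their size.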

Andreas





Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Casey Duncan

On Tuesday 20 November 2001 05:35 pm, Wolfram Kerber allegedly wrote:
 Hi,

 i'm currently working on a product that allows attaching relational
 information to zope-objects. It works quite well so far, but to further
 enhance it i need to make some changes to the Catalog. I could perhaps
 implement it as a separate product, but i strongly feel that those changes
 are best applied to the Catalog itself, as they are of general use (i
 think) and involve a lot of changes to the inner workings of the Catalog.
 In particular i need the following:

 - named/stored queries
 these are precompiled queries, so they can be executed without parsing and
 are easily cacheable
 i.e. similar to what is implemented in CMFTopic, but stored in the Catalog
 and a bit smarter

 - caching support

 - unions and intersections
 sub-queries (i.e. queries that are directed at a certain index) should be
 more flexibly combinable

I have some code that implements this in my CatalogQuery product. It creates 
a query object from a string. Presently these are not persistent, but they 
could easily be made persistent to serve as precompiled queries.

code at: http://www.zope.org/Members/Kaivo/CatalogQuery


 I searched this mailing-list as well as zope.org to get an idea about what
 has already been discussed and requested, and there seems to be some
 interest in improving the Catalog. Some people even seem to have worked on
 this, perhaps they could give an update on this? Possibly i don't have to
 write everything from scratch...

I would be willing to help both in coding and getting the code put into the 
Zope core.

 I would have put this into a proposal, but there already are two proposals
 that deal with the features i want, one is dedicated to
 unions/intersections, the other (TopicIndexes) to performance issues (i
 don't know what's the status of these though, especially the first one is
 rather old), and i don't want to hijack them without asking. As so often i
 will need to complete my current project first, but would then like to help
 in improving the Catalog for a more general use.

Possibly we need to rekindle discussion. I would suggest contacting the 
authors of those proposals to see how compatible your concepts are with 
theirs. Perhaps a new proposal should be drafted with the new ideas, tying 
them back to the previous ones. If there is redundancy, that can be worked 
out.


 So, if there is interest, i would propose to collect some ideas and
 comments about how a better Catalog should look like, how it could be best
 implemented and how to organize this effort (with respect to the already
 existing proposals).

I am very interested in such a discussion. Let me know what I can do to help.

/---\
  Casey Duncan, Sr. Web Developer
  National Legal Aid and Defender Association
  [EMAIL PROTECTED]
\---/




Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Steve Alexander

Casey Duncan wrote:

 
 I have some code that implements this in my CatalogQuery product. It creates 
 a query object from a string. Presently these are not persistent, but they 
 could easily be made persistent to serve as precompiled queries.
 
 code at: http://www.zope.org/Members/Kaivo/CatalogQuery


Casey, did you get a chance to look at my patches for adding an extended 
uniqueValues method to CatalogQuery?

 
 I would be willing to help both in coding and getting the code put into the 
 Zope core.


*raises hand* me too!


 
So, if there is interest, i would propose to collect some ideas and
comments about how a better Catalog should look like, how it could be best
implemented and how to organize this effort (with respect to the already
existing proposals).
 
 I am very interested in such a discussion. Let me know what I can do to help.


I'm interested in this too, and I'm keen to get a solution that will 
work with just the ZODB, without needing all of Zope.


--
Steve Alexander
Software Engineer
Cat-Box limited






Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Andreas Jung

Is this code publicly available?

Andreas
- Original Message -
From: Matt Hamilton [EMAIL PROTECTED]
To: Casey Duncan [EMAIL PROTECTED]
Cc: Steve Alexander [EMAIL PROTECTED]; Wolfram Kerber
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, November 27, 2001 10:06
Subject: Re: [Zope-dev] Catalog improvements


 On Tue, 27 Nov 2001, Casey Duncan wrote:

   I'm interested in this too, and I'm keen to get a solution that will
   work with just the ZODB, without needing all of Zope.
 
  Yes, I second, third and fourth that motion. I have a bunch of ideas
  kicking around for ZODB-level indexing. Let's talk more. Perhaps we
  should arrange an indexing and catalog chat on #zope.

 I would like in on that too :)  About a year or so ago I was working on a
 full-text indexing system for indexing several gigabytes of text (mailing
 list archives).  Most of it was written in C and uses quite a lot of cool
 algorithms from various information retrieval papers and books.  I have
 been hoping to have the time to take parts of it and work it into the new
 PluginIndex architecture.  The existing code uses BerkeleyDB files to hold
 the index structures, but I would like to use ZODB instead to give it a
 bit more modularity.






Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Matt Hamilton

On Tue, 27 Nov 2001, Andreas Jung wrote:

 Is this code available for public ?

Sort of :)  It used to be around, but the server with it on is currently
offline and in need of a new disk controller, so it is not to hand.  It is
also poorly commented :( and written in very highly optimised (read:
illegible) C.

The main bits needed from it are the routines to store and retrieve
compressed lists of ascending integers (ie. used in indexes).  I want to
write a python wrapper around them and release a list-like python data
structure that will allow efficient storage of indexes.  The other bit is
the code for doing the cosine ranking similarity comparison in order to
rank the documents in order of relevance to a query.
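
As a rough illustration of the cosine ranking step, the sketch below weights terms with a simple tf-idf formula and sorts documents by the cosine of the angle between the query vector and each document vector. The weighting formula, the function names and the dict-of-term-lists input are assumptions made for this example. They are not taken from the MG sources, which use their own variants of the measure.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def rank(query_terms, documents):
    """Return document names ordered by decreasing similarity to the query.

    documents maps a name to its list of terms; documents that share no
    term with the query are left out of the result.
    """
    total = len(documents)
    # Number of documents each term occurs in.
    doc_freq = Counter(t for terms in documents.values() for t in set(terms))

    def weigh(terms):
        # Term frequency times a simple inverse document frequency.
        return {t: count * math.log(1 + total / doc_freq[t])
                for t, count in Counter(terms).items() if t in doc_freq}

    query_vector = weigh(query_terms)
    scored = [(cosine(query_vector, weigh(terms)), name)
              for name, terms in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]
```

A real index would keep the document weights precomputed instead of rebuilding every vector at query time as this sketch does.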

Most of the code is taken from the book/code 'Managing Gigabytes'
by Witten, Moffat & Bell (http://www.cs.mu.OZ.AU/mg/).  The code is quite
old now (1999) and designed for quite large systems, or relatively static
text (i.e. it doesn't do incremental indexing very well).  I worked on
developing a 'forward' index which could be easily updated, and then
inverted quite quickly on a regular basis (since it didn't need to parse
the source text again).
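
A minimal sketch of that forward/inverted split, under my own assumptions about the details: the forward index maps each document id to its set of terms and is cheap to update, and the inverted index (term to ascending id list) is rebuilt from it without touching the source text. The class and method names are hypothetical, and tokenising by whitespace is a placeholder for a real lexer.

```python
from collections import defaultdict


class ForwardIndex:
    """Maps document id -> sorted list of distinct terms in that document."""

    def __init__(self):
        self.terms_by_doc = {}

    def index(self, doc_id, text):
        # Re-indexing a document simply overwrites its entry.
        self.terms_by_doc[doc_id] = sorted(set(text.lower().split()))

    def unindex(self, doc_id):
        self.terms_by_doc.pop(doc_id, None)

    def invert(self):
        """Build term -> ascending list of document ids from the forward data."""
        inverted = defaultdict(list)
        for doc_id in sorted(self.terms_by_doc):
            for term in self.terms_by_doc[doc_id]:
                inverted[term].append(doc_id)
        return dict(inverted)
```

Because the ids in each posting list come out in ascending order, the result is already in the shape that a compressed ascending-integer list needs.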


-Matt

-- 
Matt Hamilton [EMAIL PROTECTED]
Netsight Internet Solutions, Ltd.  Business Vision on the Internet
http://www.netsight.co.uk   +44 (0)117 9090901
Web Hosting | Web Design  | Domain Names  |  Co-location  | DB Integration






Re: [Zope-dev] Catalog improvements

2001-11-27 Thread Wolfram Kerber

Hi

No, i wasn't aware of your product :-( , the only one i found was ZOQL by
Stephan Richter, but that didn't help much. Well, now i have written an
implementation that reuses some of the code in TextIndex (for parenthesis
parsing and insertion of a default operator) and then saves the query in RPN
format (so the Catalog doesn't need to think that hard when being queried).
I have taken a look at your product, and i'd say a 'new' Catalog should have
sort of QueryParser plugins that know how to turn string-queries (as yours)
or SQL to native Catalog queries ...
I've also contacted the authors of the two proposals, just wasn't sure
whether i should start this off, since i have no experience as to how the
fishbowl works and i'm expected to finish my current project sometime soon.
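
To show why a query stored in RPN is cheap to run, here is a small sketch of an evaluator: operands are (index name, value) pairs, operators are plain strings, and a stack of result sets replaces any parsing at query time. The token layout, the operator names and the lookup callback are assumptions for the example. They do not describe the actual implementation mentioned above or any existing Catalog API.

```python
def evaluate_rpn(rpn, lookup):
    """Evaluate a precompiled query given in reverse Polish notation.

    rpn    -- sequence of (index_name, value) tuples and operator strings
    lookup -- callable(index_name, value) returning an iterable of record ids
    """
    operators = {
        "and": lambda left, right: left & right,      # intersection
        "or": lambda left, right: left | right,       # union
        "andnot": lambda left, right: left - right,   # difference
    }
    stack = []
    for token in rpn:
        if isinstance(token, str):
            if token not in operators:
                raise ValueError("unknown operator: %r" % (token,))
            if len(stack) < 2:
                raise ValueError("operator %r is missing an operand" % (token,))
            right = stack.pop()
            left = stack.pop()
            stack.append(operators[token](left, right))
        else:
            index_name, value = token
            stack.append(set(lookup(index_name, value)))
    if len(stack) != 1:
        raise ValueError("malformed RPN query")
    return stack[0]
```

For instance, "(type is Document and author is wolfram) but not state is private" becomes the token list [("type", "Document"), ("author", "wolfram"), "and", ("state", "private"), "andnot"], so unions and intersections of per-index sub-queries combine freely.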


Wolfram

- Original Message -
From: Casey Duncan [EMAIL PROTECTED]
To: Wolfram Kerber [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, November 27, 2001 2:48 PM
Subject: Re: [Zope-dev] Catalog improvements





Re: [Zope-dev] Catalog improvements

2001-11-21 Thread Jeffrey P Shell


On Tuesday, November 20, 2001, at 03:35  PM, Wolfram Kerber wrote:

 Hi,

 i'm currently working on a product that allows attaching relational
 information to zope-objects. It works quite well so far, but to further
 enhance it i need to make some changes to the Catalog. I could perhaps
 implement it as a separate product, but i strongly feel that those
 changes are best applied to the Catalog itself, as they are of general
 use (i think) and involve a lot of changes to the inner workings of the
 Catalog. In particular i need the following:

 - named/stored queries
 these are precompiled queries, so they can be executed without parsing
 and are easily cacheable
 i.e. similar to what is implemented in CMFTopic, but stored in the
 Catalog and a bit smarter

There used to be something like this in ZTables/Tabula (a Zope 1.x 
product that was sort of the genesis of the Catalog, for better or 
worse) called 'Hierarchies'.  Hierarchies were actually indexes (I 
think the current Keyword index is descended from the Keyword 
Hierarchy).

I don't know what happened to that code.  If it's not available, 
you could probably achieve the effect that you're looking for here 
with PluginIndexes, which wouldn't require changing the Catalog at 
all.  Just write a Query Index that indexes objects that match 
its pre-cooked Query.  This would speed up searching tremendously, 
but you could take a big hit at indexing time if you have many of 
them.
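
A bare-bones sketch of such a Query Index, written as plain Python rather than against the real PluginIndexes interface: the class name, the method names and the use of a predicate function as the "pre-cooked query" are all assumptions for illustration. It shows the trade-off described above: the query is evaluated once per object at indexing time, so a search is just a set lookup.

```python
class QueryIndex:
    """Keeps the ids of all indexed objects that match one fixed query."""

    def __init__(self, predicate):
        self.predicate = predicate      # the pre-cooked query, as a callable
        self.matching_ids = set()

    def index_object(self, doc_id, obj):
        # The cost is paid here: every (re)indexed object is tested once.
        if self.predicate(obj):
            self.matching_ids.add(doc_id)
        else:
            self.matching_ids.discard(doc_id)

    def unindex_object(self, doc_id):
        self.matching_ids.discard(doc_id)

    def search(self):
        # Searching needs no evaluation at all, only a copy of the stored set.
        return set(self.matching_ids)
```

With many such indexes attached to one catalog, each indexing call runs every predicate, which is the "big hit at indexing time".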

Jeffrey P Shell, [EMAIL PROTECTED]





Re: [Zope-dev] Catalog improvements

2001-11-21 Thread Wolfram Kerber


- Original Message -
From: Jeffrey P Shell [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, November 21, 2001 7:38 PM
Subject: Re: [Zope-dev] Catalog improvements



 There used to be something like this in ZTables/Tabula (a Zope 1.x
 product that was sort of the genesis of the Catalog, for better or
 worse) called 'Hierarchies'.  Hierarchies were actually indexes (I
 think the current Keyword index is descended from the Keyword
 Hierarchy).

 I don't know what happened to that code.  If it's not available,
 you could probably achieve the effect that you're looking for here
 with PluginIndexes

I think you're right. Indexes also have a management interface that could be
used to define the query. It could result in a nesting problem, however, if
'QueryIndexes' rely on each other's results (which they should be able to do).
I would possibly need a management view that shows the hierarchical structure
of the Indexes, but it can be merely that, a view.
I'll try this out...

, which wouldn't require changing the Catalog at all.

I'd say that if i did _not_ store the result of the query and just delegated
to other indexes, this would be true. Otherwise i would need some notification
mechanism to tell whether my result is affected by an indexing call, and/or at
least be notified when the call is over so i can update the result by
issuing a query; but the latter would mean to 'take the big hit' as you
mentioned, which i think isn't acceptable.

 Just write a Query Index that indexes objects that match
 its pre-cooked Query.  This would speed up searching tremendously,
 but you could take a big hit at indexing time if you have many of
 them.

 Jeffrey P Shell, [EMAIL PROTECTED]

thanks,

Wolfram

