Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-20 Thread Erik Enge


On Tue, 19 Jun 2001, Chris Withers wrote:

> I'm guessing this is the point at which your problems become mine? ;-)

*evil laughter*  Yes :-)

We should write about it and publish it to the community...


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )

Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-19 Thread Chris Withers

> On Mon, 18 Jun 2001, Andreas Jung wrote:
> 
> > These are good ideas to improve the TextIndex. I already encouraged
> > Erik to put it all together into a Fishbowl proposal.
> 
> Which I would do, if I had time.  Which I will have, but not for another
> two weeks. :-)

I'm guessing this is the point at which your problems become mine? ;-)

*grinz*

Chris


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-19 Thread Erik Enge

On Mon, 18 Jun 2001, Andreas Jung wrote:

> These are good ideas to improve the TextIndex. I already encouraged
> Erik to put it all together into a Fishbowl proposal.

Which I would do, if I had time.  Which I will have, but not for another
two weeks. :-)


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-19 Thread Rik Hoekstra

> Rik Hoekstra writes:
>  > This raises the question how dependent the splitter is on the particularities of the
>  > document source - I do not really see how different splitters could be useful
>  > for one single document. This is perhaps less obvious than it appears, as you
>  > may want to use different splitters for documents in different languages. Taken
>  > as a whole I would say choosing a splitter would be a decision that had to be
>  > taken at indexing time anyway. But perhaps it's just my imagination that is
>  > lacking.
>
> Of course, the search must follow the same splitting rules
> as the indexing did. Changing the rules (the splitter
> or its configuration) after indexing will make the index
> inconsistent.

I agree; in fact I think we're saying the same thing. What is more interesting is
how (rather than when) you decide which splitter to use. With heterogeneous
documents I'd think it would be difficult to decide automagically...

Rik


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Andreas Jung


These are good ideas to improve the TextIndex. I already encouraged Erik
to put it all together into a Fishbowl proposal.

Andreas

Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Dieter Maurer

Rik Hoekstra writes:
 > This raises the question how dependent the splitter is on the particularities of the
 > document source - I do not really see how different splitters could be useful
 > for one single document. This is perhaps less obvious than it appears, as you
 > may want to use different splitters for documents in different languages. Taken
 > as a whole I would say choosing a splitter would be a decision that had to be
 > taken at indexing time anyway. But perhaps it's just my imagination that is
 > lacking. 
There are lots of things you may want to change based on
experience with your index:

  *  change the set of token boundary characters
     They define where words are broken out.

  *  change the set of removed characters
     They are removed from the words, usually for
     normalization.

     In German, e.g., you can write both "Auto-Lackierer"
     and "Autolackierer". You want to normalize
     these different spellings.

  *  change the set of "composing" characters

     German is very rich in composite terms.
     You may want to index under each component term.
     For this, you need the rules on how the composition
     is built.
     For text, it is usually '-'. But if you have
     computer sources, '_' or ':' may be relevant, too.

Of course, the search must follow the same splitting rules
as the indexing did. Changing the rules (the splitter
or its configuration) after indexing will make the index
inconsistent.
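
The three configurable character sets can be illustrated with a minimal
sketch. This is not the Zope Splitter: the function name split_words and its
parameters boundary, removed and composing are hypothetical, chosen only to
mirror the three points listed above.

```python
def split_words(text, boundary=".,;!?", removed="", composing=""):
    """Hypothetical splitter sketch; not the Zope Splitter API."""
    # Removed characters are dropped before anything else (normalization).
    for ch in removed:
        text = text.replace(ch, "")
    # Token boundary characters are treated like whitespace.
    for ch in boundary:
        text = text.replace(ch, " ")
    result = []
    for word in text.lower().split():
        result.append(word)
        # Composite terms are additionally indexed under each component.
        parts = word
        for ch in composing:
            parts = parts.replace(ch, " ")
        parts = parts.split()
        if len(parts) > 1:
            result.extend(parts)
    return result


# Normalizing: both spellings end up as the same term.
print(split_words("Auto-Lackierer", removed="-"))
print(split_words("Autolackierer", removed="-"))
# Composing: the term is indexed whole and under each component.
print(split_words("Auto-Lackierer", composing="-"))
```

If a text is indexed with removed="-" but the query is split with the default
settings, the query term "auto-lackierer" no longer matches the indexed term
"autolackierer", which is the inconsistency described above.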

Dieter


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Andreas Jung


The Splitter interface is not really documented. However, Zope 2.4
has much better support for third-party splitters.

Andreas

Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread R. David Murray

On Sun, 17 Jun 2001, Chris McDonough wrote:
> index_object, because the splitter return has all the words
> in order, even the dupes... as you iterate, you can mutate

Is this part of the current formal Splitter Interface? If not,
it needs to be if other code is going to depend on it.

Oh, yeah, and where is the formal Splitter interface documented?
I don't see anything in SearchIndex, and a search for "splitter interface"
on zope.org didn't turn up anything useful.

--RDM


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Rik Hoekstra


> 
> > Once you're satisfied with the implementation, would you be willing
> > to submit the module to the collector?
> 
> Do you think you (or someone else for that matter) could have a look at
> [1] the method that returns the position in the document - positionInDoc()
> - to see how that could be made to run much faster?  Maybe it is how it is
> used...  It is too slow to be very useful when indexing large amounts of
> data.
> 
> Anyway, I suck at making Python fast (or using it the right way, which
> ever I've fallen prey to this time ;-), and any hints would be greatly
> appreciated.
> 
> I've been indexing and searching a lot this weekend, and bar that problem
> with the indexing speed it seems ok and I have no issues submitting it to
> the Collector.
> 
Doing something similar (in fact what I needed was citations of word usage), I
took a two-step approach. The idea is that most of the work of actually returning
results is then done on a much smaller subset of documents than if you had to
index all documents with word indexes and positions.

I use a normal TextIndex for querying. Only when a document is returned by the
query do I start processing it. This requires parsing the query in a slightly
different way (throw out the NOTs). The two-step approach has the advantage that
you can postpone processing the actual documents until you return the results
for those specific documents.
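
A minimal sketch of the two-step idea, assuming a plain dictionary of sets in
place of the real TextIndex. The names build_index, find_positions and
citations are hypothetical and only illustrate the flow: query first, then
scan only the documents that matched.

```python
def build_index(docs):
    """Plain word -> set(doc ids) mapping; stands in for a TextIndex."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(word, set()).add(doc_id)
    return index


def find_positions(text, word):
    """Character offsets of every occurrence of word in text."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def citations(docs, index, words):
    """Step 1: query the index. Step 2: scan only the matching documents."""
    if not words:
        return {}
    matching = set.intersection(*[index.get(w, set()) for w in words])
    return {doc_id: {w: find_positions(docs[doc_id], w) for w in words}
            for doc_id in matching}


docs = {1: "the cat sat on the mat", 2: "a dog", 3: "cat and dog and cat"}
index = build_index(docs)
print(citations(docs, index, ["cat", "dog"]))
```

Document 2 is never scanned for "cat", because the index query already rules
it out. Note that find_positions matches substrings, so "cat" would also be
found inside "category".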

Using your positionInDoc will require a _lot_ of processing (why does it use
string.split, btw, and not Splitter? Why split on " " and not on
string.whitespace?). I have used string.find for finding word positions, which
is probably faster than looping over a list of words. BTW, I'd rather use
Splitter, but its word positions appeared not to be reliable (a bug, or something
I didn't understand; anyhow, string.find works for me and is fast).

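def splitit(txt, word):
    # Collect the start offset of every occurrence of word in txt.
    positions = []
    start = 0
    while 1:
        # txt.find is the string-method form of string.find
        res = txt.find(word, start)
        if res == -1:
            break
        positions.append(res)
        start = res + 1
    return positions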


Using re would perhaps also be an option, but allowing regular
expressions will complicate searching a lot, so I use the globbing lexicon for
expanding and then do the matching on the expanded items (if necessary - not if
using [wordpart]*).

Advantages of using this approach:
- it's faster.
- it splits up the query processing into different subparts, which also
contributes to speeding things up.
- it's also more flexible, as you can divide searching and parsing over
different web requests, and even make them dependent on the number of results.
For example: why return text fragments from all documents if your users will not
be able to see all the results anyway? Or why return all fragments containing
word combinations from one single document when returning a few occurrences
from different documents is more useful for your users? Note that this will
mainly affect returning text fragments, which may or may not be useful.

There are also a couple of disadvantages (as I see them, but there may be more):
- it only works with exact (character) positions of words and not with word
numbers in a text. The "within two words" case may be remedied by using
string.split on substrings, however, if really needed. Depending on your
purposes, an even rougher approach is to assume some default length for words
(this is a bit faster). These are not very elegant solutions, though.
- because the approach is not so tightly coupled with (Z)Catalog, integration
strategies are less obvious (at least for me)
- the PositionIndex might be used for further processing as is; in my approach
this is less obvious.
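
The "within two words" workaround mentioned in the first disadvantage could
look roughly like the sketch below. The function name within and the reading
of "within n words" (the second word starts at most n words after the first,
so at most n - 1 words lie in between) are assumptions made for illustration.

```python
def within(text, word_a, word_b, n):
    """True if word_b follows word_a with at most n - 1 words in between."""
    start = text.find(word_a)
    while start != -1:
        end_a = start + len(word_a)
        pos_b = text.find(word_b, end_a)
        if pos_b == -1:
            # No later occurrence of word_b, so no later word_a can match.
            return False
        # Split the substring between the two hits and count its words.
        if len(text[end_a:pos_b].split()) <= n - 1:
            return True
        start = text.find(word_a, start + 1)
    return False


print(within("sean m upton", "sean", "upton", 2))
print(within("sean a b upton", "sean", "upton", 2))
```

Like the string.find approach itself, this matches substrings rather than
whole words, and it only checks word_b after word_a, not the reverse order.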


another 2 cents

Rik


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-18 Thread Rik Hoekstra

Chris McDonough wrote:
> 
> It just occurred to me that depending on the splitter to do
> positions makes it impossible to alter the splitter without
> reindexing the whole text index... but I think this is a
> reasonable tradeoff.  Other opinions welcome.
> 

This raises the question how dependent the splitter is on the particularities of the
document source - I do not really see how different splitters could be useful
for one single document. This is perhaps less obvious than it appears, as you
may want to use different splitters for documents in different languages. Taken
as a whole I would say choosing a splitter would be a decision that had to be
taken at indexing time anyway. But perhaps it's just my imagination that is
lacking. 

There is a much greater dependence on the lexicon here. And indeed several
different lexicons could be applied to a set of documents depending on what is
wanted.

my 2 cents

Rik


RE: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Dieter Maurer


[EMAIL PROTECTED] writes:
 > A lot of folks who do "power searches," say, librarians or other trained
 > researchers, familiar with the bells and whistles of more powerful search
 > engines, will want a simple operator for proximity, with the ability to
 > specify proximity depth:
 > 
 > For example:
 > 
 > Lexis-Nexis:     Sean w/2 Upton  (where w/2 is within 2 words)
 >                  Also, Lexis doesn't count stop-words in proximity
 >                  indexes.
 > Folio/Nextpage:  "Sean Upton"@2
 > 
 > IMHO, the syntax is clean and very brief in the Lexis-Nexis case and should
 > supplement a more generic
 >     Sean ... Upton
 > style search.
I do not think it is a good idea to have an infix operator
for proximity searches. An infix operator combines just two words, but
proximity searches may involve more than two words:
a set of words, near together (e.g. in one paragraph, in one sentence,
within x words).
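
A proximity test over a whole set of words can be sketched as follows,
assuming the index can supply, for one document, a list of word offsets per
query word. The function name near and the meaning of window (the number of
consecutive word slots that must contain every query word) are hypothetical.

```python
def near(positions, window):
    """positions: {word: [word offsets in one document]}.

    True if some span of `window` consecutive word slots contains
    at least one occurrence of every word.
    """
    if window < 1 or not positions:
        return False
    if any(not plist for plist in positions.values()):
        return False
    # Merge all occurrences into one list sorted by offset.
    events = sorted((pos, word)
                    for word, plist in positions.items()
                    for pos in plist)
    counts = {}
    left = 0
    for pos, word in events:
        counts[word] = counts.get(word, 0) + 1
        # Drop occurrences that fall outside the window ending at pos.
        while pos - events[left][0] >= window:
            old = events[left][1]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
            left += 1
        if len(counts) == len(positions):
            return True
    return False


hits = {"zope": [0, 10], "text": [3], "index": [4]}
print(near(hits, 5))
print(near(hits, 4))
```

Because the function takes any number of words, it covers the two-word case
of an infix operator as well as the general "set of words" case.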


Dieter


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Chris McDonough

On Sun, 17 Jun 2001 21:05:47 +0200 (CEST)
 Erik Enge <[EMAIL PROTECTED]> wrote:
> On Fri, 15 Jun 2001, Chris McDonough wrote:
> 
> > Once you're satisfied with the implementation, would you be willing
> > to submit the module to the collector?
> 
> Do you think you (or someone else for that matter) could have a look at
> [1] the method that returns the position in the document - positionInDoc()
> - to see how that could be made to run much faster?  Maybe it is how it is
> used...  It is too slow to be very useful when indexing large amounts of
> data.

Erik,

It looks like you call proximityInsert for each item
returned from the splitter on the doc source.  Instead of
looking for the position in the source document by splitting
the source up again within proximityInsert, you can keep a
simple counter while you iterate over the splitter return in
index_object, because the splitter return has all the words
in order, even the dupes... as you iterate, you can mutate
the position entry for that word/documentId pair within
proximityInsert.  You never actually need to manually split
the document source, instead just always rely on the
splitter to bust up the doc, and manipulate the position
list in place.  This is not the most efficient way, but it's
more efficient than your current way.

Therefore, the bit in index_object becomes:

i = 0
for word in splitter(source):
    self.proximityInsert(word, documentId, i)
    i = i + 1

The proximityInsert method becomes:

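def proximityInsert(self, word, documentId, i):
    """Insert proximity information about this wid (word id) in
    the index's proximity bucket."""
    # assumes: from BTrees.IOBTree import IOBTree
    wid = self.getWid(word)
    prox = self._proximity
    if not prox.has_key(wid):
        prox[wid] = IOBTree()
    docs = prox[wid]
    if not docs.has_key(documentId):
        # first position seen for this word/documentId pair
        docs[documentId] = [i]
        self._p_changed = 1
    elif i not in docs[documentId]:
        # reassign rather than append in place, so the BTree
        # registers the changed list
        docs[documentId] = docs[documentId] + [i]
        self._p_changed = 1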

... and the positionInDoc method goes away.

I didn't scan too hard for what else in the source this
would break.
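
The counter idea can be tried outside Zope with a self-contained sketch. It
assumes plain dictionaries in place of the BTrees and a ready-made word list
in place of the splitter output; index_document and adjoined are hypothetical
names, not part of PositionIndex.

```python
def index_document(proximity, doc_id, words):
    """Record the position of every word, duplicates included.

    proximity: {word: {doc_id: [positions]}}; plain dicts stand in
    for the BTrees used in the real index.
    """
    for i, word in enumerate(words):
        proximity.setdefault(word, {}).setdefault(doc_id, []).append(i)


def adjoined(proximity, doc_id, first, second):
    """True if `second` directly follows `first` in the document."""
    a = proximity.get(first, {}).get(doc_id, [])
    b = set(proximity.get(second, {}).get(doc_id, []))
    return any(p + 1 in b for p in a)


prox = {}
index_document(prox, 1, "erik enge wrote erik".split())
print(prox["erik"][1])
print(adjoined(prox, 1, "erik", "enge"))
```

The document is walked exactly once and never split a second time, which is
the point of keeping a counter while iterating over the splitter output.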

> Anyway, I suck at making Python fast (or using it the right way, which
> ever I've fallen prey to this time ;-), and any hints would be greatly
> appreciated.
> 
> I've been indexing and searching a lot this weekend, and bar that problem
> with the indexing speed it seems ok and I have no issues submitting it to
> the Collector.

Cool...

> 
> [1] <http://nittin.net/erik/software/PositionIndex/PositionIndex.py>
> 


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Chris McDonough


It just occurred to me that depending on the splitter to do
positions makes it impossible to alter the splitter without
reindexing the whole text index... but I think this is a
reasonable tradeoff.  Other opinions welcome.


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Erik Enge

On Fri, 15 Jun 2001, Chris McDonough wrote:

> Once you're satisfied with the implementation, would you be willing
> to submit the module to the collector?

Do you think you (or someone else for that matter) could have a look at
[1] the method that returns the position in the document - positionInDoc()
- to see how that could be made to run much faster?  Maybe it is how it is
used...  It is too slow to be very useful when indexing large amounts of
data.

Anyway, I suck at making Python fast (or using it the right way, which
ever I've fallen prey to this time ;-), and any hints would be greatly
appreciated.

I've been indexing and searching a lot this weekend, and bar that problem
with the indexing speed it seems ok and I have no issues submitting it to
the Collector.

[1] <http://nittin.net/erik/software/PositionIndex/PositionIndex.py>


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-17 Thread Erik Enge


On Sat, 16 Jun 2001 [EMAIL PROTECTED] wrote:

> Lexis-Nexis:  Sean w/2 Upton  (where w/2 is within 2 words)

This wouldn't be hard to make happen.  I don't know if it is better to do
it before or after the parsers, though.  Maybe a more user-friendly alias
would be best as a default?




RE: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-16 Thread sean . upton

A lot of folks who do "power searches," say, librarians or other trained
researchers, familiar with the bells and whistles of more powerful search
engines, will want a simple operator for proximity, with the ability to
specify proximity depth:

For example:

Lexis-Nexis:     Sean w/2 Upton  (where w/2 is within 2 words)
                 Also, Lexis doesn't count stop-words in proximity
                 indexes.
Folio/Nextpage:  "Sean Upton"@2

IMHO, the syntax is clean and very brief in the Lexis-Nexis case and should
supplement a more generic
    Sean ... Upton
style search.

Sean


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-16 Thread Chris McDonough

Erik Enge wrote:
> 
> On Fri, 15 Jun 2001, Chris McDonough wrote:
> 
> > Once you're satisfied with the implementation, would you be willing
> > to submit the module to the collector?
> 
> Will do.  Have you thought about how users are actually to use
> exact-phrase?  What I'm thinking I will do here (currently I've only been
> testing explicitly with "adjoinedby" in the query) is to insert
> "adjoinedby" in phrased searches:
> 
> "erik enge"    -> erik adjoinedby enge
> erik ... enge  -> erik near enge
> 
> What do you think?

These both look like good spellings, and I think "erik near enge" would
be a good alias for "erik ... enge" as well.

- C


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-16 Thread Erik Enge

On Fri, 15 Jun 2001, Chris McDonough wrote:

> Once you're satisfied with the implementation, would you be willing
> to submit the module to the collector?

Will do.  Have you thought about how users are actually to use
exact-phrase?  What I'm thinking I will do here (currently I've only been
testing explicitly with "adjoinedby" in the query) is to insert
"adjoinedby" in phrased searches:

"erik enge"    -> erik adjoinedby enge
erik ... enge  -> erik near enge

What do you think?

I'll be submitting PositionIndex.py and ResultList.py in a day or two.
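
The rewriting proposed above could be done as a small step in front of the
query parser. The sketch below is only an illustration: rewrite_query is a
hypothetical name, and the operator spellings "adjoinedby" and "near" are
taken from this thread, not from an existing ZCatalog parser.

```python
import re


def rewrite_query(query):
    """Hypothetical pre-parser step for phrase and proximity spellings."""
    def phrase(match):
        # "erik enge" -> erik adjoinedby enge
        return " adjoinedby ".join(match.group(1).split())

    query = re.sub(r'"([^"]*)"', phrase, query)
    # erik ... enge -> erik near enge
    return re.sub(r"\s*\.\.\.\s*", " near ", query)


print(rewrite_query('"erik enge"'))
print(rewrite_query("erik ... enge"))
```

Doing the rewrite before parsing keeps the parser itself unchanged; whether
that is better than handling it after parsing is left open in this thread.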


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-15 Thread Chris McDonough


Erik,

Once you're satisfied with the implementation, would you be willing to submit
the module to the collector?

- C


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-15 Thread Erik Enge

On Thu, 14 Jun 2001, Erik Enge wrote:

> To be really useful I think the PossitionIndex' _proximity dictionary
> needs to be turned into a BTree of some sort, but apart from that I
> don't know what is missing.

It's now using BTrees.  And I renamed it to PositionIndex (thanks to
Chris Withers for this :-).

> And speed might be a problem, haven't really tested that yet.  Will
> during the weekend though.

I indexed 30,000 objects using PositionIndex and searching (both
exact-phrase and near) is very fast.  It doesn't seem to be bloated,
either (the _proximity attribute, that is).

Do you guys have a test suite for indexes?  Maybe one I can apply to
this index of mine?


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-15 Thread Erik Enge

On Thu, 14 Jun 2001, Chris McDonough wrote:

> Excellent!  I haven't looked at it in detail, but thanks very much for
> contributing it! Maybe we can roll some of this work into a
> position-aware Text Index

It is actually a TextIndex on steroids.  Remove the _proximity attribute
and a couple of methods and what you are left with is a standard
TextIndex.  So I think what you already have is a position-aware
TextIndex.  That's how I'm planning to use it anyway :)

> or maybe even a new kind of Pluggable Index.

:-)


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-14 Thread Chris McDonough

Excellent!  I haven't looked at it in detail, but thanks very much for
contributing it! Maybe we can roll some of this work into a position-aware
Text Index, or maybe even a new kind of Pluggable Index.

- C


Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

2001-06-14 Thread Erik Enge


On Thu, 14 Jun 2001, Erik Enge wrote:

> Me got a patch: <http://nittin.net/erik/software/PossitionIndex>.

And I should mention that it has only been tested on Zope 2.3.2.

(BTW, thanks, Chris, for suggesting how to code it.)

