Re: RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Kevin A. Burton
Rasik Pandey wrote: Hello, I've been meaning to look into good ways to store token offset information to allow for very efficient highlighting and I believe Mark may also be looking into improving the highlighter via other means such as temporary ram indexes. Search the archives to…
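
The idea of storing token offsets so that highlighting does not need to re-analyze the text can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not Lucene's API and not a design anyone in the thread proposed: the class `OffsetStore`, its methods, and the letter/digit tokenizer standing in for an Analyzer are all hypothetical. Offsets are recorded per term once at index time; at query time only the query terms' offsets are looked up, so the document is never re-tokenized (building the output still copies the text once).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical index-time store: lowercased term -> list of {start, end} character offsets.
final class OffsetStore {
    private final Map<String, List<int[]>> offsetsByTerm = new HashMap<>();

    // Index time: tokenize once (runs of letters/digits, lowercased; a stand-in for an
    // Analyzer) and record where each term occurs.
    static OffsetStore build(String text) {
        OffsetStore store = new OffsetStore();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            store.offsetsByTerm.computeIfAbsent(term, k -> new ArrayList<>()).add(new int[] {start, i});
        }
        return store;
    }

    // Query time: look up only the query terms' offsets; the text is not re-tokenized.
    String highlight(String text, Set<String> queryTerms) {
        TreeMap<Integer, Integer> spans = new TreeMap<>(); // start -> end, ordered by start
        for (String term : queryTerms) {
            for (int[] span : offsetsByTerm.getOrDefault(term, List.of())) {
                spans.put(span[0], span[1]);
            }
        }
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Map.Entry<Integer, Integer> e : spans.entrySet()) {
            out.append(text, last, e.getKey())
               .append("<b>").append(text, e.getKey(), e.getValue()).append("</b>");
            last = e.getValue();
        }
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

Because the spans come from a single tokenization pass they never overlap, so merging them in start order is enough to rebuild the marked-up text.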

Re: RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Kevin A. Burton
Rasik Pandey wrote: Kevin, http://home.clara.net/markharwood/lucene/highlight.htm Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms. This seems that it's very inefficient since lucene already knows the frequency…
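
For comparison, the re-analysis approach described here amounts to tokenizing the document text again for each hit and marking the tokens that match a query term. A minimal sketch, assuming a hypothetical `ReanalyzeHighlighter` class and a simple lowercase letter/digit tokenizer in place of a real Lucene Analyzer: every call walks the whole document, which is the cost the thread is questioning.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of highlighting by re-analysis: the text is re-tokenized on
// every call and each token whose lowercased form is a query term is wrapped in <b>.
// The tokenizer (runs of letters/digits, lowercased) stands in for a Lucene Analyzer.
final class ReanalyzeHighlighter {

    static String highlight(String text, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                out.append(c); // copy separators unchanged
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String token = text.substring(start, i);
            if (queryTerms.contains(token.toLowerCase(Locale.ROOT))) {
                out.append("<b>").append(token).append("</b>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }
}
```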

RE: Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Jochen Frey
> Several solutions have been proposed. The simplest is to not scan past the first 10k or so for snippets unless nothing relevant is found in the first 10k. I don't think Mark's highlighter yet does this, but I might be mistaken.
>
> > since lucene already knows the frequency and position…
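
The "first 10k" heuristic can be sketched as a two-pass search: look for a query term in the first 10,000 characters, and scan the remainder only if that finds nothing. This sketch assumes the limit is measured in characters and uses a hypothetical `BoundedSnippetFinder` with a stand-in letter/digit tokenizer; it returns the offset where a snippet could start, or -1.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical two-pass snippet finder: search the first FIRST_PASS_LIMIT characters,
// and scan the rest of the document only if no query term was found there.
final class BoundedSnippetFinder {

    static final int FIRST_PASS_LIMIT = 10_000; // "the first 10k or so", assumed to be chars

    // Returns the start offset of the first token that starts in [from, to) and is a
    // query term, or -1. A token starting before `to` is read to its end.
    static int firstMatch(String text, Set<String> queryTerms, int from, int to) {
        int i = from;
        while (i < to) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (queryTerms.contains(text.substring(start, i).toLowerCase(Locale.ROOT))) {
                return start;
            }
        }
        return -1;
    }

    static int findSnippetStart(String text, Set<String> queryTerms) {
        int limit = Math.min(FIRST_PASS_LIMIT, text.length());
        int hit = firstMatch(text, queryTerms, 0, limit);
        if (hit >= 0 || limit == text.length()) {
            return hit; // found early, or the document was short enough to scan fully
        }
        // Skip the tail of a word cut by the limit; the first pass already read it whole.
        int resume = limit;
        while (resume < text.length()
                && Character.isLetterOrDigit(text.charAt(resume))
                && Character.isLetterOrDigit(text.charAt(resume - 1))) {
            resume++;
        }
        return firstMatch(text, queryTerms, resume, text.length());
    }
}
```

Skipping the cut word matters: without it, the second pass could start in the middle of a word such as `zlucene` and falsely match `lucene`.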

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Doug Cutting
Kevin A. Burton wrote:
> I'm playing with this package: http://home.clara.net/markharwood/lucene/highlight.htm
> Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms. This seems that it's very inefficient

Does it just seem inefficient…

RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread mark harwood
I intend to release a new version of the highlighter soon that should (hopefully) address some of the issues under discussion. The re-design will be based on the following principles:
* A TokenStream will be passed to the highlighter to provide the source of tokens. The token stream could be…
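
A highlighter that accepts a source of tokens, rather than tokenizing the text itself, might look roughly like this. The `OffsetToken` class and the `StreamHighlighter` methods are hypothetical; this is not the API of Mark's package or of Lucene's `TokenStream`. The point it illustrates is that one `highlight` method works whether the tokens come from re-analysis (`tokenize`) or from offsets saved at index time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical token: lowercased term plus [start, end) character offsets into the text.
final class OffsetToken {
    final String term;
    final int start;
    final int end;

    OffsetToken(String term, int start, int end) {
        this.term = term;
        this.start = start;
        this.end = end;
    }
}

final class StreamHighlighter {

    // Consumes tokens only; assumes they arrive in increasing, non-overlapping offset order.
    static String highlight(String text, Iterator<OffsetToken> tokens, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (tokens.hasNext()) {
            OffsetToken t = tokens.next();
            if (!queryTerms.contains(t.term)) {
                continue;
            }
            out.append(text, last, t.start)
               .append("<b>").append(text, t.start, t.end).append("</b>");
            last = t.end;
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    // One possible token source: re-analyze the text (a stand-in for an Analyzer).
    static List<OffsetToken> tokenize(String text) {
        List<OffsetToken> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            result.add(new OffsetToken(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
        }
        return result;
    }
}
```

For example, `highlight(text, tokenize(text).iterator(), terms)` re-analyzes the text, while passing an iterator over previously stored `OffsetToken`s skips tokenization entirely.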

RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Rasik Pandey
Kevin,
> http://home.clara.net/markharwood/lucene/highlight.htm
>
> Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms.
>
> This seems that it's very inefficient since lucene already knows the frequency and position of given…

RE : Performance of hit highlighting and finding term positions for a specific document

2004-03-31 Thread Rasik Pandey
Hello,
> I've been meaning to look into good ways to store token offset information to allow for very efficient highlighting and I believe Mark may also be looking into improving the highlighter via other means such as temporary ram indexes. Search the archives to get a background on…

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Bruce Ritchie
Kevin A. Burton wrote: I'm playing with this package: http://home.clara.net/markharwood/lucene/highlight.htm Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms. This seems that it's very inefficient since lucene already knows the…

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Kevin A. Burton
Erik Hatcher wrote: On Mar 30, 2004, at 7:56 PM, Kevin A. Burton wrote: Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms. This seems that it's very inefficient since lucene already knows the frequency and position of given terms…

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Stephane James Vaucher
I agree with you that a highlight package should be available directly from the lucene website. To offer this much-desired feature, having a dependency on a personal web site seems a little weird to me. It would also force the community to support this functionality, which would seem appropriate…

Re: Performance of hit highlighting and finding term positions for a specific document

2004-03-30 Thread Erik Hatcher
On Mar 30, 2004, at 7:56 PM, Kevin A. Burton wrote:
> Trying to do hit highlighting. This implementation uses another Analyzer to find the positions for the result terms. This seems that it's very inefficient since lucene already knows the frequency and position of given terms in the index.

What…