Gentlemen,
On Friday 12 January 2007 21:00, Chuck Williams wrote:
>
> Doug Cutting wrote on 01/12/2007 09:49 AM:
> > Marvin Humphrey wrote:
> >> Can you show us some code or pseudo-code for a BooleanScorer that
> >> would use impact-sorted posting lists?
> >
> > Another way to interpret this prop
Doug Cutting wrote on 01/12/2007 09:49 AM:
> Marvin Humphrey wrote:
>> Can you show us some code or pseudo-code for a BooleanScorer that
>> would use impact-sorted posting lists?
>
> Another way to interpret this proposal is index-only: the low-level
> indexing APIs should be general enough to per
Marvin Humphrey wrote:
Can you show us some code or pseudo-code for a BooleanScorer that would
use impact-sorted posting lists?
Another way to interpret this proposal is index-only: the low-level
indexing APIs should be general enough to permit impact-sorted posting
lists, and perhaps an impa
Thanks Grant, I will take a look at this.
> -----Original Message-----
> From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 11, 2007 8:12 AM
> To: java-dev@lucene.apache.org
> Subject: Re: Beyond Lucene 2.0 Index Design
>
> Hi Jeff,
>
> Wond
y 10, 2007 5:41 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Beyond Lucene 2.0 Index Design
>
> I have a couple of questions about the original post of the
> new index design:
>
> (1) Question on the posting list
> > > f. > ,],...[docN, freq
> >
IL PROTECTED]
> Sent: Wednesday, January 10, 2007 5:12 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Beyond Lucene 2.0 Index Design
>
> Hi, Jeff,
>
> I like the idea of impact-based scoring. However, could you
> elaborate more on why we only need to use a single field at
On Jan 11, 2007, at 8:37 PM, Ming Lei wrote:
But in practice, the approximation (as in my original
post) should work well enough for a large corpus and
relevance-driven retrieval.
The savings in disk access for a large corpus (which implies
very long posting lists) will be huge with impact-sorted
posting
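To make the disk-access argument concrete, here is a minimal sketch of reading only a prefix of each impact-sorted list. It is plain truncation, not the algorithm from the original post (which is not shown here) and not the Anh/Moffat method. The function name approx_top_k, the budget parameter, and the in-memory list layout are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Posting = Tuple[int, float]  # (doc_id, impact); hypothetical layout


def approx_top_k(lists: Dict[str, List[Posting]], k: int, budget: int):
    """Score docs from the first `budget` postings of each term.

    Each list is assumed to be sorted by descending impact, so the prefix
    holds that term's strongest postings. Scores are lower bounds because
    the tail of every list is never read.
    """
    acc: Dict[int, float] = defaultdict(float)
    read = 0
    for postings in lists.values():
        for doc, impact in postings[:budget]:
            acc[doc] += impact
            read += 1
    # Highest score first; ties broken by doc id for a stable result.
    ranked = sorted(acc.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k], read


lists = {
    "a": [(7, 4.0), (2, 3.0), (9, 1.0), (5, 0.5)],
    "b": [(2, 5.0), (5, 2.0), (7, 0.5)],
}
print(approx_top_k(lists, k=2, budget=2))  # reads 4 postings
print(approx_top_k(lists, k=2, budget=4))  # reads all 7 postings
```

With this toy data both calls rank docs 2 and 7 on top, but the truncated run underestimates doc 7's score because its low-impact posting in list "b" is never read. That is the approximation being traded for fewer reads.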
Marvin,
Several posts back in this thread, I described an
algorithm that uses impact-sorted posting lists for
conjunctive boolean queries. Your concerns about
impact sorting in the boolean retrieval model are valid.
But in practice, the approximation (as in my original
post) should work well enough for large corp
On Jan 11, 2007, at 2:30 PM, jian chen wrote:
It seems to me that the impact-sorted list makes sense if you are trying
to do pure vector-space-based ranking. This is from what I have read in
the research papers. They all talk about how to optimize the vector space
model using this imp
I have the same question. It seems very hard to do phrase-based queries
efficiently.
I think most search engines do phrase-based queries, or at least appear to.
So, as in Google, the query result must contain all the words the user
searched for.
It seems to me that the impact-sorted list
On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
e.
f. ],...[docN, freq
,])
How do you build an efficient PhraseScorer to work with an impact-
sorted posting list?
The way PhraseScorer currently works is: find a doc that contains all
terms, then see if the terms occur consecutively in
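The two steps described above (find docs containing all terms, then check for consecutive positions) can be sketched as follows. This is an illustration only, not Lucene's PhraseScorer: the name phrase_docs and the in-memory dict layout are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical in-memory postings: term -> {doc_id: sorted positions}.
Postings = Dict[str, Dict[int, List[int]]]


def phrase_docs(postings: Postings, phrase: List[str]) -> List[int]:
    """Return the doc ids in which the phrase terms occur consecutively."""
    if not phrase or any(term not in postings for term in phrase):
        return []
    # Step 1: docs that contain every term (conjunction on doc ids).
    candidates = set(postings[phrase[0]])
    for term in phrase[1:]:
        candidates &= set(postings[term])
    hits = []
    for doc in sorted(candidates):
        # Step 2: start position p matches if term i occurs at p + i.
        later = [set(postings[term][doc]) for term in phrase[1:]]
        starts = postings[phrase[0]][doc]
        if any(all(p + i + 1 in s for i, s in enumerate(later)) for p in starts):
            hits.append(doc)
    return hits


index = {
    "impact": {1: [0, 7], 2: [3]},
    "sorted": {1: [1], 2: [9], 3: [0]},
}
print(phrase_docs(index, ["impact", "sorted"]))
```

The sketch uses set intersection for step 1, which hides the real difficulty: with doc-ordered lists step 1 is a linear merge over the lists, and that merge is no longer available once the lists are ordered by impact.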
Hi Jeff,
Wondering if you (and/or others) would be interested in taking a look
at https://issues.apache.org/jira/browse/LUCENE-662 and vetting the
new interfaces, etc. to see if you could come up w/ a prototype
implementation. This would help move along 662 as it would sort out
some of t
The idea of "impact" and an "impact-sorted posting list"
should, in practice, work with the boolean model by
approximation, as follows:
(1) Index Structure
Inverted-Index : *
posting-list: + (sorted
by impact)
occurrence: position
(2) Retrieval Algorithm for boolean query "a AND b"
set an impa
ses). In
> > > addition to having fewer posting lists to
> examine,
> > you often don't need
> > > to read to the end of long posting lists when
> > processing with a
> > > score-at-a-time approach (see Anh/Moffat's
> Pruned
> > Query Eva
> > score-at-a-time approach (see Anh/Moffat's Pruned
> Query Evaluation Using
> > Pre-Computed Impacts, SIGIR 2006) for details on
> one potential
> > algorithm.
> >
> > I'm not quite sure what you mean when you mention
> leaving them out and
>
offat's Pruned
> Query Evaluation Using
> > Pre-Computed Impacts, SIGIR 2006) for details on
> one potential
> > algorithm.
> >
> > I'm not quite sure what you mean when you mention
> leaving them out and
> > re-calculating them at merge time.
> >
>
.
- Jeff
> -----Original Message-----
> From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 09, 2007 2:58 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Beyond Lucene 2.0 Index Design
>
>
> On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
>
when you mention leaving them out and
re-calculating them at merge time.
- Jeff
> -----Original Message-----
> From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 09, 2007 2:58 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Beyond Lucene 2.0 Index Design
>
>
>
ginal Message-
> From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 09, 2007 2:58 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Beyond Lucene 2.0 Index Design
>
>
> On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
>
> > e.
> >
rers to perform scoring and intersection.
The end product would be a very scalable and flexible solution.
- Jeff
> -----Original Message-----
> From: Doron Cohen [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 09, 2007 5:27 PM
> To: java-dev@lucene.apache.org
> Subject: Re:
Scoring today goes doc-at-a-time - all scorers and term-posting-readers
advance together; once a new doc is processed, scoring of previous docs is
known and final. This allows maintaining a finite size queue for collecting
best hits. Then, for huge collections, having to exhaustively scan all
posti
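The doc-at-a-time pattern described above can be sketched roughly like this. It is a simplified disjunctive (OR) scorer, not Lucene's actual scorer code. The function name top_k_doc_at_a_time and the (doc_id, contribution) tuples are assumptions made for the example.

```python
import heapq
from typing import Dict, List, Tuple

Posting = Tuple[int, float]  # (doc_id, score contribution); hypothetical layout


def top_k_doc_at_a_time(lists: Dict[str, List[Posting]], k: int) -> List[Tuple[int, float]]:
    """Score docs in increasing doc-id order, keeping only the best k.

    Every list must be sorted by doc id. Once the cursors move past a doc,
    its score is final, so a heap of at most k entries is enough.
    """
    if k <= 0:
        return []
    cursors = {term: 0 for term in lists}
    heap: List[Tuple[float, int]] = []  # min-heap of (score, doc_id)
    while True:
        # Smallest unprocessed doc id across all lists.
        heads = [lists[t][c][0] for t, c in cursors.items() if c < len(lists[t])]
        if not heads:
            break
        doc = min(heads)
        score = 0.0
        for term in list(cursors):
            c = cursors[term]
            if c < len(lists[term]) and lists[term][c][0] == doc:
                score += lists[term][c][1]
                cursors[term] = c + 1  # all readers advance together
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif (score, doc) > heap[0]:
            heapq.heapreplace(heap, (score, doc))
    return [(doc, score) for score, doc in sorted(heap, reverse=True)]


lists = {
    "a": [(1, 0.5), (3, 1.0), (4, 0.25)],
    "b": [(2, 0.25), (3, 0.5), (4, 1.0)],
}
print(top_k_doc_at_a_time(lists, k=2))
```

Note that the loop still visits every posting in every list. The bounded queue caps memory, not the amount of index data read, which is the cost that becomes significant for huge collections.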
On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
e.
f. ],...[docN, freq
,])
Does the impact have any use after it's used to sort the postings?
Can we leave it out of the index format and recalculate at merge-time?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
---
Hi,
I wanted to start some discussion about possible future Lucene file /
index formats. This is an extension of the Flexible Lucene Indexing
discussion on the wiki:
http://wiki.apache.org/jakarta-lucene/FlexibleIndexing
Note: Related sources are listed at the end.
I would like