Wolf Siberski wrote:
The price is an extension (or modification) of the
Searchable interface. I've added corresponding search(Weight...) methods
to the existing search(Query...) methods and deprecated the latter.
I think this is the right solution.
If Searchable is meant to be Lucene internal, then
Doug Cutting wrote:
Wolf Siberski wrote:
Now I found another solution which requires more changes, but IMHO is
much cleaner:
- when a query computes its Weight, it caches it in an attribute
- a query can be 'frozen'. A frozen query always returns the cached
Weight when calling Query.weight().
Or
Wolf Siberski wrote:
Now I found another solution which requires more changes, but IMHO is
much cleaner:
- when a query computes its Weight, it caches it in an attribute
- a query can be 'frozen'. A frozen query always returns the cached
Weight when calling Query.weight().
Originally there was no
Doug Cutting wrote:
Christoph Goller wrote:
The similarity specified for the search has to be modified so that both
idf(...) AND queryNorm(...) always return 1 and as you say everything
except for tf(term,doc)*docNorm(doc) could be precompiled into the boosts
of the rewritten query. coord/tf/slopp
Christoph Goller wrote:
> Chuck Williams wrote:
>> score(query, doc) =
>> coord*queryNorm*
>> sum[ term in query :
>> idf(term)*boost(term)*idf(term)*tf(term, doc)*docNorm(doc)
>>]
>>
>> where queryNorm = 1/sum[ term in query : (boost(term)*idf(term))^2 ]
>> [...] The MultiSearcher bo
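As a rough illustration of the formula quoted above, here is a minimal Java sketch. The class and method names (ScoreSketch, queryNorm, score) are made up for this example and are not Lucene API. One assumption to flag: the sketch takes the square root in queryNorm, following the description elsewhere in this thread of the normalization as "square-rooted in the denominator"; the formula as quoted does not show the square root.

```java
class ScoreSketch {
    // Assumed form: queryNorm = 1 / sqrt( sum over terms of (boost * idf)^2 ).
    static double queryNorm(double[] boost, double[] idf) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < boost.length; i++) {
            double w = boost[i] * idf[i];
            sumOfSquares += w * w;
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    // score = coord * queryNorm * sum[ idf * boost * idf * tf * docNorm ],
    // with one array entry per query term.
    static double score(double coord, double[] boost, double[] idf,
                        double[] tf, double[] docNorm) {
        double sum = 0.0;
        for (int i = 0; i < boost.length; i++) {
            sum += idf[i] * boost[i] * idf[i] * tf[i] * docNorm[i];
        }
        return coord * queryNorm(boost, idf) * sum;
    }

    public static void main(String[] args) {
        double[] ones = {1.0, 1.0};
        System.out.println(score(1.0, ones, new double[] {3.0, 4.0}, ones, ones));
        System.out.println(score(1.0, ones, new double[] {6.0, 8.0}, ones, ones));
    }
}
```

With boosts of 1, idfs of 3 and 4, and coord, tf and docNorm all 1, queryNorm is 1/5 and the score is 5. Doubling both idfs doubles the score, since idf enters twice in the sum and once in the norm.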
TECTED]
> Sent: Monday, February 07, 2005 3:36 PM
> To: Lucene Developers List
> Subject: Re: single field code ready - Re: URL to compare 2
Similarity's
> ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed
with
> Bug 31841 - MultiSearcher proble
Daniel Naber wrote:
On Tuesday 08 February 2005 00:06, David Spencer wrote:
So, does this make sense and is it a useful way of trying to evaluate the
Similarities?
Is this the MultiFieldQueryParser from Lucene 1.4?
I see WEB-INF/lib/lucene-1.5-rc1-dev.jar dated Jan 28, though I'm not
sure if that
On Tuesday 08 February 2005 00:06, David Spencer wrote:
> So, does this make sense and is it a useful way of trying to evaluate the
> Similarities?
Is this the MultiFieldQueryParser from Lucene 1.4? Then it's "buggy"
anyway, so it probably doesn't make sense to test it. But even with the
current
e (q6/q8).
So, does this make sense and is it a useful way of trying to evaluate the
Similarities?
I think another thread w/ a different thread has started on this topic,
I'll try to redirect it back here.
thx,
Dave
My $0.02,
Chuck
> -Original Message-----
> From: David Spencer [
Paul Elschot wrote:
> On Wednesday 02 February 2005 03:38, Chuck Williams wrote:
> > I was hoping to do this
> > by simple thresholding, e.g. achieve a property like "results with all
> > terms matched are always in [0.8, 1.0], and results missing a term
> > always have a score less than
On Wednesday 02 February 2005 03:38, Chuck Williams wrote:
> Paul Elschot wrote:
> > An alternative is to make sure all scores are bounded.
> > Then the coordination factor can be implemented in the same bound
> > while preserving the coordination order.
>
> If I understand this, I think mor
akarta.apache.org
> Subject: Re: URL to compare 2 Similarity's ready-- Re: Scoring
benchmark
> evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher
> problems with Similarity.docFreq() ?
>
> Doug,
>
> On Tuesday 01 February 2005 20:05, Doug Cut
Doug,
On Tuesday 01 February 2005 20:05, Doug Cutting wrote:
> Chuck Williams wrote:
> > > So I think this can be implemented using the expansion I proposed
> > > yesterday for MultiFieldQueryParser, plus something like my
> > > DensityPhraseQuery and perhaps a few Similarity tweaks.
> >
>
AM
> To: Lucene Developers List
> Subject: Re: URL to compare 2 Similarity's ready-- Re: Scoring
benchmark
> evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher
> problems with Similarity.docFreq() ?
>
> Chuck Williams wrote:
> > > So I
David Spencer wrote:
Let's start with the issue that's been raised so much: whether idf is
better defined with log() or sqrt(log()).
I can redo my page and rebuild indexes if necessary, I just need it
clarified what we want to do, esp -> does the index need to be rebuilt?
The index needs to be r
'm proposing for Default-OR) should be separate.
My $0.02,
Chuck
> -Original Message-
> From: David Spencer [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, February 01, 2005 10:59 AM
> To: Lucene Developers List
> Subject: Re: URL to compare 2 Similarity's re
Chuck Williams wrote:
> So I think this can be implemented using the expansion I proposed
> yesterday for MultiFieldQueryParser, plus something like my
> DensityPhraseQuery and perhaps a few Similarity tweaks.
I don't think that works unless the mechanism is limited to default-AND
(i.e., all
Doug Cutting wrote:
David Spencer wrote:
+(f1:t1^2.0 t1) +(f1:t2^2.0 t2) f1:"t1 t2"~5^3.0 "t1 t2"~2^1.5
(f1:t1^2.0 t1) (f1:t2^2.0 t2) f1:"t1 t2"~5^3.0 "t1 t2"~2^1.5
(f1:t1^2.0 t1) (f1:t2^2.0 t2) (f1:t3^2.0 t3) (f1:t4^2.0 t4) (f1:t5^2.0
t5) f1:"t1 t2 t3 t4 t5"~5^3.0 "t1 t2 t3 t4 t5"~2^1.5
This loo
Doug Cutting wrote:
> That's a lot of functionality bundled into a single Query class! I'd
> rather make it possible to assemble this from reusable parts. And it
> almost can be already. Then we can offer such a thing pre-packaged.
That would be great, if it could be done.
> So let me t
David Spencer wrote:
+(f1:t1^2.0 t1) +(f1:t2^2.0 t2) f1:"t1 t2"~5^3.0 "t1 t2"~2^1.5
(f1:t1^2.0 t1) (f1:t2^2.0 t2) f1:"t1 t2"~5^3.0 "t1 t2"~2^1.5
(f1:t1^2.0 t1) (f1:t2^2.0 t2) (f1:t3^2.0 t3) (f1:t4^2.0 t4) (f1:t5^2.0
t5) f1:"t1 t2 t3 t4 t5"~5^3.0 "t1 t2 t3 t4 t5"~2^1.5
This looks great to me! I'd
Chuck Williams wrote:
Doug Cutting wrote:
> What did you think of my DensityPhraseQuery proposal?
It is a step in the direction of what I have in mind, but I'd like to go
further. How about a query class with these properties:
1. Inputs are:
a. F = list of fields
b. B = list of
Doug Cutting wrote:
David Spencer wrote:
I worked w/ Chuck to get up a test page that shows search results with
2 versions of Similarity side by side.
David,
This looks great! Thanks for doing this.
Is the default operator AND or OR? It appears to be OR, but it should
probably be AND. That's
cene Developers List
> Subject: Re: URL to compare 2 Similarity's ready-- Re: Scoring
benchmark
> evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher
> problems with Similarity.docFreq() ?
>
> Chuck Williams wrote:
> > That expansion is scalable, but
Chuck Williams wrote:
That expansion is scalable, but it only accounts for proximity of all
query terms together. E.g., it does not favor a match where t1 and t2
are close together while t3 is distant over a match where all 3 terms
are distant. Worse, it would not favor a match with t1 and t2 in
Doug Cutting wrote:
David Spencer wrote:
But what is right if there are > 2 terms in terms of the phrases -
does it have a phrase for every pair of terms like this (ignore fields
and boosts and proximity for a sec):
search for "t1 t2 t3" gives you these phrases in addition to the
direct field m
David Spencer wrote:
But what is right if there are > 2 terms in terms of the phrases - does
it have a phrase for every pair of terms like this (ignore fields and
boosts and proximity for a sec):
search for "t1 t2 t3" gives you these phrases in addition to the direct
field matches:
"t1 t2"
"t2
> frequently do not include all query terms. I just tried this bizarre
> query:
> hilbert space frank zappa george bush john kerry
>
> There are two hits and they do not appear to have all terms (even in the
It could be that the anchor text pointing to these pages from some
other web page had t
Doug Cutting wrote:
David Spencer wrote:
I worked w/ Chuck to get up a test page that shows search results with
2 versions of Similarity side by side.
David,
This looks great! Thanks for doing this.
Is the default operator AND or OR? It appears to be OR, but it should
probably be AND. That's
Folks,
In the light of this discussion, I'm working slowly on a new release of
Luke, which will include a BeanShell-driven Similarity designer.
However, this particular module is not finished yet... given my current
workload, this will take a week or two more...
--
Best regards,
Andrzej Bialeck
n error accessing the cache on
the first hit). This included looking at the page source.
Chuck
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 31, 2005 1:44 PM
> To: Lucene Developers List
> Subject: Re: URL to compar
Doug Cutting wrote:
David Spencer wrote:
I worked w/ Chuck to get up a test page that shows search results with
2 versions of Similarity side by side.
David,
This looks great! Thanks for doing this.
Thank you...it involved lots of back & forth interactions w/ Chuck over
the few days to get it t
Chuck Williams wrote:
I think the differences are pretty clear as the systems stands. Notice
a substantial difference in the idf's in the respective explanations. I
continue to think the current mechanism weights these too high,
primarily due to its squaring.
The other big difference occurs when
Doug Cutting wrote:
> Is the default operator AND or OR? It appears to be OR, but it should
> probably be AND. That's become the industry standard since QueryParser
> was first written. Also, any chance we can get explanations for hits?
Explanations are available. Click the score link on
Doug Cutting wrote:
It would translate a query "t1 t2" given fields f1 and f2 into
something like:
+(f1:t1^b1 f2:t1^b2)
+(f2:t1^b1 f2:t2^b2)
Oops. The first term on that line should be "f1:t2", not "f2:t1":
+(f1:t2^b1 f2:t2^b2)
f1:"t1 t2"~s1^b3
f2:"t1 t2"~s2^b4
Doug
-
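The expansion described above can be sketched as a small string builder. This is only an illustration under assumptions: the class MultiFieldExpansion and its expand method are hypothetical names, not part of Lucene or of any patch in this thread, and boosts and slops are passed as plain strings so the placeholders b1..b4 and s1, s2 can be reproduced literally.

```java
import java.util.ArrayList;
import java.util.List;

class MultiFieldExpansion {
    // For each term: one required clause that matches the term in any field,
    // with a per-field boost. Then one optional sloppy phrase clause per field.
    static String expand(String[] terms, String[] fields, String[] termBoosts,
                         String[] slops, String[] phraseBoosts) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> perField = new ArrayList<>();
            for (int i = 0; i < fields.length; i++) {
                perField.add(fields[i] + ":" + term + "^" + termBoosts[i]);
            }
            clauses.add("+(" + String.join(" ", perField) + ")");
        }
        String phrase = String.join(" ", terms);
        for (int i = 0; i < fields.length; i++) {
            clauses.add(fields[i] + ":\"" + phrase + "\"~" + slops[i]
                    + "^" + phraseBoosts[i]);
        }
        return String.join("\n", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand(
                new String[] {"t1", "t2"},
                new String[] {"f1", "f2"},
                new String[] {"b1", "b2"},
                new String[] {"s1", "s2"},
                new String[] {"b3", "b4"}));
    }
}
```

For the query "t1 t2" over fields f1 and f2, this yields the four clauses in their corrected form, one per line. Because each term gets its own required clause, a document has to contain every term in at least one field, while the phrase clauses only add to the score.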
David Spencer wrote:
I worked w/ Chuck to get up a test page that shows search results with 2
versions of Similarity side by side.
David,
This looks great! Thanks for doing this.
Is the default operator AND or OR? It appears to be OR, but it should
probably be AND. That's become the industry s
larity
for
> the
> > vanilla implementation?
> >
> > It's important to know what we are comparing...
> >
> > Chuck
> >
> > > -Original Message-
> > > From: David Spencer [mailto:[EMAIL PROTECTED]
> > >
vid Spencer [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 28, 2005 3:38 PM
> To: Lucene Developers List
> Subject: Re: Scoring benchmark evaluation. Was RE: How to proceed
with
> Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
>
> Daniel Naber wrote:
>
> On Fri, 2005-01-28 at 21:42 +, Chuck Williams wrote:
> I just posted WikipediaSimilarity to Bug 32674. I've also reviewed and
> tested the port to Java 1.4 -- it's fine (although all the casts remind
> me why I like 1.5 so much). Thanks to Miles Barr for this port!
Not a problem, cheers fo
On Saturday 29 January 2005 00:37, David Spencer wrote:
> Hmmm, is it safe to assume I can build the index w/ lucene-1.4.3.jar but
> deploy the webapp for searching w/ lucene-1.5-rc1-dev.jar?
Yes, everything else would be a bug.
> And is the current code supposed to build with so many depreca
everything is spelled out.
Chuck
> -Original Message-
> From: David Spencer [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 28, 2005 3:38 PM
> To: Lucene Developers List
> Subject: Re: Scoring benchmark evaluation. Was RE: How to proceed
with
> Bug 31841 - Mul
> To: Lucene Developers List
> Subject: Re: Scoring benchmark evaluation. Was RE: How to proceed
with
> Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
>
> Daniel Naber wrote:
>
> > On Friday 28 January 2005 22:45, Chuck Williams wrote:
> >
>
Daniel Naber wrote:
On Friday 28 January 2005 22:45, Chuck Williams wrote:
The fact that it requires all terms in all
fields is part of the problem. Once that is addressed, another problem
is that Lucene does not provide a good mechanis
That's fixed in CVS, so maybe the CVS version should be use
TED]
> Sent: Friday, January 28, 2005 3:21 PM
> To: Lucene Developers List
> Subject: Re: Scoring benchmark evaluation. Was RE: How to proceed with
> Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
>
> On Friday 28 January 2005 22:45, Chuck Williams wrot
On Friday 28 January 2005 22:45, Chuck Williams wrote:
> The fact that it requires all terms in all
> fields is part of the problem. Once that is addressed, another problem
> is that Lucene does not provide a good mechanis
That's fixed in CVS, so maybe the CVS version should be used for the
eva
r [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 28, 2005 1:44 PM
> To: Lucene Developers List
> Subject: Re: Scoring benchmark evaluation. Was RE: How to proceed
with
> Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
>
> On Friday 28 January 2005 17:53,
ct: Re: Scoring benchmark evaluation. Was RE: How to proceed
with
> Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
>
> On Friday 28 January 2005 17:53, Chuck Williams wrote:
>
> > I think the baseline should use Lucene's MultiFieldQueryParser to
>
On Friday 28 January 2005 17:53, Chuck Williams wrote:
> I think the baseline should use Lucene's MultiFieldQueryParser to expand
> the query to search both title and body fields, as this is presumably
> the current "out-of-the-box" solution.
Please remember that this is kind of buggy in Lucene 1
uck
> -Original Message-
> From: Chuck Williams [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 28, 2005 8:53 AM
> To: Lucene Developers List
> Subject: RE: Scoring benchmark evaluation. Was RE: How to proceed
with
> Bug 31841 - MultiSearcher problems wit
ery vector factor that determines the normalization and the idf should
remain in the normalization.
Chuck
> -Original Message-
> From: Christoph Goller [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 28, 2005 1:29 AM
> To: Lucene Developers List
> Subject: R
Christoph Goller wrote:
The similarity specified for the search has to be modified so that both
idf(...) AND queryNorm(...) always return 1 and as you say everything
except for tf(term,doc)*docNorm(doc) could be precompiled into the boosts
of the rewritten query. coord/tf/sloppyFreq computation wo
David Spencer wrote:
> I'm on JDK 1.4.2_06 and Tomcat 4+. Had issues w/ the Tomcat 5.5+/JDK 1.5
> combo so I rolled back.
There have been issues with Tomcat 5.5, although supposedly the latest
version has them resolved. I'm using Tomcat 5.0.28 with JDK 1.5.0_01,
which has been solid -- no
Chuck Williams wrote:
Actually, the normalize is a third idf factor (in a different form,
square-rooted in the denominator and summed).
I.e., for a simple BooleanQuery:
score(query, doc) =
coord*queryNorm*
sum[ term in query :
idf(term)*boost(term)*idf(term)*tf(term, doc)*docNorm(d
ursday, January 27, 2005 2:36 PM
> To: Lucene Developers List
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
> Similarity.docFreq() ?
>
> Doug Cutting wrote:
>
> > Chuck Williams wrote:
> >
> >> Christoph
be "n" indexes and wikipedia-sim.jsp will
search in each one with the corresponding Similarity?
thx,
Dave
Chuck
> -Original Message-
> From: David Spencer [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 27, 2005 2:36 PM
> To: Lucene Developers List
>
ow or this weekend.
Chuck
> -Original Message-
> From: David Spencer [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 27, 2005 2:36 PM
> To: Lucene Developers List
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
> Similarity.do
: Thursday, January 27, 2005 11:08 AM
> To: Lucene Developers List
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
> Similarity.docFreq() ?
>
> Chuck Williams wrote:
> > Christoph Goller writes:
> > > You may be right. But I a
Doug Cutting wrote:
Chuck Williams wrote:
Christoph Goller writes:
> You may be right. But I am not completely convinced. I think
> this should be decided based on the proposed benchmark evaluation.
Is that still happening?
Like anything else in an all-volunteer operation, it will only happen
Chuck Williams wrote:
Christoph Goller writes:
> You may be right. But I am not completely convinced. I think
> this should be decided based on the proposed benchmark evaluation.
Is that still happening?
Like anything else in an all-volunteer operation, it will only happen if
folks volunteer t
Christoph Goller writes:
> Chuck Williams wrote:
> > Christoph Goller writes:
> > > My intention was to (ab-)use query boosts for idf transmission and to
> > > overwrite Similarity so that local idf is ignored. The idea was to
> > > simply multiply global idf into the given bo
Chuck Williams wrote:
Christoph Goller writes:
> My intention was to (ab-)use query boosts for idf transmission and to
> overwrite Similarity so that local idf is ignored. The idea was to
> simply multiply global idf into the given boost. Unfortunately idf is
> not only used with the boos
ary 27, 2005 3:36 AM
> To: Lucene Developers List
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems with
> Similarity.docFreq() ?
>
> Wolf Siberski wrote:
> > This is more or less how the patch I already submitted works
> > (exc
Wolf Siberski wrote:
This is more or less how the patch I already submitted works
(except that it ignored the query rewriting step). The problem I see
with this now is that if I (ab-)use the Similarity class for idf
transmission, it can't be redefined anymore by a user who wants to use
a custom S
Christoph Goller wrote:
[...]
I also think this is the best way to fix this bug. However there may be a
way to implement this without changing the Weight and Searchable
API. The idea is to rewrite the query in MultiSearcher and while rewriting
compile the global idf into the query boosts. F
Wolf Siberski wrote:
Doug, Chuck,
thanks for your feedback, proposals and explanations.
The way to proceed seems quite clear to me now.
Due to other obligations it will probably take about
two to three weeks until I've implemented a new patch.
I'll get back to you as soon as it's finished.
--Wolf
nt: Friday, January 14, 2005 9:33 AM
> To: Lucene Developers List
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
> Similarity.docFreq() ?
>
> Chuck Williams wrote:
> > Doug Cutting wrote:
> > > It would indeed be nice to be
>>For example, the RPC made by its
> rewrite()
> implementation could also return the docFreq() of
> each term in the
> rewritten query
I haven't been following the remoting conversation in
detail but this may be relevant:
Using the associated docFreq of each expanded term is
not particularly be
Chuck Williams wrote:
Doug Cutting wrote:
> It would indeed be nice to be able to short-circuit rewriting for
> queries where it is a no-op. Do you have a proposal for how this could
> be done?
First, this gets into the other part of Bug 31841. I don't believe
MultiSearcher.rewrite() is eve
Wolf Siberski wrote:
Doug Cutting wrote:
So, when a query is executed on a MultiSearcher of RemoteSearchables,
the following remote calls are made:
1. RemoteSearchable.rewrite(Query) is called
After that step, are wildcards replaced by term lists?
Yes.
I haven't taken a look at the rewrite() met
Doug Cutting wrote:
Chuck Williams wrote:
I think the question is how frequent and how expensive would those two
steps be in comparison to the difference in the query processing.
I think the first question is: can we get RemoteSearchables to work
correctly and reasonably efficiently for simple que
Doug Cutting wrote:
So, when a query is executed on
a MultiSearcher of RemoteSearchables, the following remote calls are made:
1. RemoteSearchable.rewrite(Query) is called
After that step, are wildcards replaced by term lists?
I haven't taken a look at the rewrite() methods. Could
you explain to
Doug Cutting wrote:
Wolf Siberski wrote:
In the new context, the searcher would be a MultiSearcher,
and to resolve that call at one of the RemoteSearchables, the
method getSimilarity() would have to be called remotely on it.
I think this can be handled by:
a. declaring TermQuery.searcher transient -
ine() for all
query types (which is greatly simplified by a good default
implementation).
Chuck
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 13, 2005 11:41 AM
> To: Lucene Developers List
> Subject: Re: How to p
Chuck Williams wrote:
If auto-filters can provide an effective implementation for RangeQuery's
that avoids rewriting, and we can give up MultiTermQuery and PrefixQuery
in the distributed environment, then how about something like this
refinement:
1. No rewriting is done.
It would indeed be nice
From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 13, 2005 10:29 AM
> To: Lucene Developers List
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
> Similarity.docFreq() ?
>
> Chuck Williams wrote:
> > It just s
On Thursday 13 January 2005 19:29, Doug Cutting wrote:
> Chuck Williams wrote:
> > It just seems like a lot of IPC activity for each query. As things
> > stand now, I think you are proposing this?
> > 1. MultiSearcher calls the remote node to rewrite the query,
> > requiring serialization of th
Chuck Williams wrote:
It just seems like a lot of IPC activity for each query. As things
stand now, I think you are proposing this?
1. MultiSearcher calls the remote node to rewrite the query,
requiring serialization of the query.
2. The remote node returns the rewritten query to the dispatc
y processing.
Chuck
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 13, 2005 9:14 AM
> To: Lucene Developers List
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
> Similarity.docFreq() ?
>
>
Chuck Williams wrote:
I think there is another problem here. It is currently the Weight
implementations that do rewrite(), which requires access to the index,
not just to the idf's. E.g., RangeQuery.rewrite() must find the terms
in the index within the range. So, the Weight cannot be computed in
Wolf Siberski wrote:
Yes, I agree. I just wanted to point out that the current Weight
implementations need to be modified heavily to introduce the
behaviour you describe above. For example, take a look at
TermQuery.TermWeight.scorer():
[...]
return new TermScorer(this, termDocs, getSimilarity
--Original Message-
> From: Paul Elschot [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 13, 2005 12:18 AM
> To: lucene-dev@jakarta.apache.org
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
> Similarity.docFreq() ?
>
> On Thursday 13 J
s.
> Chuck
>
> > -Original Message-
> > From: Wolf Siberski [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, January 12, 2005 4:08 PM
> > To: Lucene Developers List
> > Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
> with
>
uted until after the Query is
rewritten, which requires access to the index on the remote node.
Chuck
> -Original Message-
> From: Wolf Siberski [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 12, 2005 4:08 PM
> To: Lucene Developers List
> Subject: Re: How
Doug Cutting wrote:
Wolf Siberski wrote:
Chuck Williams wrote:
This is a nice solution! By having MultiSearcher create the Weight, it
can pass itself in as the searcher, thereby allowing the correct
docFreq() method to be called. This is similar to what I tried to do
with topmostSearcher, but a m
Doug Cutting wrote:
> Searchers are based on
> IndexReaders, and hence docFreqs don't change until a new Searcher is
> created. So long as this is true, and the central dispatch node uses a
> searcher, then a simple cache, perhaps that is pre-fetched, is all
> that's feasible. It shouldn
Wolf Siberski wrote:
Chuck Williams wrote:
This is a nice solution! By having MultiSearcher create the Weight, it
can pass itself in as the searcher, thereby allowing the correct
docFreq() method to be called. This is similar to what I tried to do
with topmostSearcher, but a much better way to do
Chuck Williams wrote:
There needs to be a way to create the aggregate docFreq table and keep
it current under incremental changes to the indices on the various
remote nodes.
I think you're getting ahead of yourself. Searchers are based on
IndexReaders, and hence docFreqs don't change until a new S
Chuck Williams wrote:
I've read through Wolf's patch and see a few issues (please correct
anything wrong here):
1. DfMapSimilarity works only with a limited set of queries.[...]
2. The patch hardwires the use of DfMapSimilarity into MultiSearcher.[...]
3. Philosophically, I'm not convinced
rom
the aggregate table to address this issue and assuming a docFreq of 1).
Is there a better way, or perhaps I'm missing something?
Chuck
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 12, 2005 8:58 AM
> To: Lucene
Chuck Williams wrote:
I was thinking of the aggressive version with an index-time solution,
although I don't know the Lucene architecture for distributed indexing
and searching well enough to formulate the idea precisely.
Conceptually, I'd like each server that owns a slice of the index in a
distri
end
the queries out to the remote Searcher's and these Searcher's could
consult their local indexes for the correct docFreq's to use.
Chuck
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 11, 2005 3:46 PM
Chuck Williams wrote:
This is a nice solution! By having MultiSearcher create the Weight, it
can pass itself in as the searcher, thereby allowing the correct
docFreq() method to be called.
Glad to hear it at least makes sense... Now I hope it works!
I'm still left wondering if having MultiSearcher
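To make the docFreq point concrete, here is a minimal sketch of why the idf should come from an aggregate over all sub-indexes rather than from each one locally. It is not the MultiSearcher code: Shard is a hypothetical stand-in for a sub-searcher, and the idf formula log(numDocs / (docFreq + 1)) + 1 is assumed here as a default-style idf.

```java
import java.util.List;
import java.util.Map;

class GlobalDocFreqSketch {
    // Hypothetical stand-in for one sub-index: per-term document
    // frequencies plus the number of documents it holds.
    static final class Shard {
        final Map<String, Integer> docFreqs;
        final int maxDoc;

        Shard(Map<String, Integer> docFreqs, int maxDoc) {
            this.docFreqs = docFreqs;
            this.maxDoc = maxDoc;
        }

        int docFreq(String term) {
            return docFreqs.getOrDefault(term, 0);
        }
    }

    // Aggregate docFreq: the sum of the per-shard values.
    static int globalDocFreq(List<Shard> shards, String term) {
        int total = 0;
        for (Shard shard : shards) {
            total += shard.docFreq(term);
        }
        return total;
    }

    // Aggregate document count across all shards.
    static int globalMaxDoc(List<Shard> shards) {
        int total = 0;
        for (Shard shard : shards) {
            total += shard.maxDoc;
        }
        return total;
    }

    // Assumed default-style idf.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }
}
```

If a term occurs in 1 of 100 documents on one shard and in 49 of 100 on another, the two local idfs differ widely, so the same term would be weighted differently depending on where a document lives. Computing the idf once from the aggregate (50 of 200) gives every shard the same weight for the term, which is what makes the merged scores comparable.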
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 11, 2005 1:13 PM
> To: Lucene Developers List
> Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
> Similarity.docFreq() ?
>
> Chuck Williams wrote:
> > As Wolf does, I
Chuck Williams wrote:
As Wolf does, I hope a committer with deep knowledge of Lucene's design
in this area will weigh in on the issue and help to resolve it.
The root of the bug is in MultiSearcher.search(). This should construct
a Weight, weight the query, then score the now-weighted query.
Her
ing efficient. Is something along those lines possible?
Chuck
> -Original Message-
> From: Wolf Siberski [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 11, 2005 12:55 AM
> To: Lucene Developers List
> Subject: How to proceed with Bug 31841 - MultiSearcher problems w
As I'm very interested in resolving this bug,
I would like to resume the discussion about it.
Chuck Williams (the original bug reporter) and I
have both already provided a patch. Are any of the
committers willing to review them?
If changes are necessary, or another way of handling
this issue turns
96 matches