Re: FuzzyQuery prefix length

2004-10-20 Thread Doug Cutting
Daniel Naber wrote: On Tuesday 12 October 2004 17:22, Doug Cutting wrote: Which is worse: a person who searches for Photokopie~ in a 1000 document collection does not find documents containing Fotokopie; or a person who searches for Photokopie~ in a 1M document collection doesn't find anything beca

Re: FuzzyQuery prefix length

2004-10-20 Thread Bill Janssen
> Developers may always change this by calling QueryParser.setFuzzyPrefixLength(). So at issue is which behaviour is better for developers who do not know of this parameter. Is it more important that their applications perform well or that they find all matches to fuzzy queries? Rela
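
To make the trade-off in this thread concrete, here is a minimal sketch of how a required prefix narrows fuzzy matching. It is not Lucene's FuzzyQuery code: the class FuzzyPrefixSketch, its method names, and the maxEdits cut-off are all assumptions made for illustration. It only shows why a prefix length of 1 or more skips terms such as Fotokopie, while a prefix length of 0 has to compare the query against every term.

```java
// Illustrative only: hypothetical names, not part of Lucene.
class FuzzyPrefixSketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // A term is only a candidate if it shares the first prefixLength
    // characters with the query; prefixLength 0 admits every term.
    static boolean isCandidate(String query, String term, int prefixLength) {
        int n = Math.min(prefixLength, query.length());
        return term.startsWith(query.substring(0, n));
    }

    // Assumption: a fixed maximum edit count stands in for a similarity threshold.
    static boolean fuzzyMatches(String query, String term, int prefixLength, int maxEdits) {
        return isCandidate(query, term, prefixLength)
            && editDistance(query, term) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("photokopie", "fotokopie", 0, 2));
        System.out.println(fuzzyMatches("photokopie", "fotokopie", 1, 2));
    }
}
```

With a non-zero prefix the expensive distance computation runs only on terms that pass the cheap prefix check, which is the performance argument; the cost is that spelling differences in the first characters are never found.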

Re: Normalized Scoring

2004-10-20 Thread Paul Elschot
Chuck, Hits normalizes the final highest score to 1.0, and you can implement your own HitCollector to suppress that normalisation. For the rest have a look at Weight, it can easily be used for your example by having sumOfSquaredWeights() return some sum of the weights, and letting normalize() do
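
The normalisation described in this message can be pictured as dividing every raw score by the top one. The following is a rough sketch under that assumption, not the code in Hits; the class ScoreNormalizationSketch and its method are hypothetical names.

```java
// Illustrative only: hypothetical names, not part of Lucene.
class ScoreNormalizationSketch {

    // Scales the scores so that the highest one becomes 1.0.
    // Returns an unchanged copy when there is no positive score to scale by.
    static float[] normalize(float[] rawScores) {
        float top = 0f;
        for (float s : rawScores) {
            top = Math.max(top, s);
        }
        float[] out = rawScores.clone();
        if (top <= 0f) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = out[i] / top;
        }
        return out;
    }

    public static void main(String[] args) {
        for (float s : normalize(new float[] {4f, 2f, 1f})) {
            System.out.println(s);
        }
    }
}
```

Because the divisor depends on the best hit of each query, scores scaled this way are comparable within one result list but not across queries, which is why collecting raw scores with a HitCollector is suggested.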

Re: FuzzyQuery prefix length

2004-10-20 Thread Bernhard Messer
Doug Cutting wrote: Daniel Naber wrote: On Tuesday 12 October 2004 17:22, Doug Cutting wrote: Which is worse: a person who searches for Photokopie~ in a 1000 document collection does not find documents containing Fotokopie; or a person who searches for Photokopie~ in a 1M document collection does

Re: FuzzyQuery prefix length

2004-10-20 Thread Erik Hatcher
On Oct 20, 2004, at 12:14 PM, Doug Cutting wrote: The advantages of a zero-character prefix default are that it's back-compatible and that it will find more matches, when spelling differences are in the first characters. I prefer this default. Anyone using QueryParser needs to be aware of the i

Re: FuzzyQuery prefix length

2004-10-20 Thread Scott Ganyo
I prefer this as well. But then again I didn't agree with the TooManyClauses decision, either, where it was decided that the better good was served by protecting the user regardless of whether he or she wanted it. Isn't this pretty much debating this philosophy again? On Oct 20, 2004, at 12:5

Retrieving Document Boosts

2004-10-20 Thread Dan Climan
I was trying to test whether the Document Boosts I calculate and add during indexing were being preserved correctly. I understand that what's actually preserved by default is Field Boost * Document Boost * lengthNorm I'm using default similarity and initially had no field boosts or document boo
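
The product mentioned in this message can be worked through by hand. The sketch below assumes the default similarity's length norm of 1/sqrt(numTerms); the class NormSketch and its method names are hypothetical and only reproduce the arithmetic, not Lucene's indexing code.

```java
// Illustrative only: hypothetical names, not part of Lucene.
class NormSketch {

    // Assumed default length norm: shorter fields get a larger value.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Field Boost * Document Boost * lengthNorm, as described in the message.
    static float norm(float fieldBoost, float documentBoost, int numTerms) {
        return fieldBoost * documentBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // No boosts, 100 terms in the field.
        System.out.println(norm(1.0f, 1.0f, 100));
        // Document boost of 2, 4 terms in the field.
        System.out.println(norm(1.0f, 2.0f, 4));
    }
}
```

Note that Lucene stores this product in a compact, lossy encoding per field, so a value read back from the index may differ slightly from the product computed here, and the individual boosts cannot be separated from the length norm afterwards.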

Re: Retrieving Document Boosts

2004-10-20 Thread Doug Cutting
Dan Climan wrote:

    TermEnum terms = ir.terms();
    int numTerms = 0;
    while (terms.next()) {
      Term t = terms.term();
      if (t.field().equals("FullText")) numTerms++;
    }