Re: Further refinement of search results - distinguishing hits with exact phrase match from the rest

2010-02-15 Thread Michael McCandless
I don't think Lucene makes this easy today, out of the box. The scoring process for a boolean query doesn't track which sub-clause matched, though it does track the number of clauses that matched (coord). E.g., you'd be able to tell that a given hit had both clauses match, vs only 1 (just

Re: Further refinement of search results - distinguishing hits with exact phrase match from the rest

2010-02-15 Thread mark harwood
Re Mike's delegating custom query suggestion - see https://issues.apache.org/jira/browse/LUCENE-1999 - Original Message From: Michael McCandless luc...@mikemccandless.com To: java-user@lucene.apache.org Sent: Mon, 15 February, 2010 10:03:30 Subject: Re: Further refinement of search

Strange Fuzzyquery results scoring when using a low minimal distance

2010-02-15 Thread stefcl
Hello, I'm using Lucene v3. Please consider the following spellings: Lucene, Lucéne, lucéne, Lucane, Lucen. When searching for lucéne among those words using a FuzzyQuery (with 0.5 edit distance), results show: 1. Lucene 1.0259752 2. Lucane 1.0259752 3. Lucéne 0.95660806 4. lucéne 0.95660806 5.
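As background for the scores quoted above, here is a minimal sketch of the term similarity that FuzzyQuery in Lucene 2.9/3.0 is, to my understanding, compared against its minimum-similarity threshold when no prefix length is set: 1 - editDistance / min(len(query), len(term)). This is plain Python rather than Lucene code, the function names are made up for illustration, and it does not model the other factors (such as IDF of the rewritten terms) that feed into the final hit score.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_similarity(query: str, term: str) -> float:
    """Assumed formula: 1 - distance / length of the shorter string.

    Comparison is case-sensitive, so 'l' vs 'L' counts as one edit.
    """
    shortest = min(len(query), len(term))
    if shortest == 0:
        return 0.0  # guard chosen for this sketch only
    return 1.0 - levenshtein(query, term) / shortest


if __name__ == "__main__":
    for term in ["Lucene", "Lucéne", "lucéne", "Lucane", "Lucen"]:
        print(term, fuzzy_similarity("lucéne", term))
```

Under this assumed formula the exact spelling gets the highest term similarity, so an ordering that ranks other spellings above it would have to come from somewhere else in the scoring.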

Re: question regarding BooleanQuery:equals() method

2010-02-15 Thread Smith G
Hello All, I am really sorry for not following the rules and bringing it to the top. It is important at the moment. Thanks. On 11 February 2010 15:51, Smith G gudumba.sm...@gmail.com wrote: Hello All, I am writing some test cases for a custom-class which modifies

PayloadNearSpanScorer explain method

2010-02-15 Thread Peter Keegan
The 'explain' method in PayloadNearSpanScorer assumes the AveragePayloadFunction was used. I don't see an easy way to override this because 'payloadsSeen' and 'payloadScore' are private/protected. It seems like the 'PayloadFunction' interface should have an 'explain' method that the Scorer could

Re: Can you use reduced sized test indexes to predict performance gains for a larger index?

2010-02-15 Thread Tom Burton-West
Hi Chris, In our experience with large indexes (about 200-300GB), we found most of our bottlenecks involved disk I/O. We found that if our experimental indexes were too small, much of the index could fit in cache, and so our test results were not applicable to our larger indexes. On

Controlling what is indexed / normalizing our index

2010-02-15 Thread maxSchlein
We have a list of keywords with aliases (Example: keyword = ms access aliases = microsoft access, msaccess, m.s. access ) We would like to intercept the aliases prior to them being indexed, and have the keyword indexed instead. We can do this with a CustomFilter for single word aliases.
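One way to picture the multi-word case is a longest-match rewrite over the token stream before indexing. The sketch below is plain Python, not a Lucene TokenFilter; `build_alias_table` and `normalize_tokens` are hypothetical names, and it assumes simple whitespace tokenization, which a real analyzer may not match (for example, punctuation in "m.s." could be stripped earlier in the chain).

```python
def build_alias_table(keyword_aliases):
    """Map each alias (as a tuple of lowercase tokens) to its keyword's tokens."""
    table = {}
    for keyword, aliases in keyword_aliases.items():
        for alias in aliases:
            table[tuple(alias.lower().split())] = keyword.lower().split()
    return table


def normalize_tokens(tokens, table):
    """Replace the longest alias found at each position with its keyword tokens."""
    max_len = max((len(key) for key in table), default=0)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first so multi-word aliases win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in table:
                out.extend(table[window])
                i += n
                break
        else:
            # No alias starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    table = build_alias_table(
        {"ms access": ["microsoft access", "msaccess", "m.s. access"]}
    )
    print(normalize_tokens("We use Microsoft Access daily".split(), table))
```

The same lookahead idea would need buffering inside a streaming filter, since the filter cannot decide what to emit for "microsoft" until it has seen the following token.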

Re: Controlling what is indexed / normalizing our index

2010-02-15 Thread Ahmet Arslan
We have a list of keywords with aliases (Example: keyword = ms access aliases = microsoft access, msaccess, m.s. access ) We would like to intercept the aliases prior to them being indexed, and have the keyword indexed instead. We can do this with a CustomFilter for single word

Re: Can you use reduced sized test indexes to predict performance gains for a larger index?

2010-02-15 Thread Peter Keegan
Same experience here as Tom. Disk I/O becomes the bottleneck with large indexes (or multiple shards per server) with less memory. Frequent updates to indexes can make the I/O bottleneck worse. Peter On Mon, Feb 15, 2010 at 2:17 PM, Tom Burton-West tburtonw...@gmail.com wrote: Hi Chris, In our