lucene source tarball bugs

2005-01-28 Thread Jeff Breidenbach
Slightly more blatant is this problem which is still present in the Lucene 1.4.3 source tarball. http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg09083.html On Jan 28, 2005, at 6:50 PM, Robert Engels wrote: > The source 1.4-final builds available for download, have a 'version' >

Re: build version wrong?

2005-01-28 Thread Erik Hatcher
On Jan 28, 2005, at 6:50 PM, Robert Engels wrote: The source 1.4-final builds available for download, have a 'version' entry in the build.xml of 1.5-rc1-dev? Is it really the 1.4 final codebase? Yes, and this issue has come up a few times in the past. The version number is adjusted on purpose (

Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread David Spencer
Chuck Williams wrote: Dave, are you using MultiFieldQueryParser and DefaultSimilarity for the vanilla implementation? Yes that's the plan. I'll try to have links to source etc too. It's important to know what we are comparing... I agree, that's why I'm trying to make sure everything is spelled out.

build version wrong?

2005-01-28 Thread Robert Engels
The source 1.4-final builds available for download, have a 'version' entry in the build.xml of 1.5-rc1-dev? Is it really the 1.4 final codebase?

RE: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Chuck Williams
Dave, are you using MultiFieldQueryParser and DefaultSimilarity for the vanilla implementation? It's important to know what we are comparing... Chuck > -----Original Message----- > From: David Spencer [mailto:[EMAIL PROTECTED] > Sent: Friday, January 28, 2005 3:38 PM > To: Lucene Develop

Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread David Spencer
Daniel Naber wrote: On Friday 28 January 2005 22:45, Chuck Williams wrote: The fact that it requires all terms in all fields is part of the problem. Once that is addressed, another problem is that Lucene does not provide a good mechanis That's fixed in CVS, so maybe the CVS version should be use

RE: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Chuck Williams
OK with me, assuming everything will run in the CVS version and there aren't changes that affect the semantics of any of my code. I've never tried it, and don't know whether or not Dave has. Chuck > -----Original Message----- > From: Daniel Naber [mailto:[EMAIL PROTECTED] > Sent: Friday,

Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Daniel Naber
On Friday 28 January 2005 22:45, Chuck Williams wrote: > The fact that it requires all terms in all > fields is part of the problem. Once that is addressed, another problem > is that Lucene does not provide a good mechanis That's fixed in CVS, so maybe the CVS version should be used for the eva

DO NOT REPLY [Bug 32674] - [PATCH] MultiFieldQueryParser and BooleanQuery do not provide adequate support for queries across multiple fields

2005-01-28 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT . ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://issues.apache.org/bugzilla/show_bu

cvs commit: jakarta-lucene/src/test/org/apache/lucene/search TestPhraseQuery.java

2005-01-28 Thread dnaber
dnaber 2005/01/28 14:22:04 Modified: src/test/org/apache/lucene/search TestPhraseQuery.java Log: test case that makes sure sloppy phrase queries use the term distance to calculate the result ranking Revision Changes Path 1.10 +40 -0 jakarta-lucene/src/test/org/a

RE: -> Grouping Search Results by Clustering Snippets:

2005-01-28 Thread Joaquin Delgado
-----Original Message----- From: Joaquin Delgado Sent: Friday, January 28, 2005 4:41 PM To: 'Lucene Developers List'; [EMAIL PROTECTED] Subject: RE: -> Grouping Search Results by Clustering Snippets: This is a very interesting thread. Below is a link to a paper I published many years ago (1998)

RE: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Chuck Williams
Sorry for the mispost -- fingers slipped... Yes, but this is part of the point. Lucene is a field-based search engine and its built-in support for taking simple queries and searching across relevant fields is poor. The fact that it requires all terms in all fields is part of the problem. Once that

RE: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Chuck Williams
Yes, but this is part of the point. Lucene is a field-based search engine and its built-in support for taking simple queries and searching across relevant fields is poor. The fact that it requires all terms in all fields is part of the problem. Once that is addressed, another problem is that Lucene

Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Daniel Naber
On Friday 28 January 2005 17:53, Chuck Williams wrote: > I think the baseline should use Lucene's MultiFieldQueryParser to expand > the query to search both title and body fields, as this is presumably > the current "out-of-the-box" solution. Please remember that this is kind of buggy in Lucene 1

RE: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Chuck Williams
David, I just posted WikipediaSimilarity to Bug 32674. I've also reviewed and tested the port to Java 1.4 -- it's fine (although all the casts remind me why I like 1.5 so much). Thanks to Miles Barr for this port! You don't want any of the test classes. You just need these 4 classes: Distribu

DO NOT REPLY [Bug 32674] - [PATCH] MultiFieldQueryParser and BooleanQuery do not provide adequate support for queries across multiple fields

2005-01-28 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT . ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://issues.apache.org/bugzilla/show_bu

Re: Passage Search

2005-01-28 Thread Daniel Naber
On Friday 28 January 2005 01:10, Joaquin Delgado wrote: > <"man bites dog"~DOCSIZE> that would execute a phrasequery with the > slop being the individual document size in number of characters of each > hit. You can use the value of Integer.MAX_VALUE as a slop, Nutch does that to boost those mat

RE: -> Grouping Search Results by Clustering Snippets:

2005-01-28 Thread Otis Gospodnetic
This is very much of interest to me. Although it's not in the UI, I did integrate Lucene and Carrot2 in Simpy ( http://www.simpy.com ). Clustering is currently triggered only by a search. Although you may not be able to tell (again, sucky UI) Simpy is designed in a way that will let me hook in a

RE: -> Grouping Search Results by Clustering Snippets:

2005-01-28 Thread Adam Saltiel
This has been implemented in open source, but not with Lucene? http://www.cs.put.poznan.pl/dweiss/carrot/ and http://carrot2.sourceforge.net/ David Weiss is a Polish academic at Poznan University, Poland. He and others have implemented a servlet based web app that uses pipelined components that co

RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Chuck Williams
Christoph Goller wrote: > So the changes for the MultiSearcher bug would remain locally in > MultiSearcher. > I think this would be a very clean solution. What do others think? Sounds good to me. Wolf is writing a patch to fix this bug, so it could depend on how far he's gotten already and

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Doug Cutting
Christoph Goller wrote: The similarity specified for the search has to be modified so that both idf(...) AND queryNorm(...) always return 1 and as you say everything except for tf(term,doc)*docNorm(doc) could be precompiled into the boosts of the rewritten query. coord/tf/sloppyFreq computation wo

RE: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Chuck Williams
David Spencer wrote: > I'm on JDK 1.4.2_06 and Tomcat 4+. Had issues w/ the Tomcat 5.5+/JDK 1.5 > combo so I rolled back. There have been issues with Tomcat 5.5, although supposedly the latest version has them resolved. I'm using Tomcat 5.0.28 with JDK 1.5.0_01, which has been solid -- no

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Christoph Goller
Chuck Williams wrote: Actually, the normalize is a third idf factor (in a different form, square-rooted in the denominator and summed). I.e., for a simple BooleanQuery: score(query, doc) = coord*queryNorm* sum[ term in query : idf(term)*boost(term)*idf(term)*tf(term, doc)*docNorm(d
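
The formula quoted in the snippet above can be sketched in a few lines of Python. This is only an illustration of the visible terms, not Lucene's code: the snippet is cut off, so anything after docNorm is not covered, and the form of query_norm (one over the square root of the summed squared idf*boost weights) is an assumption taken from the "square-rooted in the denominator and summed" description. The function names and the dictionary layout are hypothetical.

```python
import math


def query_norm(weights):
    """Assumed normalization: 1 / sqrt(sum of (idf * boost)^2 over query terms)."""
    total = sum((idf * boost) ** 2 for idf, boost in weights)
    # Guard against an empty query or all-zero weights.
    return 1.0 / math.sqrt(total) if total > 0 else 1.0


def score(terms, coord=1.0):
    """Score one document for a simple Boolean query of term clauses.

    terms: list of dicts with the per-term factors named in the snippet:
    idf, boost, tf (for this document) and doc_norm (for this document).
    """
    qn = query_norm([(t["idf"], t["boost"]) for t in terms])
    # idf appears twice per term, as in the quoted formula.
    per_term = (
        t["idf"] * t["boost"] * t["idf"] * t["tf"] * t["doc_norm"] for t in terms
    )
    return coord * qn * sum(per_term)


if __name__ == "__main__":
    example = [{"idf": 2.0, "boost": 1.0, "tf": 3.0, "doc_norm": 0.5}]
    print(score(example))
```

With idf entering twice per term and once more through query_norm, the three idf factors mentioned in the message are all visible in the sketch.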

Re: Passage Search

2005-01-28 Thread Christoph Goller
Joaquin Delgado wrote: What is described here as "Passage Search" is nothing more than a PhraseQuery with a large slop. I think it's a UI problem rather than a ranking algorithm. For example you may want to translate simple multi-term queries into phrasequery by default (instead of AND or O

Re: Passage Search

2005-01-28 Thread Paul Elschot
On Friday 28 January 2005 01:10, Joaquin Delgado wrote: > What is described here as "Passage Search" is nothing more than a > PhraseQuery with a large slop. I think it's a UI problem rather than a > ranking algorithm. For example you may want to translate simple > multi-term queries into phra