Re: Confidence scores at search time

2009-03-05 Thread Chris Hostetter
: That being said, I could see maybe determining a delta value such that if the
: distance between any two scores is more than the delta, you cut off the rest
: of the docs. This takes into account the relative state of scores and is not
: some arbitrary value (although, the delta is, of course)
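A minimal sketch of the delta cut-off idea quoted above, assuming the hits are already sorted by descending score and that "any two scores" means two adjacent hits in that ranking. The names ScoreGapCutoff and keepCount are hypothetical and not part of Lucene; the scores are plain floats, so how they are obtained from a search is left open.

```java
// Sketch only: ScoreGapCutoff and keepCount are hypothetical names, not Lucene API.
final class ScoreGapCutoff {

    /**
     * Returns how many leading hits to keep.
     * Assumes scores are sorted in descending order (best hit first).
     */
    static int keepCount(float[] scores, float delta) {
        for (int i = 1; i < scores.length; i++) {
            // Gap between this hit and the one ranked just above it.
            if (scores[i - 1] - scores[i] > delta) {
                return i; // keep hits 0..i-1 and drop the rest
            }
        }
        return scores.length; // no gap exceeded delta: keep everything
    }

    public static void main(String[] args) {
        float[] scores = {0.90f, 0.85f, 0.80f, 0.30f, 0.25f};
        // The first gap larger than 0.2 is between 0.80 and 0.30.
        System.out.println(keepCount(scores, 0.2f)); // prints 3
    }
}
```

The cut-off position depends only on the relative spacing of the scores within one result list, but delta itself is still a value that has to be chosen, as the quoted message points out.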

RE: Confidence scores at search time

2009-03-05 Thread Chris Hostetter
: > Hmm, bugzilla has moved to JIRA. I'm not sure where the mapping is
: > anymore. There used to be a Bugzilla Id in JIRA, I think. Sorry.
FYI... by default the jira homepage has a form for searching by legacy bugzilla ID...
https://issues.apache.org/jira/
...if you create a Jira account

Re: Confidence scores at search time

2009-03-04 Thread Erik Hatcher
On Mar 4, 2009, at 9:05 AM, Michael McCandless wrote: I think (?) Explanation.toString() is in fact supposed to return the full explanation (not just the first line)? You're right... I just read the code wrong after seeing the output Ken posted originally. He followed up with a correct

Re: Confidence scores at search time

2009-03-04 Thread Michael McCandless
I think (?) Explanation.toString() is in fact supposed to return the full explanation (not just the first line)? Mike Ken Williams wrote: On 3/2/09 1:58 PM, "Erik Hatcher" wrote: On Mar 2, 2009, at 2:47 PM, Ken Williams wrote: In the output, I get explanations like "0.88922405 = (M

Re: Confidence scores at search time

2009-03-02 Thread Ken Williams
On 3/2/09 4:23 PM, "Ken Williams" wrote: > On 3/2/09 1:58 PM, "Erik Hatcher" wrote: > >> On Mar 2, 2009, at 2:47 PM, Ken Williams wrote: >>> In the output, I get explanations like "0.88922405 = (MATCH) product >>> of:" >>> with no details. Perhaps I need to do something different in >>> ind

Re: Confidence scores at search time

2009-03-02 Thread Ken Williams
On 3/2/09 1:58 PM, "Erik Hatcher" wrote: > > On Mar 2, 2009, at 2:47 PM, Ken Williams wrote: >> In the output, I get explanations like "0.88922405 = (MATCH) product >> of:" >> with no details. Perhaps I need to do something different in >> indexing? > > Explanation.toString() only returns t

Re: Confidence scores at search time

2009-03-02 Thread Ken Williams
On 3/2/09 4:19 PM, "Steven A Rowe" wrote: > On 3/2/2009 at 4:22 PM, Grant Ingersoll wrote: >> On Mar 2, 2009, at 2:47 PM, Ken Williams wrote: >>> Also, while perusing the threads you refer to below, I saw a >>> reference to the following link, which seems to have gone dead: >>> >>> https://i

RE: Confidence scores at search time

2009-03-02 Thread Steven A Rowe
On 3/2/2009 at 4:22 PM, Grant Ingersoll wrote: > On Mar 2, 2009, at 2:47 PM, Ken Williams wrote: > > Also, while perusing the threads you refer to below, I saw a > > reference to the following link, which seems to have gone dead: > > > > https://issues.apache.org/bugzilla/show_bug.cgi?id=31841 >

Re: Confidence scores at search time

2009-03-02 Thread Grant Ingersoll
On Mar 2, 2009, at 2:47 PM, Ken Williams wrote: Hi Grant, It's true, I may have an X-Y problem here. =) My basic need is to sacrifice recall to achieve greater precision. Rather than always presenting the user with the top N documents, I need to return *only* the documents that seem rele

Re: Confidence scores at search time

2009-03-02 Thread Erik Hatcher
On Mar 2, 2009, at 2:47 PM, Ken Williams wrote: Finally, I seem unable to get Searcher.explain() to do much useful - my code looks like: Searcher searcher = new IndexSearcher(reader); QueryParser parser = new QueryParser(LuceneIndex.CONTENT, analyzer); Query query = pa

Re: Confidence scores at search time

2009-03-02 Thread Ken Williams
Hi Grant, It's true, I may have an X-Y problem here. =) My basic need is to sacrifice recall to achieve greater precision. Rather than always presenting the user with the top N documents, I need to return *only* the documents that seem relevant. For some searches this may be 3 documents, for so

Re: Confidence scores at search time

2009-02-28 Thread Grant Ingersoll
Personally, I have my doubts about this actually working and I think others do too. It's in there in Lucene, but I don't know if it makes sense. Logically speaking, I just don't see how it makes sense to compare different queries results, but maybe I'm just short-sighted. I'd certainly w

Re: Confidence scores at search time

2009-02-28 Thread Michael Stoppelman
I was just reading the Similarity javadocs (http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_queryNorm) and I thought this might be relevant to your issue. From the javadoc: *queryNorm(q)* is a normalizing factor used to make scores between queries compar

Re: Confidence scores at search time

2009-02-26 Thread Grant Ingersoll
I don't know of anyone doing work on it in the Lucene community. My understanding to date is that it is not really worth trying, but that may in fact be an outdated view. I haven't stayed up on the literature on this subject, so background info on what you are interested in would be help

Re: Confidence scores at search time

2009-02-25 Thread Michael Stoppelman
Hi Ken, I found this post on the Lucene documentation page: http://wiki.apache.org/lucene-java/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03 In practice you sometimes need to have a cut-off or boost factor post tf-idf scoring. The way I've been going about it is by picking values and se

Re: Confidence scores at search time

2009-02-25 Thread Ken Williams
Hi all, I didn't get a response to this - not sure whether the question was ill-posed, or too-frequently-asked, or just not interesting. But if anyone could take a stab at it or let me know a different place to look, I'd really appreciate it. Thanks, -Ken On 2/20/09 12:00 PM, "Ken Williams"

Confidence scores at search time

2009-02-20 Thread Ken Williams
Hi, Has there been any work done on getting confidence scores at runtime, so that scores of documents can be compared across queries? I found one reference in the mailing list to some work in 2003, but couldn't find any follow-up: http://osdir.com/ml/jakarta.lucene.user/2003-12/msg00093.html