Re: search quality - assessment & improvements

2007-07-19 Thread Chris Hostetter
: (d) Now we might get stupid (or erroneous) few-words docs as top results; (e) To solve this, pivoted doc-length-norm punishes too-long docs (longer than the average) but only slightly rewards docs that are shorter than the average. I get that your calculation is much more ...
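
To make the punish/reward asymmetry concrete (numbers are illustrative, not from the thread): with Pivot = 100 and Slope = 0.25, the pivoted length (1 - Slope) * Pivot + Slope * Doclen maps a 50-term doc to 0.75*100 + 0.25*50 = 87.5 and a 400-term doc to 75 + 100 = 175. So the short doc is normalized as if it were longer (a smaller boost than 1/sqrt(50) would give) and the long doc as if it were much shorter (a far milder penalty than 1/sqrt(400)).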

Re: search quality - assessment & improvements

2007-07-19 Thread Doron Cohen
> However ... I still think that if you really want a length norm that takes into account the average length of the docs, you want one that rewards docs for being near the average ... ... like SweetSpotSimilarity (SSS) > it doesn't seem to make a lot of sense to me to say that a doc whose ...
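
A minimal sketch of wiring up SweetSpotSimilarity for this, assuming the Lucene 2.x contrib API (the plateau bounds and steepness are made-up values, and exact method signatures may differ by release):

    import org.apache.lucene.misc.SweetSpotSimilarity;
    import org.apache.lucene.search.Similarity;

    public class SweetSpotSetup {
      public static void main(String[] args) {
        // Reward fields whose length falls near the collection average,
        // instead of monotonically preferring shorter fields.
        SweetSpotSimilarity sss = new SweetSpotSimilarity();
        // All lengths inside the plateau get the maximal norm (assumed
        // values); steepness controls how fast the norm decays outside it.
        sss.setLengthNormFactors(400, 600, 0.5f);
        // lengthNorm is baked into norms at index time, so set this before
        // indexing, and use the same Similarity again at search time.
        Similarity.setDefault(sss);
      }
    }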

Re: search quality - assessment & improvements

2007-07-18 Thread Chris Hostetter
: The Similarity portion of the payload functionality could be used for scoring binary fields. That can be used as a hook to decide how to evaluate an arbitrary byte[] payload as a float for the purposes of scoring -- but it doesn't address the problem of how we write/read a payload which is ...
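
For reference, the hook being alluded to looks roughly like this (a sketch only: the scorePayload signature changed across the 2.x releases, and the 4-byte big-endian float encoding is an assumption of this example, not something the thread specifies):

    import org.apache.lucene.search.DefaultSimilarity;

    public class PayloadAwareSimilarity extends DefaultSimilarity {
      // Turn an arbitrary byte[] payload into a float scoring factor.
      // Assumes the payload was written at index time as a big-endian
      // 4-byte float; returns a neutral 1.0 when absent.
      public float scorePayload(byte[] payload, int offset, int length) {
        if (payload == null || length < 4) {
          return 1.0f;
        }
        int bits = ((payload[offset]     & 0xFF) << 24)
                 | ((payload[offset + 1] & 0xFF) << 16)
                 | ((payload[offset + 2] & 0xFF) <<  8)
                 |  (payload[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
      }
    }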

Re: search quality - assessment & improvements

2007-07-18 Thread Chris Hostetter
: Yes, actually: 1 / sqrt((1 - Slope) * Pivot + (Slope) * Doclen) interesting ... it doesn't really seem like there is any direct relationship between your average length (Pivot) and your Doclen -- on the surface, when I first read your example, it seemed like it had more to do with the shifting ...
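
That formula drops straight into a Lucene 2.x Similarity override; a minimal sketch (the class name and constructor are mine, not from the thread):

    import org.apache.lucene.search.DefaultSimilarity;

    public class PivotedLengthNormSimilarity extends DefaultSimilarity {
      private final float pivot; // average field length in the collection
      private final float slope; // 0 < slope <= 1; smaller = flatter norm

      public PivotedLengthNormSimilarity(float pivot, float slope) {
        this.pivot = pivot;
        this.slope = slope;
      }

      // Default Lucene is 1 / sqrt(numTerms); the pivoted variant blends
      // the actual length with the collection average, damping both the
      // reward for very short fields and the penalty for long ones.
      public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt((1 - slope) * pivot + slope * numTerms));
      }
    }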

Re: search quality - assessment & improvements

2007-07-16 Thread Doron Cohen
Chris Hostetter wrote: > isn't that just a flat line with a slope relative to the specified "Slope"? your pivot just seems to affect the y-intercept (which would be the lengthNorm for a field containing 0 terms) but doesn't that cancel out of any scoring equation since the fieldNorm is ...

Re: search quality - assessment & improvements

2007-07-16 Thread Grant Ingersoll
On Jul 16, 2007, at 9:24 PM, Chris Hostetter wrote: Hmmm... perhaps what we need is a generalization of the payload API to allow storing/reading payloads on a per-document, per-field, or per-index basis ... along with some sort of "PayloadMerger" that could be used by IndexWriter when merging ...

Re: search quality - assessment & improvements

2007-07-16 Thread Doron Cohen
> : I think both are not good enough for large dynamic collections. Both are good enough for experiments. But it should be more efficient in a working large dynamic system. > Hmmm... perhaps what we need is a generalization of the payload API to allow storing/reading payloads on a per-document ...

Re: search quality - assessment & improvements

2007-07-16 Thread Chris Hostetter
: Basically it is: (1 - Slope) * Pivot + (Slope) * Doclen, where Pivot reflects the average doc length, and a smaller Slope reduces the amount by which short docs are preferred over long ones. In a collection with very ... isn't that just a flat line with a slope relative to the specified "Slope" ...
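
Spelling out the observation behind that question: the inner expression is linear in Doclen, (1 - Slope) * Pivot + Slope * Doclen, i.e. a line whose slope is Slope and whose y-intercept is (1 - Slope) * Pivot. With Pivot = 100 and Slope = 0.25 (illustrative numbers) it is simply 75 + 0.25 * Doclen; changing the pivot only shifts that line up or down.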

Re: search quality - assessment & improvements

2007-07-16 Thread Doron Cohen
Chris Hostetter wrote: > I guess I'm not following how exactly your pivoted norm calculation works ... it sounds like you are still rewarding 1-term-long fields more than ... True. > any other length ... is the distinction between your approach and the default implementation just that the default ...

Re: search quality - assessment & improvements

2007-07-08 Thread Chris Hostetter
: Thanks for your comments Chris, and sorry for the delayed ... my turn for a delayed response ... I figured there was no rush since you were offline for 10 days :) : I didn't try this - passing the computed avg doc length to SweetSpotSimilarity (SSS) - it would be interesting to try. I wonder ...
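
Computing that average so it can be passed in is not built into Lucene 2.x; a rough sketch under that assumption (it sums term frequencies for one field, so it is a full scan whose result you would cache, and the names here are mine):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.index.TermEnum;

    public class AvgLength {
      // Average tokens per document for `field`: sum tf over every term in
      // the field, then divide by the number of documents.
      public static float averageFieldLength(IndexReader reader, String field)
          throws IOException {
        long totalTokens = 0;
        TermEnum terms = reader.terms(new Term(field, ""));
        try {
          while (terms.term() != null && field.equals(terms.term().field())) {
            TermDocs docs = reader.termDocs(terms.term());
            while (docs.next()) {
              totalTokens += docs.freq();
            }
            docs.close();
            if (!terms.next()) break;
          }
        } finally {
          terms.close();
        }
        return reader.numDocs() == 0 ? 0f
            : (float) totalTokens / reader.numDocs();
      }
    }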

Re: search quality - assessment & improvements

2007-07-01 Thread Doron Cohen
Thanks for your comments Chris, and sorry for the delayed response - you raised some tough questions for me, and I felt I had to clear my thoughts on this before replying. (Well, as you'll see below they are not too clear now either, but I am going to be off-line for the next ~10 days, so decided ...

Re: search quality - assessment & improvements

2007-06-30 Thread Sean Timm
Is this the paper that you are referring to? A. Chowdhury, D. Grossman, O. Frieder, C. McCabe, "Document Normalization Revisited", ACM-SIGIR, August 2002. http://ir.iit.edu/~abdur/publications/p381-chowdhury.pdf -Sean Doron Cohen wrote on 6/30/2007, 4:56 AM: > In particular for TREC data, ...

Re: search quality - assessment & improvements

2007-06-30 Thread Doron Cohen
Doug Cutting wrote: > We should be careful not to tune things too much for any one application and/or dataset. Tools to perform evaluation would clearly be valuable. But changes that improve Lucene's results on TREC data may or may not be of general utility. The best way to tune an application ...

Re: search quality - assessment & improvements

2007-06-29 Thread Doron Cohen
Nadav Har'El wrote: > Another approach is to use Term Relevance Sets, described in [1]. This new approach not only requires less manual labor than TREC's approach, but also works better when the corpus is evolving. > [1] "Scaling IR-System Evaluation using Term Relevance Sets", Einat Amitay ...

Re: search quality - assessment & improvements

2007-06-26 Thread Otis Gospodnetic
Sent: Monday, June 25, 2007 8:48:03 PM Subject: Re: search quality - assessment & improvements On Jun 25, 2007, at 2:19 PM, Doron Cohen wrote: >> IANAL and I didn't read the link, but I think people publish their MAP scores, etc. all the time on TREC data. I think it implies ...

Re: search quality - assessment & improvements

2007-06-26 Thread Nadav Har'El
On Mon, Jun 25, 2007, Grant Ingersoll wrote about "Re: search quality - assessment & improvements": > 1. Create our own judgements on Wikipedia or the Reuters collection. This is no doubt hard and would require a fair number of volunteers and could/would compete ...

Re: search quality - assessment & improvements

2007-06-25 Thread Chris Hostetter
: For the first change, the logic is that Lucene's default length normalization punishes long documents too much. I found contrib's sweet-spot-similarity helpful here, but not enough. I found that a better doc-length normalization method is one that considers collection statistics - e.g. average ...
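
For context on the "punishes too much" claim: Lucene's stock DefaultSimilarity computes lengthNorm(field, numTerms) as 1 / sqrt(numTerms), so a 10-term field gets a norm of about 0.32 while a 1000-term field gets about 0.03 - roughly a tenfold score handicap purely for length.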

Re: search quality - assessment & improvements

2007-06-25 Thread Doug Cutting
Marvin Humphrey wrote: Wikipedia is a moving target. I think the collection would have to be static. In theory, one can evaluate against other search engines' results for Wikipedia. However, this may violate their EULAs... Doug

Re: search quality - assessment & improvements

2007-06-25 Thread Grant Ingersoll
Yes, you are correct, we could use the specific version that we use for benchmarking. I was assuming that one, just didn't say it! :-) -Grant On Jun 25, 2007, at 3:00 PM, Marvin Humphrey wrote: On Jun 25, 2007, at 11:56 AM, Grant Ingersoll wrote: To do this, we could use Reuters or Wikipedia ...

Re: search quality - assessment & improvements

2007-06-25 Thread Marvin Humphrey
On Jun 25, 2007, at 11:56 AM, Grant Ingersoll wrote: To do this, we could use Reuters or Wikipedia. The hard part is generating the queries and having people make relevance judgments for a sufficient sample size. Wikipedia is a moving target. I think the collection would have to be static ...

Re: search quality - assessment & improvements

2007-06-25 Thread Grant Ingersoll
On Jun 25, 2007, at 2:04 PM, Doug Cutting wrote: Doron Cohen wrote: It is very important that we be able to assess the search quality in a repeatable manner - so that anyone can repeat the quality tests, and maybe find ways to improve them. (This would also allow verifying the "improvements claims" above...) ...

Re: search quality - assessment & improvements

2007-06-25 Thread Grant Ingersoll
On Jun 25, 2007, at 2:19 PM, Doron Cohen wrote: IANAL and I didn't read the link, but I think people publish their MAP scores, etc. all the time on TREC data. I think it implies that you obtained the data through legal means. So you're saying that if person "X" got the TREC data legally, we ...

Re: search quality - assessment & improvements

2007-06-25 Thread Doron Cohen
Hey Grant, thanks for your comments! Grant Ingersoll wrote: > As I am sure you are aware: https://issues.apache.org/jira/browse/LUCENE-836 I remembered you mentioning setting up our own doc/query judgment system but forgot it was in LUCENE-836, thanks for the reminder. > On Jun 25, 2007, at 3:15 ...

Re: search quality - assessment & improvements

2007-06-25 Thread Doug Cutting
Doron Cohen wrote: It is very important that we be able to assess the search quality in a repeatable manner - so that anyone can repeat the quality tests, and maybe find ways to improve them. (This would also allow verifying the "improvements claims" above...). This capability seems like a ...

Re: search quality - assessment & improvements

2007-06-25 Thread Grant Ingersoll
Just to throw in a few things: First off, this is great! As I am sure you are aware: https://issues.apache.org/jira/browse/LUCENE-836 On Jun 25, 2007, at 3:15 AM, Doron Cohen wrote: hi, this could probably be split into two threads but for context let's start it in a single discussion; Recently ...

search quality - assessment & improvements

2007-06-25 Thread Doron Cohen
hi, this could probably be split into two threads but for context let's start it in a single discussion; Recently I was looking at the search quality of Lucene - Recall and Precision, focused at P@1,5,10,20 and, mainly, MAP. -- Part 1 -- I found out that quality can be enhanced by ...
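
Since P@k and MAP anchor the whole thread, here is a small self-contained sketch of both metrics (generic Java, not from the thread; `ranked` is a query's ranked result list and `relevant` its judged relevant set, and MAP is the mean of averagePrecision over all queries):

    import java.util.List;
    import java.util.Set;

    public class EvalMetrics {
      // Fraction of the top k results that are judged relevant.
      public static double precisionAtK(List<String> ranked,
                                        Set<String> relevant, int k) {
        int hits = 0;
        for (int i = 0; i < k && i < ranked.size(); i++) {
          if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / k;
      }

      // Average of the precision values measured at each relevant hit,
      // divided by the total number of relevant docs for the query.
      public static double averagePrecision(List<String> ranked,
                                            Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
          if (relevant.contains(ranked.get(i))) {
            hits++;
            sum += (double) hits / (i + 1); // precision at this rank
          }
        }
        return sum / relevant.size();
      }
    }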