Re: TREC Data and Topic-Specific Index

2010-02-11 Thread Ivan Provalov
Thank you, Robert.

--- On Wed, 2/10/10, Robert Muir wrote:
> From: Robert Muir
> Subject: Re: TREC Data and Topic-Specific Index
> To: java-user@lucene.apache.org
> Date: Wednesday, February 10, 2010, 9:23 AM
> Hi, so you mean around 15% and 24%
> respectively? i think you

Re: TREC Data and Topic-Specific Index

2010-02-10 Thread Robert Muir
tcSimilarity gets us more improvement:
> 0.175-0.141=0.034.
>
> Thanks,
>
> Ivan
>
> --- On Sun, 2/7/10, Robert Muir wrote:
>
> > From: Robert Muir
> > Subject: Re: TREC Data and Topic-Specific Index
> > To: java-user@lucene.apache.org
> > Date: Sunda

Re: TREC Data and Topic-Specific Index

2010-02-10 Thread Ivan Provalov
Muir wrote:
> From: Robert Muir
> Subject: Re: TREC Data and Topic-Specific Index
> To: java-user@lucene.apache.org
> Date: Sunday, February 7, 2010, 10:59 PM
> you should do (a), and pretend you
> know nothing about the relevance
> judgements up front.
>
> it is true

Re: TREC Data and Topic-Specific Index

2010-02-07 Thread Robert Muir
you should do (a), and pretend you know nothing about the relevance judgements up front.

it is true you might make some change to your search engine and wonder, how is it fair that I am bringing back possibly relevant docs that were never judged (and thus scored implicitly as non-relevant)? i.e. t
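
To make the point about unjudged documents concrete, here is a minimal sketch of average precision for a single topic, assuming the usual convention that anything not marked relevant in the judgments counts as a miss. The function name, document ids, and rankings are hypothetical and only illustrate the scoring; this is not Lucene's or trec_eval's code.

```python
def average_precision(ranked_doc_ids, judged_relevant):
    """Average precision for one topic.

    A document that is absent from judged_relevant counts as non-relevant,
    whether it was judged non-relevant or never judged at all.
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in judged_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(judged_relevant) if judged_relevant else 0.0


# Hypothetical topic: d1 and d4 were judged relevant, d2 was judged
# non-relevant, and d9 was never judged.
relevant = {"d1", "d4"}
run_a = ["d1", "d2", "d4"]  # second hit is a judged non-relevant doc
run_b = ["d1", "d9", "d4"]  # second hit is an unjudged doc, scored as a miss

ap_a = average_precision(run_a, relevant)
ap_b = average_precision(run_b, relevant)
```

Both runs get the same score here, which is the effect described above: an unjudged document in the ranking is penalized exactly like a judged non-relevant one.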

TREC Data and Topic-Specific Index

2010-02-07 Thread Ivan Provalov
Robert,

We are using TREC-3 data and Ad Hoc topics 151-200. The relevance judgments list contains 97,319 entries, of which 68,559 are unique document ids. The TIPSTER collection which was used in TREC-3 is around 750,000 documents. Should we (a) index the entire 750,000 document collection
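
For reference, the gap between entries and unique document ids comes from the same document being judged for more than one topic. Below is a rough sketch of how such counts could be derived, assuming the standard four-column qrels layout (topic, iteration, document id, relevance). The function name and the sample lines are made up for illustration and are not taken from the TREC-3 judgments.

```python
from collections import defaultdict


def summarize_qrels(lines):
    """Count total entries, unique document ids, and entries per topic."""
    entries = 0
    doc_ids = set()
    per_topic = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, doc_id, _relevance = parts
        entries += 1
        doc_ids.add(doc_id)
        per_topic[topic] += 1
    return entries, len(doc_ids), dict(per_topic)


# Hypothetical sample: DOC-A is judged for two different topics.
sample = [
    "151 0 DOC-A 1",
    "151 0 DOC-B 0",
    "152 0 DOC-A 0",
]
entries, unique_docs, per_topic = summarize_qrels(sample)
```

On a real qrels file the lines would come from `open(path)` instead of the inline list.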