Re: Benchmarking on GOV2

2006-05-29 Thread Sebastiano Vigna
On Mon, 2006-05-29 at 14:35 -1000, Chuck Williams wrote: > I'm not sure what form you would like that help to take, but here are a > couple high-level points imho: Help in configuring Lucene so that it uses all resources available, and so that the results returned are identical to all other engin

Re: Benchmarking on GOV2

2006-05-29 Thread Chuck Williams
Sebastiano Vigna wrote on 05/28/2006 10:39 PM: > but we will certainly need > some help to configure Lucene so that it works at its best. > > We would like to measure indexing time and query answer time > I'm not sure what form you would like that help to take, but here are a couple high-level

Re: Benchmarking on GOV2

2006-05-29 Thread Marvin Humphrey
On May 29, 2006, at 10:58 AM, Andrzej Bialecki wrote: Has anyone used existing categorization data associated with the Reuters corpus to build a benchmarker that measured IR precision and/or recall? That would be RCV1 or RCV2, right? AFAIK the Reuters-21578 has no such information ... Th

Re: Benchmarking on GOV2

2006-05-29 Thread Andrzej Bialecki
Marvin Humphrey wrote: On May 29, 2006, at 10:34 AM, Andrzej Bialecki wrote: It could use the Reuters corpus Has anyone used existing categorization data associated with the Reuters corpus to build a benchmarker that measured IR precision and/or recall? That would be RCV1 or RCV2, right

Re: Benchmarking on GOV2

2006-05-29 Thread Marvin Humphrey
On May 29, 2006, at 10:34 AM, Andrzej Bialecki wrote: It could use the Reuters corpus Has anyone used existing categorization data associated with the Reuters corpus to build a benchmarker that measured IR precision and/or recall? Marvin Humphrey Rectangular Research http://www.rectang
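One way to read the question above is to treat a corpus's category labels as relevance judgments and score each engine's results against them. The sketch below assumes a hypothetical rule: for a query built around a category, every document carrying that label counts as relevant. The helper names (`relevant_by_category`, `precision_recall`) and the toy document labels are illustrative only. It computes plain set-based precision and recall, not ranked measures such as MAP.

```python
def relevant_by_category(doc_categories, category):
    """Hypothetical relevance rule: a document is relevant to a query
    built for `category` if it carries that category label."""
    return {doc for doc, cats in doc_categories.items() if category in cats}


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.

    retrieved: doc ids returned by the engine (rank order is ignored)
    relevant:  set of doc ids judged relevant
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    # Guard against empty result lists / empty judgment sets.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels for illustration, not real corpus data.
    doc_categories = {
        "d1": {"topic-a"},
        "d2": {"topic-a", "topic-b"},
        "d3": {"topic-c"},
        "d4": {"topic-a"},
        "d5": {"topic-b"},
    }
    relevant = relevant_by_category(doc_categories, "topic-a")
    retrieved = ["d1", "d3", "d2", "d5"]  # pretend engine output
    p, r = precision_recall(retrieved, relevant)
    print(f"precision={p:.3f} recall={r:.3f}")
```

Running the same relevance sets against each engine's result lists would make the precision/recall numbers directly comparable, independent of how each engine scores documents internally.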

Re: Benchmarking on GOV2

2006-05-29 Thread Andrzej Bialecki
Otis Gospodnetic wrote: OG: But Andrzej, you already wrote that indexing benchmark tool (which we never put anywhere in SVN, I'm afraid) that works on some freely available Reuters corpus, I believe. Why couldn't that be adapted for testing Lucene, Egothor, and MG4J? Hmm, yes, indeed I h

Re: Benchmarking on GOV2

2006-05-29 Thread Otis Gospodnetic
Hi, - Original Message From: Andrzej Bialecki <[EMAIL PROTECTED]> Dave Kor wrote: > Hi, > > On 5/29/06, Sebastiano Vigna <[EMAIL PROTECTED]> wrote: >> Dear Lucene developers, >> I'd be interested in doing some benchmarking on (at least) Lucene, >> Egothor and MG4J. There is no actual dat

Re: Benchmarking on GOV2

2006-05-29 Thread Andrzej Bialecki
Dave Kor wrote: Hi, On 5/29/06, Sebastiano Vigna <[EMAIL PROTECTED]> wrote: Dear Lucene developers, I'd be interested in doing some benchmarking on (at least) Lucene, Egothor and MG4J. There is no actual data around on publicly available collections, and it would be nice to have some more objec

Re: Benchmarking on GOV2

2006-05-29 Thread Sebastiano Vigna
On Mon, 2006-05-29 at 17:33 +0800, Dave Kor wrote: > I was wondering if you have seen the TREC 2004 paper by Giuseppe > Attardi, Andrea Esuli and Chirag Patel from the University of Pisa, > Italy, titled "Using Clustering and Blade Clusters in the TeraByte > task"? http://trec.nist.gov/pubs/trec13/

Re: Benchmarking on GOV2

2006-05-29 Thread eks dev
- Original Message From: Sebastiano Vigna <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Monday, 29 May, 2006 10:39:43 AM Subject: Benchmarking on GOV2 Dear Lucene developers, I'd be interested in doing some benchmarking on (at least) Lucene, Egothor and MG4J. There is no actual da

Re: Benchmarking on GOV2

2006-05-29 Thread Murat . Yakici
Hi, We have been doing such a benchmark over all TREC collections and TREC queries. Our participation in TREC in recent years gives us the opportunity to work on the collections. Lucene is one of the systems that we look at. The measurements are based on two functionalities: indexing and querying. W

Re: Benchmarking on GOV2

2006-05-29 Thread Dave Kor
Hi, On 5/29/06, Sebastiano Vigna <[EMAIL PROTECTED]> wrote: Dear Lucene developers, I'd be interested in doing some benchmarking on (at least) Lucene, Egothor and MG4J. There is no actual data around on publicly available collections, and it would be nice to have some more objective data on effi

Benchmarking on GOV2

2006-05-29 Thread Sebastiano Vigna
Dear Lucene developers, I'd be interested in doing some benchmarking on (at least) Lucene, Egothor and MG4J. There is no actual data around on publicly available collections, and it would be nice to have some more objective data on efficiency for a significantly large collection. We have GOV2 (25M