Aggregating category hits

2006-05-15 Thread Marvin Humphrey
Greets, If you needed to know not just the total number of hits, but the number of hits in each "category", how would you handle that? For instance, a search for "egg" would have to produce the 20 most relevant documents for "egg", but also a list like this: Holiday & Seasonal / Easte

Re: Aggregating category hits

2006-05-15 Thread Andrzej Bialecki
Marvin Humphrey wrote: Greets, If you needed to know not just the total number of hits, but the number of hits in each "category", how would you handle that? For instance, a search for "egg" would have to produce the 20 most relevant documents for "egg", but also a list like this: Holi

Re: Aggregating category hits

2006-05-15 Thread Erik Hatcher
On May 15, 2006, at 5:07 PM, Marvin Humphrey wrote: If you needed to know not just the total number of hits, but the number of hits in each "category", how would you handle that? For instance, a search for "egg" would have to produce the 20 most relevant documents for "egg", but also a list

Re: Aggregating category hits

2006-05-15 Thread Kapil Chhabra
Even I am doing the same in my application. Once in a day, all the filters [for different categories] are initialized. Each time a query is fired, the Query BitSet is ANDed with the BitSet of each filter. The cardinality obtained is the desired output. @Eric: I would like to know more about the
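Kapil's daily-filter approach can be sketched with plain java.util.BitSet. This is a minimal sketch under assumptions, not his code. It assumes the per-category bitsets were built ahead of time and that the query's matches are already available as a BitSet. BitSetCategoryCounter and count() are hypothetical names.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: counts query hits per category by ANDing bitsets. */
final class BitSetCategoryCounter {
    private BitSetCategoryCounter() {}

    /**
     * @param queryBits    bit i is set if document i matches the query
     * @param categoryBits one pre-built bitset per category (assumed, e.g. rebuilt daily)
     * @return category name mapped to the number of matching documents in that category
     */
    static Map<String, Integer> count(BitSet queryBits, Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            // Copy first so the AND does not destroy the query bits for the next category.
            BitSet tmp = (BitSet) queryBits.clone();
            tmp.and(e.getValue());
            counts.put(e.getKey(), tmp.cardinality());
        }
        return counts;
    }
}
```

Cloning per category keeps the query bits intact, but it allocates one temporary bitset per category on every request.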

RE: Aggregating category hits

2006-05-16 Thread Ramana Jelda
2006 7:38 AM > To: java-user@lucene.apache.org > Subject: Re: Aggregating category hits > > Even I am doing the same in my application. > Once in a day, all the filters [for different categories] are > initialized. Each time a query is fired, the Query BitSet is > ANDed with

Re: Aggregating category hits

2006-05-16 Thread Erik Hatcher
On May 16, 2006, at 1:37 AM, Kapil Chhabra wrote: Even I am doing the same in my application. Once in a day, all the filters [for different categories] are initialized. Each time a query is fired, the Query BitSet is ANDed with the BitSet of each filter. The cardinality obtained is the des

Re: Aggregating category hits

2006-05-16 Thread Kapil Chhabra
you have documents in million numbers and categories in thousands. So I preferred in my project FieldCache strategy. Jelda -Original Message- From: Kapil Chhabra [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 7:38 AM To: java-user@lucene.apache.org Subject: Re: Aggregating cate
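Jelda's size argument can be made concrete. One bitset per category costs (documents × categories) bits, while a per-document array costs one int per document. For example, 1,000,000 documents × 1,000 categories is about 125 MB of bitsets, versus about 4 MB for one int per document. A minimal sketch of the per-document approach follows. It assumes each document has exactly one category and that the docToOrd array (standing in for what a field cache would provide) was built once per index reader. The class and its methods are hypothetical.

```java
/**
 * Hypothetical field-cache-style counter. docToOrd[doc] is the category ordinal
 * of each document (assumed: exactly one category per document), built once per reader.
 */
final class OrdinalCategoryCounter {
    private final int[] docToOrd;
    private final int numCategories;
    private int[] categoryCounts; // allocated lazily on the first hit

    OrdinalCategoryCounter(int[] docToOrd, int numCategories) {
        this.docToOrd = docToOrd;
        this.numCategories = numCategories;
    }

    /** Call once per matching document, e.g. from a hit collector's callback. */
    void collect(int doc) {
        if (categoryCounts == null) {
            categoryCounts = new int[numCategories];
        }
        categoryCounts[docToOrd[doc]]++; // one array lookup per hit
    }

    /** Counts per category ordinal; all zeros if nothing was collected. */
    int[] counts() {
        return categoryCounts == null ? new int[numCategories] : categoryCounts.clone();
    }
}
```

The lazy allocation of categoryCounts echoes the step Jelda lists in a later message in this thread.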

RE: Aggregating category hits

2006-05-16 Thread Ramana Jelda
o: java-user@lucene.apache.org > Subject: Re: Aggregating category hits > > Hi Jelda, > I have not yet migrated to Lucene 1.9 and I guess FieldCache > has been introduced in this release. > Can you please give me a pointer to your strategy of FieldCache? > > Thanks & Regards,

Re: Aggregating category hits

2006-05-16 Thread Kapil Chhabra
ginal Message- From: Kapil Chhabra [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 11:50 AM To: java-user@lucene.apache.org Subject: Re: Aggregating category hits Hi Jelda, I have not yet migrated to Lucene 1.9 and I guess FieldCache has been introduced in this release. Can you please

Re: Aggregating category hits

2006-05-16 Thread Marvin Humphrey
Thanks, all. The field cache and the bitsets both seem like good options until the collection grows too large, provided that the index does not need to be updated very frequently. Then for large collections, there's statistical sampling. Any of those options seems preferable to retriev
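Marvin only names statistical sampling as an option for large collections. The sketch below shows one way it could work. It is an assumption, not a method proposed in the thread: count categories over a uniform random sample of the hits, then scale each count by hits/sample. The results are estimates. The sketch still needs the full list of hit ids, so only the category lookups are reduced. SampledCategoryEstimator and its parameters are hypothetical.

```java
import java.util.Random;

/** Hypothetical estimator: counts categories over a random sample of hits and scales up. */
final class SampledCategoryEstimator {
    private SampledCategoryEstimator() {}

    /**
     * @param hitDocs    ids of all matching documents (assumed distinct)
     * @param docToOrd   category ordinal of each document
     * @param numCats    number of distinct categories
     * @param sampleSize how many hits to inspect (clamped to hitDocs.length)
     * @param rnd        randomness source, injectable for reproducible runs
     * @return estimated hit count per category ordinal
     */
    static double[] estimate(int[] hitDocs, int[] docToOrd, int numCats, int sampleSize, Random rnd) {
        int n = hitDocs.length;
        int k = Math.min(sampleSize, n);
        int[] pool = hitDocs.clone();
        int[] sampleCounts = new int[numCats];
        // Partial Fisher-Yates shuffle: the first k slots form a uniform sample without replacement.
        for (int i = 0; i < k; i++) {
            int j = i + rnd.nextInt(n - i);
            int tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp;
            sampleCounts[docToOrd[pool[i]]]++;
        }
        double[] estimates = new double[numCats];
        if (k == 0) return estimates;
        double scale = (double) n / k; // 1.0 when every hit was sampled, so counts are exact
        for (int c = 0; c < numCats; c++) {
            estimates[c] = sampleCounts[c] * scale;
        }
        return estimates;
    }
}
```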

Re: Aggregating category hits

2006-05-22 Thread Kapil Chhabra
t (do lazy initialization of categoryCounts holder.FAQ.) //6 You are done.. :) All the best, Jelda -Original Message- From: Kapil Chhabra [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 11:50 AM To: java-user@lucene.apache.org Subject: Re: Aggregating category hits Hi Jel

RE: Aggregating category hits

2006-05-22 Thread Ramana Jelda
Sent: Monday, May 22, 2006 2:07 AM > To: java-user@lucene.apache.org > Subject: Re: Aggregating category hits > > Hi Jelda, > Is there any way by which I can achieve sorting of search > results along with overriding the collect method of the > HitCollector in this case?
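Jelda's answer to the sorting question is cut off in this listing. The sketch below shows one possible approach, not necessarily his. A collector can count categories for every hit and, in the same callback, keep the best N hits in a bounded min-heap ordered by score. A single search then yields both the sorted top hits and the counts. The collect(int doc, float score) shape mirrors Lucene's HitCollector callback, but the class below is a standalone, hypothetical sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical single-pass collector: category counts plus top-N hits by score. */
final class CountingTopNCollector {
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int[] docToOrd;       // category ordinal per document (assumed)
    private final int[] categoryCounts;
    private final int n;                // how many top hits to keep
    // Min-heap on score: the weakest of the current top N sits at the head.
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    CountingTopNCollector(int[] docToOrd, int numCategories, int n) {
        this.docToOrd = docToOrd;
        this.categoryCounts = new int[numCategories];
        this.n = n;
    }

    /** Same shape as a hit collector's collect(doc, score) callback. */
    void collect(int doc, float score) {
        categoryCounts[docToOrd[doc]]++; // every hit is counted, not just the top N
        if (heap.size() < n) {
            heap.add(new ScoredDoc(doc, score));
        } else if (n > 0 && score > heap.peek().score) {
            heap.poll();                  // evict the current weakest top hit
            heap.add(new ScoredDoc(doc, score));
        }
    }

    int[] categoryCounts() { return categoryCounts.clone(); }

    /** Top hits, best score first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

Sorting by a stored field value instead of score would only change the heap's comparator.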

Re: Aggregating category hits

2006-05-29 Thread zzzzz shalev
I know I'm a little late replying to this thread, but in my humble opinion the best way to aggregate values (not necessarily terms, but whole values in fields) is as follows: startup stage: for each field you would like to aggregate, create a hashmap, open an index reader and run

Re: Aggregating category hits

2006-06-09 Thread Peter Keegan
I compared Solr's DocSetHitCollector and counting bitset intersections to get facet counts with a different approach that uses a custom hit collector that tests each docid hit (bit) with each facets' bitset and increments a count in a histogram. My assumption was that for queries with few hits, th
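Peter's custom collector can be sketched as follows. The sketch assumes that each facet's documents are held in a java.util.BitSet and that hits arrive one doc id at a time. PerHitFacetHistogram is a hypothetical name. The work per query is (number of hits) × (number of facets) bit tests, which is why the approach looks attractive when a query has few hits. In the thread, Peter later reports that the Solr method was at least 50% faster in his throughput testing.

```java
import java.util.BitSet;

/** Hypothetical histogram collector: tests each hit against every facet's bitset. */
final class PerHitFacetHistogram {
    private final BitSet[] facetBits;
    private final int[] histogram;

    PerHitFacetHistogram(BitSet[] facetBits) {
        this.facetBits = facetBits;
        this.histogram = new int[facetBits.length];
    }

    /** One bit test per facet for each hit, so total work is hits x facets. */
    void collect(int doc) {
        for (int f = 0; f < facetBits.length; f++) {
            if (facetBits[f].get(doc)) {
                histogram[f]++;
            }
        }
    }

    int[] histogram() { return histogram.clone(); }
}
```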

Re: Aggregating category hits

2006-06-10 Thread zzzzz shalev
Hi Peter, two quick questions: 1. could you let me know what kind of response time you were getting with solr (as well as the size of data and result sizes) 2. I took a really, really quick look at DocSetHitCollector and saw the dreaded if (bits==null) bits = new BitSe

Re: Aggregating category hits

2006-06-10 Thread Yonik Seeley
On 6/10/06, z shalev <[EMAIL PROTECTED]> wrote: 1. could you let me know what kind of response time you were getting with solr (as well as the size of data and result sizes) I can tell you a little bit about ours... on one CNET faceted browsing implementation using Solr, the number of fa

Re: Aggregating category hits

2006-06-10 Thread Yonik Seeley
On 6/9/06, Peter Keegan <[EMAIL PROTECTED]> wrote: However, my throughput testing shows that the Solr method is at least 50% faster than mine. I'm seeing a big win with the use of the HashDocSet for lower hit counts. On my 64-bit platform, a MAX_SIZE value of 10K-20K seems to provide optimal perf
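The HashDocSet/bitset trade-off can be illustrated with a hypothetical size-adaptive set. Below a threshold, which plays the role of the MAX_SIZE value discussed here, the hits are kept as a small array and each one is probed against the facet bitset. Above it, they go into a BitSet and the count comes from AND plus cardinality. This is a simplified illustration, not Solr's HashDocSet, which is hash-based rather than a plain array. AdaptiveDocSet and maxSmallSize are hypothetical names.

```java
import java.util.BitSet;

/** Hypothetical size-adaptive doc set for counting facet intersections. */
final class AdaptiveDocSet {
    private final int[] smallDocs; // used when the hit count is at or below the threshold
    private final BitSet bigDocs;  // used otherwise

    /** hitDocs is assumed to contain distinct doc ids. */
    AdaptiveDocSet(int[] hitDocs, int maxSmallSize) {
        if (hitDocs.length <= maxSmallSize) {
            smallDocs = hitDocs.clone();
            bigDocs = null;
        } else {
            smallDocs = null;
            bigDocs = new BitSet();
            for (int doc : hitDocs) bigDocs.set(doc);
        }
    }

    boolean isSmall() { return smallDocs != null; }

    /** Number of hits that are also in the facet. */
    int intersectionSize(BitSet facet) {
        if (smallDocs != null) {
            // Few hits: probe the facet once per hit; cost grows with the number of hits.
            int count = 0;
            for (int doc : smallDocs) {
                if (facet.get(doc)) count++;
            }
            return count;
        }
        // Many hits: word-wise AND on a copy; cost grows with the size of the bitsets.
        BitSet tmp = (BitSet) bigDocs.clone();
        tmp.and(facet);
        return tmp.cardinality();
    }
}
```

The best threshold depends on platform and data, as Peter's later move from 10K-20K down to 2K-3K shows.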

Re: Aggregating category hits

2006-06-10 Thread Chris Hostetter
: I can tell you a little bit about ours... on one CNET faceted browsing : implementation using Solr, the number of facets to check per request : average somewhere between 100 and 200 (the total number of unique : facets is much larger though). The median request time is 3ms (and I : don't think

Re: Aggregating category hits

2006-06-10 Thread zzzzz shalev
Hi Yonik, thanks for the thorough reply. A few more quick questions... "the number of facets to check per request average somewhere between 100 and 200 (the total number of unique facets is much larger though). " You mean 100 - 200 different categories to facet? I ran

Re: Aggregating category hits

2006-06-10 Thread Yonik Seeley
On 6/10/06, z shalev <[EMAIL PROTECTED]> wrote: "the number of facets to check per request average somewhere between 100 and 200 (the total number of unique facets is much larger though). " You mean 100 - 200 different categories to facet? I was going by memory, but 100 to 200 set inte

Re: Aggregating category hits

2006-06-12 Thread Peter Keegan
I'm seeing query throughput of approx. 290 qps with OpenBitSet vs. 270 with BitSet. I had to reduce the max. HashDocSet size to 2K-3K (from 10K-20K) to get the optimal tradeoff.
no. docs in index: 730,000
average no. results returned: 40
average response time: 50 msec (15-20 for counting facets)
no
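One plausible reason a long[]-based bitset such as OpenBitSet can help is that intersection counts can be computed word by word without materializing a temporary result. java.util.BitSet, by contrast, needs a clone, an and(), and a cardinality() call to get the same number. The sketch below shows the word-level idea with plain long[] arrays and Long.bitCount. It is a hypothetical helper, not OpenBitSet's implementation, and it makes no claim about the specific numbers above.

```java
/** Hypothetical count-only intersection over long[] bit arrays (64 doc ids per word). */
final class WordBitsets {
    private WordBitsets() {}

    /** Allocates enough words to hold doc ids 0..numDocs-1. */
    static long[] create(int numDocs) {
        return new long[(numDocs + 63) >>> 6];
    }

    /** Sets the bit for doc; Java masks the long shift count to its low 6 bits. */
    static void set(long[] words, int doc) {
        words[doc >>> 6] |= 1L << doc;
    }

    /** Number of doc ids present in both sets; allocates nothing. */
    static long intersectionCount(long[] a, long[] b) {
        int n = Math.min(a.length, b.length);
        long count = 0;
        for (int i = 0; i < n; i++) {
            count += Long.bitCount(a[i] & b[i]);
        }
        return count;
    }
}
```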

Re: Aggregating category hits

2006-06-14 Thread Peter Keegan
The performance results in my previous posting were based on an implementation that performs 2 searches, one for getting 'Hits' and another for getting the BitSet. I reimplemented this in one search using the code in 'SolrIndexSearcher.getDocListAndSetNC' and I'm now getting throughput of 350-375