Re: Aggregating/Grouping Document Search Results on a Field

2009-07-13 Thread Bradford Stephens
Thanks for this -- we're also trying out bobo-browse for Lucene, and early results look pretty enticing. It has greatly sped up reading documents in from disk, among other things: http://bobo-browse.wiki.sourceforge.net/ On Sat, Jul 11, 2009 at 12:10 AM, Shalin Shekhar

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-13 Thread Jason Rutherglen
SOLR 1.4 has a new feature (https://issues.apache.org/jira/browse/SOLR-475) that speeds up faceting on fields with many terms by adding an UnInvertedField. Bobo uses a custom field cache as well. It may be useful to benchmark the 3 different approaches (bitsets, SOLR-475, Bobo). This could be a good
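
To make the difference between the approaches concrete, here is a toy sketch in Python. It is not how Solr, SOLR-475, or Bobo are actually implemented; the index, the hit set, and both function names are made up for illustration, and the field is assumed to be single-valued. It only shows why a per-term bitset intersection scales with the number of unique terms, while a doc-to-term lookup scales with the number of hits.

```python
from collections import Counter

# Hypothetical toy index: doc id -> author value (single-valued field).
docs = {0: "smith", 1: "jones", 2: "smith", 3: "lee", 4: "jones", 5: "smith"}
# Hypothetical set of doc ids matched by some query.
hits = {0, 2, 3, 4}


def facet_by_bitsets(docs, hits):
    """Build one doc-id set per term and intersect each with the hits.

    The work grows with the number of unique terms in the field.
    """
    postings = {}
    for doc_id, term in docs.items():
        postings.setdefault(term, set()).add(doc_id)
    counts = {term: len(ids & hits) for term, ids in postings.items()}
    # Drop terms that do not occur in any hit.
    return {term: n for term, n in counts.items() if n > 0}


def facet_by_uninverted(docs, hits):
    """Look up each hit's term directly and count in a single pass.

    The work grows with the number of hits, not the number of terms.
    """
    return dict(Counter(docs[doc_id] for doc_id in hits))


print(facet_by_bitsets(docs, hits))
print(facet_by_uninverted(docs, hits))
```

Both functions return the same counts; which one wins in practice depends on the ratio of unique terms to hits, which is what a benchmark would have to measure.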

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-13 Thread John Wang
Hi Brad: We (Bobo) have since added some perf tests that allow you to do some benchmarking very quickly: http://code.google.com/p/bobo-browse/wiki/BoboPerformance Let me know if you need help setting up. -John On Mon, Jul 13, 2009 at 10:41 AM, Jason Rutherglen

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-11 Thread Shalin Shekhar Mangar
On Sat, Jul 11, 2009 at 12:01 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Does the facet aggregation take place on the Solr search server, or the Solr client? It's pretty slow for me -- on a machine with 8 cores/ 8 GB RAM, 50 million document index (about 36M unique values in

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-10 Thread Bradford Stephens
Does the facet aggregation take place on the Solr search server, or the Solr client? It's pretty slow for me -- on a machine with 8 cores/ 8 GB RAM, 50 million document index (about 36M unique values in the author field), a query that returns 131,000 hits takes about 20 seconds to calculate the

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-10 Thread Avlesh Singh
Does the facet aggregation take place on the Solr search server, or the Solr client? Solr server. Faceting is an expensive operation by nature, especially when the number of hits is large. Solr caches these values once computed. You might want to tweak cache-related parameters in your solr
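
Since the counting happens on the server, the client only has to ask for it. A minimal sketch of building such a request in Python follows; the host, port, and query string are assumptions made for illustration, while facet, facet.field, facet.limit and facet.mincount are standard Solr request parameters. The author field is the one mentioned earlier in the thread.

```python
from urllib.parse import urlencode


def facet_url(base, query, field, limit=10):
    """Build a Solr select URL that asks the server for facet counts.

    rows=0 skips the document list so that only the counts come back.
    """
    params = [
        ("q", query),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),   # return only the top N values
        ("facet.mincount", 1),    # omit values with zero hits
    ]
    return base + "?" + urlencode(params)


# Assumed local Solr instance and example query; adjust to your setup.
url = facet_url("http://localhost:8983/solr/select", "lucene", "author")
print(url)
```

Capping facet.limit keeps the response small even when the field has millions of unique values, though the server still has to compute the counts.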

Aggregating/Grouping Document Search Results on a Field

2009-07-09 Thread Bradford Stephens
Greetings, We've been experimenting with grouping fields returned from document search results in Lucene, and we haven't gotten anything very encouraging. Basically, the more results we return, the longer it takes -- tens of seconds. Probably because we're doing expensive disk seeks. I'm hoping

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-09 Thread shb
You can refer to Solr's faceted search; that might help you. 2009/7/10 Bradford Stephens bradfordsteph...@gmail.com Greetings, We've been experimenting with grouping fields returned from document search results in Lucene, and we haven't gotten anything very encouraging. Basically, the

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-09 Thread Bradford Stephens
It looks like field collapsing may be the key: http://issues.apache.org/jira/browse/SOLR-236 But it also doesn't seem to be 'finalized' yet. I wonder how performant it is with indexes of 50 million documents+? On Thu, Jul 9, 2009 at 9:42 PM, shb suh...@gmail.com wrote: you can refer to the facet

Re: Aggregating/Grouping Document Search Results on a Field

2009-07-09 Thread Bradford Stephens
Oh, wow... I think that faceted search is the right path, especially since seeing this amazing site: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr I hope it's performant over hundreds of thousands of search results :) On Thu, Jul 9, 2009 at 10:13