faceting over ngrams

2011-03-16 Thread Dmitry Kan
Hello guys. We are using sharded Solr 1.4 for heavy faceted search over the trigrams field, with about 1 million entries in the result set and more than 100 million entries to facet on in the index. Currently the faceted search is very slow, taking about 5 minutes per query. Would running on

Re: faceting over ngrams

2011-03-16 Thread Toke Eskildsen
On Wed, 2011-03-16 at 13:05 +0100, Dmitry Kan wrote: Hello guys. We are using sharded Solr 1.4 for heavy faceted search over the trigrams field, with about 1 million entries in the result set and more than 100 million entries to facet on in the index. Currently the faceted search is very

Re: faceting over ngrams

2011-03-16 Thread Jonathan Rochkind
I don't know anything about trying to use map-reduce with Solr. But I can tell you that with about 6 million entries in the result set, and around 10 million values to facet on (faceting on a multi-valued field), I still get fine performance in my application. In the worst case it can take

Re: faceting over ngrams

2011-03-16 Thread Jonathan Rochkind
Ah, wait, you're doing sharding? Yeah, I am NOT doing sharding, so that could explain our different experiences. It seems like sharding definitely has trade-offs, makes some things faster and other things slower. So far I've managed to avoid it, in the interest of keeping things simpler and

Re: faceting over ngrams

2011-03-16 Thread Dmitry Kan
Hi Jonathan, Thanks for sharing useful bits. Each shard has 16G of heap. Unless I am doing something fundamentally wrong in the Solr configuration, I have to admit that counting ngrams up to trigrams across the whole set of a shard's documents is a pretty intensive task, as each ngram can occur anywhere in

Re: faceting over ngrams

2011-03-16 Thread Jonathan Rochkind
Oh, doc count over 100M is a very different thing than doc count about 1M. In your original message you said I tried creating an index with 1M documents, each with 100 unique terms in a field. If you instead have 100M documents, your use is a couple orders of magnitude larger than mine. It

Re: faceting over ngrams

2011-03-16 Thread Dmitry Kan
Hi Toke, Thanks a lot for trying this out. I should mention that the faceted search hits only one specific shard by design, so in general the time to query a shard directly and through the proxy Solr should be comparable. Would it be feasible for you to make that field ngram'ed or is it too

Re: faceting over ngrams

2011-03-16 Thread Yonik Seeley
On Wed, Mar 16, 2011 at 8:05 AM, Dmitry Kan dmitry@gmail.com wrote: Hello guys. We are using sharded Solr 1.4 for heavy faceted search over the trigrams field, with about 1 million entries in the result set and more than 100 million entries to facet on in the index. Currently the

Re: faceting over ngrams

2011-03-16 Thread Dmitry Kan
Hi Yonik, I have run the queries against a single-index Solr with only 16M documents. After adding facet.method=fc the results seemed to come back faster (first two queries below), but still not fast enough. Here are the fieldValueCache stats: (facet.limit=100&facet.mincount=5&facet.method=fc,
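
For anyone following along, here is a rough sketch of how a request with those facet parameters could be put together. It is only an illustration: the host, the `/solr/select` path, the `*:*` query, `rows=0` and the field name `trigrams` are assumptions and are not taken from the thread; only `facet.limit=100`, `facet.mincount=5` and `facet.method=fc` come from the message above.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_field,
                      limit=100, mincount=5, method="fc"):
    """Build a Solr select URL that asks for facet counts on one field.

    base_url, query and facet_field are hypothetical placeholders;
    the defaults for limit, mincount and method mirror the values
    quoted in the thread.
    """
    params = [
        ("q", query),
        ("rows", 0),                  # assumption: only facet counts are wanted, no documents
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.limit", limit),
        ("facet.mincount", mincount),
        ("facet.method", method),
    ]
    # urlencode joins the pairs with '&' and escapes reserved characters
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field name, for illustration only
url = build_facet_query("http://localhost:8983/solr/select", "*:*", "trigrams")
print(url)
```

Building the query string with `urlencode` rather than by hand keeps the `&` separators between parameters and escapes the query value.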