Re: faceting over ngrams

Dmitry Kan Wed, 16 Mar 2011 13:51:28 -0700

Hi Yonik,

I have run the queries against a single-index Solr with only 16M documents.
After adding facet.method=fc the results seemed to come back faster (the two
queries below), but still not fast enough.
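
For reference, a minimal sketch of how such a request URL could be put together. The facet.* parameters and the field name are the ones from the queries below; the host, the q=*:* query and rows=0 are assumptions for illustration only, not the actual setup.

```python
from urllib.parse import urlencode

# Hypothetical host and handler path; adjust to the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(q, field, limit=1000000, mincount=5, method="fc"):
    """Build a Solr select URL that facets on a single field."""
    params = [
        ("q", q),
        ("rows", 0),  # assumption: only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
        ("facet.mincount", mincount),
        ("facet.method", method),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


# q=*:* is a placeholder; the real queries matched 542094 and 2837589 docs.
url = build_facet_query("*:*", "shingleContent_trigram")
print(url)
```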


Here are the fieldValueCache stats:

(facet.limit=1000000&facet.mincount=5&facet.method=fc, 542094 hits, 1 min)
--> smallest result set

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=10000, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 400
hits : 396
hitratio : 0.99
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 400
cumulative_hits : 396
cumulative_hitratio : 0.99
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram :
{field=shingleContent_trigram,memSize=1786355392,tindexSize=17977426,time=662387,phase1=654707,nTerms=53492050,bigTerms=38,termInstances=602090958,uses=397}

(facet.limit=1000000&facet.mincount=5&facet.method=fc, 2837589 hits, 3 min 8 s)
--> largest result set

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=10000, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
lookups : 401
hits : 397
hitratio : 0.99
inserts : 1
evictions : 0
size : 1
warmupTime : 0
cumulative_lookups : 401
cumulative_hits : 397
cumulative_hitratio : 0.99
cumulative_inserts : 1
cumulative_evictions : 0
item_shingleContent_trigram :
{field=shingleContent_trigram,memSize=1786355392,tindexSize=17977426,time=662387,phase1=654707,nTerms=53492050,bigTerms=38,termInstances=602090958,uses=398}
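
To relate these entries to the questions in the quoted message below (tokens per document, unique tokens), here is a rough sketch that parses the item string and derives a few figures. It assumes the 16M document count mentioned above is exact enough for an average, that memSize is in bytes, and that time is in milliseconds; the helper name parse_cache_entry is made up for this example.

```python
def parse_cache_entry(entry):
    """Turn '{key=value,key=value,...}' into a dict, converting digits to int."""
    body = entry.strip().strip("{}")
    stats = {}
    for pair in body.split(","):
        key, value = pair.split("=", 1)
        stats[key] = int(value) if value.isdigit() else value
    return stats


ENTRY = (
    "{field=shingleContent_trigram,memSize=1786355392,tindexSize=17977426,"
    "time=662387,phase1=654707,nTerms=53492050,bigTerms=38,"
    "termInstances=602090958,uses=397}"
)
NUM_DOCS = 16_000_000  # assumption: "16M documents" taken as a round figure

stats = parse_cache_entry(ENTRY)

mem_gib = stats["memSize"] / 2**30                      # assuming bytes
avg_tokens_per_doc = stats["termInstances"] / NUM_DOCS  # indexed tokens per doc
avg_docs_per_term = stats["termInstances"] / stats["nTerms"]
build_minutes = stats["time"] / 60_000                  # assuming milliseconds

print(f"unique terms:       {stats['nTerms']}")
print(f"memory:             {mem_gib:.2f} GiB")
print(f"avg tokens per doc: {avg_tokens_per_doc:.1f}")
print(f"avg docs per term:  {avg_docs_per_term:.1f}")
print(f"build time:         {build_minutes:.1f} min")
```

Under those assumptions this gives roughly 1.66 GiB for the uninverted field, about 37.6 indexed tokens per document on average, and 53492050 unique terms.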


On Wed, Mar 16, 2011 at 9:46 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Wed, Mar 16, 2011 at 8:05 AM, Dmitry Kan <dmitry....@gmail.com> wrote:
> > Hello guys. We are using sharded Solr 1.4 for heavy faceted search over
> > the trigrams field, with about 1 million entries in the result set and
> > more than 100 million entries to facet on in the index. Currently the
> > faceted search is very slow, taking about 5 minutes per query. Would
> > running on a cloud with Hadoop make it faster (down to seconds), as
> > faceting seems to be a natural map-reduce task?
>
> How many indexed tokens does each document have (for the field you are
> faceting on) on average?
> How many unique tokens are indexed in that field over the complete index?
>
> Or you could go to the admin/stats page and cut-n-paste the
> fieldValueCache entry after your faceting request - it should contain
> most of the info to further analyze this.
>
> -Yonik
> http://lucidimagination.com
>



-- 
Regards,

Dmitry Kan
