Hi Ming,

Which Solr version are you using? If you are on one of the latest
versions (4.5 or above), try the new facet.threads parameter with a
reasonable value; 4 to 8 gave me a massive performance speedup when
working with large facets, i.e. nTerms >> 10^7.
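To make the suggestion concrete, here is a minimal sketch of Ming's facet request with facet.threads added. It only builds the request URL and does not contact a server. The endpoint (host, port and "collection1") is a hypothetical placeholder. q=*:* and rows=0 are assumptions not in the original query: they match all documents before the fq filter and skip returning documents, since only the facet counts are needed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and collection name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_facet_query(threads=4):
    """Build the 'top 500 urls from Video' facet request with facet.threads."""
    params = [
        ("q", "*:*"),                     # assumption: match all, filter via fq
        ("fq", "source:Video"),           # restrict to Video documents
        ("rows", "0"),                    # assumption: only facet counts needed
        ("facet", "true"),
        ("facet.field", "url"),
        ("facet.limit", "500"),           # top 500 most frequent urls
        ("facet.threads", str(threads)),  # Solr 4.5+; 4 to 8 suggested above
    ]
    # urlencode percent-encodes values such as "*:*" and "source:Video"
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_facet_query())
```

Keeping the parameters in a list of pairs preserves their order and would also allow repeated keys (e.g. several facet.field entries) if more fields were faceted later.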

-Sascha


Mingfeng Yang wrote:
> I have an index with 170M documents, and two of the fields in each
> doc are "source" and "url". I want to know the top 500 most
> frequent urls from the Video source.
> 
> So I ran a facet query with
> "fq=source:Video&facet=true&facet.field=url&facet.limit=500", and
> about 9 million documents match.
> 
> The Solr cluster is hosted on two EC2 instances, each with 4 CPUs and
> 32G of memory, 16G of which is allocated to the Java heap. There are
> 4 master shards on one machine and 4 replicas on the other, connected
> via ZooKeeper.
> 
> Whenever I run the query above, the response takes so long that the
> client times out. Sometimes an impatient end user waits a few
> seconds for the results, kills the connection, and then issues the
> same query again and again. The server then has to handle multiple
> such heavy queries simultaneously and becomes so busy that we get
> "no server hosting shard" errors, probably due to lost communication
> between the Solr nodes and ZooKeeper.
> 
> Is there any way to deal with such problem?
> 
> Thanks, Ming
> 
