Problem of facet on 170M documents

Mingfeng Yang Fri, 01 Nov 2013 23:02:31 -0700

I have an index with 170M documents. Two of the fields on each doc are
"source" and "url", and I want to know the 500 most frequent urls among
the docs whose source is Video.


So I ran a facet query with
 "fq=source:Video&facet=true&facet.field=url&facet.limit=500". About
9 million documents match the filter.
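
For reference, the request can be put together roughly as below. This is
only a minimal sketch: the host, port and collection name in the base URL
are placeholders, and the "q=*:*", "rows=0" and "wt=json" parameters are
assumptions added to make a complete request. Only the fq and facet
parameters come from the query quoted above.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and collection name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_facet_url(source, field="url", limit=500, base_url=SOLR_SELECT_URL):
    """Build a select URL that facets on `field` for docs with the given source."""
    params = [
        ("q", "*:*"),                # assumption: match all docs, narrow with fq
        ("fq", f"source:{source}"),  # filter from the original query
        ("rows", 0),                 # assumption: only facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
        ("wt", "json"),              # assumption: JSON response format
    ]
    # urlencode percent-escapes reserved characters such as ':' and '*'.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_facet_url("Video"))
```

Sending this URL with any HTTP client issues the same facet request; the
sketch only builds the URL and does not contact a server.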

The Solr cluster is hosted on two EC2 instances, each with 4 CPUs and 32G
of memory. 16G is allocated to the Java heap. There are 4 master shards on
one machine and 4 replicas on the other, connected via ZooKeeper.

Whenever I run the query above, the response takes so long that the
client times out. Sometimes an impatient end user waits only a few seconds
for the results, kills the connection, and then issues the same query
again and again. The server then has to handle several of these heavy
queries at once and gets so busy that we see a "no server hosting shard"
error, probably because the Solr node loses communication with ZooKeeper.

Is there any way to deal with this problem?

Thanks,
Ming
