I have an index with 170M documents, and two of the fields in each doc are "source" and "url". I want to find the 500 most frequent urls from the Video source.
So I ran a facet query with "fq=source:Video&facet=true&facet.field=url&facet.limit=500", which matches about 9 million documents. The Solr cluster is hosted on two EC2 instances, each with 4 CPUs and 32 GB of memory, of which 16 GB is allocated to the Java heap. There are 4 master shards on one machine and 4 replicas on the other, connected together via ZooKeeper.

Whenever I run the query above, the response takes so long that the client times out. Sometimes an impatient end user waits a few seconds for the results, kills the connection, and then issues the same query again and again. The server then has to handle several of these heavy queries simultaneously and becomes so busy that we get a "no server hosting shard" error, probably because communication between the Solr nodes and ZooKeeper is lost.

Is there any way to deal with this problem?

Thanks,
Ming
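For reference, here is a minimal sketch of how a client might build the facet request described in this post. The host, port, and collection name in `SOLR_SELECT_URL` are hypothetical, as is the helper `build_facet_query`. The `q=*:*` and `rows=0` parameters are assumptions (the post only lists the `fq` and facet parameters); `rows=0` is included on the assumption that only the facet counts are needed, not the matching documents themselves.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the post does not give the host or collection name.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_facet_query(source: str, field: str, limit: int) -> str:
    """Build a Solr select URL that filters on `source` and facets on `field`.

    Hypothetical helper for illustration only.
    """
    params = [
        ("q", "*:*"),                # assumption: match all docs, let fq narrow them
        ("fq", f"source:{source}"),  # filter query from the post
        ("rows", "0"),               # assumption: return facet counts only, no docs
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),  # top-N most frequent values
    ]
    # urlencode percent-escapes reserved characters such as ':' and '*'
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_facet_query("Video", "url", 500)
print(url)
```

The resulting URL carries the same `fq`, `facet`, `facet.field`, and `facet.limit` values as the query in the post, so it can be pasted into a browser or sent with any HTTP client to reproduce the slow request.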