First of all, asking for that many rows means a lot of time is spent 
gathering the document fields. Assuming you have stored fields, 
each request requires:
1> the aggregator node getting the candidate 100000 docs from each shard

2> the aggregator node sorting those 100000 docs from each shard into the true 
top 100000 based on the sort criteria (score by default)

3> the aggregator node going back to the shards and asking each one for the 
docs in that top 100000 that are resident on it

4> the aggregator node assembling the final docs and sending them to the 
client.
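
To make the cost of those four steps concrete, here is a rough Python sketch. It is a toy model under stated assumptions, not Solr's actual code: the function name, the in-memory "shards" dict, and the returned counters are all made up for illustration.

```python
import heapq

def distributed_top_n(shards, rows):
    """Toy model of a distributed query.

    `shards` is a hypothetical dict: shard name -> {doc_id: score}.
    """
    # Step 1: every shard hands back its own top `rows` candidates.
    candidates = []
    for name, docs in shards.items():
        top = heapq.nlargest(rows, docs.items(), key=lambda kv: kv[1])
        candidates.extend((score, doc_id, name) for doc_id, score in top)

    # Step 2: the aggregator merges up to rows * len(shards) candidates
    # into the true top `rows`.
    winners = heapq.nlargest(rows, candidates)

    # Step 3: second round trip, asking each shard only for the winners
    # that live on that shard.
    wanted = {}
    for _score, doc_id, name in winners:
        wanted.setdefault(name, []).append(doc_id)

    # Step 4: assemble the final docs in sorted order for the client.
    ordered = [doc_id for _score, doc_id, _name in winners]
    per_shard = {name: len(ids) for name, ids in wanted.items()}
    return ordered, per_shard, len(candidates)
```

With rows=100000 and two shards, the merge in step 2 handles up to 200000 candidates, and step 3 moves the full documents over the network for whichever winners are not local.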

So my guess is that when you fire requests at a particular replica that has to 
get them from the other shard’s replica on another host, the network 
back-and-forth is killing your perf. It’s not that your network is having 
problems, just that you’re pushing a lot of data back and forth in your 
poorly-performing cases.

So first of all, specifying 100K rows is an anti-pattern. Outside of streaming, 
Solr is built on the presumption that you’re after the top few rows (< 100, 
say). The times vary a lot depending on whether you need to read stored fields 
BTW.

Second, I suspect your test is bogus. If you run the tests in the order you 
gave, the first one will read the necessary data from disk and probably have it 
in the OS disk cache for the second and subsequent. And/or you’re getting 
results from your queryResultCache (although you’d have to have a big one). 
Specifying the exact same query when trying to time things is usually a mistake.
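
A minimal sketch of a fairer timing loop, assuming you can supply a list of distinct queries: run each one once, in shuffled order, and record both the QTime from the response header and the wall-clock time. The `run_query` callable is hypothetical; it stands for whatever you use to send one request and parse Solr's JSON response.

```python
import random
import time

def time_queries(run_query, queries, seed=0):
    """Run each distinct query once, in shuffled order, and record timings."""
    order = list(queries)
    random.Random(seed).shuffle(order)  # avoid always warming caches in the same order
    results = []
    for q in order:
        started = time.perf_counter()
        response = run_query(q)  # hypothetical: returns Solr's parsed JSON response
        wall_ms = (time.perf_counter() - started) * 1000
        # QTime is reported by Solr in the responseHeader, in milliseconds.
        results.append((q, response["responseHeader"]["QTime"], wall_ms))
    return results
```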

If your use-case requires 100K rows, you should be using streaming or 
cursorMark. While that won’t make the end-to-end time shorter, it won’t put 
such a strain on the system.
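
For the cursorMark route, a minimal Python sketch might look like the following. Assumptions: the uniqueKey field is called "id" (check your schema), the page size of 1000 is arbitrary, and the helper names `http_fetch` and `iter_docs` are made up for this example. The cursorMark, sort, and nextCursorMark parameters are the standard Solr ones.

```python
import json
import urllib.parse
import urllib.request

def http_fetch(base_url, params):
    """Issue one /select request and return the parsed JSON response."""
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def iter_docs(base_url, query="*:*", page_size=1000, unique_key="id",
              fetch=http_fetch):
    """Yield every matching doc, one page at a time, using cursorMark."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        params = {
            "q": query,
            "rows": page_size,
            # cursorMark requires a sort that includes the uniqueKey field.
            "sort": unique_key + " asc",
            "cursorMark": cursor,
            "wt": "json",
        }
        data = fetch(base_url, params)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

Usage would be something like `for doc in iter_docs("http://localhost:8983/solr/test"): ...`. Each request only asks the shards for one small page, so no single request has to merge 100000 candidates per shard.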

Best,
Erick

> On May 27, 2020, at 10:38 AM, Anshuman Singh <singhanshuma...@gmail.com> 
> wrote:
> 
> I have a Solr cloud setup (Solr 7.4) with a collection "test" having two
> shards on two different nodes. There are 4M records equally distributed
> across the shards.
> 
> If I query the collection like below, it is slow.
> http://localhost:8983/solr/*test*/select?q=*:*&rows=100000
> QTime: 6930
> 
> If I query a particular shard like below, it is also slow.
> http://localhost:8983/solr/*test_shard1_replica_n2*
> /select?q=*:*&rows=100000&shards=*shard2*
> QTime: 5494
> *Notice shard2 in shards parameter and shard1 in the core being queried.*
> 
> But this is faster:
> http://localhost:8983/solr/*test_shard1_replica_n2*
> /select?q=*:*&rows=100000&shards=*shard1*
> QTime: 57
> 
> This is also faster:
> http://localhost:8983/solr/*test_shard2_replica_n4*
> /select?q=*:*&rows=100000&shards=*shard2*
> QTime: 71
> 
> I don't think it is the network as I performed similar tests with a single
> node setup as well. If you query a particular core and the corresponding
> logical shard, it is much faster than querying a different shard or core.
> 
> Why is this behaviour? How to make the first two queries work as fast as
> the last two queries?
