Shawn Heisey wrote on 17.9.2018 at 19.03:
7. If I have billions of documents in the index, and the "start"
parameter points at the 10 millionth document with the end at
start+100, will this raise any performance issues?
Let's say that you send a request with these parameters, and the index
has three shards:
start=10000000&rows=100
Every shard in the index is going to return ten million plus 100
results to the coordinating node. That's over thirty million individual
results in total. The coordinating node must combine those results,
sort them, and then request the full documents for the 100 specific
rows that were requested. This takes a lot of time and a lot of memory.
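The arithmetic above can be sketched in a few lines (the shard count,
offset, and page size are just the numbers from the example):

```python
# Back-of-envelope cost of a deep-paging request across shards.
shards = 3
start = 10_000_000
rows = 100

# Each shard must return its top (start + rows) entries, because any of
# them could fall inside the requested window once merged globally.
per_shard = start + rows
total_results = shards * per_shard

print(per_shard)      # 10000100 entries per shard
print(total_results)  # 30000300 entries the coordinator must merge
```

Note that the per-request cost grows linearly with `start`, which is why
deep pages get slower and hungrier the further in you go.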
What Shawn says above means that even if you give Solr a heap big enough
to handle that, you'll run into serious performance issues even under a
light load, since these huge allocations easily lead to stop-the-world
garbage collections that kill performance. I've tried it, and it was bad.
If you are thinking of a user interface that allows jumping to an
arbitrary result page, you'll have to limit it to some sensible number
of results (10 000 is probably safe, 100 000 may also work) or use
something other than Solr. Cursor mark or streaming are great options,
but only if you want to process all the records. Often the perceived
need for deep paging is really just the need to see the last results,
and that can also be achieved by allowing reverse sorting.
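For completeness, here is a minimal sketch of the cursor mark approach
using only the Python standard library. The core name "mycore" and the
uniqueKey field "id" are my assumptions for illustration, not anything
from the original mail:

```python
import json
import urllib.parse
import urllib.request

def build_params(cursor, rows=100):
    # cursorMark requires a sort clause that includes the uniqueKey
    # field; the initial cursor value is always "*".
    return urllib.parse.urlencode({
        "q": "*:*",
        "rows": rows,
        "sort": "id asc",
        "cursorMark": cursor,
        "wt": "json",
    })

def fetch_all(base_url="http://localhost:8983/solr/mycore/select"):
    """Walk the whole result set one page at a time via cursorMark."""
    cursor = "*"
    while True:
        url = f"{base_url}?{build_params(cursor)}"
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            break  # cursor stopped advancing: all results consumed
        cursor = next_cursor
```

Unlike start/rows paging, each request here is equally cheap regardless
of how deep into the results you are, but you can only move forward
through the set, which is why it suits batch processing rather than
jump-to-page UIs.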
Regards,
Ere
--
Ere Maijala
Kansalliskirjasto / The National Library of Finland