Shawn Heisey wrote on 17.9.2018 at 19:03:
7.       If I have Billions of indexes, If the "start" parameter is 10th
Million index and "end" parameter is  start+100th index, for this case any
performance issue will be raised ?

Let's say that you send a request with these parameters, and the index has three shards:

start=10000000&rows=100

Every shard in the index is going to return ten million plus 100 results to the coordinating node, because any shard's documents could fall in the global window. That's over thirty million individual results. The coordinating node will combine those results, sort them, and then request full documents for the 100 specific rows that were requested. This takes a lot of time and a lot of memory.
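To make the cost concrete, here's a minimal simulation of that fan-out and merge (the function names and shard layout are illustrative only, not Solr internals):

```python
import heapq

def deep_page(shards, start, rows):
    """Simulate distributed deep paging: each shard returns its own top
    (start + rows) results, the coordinator merge-sorts them all and
    keeps only the requested window."""
    per_shard = start + rows
    partials = [sorted(shard)[:per_shard] for shard in shards]
    transferred = sum(len(p) for p in partials)   # results sent to coordinator
    merged = list(heapq.merge(*partials))         # coordinator's merge step
    return merged[start:start + rows], transferred

# 3 shards of 1000 docs each, round-robin distributed
shards = [list(range(i, 3000, 3)) for i in range(3)]
page, transferred = deep_page(shards, start=900, rows=10)
# a 10-document page forces 3 * 910 = 2730 results across the wire;
# with start=10000000 the same math yields ~30 million
```

The key point the simulation shows: the work grows with `start`, not with `rows`, which is why a deep `start` is expensive even for a tiny page.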

What Shawn says above means that even if you give Solr a heap big enough to handle that, you'll run into serious performance issues even with a light load, since these huge allocations easily lead to stop-the-world garbage collections that kill performance. I've tried it and it was bad.

If you are thinking of a user interface that allows jumping to an arbitrary result page, you'll have to limit it to some sensible number of results (10 000 is probably safe, 100 000 may also work) or use something other than Solr. Cursor mark or streaming are great options, but only if you want to process all the records. Often the real need behind deep paging is to see the last results, and that can also be met by allowing reverse sorting.
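For anyone wondering why cursor mark scales where start/rows doesn't, here's a toy sketch of the underlying keyset-paging idea (my own illustration, not Solr code): instead of skipping `start` rows, each request filters on the last seen sort key, so every page costs roughly the same no matter how deep you are.

```python
def cursor_page(docs, after_key, rows):
    """Fetch the next page of `docs` (sorted by a unique key, here the
    value itself) strictly after `after_key`. Solr does the equivalent
    filtering via the index instead of a linear scan."""
    page = [d for d in docs if after_key is None or d > after_key][:rows]
    next_cursor = page[-1] if page else after_key
    return page, next_cursor

docs = sorted(range(1000))
cursor, pages = None, []
while True:
    page, cursor = cursor_page(docs, cursor, 100)
    if not page:
        break
    pages.append(page)
# walks all 1000 docs in 10 pages, never materializing more than one page
```

Note this is why cursorMark requires a sort that includes the unique key: the cursor must identify an unambiguous position to resume from.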

Regards,
Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
