Shawn Heisey wrote on 17.9.2018 at 19.03:
7. If I have billions of documents in the index, and the "start"
parameter points at the 10 millionth document with the end at
start+100, will this raise any performance issues?
Let's say that you send a request with these parameters, and the index
has three shards:
start=10000000&rows=100
Every shard in the index is going to return ten million plus 100
results to the coordinating node. That's over thirty million individual
results in total. The coordinating node must combine those results,
sort them, and then request the full documents for the 100 specific
rows that were requested. This takes a lot of time and a lot of memory.
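The arithmetic above can be sketched in a few lines (the shard count,
offset, and page size are just the numbers from the example):

```python
# Back-of-envelope cost of a deep-paging request across shards.
shards = 3
start = 10_000_000
rows = 100

# Each shard must return its top (start + rows) entries, because any of
# them could fall inside the requested window once merged globally.
per_shard = start + rows
total_results = shards * per_shard

print(per_shard)      # 10000100 entries per shard
print(total_results)  # 30000300 entries the coordinator must merge
```

Note that the per-request cost grows linearly with `start`, which is why
deep pages get slower and hungrier the further in you go.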
What Shawn says above means that even if you give Solr a heap big enough
to handle that, you'll run into serious performance issues even under a
light load, since these huge allocations easily lead to stop-the-world
garbage collections that kill performance. I've tried it, and it was bad.
If you are thinking of a user interface that allows jumping to an
arbitrary result page, you'll have to limit it to some sensible number
of results (10 000 is probably safe, 100 000 may also work) or use
something other than Solr. Cursor mark or streaming are great options,
but only if you want to process all the records. Often the perceived
need for deep paging is really just the need to see the last results,
and that can also be achieved by allowing reverse sorting.
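For completeness, here is a minimal sketch of the cursor mark approach
using only the Python standard library. The core name "mycore" and the
uniqueKey field "id" are my assumptions for illustration, not anything
from the original mail:

```python
import json
import urllib.parse
import urllib.request

def build_params(cursor, rows=100):
    # cursorMark requires a sort clause that includes the uniqueKey
    # field; the initial cursor value is always "*".
    return urllib.parse.urlencode({
        "q": "*:*",
        "rows": rows,
        "sort": "id asc",
        "cursorMark": cursor,
        "wt": "json",
    })

def fetch_all(base_url="http://localhost:8983/solr/mycore/select"):
    """Walk the whole result set one page at a time via cursorMark."""
    cursor = "*"
    while True:
        url = f"{base_url}?{build_params(cursor)}"
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            break  # cursor stopped advancing: all results consumed
        cursor = next_cursor
```

Unlike start/rows paging, each request here is equally cheap regardless
of how deep into the results you are, but you can only move forward
through the set, which is why it suits batch processing rather than
jump-to-page UIs.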
Regards,
Ere
--
Ere Maijala
Kansalliskirjasto / The National Library of Finland