Hi,

I have a question regarding the Solr /export handler. Here is the scenario:
I want to use the /export handler because I only need sorted data and this is
the fastest way to get it. I am doing multi-level joins over streams backed by
the /export handler. I know the number of top-level records to be retrieved,
but not how many records each individual stream contributes to the final
result.
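
For illustration, here is a rough SolrJ sketch of the kind of setup I mean.
The collection names ("parents"/"children"), field names, and the ZooKeeper
address are made up, and the real pipeline has more join levels:

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.eq.FieldEqualitor;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.client.solrj.io.stream.InnerJoinStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.client.solrj.io.stream.TupleStream;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ExportJoinSketch {
  public static void main(String[] args) throws Exception {
    String zkHost = "localhost:9983";          // hypothetical ZK address

    // Both sides go through /export and are sorted on the join key.
    ModifiableSolrParams left = new ModifiableSolrParams();
    left.set("q", "*:*");
    left.set("fl", "id,parent_id");
    left.set("sort", "parent_id asc");
    left.set("qt", "/export");

    ModifiableSolrParams right = new ModifiableSolrParams();
    right.set("q", "*:*");
    right.set("fl", "parent_id,child_field");
    right.set("sort", "parent_id asc");
    right.set("qt", "/export");

    TupleStream joined = new InnerJoinStream(
        new CloudSolrStream(zkHost, "parents", left),
        new CloudSolrStream(zkHost, "children", right),
        new FieldEqualitor("parent_id"));      // join key matches the sort

    StreamContext context = new StreamContext();
    context.setSolrClientCache(new SolrClientCache());
    joined.setStreamContext(context);

    joined.open();
    try {
      Tuple t = joined.read();
      while (!t.EOF) {
        // consume the joined tuple
        t = joined.read();
      }
    } finally {
      joined.close();                          // the expensive call described below
    }
  }
}
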
I observed that calling close() on an /export stream is too expensive: it
reads the stream all the way to the end of the hits. If each stream has 100
million hits and the first 1k records are found after the joins, calling
close() at that point can take many minutes or even hours to finish.
Currently I have moved the close() call to a separate thread - basically fire
and forget (sketched below) - but the cluster is still heavily strained by
the unnecessary reads.
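
The fire-and-forget workaround looks roughly like this (the single-thread
executor and the 1k limit are just placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.TupleStream;

public class EarlyCloseSketch {
  private static final ExecutorService closer = Executors.newSingleThreadExecutor();

  /** Read at most 'limit' tuples, then hand close() off to a background thread. */
  public static void readTopN(TupleStream stream, int limit) throws Exception {
    stream.open();
    int count = 0;
    Tuple t = stream.read();
    while (!t.EOF && count < limit) {
      // process tuple
      count++;
      t = stream.read();
    }
    // Fire and forget: the caller returns immediately, but close() still
    // drains the remaining hits, so the cluster keeps doing the work.
    closer.submit(() -> {
      try {
        stream.close();
      } catch (Exception e) {
        // log and ignore
      }
    });
  }
}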

Internally, streaming uses HttpClient's ChunkedInputStream, which has to be
drained in the close() call. From the server's point of view, though, it
should stop sending data once close() has been issued. The read() inside
ChunkedInputStream's close() method is indistinguishable from a real read(),
so the server keeps streaming. It would be very useful if the /export handler
stopped sending more data after close().

Another option would be to use the /select handler and get into the business
of managing a custom cursor mark that is based on the stream sort, re-issuing
the query until the required number of records has been fetched at the top
level.
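
Something along these lines, assuming /select with cursorMark (collection,
fields and sort are again made up), though it gives up the /export streaming
behavior:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/parents").build();

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(1000);
    q.setSort("parent_id", SolrQuery.ORDER.asc);
    q.addSort("id", SolrQuery.ORDER.asc);     // cursorMark needs a unique tie-breaker

    String cursor = CursorMarkParams.CURSOR_MARK_START;
    int fetched = 0;
    while (fetched < 1000) {                  // stop once the top-level target is reached
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = client.query(q);
      for (SolrDocument doc : rsp.getResults()) {
        fetched++;
        // feed doc into the join logic
      }
      String next = rsp.getNextCursorMark();
      if (next.equals(cursor)) {              // cursor did not advance: no more results
        break;
      }
      cursor = next;
    }
    client.close();
  }
}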

Any thoughts?

Thanks,
Susmit
