Hi,

I have a question about the Solr /export handler. Here is the scenario: I want to use /export because I only need sorted data and it is the fastest way to get it. I am doing multi-level joins over streams that read from /export. I know how many top-level records I need to retrieve, but not how many records each individual stream has to contribute to reach that final result.

I have observed that calling close() on an /export stream is very expensive: it reads the stream to the very end of the hits. Suppose each stream has 100 million hits, the first 1k records are found after the joins, and we call close() at that point. The close would then take many minutes or even hours to finish. Currently I make the close() call in a separate thread, basically fire and forget, but the cluster is under heavy strain because of the unnecessary reads.
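For reference, the fire-and-forget workaround looks roughly like the sketch below. It assumes the stream object implements java.io.Closeable; BackgroundCloser and closeQuietlyAsync are illustrative names, not Solr APIs.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundCloser {
    // Daemon threads, so a slow drain never keeps the JVM alive at shutdown.
    private static final ExecutorService CLOSER = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "stream-closer");
        t.setDaemon(true);
        return t;
    });

    /** Hands close() to a background thread and returns immediately. */
    public static void closeQuietlyAsync(Closeable stream) {
        CLOSER.submit(() -> {
            try {
                // May block for a long time while the rest of the response is drained.
                stream.close();
            } catch (IOException e) {
                // Fire and forget: the caller has already moved on.
            }
        });
    }
}
```

This only moves the cost off the calling thread. The remaining data is still read from the server, which is why the cluster stays loaded.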
Internally, streaming uses HttpClient's ChunkedInputStream, which has to be drained in the close() call. From the server's point of view, though, it should stop sending data once close() has been issued. The close() method of ChunkedInputStream contains a read() call that is indistinguishable from a real read(). It would be very useful if the /export handler stopped sending data after close().

Another option would be to use the /select handler and get into the business of managing a custom cursor mark that is based on the stream sort and is reset until the required records have been fetched at the topmost level.

Any thoughts?

Thanks,
Susmit
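P.S. To make the /select idea a bit more concrete, here is a rough sketch of the paging loop I have in mind. It is not real Solr code: PageFetcher stands in for a bounded /select request sorted on the stream sort key, the join is reduced to a simple predicate, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

public class SortCursorPager {
    /**
     * Hypothetical stand-in for one bounded request: returns up to `rows`
     * sort keys strictly greater than `after` (null means start from the
     * beginning), in ascending order.
     */
    public interface PageFetcher {
        List<Long> fetch(Long after, int rows);
    }

    /**
     * Pulls pages until `wanted` keys pass the join check or the data runs
     * out. At most one page beyond what is needed is ever read.
     */
    public static List<Long> firstMatches(PageFetcher fetcher, LongPredicate joinMatches,
                                          int wanted, int pageSize) {
        List<Long> out = new ArrayList<>();
        Long cursor = null; // custom cursor: last sort key seen so far
        while (out.size() < wanted) {
            List<Long> page = fetcher.fetch(cursor, pageSize);
            if (page.isEmpty()) {
                break; // no more hits
            }
            for (Long key : page) {
                if (joinMatches.test(key)) {
                    out.add(key);
                    if (out.size() == wanted) {
                        break;
                    }
                }
            }
            // Move the cursor to the last key of this page for the next request.
            cursor = page.get(page.size() - 1);
        }
        return out;
    }
}
```

The point is that each request is bounded, so there is never a long tail left to drain when we stop early. The price is that the client has to manage the cursor itself.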