Hi Joel,

I was on the Solr 6.3 branch. I see the deprecated HttpClient methods are all fixed on master. I had forgotten to mention that I used a custom SolrClientCache to get higher limits for the maxConnectionPerHost settings; that's why I saw a difference in behavior. SolrClientCache also looks configurable with a new constructor on the master branch.
I guess it is all good going forward on master.

Thanks,
Susmit

On Tue, Jun 27, 2017 at 10:14 AM, Joel Bernstein <joels...@gmail.com> wrote:

> Ok, I see where it's not setting the stream context. This needs to be fixed.
>
> I'm curious about where you're seeing deprecated methods in the
> HttpClientUtil? I was reviewing the master version of HttpClientUtil and
> didn't see any deprecations in my IDE.
>
> I'm wondering if you're using an older version of HttpClientUtil than I
> used when I was testing SOLR-10698?
>
> You also mentioned that the SolrStream and the SolrClientCache were using
> the same approach to create the client. In that case, changing the
> ParallelStream to set the streamContext shouldn't have any effect on the
> close() issue.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, Jun 25, 2017 at 10:48 AM, Susmit Shukla <shukla.sus...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > Looked at the fix for SOLR-10698; there could be two potential issues:
> >
> > - ParallelStream does not set the stream context on the newly created
> > SolrStreams in its open() method.
> >
> > - This results in creation of a new, uncached HttpSolrClient in the
> > open() method of SolrStream. This client is created using deprecated
> > methods of the HttpClient library (HttpClientUtil.createClient) and
> > behaves differently on close() than one created using the
> > HttpClientBuilder API. SolrClientCache uses the same deprecated API too.
> >
> > This test case shows the problem:
> >
> > ParallelStream ps = new ParallelStream(tupleStream, ...);
> >
> > while (true) {
> >     read();
> >     // break after 2 iterations
> > }
> >
> > ps.close();
> > // close() reads through to the end of tupleStream.
> >
> > I tried with an HttpClient created by
> > org.apache.http.impl.client.HttpClientBuilder.create() and close() is
> > working for that.
> >
> > Thanks,
> > Susmit
> >
> > On Wed, May 17, 2017 at 7:33 AM, Susmit Shukla <shukla.sus...@gmail.com>
> > wrote:
> >
> > > Thanks Joel, will try that.
> > > Binary response would be more performant.
> > > I observed the server sends responses in 32 KB chunks and the client
> > > reads them with an 8 KB buffer on the input stream. I don't know if
> > > changing that can impact performance. Even if the buffer size is
> > > increased on the HttpClient, it can't override the hardcoded 8 KB
> > > buffer in sun.nio.cs.StreamDecoder.
> > >
> > > Thanks,
> > > Susmit
> > >
> > > On Wed, May 17, 2017 at 5:49 AM, Joel Bernstein <joels...@gmail.com>
> > > wrote:
> > >
> > > > Susmit,
> > > >
> > > > You could wrap a LimitStream around the outside of all the relational
> > > > algebra. For example:
> > > >
> > > > parallel(limit(intersect(intersect(search, search), union(search, search))))
> > > >
> > > > In this scenario the limit would happen on the workers.
> > > >
> > > > As far as the worker/replica ratio: this will depend on how heavy the
> > > > export is. If it's a light export (a small number of fields, mostly
> > > > numeric, simple sort params), then I've seen a ratio of 5 workers to
> > > > 1 replica work well. This will basically saturate the CPU on the
> > > > replica. But heavier exports will saturate the replicas with fewer
> > > > workers.
> > > >
> > > > Also, I tend to use Direct DocValues to get the best performance. I'm
> > > > not sure how much difference this makes, but it should eliminate the
> > > > compression overhead of fetching the data from the DocValues.
> > > >
> > > > Varun's suggestion of using the binary transport will provide a nice
> > > > performance increase as well. But you'll need to upgrade. You may
> > > > need to do that anyway, as the fix for the early stream close will be
> > > > on a later version that was refactored to support the binary
> > > > transport.
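The LimitStream discussed in this thread doesn't exist in solrj at this point, so here is a rough, self-contained sketch of the idea: a wrapper that emits the EOF tuple after N reads, which is what causes the workers to close the underlying stream early. This deliberately does not use the real solrj TupleStream/Tuple API; Tuple is modeled as a plain map with an EOF flag, purely to show the control flow.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

// Simplified stand-in for solrj's Tuple, for illustration only.
class Tuple extends LinkedHashMap<String, Object> {
    boolean eof;
    static Tuple eofTuple() { Tuple t = new Tuple(); t.eof = true; return t; }
}

// A LimitStream sketch: returns at most `limit` tuples from the wrapped
// stream, then the EOF tuple. On the workers this closes the underlying
// stream early, surfacing as a Broken Pipe at the /export handler, which
// stops the export instead of draining all hits.
class LimitStream {
    private final Iterator<Tuple> inner;
    private final int limit;
    private int count;

    LimitStream(Iterator<Tuple> inner, int limit) {
        this.inner = inner;
        this.limit = limit;
    }

    Tuple read() {
        if (count >= limit || !inner.hasNext()) {
            return Tuple.eofTuple();
        }
        count++;
        return inner.next();
    }
}
```

In the real implementation the wrapper would extend TupleStream and delegate open()/close(); the essential behavior is only the early EOF shown in read().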
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Tue, May 16, 2017 at 8:03 PM, Joel Bernstein <joels...@gmail.com>
> > > > wrote:
> > > >
> > > > > Yep, saw it. I'll comment on the ticket for what I believe needs
> > > > > to be done.
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Tue, May 16, 2017 at 8:00 PM, Varun Thacker <va...@vthacker.in>
> > > > > wrote:
> > > > >
> > > > > > Hi Joel, Susmit
> > > > > >
> > > > > > I created https://issues.apache.org/jira/browse/SOLR-10698 to
> > > > > > track the issue.
> > > > > >
> > > > > > @Susmit, looking at the stack trace I see the expression is
> > > > > > using JSONTupleStream. I wonder, if you tried using
> > > > > > JavabinTupleStreamParser, could it help improve performance?
> > > > > >
> > > > > > On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla
> > > > > > <shukla.sus...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Joel,
> > > > > > >
> > > > > > > Queries can be arbitrarily nested with AND/OR/NOT joins, e.g.
> > > > > > >
> > > > > > > (intersect(intersect(search, search), union(search, search)))
> > > > > > >
> > > > > > > If I cut off the innermost stream with a limit, the complete
> > > > > > > intersection would not happen at upper levels. Also, would the
> > > > > > > limit stream have the same effect as using the /select handler
> > > > > > > with the rows parameter?
> > > > > > > I am trying to force the input stream to close through
> > > > > > > reflection, just to see if it gives performance gains.
> > > > > > >
> > > > > > > 2) I will experiment with null streams. Is workers = number of
> > > > > > > replicas in the data collection a good rule of thumb? Is
> > > > > > > ParallelStream performance upper bounded by the number of
> > > > > > > replicas?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Susmit
> > > > > > >
> > > > > > > On Tue, May 16, 2017 at 5:59 AM, Joel Bernstein
> > > > > > > <joels...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Your approach looks OK. The single-sharded worker collection
> > > > > > > > is only needed if you were using CloudSolrStream to send the
> > > > > > > > initial Streaming Expression to the /stream handler. You are
> > > > > > > > not doing this, so your approach is fine.
> > > > > > > >
> > > > > > > > Here are some thoughts on what you described:
> > > > > > > >
> > > > > > > > 1) If you are closing the parallel stream after the top 1000
> > > > > > > > results, then try wrapping the intersect in a LimitStream.
> > > > > > > > This stream doesn't exist yet, so it will be a custom stream.
> > > > > > > > The LimitStream can return the EOF tuple after it reads N
> > > > > > > > tuples. This will cause the worker nodes to close the
> > > > > > > > underlying stream and cause the Broken Pipe exception to
> > > > > > > > occur at the /export handler, which will stop the /export.
> > > > > > > >
> > > > > > > > Here is the basic approach:
> > > > > > > >
> > > > > > > > parallel(limit(intersect(search, search)))
> > > > > > > >
> > > > > > > > 2) It can be tricky to understand where the bottleneck lies
> > > > > > > > when using the ParallelStream for parallel relational
> > > > > > > > algebra. You can use the NullStream to get an understanding
> > > > > > > > of why performance is not increasing when you increase the
> > > > > > > > workers. Here is the basic approach:
> > > > > > > >
> > > > > > > > parallel(null(intersect(search, search)))
> > > > > > > >
> > > > > > > > The NullStream will eat all the tuples on the workers and
> > > > > > > > return a single tuple with the tuple count and the time
> > > > > > > > taken to run the expression.
> > > > > > > > So you'll get one tuple from each worker. This will
> > > > > > > > eliminate any bottleneck on tuples returning through the
> > > > > > > > ParallelStream, and you can focus on the performance of the
> > > > > > > > intersect and the /export handler.
> > > > > > > >
> > > > > > > > Then experiment with:
> > > > > > > >
> > > > > > > > 1) Increasing the number of parallel workers.
> > > > > > > > 2) Increasing the number of replicas in the data collections.
> > > > > > > >
> > > > > > > > And watch the timing information coming back from the
> > > > > > > > NullStream tuples. If increasing the workers is not
> > > > > > > > improving performance, then the bottleneck may be in the
> > > > > > > > /export handler. So try increasing replicas and see if that
> > > > > > > > improves performance. Different partitions of the streams
> > > > > > > > will be served by different replicas.
> > > > > > > >
> > > > > > > > If performance doesn't improve with the NullStream after
> > > > > > > > increasing both workers and replicas, then we know the
> > > > > > > > bottleneck is the network.
> > > > > > > >
> > > > > > > > Joel Bernstein
> > > > > > > > http://joelsolr.blogspot.com/
> > > > > > > >
> > > > > > > > On Mon, May 15, 2017 at 10:37 PM, Susmit Shukla
> > > > > > > > <shukla.sus...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Joel,
> > > > > > > > >
> > > > > > > > > Regarding the implementation, I am wrapping the topmost
> > > > > > > > > TupleStream in a ParallelStream and executing it on the
> > > > > > > > > worker cluster (one of the joined clusters doubles up as
> > > > > > > > > the worker cluster). ParallelStream does submit the query
> > > > > > > > > to the /stream handler.
> > > > > > > > > For #2: for example, I am creating 2 CloudSolrStreams,
> > > > > > > > > wrapping them in an IntersectStream, wrapping that in a
> > > > > > > > > ParallelStream, and reading the tuples out of the parallel
> > > > > > > > > stream. close() is called on the ParallelStream. I do have
> > > > > > > > > custom streams, but they are similar to IntersectStream.
> > > > > > > > > I am on Solr 6.3.1.
> > > > > > > > > The 2 Solr clusters serving the join queries have many
> > > > > > > > > shards. The worker collection is also multi-sharded and is
> > > > > > > > > one of the main clusters, so do you imply I should be
> > > > > > > > > using a single-sharded "worker" collection? Would the
> > > > > > > > > joins execute faster?
> > > > > > > > > On a side note, increasing the workers beyond 1 was not
> > > > > > > > > improving the execution times, and performance degraded
> > > > > > > > > with 3 workers and above. That is counterintuitive, since
> > > > > > > > > the joins are huge and adding more workers should have
> > > > > > > > > improved the performance.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Susmit
> > > > > > > > >
> > > > > > > > > On Mon, May 15, 2017 at 6:47 AM, Joel Bernstein
> > > > > > > > > <joels...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Ok, please do report any issues you run into. This is
> > > > > > > > > > quite a good bug report.
> > > > > > > > > >
> > > > > > > > > > I reviewed the code and I believe I see the problem.
> > > > > > > > > > The problem seems to be that the output code from the
> > > > > > > > > > /stream handler is not properly accounting for client
> > > > > > > > > > disconnects and closing the underlying stream.
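The NullStream diagnostic described earlier in the thread can be modeled the same way: eat every tuple on the worker and emit a single summary tuple with the count and elapsed time, so tuple transport back through ParallelStream drops out of the measurement. A self-contained sketch under the same simplifying assumptions (a plain map stands in for the real solrj Tuple; the field names here are illustrative, not solrj's):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Drains the wrapped stream and emits one summary tuple. What remains in
// the timing is the cost of the join and the /export handler itself, not
// the network transport of tuples back to the client.
class NullStreamSketch {
    private final Iterator<Map<String, Object>> inner;

    NullStreamSketch(Iterator<Map<String, Object>> inner) {
        this.inner = inner;
    }

    Map<String, Object> read() {
        long start = System.nanoTime();
        long count = 0;
        while (inner.hasNext()) {   // eat all tuples on the worker
            inner.next();
            count++;
        }
        Map<String, Object> summary = new HashMap<>();
        summary.put("nullCount", count);
        summary.put("timerMillis", (System.nanoTime() - start) / 1_000_000L);
        return summary;
    }
}
```

With one of these per worker, you get one summary tuple per worker; if the reported times don't drop as workers increase, the bottleneck is the /export side or the network, as described above.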
> > > > > > > > > > What I see in the code is that exceptions coming from
> > > > > > > > > > read() in the stream do automatically close the
> > > > > > > > > > underlying stream. But exceptions from the writing of
> > > > > > > > > > the stream do not close the stream. This needs to be
> > > > > > > > > > fixed.
> > > > > > > > > >
> > > > > > > > > > A few questions about your streaming implementation:
> > > > > > > > > >
> > > > > > > > > > 1) Are you sending requests to the /stream handler? Or
> > > > > > > > > > are you embedding CloudSolrStream in your application
> > > > > > > > > > and bypassing the /stream handler?
> > > > > > > > > >
> > > > > > > > > > 2) If you're sending Streaming Expressions to the
> > > > > > > > > > /stream handler, are you using SolrStream or
> > > > > > > > > > CloudSolrStream to send the expression?
> > > > > > > > > >
> > > > > > > > > > 3) What version of Solr are you using?
> > > > > > > > > >
> > > > > > > > > > 4) Have you implemented any custom streams?
> > > > > > > > > >
> > > > > > > > > > #2 is an important question. If you're sending
> > > > > > > > > > expressions to the /stream handler using
> > > > > > > > > > CloudSolrStream, the collection running the expression
> > > > > > > > > > would have to be set up a specific way. The collection
> > > > > > > > > > running the expression will have to be a *single shard
> > > > > > > > > > collection*. You can have as many replicas as you want,
> > > > > > > > > > but only one shard. That's because CloudSolrStream picks
> > > > > > > > > > one replica in each shard to forward the request to,
> > > > > > > > > > then merges the results from the shards.
> > > > > > > > > > So if you send in an expression using CloudSolrStream,
> > > > > > > > > > that expression will be sent to each shard to be run,
> > > > > > > > > > and each shard will be duplicating the work and
> > > > > > > > > > returning duplicate results.
> > > > > > > > > >
> > > > > > > > > > Joel Bernstein
> > > > > > > > > > http://joelsolr.blogspot.com/
> > > > > > > > > >
> > > > > > > > > > On Sat, May 13, 2017 at 7:03 PM, Susmit Shukla
> > > > > > > > > > <shukla.sus...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks Joel.
> > > > > > > > > > > Streaming is awesome; I just had a huge implementation
> > > > > > > > > > > of it in my project. I found a couple more issues with
> > > > > > > > > > > streaming and did local hacks for them; I will raise
> > > > > > > > > > > those too.
> > > > > > > > > > >
> > > > > > > > > > > On Sat, May 13, 2017 at 2:09 PM, Joel Bernstein
> > > > > > > > > > > <joels...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Ah, then this is unexpected behavior. Can you open a
> > > > > > > > > > > > ticket for this?
> > > > > > > > > > > >
> > > > > > > > > > > > Joel Bernstein
> > > > > > > > > > > > http://joelsolr.blogspot.com/
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, May 13, 2017 at 2:51 PM, Susmit Shukla
> > > > > > > > > > > > <shukla.sus...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Joel,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I was using CloudSolrStream for the above test.
> > > > > > > > > > > > > Below is the call stack.
> > > > > > > > > > > > >
> > > > > > > > > > > > > at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
> > > > > > > > > > > > > at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
> > > > > > > > > > > > > at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:128)
> > > > > > > > > > > > > at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
> > > > > > > > > > > > > at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
> > > > > > > > > > > > > at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)
> > > > > > > > > > > > > at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)
> > > > > > > > > > > > > at java.io.InputStreamReader.close(InputStreamReader.java:199)
> > > > > > > > > > > > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:91)
> > > > > > > > > > > > > at org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:186)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Susmit
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, May 13, 2017 at 10:48 AM, Joel Bernstein
> > > > > > > > > > > > > <joels...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I was just reading the Java docs on the
> > > > > > > > > > > > > > ChunkedInputStream.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > "Note that this class NEVER closes the underlying stream"
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In that scenario the /export would indeed continue to send data.
> > > > > > > > > > > > > > I think we can consider this an anti-pattern for the /export handler currently.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I would suggest using one of the streaming clients to connect to the /export handler.
> > > > > > > > > > > > > > Either CloudSolrStream or SolrStream will interact with the /export handler in the way that it expects.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Joel Bernstein
> > > > > > > > > > > > > > http://joelsolr.blogspot.com/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, May 13, 2017 at 12:28 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Joel,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I did not observe that. On calling close() on the stream, it cycled through all the hits that the /export handler calculated.
> > > > > > > > > > > > > > > E.g., with a *:* query and the export handler on a 100k document index, I could see the 100,000th record printed in the HTTP wire debug log, even though close() was called after reading the 1st tuple.
> > > > > > > > > > > > > > > The time taken for the operation with the close() call was the same as if I had read all the 100k tuples.
> > > > > > > > > > > > > > > As I have pointed out, close() on the underlying ChunkedInputStream calls read(), and the Solr server probably has no way to distinguish it from the read() of regular tuple reads.
> > > > > > > > > > > > > > > I think there should be an abort() API for Solr streams that hooks into HttpMethod.abort(). That would enable the client to disconnect early, and that would probably disconnect the underlying socket, so there would be no leaks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Susmit
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, May 13, 2017 at 7:42 AM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If the client closes the connection to the export handler, then this exception will occur automatically on the server.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Joel Bernstein
> > > > > > > > > > > > > > > > http://joelsolr.blogspot.com/
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sat, May 13, 2017 at 1:46 AM, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Joel,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the insight. How can this exception be thrown/forced from the client side? The client can't do a System.exit() as it is running as a webapp.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Susmit
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, May 12, 2017 at 4:44 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > In this scenario the /export handler continues to export results until it encounters a "Broken Pipe" exception. This exception is trapped and ignored rather than logged, as it's not considered an exception if the client disconnects early.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Joel Bernstein
> > > > > > > > > > > > > > > > > > http://joelsolr.blogspot.com/
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, May 12, 2017 at 2:10 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I have a question regarding the Solr /export handler. Here is the scenario:
> > > > > > > > > > > > > > > > > > > I want to use the /export handler; I only need sorted data, and this is the fastest way to get it. I am doing multiple-level joins using streams over the /export handler. I know the number of top-level records to be retrieved, but not for each individual stream rolling up to the final result.
> > > > > > > > > > > > > > > > > > > I observed that calling close() on an /export stream is too expensive. It reads the stream to the very end of the hits.
> > > > > > > > > > > > > > > > > > > Assuming there are 100 million hits for each stream: if the first 1k records were found after the joins and we call close() after that, it would take many minutes or hours to finish. Currently I have put the close() call in a different thread, basically fire and forget. But the cluster is very strained because of the unnecessary reads.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Internally, streaming uses HttpClient's ChunkedInputStream, and it has to be drained in the close() call. But from the server's point of view, it should stop sending more data once close() has been issued.
> > > > > > > > > > > > > > > > > > > There is a read() call in the close() method of ChunkedInputStream that is indistinguishable from a real read().
> > > > > > > > > > > > > > > > > > > If the /export handler stopped sending more data after close(), it would be very useful.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Another option would be to use the /select handler and get into the business of managing a custom cursor mark that is based on the stream sort and is reset until it fetches the required records at the topmost level.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Any thoughts?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > Susmit
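The drain-on-close behavior at the heart of this thread can be reproduced with a plain FilterInputStream. This is an analogy, not HttpClient's actual ChunkedInputStream, but it shows why close() costs as much as reading to the end of the response, and why an abort() that drops the underlying stream without draining is cheap:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Mimics ChunkedInputStream's close(): drains the remaining body so the
// connection can be reused. This is why close() after reading one tuple
// still costs as much as reading every hit the /export handler produced.
class DrainingStream extends FilterInputStream {
    long drained = 0;

    DrainingStream(InputStream in) { super(in); }

    @Override
    public void close() throws IOException {
        byte[] buf = new byte[8192];          // the same 8 KB buffer size mentioned above
        int n;
        while ((n = in.read(buf)) != -1) {
            drained += n;                     // "reads" the rest of the response
        }
        in.close();
    }

    // What the proposed abort() would do: drop the underlying stream
    // without draining it, at the cost of not reusing the connection.
    void abort() throws IOException {
        in.close();
    }
}
```

Under this model, close() after the first byte still consumes the whole body, while abort() consumes nothing more; the trade-off is that an aborted connection cannot go back to the pool.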