Mikhail,

Yes, +1. This question comes up a few times a year. Grant created a JIRA issue for this many moons ago.

https://issues.apache.org/jira/browse/LUCENE-2127
https://issues.apache.org/jira/browse/SOLR-1726

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm

On Wed, Jul 24, 2013 at 9:58 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:

> fwiw,
> I built a prototype with the following differences:
> - it streams straight to the socket output stream
> - it streams continuously while collecting, without needing to store a
>   bitset.
> It might have some limited, extreme uses. Is anyone interested?
>
> On Wed, Jul 24, 2013 at 7:19 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>
>> On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber <mlie...@impetus.com> wrote:
>>
>> > That sounds like a satisfactory solution for the time being -
>> > I am assuming you dump the data from Solr in a csv format?
>> >
>>
>> JSON
>>
>> > How did you implement the streaming processor? (What tool did you use
>> > for this? Not familiar with that.)
>> >
>>
>> this is what dumps the docs:
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/response/JSONDumper.java
>>
>> it is called by one of our batch processors, which can pass it a bitset of
>> recs
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchProviderDumpIndex.java
>>
>> as far as streaming is concerned, we were all very nicely surprised: a
>> file of a few GB (on the local network) took a ridiculously short time - in
>> fact, a colleague of mine assumed it was not working until we looked at the
>> downloaded file ;-), you may want to look at line 463
>>
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/handler/batch/BatchHandler.java
>>
>> roman
>>
>> > You say it takes only a few minutes to dump the data - how long does it
>> > take to stream it back in? Is performance acceptable (~ within minutes)?
>> >
>> > Thanks,
>> > Matt
>> >
>> > On 7/23/13 6:57 PM, "Roman Chyla" <roman.ch...@gmail.com> wrote:
>> >
>> > >Hello Matt,
>> > >
>> > >You can consider writing a batch processing handler, which receives a
>> > >query and, instead of sending results back, writes them into a file which
>> > >is then available for streaming (it has its own UUID). I am dumping many
>> > >GBs of data from solr in a few minutes - your query + streaming writer
>> > >can go a very long way :)
>> > >
>> > >roman
>> > >
>> > >On Tue, Jul 23, 2013 at 5:04 PM, Matt Lieber <mlie...@impetus.com> wrote:
>> > >
>> > >> Hello Solr users,
>> > >>
>> > >> Question regarding processing a lot of docs returned from a query; I
>> > >> potentially have millions of documents returned back from a query.
>> > >> What is the common design to deal with this?
>> > >>
>> > >> 2 ideas I have are:
>> > >> - create a client service that is multithreaded to handle this
>> > >> - use the Solr "pagination" to retrieve a batch of rows at a time
>> > >>   ("start, rows" in Solr Admin console)
>> > >>
>> > >> Any other ideas that I may be missing?
>> > >>
>> > >> Thanks,
>> > >> Matt
>> > >>
>> > >> ________________________________
>> > >>
>> > >> NOTE: This message may contain information that is confidential,
>> > >> proprietary, privileged or otherwise protected by law. The message is
>> > >> intended solely for the named addressee. If received in error, please
>> > >> destroy and notify the sender. Any use of this email is prohibited
>> > >> when received in error. Impetus does not represent, warrant and/or
>> > >> guarantee, that the integrity of this communication has been maintained
>> > >> nor that the communication is free of errors, virus, interception or
>> > >> interference.
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
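
For anyone reading this thread in the archive: the idea Mikhail describes above (write each hit to the output stream as it is collected, instead of gathering a bitset first and dumping afterwards) might look roughly like the sketch below. It is plain Java written only for illustration. It is not Mikhail's prototype, not Roman's JSONDumper, and not Lucene's Collector API. The class name StreamingHitWriter, the collect(int) method, the loader function, and the one-JSON-object-per-line output are all assumptions.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.IntFunction;

/** Hypothetical illustration only; none of these names come from Solr or Lucene. */
final class StreamingHitWriter {
    private final Writer out;                 // in a real handler: a writer over the socket output stream
    private final IntFunction<String> loader; // assumption: turns a doc id into one JSON object
    private int written = 0;

    StreamingHitWriter(Writer out, IntFunction<String> loader) {
        this.out = out;
        this.loader = loader;
    }

    /** Called once per matching document; writes it immediately, keeping no set of ids. */
    void collect(int docId) {
        try {
            out.write(loader.apply(docId));
            out.write('\n');
            written++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of documents written so far. */
    int written() {
        return written;
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // stands in for the socket stream
        StreamingHitWriter w = new StreamingHitWriter(sink, id -> "{\"id\":" + id + "}");
        for (int docId : new int[] {3, 7, 42}) {
            w.collect(docId); // each hit is already in the sink before the next one arrives
        }
        System.out.print(sink);
    }
}
```

The trade-off, as far as this sketch goes: memory stays flat no matter how many documents match, but the client sees hits in collection order, and there is no total count available before the first byte is sent.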