Toke, I won't be able to use the TermsComponent, as I have complex filter criteria on other fields.
Michael, I understood your idea of paging without using start=. I will
prototype it, as it is also possible in my use case, and post the results I
get with this approach here.

On Sun, Jan 18, 2015 at 10:05 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> You can also implement your own cursor easily enough if you have a unique
> sort key (not relevance score). Say you can sort by id: you select batch 1
> (50k docs, say) and record the last (maximum) id in the batch. For the
> next batch, limit it to id > last_id and get the first 50k docs (don't use
> start= for paging). This scales much better when scanning a large result
> set; you'll get constant time across the whole set instead of having it
> increase as you page deeper.
>
> -Mike
>
>
> On 1/18/2015 7:45 AM, Naresh Yadav wrote:
>
>> Hi Toke,
>>
>> Thanks for sharing Solr internals for my problem. I will definitely try
>> the cursor also, but the only problem is that my current Solr version is
>> 4.6.1, in which I guess cursor support is not there. Is there any other
>> option for this problem?
>>
>> Also, as per your suggestion, I will try to avoid regional units in posts.
>>
>> Thanks
>> Naresh
>>
>> On Sun, Jan 18, 2015 at 4:19 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
>> wrote:
>>
>>> Naresh Yadav [nyadav....@gmail.com] wrote:
>>>
>>>> In both setups we are reading in batches of 50k. Each batch takes:
>>>> Setup1: approx 7 seconds, and completing all batches of the total 10
>>>> lakh results takes 1 to 2 minutes.
>>>> Setup2: approx 2-3 minutes, and completing all batches of the total 10
>>>> lakh results takes 114 minutes.
>>>
>>> Deep paging across shards without cursors means that for each request,
>>> the full result set up to that point must be requested from each shard.
>>> The deeper your page, the longer it takes for each request. If you only
>>> extracted 500K results instead of the 1M in setup 2, it would likely
>>> take a lot less than 114/2 minutes.
>>>
>>> Since you are exporting the full result set, you should be using a
>>> cursor:
>>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>>> This should make your extraction linear to the number of documents and
>>> hopefully a lot faster than your current setup.
>>>
>>> Also, please refrain from using regional units such as "lakh" in an
>>> international forum. It requires some readers (me, for example) to
>>> perform a search in order to be sure of what you are talking about.
>>>
>>> - Toke Eskildsen
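
For reference, a minimal sketch of the id-based paging Michael describes,
written against Solr's plain HTTP/JSON API from Python. The URL, the
collection name, the "id" sort field and the filter query are placeholder
assumptions, not details from this thread:

# Sketch of id-based ("keyset") paging: never use start=, instead narrow
# each batch to ids above the last id already seen.
import requests

SOLR_URL = "http://localhost:8983/solr/collection1/select"  # placeholder
BATCH_SIZE = 50000

def export_all(filter_query):
    """Yield every matching document in batches of 50k."""
    last_id = None
    while True:
        # First batch matches everything; later batches use an exclusive
        # lower bound on id. Assumes ids are plain strings with no quotes.
        q = "*:*" if last_id is None else 'id:{"%s" TO *]' % last_id
        params = {
            "q": q,
            "fq": filter_query,
            "sort": "id asc",
            "start": 0,          # always 0 -- the id range does the paging
            "rows": BATCH_SIZE,
            "wt": "json",
        }
        docs = requests.get(SOLR_URL, params=params).json()["response"]["docs"]
        if not docs:
            break
        for doc in docs:
            yield doc
        last_id = docs[-1]["id"]

The point is that every request asks for start=0 and only narrows the query
by the last id seen, so Solr never has to collect and skip over the earlier
pages of the result set.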
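And, for completeness, roughly what the cursorMark paging Toke links to looks
like (it needs Solr 4.7 or later, so it would mean upgrading from 4.6.1);
again the URL and field names below are placeholder assumptions:

# Rough sketch of cursorMark paging (Solr 4.7+).
import requests

SOLR_URL = "http://localhost:8983/solr/collection1/select"  # placeholder

def export_with_cursor(filter_query, batch_size=50000):
    cursor = "*"
    while True:
        params = {
            "q": "*:*",
            "fq": filter_query,
            # The sort must end on the uniqueKey field for cursors to work;
            # here "id" is assumed to be the uniqueKey.
            "sort": "id asc",
            "rows": batch_size,
            "cursorMark": cursor,
            "wt": "json",
        }
        body = requests.get(SOLR_URL, params=params).json()
        for doc in body["response"]["docs"]:
            yield doc
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:
            break  # an unchanged cursor means the result set is exhausted
        cursor = next_cursor

The loop stops when nextCursorMark comes back unchanged, which is how Solr
signals the end of the result set.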