I had to do this recently on a SolrCloud cluster. I wanted to export all the IDs, but they weren’t stored as docValues.

The fastest approach was to fetch all the IDs in one request. First, I make a request for zero rows to get the numFound. Then I fetch numFound+1000 (in case docs were added while I wasn’t looking) in one request.

I also have a hairy shell script to do /export on each leader after parsing cluster status. That might be a little large to post to this list, but I can do it if there is general interest.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 1, 2019, at 9:14 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> First, thanks for taking the time to ask a question with enough supporting
> details that I can hope to be able to answer in one exchange ;). It’s a
> pleasure to see.
>
> Second, NP with asking on Stack Overflow, they have some excellent answers
> there. But you’re right, this list gets more Solr-centered eyeballs.
>
> On to your question. I think the best answer was that “/export wasn’t
> designed to deal with scores”, which you’ll find disappointing.
>
> You could use the Streaming “search” expression (using qt=/select or just
> leave qt out) but that’ll sort all of the docs you’re exporting into a huge
> list, which may perform worse than CursorMark even if it doesn’t blow up
> memory.
>
> The root of this problem is that export can sort in batches since the values
> it’s sorting on are contained in each document, so it can iterate in batches,
> send them out, then iterate again on the remaining documents.
>
> Score, since it’s dynamic, can’t do that. Solr has to score _all_ the docs to
> know where a doc lands in the final set relative to any other doc, so if it
> were going to work it’d have to have enough memory to hold the scores of all
> the docs in an ordered list, which is very expensive. Conceptually this is an
> ordered list up to maxDoc long. Not only does there have to be enough memory
> to hold the entire list, every doc has to be inserted individually which can
> kill performance.
> This is the “deep paging” problem.
>
> In the usual case of returning, say, 20 docs, the sorted list only has to be
> 20 long, higher scoring docs evict lower scoring docs.
>
> So I think CursorMark is your best bet.
>
> Best,
> Erick
>
>> On Oct 1, 2019, at 3:59 AM, Edward Turner <eddtur...@gmail.com> wrote:
>>
>> Hi all,
>>
>> As far as I understand, SolrCloud currently does not allow the use of
>> sorting by the pseudofield, score in the /export request handler (i.e., get
>> the results in relevancy order). If we do attempt this, we get an
>> exception, "org.apache.solr.search.SyntaxError: Scoring is not currently
>> supported with xsort". We could use Solr's cursorMark, but this takes a
>> very long time ...
>>
>> Exporting results does work, however, when exporting result sets by a
>> specific document field that has docValues set to true.
>>
>> Question:
>> Does anyone know if/when it will be possible to sort by score in the
>> /export handler?
>>
>> Research on the problem:
>> We've seen https://issues.apache.org/jira/browse/SOLR-5244 and
>> https://issues.apache.org/jira/browse/SOLR-8664, which are related to this
>> issue, but don't fix it. Maybe I've missed a more relevant issue?
>>
>> Our use-case:
>> We are using Solrcloud in our team and it's added a huge
>> amount of value to our users.
>>
>> We show a table of search results ordered by score (relevancy) that was
>> obtained from sending a query to the standard /select handler. We're
>> working in the life-sciences domain and it is common for our result sets to
>> contain many millions of results (unfortunately). After users browse their
>> results, they then may want to download the results that they see, to do
>> some post-processing. However, to do this, such that the results appear in
>> the order that the user originally saw them, we'd need to be able to export
>> results based on score/relevancy.
>>
>> Any suggestions or advice on this would be greatly appreciated!
>>
>> Many thanks!
>>
>> Edd
>>
>> PS. apologies for posting also on Stackoverflow (
>> https://stackoverflow.com/questions/58167152/solrcloud-export-all-results-sorted-by-score)
>> --
>> I only discovered the Solr mailing-list afterwards and thought it probably
>> better to reach out directly to Solr's people (I can share any answer from
>> this forum on there retrospectively).
>
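
For anyone who wants something concrete, here is a rough Python sketch of the two approaches discussed in this thread: the two-request fetch (zero rows to read numFound, then numFound+1000 rows) and cursorMark paging sorted by score. It is only an illustration, not the shell script mentioned above. The base URL, the collection name, the uniqueKey field name "id", the page size, and the helper names (make_select, fetch_all_ids, iter_by_score) are all assumptions made for the example. The request parameters (q, fl, rows, sort, cursorMark) and the reply fields (response.numFound, response.docs, nextCursorMark) are the standard /select ones.

```python
import json
import urllib.parse
import urllib.request


def make_select(base_url):
    """Return a function that GETs <base_url>/select and parses the JSON reply.

    base_url is a placeholder, e.g. "http://localhost:8983/solr/mycollection".
    """
    def select(params):
        query = urllib.parse.urlencode({**params, "wt": "json"})
        with urllib.request.urlopen(f"{base_url}/select?{query}") as resp:
            return json.load(resp)
    return select


def fetch_all_ids(select, q="*:*", id_field="id", slack=1000):
    """Two-request export: learn numFound, then fetch everything at once."""
    # Request 1: zero rows, only to read numFound.
    num_found = select({"q": q, "rows": 0})["response"]["numFound"]
    # Request 2: numFound plus some slack, in case docs were added meanwhile.
    reply = select({"q": q, "fl": id_field, "rows": num_found + slack})
    return [doc[id_field] for doc in reply["response"]["docs"]]


def iter_by_score(select, q, id_field="id", rows=1000):
    """Yield ids in relevancy order using cursorMark paging."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        reply = select({
            "q": q,
            "fl": id_field,
            "rows": rows,
            # cursorMark needs the uniqueKey field in the sort as a tie-breaker.
            "sort": f"score desc,{id_field} asc",
            "cursorMark": cursor,
        })
        for doc in reply["response"]["docs"]:
            yield doc[id_field]
        next_cursor = reply["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

Usage would look like select = make_select("http://localhost:8983/solr/mycollection") followed by fetch_all_ids(select) or iter_by_score(select, "some query"). The select function is passed in so the paging logic can be tried against a stub before pointing it at a real cluster. Note that the single-request version holds the whole result in memory on both ends, so it probably only makes sense when just the ID field is returned.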