Re: Solrcloud export all results sorted by score

Walter Underwood Tue, 01 Oct 2019 09:33:36 -0700

I had to do this recently on a Solr Cloud cluster. I wanted to export all the 
IDs, but they weren’t stored as docvalues.


The fastest approach was to fetch all the IDs in one request. First, I made a 
request for zero rows to get numFound. Then I fetched numFound+1000 rows (in 
case docs were added while I wasn’t looking) in one request.
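
A minimal sketch of that two-step fetch in Python, not the code actually used. The base URL, the collection name, and the `id` field name are assumptions for illustration, and `fetch_json` is a hypothetical injectable helper so the logic can be tried without a live cluster. It uses only the standard /select parameters q, fl, rows, and wt.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PADDING = 1000  # headroom for docs added between the two requests


def http_fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def select_url(base_url, collection, rows, query="*:*", id_field="id"):
    """Build a /select URL that returns only the ID field."""
    params = urlencode({"q": query, "fl": id_field, "rows": rows, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


def fetch_all_ids(base_url, collection, fetch_json=http_fetch_json, id_field="id"):
    """Return every ID in the collection using two requests."""
    # Step 1: ask for zero rows just to learn numFound.
    first = fetch_json(select_url(base_url, collection, rows=0, id_field=id_field))
    num_found = first["response"]["numFound"]
    # Step 2: fetch everything in one request, padded for late arrivals.
    second = fetch_json(
        select_url(base_url, collection, rows=num_found + PADDING, id_field=id_field)
    )
    return [doc[id_field] for doc in second["response"]["docs"]]
```

Against a real cluster this would be called as `fetch_all_ids("http://localhost:8983/solr", "mycollection")`, where both arguments are placeholders. A single very large response trades memory on both ends for fewer round trips, so it only suits small payloads such as bare IDs.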

I also have a hairy shell script to do /export on each leader after parsing 
cluster status. That might be a little large to post to this list, but I can do 
it if there is general interest.
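
In the meantime, here is a rough Python sketch of the idea (find each shard leader in the cluster status, then build an /export request per leader). It is not the shell script mentioned above. It assumes the JSON layout returned by the Collections API CLUSTERSTATUS action (cluster, collections, shards, replicas, with `base_url`, `core`, and `leader` on each replica), which may differ between Solr versions. It also assumes the exported field has docValues, as /export requires, and that `distrib=false` is wanted to keep each request local to the core.

```python
from urllib.parse import urlencode


def leader_cores(cluster_status, collection):
    """Return (shard, base_url, core) for the leader replica of each shard."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    leaders = []
    for shard_name, shard in sorted(shards.items()):
        for replica in shard["replicas"].values():
            # The leader flag is assumed to be the string "true" (or a boolean).
            if str(replica.get("leader", "false")).lower() == "true":
                leaders.append((shard_name, replica["base_url"], replica["core"]))
    return leaders


def export_urls(cluster_status, collection, field="id"):
    """Build one /export URL per shard leader, sorted by the exported field."""
    params = urlencode(
        {"q": "*:*", "fl": field, "sort": f"{field} asc", "distrib": "false"}
    )
    return [
        f"{base_url}/{core}/export?{params}"
        for _, base_url, core in leader_cores(cluster_status, collection)
    ]
```

Each URL can then be fetched independently and the per-shard outputs concatenated, since every document lives on exactly one shard.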

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 1, 2019, at 9:14 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> First, thanks for taking the time to ask a question with enough supporting 
> details that I can hope to be able to answer in one exchange ;). It’s a 
> pleasure to see.
> 
> Second, NP with asking on Stack Overflow, they have some excellent answers 
> there. But you’re right, this list gets more Solr-centered eyeballs.
> 
> On to your question. I think the best answer was that “/export wasn’t 
> designed to deal with scores”, which you’ll find disappointing. 
> 
> You could use the Streaming “search” expression (using qt=/select or just 
> leave qt out) but that’ll sort all of the docs you’re exporting into a huge 
> list, which may perform worse than CursorMark even if it doesn’t blow up 
> memory.
> 
> The root of this problem is that export can sort in batches since the values 
> it’s sorting on are contained in each document, so it can iterate in batches, 
> send them out, then iterate again on the remaining documents.
> 
> Sorting by score can’t work that way, because the score is dynamic. Solr has
> to score _all_ the docs to know where a doc lands in the final set relative
> to any other doc, so to export by score it would need enough memory to hold
> the scores of all the docs in an ordered list, which is very expensive.
> Conceptually, this is an ordered list up to maxDoc long. Not only does there
> have to be enough memory to hold the entire list, but every doc has to be
> inserted individually, which can kill performance. This is the “deep paging”
> problem.
> 
> In the usual case of returning, say, 20 docs, the sorted list only has to be
> 20 long; higher-scoring docs evict lower-scoring docs.
> 
> So I think CursorMark is your best bet.
> 
> Best,
> Erick
> 
>> On Oct 1, 2019, at 3:59 AM, Edward Turner <eddtur...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> As far as I understand, SolrCloud currently does not allow sorting by the
>> pseudo-field score in the /export request handler (i.e., getting the
>> results in relevancy order). If we do attempt this, we get an
>> exception, "org.apache.solr.search.SyntaxError: Scoring is not currently
>> supported with xsort". We could use Solr's cursorMark, but this takes a
>> very long time ...
>> 
>> Exporting results does work, however, when exporting result sets by a
>> specific document field that has docValues set to true.
>> 
>> Question:
>> Does anyone know if/when it will be possible to sort by score in the
>> /export handler?
>> 
>> Research on the problem:
>> We've seen https://issues.apache.org/jira/browse/SOLR-5244 and
>> https://issues.apache.org/jira/browse/SOLR-8664, which are related to this
>> issue, but don't fix it. Maybe I've missed a more relevant issue?
>> 
>> Our use-case:
>> We are using Solrcloud in our team and it's added a huge
>> amount of value to our users.
>> 
>> We show a table of search results ordered by score (relevancy) that was
>> obtained from sending a query to the standard /select handler. We're
>> working in the life-sciences domain and it is common for our result sets to
>> contain many millions of results (unfortunately). After users browse their
>> results, they then may want to download the results that they see, to do
>> some post-processing. However, to do this, such that the results appear in
>> the order that the user originally saw them, we'd need to be able to export
>> results based on score/relevancy.
>> 
>> Any suggestions or advice on this would be greatly appreciated!
>> 
>> Many thanks!
>> 
>> Edd
>> 
>> PS. apologies for posting also on Stackoverflow (
>> https://stackoverflow.com/questions/58167152/solrcloud-export-all-results-sorted-by-score)
>> --
>> I only discovered the Solr mailing-list afterwards and thought it probably
>> better to reach out directly to Solr's people (I can share any answer from
>> this forum on there retrospectively).
>
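
For reference, the CursorMark paging recommended in the quoted reply can be sketched roughly as follows in Python. This is an illustration under assumptions, not code from the thread: the /select URL, the query, and the `id` uniqueKey name are placeholders, and `fetch_json` is a hypothetical injectable helper. It relies on the documented cursor behaviour: start with `cursorMark=*`, include the uniqueKey in the sort as a tie-breaker, and stop when `nextCursorMark` equals the cursor that was sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_by_score(select_url, query, rows=1000, unique_key="id",
                  fetch_json=http_fetch_json):
    """Yield docs in relevancy order, one cursorMark page at a time."""
    cursor = "*"  # initial cursor value
    while True:
        params = urlencode({
            "q": query,
            "fl": f"{unique_key},score",
            "rows": rows,
            # The uniqueKey tie-breaker is required for cursor paging.
            "sort": f"score desc, {unique_key} asc",
            "cursorMark": cursor,
            "wt": "json",
        })
        page = fetch_json(f"{select_url}?{params}")
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # no further results
            break
        cursor = next_cursor
```

Because it is a generator, the caller can stream results to a file as they arrive instead of holding millions of docs in memory. Larger `rows` values mean fewer round trips at the cost of bigger responses.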
