Actually, I realize that it was incorrect on my part to pass only id+score in fl while specifying more fields in hl.fl. This was somehow supported in older versions, but the new behavior is actually a performance improvement for the scenario where the user is asking only for ids.
Nawab

On Fri, Aug 18, 2017 at 8:33 AM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:

> Thanks Erick for pointing to the better option. I will explore that. After
> your email, I found that if I specify 'fl=*' in the query, it does the
> right thing (a two-pass process). However, my queries had 'fl=id+score'
> (or sometimes fl=id&fl=score); in both of these cases I found that the
> shards are asked to highlight all the results on the first request (and
> there is no second request).
>
> The fl=* query finishes (in my sample case) in 100 msec, while the same
> query with fl=id+score finishes in 1200 msec.
>
> Here are the two queries:
>
> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=*&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>
> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=id&fl=score&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>
> Thanks
> Nawab
>
> On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> I don't think you're reading it correctly. First of all, if you're
>> going to be doing deep paging you should be using cursorMark, see:
>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>>
>> Second, it's a two-pass process if you don't use cursorMark. The first
>> pass gets the candidate docs from each shard. But all it returns is
>> the ID and sort criteria. Then the aggregator node gets the _true_ top
>> N after sorting all the lists from each shard and issues a second
>> request for _only_ those docs that have made the top N from each sub
>> shard, and those should be the only ones highlighted.
>>
>> Do you have any evidence to the contrary that they're all being
>> highlighted? Or are you misinterpreting the log message for the first
>> pass?
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > In a multi-node Solr installation (without SolrCloud), during a paging
>> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
>> > rows from each shard. If highlighting is ON, then the primary node also
>> > asks each shard to highlight all 1200 results, which doesn't scale
>> > well. Is there a way to break the shard query into two steps, e.g. ask
>> > for the 1200 rows and, after sorting the 1200 responses from each shard
>> > and finding the final rows to return (1001 to 1200), issue another
>> > query to the shards asking for highlighted responses for only the
>> > relevant docs?
>> >
>> > Thanks
>> > Nawab
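
For anyone reading this thread later, the two-pass flow Erick describes can be illustrated with a small Python sketch. This is only a toy model, not Solr's actual code: the in-memory shards, the scoring formula, and the function names (`make_shard`, `first_pass`, `second_pass`, `two_pass_search`) are all hypothetical and exist purely for illustration. The point it tries to show is that the first pass returns only ids and scores for start+rows candidates per shard, and highlighting is done in the second pass for the final page only.

```python
import heapq


def make_shard(shard_no, n_docs):
    """Build a hypothetical in-memory shard: doc id -> (score, text)."""
    return {
        f"s{shard_no}-d{i}": (
            1.0 / (1 + i + shard_no * 0.1),  # made-up score, distinct per doc
            f"nawab document {i} on shard {shard_no}",
        )
        for i in range(n_docs)
    }


def first_pass(shard, limit):
    """Pass 1: return only (score, id) for the shard's top `limit` docs."""
    pairs = ((score, doc_id) for doc_id, (score, _) in shard.items())
    return heapq.nlargest(limit, pairs)


def second_pass(shard, doc_ids, term):
    """Pass 2: highlight only the requested ids that live on this shard."""
    return {
        doc_id: shard[doc_id][1].replace(term, f"<em>{term}</em>")
        for doc_id in doc_ids
        if doc_id in shard
    }


def two_pass_search(shards, start, rows, term):
    """Merge per-shard candidates, then highlight only the final page."""
    limit = start + rows
    candidates = []
    for shard in shards:
        candidates.extend(first_pass(shard, limit))
    # The aggregator finds the true top `limit` docs and keeps the page slice.
    page = heapq.nlargest(limit, candidates)[start:start + rows]
    page_ids = [doc_id for _, doc_id in page]
    highlights = {}
    for shard in shards:
        highlights.update(second_pass(shard, page_ids, term))
    return page_ids, highlights


if __name__ == "__main__":
    shards = [make_shard(n, 2000) for n in range(3)]
    ids, highlights = two_pass_search(shards, start=1000, rows=200, term="nawab")
    print(len(ids), "docs on the page;", len(highlights), "docs highlighted")
```

With three shards and start=1000, rows=200, a single-pass approach would highlight 1200 docs on each shard, whereas this sketch highlights only the 200 docs that make the final page.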