Thanks, Erick, for pointing me to the better option. I will explore that. After your email, I found that if I specify 'fl=*' in the query, then it does the right thing (a two-pass process). However, my queries had 'fl=id+score' (or sometimes fl=id&fl=score), and in both of these cases I found that the shards are asked to highlight all the results on the first request (and there is no second request).
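To make the two-pass behaviour concrete, here is a rough Python sketch of the flow as I understand it. It is a toy simulation only, not Solr's code: the function names (first_pass, select_page, second_pass_requests) and the in-memory shard data are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a shard: (doc_id, score) pairs,
# already sorted by score descending. Not a Solr data structure.
Shard = List[Tuple[str, float]]
Candidate = Tuple[str, float, str]  # (doc_id, score, shard_name)


def first_pass(shards: Dict[str, Shard], start: int, rows: int) -> List[Candidate]:
    """Pass 1: every shard returns only id + sort value for its top (start + rows) docs."""
    candidates: List[Candidate] = []
    for name, docs in shards.items():
        for doc_id, score in docs[: start + rows]:
            candidates.append((doc_id, score, name))
    return candidates


def select_page(candidates: List[Candidate], start: int, rows: int) -> List[Candidate]:
    """Merge the per-shard lists and keep the global window [start, start + rows)."""
    merged = sorted(candidates, key=lambda c: (-c[1], c[0]))
    return merged[start : start + rows]


def second_pass_requests(page: List[Candidate]) -> Dict[str, List[str]]:
    """Pass 2: group the winning ids by shard; only these need fields and highlighting."""
    by_shard: Dict[str, List[str]] = {}
    for doc_id, _score, name in page:
        by_shard.setdefault(name, []).append(doc_id)
    return by_shard


if __name__ == "__main__":
    # Made-up data: three shards with ten docs each.
    shards = {
        "s1": [(f"a{i}", 30.0 - 3 * i) for i in range(10)],
        "s2": [(f"b{i}", 29.0 - 3 * i) for i in range(10)],
        "s3": [(f"c{i}", 28.0 - 3 * i) for i in range(10)],
    }
    candidates = first_pass(shards, start=4, rows=3)
    page = select_page(candidates, start=4, rows=3)
    print(len(candidates), "candidates merged;", second_pass_requests(page))
```

With this flow, the expensive highlighting work is limited to the rows that make the final page, rather than start + rows documents on every shard.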
The fl=* query finishes (in my sample case) in 100 msec, while the same query with fl=id+score finishes in 1200 msec. Here are the two queries:

http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=*&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json

http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=id&fl=score&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json

Thanks
Nawab

On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> I don't think you're reading it correctly. First of all, if you're
> going to be doing deep paging you should be using cursorMark, see:
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>
> Second, it's a two-pass process if you don't use cursorMark. The first
> pass gets the candidate docs from each shard. But all it returns is
> the ID and sort criteria. Then the aggregator node gets the _true_ top
> N after sorting all the lists from each shard and issues a second
> request for _only_ those docs that have made the top N from each sub
> shard, and those should be the only ones highlighted.
>
> Do you have any evidence to the contrary that they're all being
> highlighted? Or are you misinterpreting the log message for the first
> pass?
>
> Best,
> Erick
>
> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
>
> > Hi,
> >
> > In a multi-node Solr installation (without SolrCloud), during a paging
> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200 rows
> > from each shard. If highlighting is ON, then the primary node asks each
> > shard to highlight all 1200 results, which doesn't scale well. Is there a
> > way to break the shard query into two steps, e.g. ask for the 1200 rows
> > and, after sorting the 1200 responses from each shard and finding the
> > final rows to return (1001 to 1200), issue another query to the shards
> > asking for the highlighted response for the relevant docs?
> >
> > Thanks
> > Nawab