Thanks, Erick, for pointing me to a better option; I will explore that. After
your email, I found that if I specify 'fl=*' in the query, it does the right
thing (a two-pass process). However, my queries had 'fl=id+score' (or
sometimes fl=id&fl=score), and in both of these cases I found that the shards
are asked to highlight all the results on the first request (and there is no
second request).

The fl=* query finishes (in my sample case) in 100 msec, while the same
query with fl=id+score takes 1200 msec.
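
As a rough illustration of why the two cases can differ so much in cost, here
is a toy model (not Solr's actual implementation) that counts how many
documents get highlighted when each shard highlights every candidate it
returns versus when only the final merged window is highlighted. The function
names and the random scores are assumptions made up for this sketch; the shard
count and the start/rows values match the sample queries.

```python
import heapq
import random


def shard_top(docs, n):
    """First pass on one shard: return its n best (score, id) pairs."""
    return heapq.nlargest(n, docs)


def docs_to_highlight(shards, start, rows):
    """Return (one_pass, two_pass) counts of documents that get highlighted."""
    want = start + rows
    candidates = [shard_top(docs, want) for docs in shards]

    # One pass: every shard highlights every candidate it returns.
    one_pass = sum(len(c) for c in candidates)

    # Two passes: merge the candidates, keep only the requested window,
    # and highlight just those documents in a second request.
    merged = heapq.nlargest(want, (pair for c in candidates for pair in c))
    window = merged[start:start + rows]
    two_pass = len(window)

    return one_pass, two_pass


random.seed(0)
# Three hypothetical shards with 5000 scored documents each.
shards = [
    [(random.random(), f"s{s}-d{i}") for i in range(5000)]
    for s in range(3)
]
print(docs_to_highlight(shards, start=200, rows=200))  # (1200, 200)
```

With three shards and start=200, rows=200, the one-pass case highlights 1200
documents in total and the two-pass case highlights 200, which is in line with
the kind of gap I am seeing in the timings.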

Here are the two queries:

http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=*&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json


http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=id&fl=score&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
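
For anyone who wants to reproduce the comparison, the two requests differ only
in the fl parameter. A minimal sketch of building both URLs follows; the
helper name build_query is made up for this example, and the host, ports, and
parameter values are simply the ones from the URLs above.

```python
from urllib.parse import urlencode

BASE = "http://solrdev.test.net:8984/solr/filesearch/select"
SHARDS = ",".join(
    f"solrdev.test.net:{port}/solr/filesearch" for port in (8984, 8985, 8986)
)


def build_query(fl_values):
    """Build the select URL, repeating fl once per requested field."""
    params = [("hl", "on")]
    params += [("fl", value) for value in fl_values]
    params += [
        ("start", 200),
        ("rows", 200),
        ("q", "nawab"),
        ("shards", SHARDS),
        ("wt", "json"),
    ]
    # Keep '*', ':', '/' and ',' unescaped so the URLs stay readable.
    return BASE + "?" + urlencode(params, safe="*:/,")


fast_url = build_query(["*"])            # the fl=* case
slow_url = build_query(["id", "score"])  # the fl=id&fl=score case
print(fast_url)
print(slow_url)
```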


Thanks
Nawab




On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> I don't think you're reading it correctly. First of all, if you're
> going to be doing deep paging you should be using cursorMark, see:
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>
> Second, it's a two-pass process if you don't use cursorMark. The first
> pass gets the candidate docs from each shard. But all it returns is
> the ID and sort criteria. Then the aggregator node gets the _true_ top
> N after sorting all the lists from each shard and issues a second
> request for _only_ those docs that have made the top N from each sub
> shard, and those should be the only ones highlighted.
>
> Do you have any evidence to the contrary that they're all being
> highlighted? Or are you misinterpreting the log message for the first
> pass?
>
> Best,
> Erick
>
> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> wrote:
> > Hi,
> >
> > In a multi-node Solr installation (without SolrCloud), during a paging
> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
> > rows from each shard. If highlighting is on, the primary node asks for
> > highlighting of all 1200 results from each shard, which doesn't scale
> > well. Is there a way to break the shard query into two steps, e.g. ask
> > for the 1200 rows and, after sorting the 1200 responses from each shard
> > and finding the final rows to return (1001 to 1200), issue another query
> > to the shards asking for highlighted responses for only the relevant docs?
> >
> >
> >
> > Thanks
> > Nawab
>
