The Solr wiki lists several limitations of distributed searching, one
of which concerns the start parameter.

http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

"Makes it more inefficient to use a high "start" parameter. For example, if
you request start=500000&rows=25 on an index with 500,000+ docs per shard,
this will currently result in 500,000 records getting sent over the network
from the shard to the coordinating Solr instance. If you had a single-shard
index, in contrast, only 25 records would ever get sent over the network."
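
To check that I am reading this correctly, here is a rough Python sketch
of what I understand the wiki to be describing. It is not Solr code: the
names shard_top and distributed_page are made up for illustration, and I
am assuming the coordinator asks every shard for its top start+rows hits
because it cannot know in advance which shard holds them.

```python
import heapq
from itertools import islice


def shard_top(docs, n):
    """Return one shard's best n (score, doc_id) pairs, highest score first."""
    return sorted(docs, reverse=True)[:n]


def distributed_page(shards, start, rows):
    """Simulate a coordinator building one page from several shards.

    Assumption: every shard is asked for start + rows entries, since any
    single shard could hold all of the hits that precede the page.
    """
    per_shard = [shard_top(docs, start + rows) for docs in shards]
    # Count how many entries cross the network to the coordinator.
    transferred = sum(len(hits) for hits in per_shard)
    # Merge the already-sorted shard lists, then skip the first `start` hits.
    merged = heapq.merge(*per_shard, reverse=True)
    page = list(islice(merged, start, start + rows))
    return page, transferred
```

With three shards of 1,000 documents each, start=500 and rows=40 would
move 1,620 entries to the coordinator in order to build a 40-row page,
so the cost grows with start rather than with rows.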

While I may not have a start parameter of 500,000, I could easily have one
of 50,000, and I am concerned about the performance hit I may take when
using such a high start parameter with distributed searching. This would
come up whenever a user issues a search query that results in, say,
50,000+ matches. I may only display 40 matches per web page, but the user
has the ability to "jump" to the end of the results. So a high start
parameter is quite likely, and I know this sort of scenario is common for
a lot of websites. Are there any tricks for avoiding the performance hit
associated with a high start parameter when doing distributed searching?

Thanks,
Ben
