[jira] Commented: (SOLR-659) Explicitly set start and rows per shard for more efficient bulk queries across distributed Solr

Brian Whitman (JIRA) Fri, 25 Jul 2008 07:18:29 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616903#action_12616903
 ]


Brian Whitman commented on SOLR-659:
------------------------------------

Here is an example of a bulk query that uses this patch. Without the patch, 
such bulk queries eventually time out or cause exceptions on the server, 
because too much data is passed back and forth.

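{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// Belongs in a client class that provides query(SolrQuery) and getNumberOfHosts().
public SolrDocumentList blockQuery(SolrQuery q, int blockSize, int maxResults) {
    // A non-positive block size would never advance the loop below.
    if (blockSize <= 0) { throw new IllegalArgumentException("blockSize must be positive"); }
    SolrDocumentList allResults = new SolrDocumentList();
    if (blockSize > maxResults) { blockSize = maxResults; }
    for (int i = 0; i < maxResults; i += blockSize) {
        // Set rows to the most results that could ever come back:
        // blockSize * the number of shards.
        q.setRows(blockSize * getNumberOfHosts());
        // Don't set a start on the main query.
        q.setStart(0);
        // But do set start and rows on the individual shards.
        q.set("shards.start", String.valueOf(i));
        q.set("shards.rows", String.valueOf(blockSize));
        // Perform the query.
        QueryResponse sub = query(q);
        // Append the returned documents (up to blockSize * getNumberOfHosts()
        // of them) to the main result, stopping at the requested limit.
        for (SolrDocument s : sub.getResults()) {
            if (allResults.size() >= maxResults) { break; }
            allResults.add(s);
        }
        // Stop once the limit is reached or the shards have nothing left.
        if (sub.getResults().isEmpty() || allResults.size() >= maxResults) { break; }
    }
    return allResults;
}
{code}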
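
To give a rough sense of why the unpatched approach passes so much data back and forth, here is a small arithmetic sketch. It is not Solr code: the class and method names are made up for illustration, and it assumes every shard always has enough documents to fill each block. It counts how many rows the shards are asked for in total, first under the default behaviour described in the issue (each shard is asked for start+rows rows from offset 0), then when shards.start and shards.rows are set as in blockQuery above.

```java
/** Hypothetical cost model for illustration only; not part of Solr or the patch. */
final class BulkFetchCost {

    /**
     * Default distributed paging: the request for offset `start` asks every
     * shard for (start + blockSize) rows, and yields blockSize documents.
     */
    static long defaultShardRows(int maxResults, int blockSize, int numShards) {
        long total = 0;
        for (int start = 0; start < maxResults; start += blockSize) {
            total += (long) numShards * (start + blockSize);
        }
        return total;
    }

    /**
     * With shards.start / shards.rows: every request asks each shard for
     * exactly blockSize rows, and yields up to blockSize * numShards documents
     * (assumed here to be always the full amount).
     */
    static long patchedShardRows(int maxResults, int blockSize, int numShards) {
        long total = 0;
        long perRequest = (long) blockSize * numShards;
        for (long fetched = 0; fetched < maxResults; fetched += perRequest) {
            total += perRequest;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(defaultShardRows(100_000, 1_000, 4));
        System.out.println(patchedShardRows(100_000, 1_000, 4));
    }
}
```

Under these assumptions, fetching 100,000 documents in blocks of 1,000 across 4 shards asks the shards for 20,200,000 rows in total with default paging, versus 100,000 rows with per-shard start and rows. The first figure grows quadratically with the number of documents, the second linearly.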

> Explicitly set start and rows per shard for more efficient bulk queries 
> across distributed Solr
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-659
>                 URL: https://issues.apache.org/jira/browse/SOLR-659
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Brian Whitman
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: shards.start_rows.patch
>
>
> The default behavior of setting start and rows on distributed Solr (SOLR-303) 
> is to set start to 0 across all shards and set rows to start+rows on each 
> shard. This ensures all results are returned for any arbitrary start and rows 
> setting. During "bulk queries" (where start is incrementally increased 
> and rows is kept constant), however, the client needs finer control of the 
> per-shard start and rows parameters, because retrieving many thousands of 
> documents becomes intractable as start grows.
> Attaching a patch that creates &shards.start and &shards.rows parameters. If 
> used, they override the logic that sets rows to start+rows per shard, and each 
> shard gets the exact start and rows given in shards.start and shards.rows. The 
> client will receive up to shards.rows * nShards results and should set rows 
> accordingly. This makes bulk queries across distributed Solr possible.
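
The parameter mapping in the description can be sketched as follows. This is an illustrative model only, not the code in shards.start_rows.patch: the ShardWindow class and its methods are hypothetical names, and treating the two parameters as independently optional (each falling back to its default when unset) is an assumption that the description does not spell out.

```java
/** Hypothetical model of the per-shard window; not taken from the patch. */
final class ShardWindow {
    final int start;
    final int rows;

    ShardWindow(int start, int rows) {
        this.start = start;
        this.rows = rows;
    }

    /**
     * Computes the start/rows each shard is asked for.
     * shardsStart and shardsRows stand for the shards.start and shards.rows
     * request parameters; null means the parameter was not supplied.
     */
    static ShardWindow forShard(int start, int rows, Integer shardsStart, Integer shardsRows) {
        // Default: every shard starts at 0 and returns start+rows documents,
        // so the merged result can serve any window.
        int shardStart = (shardsStart != null) ? shardsStart : 0;
        int shardRows = (shardsRows != null) ? shardsRows : start + rows;
        return new ShardWindow(shardStart, shardRows);
    }

    /** Upper bound on merged results, which the client should pass as rows. */
    static int maxMergedRows(int shardsRows, int numShards) {
        return shardsRows * numShards;
    }

    public static void main(String[] args) {
        ShardWindow byDefault = forShard(5000, 100, null, null);
        ShardWindow patched = forShard(0, maxMergedRows(100, 4), 5000, 100);
        System.out.println(byDefault.start + " " + byDefault.rows);
        System.out.println(patched.start + " " + patched.rows);
    }
}
```

In this model a default request for start=5000, rows=100 asks each shard for 5100 rows from offset 0, whereas a request with shards.start=5000 and shards.rows=100 asks each shard for only 100 rows from offset 5000, and the client sets rows to 400 for 4 shards.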

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
