[ https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616903#action_12616903 ]
Brian Whitman commented on SOLR-659:
------------------------------------

An example of a bulk query using this patch. Without this patch, such bulk queries will eventually time out or cause exceptions in the server, as too much data is passed back and forth.

{code:java}
public SolrDocumentList blockQuery(SolrQuery q, int blockSize, int maxResults) {
  SolrDocumentList allResults = new SolrDocumentList();
  if (blockSize > maxResults) {
    blockSize = maxResults;
  }
  for (int i = 0; i < maxResults; i += blockSize) {
    // Set rows of this query to the most results that could ever come back:
    // blockSize * the number of shards
    q.setRows(blockSize * getNumberOfHosts());
    // Don't set a start on the main query
    q.setStart(0);
    // But do set start and rows on the individual shards.
    q.set("shards.start", String.valueOf(i));
    q.set("shards.rows", String.valueOf(blockSize));
    // Perform the query.
    QueryResponse sub = query(q);
    // Append each returned document (up to blockSize * getNumberOfHosts() of them)
    // to the main result
    for (SolrDocument s : sub.getResults()) {
      allResults.add(s);
      // Break once we've reached our requested limit
      if (allResults.size() >= maxResults) {
        break;
      }
    }
    if (allResults.size() >= maxResults) {
      break;
    }
  }
  return allResults;
}
{code}

> Explicitly set start and rows per shard for more efficient bulk queries across distributed Solr
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-659
>                 URL: https://issues.apache.org/jira/browse/SOLR-659
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Brian Whitman
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: shards.start_rows.patch
>
>
> The default behavior of setting start and rows on distributed solr (SOLR-303)
> is to set start at 0 across all shards and set rows to start+rows across each
> shard.
> This ensures all results are returned for any arbitrary start and rows
> setting, but during "bulk queries" (where start is incrementally increased
> and rows is kept consistent) the client would need finer control of the
> per-shard start and rows parameter as retrieving many thousands of documents
> becomes intractable as start grows higher.
> Attaching a patch that creates a &shards.start and &shards.rows parameter. If
> used, the logic that sets rows to start+rows per shard is overridden and each
> shard gets the exact start and rows set in shards.start and shards.rows. The
> client will receive up to shards.rows * nShards results and should set rows
> accordingly. This makes bulk queries across distributed solr possible.
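
To make the arithmetic in the description concrete, here is a minimal sketch of how the per-shard window could be derived with and without the new parameters. It is not the code from the attached patch: the class ShardPaging, the Window holder, and both method names are hypothetical, and it assumes that shards.start and shards.rows each fall back to the default independently when not supplied.

```java
class ShardPaging {
    /** The start/rows pair sent to each shard (hypothetical holder type). */
    static final class Window {
        final int start;
        final int rows;

        Window(int start, int rows) {
            this.start = start;
            this.rows = rows;
        }
    }

    /**
     * Computes the window each shard is asked for. A null shardsStart or
     * shardsRows means that parameter was not supplied, in which case the
     * default described in the issue applies: start 0, rows = start + rows.
     * Independent fallback per parameter is an assumption of this sketch.
     */
    static Window perShardWindow(int start, int rows, Integer shardsStart, Integer shardsRows) {
        int shardStart = (shardsStart != null) ? shardsStart : 0;
        int shardRows = (shardsRows != null) ? shardsRows : start + rows;
        return new Window(shardStart, shardRows);
    }

    /** Upper bound on the documents the shards send back for one request. */
    static long maxDocsTransferred(Window w, int nShards) {
        return (long) w.rows * nShards;
    }

    public static void main(String[] args) {
        int nShards = 4;

        // Default behavior: paging to start=100000 with rows=100.
        Window def = perShardWindow(100000, 100, null, null);
        System.out.println("default: start=" + def.start + " rows=" + def.rows
                + " maxDocs=" + maxDocsTransferred(def, nShards));

        // Bulk behavior: main start=0, rows=blockSize*nShards, explicit shard window.
        Window bulk = perShardWindow(0, 100 * nShards, 100000, 100);
        System.out.println("bulk:    start=" + bulk.start + " rows=" + bulk.rows
                + " maxDocs=" + maxDocsTransferred(bulk, nShards));
    }
}
```

With four shards, the default window at start=100000 asks every shard for 100100 rows (up to 400400 documents in total), while the explicit window asks each shard for 100 rows (up to 400 in total), which is why the client sets rows to shards.rows * nShards.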