I suppose that /get is the query by id API. I wonder if it's reasonable to expect it to be smart about shard routing in SolrCloud usage.
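
For anyone who wants to compare the two handlers side by side, here is a rough Python sketch that only builds the request URLs: one for /get and one for the equivalent /select query with the ID in quotes and an optional debug=all. The host, port, collection name and document ID are made up for illustration, and I have not verified here whether /get forwards straight to the owning shard, so treat it as a starting point for timing the two, not as an answer.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only; none of these come from the thread.
BASE_URL = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def realtime_get_url(doc_id: str) -> str:
    """Build a URL for the /get handler, which looks a document up by its id."""
    return f"{BASE_URL}/{COLLECTION}/get?" + urlencode({"id": doc_id})


def select_by_id_url(doc_id: str, debug: bool = False) -> str:
    """Build the equivalent /select URL, with the id wrapped in quotes."""
    # Escape backslashes and double quotes so the id stays one quoted term.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = [("q", f'id:"{escaped}"')]
    if debug:
        # debug=all adds the "timing" section mentioned in the quoted thread.
        params.append(("debug", "all"))
    return f"{BASE_URL}/{COLLECTION}/select?" + urlencode(params)


if __name__ == "__main__":
    print(realtime_get_url("doc-42"))
    print(select_by_id_url("doc-42", debug=True))
```

Fetching both URLs against the real collection and comparing response times would show whether switching handlers makes any difference for this workload.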

On Thursday, January 14, 2016, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> Stupid thought/question. Is there a query by id API that understands
> SolrCloud routing and can simply fwd the query to the shard that would hold
> said document? Barring that, can one use SolrJ's routing brains to see what
> shard a given id would be routed to and only query that shard?
>
> -Doug
>
> On Thursday, January 14, 2016, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
>> Add &debug=all to your query to see where the time is spent in the "timing"
>> section to see which Solr search component is consuming the time.
>>
>> You may also have to add &debug=track to get the shard-specific info.
>>
>> In theory, 19 of the shards should return nothing and the 20th will return
>> a single document.
>>
>> Maybe one of the shard nodes is having trouble and takes way too long to do
>> essentially nothing.
>>
>> Does the document ID have any special characters in it? If so, be sure to
>> escape them or put the ID in quotes, otherwise some piece of the ID may
>> match lots of documents, although even that should not be a big problem.
>>
>> And make sure the ID field is string or numeric, not tokenized text.
>>
>> -- Jack Krupansky
>>
>> On Thu, Jan 14, 2016 at 7:53 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>> > On 1/14/2016 5:20 PM, Shivaji Dutta wrote:
>> > > I am working with a customer that has about a billion documents on 20
>> > > shards. The documents are extremely small about 100 characters each.
>> > > The insert rate is pretty good, but they are trying to fetch the
>> > > document by using SolrJ SolrQuery
>> > >
>> > > Solr Query is taking about 1 min to return.
>> > >
>> > > The query is very simple
>> > > id:<documentid>
>> > > Note the content of the document is just the documentid.
>> > >
>> > > Request for Information
>> > >
>> > > A) I am looking for some information as how I could go about tuning the
>> > > query.
>> > > B) An alternate approach that I am thinking of is to use the "/get"
>> > > request handler
>> > > Is this going to be faster than "/select"
>> > > C) I am looking at the debugQuery option, but I am unsure how to
>> > > interpret this. I saw an slide share which talked about
>> > > "http://explain.solr.pl/help", but it only supports older versions of
>> > > solr.
>> >
>> > I have no idea whether /get would be faster. You'd need to try it.
>> >
>> > Can you provide the SolrJ code that you are using to do the query?
>> > Another useful item would be the entire entry from the Solr logfile for
>> > this query. There will probably be multiple log entries for one query,
>> > usually the relevant log entry is the last one in the series. I may
>> > need the schema, but we'll decide that later.
>> >
>> > Are all 20 shards on the same server, or have you got them spread out
>> > across multiple machines? What is the replicationFactor on the
>> > collection? If there are multiple machines, how many shards live on
>> > each machine, and how many machines do you have total? Do you happen to
>> > know how large the Lucene index is for each of these shards? How much
>> > total memory does each server have, and how large is the Java heap? Is
>> > there software other than Solr running on the machine(s)?
>> >
>> > I am suspecting that you don't have enough memory for the operating
>> > system to effectively cache your index. Good performance for a billion
>> > documents is going to require a lot of memory and probably a lot of
>> > servers.
>> >
>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > Thanks,
>> > Shawn
>
> --
> *Doug Turnbull* | Search Relevance Consultant | OpenSource Connections
> <http://opensourceconnections.com>, LLC | 240.476.9983
> Author: Relevant Search <http://manning.com/turnbull>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.

--
*Doug Turnbull* | Search Relevance Consultant | OpenSource Connections
<http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>