Stupid thought/question: is there a query-by-id API that understands
SolrCloud routing and can simply forward the query to the shard that would
hold said document? Barring that, can one use SolrJ's routing brains to see
which shard a given id would be routed to and only query that shard?
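
To make the question concrete, here is a rough sketch of the two request
shapes I have in mind. As far as I know, the real-time get handler ("/get")
looks up the target shard from the id in SolrCloud, and the _route_
parameter on "/select" limits a query to the shard that owns the given
route key. Both assume the default compositeId router. The base URL and
collection name are made up for illustration, and the class and method
names are mine, not part of SolrJ. I believe SolrJ's SolrClient.getById
issues the same "/get" request, but I have not checked the details.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RoutedIdLookup {
    // Hypothetical base URL and collection name; replace with real values.
    static final String BASE = "http://localhost:8983/solr/mycollection";

    // URL-encode a single parameter value.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Real-time get by id: /get?id=<id>
    static String realTimeGetUrl(String id) {
        return BASE + "/get?id=" + enc(id);
    }

    // /select restricted by _route_, so only the shard that the id hashes
    // to should be queried (assumes the compositeId router).
    static String routedSelectUrl(String id) {
        // Quote the id so special characters are not parsed as query syntax.
        String quoted = "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return BASE + "/select?q=" + enc("id:" + quoted) + "&_route_=" + enc(id);
    }

    public static void main(String[] args) {
        System.out.println(realTimeGetUrl("doc-42"));
        System.out.println(routedSelectUrl("doc-42"));
    }
}
```

Comparing the response times of these two URLs against the plain
id:<documentid> query would show whether the fan-out to all 20 shards is
the slow part.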

-Doug

On Thursday, January 14, 2016, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Add &debug=all to your query and look at the "timing" section of the
> response to see which Solr search component is consuming the time.
>
> You may also have to add &debug=track to get the shard-specific info.
>
> In theory, 19 of the shards should return nothing and the 20th will return
> a single document.
>
> Maybe one of the shard nodes is having trouble and takes way too long to do
> essentially nothing.
>
> Does the document ID have any special characters in it? If so, be sure to
> escape them or put the ID in quotes, otherwise some piece of the ID may
> match lots of documents, although even that should not be a big problem.
>
> And make sure the ID field is string or numeric, not tokenized text.
>
>
> -- Jack Krupansky
>
> On Thu, Jan 14, 2016 at 7:53 PM, Shawn Heisey <apa...@elyograg.org>
> wrote:
>
> > On 1/14/2016 5:20 PM, Shivaji Dutta wrote:
> > > I am working with a customer that has about a billion documents on 20
> > > shards. The documents are extremely small, about 100 characters each.
> > > The insert rate is pretty good, but they are trying to fetch the
> > > document by using SolrJ SolrQuery.
> > >
> > > Solr Query is taking about 1 min to return.
> > >
> > > The query is very simple
> > > id:<documentid>
> > > Note the content of the document is just the documentid.
> > >
> > > Request for Information
> > >
> > > A) I am looking for some information as to how I could go about tuning
> > > the query.
> > > B) An alternate approach that I am thinking of is to use the "/get"
> > > request handler.
> > > Is this going to be faster than "/select"?
> > > C) I am looking at the debugQuery option, but I am unsure how to
> > > interpret this. I saw a SlideShare which talked about
> > > "http://explain.solr.pl/help", but it only supports older versions of
> > > Solr.
> >
> > I have no idea whether /get would be faster.  You'd need to try it.
> >
> > Can you provide the SolrJ code that you are using to do the query?
> > Another useful item would be the entire entry from the Solr logfile for
> > this query.  There will probably be multiple log entries for one query;
> > usually the relevant log entry is the last one in the series.  I may
> > need the schema, but we'll decide that later.
> >
> > Are all 20 shards on the same server, or have you got them spread out
> > across multiple machines?  What is the replicationFactor on the
> > collection?  If there are multiple machines, how many shards live on
> > each machine, and how many machines do you have total?  Do you happen to
> > know how large the Lucene index is for each of these shards?  How much
> > total memory does each server have, and how large is the Java heap?  Is
> > there software other than Solr running on the machine(s)?
> >
> > I suspect that you don't have enough memory for the operating
> > system to effectively cache your index.  Good performance for a billion
> > documents is going to require a lot of memory and probably a lot of
> > servers.
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Thanks,
> > Shawn
> >
> >
>

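The advice quoted above about putting the ID in quotes and adding the debug
parameters can be sketched as follows. This is only an illustration: the
class and method names are invented, and the helper returns just the
parameter string to append to a "/select" URL. The debug=all and
debug=track values are the ones Jack mentions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class IdQueryDebug {
    // Wrap the id in double quotes, escaping backslashes and embedded
    // quotes, so characters like ':' or spaces are not read as query syntax.
    static String quoteId(String id) {
        return "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build URL-encoded parameters for an id lookup with timing and
    // per-shard tracking output enabled.
    static String debugParams(String id) {
        String q = URLEncoder.encode("id:" + quoteId(id), StandardCharsets.UTF_8);
        return "q=" + q + "&debug=all&debug=track";
    }

    public static void main(String[] args) {
        System.out.println(debugParams("abc:123"));
    }
}
```
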

-- 
Doug Turnbull | Search Relevance Consultant | OpenSource Connections
<http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>
