On 1/14/2016 5:20 PM, Shivaji Dutta wrote:
> I am working with a customer that has about a billion documents on 20 shards.
> The documents are extremely small, about 100 characters each.
> The insert rate is pretty good, but they are trying to fetch documents
> using the SolrJ SolrQuery.
>
> The Solr query is taking about 1 minute to return.
>
> The query is very simple:
> id:<documentid>
> Note that the content of the document is just the documentid.
>
> Request for information:
>
> A) I am looking for some information on how I could go about tuning the query.
> B) An alternate approach that I am considering is to use the "/get" request
> handler. Is this going to be faster than "/select"?
> C) I am looking at the debugQuery option, but I am unsure how to interpret
> its output. I saw a SlideShare presentation which talked about
> "http://explain.solr.pl/help", but it only supports older versions of Solr.
I have no idea whether /get would be faster.  You'd need to try it.

Can you provide the SolrJ code that you are using to do the query?  Another
useful item would be the entire entry from the Solr logfile for this query.
There will probably be multiple log entries for one query; usually the
relevant entry is the last one in the series.  I may need the schema, but
we'll decide that later.

Are all 20 shards on the same server, or are they spread out across multiple
machines?  What is the replicationFactor on the collection?  If there are
multiple machines, how many shards live on each machine, and how many
machines do you have in total?  Do you happen to know how large the Lucene
index is for each of these shards?  How much total memory does each server
have, and how large is the Java heap?  Is there software other than Solr
running on the machine(s)?

I suspect that you don't have enough memory for the operating system to
effectively cache your index.  Good performance for a billion documents is
going to require a lot of memory and probably a lot of servers.

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
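To make the memory question concrete, here is a rough back-of-envelope sketch of how large the on-disk index might be and therefore how much RAM the OS page cache would need to hold it. The 300-bytes-per-document figure is purely an assumption for illustration (raw source is ~100 characters, plus Lucene's per-document overhead for stored fields, term dictionary, and postings); the real number depends on the schema and must be measured by looking at the index directories on disk.

```java
// Back-of-envelope index sizing sketch.  Numbers taken from the thread
// (1 billion docs, 20 shards); bytesPerDoc is an ASSUMPTION, not a measurement.
public class IndexSizing {
    public static void main(String[] args) {
        long docs = 1_000_000_000L;   // from the thread: about a billion documents
        int shards = 20;              // from the thread: 20 shards
        long bytesPerDoc = 300L;      // ASSUMPTION: ~100 chars of source + Lucene overhead

        long totalIndexBytes = docs * bytesPerDoc;
        long perShardBytes = totalIndexBytes / shards;

        // Convert to whole GiB (integer division) for a readable estimate.
        System.out.println("total index ~" + (totalIndexBytes >> 30) + " GiB");
        System.out.println("per shard   ~" + (perShardBytes >> 30) + " GiB");
    }
}
```

Under that assumption the cluster would need on the order of 280 GiB of free RAM (beyond the Java heaps) spread across the machines for the OS to cache the whole index; if the 20 shards are packed onto one or two servers with far less memory than that, every id lookup can turn into disk reads, which would explain one-minute query times.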