Hi Jon,
Can you share the stack traces for the exceptions? Also, a 2.3GB index for only 925K documents with non-stored fields seems large to me. How many unique terms do you have?
Also, Lucene ships with a standalone program called CheckIndex; can you point that at your index and see what it says?
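Something like this, run against one shard's index directory (adjust the classpath to point at the lucene-core jar that ships with your Solr; the paths here are just examples):

  java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/shard1/index

Run it without -fix first; -fix will drop any segments it can't read, which loses documents.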
What would happen if you didn't access the index over the network and
instead used local disk? Is that an option? Or am I not
understanding your setup correctly?
Finally, what changed this week? And what is a "paper backlog" and
how does it factor in?
Thanks,
Grant
On Apr 24, 2009, at 12:25 AM, Jon Bodner wrote:
Hi all,
I am trying to solve a serious performance problem with our Solr search index. We're running under Solr 1.3, and we've sharded our index into 4 shards. Index data is stored on a network mount that is accessed over Fibre Channel. Each document's text is indexed, but not stored. Each day, roughly 10K-20K new documents are added. After a document is submitted, it is compared, sentence by sentence, against every document we have indexed in its category. It's a requirement that we keep our index as up-to-date as possible, so we reload our indexes once a minute in order to miss as few matches as possible. We are not expecting to find matches, so our document cache hit rates are abysmal. We also don't expect many repeated sentences across documents, so our query cache hit rates are also practically zero.
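For reference, the "reload" is just an empty commit sent to each shard, roughly along these lines (the host name here is a placeholder):

  curl http://shard1:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>'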
After running fine for over 9 months, the system broke down this week. Queries per second are around 17 to 18, and our paper backlog is well north of 14,000. The number of papers in the index has hit 3.7 million, and each shard is 2.3GB in size (roughly 925K papers per shard).
In order to increase throughput, we tried to stand up additional read-only Solr instances pointed at the shared indexes, but got I/O errors from the secondary Solr instances when the reload time came. We tried switching the locking mechanism from single to simple, but the I/O errors continued.
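The lock setting we changed is the one in solrconfig.xml's <mainIndex> block; ours now looks roughly like this:

  <mainIndex>
    <lockType>simple</lockType>
    ...
  </mainIndex>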
We're running on 64-bit Linux with a 64-bit JVM (Java 1.6.something), with 4GB of RAM assigned to each Solr instance.
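The 4GB is just the JVM heap, set with the usual flags when each instance starts, e.g. (using the example Jetty launcher purely as an illustration):

  java -Xms4g -Xmx4g -jar start.jar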
Has anyone else seen a problem like this before? Can anyone suggest any solutions? Will Solr 1.4 help (and is Solr 1.4 ready for production use)? Any answers would be greatly appreciated.
Thanks,
Jon
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search