Hi Jon,

Can you share the stack traces for the exceptions? Also, 2.3GB seems surprisingly large to me for a shard of only 925K documents with no stored fields. How many unique terms do you have?

Also, Lucene ships with a standalone program called CheckIndex. Can you point it at your index and see what it says?
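In case it helps, CheckIndex can be run from the command line along these lines (the jar name and index path below are examples only; adjust them for your installation):

```shell
# Run Lucene's index checker against one shard.
# lucene-core-2.4.0.jar and the index path are placeholders for your setup.
java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /mnt/search/shard1/index
```

By default it only reports problems; don't pass -fix against a live index, since that can drop corrupt segments.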

What would happen if you didn't access the index over the network and instead used local disk? Is that an option? Or am I not understanding your setup correctly?

Finally, what changed this week? And what is a "paper backlog" and how does it factor in?

Thanks,
Grant

On Apr 24, 2009, at 12:25 AM, Jon Bodner wrote:


Hi all,

I am trying to solve a serious performance problem with our Solr search
index. We're running Solr 1.3, with the index split into 4 shards. Index
data is stored on a network mount that is accessed over Fibre Channel.
Each document's text is indexed, but not stored. Each day, roughly 10K -
20K new documents are added. After a document is submitted, it is
compared, sentence by sentence, against every document we have indexed in
its category.

It's a requirement that we keep our index as up-to-date as possible, so we
reload our indexes once a minute in order to miss as few matches as
possible. Because we are not expecting to find matches, our document cache
hit rates are abysmal. We also don't expect many repeated sentences across
documents, so query cache hit rates are practically zero.
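For context, the per-minute reload is essentially the standard Lucene reopen pattern (a rough sketch against the IndexReader.reopen() API in Lucene 2.4, which Solr 1.3 bundles; this is not our actual code):

```java
import org.apache.lucene.index.IndexReader;

public class Reloader {
    // Sketch of a once-a-minute reload: refresh the reader if the index changed.
    public static IndexReader reload(IndexReader reader) throws Exception {
        IndexReader newReader = reader.reopen(); // same instance if nothing changed
        if (newReader != reader) {
            reader.close();      // release the old reader's files
            reader = newReader;  // swap in the refreshed view of the index
        }
        return reader;
    }
}
```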

After running fine for over 9 months, the system broke down this week.
Queries per second are around 17 to 18, and our paper backlog is well
north of 14,000. The number of papers in the index has hit 3.7 million,
and each shard is 2.3GB in size (roughly 925K papers per shard).

In order to increase throughput, we tried to stand up additional read-only
Solr instances pointed at the shared indexes, but got I/O errors from the
secondary Solr instances when the reload time came. We tried switching the
locking mechanism from single to simple, but the I/O errors continued.
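For reference, the lock change was made in solrconfig.xml along these lines (element names as in the stock Solr 1.3 config; the surrounding settings are elided):

```xml
<!-- In solrconfig.xml; "single" was the previous value. -->
<indexDefaults>
  ...
  <lockType>simple</lockType>
</indexDefaults>
```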

We're running on 64-bit Linux with a 64-bit JVM (Java 1.6.something), with
4GB of RAM assigned to each Solr instance.

Has anyone else seen a problem like this before? Can anyone suggest any solutions? Will Solr 1.4 help (and is Solr 1.4 ready for production use)?

Any answers would be greatly appreciated.

Thanks,

Jon

--
View this message in context: 
http://www.nabble.com/Solr-Performance-bottleneck-tp23209595p23209595.html
Sent from the Solr - User mailing list archive at Nabble.com.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
