It's not that EC2 instances have slow disks; it's that there is no quota system to guarantee you a given amount of throughput. I've benchmarked throughput ranging from 1x to 3x on the same instance type at different times. That is, a threefold variation in disk speed.
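For reference, a crude way to see that variance yourself: a sequential-write probe with dd, run several times at different hours. (File name and size are arbitrary; Lucene's access pattern is more random-read, so this only gives a rough feel for the run-to-run spread.)

```shell
# Write 64 MB and flush to disk, reporting throughput; repeat later
# and compare -- on EC2 the number can move a lot between runs.
dd if=/dev/zero of=ddtest.bin bs=1M count=64 conv=fdatasync
rm -f ddtest.bin
```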

Filter queries are only slow once: the first run builds and caches the filter, so subsequent runs are very fast. You should use one of the Trie field types (TrieIntField, for example) for the prices. The Trie types can be tuned for fast range queries, though I don't know the tuning details.
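A sketch of how that might look in schema.xml, assuming an integer price field as suggested above (field and type names are illustrative; precisionStep is the tuning knob, trading index size for range-query speed):

```xml
<!-- Hypothetical schema.xml fragment: a Trie-based price field.
     A smaller precisionStep indexes more terms per value, which
     makes range queries faster at the cost of a larger index. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<field name="price" type="tint" indexed="true" stored="true"/>
```

With that in place, a range filter such as fq=price:[10 TO 100] is cached in the filterCache after its first use, which is why only the first query is slow.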

Post your schema and solrconfig; we might be able to help you.

Lance

Toke Eskildsen wrote:
On Mon, 2010-11-15 at 06:35 +0100, lu.rongbin wrote:
In addition, my index has only two stored fields, id and price; the other
fields are indexed. I increased the document and query caches. The EC2
m2.4xlarge instance has 8 cores and 68G of memory. The total index size is about 100G.
Looking at http://aws.amazon.com/ec2/instance-types/ I can see that
Amazon recommends using "EBS to get improved storage I/O performance for
disk bound applications". As Lucene/Solr is very often I/O bound (or,
more precisely, random-access I/O bound), you might consider the EBS
option.

I found this article that looks very relevant:
http://www.coreyhulen.org/?p=326
It is about Cassandra (a database), but I'm guessing that the I/O
pattern is fairly similar to Lucene/Solr's, with a lot of random-access reads.

Extrapolating wildly, it would seem that disk I/O latency is a problem
with Amazon's cloud, at least compared with the obvious choice of using
an SSD in a local machine. If this holds true, some things you could try
would be better warming of your searches, holding (part of) your index
in RAM, switching to EBS or ... moving away from the cloud.
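For the warming option, a minimal sketch of a solrconfig.xml listener (the queries shown are placeholders; you would substitute your own common filters):

```xml
<!-- Hypothetical solrconfig.xml fragment: run representative queries
     against each new searcher so its caches are populated before the
     first user query arrives. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">price:[10 TO 100]</str>
    </lst>
  </arr>
</listener>
```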


All this assumes that it really is I/O that is your problem. Have
you looked at CPU load vs. I/O wait while issuing a batch of queries?
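One quick way to check that on Linux (iostat comes from the sysstat package; the /proc/stat fallback needs nothing extra):

```shell
# While replaying a batch of queries, watch I/O wait vs. CPU load.
# High %iowait with mostly idle CPUs suggests the disk is the bottleneck.
#   iostat -x 5      # per-device utilization and wait times, if installed
# Fallback: field 6 of the "cpu" line in /proc/stat is cumulative
# iowait ticks; sample it twice during the query run to see how fast it grows.
awk '/^cpu /{print "cumulative iowait ticks:", $6}' /proc/stat
```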


Disclaimer: I have no experience with Amazon's cloud service.
