It's not that EC2 instances have slow disks; it's that there is no quota system to guarantee you a given amount of throughput. I've benchmarked throughput ranging from 1x to 3x on the same instance type at different times. That is, a threefold variation in disk speed.
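For reference, a crude way to see that variance yourself: a sequential-write probe with dd, run several times at different hours. (File name and size are arbitrary; Lucene's access pattern is more random-read, so this only gives a rough feel for the run-to-run spread.)

```shell
# Write 64 MB and flush to disk, reporting throughput; repeat later
# and compare -- on EC2 the number can move a lot between runs.
dd if=/dev/zero of=ddtest.bin bs=1M count=64 conv=fdatasync
rm -f ddtest.bin
```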

Filter queries are only slow once: the first run builds and caches the filter, so subsequent runs are very fast. You should use one of the Trie field types (TrieIntField, for example) for the prices. The Trie types can be tuned for fast range queries, though I don't know the tuning details.
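A sketch of how that might look in schema.xml, assuming an integer price field as suggested above (field and type names are illustrative; precisionStep is the tuning knob, trading index size for range-query speed):

```xml
<!-- Hypothetical schema.xml fragment: a Trie-based price field.
     A smaller precisionStep indexes more terms per value, which
     makes range queries faster at the cost of a larger index. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<field name="price" type="tint" indexed="true" stored="true"/>
```

With that in place, a range filter such as fq=price:[10 TO 100] is cached in the filterCache after its first use, which is why only the first query is slow.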

Post your schema and solrconfig; we might be able to help you.

Lance

Toke Eskildsen wrote:
On Mon, 2010-11-15 at 06:35 +0100, lu.rongbin wrote:
In addition, my index has only two stored fields, id and price; the other
fields are indexed. I increased the document and query caches. The EC2
m2.4xlarge instance has 8 cores and 68G of memory. The total index size is about 100G.
Looking at http://aws.amazon.com/ec2/instance-types/ I can see that
Amazon recommends using "EBS to get improved storage I/O performance for
disk bound applications". As Lucene/Solr is very often I/O bound (or,
more precisely, random-access I/O bound), you might consider the EBS
option.

I found this article that looks very relevant:
http://www.coreyhulen.org/?p=326
It is about Cassandra (a database), but I'm guessing that the I/O
pattern is fairly similar to Lucene/Solr's, with a lot of random-access reads.

Extrapolating wildly, it would seem that disk I/O latency is a problem
with Amazon's cloud, at least compared with the obvious choice of using
an SSD in a local machine. If this holds true, some things you could try
would be better warming of your searches, holding (part of) your index
in RAM, switching to EBS or ... moving away from the cloud.
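For the warming option, a minimal sketch of a solrconfig.xml listener (the queries shown are placeholders; you would substitute your own common filters):

```xml
<!-- Hypothetical solrconfig.xml fragment: run representative queries
     against each new searcher so its caches are populated before the
     first user query arrives. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">price:[10 TO 100]</str>
    </lst>
  </arr>
</listener>
```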


All this assumes that it really is I/O that is your problem. Have
you looked at CPU load vs. I/O wait while issuing a batch of queries?
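One quick way to check that on Linux (iostat comes from the sysstat package; the /proc/stat fallback needs nothing extra):

```shell
# While replaying a batch of queries, watch I/O wait vs. CPU load.
# High %iowait with mostly idle CPUs suggests the disk is the bottleneck.
#   iostat -x 5      # per-device utilization and wait times, if installed
# Fallback: field 6 of the "cpu" line in /proc/stat is cumulative
# iowait ticks; sample it twice during the query run to see how fast it grows.
awk '/^cpu /{print "cumulative iowait ticks:", $6}' /proc/stat
```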


Disclaimer: I have no experience with Amazon's cloud service.
