I see the CPU working very hard, and at the same time I see 2 MB/sec of disk
access during those 15 seconds. I am not running it at this instant, but it
seemed to me that there were more CPU cycles available, so unless it simply
can't be multithreaded any further, I'd say it's more I/O related.

I'm going to set up SolrCloud and shard across the 2 servers I have
available for now. It's not an optimal setup, since we're in a
private beta period, but maybe it'll improve things (I've got 2 servers with
2x 4TB disks in RAID-0, shared with the web servers).

I'll work on improving I/O performance, maybe add more shards, and see
how things go. I'll also be able to increase the RAM in a couple of weeks.

Are there any settings I should consider for improving cache
performance once I can give it, say, 10 GB of RAM?

Thanks, this has been tremendously helpful.


Hi David and Jan,

I wrote the blog post, and David, you are right: the problem we had was with
phrase queries, because our positions lists are so huge. Boolean
queries don't need to read the positions lists. I think you need to
determine whether you are CPU bound or I/O bound. It is possible that
you are I/O bound and that reading the term frequency postings for 90 million
docs is taking a long time. In that case, more memory in the machine (but
not dedicated to Solr) might help, because Solr relies on the OS disk cache
for caching the postings lists. You would still need to do some cache warming
with your most common terms.

On the other hand, as Jan pointed out, you may be CPU bound because Solr
doesn't have early termination and has to rank all 90 million docs in order
to show the top 10 or 25.
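To illustrate why the cost grows with the number of matching documents rather than with the number of results shown, here is a minimal top-k sketch. With no early termination, every (doc_id, score) hit has to pass through a bounded min-heap, so the work is roughly proportional to N log k for N hits, even when k is only 10 or 25. This is a simplified stand-in, not Lucene's actual collector code, and the scores in the usage example are made up.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k(scored_hits: Iterable[Tuple[int, float]], k: int) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (doc_id, score) pairs, best first.

    Every hit is visited: without early termination there is no way to know
    that an unseen document cannot beat the current k-th best score.
    """
    heap: List[Tuple[float, int]] = []  # min-heap keyed on score
    for doc_id, score in scored_hits:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Evict the current weakest of the top k.
            heapq.heapreplace(heap, (score, doc_id))
    # Sort best-first for display.
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
```

With 90 million hits the loop body runs 90 million times regardless of k, which is why limiting the number of matching documents (for example with AND queries) helps more than asking for fewer rows.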

Did you try the OR search to see if your CPU is at 100%?


On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl <jan....@cominvent.com> wrote:

> Hi
> There might not be a final cure with more RAM if you are CPU bound.
> Scoring 90M docs is a fair amount of work. Can you check what's going on
> during those 15 seconds? Is your CPU at 100%? Try a (foo OR bar OR baz)
> search that generates more than 100 million hits and see if that is slow
> too, even if you don't use frequent words.
> I'm sure you can find other frequent terms in your corpus that
> display similar behaviour, words that are even more frequent than
> "book". Are you using "AND" as the default operator? You will benefit from
> limiting the number of results as much as possible.
> The real solution is to shard across N servers until you
> reach the desired performance for your indexing/querying load.
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
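One way to run the OR experiment Jan describes is to time a request against the standard /select handler and compare it with the slow query while watching CPU usage. The sketch below only builds the request URL. The host, port and core name (localhost:8983, collection1) are assumptions to adjust, and foo/bar/baz are placeholders for terms from your own corpus. q, q.op, rows, fl and wt are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, query: str, rows: int = 10, q_op: str = "OR") -> str:
    """Build a Solr /select URL for a timing experiment.

    base_url is assumed to point at a core, e.g.
    "http://localhost:8983/solr/collection1" (hypothetical host/core).
    """
    params = {
        "q": query,         # the query under test
        "q.op": q_op,       # default operator for unqualified clauses
        "rows": rows,       # number of results returned (not scored!)
        "fl": "id,score",   # keep the response small
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr/collection1",
                           "(foo OR bar OR baz)", rows=25))
```

Fetching the URL (for instance with `urllib.request.urlopen`) and comparing the `QTime` in the response header against the slow query, while watching `top` on the server, should show whether the CPU saturates.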

