"book" by itself returns in 4s (non-optimized disk IO), running it a second
time returned 0s, so I think I can presume that the query was not cached the
first time. This system has been up for a week, so it's warm.

I'm going to give your article a good long read, thanks for that.   

I guess good fast disks/SSDs and sharding should also improve on the base 4
sec query time. How _does_ Google get their query times down to 0.35s
anyway? I presume their indexes are larger than my 150G index. :)

I still am a bit worried about what will happen when my index is 500GB
(it'll happen soon enough), even with sharding... well... I'd just need a
lot of servers it seems, and my feeling of it is that if I need a lot of
servers for a few users, how will it scale to many users?

Thanks for the great discussion,
Dave


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, March 25, 2013 10:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Slow queries for common terms

take a look here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

looking at memory consumption can be a bit tricky to interpret with
MMapDirectory.

But you say "I see the CPU working very hard" which implies that your issue
is just scoring 90M documents. A way to test: try q=*:*&fq=field:book. My
bet is that that will be much faster, in which case scoring is your
choke-point and you'll need to spread that load across more servers, i.e.
shard.

When running the above, make sure of a couple of things:
1> you haven't run the fq query before (or you have filterCache turned
completely off).
2> you _have_ run a query or two that warms up your low-level caches.
Doesn't matter what, just as long as it doesn't have an fq clause.
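
The comparison above could be scripted roughly as follows. This is only a sketch: the host, port, core name ("collection1"), and the field name "field" are assumptions to replace with your own, and build_url, time_query, and compare are hypothetical helper names, not part of Solr.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: adjust host, port, and core name for your setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

# Scored query: every matching document gets ranked.
scored = {"q": "field:book", "rows": 10, "wt": "json"}
# Filter-only query: match-all main query plus a filter on the same term.
filtered = {"q": "*:*", "fq": "field:book", "rows": 10, "wt": "json"}


def build_url(params, base=SOLR_SELECT):
    """Return a select URL with the parameters URL-encoded."""
    return base + "?" + urlencode(params)


def time_query(url, fetch=urlopen):
    """Return wall-clock seconds needed to fetch and read the response."""
    start = time.perf_counter()
    with fetch(url) as resp:
        resp.read()
    return time.perf_counter() - start


def compare():
    """Time both variants once; needs a reachable Solr instance."""
    for name, params in (("scored", scored), ("filter only", filtered)):
        print(name, round(time_query(build_url(params)), 2), "s")
```

Call compare() once after a warm-up query that has no fq clause. A second run of the filtered variant will likely be served from the filterCache, so only the first timing is meaningful for this test.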

Best
Erick



On Sat, Mar 23, 2013 at 3:10 AM, David Parks <davidpark...@yahoo.com> wrote:

> I see the CPU working very hard, and at the same time I see 2 MB/sec
> disk access for that 15 seconds. I am not running it this instant, but
> it seems to me that there were more CPU cycles available, so unless
> it's an issue of not being able to multithread it any further I'd say
> it's more IO related.
>
> I'm going to set up solr cloud and shard across the 2 servers I have 
> available for now. It's not an optimal setup we have while we're in a 
> private beta period, but maybe it'll improve things (I've got 2 
> servers with 2x 4TB disks in raid-0 shared with the webservers).
>
> I'll work towards some improved IO performance and maybe more shards 
> and see how things go. I'll also be able to up the RAM in just a 
> couple of weeks.
>
> Are there any settings I should think of in terms of improving cache 
> performance when I can give it say 10GB of RAM?
>
> Thanks, this has been tremendously helpful.
>
> David
>
>
> -----Original Message-----
> From: Tom Burton-West [mailto:tburt...@umich.edu]
> Sent: Saturday, March 23, 2013 1:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Slow queries for common terms
>
> Hi David and Jan,
>
> I wrote the blog post, and David, you are right, the problem we had
> was with phrase queries because our positions lists are so huge.
> Boolean queries don't need to read the positions lists. I think you
> need to determine whether you are CPU bound or I/O bound. It is
> possible that you are I/O bound and reading the term frequency
> postings for 90 million docs is taking a long time. In that case, more
> memory in the machine (but not dedicated to Solr) might help because
> Solr relies on OS disk caching for caching the postings lists. You
> would still need to do some cache warming with your most common terms.
>
> On the other hand, as Jan pointed out, you may be CPU bound because
> Solr doesn't have early termination and has to rank all 90 million
> docs in order to show the top 10 or 25.
>
> Did you try the OR search to see if your CPU is at 100%?
>
> Tom
>
> On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl <jan....@cominvent.com>
> wrote:
>
> > Hi
> >
> > There might not be a final cure with more RAM if you are CPU bound.
> > Scoring 90M docs is some work. Can you check what's going on during
> > those 15 seconds? Is your CPU at 100%? Try a (foo OR bar OR baz)
> > search which generates >100 million hits and see if that is slow too,
> > even if you don't use frequent words.
> >
> > I'm sure you can find other frequent terms in your corpus which 
> > display similar behaviour, words which are even more frequent than 
> > "book". Are you using "AND" as default operator? You will benefit 
> > from limiting the number of results as much as possible.
> >
> > The real solution is to shard across N servers, until you
> > reach the desired performance for the desired indexing/querying load.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > Solr Training - www.solrtraining.com
> >
> >
>
>
