"book" by itself returns in 4s (non-optimized disk IO), running it a second time returned 0s, so I think I can presume that the query was not cached the first time. This system has been up for week, so it's warm.
I'm going to give your article a good long read, thanks for that.

I guess good fast disks/SSDs and sharding should also improve on the base 4 sec query time. How _does_ Google get their query times down to 0.35s anyway? I presume their indexes are larger than my 150G index. :)

I still am a bit worried about what will happen when my index is 500GB (it'll happen soon enough), even with sharding... well... I'd just need a lot of servers it seems, and my feeling is that if I need a lot of servers for a few users, how will it scale to many users?

Thanks for the great discussion,
Dave

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, March 25, 2013 10:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Slow queries for common terms

take a look here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

looking at memory consumption can be a bit tricky to interpret with MMapDirectory.

But you say "I see the CPU working very hard" which implies that your issue is just scoring 90M documents. A way to test: try q=*:*&fq=field:book. My bet is that that will be much faster, in which case scoring is your choke-point and you'll need to spread that load across more servers, i.e. shard.

When running the above, make sure of a couple of things:
1> you haven't run the fq query before (or you have filterCache turned completely off).
2> you _have_ run a query or two that warms up your low-level caches. Doesn't matter what, just as long as it doesn't have an fq clause.

Best
Erick

On Sat, Mar 23, 2013 at 3:10 AM, David Parks <davidpark...@yahoo.com> wrote:

> I see the CPU working very hard, and at the same time I see 2 MB/sec
> disk access for that 15 seconds. I am not running it this instant, but
> it seems to me that there was more CPU cycles available, so unless
> it's an issue of not being able to multithread it any further I'd say
> it's more IO related.
>
> I'm going to set up solr cloud and shard across the 2 servers I have
> available for now. It's not an optimal setup we have while we're in a
> private beta period, but maybe it'll improve things (I've got 2
> servers with 2x 4TB disks in raid-0 shared with the webservers).
>
> I'll work towards some improved IO performance and maybe more shards
> and see how things go. I'll also be able to up the RAM in just a
> couple of weeks.
>
> Are there any settings I should think of in terms of improving cache
> performance when I can give it say 10GB of RAM?
>
> Thanks, this has been tremendously helpful.
>
> David
>
>
> -----Original Message-----
> From: Tom Burton-West [mailto:tburt...@umich.edu]
> Sent: Saturday, March 23, 2013 1:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Slow queries for common terms
>
> Hi David and Jan,
>
> I wrote the blog post, and David, you are right, the problem we had
> was with phrase queries because our positions lists are so huge.
> Boolean queries don't need to read the positions lists. I think you
> need to determine whether you are CPU bound or I/O bound. It is
> possible that you are I/O bound and reading the term frequency
> postings for 90 million docs is taking a long time. In that case,
> More memory in the machine (but not dedicated to Solr) might help
> because Solr relies on OS disk caching for caching the postings
> lists. You would still need to do some cache warming with your most
> common terms.
>
> On the other hand as Jan pointed out, you may be cpu bound because
> Solr doesn't have early termination and has to rank all 90 million
> docs in order to show the top 10 or 25.
>
> Did you try the OR search to see if your CPU is at 100%?
>
> Tom
>
> On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl <jan....@cominvent.com>
> wrote:
>
> > Hi
> >
> > There might not be a final cure with more RAM if you are CPU bound.
> > Scoring 90M docs is some work. Can you check what's going on during
> > those 15 seconds? Is your CPU at 100%? Try an (foo OR bar OR baz)
> > search which generates >100mill hits and see if that is slow too,
> > even if you don't use frequent words.
> >
> > I'm sure you can find other frequent terms in your corpus which
> > display similar behaviour, words which are even more frequent than
> > "book". Are you using "AND" as default operator? You will benefit
> > from limiting the number of results as much as possible.
> >
> > The real solution is to shard across N number of servers, until you
> > reach the desired performance for the desired indexing/querying load.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > Solr Training - www.solrtraining.com
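
For anyone who wants to try the comparison Erick describes (the term in q, where it is scored, versus the same term in fq under q=*:*), here is a minimal Python sketch. The host, port, core name (collection1) and the field name "field" are placeholders I made up, not values from this thread, and the helper names are hypothetical. The sketch only builds the two request URLs and times whatever fetch function you hand it; Erick's two caveats still apply when you run it (warm the low-level caches with a non-fq query first, and make sure the fq has not been run before or that filterCache is off).

```python
import time
from urllib.parse import urlencode

# Assumed values for illustration only: adjust to your own installation.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"
FIELD = "field"
TERM = "book"


def build_url(q, fq=None, rows=10):
    """Build a select URL; the fq parameter is added only when given."""
    params = [("q", q), ("rows", rows), ("wt", "json")]
    if fq is not None:
        params.append(("fq", fq))
    return SOLR_SELECT + "?" + urlencode(params)


def scored_url():
    # Term in q: every matching document has to be scored and ranked.
    return build_url(f"{FIELD}:{TERM}")


def filtered_url():
    # Term in fq: it only restricts the result set and does not
    # contribute to scoring; q=*:* matches everything.
    return build_url("*:*", fq=f"{FIELD}:{TERM}")


def time_fetch(url, fetch):
    """Return the wall-clock seconds taken by fetch(url)."""
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start
```

To run it against a live server, pass something like `lambda u: urllib.request.urlopen(u).read()` as `fetch` and compare `time_fetch(scored_url(), fetch)` with `time_fetch(filtered_url(), fetch)`. If the filtered request comes back much faster, that points at scoring as the choke-point, as Erick suggests.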