I've allocated 4GB to Solr, so the remaining 4GB is free for OS disk caching.
I think that at any point in time there can be at most as many concurrent requests as there are threads, which makes sense (does it?). As I increase the number of threads, the load average shown by top goes up to as high as 80%. But if I keep the number of threads low (~10), the load average never goes beyond ~8. So that's probably the number of requests I can expect Solr to serve concurrently on this index size with this hardware.

Can anyone give a general opinion on how much hardware should be sufficient for a Solr deployment with an index size of ~43GB, containing around 2.5 million documents? I'm expecting it to serve at least 20 requests per second. Any experiences?

Thanks

On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West <tburtonw...@gmail.com> wrote:

> How much of your memory are you allocating to the JVM and how much are you
> leaving free?
>
> If you don't leave enough free memory for the OS, the OS won't have a large
> enough disk cache, and you will be hitting the disk for lots of queries.
>
> You might want to monitor your disk I/O using iostat and look at the
> iowait.
>
> If you are doing phrase queries and your *prx file is significantly larger
> than the available memory, then when a slow phrase query hits Solr, the
> contention for disk I/O with other queries could be slowing everything
> down.
>
> You might also want to look at the 90th and 99th percentile query times in
> addition to the average. For our large indexes, we found at least an order
> of magnitude difference between the average and 99th percentile queries.
> Again, if Solr gets hit with a few of those 99th percentile slow queries
> and you're not hitting your caches, chances are you will see serious
> contention for disk I/O.
>
> Of course, if you don't see any waiting on I/O, then your bottleneck is
> probably somewhere else :)
>
> See
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> for more background on our experience.
>
> Tom Burton-West
> University of Michigan Library
> www.hathitrust.org
>
> > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel <siddhantg...@gmail.com>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I have an index corresponding to ~2.5 million documents. The index size
> > > is 43GB. The configuration of the machine running Solr is: Dual
> > > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
> > > 8GB RAM, and 250 GB HDD.
> > >
> > > I'm observing a strange trend in the queries that I send to Solr. The
> > > query times for the queries I send earlier are much lower than for the
> > > queries I send afterwards. For instance, if I write a script to query
> > > Solr 5000 times (with 5000 distinct queries, most of them containing
> > > not more than 3-5 words) with 10 threads running in parallel, the
> > > average query time goes from ~50ms in the beginning to ~6000ms. Is this
> > > expected, or is there something wrong with my configuration? Currently
> > > I've configured the queryResultCache and the documentCache to contain
> > > 2048 entries (hit ratios for both are close to 50%).
> > >
> > > Apart from this, a general question I want to ask: is such hardware
> > > enough for this scenario? I'm aiming at achieving around 20 queries
> > > per second with the hardware mentioned above.
> > >
> > > Thanks,
> > >
> > > Regards,
> > >
> > > --
> > > - Siddhant
>
> --
> View this message in context:
> http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Siddhant
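
A rough sketch of Tom's suggestion in the quoted message to look at the 90th and 99th percentile query times as well as the average. It assumes the test script has already collected per-query times in milliseconds. The function names and the sample numbers are made up for illustration, and nothing here is part of Solr.

```python
def percentile(ordered_times, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1..100."""
    if not ordered_times:
        raise ValueError("no samples")
    # Integer ceiling of pct * n / 100, which avoids float rounding surprises.
    rank = (pct * len(ordered_times) + 99) // 100
    return ordered_times[rank - 1]


def summarize(times_ms):
    """Return the mean, 90th and 99th percentile of the query times."""
    ordered = sorted(times_ms)
    return {
        "mean": sum(ordered) / len(ordered),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
    }


if __name__ == "__main__":
    # Hypothetical sample: 98 fast queries and 2 very slow ones.
    sample = [50] * 98 + [6000] * 2
    print(summarize(sample))
```

With this made-up sample the mean is 169 ms and the 90th percentile is 50 ms, while the 99th percentile is 6000 ms, so a handful of slow queries can stay hidden behind a reasonable-looking average.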