Sounds like you're pretty well on your way then. This is pretty typical of
multi-threaded situations... Threads 1-n wait around on I/O, and increasing
the number of threads increases throughput without changing (much) the
individual response time.
Threads n+1 - p don't change throughput much, but increase the response
time for each request. On aggregate, though, the throughput doesn't change
(much). Adding threads after p+1 *decreases* throughput while *increasing*
individual response time as your processors start spending waaaayyyy too
much time context switching and/or memory swapping. The trick is finding
out what n and p are <G>.

Best
Erick

On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel <siddhantg...@gmail.com>wrote:

> Hi,
>
> Thanks for your responses. It actually feels good to be able to locate
> where the bottlenecks are.
>
> I've created two sets of data - in the first one I'm measuring the time
> taken purely on Solr's end, and in the other one I'm including network
> latency (just for reference). The data that I'm posting below contains
> the time taken purely by Solr.
>
> I'm running 10 threads simultaneously and the average response time (for
> each query in each thread) remains close to 40 to 50 ms. But as soon as
> I increase the number of threads to something like 100, the response
> time goes up to ~600ms, and further up when the number of threads is
> close to 500. So yes, the average time definitely depends on the number
> of concurrent requests.
>
> > Going from memory, debugQuery=on will let you know how much time
> > was spent in various operations in SOLR. It's important to know
> > whether it was the searching, assembling the response, or
> > transmitting the data back to the client.
>
> I just tried this. The information that it gives me for a query that
> took 7165ms is - http://pastebin.ca/1835644
>
> So out of the total time of 7165ms, QueryComponent took most of the
> time. Plus I can see the load average going up when the number of
> threads is really high. So it actually makes sense. (I didn't add any
> other component while searching; it was a plain /select?q=query call.)
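For reference, a request with the timing breakdown turned on can be built
like this; the host, port, core path, and query string below are
placeholders for illustration, not the poster's actual setup:

```python
from urllib.parse import urlencode

# Placeholder Solr endpoint; substitute your own host/port/core.
base = "http://localhost:8983/solr/select"

params = {
    "q": "some query",    # the query being investigated
    "debugQuery": "on",   # asks Solr to include per-component timings
}
url = base + "?" + urlencode(params)
print(url)
```

The "debug" section of the response then breaks the total time down per
search component (QueryComponent and friends), which is where the 7165ms
above was attributed.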
> Like I mentioned earlier in this mail, I'm maintaining separate sets of
> data with/without network latency, and I don't think it's the
> bottleneck.
>
> > How many threads does it take to peg the CPU? And what
> > response times are you getting when your number of threads is
> > around 10?
>
> If the number of threads is greater than 100, that really takes its
> toll on the CPU. So probably that's the number.
>
> When the number of threads is around 10, the response times average to
> something like 60ms (and 95% of the queries fall within 100ms of that
> value).
>
> Thanks,
>
> > Erick
> >
> > On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel <siddhantg...@gmail.com>wrote:
> >
> > > I've allocated 4GB to Solr, so the remaining 4GB is free for the
> > > OS disk cache.
> > >
> > > I think that at any point in time, there can be a maximum of
> > > <number of threads> concurrent requests, which happens to make
> > > sense btw (does it?).
> > >
> > > As I increase the number of threads, the load average shown by top
> > > goes up to as high as 80%. But if I keep the number of threads low
> > > (~10), the load average never goes beyond ~8. So probably that's
> > > the number of requests I can expect Solr to serve concurrently on
> > > this index size with this hardware.
> > >
> > > Can anyone give a general opinion as to how much hardware should
> > > be sufficient for a Solr deployment with an index size of ~43GB,
> > > containing around 2.5 million documents? I'm expecting it to serve
> > > at least 20 requests per second. Any experiences?
> > >
> > > Thanks
> > >
> > > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West <tburtonw...@gmail.com>wrote:
> > >
> > > > How much of your memory are you allocating to the JVM and how
> > > > much are you leaving free?
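The thread-scaling measurements discussed above (average response time at
10 vs. 100 threads) can be sketched roughly like this; `run_query` here is
a simulated stand-in for a real Solr request, so the numbers it produces
are illustrative only:

```python
import random
import threading
import time

def run_query():
    """Stand-in for a real Solr request; simulates 10-30ms of latency."""
    time.sleep(random.uniform(0.01, 0.03))

def worker(n_queries, latencies, lock):
    # Each thread issues its queries back to back, recording per-query time.
    for _ in range(n_queries):
        start = time.perf_counter()
        run_query()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        with lock:
            latencies.append(elapsed_ms)

def load_test(n_threads, queries_per_thread=20):
    """Run n_threads workers in parallel; return mean latency in ms."""
    latencies, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(queries_per_thread, latencies, lock))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(latencies) / len(latencies)

print("10 threads, mean latency (ms):", load_test(10))
```

Against a real server, rerunning `load_test` with increasing `n_threads`
and plotting the mean (and tail) latency is one way to locate the n and p
from Erick's description.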
> > > > If you don't leave enough free memory for the OS, the OS won't
> > > > have a large enough disk cache, and you will be hitting the disk
> > > > for lots of queries.
> > > >
> > > > You might want to monitor your disk I/O using iostat and look at
> > > > the iowait.
> > > >
> > > > If you are doing phrase queries and your *prx file is
> > > > significantly larger than the available memory, then when a slow
> > > > phrase query hits Solr, the contention for disk I/O with other
> > > > queries could be slowing everything down.
> > > >
> > > > You might also want to look at the 90th and 99th percentile query
> > > > times in addition to the average. For our large indexes, we found
> > > > at least an order of magnitude difference between the average and
> > > > 99th percentile queries. Again, if Solr gets hit with a few of
> > > > those 99th percentile slow queries and you're not hitting your
> > > > caches, chances are you will see serious contention for disk I/O.
> > > >
> > > > Of course, if you don't see any waiting on I/O, then your
> > > > bottleneck is probably somewhere else :)
> > > >
> > > > See
> > > > http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> > > > for more background on our experience.
> > > >
> > > > Tom Burton-West
> > > > University of Michigan Library
> > > > www.hathitrust.org
> > > >
> > > > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel <siddhantg...@gmail.com>wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I have an index corresponding to ~2.5 million documents. The
> > > > > index size is 43GB. The configuration of the machine which is
> > > > > running Solr is - Dual Processor Quad Core Xeon 5430 - 2.66GHz
> > > > > (Harpertown) - 2 x 12MB cache, 8GB RAM, and 250 GB HDD.
> > > > > I'm observing a strange trend in the queries that I send to
> > > > > Solr. The query times for queries that I send earlier are much
> > > > > lower than for the queries I send afterwards. For instance, if
> > > > > I write a script to query Solr 5000 times (with 5000 distinct
> > > > > queries, most of them containing not more than 3-5 words) with
> > > > > 10 threads running in parallel, the average time for queries
> > > > > goes from ~50ms in the beginning to ~6000ms. Is this expected,
> > > > > or is there something wrong with my configuration? Currently
> > > > > I've configured the queryResultCache and the documentCache to
> > > > > contain 2048 entries (hit ratios for both are close to 50%).
> > > > >
> > > > > Apart from this, a general question I want to ask is: is such
> > > > > hardware enough for this scenario? I'm aiming at achieving
> > > > > around 20 queries per second with the hardware mentioned above.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Regards,
> > > > >
> > > > > --
> > > > > - Siddhant
> > > >
> > > > --
> > > > View this message in context:
> > > > http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> > > --
> > > - Siddhant
>
> --
> - Siddhant