Sounds like you're pretty well on your way then. This is pretty typical
of multi-threaded situations... Threads 1-n wait around on I/O and
increasing the number of threads increases throughput without
changing (much) the individual response time.

Threads n+1 through p don't add much throughput, but they do
increase the response time for each individual request.

Adding threads beyond p *decreases* throughput while
*increasing* individual response time as your processors start
spending waaaayyyy too much time context switching and/or memory
swapping.

The trick is finding out what n and p are <G>.
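One way to find them empirically is to sweep the thread count and watch where throughput flattens and then drops. A rough harness along these lines can do that; the `fake_query` function here is just a stand-in for a real Solr request, not anything from this thread:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(request_fn, num_threads, requests_per_thread=20):
    """Fire requests from num_threads workers; return (avg latency in s, req/s)."""
    latencies = []  # list.append is atomic in CPython, so no lock needed

    def worker():
        for _ in range(requests_per_thread):
            start = time.perf_counter()
            request_fn()
            latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for _ in range(num_threads):
            pool.submit(worker)
    wall_time = time.perf_counter() - wall_start

    total = num_threads * requests_per_thread
    return sum(latencies) / total, total / wall_time

# Stand-in for an I/O-bound Solr query: ~10ms of simulated network/disk wait.
def fake_query():
    time.sleep(0.01)

for threads in (1, 4, 16):
    avg, qps = run_load_test(fake_query, threads)
    print(f"{threads:3d} threads: avg={avg * 1000:.1f}ms, throughput={qps:.0f} req/s")
```

While the fake query is pure sleep (all "I/O"), throughput scales nearly linearly with threads; against a real server you'd see it flatten at n and fall off past p.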

Best
Erick

On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel <siddhantg...@gmail.com> wrote:

> Hi,
>
> Thanks for your responses. It actually feels good to be able to locate
> where the bottlenecks are.
>
> I've created two sets of data - in the first one I'm measuring the time
> taken purely on Solr's end, and in the other one I'm including network
> latency (just for reference). The data that I'm posting below contains
> the time taken purely by Solr.
>
> I'm running 10 threads simultaneously and the average response time (for
> each query in each thread) remains close to 40 to 50 ms. But as soon as I
> increase the number of threads to something like 100, the response time
> goes up to ~600ms, and further up when the number of threads is close to
> 500. Yes, the average time definitely depends on the number of concurrent
> requests.
>
> > Going from memory, debugQuery=on will let you know how much time
> > was spent in various operations in SOLR. It's important to know
> > whether it was the searching, assembling the response, or
> > transmitting the data back to the client.
>
>
> I just tried this. The information that it gives me for a query that took
> 7165ms is - http://pastebin.ca/1835644
>
> So out of the total time 7165ms, QueryComponent took most of the time. Plus
> I can see the load average going up when the number of threads is really
> high. So it actually makes sense. (I didn't add any other component while
> searching; it was a plain /select?q=query call).
> Like I mentioned earlier in this mail, I'm maintaining separate sets of
> data with/without network latency, and I don't think it's the bottleneck.
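For reference, the per-component times that debugQuery=on reports can be pulled out programmatically. A sketch against a hypothetical, abbreviated timing section (real component names and nesting vary by Solr version; the numbers below are made up around the 7165ms total from the pastebin):

```python
# Hypothetical, abbreviated "timing" section as returned with
# debugQuery=on and wt=json; not the actual pastebin contents.
timing = {
    "time": 7165.0,
    "prepare": {"time": 2.0, "query": {"time": 1.0}},
    "process": {"time": 7160.0, "query": {"time": 7150.0}, "debug": {"time": 10.0}},
}

def slowest_component(timing):
    """Return (phase, component, ms) for the costliest per-component entry."""
    worst = ("", "", 0.0)
    for phase in ("prepare", "process"):
        for name, stats in timing.get(phase, {}).items():
            if name == "time":  # phase total, not a component
                continue
            if stats["time"] > worst[2]:
                worst = (phase, name, stats["time"])
    return worst

print(slowest_component(timing))  # ('process', 'query', 7150.0)
```

With numbers shaped like the ones in this thread, the query component's process phase dominates, matching the observation that QueryComponent took most of the time.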
>
>
> > How many threads does it take to peg the CPU? And what
> > response times are you getting when your number of threads is
> > around 10?
> >
>
> If the number of threads is greater than 100, that really takes its toll on
> the CPU. So that's probably the number.
>
> When the number of threads is around 10, the response times average to
> something like 60ms (and 95% of the queries fall within 100ms of that
> value).
>
> Thanks,
>
>
>
>
> >
> > Erick
> >
> > On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel <siddhantg...@gmail.com>
> > wrote:
> >
> > > I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS
> > > disk caching.
> > >
> > > I think that at any point in time, there can be a maximum of <number of
> > > threads> concurrent requests, which happens to make sense btw (does
> > > it?).
> > >
> > > As I increase the number of threads, the load average shown by top goes
> > > up to as high as 80%. But if I keep the number of threads low (~10), the
> > > load average never goes beyond ~8. So that's probably the number of
> > > requests I can expect Solr to serve concurrently on this index size with
> > > this hardware.
> > >
> > > Can anyone give a general opinion as to how much hardware should be
> > > sufficient for a Solr deployment with an index size of ~43GB, containing
> > > around 2.5 million documents? I'm expecting it to serve at least 20
> > > requests per second. Any experiences?
> > >
> > > Thanks
> > >
> > > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West <tburtonw...@gmail.com>
> > > wrote:
> > >
> > > >
> > > > How much of your memory are you allocating to the JVM and how much
> > > > are you leaving free?
> > > >
> > > > If you don't leave enough free memory for the OS, the OS won't have a
> > > > large enough disk cache, and you will be hitting the disk for lots of
> > > > queries.
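Concretely, that split is controlled by the JVM heap flags when Solr is started. For the 8GB machine in this thread it might look like the following (assuming the stock Jetty start.jar of that era; the exact launch command depends on the deployment):

```shell
# Cap the JVM heap at 4GB so the other ~4GB of RAM stays free
# for the OS disk cache.
java -Xms4g -Xmx4g -jar start.jar
```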
> > > >
> > > > You might want to monitor your Disk I/O using iostat and look at the
> > > > iowait.
> > > >
> > > > If you are doing phrase queries and your *prx file is significantly
> > > > larger than the available memory, then when a slow phrase query hits
> > > > Solr, the contention for disk I/O with other queries could be slowing
> > > > everything down.
> > > > You might also want to look at the 90th and 99th percentile query
> > > > times in addition to the average. For our large indexes, we found at
> > > > least an order of magnitude difference between the average and 99th
> > > > percentile queries. Again, if Solr gets hit with a few of those 99th
> > > > percentile slow queries and you're not hitting your caches, chances
> > > > are you will see serious contention for disk I/O.
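As a sketch, the average-versus-tail gap being described here can be computed from raw per-query latencies (the numbers below are made up, chosen to show a slow tail):

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1,
              max(0, int(round(pct / 100.0 * len(ordered))) - 1))
    return ordered[idx]

# Made-up latencies in ms: mostly fast, plus a couple of slow phrase queries.
latencies = [40, 45, 50, 55, 60, 48, 52, 47, 900, 6500]

avg = sum(latencies) / len(latencies)
print(f"avg={avg:.0f}ms p90={percentile(latencies, 90)}ms "
      f"p99={percentile(latencies, 99)}ms")
# avg=780ms p90=900ms p99=6500ms
```

Even with just two slow outliers, the average lands an order of magnitude above the typical query, which is why looking only at the mean can hide exactly the contention Tom describes.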
> > > >
> > > > Of course if you don't see any waiting on i/o, then your bottleneck is
> > > > probably somewhere else :)
> > > >
> > > > See
> > > > http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> > > > for more background on our experience.
> > > >
> > > > Tom Burton-West
> > > > University of Michigan Library
> > > > www.hathitrust.org
> > > >
> > > >
> > > >
> > > > >
> > > > > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel <siddhantg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > I have an index corresponding to ~2.5 million documents. The index
> > > > > > size is 43GB. The configuration of the machine which is running
> > > > > > Solr is - Dual Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown)
> > > > > > - 2 x 12MB cache, 8GB RAM, and 250 GB HDD.
> > > > > >
> > > > > > I'm observing a strange trend in the queries that I send to Solr.
> > > > > > The query times for the queries I send earlier are much lower than
> > > > > > for the queries I send afterwards. For instance, if I write a
> > > > > > script to query Solr 5000 times (with 5000 distinct queries, most
> > > > > > of them containing not more than 3-5 words) with 10 threads
> > > > > > running in parallel, the average query time goes from ~50ms in the
> > > > > > beginning to ~6000ms. Is this expected, or is there something
> > > > > > wrong with my configuration? Currently I've configured the
> > > > > > queryResultCache and the documentCache to contain 2048 entries
> > > > > > (hit ratios for both are close to 50%).
> > > > > >
> > > > > > Apart from this, a general question that I want to ask is: is such
> > > > > > hardware enough for this scenario? I'm aiming at achieving around
> > > > > > 20 queries per second with the hardware mentioned above.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > --
> > > > > > - Siddhant
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context:
> > > > http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > >
> > >
> > >
> > > --
> > > - Siddhant
> > >
> >
>
>
>
> --
> - Siddhant
>
