You've probably already looked at this, but here goes anyway. The
first question probably should have been "what are you measuring?"
I've been fooled before by looking at, say, average response time
and extrapolating. You're getting 20 qps if your response time is
1 second and you have 20 threads running simultaneously; ditto if
your response time is 2 seconds and you have 40 threads. Throughput
by itself doesn't tell you much without the concurrency and latency
that produced it. So....
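
To make the arithmetic explicit, here's a trivial sketch (nothing
Solr-specific, just the relationship between concurrency, latency,
and throughput; the numbers are the hypothetical ones above):

    # throughput (qps) ~= concurrent threads / average latency in seconds
    def qps(threads, avg_latency_sec):
        return threads / avg_latency_sec

    print(qps(20, 1.0))  # 20 threads at 1 s each  -> 20.0 qps
    print(qps(40, 2.0))  # 40 threads at 2 s each  -> still 20.0 qps

Same 20 qps, very different user experience.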

And what is "response time"? It would clarify things a lot if you
broke out which parts of the operation are taking the time. Going
from memory, debugQuery=on will tell you how much time was spent
in the various stages inside Solr. It's important to know whether
the time goes to searching, assembling the response, or transmitting
the data back to the client. If your timings are just how long it
takes the response to get back to the client, you could even be
getting hammered by network latency.
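
Something along these lines (a rough sketch in Python; the host,
core, and query are placeholders, and I'm going from memory on the
layout of the debug section) separates the time spent inside Solr
from the wall-clock time your client sees:

    import json
    import time
    from urllib.request import urlopen

    # Placeholder URL -- adjust host/core/query to your setup.
    url = ("http://localhost:8983/solr/select"
           "?q=ipod&rows=10&wt=json&debugQuery=on")

    start = time.time()
    resp = json.load(urlopen(url))
    wall_ms = (time.time() - start) * 1000

    qtime_ms = resp["responseHeader"]["QTime"]         # time spent inside Solr
    timing = resp.get("debug", {}).get("timing", {})   # per-component breakdown

    print("QTime (inside Solr):", qtime_ms, "ms")
    print("wall clock:", round(wall_ms), "ms")   # gap ~ response writing + network
    print("component timings:", timing)

If QTime stays low while the wall-clock number climbs, the search
itself probably isn't the problem.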

How many threads does it take to peg the CPU? And what
response times are you getting when your number of threads is
around 10?
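
For what it's worth, here's roughly how you could measure that -- a
sketch, not your actual test script; the URL and query list are
placeholders:

    import time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    SOLR = "http://localhost:8983/solr/select?wt=json&q="  # placeholder
    QUERIES = ["ipod", "video", "memory", "solr"] * 250    # placeholder queries
    THREADS = 10

    def timed_query(q):
        start = time.time()
        urlopen(SOLR + q).read()                 # fetch and discard the response
        return (time.time() - start) * 1000      # latency in ms

    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        latencies = sorted(pool.map(timed_query, QUERIES))

    print("threads   :", THREADS)
    print("requests  :", len(latencies))
    print("avg ms    :", sum(latencies) / len(latencies))
    print("median ms :", latencies[len(latencies) // 2])

Run it at a few different thread counts while watching the CPU and
you'll know pretty quickly where the knee is.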

Erick

On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel <siddhantg...@gmail.com> wrote:

> I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk
> caching.
>
> I think that at any point in time, there can be at most <number of
> threads> concurrent requests, which happens to make sense btw (does it?).
>
> As I increase the number of threads, the load average shown by top goes up
> to as high as 80%. But if I keep the number of threads low (~10), the load
> average never goes beyond ~8. So that's probably the number of requests I
> can expect Solr to serve concurrently on this index size with this
> hardware.
>
> Can anyone give a general opinion as to how much hardware should be
> sufficient for a Solr deployment with an index size of ~43GB, containing
> around 2.5 million documents? I'm expecting it to serve at least 20
> requests per second. Any experiences?
>
> Thanks
>
> On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West <tburtonw...@gmail.com>
> wrote:
>
> >
> > How much of your memory are you allocating to the JVM and how much are
> > you leaving free?
> >
> > If you don't leave enough free memory for the OS, the OS won't have a
> > large enough disk cache, and you will be hitting the disk for lots of
> > queries.
> >
> > You might want to monitor your Disk I/O using iostat and look at the
> > iowait.
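
For reference, a typical invocation is something like:

    iostat -x 5   # watch %iowait on the CPU line and %util per device

run while the load test is going; if %iowait climbs with the thread
count, you're disk-bound.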
> >
> > If you are doing phrase queries and your *prx file is significantly larger
> > than the available memory, then when a slow phrase query hits Solr, the
> > contention for disk I/O with other queries could be slowing everything
> > down.
> > You might also want to look at the 90th and 99th percentile query times in
> > addition to the average. For our large indexes, we found at least an order
> > of magnitude difference between the average and 99th percentile queries.
> > Again, if Solr gets hit with a few of those 99th percentile slow queries
> > and you're not hitting your caches, chances are you will see serious
> > contention for disk I/O.
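
Once you're recording per-request times (as in the sketch further up),
the percentiles are cheap to compute; a rough example with made-up
numbers:

    import math

    # made-up per-request latencies in ms
    latencies = sorted([48, 52, 55, 61, 70, 95, 140, 800, 2400, 6100])

    def pct(sorted_vals, p):
        # nearest-rank percentile
        rank = max(1, math.ceil(p / 100 * len(sorted_vals)))
        return sorted_vals[rank - 1]

    print("avg:", sum(latencies) / len(latencies))   # ~982 ms
    print("p90:", pct(latencies, 90))                # 2400 ms
    print("p99:", pct(latencies, 99))                # 6100 ms

An average in the hundreds of milliseconds can easily hide a 99th
percentile in the seconds.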
> >
> > Of course if you don't see any waiting on I/O, then your bottleneck is
> > probably somewhere else. :)
> >
> > See
> > http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> > for more background on our experience.
> >
> > Tom Burton-West
> > University of Michigan Library
> > www.hathitrust.org
> >
> >
> >
> > >
> > > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel <siddhantg...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I have an index corresponding to ~2.5 million documents. The index size
> > > > is 43GB. The machine running Solr is a dual-processor quad-core Xeon
> > > > 5430 (2.66GHz, Harpertown, 2 x 12MB cache) with 8GB RAM and a 250GB HDD.
> > > >
> > > > I'm observing a strange trend in the queries that I send to Solr. The
> > > > query times for queries sent early on are much lower than for queries
> > > > sent later. For instance, if I write a script to query Solr 5000 times
> > > > (with 5000 distinct queries, most of them containing no more than 3-5
> > > > words) with 10 threads running in parallel, the average query time goes
> > > > from ~50ms in the beginning to ~6000ms. Is this expected, or is there
> > > > something wrong with my configuration? Currently I've configured the
> > > > queryResultCache and the documentCache to hold 2048 entries each (hit
> > > > ratios for both are close to 50%).
> > > >
> > > > Apart from this, a general question: is such hardware enough for this
> > > > scenario? I'm aiming at around 20 queries per second with the hardware
> > > > mentioned above.
> > > >
> > > > Thanks,
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > - Siddhant
> > > >
> > >
> >
> >
> >
> >
>
>
> --
> - Siddhant
>
