Try cutting back Solr's memory - the OS knows how to manage disk
caches better than Solr does.
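
As a rough back-of-the-envelope sketch of why this matters: the snippet below
uses the 8GB RAM and 43GB index figures from earlier in this thread. The 2GB
heap is only an assumed alternative, and the function name is made up for
illustration. It ignores memory used by the OS and other processes, so treat
the result as an upper bound on how much of the index the disk cache can hold.

```python
def cacheable_fraction(total_ram_gb, heap_gb, index_gb):
    """Upper bound on the share of the index the OS disk cache can hold."""
    # Whatever the JVM heap does not take is (at best) left for the disk cache.
    page_cache_gb = max(total_ram_gb - heap_gb, 0)
    return min(page_cache_gb / index_gb, 1.0)


# 4GB is the current heap from this thread; 2GB is an assumed smaller setting.
for heap_gb in (4, 2):
    frac = cacheable_fraction(total_ram_gb=8, heap_gb=heap_gb, index_gb=43)
    print(f"heap={heap_gb}GB -> at most {frac:.0%} of the index fits in the disk cache")
```

Either way only a small part of the index fits, so measuring iowait before and
after the change is the real test.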

Another approach is to raise and lower the queryResultCache size and see if
the hit ratio changes.
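
A minimal sketch of that comparison, assuming you note the cache's lookups and
hits counters before and after each test run. The function name and all the
numbers below are made up for illustration; they are not measurements from this
thread.

```python
def hit_ratio(lookups_before, hits_before, lookups_after, hits_after):
    """Hit ratio for one test run, from counter readings taken around it."""
    lookups = lookups_after - lookups_before
    hits = hits_after - hits_before
    # Avoid dividing by zero if the cache saw no traffic during the run.
    return hits / lookups if lookups else 0.0


# Hypothetical counter readings for two runs with different cache sizes.
runs = {
    2048: hit_ratio(0, 0, 5000, 1000),
    8192: hit_ratio(5000, 1000, 10000, 2100),
}
for size, ratio in runs.items():
    print(f"queryResultCache size={size}: hit ratio {ratio:.1%}")
```

If the ratio barely moves between sizes, the queries are probably too distinct
for this cache to help, and the memory is better left to the OS.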

On Wed, Mar 17, 2010 at 9:44 AM, Siddhant Goel <siddhantg...@gmail.com> wrote:
> Hi,
>
> Apparently the bottleneck seems to be the periods when the CPU is waiting on
> I/O. Of all the numbers I can see, the CPU wait times for I/O
> are the highest. I've allotted 4GB to Solr out of the total 8GB
> available. There's only 47MB free on the machine, so I assume the rest of
> the memory is being used for OS disk caches. In addition, the hit ratio for
> queryResultCache isn't going beyond 20%. So the problem, I think, is not at
> Solr's end. Are there any pointers on how I can resolve such
> issues related to disk I/O? Does this mean I need more overall memory? Or
> would reducing the amount of memory allocated to Solr, so that the disk cache
> has more memory, help?
>
> Thanks,
>
> On Fri, Mar 12, 2010 at 11:21 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Sounds like you're pretty well on your way then. This is pretty typical
>> of multi-threaded situations... Threads 1-n wait around on I/O and
>> increasing the number of threads increases throughput without
>> changing (much) the individual response time.
>>
>> Threads n+1 - p don't change throughput much, but increase
>> the response time for each request. On aggregate, though, the
>> throughput doesn't change (much).
>>
>> Adding threads after p+1 *decreases* throughput while
>> *increasing* individual response time as your processors start
>> spending waaaayyyy too much time context switching and/or memory
>> swapping.
>>
>> The trick is finding out what n and p are <G>.
>>
>> Best
>> Erick
>>
>> On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel <siddhantg...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > Thanks for your responses. It actually feels good to be able to locate where
>> > the bottlenecks are.
>> >
>> > I've created two sets of data - in the first one I'm measuring the time taken
>> > purely on Solr's end, and in the other one I'm including network latency
>> > (just for reference). The data that I'm posting below contains the time taken
>> > purely by Solr.
>> >
>> > I'm running 10 threads simultaneously and the average response time (for
>> > each query in each thread) remains close to 40 to 50 ms. But as soon as I
>> > increase the number of threads to something like 100, the response time goes
>> > up to ~600ms, and further up when the number of threads is close to 500. Yes,
>> > the average time definitely depends on the number of concurrent requests.
>> >
>> > > Going from memory, debugQuery=on will let you know how much time
>> > > was spent in various operations in SOLR. It's important to know
>> > > whether it was the searching, assembling the response, or
>> > > transmitting the data back to the client.
>> >
>> >
>> > I just tried this. The information that it gives me for a query that took
>> > 7165ms is - http://pastebin.ca/1835644
>> >
>> > So out of the total time of 7165ms, QueryComponent took most of the time. Plus
>> > I can see the load average going up when the number of threads is really
>> > high. So it actually makes sense. (I didn't add any other component while
>> > searching; it was a plain /select?q=query call).
>> > Like I mentioned earlier in this mail, I'm maintaining separate sets of
>> > data with/without network latency, and I don't think it's the bottleneck.
>> >
>> >
>> > > How many threads does it take to peg the CPU? And what
>> > > response times are you getting when your number of threads is
>> > > around 10?
>> > >
>> >
>> > If the number of threads is greater than 100, that really takes its toll on
>> > the CPU. So probably that's the number.
>> >
>> > When the number of threads is around 10, the response times average to
>> > something like 60ms (and 95% of the queries fall within 100ms of that
>> > value).
>> >
>> > Thanks,
>> >
>> >
>> >
>> >
>> > >
>> > > Erick
>> > >
>> > > On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel <siddhantg...@gmail.com> wrote:
>> > >
>> > > > I've allocated 4GB to Solr, so the remaining 4GB is free for OS disk
>> > > > caching.
>> > > >
>> > > > I think that at any point of time, there can be a maximum of <number of
>> > > > threads> concurrent requests, which happens to make sense btw (does it?).
>> > > >
>> > > > As I increase the number of threads, the load average shown by top goes up
>> > > > to as high as 80%. But if I keep the number of threads low (~10), the load
>> > > > average never goes beyond ~8. So probably that's the number of requests I
>> > > > can expect Solr to serve concurrently on this index size with this
>> > > > hardware.
>> > > >
>> > > > Can anyone give a general opinion as to how much hardware should be
>> > > > sufficient for a Solr deployment with an index size of ~43GB, containing
>> > > > around 2.5 million documents? I'm expecting it to serve at least 20
>> > > > requests per second. Any experiences?
>> > > >
>> > > > Thanks
>> > > >
>> > > > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West <tburtonw...@gmail.com> wrote:
>> > > >
>> > > > >
>> > > > > How much of your memory are you allocating to the JVM and how much are you
>> > > > > leaving free?
>> > > > >
>> > > > > If you don't leave enough free memory for the OS, the OS won't have a large
>> > > > > enough disk cache, and you will be hitting the disk for lots of queries.
>> > > > >
>> > > > > You might want to monitor your disk I/O using iostat and look at the
>> > > > > iowait.
>> > > > >
>> > > > > If you are doing phrase queries and your *prx file is significantly larger
>> > > > > than the available memory, then when a slow phrase query hits Solr, the
>> > > > > contention for disk I/O with other queries could be slowing everything
>> > > > > down.
>> > > > > You might also want to look at the 90th and 99th percentile query times in
>> > > > > addition to the average. For our large indexes, we found at least an order
>> > > > > of magnitude difference between the average and 99th percentile queries.
>> > > > > Again, if Solr gets hit with a few of those 99th percentile slow queries
>> > > > > and you're not hitting your caches, chances are you will see serious
>> > > > > contention for disk I/O.
>> > > > >
>> > > > > Of course if you don't see any waiting on I/O, then your bottleneck is
>> > > > > probably somewhere else :)
>> > > > >
>> > > > > See
>> > > > > http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
>> > > > > for more background on our experience.
>> > > > >
>> > > > > Tom Burton-West
>> > > > > University of Michigan Library
>> > > > > www.hathitrust.org
>> > > > >
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel <siddhantg...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hi everyone,
>> > > > > > >
>> > > > > > > I have an index corresponding to ~2.5 million documents. The index size is
>> > > > > > > 43GB. The configuration of the machine which is running Solr is: Dual
>> > > > > > > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
>> > > > > > > RAM, and 250 GB HDD.
>> > > > > > >
>> > > > > > > I'm observing a strange trend in the queries that I send to Solr. The query
>> > > > > > > times for queries that I send earlier are much lower than for the queries I
>> > > > > > > send afterwards. For instance, if I write a script to query Solr 5000 times
>> > > > > > > (with 5000 distinct queries, most of them containing not more than 3-5 words)
>> > > > > > > with 10 threads running in parallel, the average time for queries goes from
>> > > > > > > ~50ms in the beginning to ~6000ms. Is this expected, or is there something
>> > > > > > > wrong with my configuration? Currently I've configured the queryResultCache
>> > > > > > > and the documentCache to contain 2048 entries (hit ratios for both are close
>> > > > > > > to 50%).
>> > > > > > >
>> > > > > > > Apart from this, a general question I want to ask: is such
>> > > > > > > hardware enough for this scenario? I'm aiming at achieving around 20
>> > > > > > > queries per second with the hardware mentioned above.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > >
>> > > > > > > --
>> > > > > > > - Siddhant
>> > > > > > >



-- 
Lance Norskog
goks...@gmail.com
