Hi Shawn,

You say that:

*... your documents are about 50KB each.  That would translate to an index
that's at least 25GB*

I know we cannot give an exact size, but in your experience, what is the
approximate ratio of document size to index size?
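One rough way to frame the ratio question: index size is about (number of docs) x (average raw doc size) x (ratio). Shawn's 25GB figure follows from 0.5M docs x 50KB with a ratio of about 1. The Python sketch below treats the ratio as something to measure rather than guess, and expresses it as index size divided by raw size (the inverse of how the question is phrased). `dir_size_bytes`, `estimate_index_bytes` and `measured_ratio` are hypothetical helpers, not part of Solr, and the shard path is a placeholder.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path` (e.g. one shard's index dir)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # a missing path simply yields nothing
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def estimate_index_bytes(num_docs: int, avg_doc_bytes: int, ratio: float) -> float:
    """Raw input size scaled by an ASSUMED index-size / raw-size ratio."""
    return num_docs * avg_doc_bytes * ratio


def measured_ratio(index_bytes: int, num_docs: int, avg_doc_bytes: int) -> float:
    """Observed index-size / raw-size ratio for one shard."""
    return index_bytes / (num_docs * avg_doc_bytes)


if __name__ == "__main__":
    docs_per_shard = 500_000  # from the thread: about 0.5M docs per shard
    avg_doc = 50_000          # from the thread: about 50KB per doc (decimal KB)
    # A ratio of 1.0 reproduces the 25GB figure; the real ratio depends on the schema.
    print(estimate_index_bytes(docs_per_shard, avg_doc, 1.0) / 1e9, "GB")
    # HYPOTHETICAL path: point it at an actual shard's index directory.
    shard_dir = "/path/to/shard/data/index"
    if os.path.isdir(shard_dir):
        print("ratio:", measured_ratio(dir_size_bytes(shard_dir), docs_per_shard, avg_doc))
```

Measuring one representative shard and dividing by its raw input size gives a ratio specific to this schema, which is more reliable than any general rule.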


2013/4/9 Shawn Heisey <s...@elyograg.org>

> On 4/9/2013 2:10 PM, Manuel Le Normand wrote:
>
>> Thanks for replying.
>> My config:
>>
>>     - 40 dedicated servers, dual-core each
>>     - Running in a Tomcat servlet container on Linux
>>     - 12 GB RAM per server, split evenly between the OS and Solr
>>     - Complex queries (up to 30 conditions on different fields) at 1 qps
>>
>> I sharded my index for two reasons, based on tests with 2 servers
>> (4 shards):
>>
>>     1. As the index grew above a few million docs, qTime rose greatly,
>>     while sharding the index into smaller pieces (about 0.5M docs) gave
>>     much better results, so I limited every shard to 0.5M docs.
>>     2. Tests showed I was CPU-bound during queries. As I have a low qps
>>     rate (to emphasize: lower than the expected qTime) and as a query runs
>>     single-threaded on each shard, it made sense to assign a CPU to each
>>     shard.
>>
>> For the same number of docs per shard, I do expect a rise in total qTime
>> for these reasons:
>>
>>     1. The response should wait for the slowest shard
>>     2. Merging the responses from 40 different shards takes time
>>
>> What I understand from your explanation is that it's the merging that
>> takes time, and since qTime ends only after the second retrieval phase,
>> the qTime on each shard will be longer. That means that during a
>> significant proportion of the first query phase (right after the
>> [id,score] pairs are retrieved), all CPUs are idle except the
>> response-merger thread running on a single CPU. I thought of the merge as
>> a simple sort of [id,score] pairs, far simpler than an additional 300 ms
>> of CPU time.
>>
>> Why would more RAM improve my performance, if the bottleneck is the
>> "response-merge" step (a CPU resource)?
>>
>
> If you have not tweaked the Tomcat configuration, that can lead to
> problems, but if your total query volume is really only one query per
> second, this is probably not a worry for you.  A Tomcat connector can be
> configured with a maxThreads parameter.  The recommended value there is
> 10000, but Tomcat defaults to 200.
>
> You didn't include the index sizes.  There's half a million docs per
> shard, but I don't know what that translates to in terms of MB or GB of
> disk space.
>
> On another email thread you mention that your documents are about 50KB
> each.  That would translate to an index that's at least 25GB, possibly
> more.  That email thread also says that optimization takes an hour for you,
> a further indication that you've got some really big indexes.
>
> You're saying that you have given 6GB out of the 12GB to Solr, leaving
> only 6GB for the OS and caching.  Ideally you want to have enough RAM to
> cache the entire index, but in reality you can usually get away with
> caching between half and two thirds of the index.  Exactly what ratio works
> best is highly dependent on your schema.
>
> If my numbers are even close to right, then you've got a lot more index on
> each server than available RAM.  Based on what I can deduce, you would want
> 24 to 48GB of RAM per server.  If my numbers are wrong, then this estimate
> is wrong.
>
> I would be interested in seeing your queries.  If the complexity can be
> expressed as filter queries that get re-used a lot, the filter cache can be
> a major boost to performance.  Solr's caches in general can make a big
> difference.  There is no guarantee that caches will help, of course.
>
> Thanks,
> Shawn
>
>
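Shawn's rule of thumb (cache the whole index if possible, though caching half to two thirds of it is often enough) can be turned into quick arithmetic. This Python sketch assumes all RAM not given to the Solr heap is available to the OS disk cache, which overstates it somewhat because the OS itself also needs memory. The index size per server is an input that would have to be measured, and the function names are hypothetical.

```python
def cached_fraction(index_bytes_per_server: float, os_cache_bytes: float) -> float:
    """Upper bound on the fraction of the on-disk index the OS can keep cached."""
    return min(1.0, os_cache_bytes / index_bytes_per_server)


def ram_needed(index_bytes_per_server: float, heap_bytes: float, fraction: float) -> float:
    """Total RAM for the Solr heap plus caching `fraction` of the index."""
    return heap_bytes + fraction * index_bytes_per_server


def meets_rule_of_thumb(index_bytes_per_server: float, os_cache_bytes: float,
                        min_fraction: float = 0.5) -> bool:
    """True if at least `min_fraction` of the index fits in the OS cache."""
    return cached_fraction(index_bytes_per_server, os_cache_bytes) >= min_fraction


if __name__ == "__main__":
    GB = 1e9
    index = 25 * GB   # ASSUMPTION: about 25GB of index per server
    heap = 6 * GB     # from the thread: 6GB of the 12GB given to Solr
    os_ram = 6 * GB   # remaining 6GB; an upper bound on what the OS can cache
    print(f"cached fraction: {cached_fraction(index, os_ram):.2f}")
    for frac in (0.5, 2 / 3, 1.0):
        total = ram_needed(index, heap, frac) / GB
        print(f"RAM to cache {frac:.0%} of the index: {total:.1f} GB")
```

With those assumed inputs, only about 0.24 of the index can be cached, and caching half to all of it would need roughly 18.5 to 31 GB per server. A larger index per server scales the caching term up proportionally.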