Alain,

Reducing compaction throughput is having a significant impact in lowering
response times for us, especially at the 90th percentile.

For the record, we are using AWS i2.2xlarge instance types (SSD-backed).
We were running compaction_throughput_mb_per_sec at 18; now we are running
at 10. Latency variation for reads is hugely reduced. This is very
promising.
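
In case it helps anyone else, this is roughly how we applied the change
(exact syntax and file locations vary by version; 10 is just the value we
chose):

    # change the throttle at runtime on one node (reverts on restart)
    nodetool setcompactionthroughput 10

    # make it permanent in cassandra.yaml, then restart the node
    compaction_throughput_mb_per_sec: 10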

Thanks, Alain.

Best,
Carl


On Fri, Jun 26, 2015 at 7:40 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Here is something I wrote some time ago:
>
>
> http://planetcassandra.org/blog/interview/video-advertising-platform-teads-chose-cassandra-spm-and-opscenter-to-monitor-a-personalized-ad-experience/
>
> Monitoring is absolutely necessary to understand what is happening in the
> system. There is no magic there; once you find the bottlenecks, you can
> think about how to alleviate them. I would say it matters at least as much
> as the design of your data models.
>
> "I've lowered compaction threshhold from 18 to 10mb/s. Will see what
> happens."
> If you have no SSD and compactions are creating a bottleneck at the disk
> the disk, this looks reasonable as long as the "compactions pending" metric
> remains low enough.
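>
> A quick way to watch that on a node, for example:
>
>   nodetool compactionstats   # shows pending tasks and running compactions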
>
> If it is a CPU issue and you have many cores, I would advise you to try
> lowering the concurrent_compactors number (by default, one compactor per
> core).
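>
> For example, in cassandra.yaml (2 is only an illustration; pick a value
> that fits your hardware):
>
>   concurrent_compactors: 2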
>
> Once again, it will depend on where the pressure is. Anyway, you might want
> to test anything you try on one node only first. Also, change one option at
> a time (or a couple that you believe will have a synergy), and monitor how
> things evolve.
>
> C*heers,
>
> Alain
>
> 2015-06-26 21:30 GMT+02:00 Carl Hu <m...@carlhu.com>:
>
>> Thank you, Alain, for the response. We're using 2.1 indeed. I've lowered
>> the compaction throughput from 18 to 10 MB/s. Will see what happens.
>>
>> >  I hope you have a monitoring tool up and running and an easy way to
>> detect errors in your logs.
>>
>> We do not have this. What do you use for this?
>>
>> Thank you,
>> Carl
>>
>>
>> On Fri, Jun 26, 2015 at 11:26 AM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>>> "It is not possible to mix sequential repair and incremental repairs."
>>>
>>> I guess that is a system limitation, though I am not sure of it (I
>>> haven't used C* 2.1 yet).
>>>
>>> I would focus on tuning your repair by:
>>> - Monitoring performance / logs (to see why the cluster hangs)
>>> - Using subrange repairs (as a workaround to the Merkle tree 32K limit),
>>> or at least running repair per table (see the example commands below and
>>> http://www.datastax.com/dev/blog/advanced-repair-techniques)
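>>>
>>> For example, a per-table repair or a subrange repair with -st / -et
>>> (the keyspace/table names and token values below are placeholders):
>>>
>>>   nodetool repair my_keyspace my_table
>>>   nodetool repair -st <start_token> -et <end_token> my_keyspace my_table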
>>>
>>> It is hard to help you without knowing the root issue that makes your
>>> cluster hang.
>>>
>>> - If CPU is the limit, some tuning around compactions or GC might be
>>> needed (or a few other things)
>>> - If you have disk I/O limitations, you might want to add machines or
>>> tune the compaction throughput
>>> - If your network is the issue, there are commands to tune the bandwidth
>>> used by streams (see the example commands after this list).
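>>>
>>> For example (the values are only illustrations, and note the different
>>> units):
>>>
>>>   nodetool setcompactionthroughput 16   # compaction throttle, in MB/s
>>>   nodetool setstreamthroughput 100      # stream throttle, in Mbit/s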
>>>
>>> You will need to troubleshoot this and give us more information. I hope
>>> you have a monitoring tool up and running and an easy way to detect
>>> errors in your logs.
>>>
>>> C*heers,
>>>
>>> Alain
>>>
>>> 2015-06-26 16:26 GMT+02:00 Carl Hu <m...@carlhu.com>:
>>>
>>>> Dear colleagues,
>>>>
>>>> We are using incremental repair and have noticed that every few
>>>> repairs, the cluster experiences pauses.
>>>>
>>>> We run the repair with the following command: nodetool repair -par -inc
>>>>
>>>> I have tried to run it not in parallel, but get the following error:
>>>> "It is not possible to mix sequential repair and incremental repairs."
>>>>
>>>> Does anyone have any suggestions?
>>>>
>>>> Many thanks in advance,
>>>> Carl
>>>>
>>>>
>>>
>>
>
