I ended up changing memtable_flush_queue_size to be large enough to contain
the biggest flood I saw.
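
For reference, the knob lives in cassandra.yaml (the default is 4); the
value below is just an illustration of raising it, not a recommendation:

    # cassandra.yaml -- illustrative value only; size it to absorb the
    # largest flush backlog you observe (one slot per memtable waiting
    # to be flushed)
    memtable_flush_queue_size: 8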

I monitored tpstats over time using a collection script and an analysis
script that I wrote to figure out what my largest peaks were.  In my case,
all the mutation drops correlated with hitting the maximum
memtable_flush_queue_size, and the drops stopped as soon as the queue size
fell below the max.
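
The relevant parts of the nodetool tpstats output look roughly like this
on 2.0 (numbers invented for illustration); the FlushWriter "All time
blocked" column and the MUTATION row in the dropped-message summary are
the two figures to watch:

    Pool Name        Active   Pending   Completed   Blocked  All time blocked
    MutationStage         0         0     1203702         0                 0
    FlushWriter           1         5        5213         1               642
    ...

    Message type     Dropped
    MUTATION            1821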

I threw the scripts up on github in case they're useful...

https://github.com/hancockks/tpstats
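
Not the actual scripts from the repo, just a minimal sketch of the
collection side (assumes nodetool is on the PATH; the file name and
interval are placeholders):

    #!/usr/bin/env python
    # Minimal sketch: poll `nodetool tpstats` on an interval and append
    # timestamped raw snapshots to a log file for later analysis.
    import datetime
    import subprocess
    import time

    INTERVAL_SECONDS = 60          # polling interval (illustrative)
    OUTPUT_FILE = "tpstats.log"    # hypothetical output path

    def snapshot():
        # nodetool prints one line per thread pool (FlushWriter included)
        # plus a dropped-message summary; keep the raw text.
        return subprocess.check_output(["nodetool", "tpstats"]).decode()

    if __name__ == "__main__":
        while True:
            with open(OUTPUT_FILE, "a") as f:
                f.write("=== %s ===\n" % datetime.datetime.now().isoformat())
                f.write(snapshot())
                f.write("\n")
            time.sleep(INTERVAL_SECONDS)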




On Fri, Dec 20, 2013 at 1:08 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:

> Thanks for your answers.
>
> *srmore*,
>
> We are using v2.0.0. As for GC, I guess it does not correlate in our
> case: we had Cassandra running for 9 days under production load with no
> dropped messages, and I guess there were a lot of GCs during that time.
>
> *Ken*,
>
> I've checked the values you indicated (FlushWriter "All time blocked").
> Here they are:
>
> node1     6498
> node2     6476
> node3     6642
>
> I guess this is not good :) What can we do to fix this problem?
>
>
> 2013/12/19 Ken Hancock <ken.hanc...@schange.com>
>
>> We had issues where flushes for a number of column families would align
>> and then block writes for a very brief period. If that happened when a
>> bunch of writes came in, we'd see a spike in mutation drops.
>>
>> Check nodetool tpstats for the FlushWriter "All time blocked" count.
>>
>>
>> On Thu, Dec 19, 2013 at 7:12 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:
>>
>>> Hi all!
>>>
>>> We've had a problem with Cassandra recently. There were two one-minute
>>> periods when we got a lot of timeouts on the client side (the only
>>> timeouts during the 9 days we've been running Cassandra in production).
>>> In the logs we found corresponding messages about MUTATION messages
>>> being dropped.
>>>
>>> Now, the official FAQ [1] says that this is an indicator that the load
>>> is too high. We've checked our monitoring and found that the 1-minute
>>> average CPU load had a local peak at the time of the problem, but it was
>>> about 0.8 against the usual 0.2, which I guess is nothing for a 2-core
>>> virtual machine. We've also checked Java threads: there was no peak
>>> there, and their count was reasonable, around 240-250.
>>>
>>> Can anyone give us a hint - what should we monitor to see this "high
>>> load" and what should we tune to make it acceptable?
>>>
>>> Thanks in advance,
>>> Alexander
>>>
>>> [1] http://wiki.apache.org/cassandra/FAQ#dropped_messages
>>>
>>
>
>


-- 
Ken Hancock | System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com |
NASDAQ:SEAC <http://www.schange.com/en-US/Company/InvestorRelations.aspx>

Office: +1 (978) 889-3329
