> I ended up changing memtable_flush_queue_size to be large enough to contain 
> the biggest flood I saw.
As part of the flush process the “Switch Lock” is taken to synchronise around 
the commit log. It is a reentrant read-write lock: the flush path takes the 
write lock and the write path takes the read lock. When flushing a CF the 
write lock is taken, the commit log is updated, and the memtable is added to 
the flush queue. If the queue is full, the write lock is held until a slot 
frees up, blocking the write threads from taking the read lock. 
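
As a rough illustration of that pattern (a minimal Java sketch of the idea, 
not Cassandra's actual code; the class, method and queue names are made up):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Sketch of the switch-lock pattern described above.
    class SwitchLockSketch {
        private final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();

        // Write path: many writer threads share the read half of the lock.
        void write(Object mutation) {
            switchLock.readLock().lock();
            try {
                // append to the commit log and the current memtable ...
            } finally {
                switchLock.readLock().unlock();
            }
        }

        // Flush path: takes the exclusive write half while switching memtables.
        void switchMemtable(BlockingQueue<Object> flushQueue, Object oldMemtable)
                throws InterruptedException {
            switchLock.writeLock().lock();
            try {
                // Mark the commit log position, then queue the old memtable for
                // flushing. If the queue is full this put() blocks while the
                // write lock is still held, stalling every writer waiting on
                // the read lock above.
                flushQueue.put(oldMemtable);
            } finally {
                switchLock.writeLock().unlock();
            }
        }
    }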

There are a few reasons why the queue may fill up. The simplest is that disk 
IO is not fast enough to keep up with the flush rate. Others are that the 
commit log segments are too small, that there are lots of CFs and/or secondary 
indexes (each index is maintained as its own internal CF, so it adds to the 
memtables that need flushing), or that nodetool flush is called frequently. 

Increasing the size of the queue is a good workaround, and the correct 
approach if you have a lot of CFs and/or secondary indexes. 
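
The queue depth is set in cassandra.yaml; the stock default is 4, and the 
value below is only an illustration, so size it to the largest flush backlog 
you actually see:

    memtable_flush_queue_size: 8

Keep in mind each extra slot is another memtable that can sit in memory 
waiting to be written out.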

Hope that helps.


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/12/2013, at 6:03 am, Ken Hancock <ken.hanc...@schange.com> wrote:

> I ended up changing memtable_flush_queue_size to be large enough to contain 
> the biggest flood I saw.
> 
> I monitored tpstats over time using a collection script and an analysis 
> script that I wrote to figure out what my largest peaks were. In my case, 
> all my mutation drops correlated with hitting the maximum 
> memtable_flush_queue_size, and the drops stopped as soon as the queue size 
> dropped below the max. 
> 
> I threw the scripts up on github in case they're useful...
> 
> https://github.com/hancockks/tpstats
> 
> 
> 
> 
> On Fri, Dec 20, 2013 at 1:08 AM, Alexander Shutyaev <shuty...@gmail.com> 
> wrote:
> Thanks for your answers.
> 
> srmore,
> 
> We are using v2.0.0. As for GC, I guess it does not correlate in our case, 
> because we had Cassandra running for 9 days under production load with no 
> dropped messages, and I guess there were a lot of GCs during that time. 
> 
> Ken,
> 
> I've checked the values you indicated. Here they are:
> 
> node1     6498
> node2     6476
> node3     6642
> 
> I guess this is not good :) What can we do to fix this problem?
> 
> 
> 2013/12/19 Ken Hancock <ken.hanc...@schange.com>
> We had issues where flushes for a number of CFs would line up and then block 
> writes for a very brief period. If that happened when a burst of writes came 
> in, we'd see a spike in mutation drops. 
> 
> Check nodetool tpstats for the FlushWriter "All time blocked" count.
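> For example (on any node with nodetool on the path):
> 
>     nodetool tpstats | grep FlushWriter
> 
> The last column is "All time blocked"; if it keeps growing, writes have been 
> stalling behind a full flush queue.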
> 
> 
> On Thu, Dec 19, 2013 at 7:12 AM, Alexander Shutyaev <shuty...@gmail.com> 
> wrote:
> Hi all!
> 
> We've had a problem with Cassandra recently. There were two one-minute 
> periods when we got a lot of timeouts on the client side (the only timeouts 
> in the 9 days we have been running Cassandra in production). In the logs 
> we've found corresponding messages saying that MUTATION messages were 
> dropped. 
> 
> Now, the official FAQ [1] says that this is an indicator that the load is 
> too high. We've checked our monitoring and found that the 1-minute average 
> CPU load had a local peak at the time of the problem, but it was about 0.8 
> against the usual 0.2, which I guess is nothing for a 2-core virtual 
> machine. We've also checked Java threads: there was no peak there and their 
> count was reasonable, ~240-250. 
> 
> Can anyone give us a hint - what should we monitor to see this "high load" 
> and what should we tune to make it acceptable?
> 
> Thanks in advance,
> Alexander
> 
> [1] http://wiki.apache.org/cassandra/FAQ#dropped_messages
> 
> 
> 
> -- 
> Ken Hancock | System Architect, Advanced Advertising 
> SeaChange International 
> 50 Nagog Park
> Acton, Massachusetts 01720
> ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC 
> Office: +1 (978) 889-3329 |  ken.hanc...@schange.com | hancockks | hancockks  
> 
