> No, all keyspaces that we created do not have secondary indexes. So probably 
> the settings 'memtable_flush_queue_size' is not relevant? 
It may be. 
If you had a lot a CF's and cassandra tried to flush more than 
memtable_flush_queue_size at once. 

> one would think that the compaction would probably hitting
> the limit I/O of the system and hence, the FlushWriter has to be block.
Maybe sort of. 
The flush writer will block if it cannot flush data fast enough. So it depends 
on the number of CF's being flushed and the speed it is written it. It's 
unlikely that compaction is hurting that much on one machine. 

> nodetool has a command 'setcompactionthroughput', would setting
> the throughput to a lower number to a number lower than 16 helps?
Cannot hurt to try. 

> What other methods to find out if all the nodes in the system is in OK or 
> problem from cassandra point of view? 
Monitoring of whatever flavour. e.g. http://www.datastax.com/products/opscenter

Look in the logs for nodes thinking node-3 is DOWN

> when the logging in cassandra is enable with DEBUG, here is the log I got for 
> node-1
This is node-1 failing to write hints to node-3. 


I forgot to ask, what consistency level are you using for writes ? 
Have you checked the disk health on node-3 ?

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/09/2012, at 1:10 AM, Jason Wee <peich...@gmail.com> wrote:

> Hi Aaron, thank you for your reply. Please read inline comment.
> 
> On Tue, Sep 18, 2012 at 7:36 PM, aaron morton <aa...@thelastpickle.com> wrote:
> What version are you on ?
> cassandra version 1.0.8 
> 
>>  HTimedOutException is logged for all the nodes. 
> TimedOutException happens when less than CL replica nodes respond to the 
> coordinator in time. 
> You could get the error from all nodes in your cluster if the 3 nodes that 
> store the key are having problems. 
> we have check the system load and tpstats for all the nodes in the cluster. 
> Only node 3 having high active and pending 
> count for MutationStage and FlushWriter. 
> 
> 
>> MutationStage 16 2177067 879092633 0 0
> This looks like mutations are blocked or running very very slowly. 
> 
>> FlushWriter 0 0 5616 0 1321
> The All Timed Blocked number means there were 1,321 times a thread tried to 
> flush a memtable but the queue of flushers was full. Do you use secondary 
> indexes ? If so take a look at the comments for memtable_flush_queue_size in 
> the yaml file. 
> 
> No, all keyspaces that we created do not have secondary indexes. So probably 
> the settings 'memtable_flush_queue_size' is not relevant? 
> 
> What I interpret based on the output as above and the output of 
> compactionstats on node-3, one would think that the compaction would probably 
> hitting
> the limit I/O of the system and hence, the FlushWriter has to be block. I 
> read further, nodetool has a command 'setcompactionthroughput', would setting
> the throughput to a lower number to a number lower than 16 helps?
>>   and cluster settings, it should be possible in this scenario, write 
>> success 
>>   on one of the nodes even though node-3 is too busy or failing for any 
>> reason? 
> Yes. 
> If only one node fails to respond the write should succeed. If you got a 
> TimedOut with CL ONE it sounds like more nodes were having problems. 
> We check the nodes by running command like top or nodetool cfstats on all the 
> nodes in the cluster. The load for the rest of the nodes all below 1 and 
> there is no
> active or pending count for all the stages. What other methods to find out if 
> all the nodes in the system is in OK or problem from cassandra point of view? 
> Because 
> if there is a concrete method to gauge all the nodes, we will have better 
> clues where exactly is the problem.
> 
>> * when hector client failover to other nodes, basically all the nodes fail, 
>> why
>>   is this so?
> Sorry I don't understand this question. 
> 
> 14.Sep 12:14:04,430 WARN HConnectionManager [operateWithFailover]: Could not 
> fullfill request on this host CassandraClient<node-1.example.com:9160-88909>
> 14.Sep 12:14:04,430 WARN HConnectionManager [operateWithFailover]: Exception:
> me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
> at 
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
> at 
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:252)
> ..
> ..
> ..
> 14.Sep 12:14:04,302 WARN HConnectionManager [operateWithFailover]: Could not 
> fullfill request on this host CassandraClient<node-2.example.com:9160-89196>
> 14.Sep 12:14:04,302 WARN HConnectionManager [operateWithFailover]: Exception:
> me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
> at 
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
> at 
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:252)
>  
> 
> the log show that similar log with hector trying to failover to all the nine 
> nodes.
> 
>> * what factors that increase MutationStage active and pending values?
> Check the log for ERRORs
> Check for failing or overloaded IO. 
> See comment above about memtable flush queue size. 
> 
> when the logging in cassandra is enable with DEBUG, here is the log I got for 
> node-1
> 
> INFO [HintedHandoff:1] 2012-09-14 12:27:38,955 HintedHandOffManager.java 
> (line 354) Timed out replaying hints to /node-3; aborting further deliveries
> INFO [HintedHandoff:1] 2012-09-14 12:27:38,956 HintedHandOffManager.java 
> (line 373) Finished hinted handoff of 0 rows to endpoint /node-3
> 
> DEBUG [RPC-Thread:577363] 2012-09-14 12:52:12,484 StorageProxy.java (line 
> 212) Write timeout java.util.concurrent.TimeoutException for one (or more) of:
> DEBUG [RPC-Thread:577363] 2012-09-14 12:52:12,484 CassandraServer.java (line 
> 648) ... timed out
>  
> Hope that helps. 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/09/2012, at 4:24 PM, Jason Wee <peich...@gmail.com> wrote:
> 
>> Hello,
>> 
>> A context to our environment, we have a clusters of 9 nodes with a few 
>> keyspaces. The client write to the cluster with consistency level of one to 
>> a keyspace in the cluster with a replication factor of 3. The hector client 
>> is configured such that all the nodes in cluster is specified and so that we 
>> would want to ensure that at any write request, two nodes, can fail and one 
>> write is succcess to the cluster node.
>> 
>> However, under certain situation, we seen in the log, HTimedOutException is 
>> logged during writing to the cluster. Hector client thus failover to the 
>> next node in the cluster but what we noticed is that, the same exception, 
>> HTimedOutException is logged for all the nodes. This result that the cluster 
>> is not working as a whole. Logically, we checked all the nodes in the 
>> cluster for load. Only node-3 seem to have high pending MutationStage when 
>> nodetool tpstats is run. Other nodes are fine with 0 active and 0 pending 
>> for all the stages. 
>> 
>> /nodetool -h localhost tpstats
>> Pool Name Active Pending Completed Blocked All time blocked
>> ReadStage 0 0 11116983 0 0
>> RequestResponseStage 0 0 1252368951 0 0
>> MutationStage 16 2177067 879092633 0 0
>> ReadRepairStage 0 0 3648106 0 0
>> ReplicateOnWriteStage 0 0 33722610 0 0
>> GossipStage 0 0 20504608 0 0
>> AntiEntropyStage 0 0 1197 0 0
>> MigrationStage 0 0 89 0 0
>> MemtablePostFlusher 0 0 5659 0 0
>> StreamStage 0 0 296 0 0
>> FlushWriter 0 0 5616 0 1321
>> MiscStage 0 0 5964 0 0
>> AntiEntropySessions 0 0 88 0 0
>> InternalResponseStage 0 0 27 0 0
>> HintedHandoff 1 2 5976 0 0
>> 
>> Message type Dropped
>> RANGE_SLICE 0
>> READ_REPAIR 0
>> BINARY 0
>> READ 178
>> MUTATION 17467
>> REQUEST_RESPONSE 0
>> 
>> We proceed to check if there is any compaction in node-3 and found out the 
>> following:
>> 
>> ./nodetool -hlocalhost compactionstats
>> pending tasks: 196
>> compaction type keyspace column family bytes compacted bytes total progress
>> Cleanup MyKeyspace MyCF 6946398685 10230720119 67.90%
>> 
>> 
>> Question:
>> * with a replication factor of 3 in the keyspace and client write 
>> consistency 
>>   level of one, in the situation above, and the current hector client 
>> settings 
>>   and cluster settings, it should be possible in this scenario, write 
>> success 
>>   on one of the nodes even though node-3 is too busy or failing for any 
>> reason? 
>>   
>> * when hector client failover to other nodes, basically all the nodes fail, 
>> why
>>   is this so?
>>   
>> * what factors that increase MutationStage active and pending values?
>> 
>> Thank you for any comments and insight
>> 
>> Regards,
>> Jason
> 
> 

Reply via email to