Hi Aaron, thank you for your comment.

>    It may be.
>    If you had a lot of CF's and Cassandra tried to flush more than
>    memtable_flush_queue_size at once.

We created 6 keyspaces, with at most 5 CFs in any one keyspace. The setting
'memtable_flush_queue_size' is using the default value, which is 4.
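For reference, this is roughly how we verify the setting on each node (the
cassandra.yaml path below is an assumption based on a packaged install;
adjust for your layout):

# print the flush queue setting from the node's config
grep -n 'memtable_flush_queue_size' /etc/cassandra/conf/cassandra.yaml
# with the defaults we expect to see:
# memtable_flush_queue_size: 4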

>    Maybe sort of.
>    The flush writer will block if it cannot flush data fast enough. So it
>    depends on the number of CF's being flushed and the speed it is written at.
>    It's unlikely that compaction is hurting that much on one machine.

Good point. I went through the log on node-3 and found that GCInspector was
trying to free up heap. The log shows it flushing a lot of CFs, then the flush
messages stop and only the following two lines keep appearing until we
restarted node-3. During that time the hector client was reporting
HTimedOutException and failing on all the nodes.

ERROR [Selector-Thread-1] 2012-09-14 07:43:07,795 TNonblockingServer.java
(line 467) Read an invalid frame size of 0. Are you using TFramedTransport
on the client side?
ERROR [Selector-Thread-0] 2012-09-14 07:43:08,311 TNonblockingServer.java
(line 467) Read an invalid frame size of 0. Are you using TFramedTransport
on the client side?
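To separate GC pressure from flush backlog, I am pulling the relevant lines
out of system.log roughly like this (the log path is from our install and the
exact message wording is from memory, so treat it as a sketch):

# GC activity reported by GCInspector around the incident
grep 'GCInspector' /var/log/cassandra/system.log | tail -n 20
# memtable flush activity for the same window
egrep 'Enqueuing flush|Completed flushing' /var/log/cassandra/system.log | tail -n 20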

>    Cannot hurt to try.

Thanks, I will start with "iostat -x -m 1" and use the value in the wMB/s
column of the iostat output as a starting point for the setting.
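Roughly what I plan to run; the device to watch and the final throughput
value are placeholders until we see real numbers:

# sample extended disk stats every second, throughput in MB/s;
# watch the wMB/s column for the data disk during the incident window
iostat -x -m 1
# then, as an experiment, drop compaction throughput below the default 16 MB/s
./nodetool -h localhost setcompactionthroughput 8   # 8 MB/s is an arbitrary trial value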


>    Monitoring of whatever flavour. e.g.
>    http://www.datastax.com/products/opscenter
>    Look in the logs for nodes thinking node-3 is DOWN

Yes, we do have OpsCenter monitoring for the cluster. What we noticed in
OpsCenter is that node-3 is shown doing compaction and it is up.
In the OpsCenter performance metrics, the 'Write Request Latency' graph spiked
to a maximum of 453652.23 us/op, whereas the write latency usually hovers
around 3000 us/op on average.
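Besides OpsCenter, we can sweep the cluster from a shell as a cross-check
(the host names and log path are from our setup, and the exact wording of the
gossip "down" message may differ in 1.0.8, so this is only a sketch):

# check thread pool backlog on every node
for h in node-1 node-2 node-3 node-4 node-5 node-6 node-7 node-8 node-9; do
    echo "== $h =="
    ./nodetool -h $h tpstats | egrep 'MutationStage|FlushWriter'
done

# look for other nodes marking node-3 down in their logs
egrep 'is now (dead|DOWN)' /var/log/cassandra/system.log | grep node-3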




>    I forgot to ask, what consistency level are you using for writes ?
>    Have you checked the disk health on node-3 ?

Consistency level for writes? As mentioned in the first message, we are using
CL ONE. If you mean the read CL, for reads we are using CL QUORUM.
No, we have not checked the disk health yet; we will install smartmontools.
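Once smartmontools is installed, the plan is something like this (the device
name is just an example; ours may differ, and disks behind a RAID controller
may need the -d option):

# quick overall health verdict
smartctl -H /dev/sda
# full attribute dump; watch Reallocated_Sector_Ct and Current_Pending_Sector
smartctl -a /dev/sda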


On Thu, Sep 20, 2012 at 6:56 AM, aaron morton <aa...@thelastpickle.com> wrote:

>
> None of the keyspaces we created have secondary indexes. So probably the
> setting 'memtable_flush_queue_size' is not relevant?
>
> It may be.
> If you had a lot of CF's and Cassandra tried to flush more than
> memtable_flush_queue_size at once.
>
> one would think that the compaction is probably hitting the I/O limit of
> the system and hence the FlushWriter has to block.
>
> Maybe sort of.
> The flush writer will block if it cannot flush data fast enough. So it
> depends on the number of CF's being flushed and the speed it is written at.
> It's unlikely that compaction is hurting that much on one machine.
>
> nodetool has a command 'setcompactionthroughput'; would setting
> the throughput to a number lower than 16 help?
>
> Cannot hurt to try.
>
> What other methods are there to find out whether all the nodes in the
> system are OK or having problems, from Cassandra's point of view?
>
> Monitoring of whatever flavour. e.g.
> http://www.datastax.com/products/opscenter
>
> Look in the logs for nodes thinking node-3 is DOWN
>
> When DEBUG logging is enabled in Cassandra, here is the log I got
> for node-1
>
> This is node-1 failing to write hints to node-3.
>
>
> I forgot to ask, what consistency level are you using for writes ?
> Have you checked the disk health on node-3 ?
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/09/2012, at 1:10 AM, Jason Wee <peich...@gmail.com> wrote:
>
> Hi Aaron, thank you for your reply. Please see my comments inline.
>
> On Tue, Sep 18, 2012 at 7:36 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> What version are you on ?
>>
> cassandra version 1.0.8
>
>>
>>  HTimedOutException is logged for all the nodes.
>>
>> TimedOutException happens when less than CL replica nodes respond to the
>> coordinator in time.
>> You could get the error from all nodes in your cluster if the 3 nodes
>> that store the key are having problems.
>>
> We have checked the system load and tpstats for all the nodes in the
> cluster. Only node-3 has high active and pending
> counts for MutationStage and FlushWriter.
>
>
>> MutationStage 16 2177067 879092633 0 0
>>
>> This looks like mutations are blocked or running very very slowly.
>>
>> FlushWriter 0 0 5616 0 1321
>>
>> The All Time Blocked number means there were 1,321 times a thread tried
>> to flush a memtable but the queue of flushers was full. Do you use
>> secondary indexes ? If so take a look at the comments for
>> memtable_flush_queue_size in the yaml file.
>>
> None of the keyspaces we created have secondary indexes. So probably the
> setting 'memtable_flush_queue_size' is not relevant?
>
> What I interpret from the output above and the output of compactionstats
> on node-3 is that the compaction is probably hitting the I/O limit of the
> system and hence the FlushWriter has to block. Reading further, nodetool
> has a command 'setcompactionthroughput'; would setting the throughput to a
> number lower than 16 help?
>
>>   and cluster settings, should it not be possible in this scenario for the
>>   write to succeed on one of the nodes even though node-3 is too busy or
>>   failing for any reason?
>>
>> Yes.
>> If only one node fails to respond the write should succeed. If you got a
>> TimedOut with CL ONE it sounds like more nodes were having problems.
>>
> We checked the nodes by running commands like top or nodetool cfstats on
> all the nodes in the cluster. The load on the rest of the nodes is all
> below 1 and there are no active or pending counts for any of the stages.
> What other methods are there to find out whether all the nodes in the
> system are OK or having problems, from Cassandra's point of view? Because
> if there is a concrete method to gauge all the nodes, we will have better
> clues about where exactly the problem is.
>
>>
>> * When the hector client fails over to other nodes, basically all the
>>   nodes fail; why is this so?
>>
>> Sorry I don't understand this question.
>>
>
> 14.Sep 12:14:04,430 WARN HConnectionManager [operateWithFailover]: Could
> not fullfill request on this host CassandraClient<node-1.example.com
> :9160-88909>
> 14.Sep 12:14:04,430 WARN HConnectionManager [operateWithFailover]:
> Exception:
> me.prettyprint.hector.api.exceptions.HTimedOutException:
> TimedOutException()
> at
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
> at
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:252)
> ..
> ..
> ..
> 14.Sep 12:14:04,302 WARN HConnectionManager [operateWithFailover]: Could
> not fullfill request on this host CassandraClient<node-2.example.com
> :9160-89196>
> 14.Sep 12:14:04,302 WARN HConnectionManager [operateWithFailover]:
> Exception:
> me.prettyprint.hector.api.exceptions.HTimedOutException:
> TimedOutException()
> at
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
> at
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:252)
>
>
> The log shows similar entries, with hector trying to fail over to all
> nine nodes.
>
>>
>> * What factors increase the MutationStage active and pending values?
>>
>> Check the log for ERRORs
>> Check for failing or overloaded IO.
>> See comment above about memtable flush queue size.
>>
> When DEBUG logging is enabled in Cassandra, here is the log I got
> for node-1
>
> INFO [HintedHandoff:1] 2012-09-14 12:27:38,955 HintedHandOffManager.java
> (line 354) Timed out replaying hints to /node-3; aborting further deliveries
> INFO [HintedHandoff:1] 2012-09-14 12:27:38,956 HintedHandOffManager.java
> (line 373) Finished hinted handoff of 0 rows to endpoint /node-3
>
> DEBUG [RPC-Thread:577363] 2012-09-14 12:52:12,484 StorageProxy.java (line
> 212) Write timeout java.util.concurrent.TimeoutException for one (or more)
> of:
> DEBUG [RPC-Thread:577363] 2012-09-14 12:52:12,484 CassandraServer.java
> (line 648) ... timed out
>
>
>> Hope that helps.
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 18/09/2012, at 4:24 PM, Jason Wee <peich...@gmail.com> wrote:
>>
>> Hello,
>>
>> Some context on our environment: we have a cluster of 9 nodes with a few
>> keyspaces. The client writes with a consistency level of ONE to a keyspace
>> that has a replication factor of 3. The hector client is configured with
>> all the nodes in the cluster specified, so that on any write request two
>> nodes can fail and the write still succeeds on one node.
>>
>> However, in certain situations we see HTimedOutException logged while
>> writing to the cluster. The hector client then fails over to the next node
>> in the cluster, but what we noticed is that the same HTimedOutException is
>> logged for all the nodes. The result is that the cluster is not working as
>> a whole. Naturally, we checked the load on all the nodes in the cluster.
>> Only node-3 seems to have a high pending MutationStage when nodetool
>> tpstats is run. The other nodes are fine, with 0 active and 0 pending for
>> all the stages.
>>
>> /nodetool -h localhost tpstats
>> Pool Name              Active  Pending   Completed  Blocked  All time blocked
>> ReadStage                   0        0    11116983        0                 0
>> RequestResponseStage        0        0  1252368951        0                 0
>> MutationStage              16  2177067   879092633        0                 0
>> ReadRepairStage             0        0     3648106        0                 0
>> ReplicateOnWriteStage       0        0    33722610        0                 0
>> GossipStage                 0        0    20504608        0                 0
>> AntiEntropyStage            0        0        1197        0                 0
>> MigrationStage              0        0          89        0                 0
>> MemtablePostFlusher         0        0        5659        0                 0
>> StreamStage                 0        0         296        0                 0
>> FlushWriter                 0        0        5616        0              1321
>> MiscStage                   0        0        5964        0                 0
>> AntiEntropySessions         0        0          88        0                 0
>> InternalResponseStage       0        0          27        0                 0
>> HintedHandoff               1        2        5976        0                 0
>>
>> Message type      Dropped
>> RANGE_SLICE             0
>> READ_REPAIR             0
>> BINARY                  0
>> READ                  178
>> MUTATION            17467
>> REQUEST_RESPONSE        0
>>
>> We proceeded to check whether any compaction was running on node-3 and
>> found the following:
>>
>> ./nodetool -hlocalhost compactionstats
>> pending tasks: 196
>> compaction type   keyspace     column family   bytes compacted   bytes total   progress
>>         Cleanup   MyKeyspace   MyCF                 6946398685   10230720119     67.90%
>>
>>
>> Questions:
>> * With a replication factor of 3 on the keyspace and a client write
>>   consistency level of ONE, given the current hector client settings
>>   and cluster settings, should it not be possible in this scenario for the
>>   write to succeed on one of the nodes even though node-3 is too busy or
>>   failing for any reason?
>>
>> * When the hector client fails over to other nodes, basically all the
>>   nodes fail; why is this so?
>>
>> * What factors increase the MutationStage active and pending values?
>>
>> Thank you for any comments and insight
>>
>> Regards,
>> Jason
>>
>>
>>
>
>
