>From the exception, it looks like Astyanax didn't even try to call Cassandra. 
>My guess is that Astyanax is token aware: it detects that the node owning the 
>key is down and doesn't even attempt the write. If you use Hector, it might 
>try the write since it's not token aware, but as Bryan said, it will 
>eventually fail anyway. I guess hinted handoff won't help either, since a 
>hint alone doesn't satisfy CL.ONE.
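>
>For illustration, a minimal hedged sketch of the kind of client setup I mean 
>(cluster/keyspace/seed values are placeholders; TOKEN_AWARE is the Astyanax 
>pool type I believe drives this fail-fast behavior):
>
>    import com.netflix.astyanax.AstyanaxContext;
>    import com.netflix.astyanax.Keyspace;
>    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
>    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
>    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
>    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
>    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
>    import com.netflix.astyanax.thrift.ThriftFamilyFactory;
>
>    // A token-aware pool routes each operation to a replica that owns the
>    // row key; if that replica is down, the client can fail fast with
>    // TokenRangeOfflineException instead of trying another coordinator.
>    AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
>        .forCluster("TestCluster")                      // placeholder name
>        .forKeyspace("TestSpace")
>        .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
>            .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
>            .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
>        .withConnectionPoolConfiguration(
>            new ConnectionPoolConfigurationImpl("MyPool")
>                .setPort(9160)
>                .setSeeds("10.60.15.66:9160"))          // seed from the trace
>        .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
>        .buildKeyspace(ThriftFamilyFactory.getInstance());
>    context.start();
>    Keyspace keyspace = context.getClient();  // getEntity() on older versions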


________________________________
 From: Bryan Talbot <btal...@aeriagames.com>
To: user@cassandra.apache.org 
Sent: Thursday, February 14, 2013 8:30 AM
Subject: Re: Cluster not accepting insert while one node is down
 

Generally data isn't written to whatever node the client connects to.  In your 
case, a row is written to one of the nodes based on the hash of the row key.  
If that one replica node is down, it won't matter which coordinator node you 
send a CL.ONE write to: the write will fail.

If you want the write to succeed, you could do any one of the following: write 
with CL.ANY, increase RF to 2 or more, or write using a row key that hashes to 
an UP node. A sketch of the first option follows.
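
For example, a hedged sketch of a CL.ANY write with Astyanax (assuming a 
"keyspace" handle built the usual way and an existing column family; all names 
here are placeholders):

    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.ConsistencyLevel;
    import com.netflix.astyanax.serializers.StringSerializer;

    ColumnFamily<String, String> CF = new ColumnFamily<String, String>(
        "Standard1", StringSerializer.get(), StringSerializer.get());

    // CL.ANY lets the coordinator store the write as a hint even when the
    // single replica for this key is down; CL.ONE requires a live replica.
    MutationBatch m = keyspace.prepareMutationBatch()
        .setConsistencyLevel(ConsistencyLevel.CL_ANY);
    m.withRow(CF, "some-row-key").putColumn("a-column", "a-value", null);
    m.execute();

Keep in mind that a hinted write isn't readable until the replica comes back 
up and the hint has been delivered.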

-Bryan




On Thu, Feb 14, 2013 at 2:06 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

I will let committers or anyone with knowledge of Cassandra internals answer 
this.
>
>
>From what I understand, you should be able to insert data on any up node with 
>your configuration...
>
>Alain
>
>
>
>2013/2/14 Traian Fratean <traian.frat...@gmail.com>
>
>You're right regarding data availability on that node. And my config, being 
>the default one, is not suited for a cluster.
>>What I don't get is that node .67 was down while I was trying to insert into 
>>node .66, as can be seen from the stack trace. Long story short: when node 
>>.67 was down I could not insert into any machine in the cluster. Not what I 
>>was expecting.
>>
>>
>>Thank you for the reply!
>>Traian.
>>
>>
>>2013/2/14 Alain RODRIGUEZ <arodr...@gmail.com>
>>
>>Hi Traian,
>>>
>>>
>>>There is your problem. You are using RF=1, meaning that each node is 
>>>responsible for its own range, and nothing more. So when a node goes down, 
>>>do the math: you just can't read (or write) 1/5 of your data.
>>>
>>>
>>>This is very good for performance, since each node owns its own part of the 
>>>data and any read or write needs to reach only one node, but it reintroduces 
>>>single points of failure, and avoiding SPOFs is a main point of using C*. So 
>>>you have poor availability and poor consistency.
>>>
>>>
>>>A usual configuration with 5 nodes would be RF=3 and both CLs (read & write) = QUORUM.
>>>
>>>
>>>This will replicate your data to the natural endpoint plus 2 more nodes (a 
>>>total of 3 of the 5 nodes own any given row), and any read or write will 
>>>need to reach at least 2 of those 3 replicas before being considered 
>>>successful, ensuring strong consistency (2 + 2 > 3, so any quorum read 
>>>overlaps any quorum write).
>>>
>>>
>>>This configuration allows you to take a node down (crash, configuration 
>>>update, rolling restart) without losing the service (you can still reach 
>>>all of your data), at the cost of storing more data on each node. A sketch 
>>>of the matching client-side settings follows.
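>>>
>>>On the client side, a minimal hedged sketch of the matching Astyanax 
>>>defaults (method names from AstyanaxConfigurationImpl; adjust to taste):
>>>
>>>    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
>>>    import com.netflix.astyanax.model.ConsistencyLevel;
>>>
>>>    // Make QUORUM the default for both reads and writes so that
>>>    // R + W > RF (2 + 2 > 3) holds for every request by default.
>>>    AstyanaxConfigurationImpl config = new AstyanaxConfigurationImpl()
>>>        .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_QUORUM)
>>>        .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_QUORUM);
>>>
>>>On the server side, raising RF would be a cassandra-cli statement along the 
>>>lines of "update keyspace TestSpace with strategy_options = {datacenter1:3};" 
>>>(hedged: check the exact syntax for your version), followed by a repair on 
>>>each node, since changing RF does not move existing data by itself.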
>>>
>>>Alain
>>>
>>>
>>>
>>>2013/2/14 Traian Fratean <traian.frat...@gmail.com>
>>>
>>>I am using the defaults for both RF and CL. As the keyspace was created 
>>>using cassandra-cli, the default RF should be 1, as shown below:
>>>>
>>>>
>>>>[default@TestSpace] describe;
>>>>Keyspace: TestSpace:
>>>>  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>>>>  Durable Writes: true
>>>>    Options: [datacenter1:1]
>>>>
>>>>
>>>>As for the CL, it is the Astyanax default, which is ONE for both reads and writes.
>>>>
>>>>Traian.
>>>>
>>>>
>>>>
>>>>2013/2/13 Alain RODRIGUEZ <arodr...@gmail.com>
>>>>
>>>>We probably need more info, like the RF of your cluster and the CL of your 
>>>>reads and writes. Could you also tell us whether you use vnodes or not?
>>>>>
>>>>>
>>>>>I heard that Astyanax was not running very smoothly on 1.2.0, and a bit 
>>>>>better on 1.2.1. That said, Netflix hasn't released a version of Astyanax 
>>>>>built specifically for C* 1.2.
>>>>>
>>>>>Alain
>>>>>
>>>>>
>>>>>
>>>>>2013/2/13 Traian Fratean <traian.frat...@gmail.com>
>>>>>
>>>>>Hi,
>>>>>>
>>>>>>
>>>>>>I have a cluster of 5 nodes running Cassandra 1.2.0 and a Java 
>>>>>>client with Astyanax 1.56.21.
>>>>>>When a node (10.60.15.67 - different from the one in the stack trace 
>>>>>>below) went down, I got a TokenRangeOfflineException and no other data 
>>>>>>got inserted into any other node in the cluster.
>>>>>>
>>>>>>
>>>>>>Am I having a configuration issue, or is this supposed to happen?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81)
>>>>>> - 
>>>>>>com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>>>>> TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, 
>>>>>>latency=2057(2057), attempts=1]UnavailableException()
>>>>>>com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>>>>> TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, 
>>>>>>latency=2057(2057), attempts=1]UnavailableException()
>>>>>>at 
>>>>>>com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
>>>>>>at 
>>>>>>com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
>>>>>>at 
>>>>>>com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
>>>>>>at 
>>>>>>com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
>>>>>>at 
>>>>>>com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
>>>>>>at 
>>>>>>com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>Thank you,
>>>>>>Traian.
>>>>>
>>>>
>>>
>>
>
