Re: Cluster not accepting insert while one node is down

Alain RODRIGUEZ Thu, 14 Feb 2013 01:26:16 -0800

Hi Traian,

There is your problem. You are using RF=1, meaning that each node is
responsible for its own range and nothing more. So when a node goes down,
do the math: you can no longer read or write 1/5 of your data.
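
To make that arithmetic concrete, here is a minimal sketch. It assumes a
simplified model that is not Cassandra's actual placement code: 5 nodes
with one equal-sized token range each, no vnodes, and replicas placed on
consecutive nodes around the ring. The function names are hypothetical.

```python
from fractions import Fraction


def replicas_for_range(owner, num_nodes, rf):
    """Nodes holding a copy of the range whose primary owner is `owner`.

    Assumption: replicas sit on the next rf - 1 nodes around the ring.
    """
    return [(owner + i) % num_nodes for i in range(rf)]


def unreachable_fraction(num_nodes, rf, down_nodes, required_replicas):
    """Fraction of ranges with fewer than `required_replicas` live replicas."""
    down = set(down_nodes)
    blocked = 0
    for owner in range(num_nodes):
        live = [n for n in replicas_for_range(owner, num_nodes, rf)
                if n not in down]
        if len(live) < required_replicas:
            blocked += 1
    return Fraction(blocked, num_nodes)


# RF=1, CL=ONE, 5 nodes, one node (index 2) down.
print(unreachable_fraction(5, 1, {2}, 1))  # 1/5
```

Under these assumptions, any request for a key in the down node's range
fails, whichever live node coordinates it.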


This is very good for performance, since each node owns its own part of the
data and any read or write needs to reach only one node, but it makes every
node a SPOF, and removing SPOFs is a main point of using C*. So you have
poor availability and poor consistency.

A usual configuration with 5 nodes would be RF=3, with both CLs (read and
write) set to QUORUM.

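This will replicate your data to 2 nodes in addition to the natural
endpoint (a total of 3 of the 5 nodes owning any given piece of data), and
any read or write will need to reach at least 2 of those 3 replicas before
being considered successful. Since 2 + 2 > 3, every read overlaps the
latest write on at least one replica, which ensures strong consistency.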
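
The quorum arithmetic can be sketched as follows. QUORUM is a strict
majority of the replicas, and reads are guaranteed to see the latest
write when the read and write replica counts add up to more than RF. The
helper names are hypothetical and exist only for illustration.

```python
def quorum(rf):
    """Replicas that must respond at QUORUM: a strict majority of RF."""
    return rf // 2 + 1


def is_strongly_consistent(rf, read_replicas, write_replicas):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(rf, q, q))   # True
print(is_strongly_consistent(rf, 1, 1))   # False: CL ONE for both at RF=3
```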

This configuration allows you to shut down a node (crash or configuration
update/rolling restart) without degrading the service (you can still reach
all of your data), but at the cost of more data on each node.
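
The trade-off can be put into numbers with a small sketch. It assumes an
evenly balanced ring, so each node stores RF/N of the data set. The
function names are hypothetical.

```python
from fractions import Fraction


def tolerated_replica_failures(rf, required_replicas):
    """Replicas of a range that may be down while requests still succeed."""
    return rf - required_replicas


def data_share_per_node(num_nodes, rf):
    """Share of the full data set stored on each node (balanced ring)."""
    return Fraction(rf, num_nodes)


# RF=3 at QUORUM (2 replicas required) versus RF=1 at ONE, on 5 nodes.
print(tolerated_replica_failures(3, 2))  # 1
print(tolerated_replica_failures(1, 1))  # 0
print(data_share_per_node(5, 3))         # 3/5
print(data_share_per_node(5, 1))         # 1/5
```

Under that assumption, going from RF=1 to RF=3 triples the data held by
each node in exchange for surviving one replica failure per range.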

Alain


2013/2/14 Traian Fratean <traian.frat...@gmail.com>

> I am using defaults for both RF and CL. As the keyspace was created using
> cassandra-cli, the default RF should be 1, as I gather from the output below:
>
> [default@TestSpace] describe;
> Keyspace: TestSpace:
>   Replication Strategy:
> org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
>     Options: [datacenter1:1]
>
> As for the CL, it is the Astyanax default, which is 1 for both reads and
> writes.
>
> Traian.
>
>
> 2013/2/13 Alain RODRIGUEZ <arodr...@gmail.com>
>
>> We probably need more info, such as the RF of your cluster and the CL of
>> your reads and writes. Could you also tell us whether you use vnodes?
>>
>> I heard that Astyanax was not running very smoothly on 1.2.0, but a bit
>> better on 1.2.1. Netflix has not yet released a version of Astyanax for
>> C* 1.2.
>>
>> Alain
>>
>>
>> 2013/2/13 Traian Fratean <traian.frat...@gmail.com>
>>
>>> Hi,
>>>
>>> I have a cluster of 5 nodes running Cassandra 1.2.0. I have a Java
>>> client with Astyanax 1.56.21.
>>> When a node (10.60.15.67 - *different* from the one in the stacktrace
>>> below) went down, I got a TokenRangeOfflineException and no other data
>>> gets inserted into *any other* node of the cluster.
>>>
>>> Do I have a configuration issue, or is this supposed to happen?
>>>
>>>
>>> com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81) - com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException()
>>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException()
>>>     at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
>>>     at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
>>>     at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
>>>     at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
>>>     at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
>>>     at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255)
>>>
>>>
>>>
>>> Thank you,
>>> Traian.
>>>
>>
>>
>
