Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-23 Thread Daniel Seybold
a RF of 3 (in the 5 node cluster) but the downtime in case of a node failure persists. I also attached two plots which show the results with the downtimes for using the larger VMs and setting the RF to 3. Any further comments much appreciated, Cheers, Daniel

Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-16 Thread Alexander Dejanovski
survive multiple-node failures. We have also tried a RF of 3 (in the 5 node cluster) but the downtime in case of a node failure persists. I also attached two plots which show the results with the downtimes for using the larger VMs and setting the RF to 3. Any further

Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-16 Thread Daniel Seybold
survive multiple-node failures. We have also tried a RF of 3 (in the 5 node cluster) but the downtime in case of a node failure persists. I also attached two plots which show the results with the downtimes for using the larger VMs and setting the RF to 3. Any further comments much appreciated

RE: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-09 Thread Durity, Sean R
: Friday, November 09, 2018 5:49 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure Hi Apache Cassandra experts, we are running a set of availability evaluations under a write/read/update

Re: Node Failure Scenario

2017-11-15 Thread Anshu Vajpayee
- commitlog/ - saved_caches/ Forget rejoining with repair -- it will just cause more problems. Cheers! On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee <anshu.vajp

Re: Node Failure Scenario

2017-11-14 Thread Jonathan Haddad
the contents of the following directories: - data/ - commitlog/ - saved_caches/ Forget rejoining with repair -- it will just cause more problems. Cheers! On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee <

Re: Node Failure Scenario

2017-11-14 Thread Anshu Vajpayee
contents of the following directories: - data/ - commitlog/ - saved_caches/ Forget rejoining with repair -- it will just cause more problems. Cheers! On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee wrote: Hi

Re: Node Failure Scenario

2017-11-13 Thread Anthony Grasso
the following directories: - data/ - commitlog/ - saved_caches/ Forget rejoining with repair -- it will just cause more problems. Cheers! On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee wrote: Hi All, There was a node failure in on
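A minimal shell sketch of the clean-up step described in this thread, assuming the default package paths under /var/lib/cassandra (the actual data_file_directories, commitlog_directory and saved_caches_directory come from cassandra.yaml and may differ on your hosts):

    # Stop Cassandra before touching its on-disk state.
    sudo service cassandra stop
    # Clear the directories named above so the node comes back empty
    # (paths are the common package defaults; check cassandra.yaml first).
    sudo rm -rf /var/lib/cassandra/data/*
    sudo rm -rf /var/lib/cassandra/commitlog/*
    sudo rm -rf /var/lib/cassandra/saved_caches/*
    # The node is then brought back as a replacement (see the replace_address
    # discussion below) rather than rejoined and repaired.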

Re: Node Failure Scenario

2017-11-12 Thread Erick Ramirez
All, There was a node failure in one of our production clusters due to a disk failure. After h/w recovery that node is now ready to be part of the cluster, but it doesn't have any data due to the disk crash. I can think of the following options: 1. repla

Node Failure Scenario

2017-11-12 Thread Anshu Vajpayee
Hi All, There was a node failure in one of our production clusters due to a disk failure. After h/w recovery that node is now ready to be part of the cluster, but it doesn't have any data due to the disk crash. I can think of the following options: 1. replace the node with the same, using replace_address 2

Re: Node failure

2017-10-06 Thread Jon Haddad
I’ve had a few use cases for downgrading consistency over the years. If you’re showing a customer dashboard w/ some Ad summary data, it’s great to be right, but showing a number that’s close is better than not being up. > On Oct 6, 2017, at 1:32 PM, Jeff Jirsa wrote: > > I think it was Brando

Re: Node failure

2017-10-06 Thread Jeff Jirsa
I think it was Brandon that used to make a pretty compelling argument that downgrading consistency on writes was always wrong, because if you can tolerate the lower consistency, you should just use the lower consistency from the start (because cassandra is still going to send the write to all repli

Re: Node failure

2017-10-06 Thread Jim Witschey
> Modern client drivers also have ways to “downgrade” the CL of requests, in > case they fail. E.g. for the Java driver: > http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html Quick note from a driver dev's perspective: Mark,

RE: Node failure

2017-10-06 Thread Mark Furlong
I’ll check to see what our app is using. Thanks Mark 801-705-7115 office From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] Sent: Friday, October 6, 2017 12:25 PM To: user@cassandra.apache.org Subject: RE: Node failure QUORUM should succeed with a RF=3 and 2 of 3 nodes
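As a quick check of that claim, the standard quorum formula can be worked through in a couple of lines of shell (nothing here is specific to the poster's cluster; RF=3 is just the value under discussion):

    RF=3
    # quorum = floor(RF / 2) + 1
    echo $(( RF / 2 + 1 ))   # prints 2, so QUORUM keeps working with 2 of 3 replicas up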

RE: Node failure

2017-10-06 Thread Steinmaurer, Thomas
/DowngradingConsistencyRetryPolicy.html Thomas From: Mark Furlong [mailto:mfurl...@ancestry.com] Sent: Friday, 06 October 2017 19:43 To: user@cassandra.apache.org Subject: RE: Node failure Thanks for the detail. I’ll have to remove and then add one back in. It’s my consistency levels that may bite me in the interim. Thanks

RE: Node failure

2017-10-06 Thread Mark Furlong
We are using quorum on our reads and writes. Thanks Mark 801-705-7115 office From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Friday, October 6, 2017 11:30 AM To: cassandra Subject: Re: Node failure If you write with CL:ANY, CL:ONE (or LOCAL_ONE), and one node fails, you may lose data that

RE: Node failure

2017-10-06 Thread Mark Furlong
Thanks for the detail. I’ll have to remove and then add one back in. It’s my consistency levels that may bite me in the interim. Thanks Mark 801-705-7115 office From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Friday, October 6, 2017 11:29 AM To: cassandra Subject: Re: Node failure There

Re: Node failure

2017-10-06 Thread Jeff Jirsa
should be aware of? Thanks Mark 801-705-7115 office From: Akshit Jain [mailto:akshit13...@iiitd.ac.in] Sent: Friday, October 6, 2017 11:25 AM To: user@cassandra.apache.org Subject: Re: Node f

RE: Node failure

2017-10-06 Thread Mark Furlong
The only time I’ll have a problem is if I have to do a read ALL or write ALL. Any other gotchas I should be aware of? Thanks Mark 801-705-7115 office From: Akshit Jain [mailto:akshit13...@iiitd.ac.in] Sent: Friday, October 6, 2017 11:25 AM To: user@cassandra.apache.org Subject: Re: Node failure

Re: Node failure

2017-10-06 Thread Akshit Jain
You replace it with a new node and bootstrapping happens. The new node receives data from the other two nodes. The rest depends on the scenario you are asking about. Regards Akshit Jain B-Tech,2013124 9891724697 On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong wrote: > What happens when I have a 3 node cluster

Re: Node failure

2017-10-06 Thread Jeff Jirsa
There's a lot to talk about here; what's your exact question? - You can either remove it from the cluster or replace it. You typically remove it if it'll never be replaced, but with RF=3 and 3 nodes, you probably need to replace it. To replace, you'll start a new server with -Dcassandra.replace_address
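A hedged sketch of that replacement procedure, assuming a package install where JVM options are appended in cassandra-env.sh; the IP below is a placeholder for the dead node's address, and newer releases also accept -Dcassandra.replace_address_first_boot:

    # On the brand-new replacement host, before the first start:
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"' \
        | sudo tee -a /etc/cassandra/cassandra-env.sh   # 10.0.0.12 = example IP of the dead node
    sudo service cassandra start
    # Watch the dead node's ranges stream in, then confirm the old entry is gone:
    nodetool netstats
    nodetool status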

Node failure

2017-10-06 Thread Mark Furlong
What happens when I have a 3 node cluster with RF 3 and a node fails that needs to be removed? Mark Furlong Sr. Database Administrator mfurl...@ancestry.com M: 801-859-7427 O: 801-705-7115 1300 W Traverse Pkwy Lehi, UT 84043

RE: Node failure Due To Very high GC pause time

2017-07-13 Thread Durity, Sean R
distributed across all of your cluster. And you want to delete whole partitions, if at all possible. (Or at least a reasonable number of deletes within a partition.) Sean Durity From: Karthick V [mailto:karthick...@zohocorp.com] Sent: Monday, July 03, 2017 12:47 PM To: user Subject: Re: Node failure Due

RE: Node failure Due To Very high GC pause time

2017-07-03 Thread ZAIDI, ASAD A
your tables with [tombstones]. A quick [grep -i tombstone /path/to/system.log] command would tell you what objects are suffering from tombstones! From: Karthick V [mailto:karthick...@zohocorp.com] Sent: Monday, July 03, 2017 11:47 AM To: user Subject: Re: Node failure Due To Very high GC pa

Re: Node failure Due To Very high GC pause time

2017-07-03 Thread Karthick V
Hi Bryan, Thanks for your quick response. We have already tuned our memory and GC based on our hardware specification and it was working fine until yesterday, i.e before facing the below specified delete request. As you specified we will once again look into our GC & memory confi

Re: Node failure Due To Very high GC pause time

2017-07-03 Thread Bryan Cheng
This is a very antagonistic use case for Cassandra :P I assume you're familiar with Cassandra and deletes? (eg. http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html, http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html ) That being said, are you gi

Node failure Due To Very high GC pause time

2017-07-03 Thread Karthick V
Hi, Recently in my test cluster I faced outrageous GC activity which made the node unreachable inside the cluster itself. Scenario: in a partition of 5 million rows we read the first 500 (by giving the starting range) and then delete the same 500. The same has been done recursiv
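A rough sketch of that access pattern in CQL (run through cqlsh; the keyspace, table and column names are invented for illustration), showing why it piles up tombstones at the head of the partition:

    # Read the first 500 rows of the partition:
    cqlsh -e "SELECT ck FROM ks.events WHERE pk = 'p1' LIMIT 500;"
    # ...then delete those same rows, leaving one tombstone per deleted row:
    cqlsh -e "DELETE FROM ks.events WHERE pk = 'p1' AND ck = 42;"
    # Repeated recursively over a 5M-row partition, every later read of the
    # partition head has to skip an ever-growing pile of tombstones until they
    # are compacted away after gc_grace_seconds, which is what drives the GC
    # pressure described here. A single partition-level delete
    # (DELETE FROM ks.events WHERE pk = 'p1') leaves one tombstone instead.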

Fwd: Node failure due to Incremental repair

2017-02-28 Thread Karthick V
Hi, Recently I have enabled incremental repair in one of my test cluster setups, which consists of 8 nodes (DC1 - 4, DC2 - 4) on C* version 2.1.13. Currently, I am facing a node failure scenario in this cluster, with the following exception during the incremental repair process: exception occurred

Re: Handle Node Failure with Repair -pr

2015-12-07 Thread Anuj Wadehra
Hi All !!! Any comments on the repair -pr scenarios.. please share how you deal with such scenarios.. Thanks Anuj Sent from Yahoo Mail on Android From: "Anuj Wadehra" Date: Sat, 5 Dec, 2015 at 12:57 am Subject: Handle Node Failure with Repair -pr Hi Guys !! I need comm

Handle Node Failure with Repair -pr

2015-12-04 Thread Anuj Wadehra
Hi Guys !! I need comments on my understanding of repair -pr. If you are using repair -pr in your cluster then the following statements hold true: 1. If a node goes down for a long time and you're not sure when it will return, you must ensure that subrange repair for the failed node's range is done
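For readers landing here later, a sketch of the two repair variants being contrasted (nodetool syntax from the 2.x era under discussion; keyspace name and token values are placeholders):

    # Primary-range repair, run on every node in turn so each range is repaired once:
    nodetool repair -pr my_keyspace
    # Subrange repair of one specific token range, e.g. the range a long-dead node owned:
    nodetool repair -st -9223372036854775808 -et -3074457345618258603 my_keyspace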

Re: Minimum Cluster size to accommodate a single node failure

2014-06-18 Thread Robert Coli
4.mbox/%3CCAEDUwd2rhFwVXiByccJ1-VrPOYbDtd0LWGnzpU4CxA2u=mi...@mail.gmail.com%3E tl;dr : It depends on what you mean by "minimum" and "survive". Most people consider the "minimum" to use QUORUM to "survive" a single node "failure" to be RF=N=3.

Re: Minimum Cluster size to accommodate a single node failure

2014-06-18 Thread Ken Hancock
Another nice resource... http://www.ecyrd.com/cassandracalculator/

Re: Minimum Cluster size to accommodate a single node failure

2014-06-17 Thread Ben Bromhead
title of this thread has to be "Minimum cluster size to survive a single node failure". On Wed, Jun 18, 2014 at 11:38 AM, Prabath Abeysekara wrote: Hi Everyone, First of all, apologies if the $subject was discussed previously in this list b

Re: Minimum Cluster size to accommodate a single node failure

2014-06-17 Thread Prabath Abeysekara
Sorry, the title of this thread has to be "Minimum cluster size to survive a single node failure". On Wed, Jun 18, 2014 at 11:38 AM, Prabath Abeysekara <prabathabeysek...@gmail.com> wrote: Hi Everyone, First of all, apologies if the $subject was discussed

Minimum Cluster size to accommodate a single node failure

2014-06-17 Thread Prabath Abeysekara
understanding is correct, a 3 node Cassandra cluster would survive a single node failure when the Replication Factor is set to 3 and consistency level QUORUM is used for read/write operations. For example, let's consider the following configuration. * Number of nodes in the cluster : 3 * Replica
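A small cqlsh sketch of that configuration (keyspace and table names are illustrative, syntax from roughly the 2.x line): with RF=3, QUORUM needs 2 replicas, so reads and writes keep succeeding while any single node is down.

    cqlsh -e "CREATE KEYSPACE IF NOT EXISTS demo
              WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
              CREATE TABLE IF NOT EXISTS demo.kv (k text PRIMARY KEY, v text);"
    # Both statements below need 2 of the 3 replicas to acknowledge:
    cqlsh -e "CONSISTENCY QUORUM;
              INSERT INTO demo.kv (k, v) VALUES ('key1', 'value1');
              SELECT * FROM demo.kv WHERE k = 'key1';"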

Re: Cassandra timeout on node failure

2014-01-23 Thread Robert Coli
On Thu, Jan 23, 2014 at 8:52 AM, Ankit Patel wrote: > We are seeing a weird issue with our Cassandra cluster (version 1.0.10). We have 6 nodes (DC1:3, DC2:3) in our cluster. So all 6 nodes are replicas of each other. All reads and writes are LOCAL_QUORUM. Frankly I'm surprised that 1.0.10 i

Cassandra timeout on node failure

2014-01-23 Thread Ankit Patel
We are seeing a weird issue with our Cassandra cluster (version 1.0.10). We have 6 nodes (DC1:3, DC2:3) in our cluster. So all 6 nodes are replicas of each other. All reads and writes are LOCAL_QUORUM. We see that when one of the nodes in DC1 fails, we see timeout errors on th

Re: ConsistencyLevel greater ONE + node failure = non-responsive Cassandra 0.6.5 cluster

2011-03-21 Thread Jonathan Ellis
I suggest upgrading to either 0.6.12 or 0.7.4 and re-testing. On Mon, Mar 21, 2011 at 12:52 PM, Markus Klems wrote: > Hi guys, > > we are currently benchmarking various configurations of an EC2-based > Cassandra cluster. This is our current setup: > > 1) 8 nodes where each node is an m1.xlarge EC

ConsistencyLevel greater ONE + node failure = non-responsive Cassandra 0.6.5 cluster

2011-03-21 Thread Markus Klems
Hi guys, we are currently benchmarking various configurations of an EC2-based Cassandra cluster. This is our current setup: 1) 8 nodes where each node is an m1.xlarge EC2 instance 2) Cassandra version 0.6.5 3) Replication Factor = 3 4) this delivers ~7K to 10K ops/sec with 50% GET and 50% INSERT

Re: node failure, and automatic decommission (or removetoken)

2011-03-01 Thread Mimi Aluminium
, within the GCGraceSeconds period. If this cannot be done then nodetool decommission and removetoken are the recommended approach. In your example though, with 3 nodes and an RF of 3 your cluster can sustain a single node failure and continue to operate at CL Quorum for r
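A short sketch of those two commands as they existed in the 0.7/1.x releases being discussed (host names and the token are placeholders; from 1.2 onward removetoken became nodetool removenode with a host ID):

    # A dead node that will never return is removed from the ring, run from any live node:
    nodetool -h live-node.example.com removetoken 85070591730234615865843651857942052864
    # decommission, by contrast, runs on a live node that is leaving the ring itself:
    nodetool -h leaving-node.example.com decommission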

Re: node failure, and automatic decommission (or removetoken)

2011-02-28 Thread Aaron Morton
> I have a question about a tool or a wrapper that performs automatic data move upon node failure? Assuming I have 3 nodes with a replication factor of 3. In case of one node failure, does the third replica (that was located before on the failed node) re-

Re: node failure, and automatic decommission (or removetoken)

2011-02-28 Thread Mimi Aluminium
removetoken are the recommended approach. In your example though, with 3 nodes and an RF of 3 your cluster can sustain a single node failure and continue to operate at CL Quorum for reads and writes. So there is no immediate need to move data. Does that help? Aaron O

Re: node failure, and automatic decommission (or removetoken)

2011-02-28 Thread aaron morton
single node failure and continue to operate at CL Quorum for reads and writes. So there is no immediate need to move data. Does that help? Aaron On 28 Feb 2011, at 07:41, Mimi Aluminium wrote: > Hi, > I have a question about a tool or a wrapper that perform automatic data move > upon nod

node failure, and automatic decommission (or removetoken)

2011-02-27 Thread Mimi Aluminium
Hi, I have a question about a tool or a wrapper that performs automatic data movement upon node failure. Assuming I have 3 nodes with a replication factor of 3: in case of one node failure, does the third replica (that was located before on the failed node) re-appear on one of the live nodes? I am

Re: How does node failure detection work in Cassandra?

2011-02-25 Thread Brandon Williams
On Fri, Feb 25, 2011 at 5:32 PM, tijoriwala.ritesh < tijoriwala.rit...@gmail.com> wrote: > > Hi, > I would like to know internals of how does node failure detection work in > Cassandra? http://bit.ly/phi_accrual > Is there a concept of Coordinator/Election? No. -Brandon

How does node failure detection work in Cassandra?

2011-02-25 Thread tijoriwala.ritesh
Hi, I would like to know the internals of how node failure detection works in Cassandra. And in the absence of any network partition, do all nodes see the same view of live nodes? Is there a concept of Coordinator/Election? If yes, how is a merge handled after a network partition heals? thanks, Ritesh

Re: seed node failure crash the whole cluster

2011-02-07 Thread TSANG Yiu Wing
i will continue the issue here: http://groups.google.com/group/scale7/browse_thread/thread/dd74f1d6265ae2e7 thanks On Tue, Feb 8, 2011 at 7:44 AM, Dan Washusen wrote: > Hi, > I've added some comments and questions inline. > > Cheers, > Dan > On 8 February 2011 10:00, Jonathan Ellis wrote: >>

Re: seed node failure crash the whole cluster

2011-02-07 Thread Dan Washusen
Hi, I've added some comments and questions inline. Cheers, Dan On 8 February 2011 10:00, Jonathan Ellis wrote: > On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing wrote: > > cassandra version: 0.7 > > > > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT > > > > cluster: 3 machines (A, B, C)

Re: seed node failure crash the whole cluster

2011-02-07 Thread Jonathan Ellis
On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing wrote: > cassandra version: 0.7 > > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT > > cluster: 3 machines (A, B, C) > > details: > it works perfectly when all 3 machines are up and running > > but if the seed machine is down, the problems hap

seed node failure crash the whole cluster

2011-02-06 Thread TSANG Yiu Wing
cassandra version: 0.7 client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT cluster: 3 machines (A, B, C) details: it works perfectly when all 3 machines are up and running, but if the seed machine is down, the problems happen: 1) new client connections cannot be established 2) if a client ke

Re: Cassandra cluster does not tolerate single node failure good

2010-04-07 Thread Jordan Pittier
>Could you gimme some clue plz ? "This is a known problem with 0.5 that was addressed in 0.6." It seems you posted twice for the same issue On Wed, Apr 7, 2010 at 6:12 PM, Oleg Anastasjev wrote: > > Jonathan Ellis gmail.com> writes: > > > > > Isn't this the same question I just answered? >

Re: Cassandra cluster does not tolerate single node failure good

2010-04-07 Thread Oleg Anastasjev
Jonathan Ellis gmail.com> writes: > Isn't this the same question I just answered? Umm, I am not sure. I looked over the last 3 days of your replies and did not find my case. Could you give me some clue please?

Re: Cassandra cluster does not tolerate single node failure good

2010-04-07 Thread Jonathan Ellis
at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:203) at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:58) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Can I do something to have the cassandra cluster tolerate a single node failure better?

Cassandra cluster does not tolerate single node failure good

2010-04-07 Thread Oleg Anastasjev
net.MessageDeliveryTask.run(MessageDeliveryTask.java:38) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Can I do something to have cassandra cluster to t

Re: Cassandra cluster does not tolerate single node failure

2010-04-07 Thread Jonathan Ellis
slow write latency. Interestingly, issuing the "nodeprobe flush system" command to nodes 46 and 47 speeds up processing for a short period of time, but then it quickly returns back to 66 ops/second. I suspect that these nodes create very many subcolumns in the supercolumn of CF HintsColumnFamily in the memory table. What can I do to have the cassandra cluster tolerate a single node failure better?

Cassandra cluster does not tolerate single node failure

2010-04-07 Thread Oleg Anastasjev
What can I do to have the cassandra cluster tolerate a single node failure better?

Re: Question about node failure...

2010-04-05 Thread Jonathan Ellis
On Mon, Apr 5, 2010 at 5:20 PM, Rob Coli wrote: > On 4/5/10 2:11 PM, Jonathan Ellis wrote: >> On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta wrote: >>> Perhaps it would be good to have convenience workflow for replacing broken host ("squashing lemons")? I would assume that most com

Re: Question about node failure...

2010-04-05 Thread Rob Coli
On 4/5/10 2:11 PM, Jonathan Ellis wrote: On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta wrote: Perhaps it would be good to have convenience workflow for replacing broken host ("squashing lemons")? I would assume that most common use [ snip ] Does anyone have numbers on how badly "nodetool re

Re: Question about node failure...

2010-04-05 Thread Jonathan Ellis
On Mon, Mar 29, 2010 at 6:42 PM, Tatu Saloranta wrote: > Perhaps it would be good to have convenience workflow for replacing > broken host ("squashing lemons")? I would assume that most common use > case is to effectively replace host that can't be repaired (or perhaps > it might sometimes be best

Re: Question about node failure...

2010-03-29 Thread Tatu Saloranta
On Mon, Mar 29, 2010 at 10:40 AM, Ned Wolpert wrote: > So,  what does "anti-entropy repair" do then? Fix discrepancies between live nodes? (caused by transient failures presumably) > Sounds like you have to 'decommission' the dead node, then I thought run > 'nodeprobe repair' to get the data adj

Re: Question about node failure...

2010-03-29 Thread Ned Wolpert
e in this case... On Mon, Mar 29, 2010 at 10:32 AM, Jonathan Ellis wrote: On Mon, Mar 29, 2010 at 12:27 PM, Ned Wolpert wrote: Folks- Can someone point out what happens during a node failure? Here is the specific use case:

Re: Question about node failure...

2010-03-29 Thread Jonathan Ellis
On Mon, Mar 29, 2010 at 12:27 PM, Ned Wolpert wrote: > Folks- > > Can someone point out what happens during a node failure. Here is the > Specific usecase: > >   - Cassandra cluster with 4 nodes, replication factor of 3 >   - One node fails. >   - At this point, data

Question about node failure...

2010-03-29 Thread Ned Wolpert
Folks- Can someone point out what happens during a node failure? Here is the specific use case: - Cassandra cluster with 4 nodes, replication factor of 3 - One node fails. - At this point, data that existed on the one failed node has copies on 2 live nodes. - The failed node never comes