Re: one node down and cluster works better

2020-04-13 Thread Osman Yozgatlıoğlu
Thanks Mehmet and Erick. I don't have any monitoring other than nodetool, but I managed to see some disk errors causing exceptions. I changed the faulty disk and performance is OK now. Regards, Osman On Sun, 5 Apr 2020 at 03:17, Erick Ramirez wrote: > > With only 2 replicas per DC, it means you're likely

Re: one node down and cluster works better

2020-04-04 Thread Erick Ramirez
With only 2 replicas per DC, it means you're likely writing with a consistency level of either ONE or LOCAL_ONE. Every time you hit the problematic node, the write performance drops. All other configurations being equal, this indicates an issue with the commitlog disk on the node. Get your

Re: one node down and cluster works better

2020-04-04 Thread mehmet bursali
Hi Osman, Do you use any monitoring solution such as Prometheus on your cluster? If yes, you should install and use the Cassandra exporter from the link below and examine some detailed metrics. https://github.com/criteo/cassandra_exporter Sent from Yahoo Mail on Android

one node down and cluster works better

2020-04-04 Thread Osman Yozgatlıoğlu
Hello, I manage one cluster with 2 DCs, 7 nodes each, and the replication factor is 2:2. My insertion performance dropped somehow. I restarted the nodes one by one and found that one node degrades performance. I verified it was this node after the problem occurred a couple of times. How can I continue to investigate? Regards,

Cassandra node down metric

2019-07-30 Thread Rahul Reddy
Hello, I'm using the JMX metric org_apache_cassandra_net_failuredetector_downendpointcount to monitor the number of Cassandra nodes down. When for any reason (e.g. an AWS scheduled retirement) we decommission a Cassandra node, this metric shows the node down for 72 hours until the gossip state is cleared. We want to keep
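One way around the 72-hour window described above is to cross-check the failure detector's down-endpoint list against gossip state, and suppress alerts for endpoints that gossip reports as having left the ring. The sketch below is illustrative only: the function name, input shapes, and state strings are assumptions, not Cassandra APIs, and in practice the inputs would be parsed from JMX and `nodetool gossipinfo`.

```python
# Hypothetical sketch: page on-call only for endpoints that are down but
# NOT decommissioned. All names and input shapes here are assumptions.

def endpoints_to_alert(down_endpoints, gossip_states):
    """Return only the down endpoints whose gossip status is not LEFT/removed.

    down_endpoints: iterable of endpoint addresses the failure detector
                    currently reports as down.
    gossip_states:  dict mapping endpoint -> gossip STATUS string (e.g.
                    'NORMAL', 'LEFT'), as parsed from `nodetool gossipinfo`.
    """
    return [
        ep for ep in down_endpoints
        if gossip_states.get(ep, "NORMAL").upper() not in ("LEFT", "REMOVED")
    ]

down = ["10.0.0.5", "10.0.0.9"]
states = {"10.0.0.5": "LEFT", "10.0.0.9": "NORMAL"}
print(endpoints_to_alert(down, states))  # only 10.0.0.9 should page on-call
```

The idea is simply to treat "down and still in NORMAL state" as actionable, while "down and LEFT" is the expected post-decommission ghost entry.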

RE: Jmx metrics shows node down

2019-07-29 Thread ZAIDI, ASAD A
9, 2019 10:56 AM To: user@cassandra.apache.org Subject: Re: Jmx metrics shows node down Is there a workaround to shorten 72 hours to something shorter? (You said by default; wondering if one can set a non-default value?) Thanks, Yuping On Jul 29, 2019, at 7:28 AM, Oleksandr Shulgin mailto:olek

Re: Jmx metrics shows node down

2019-07-29 Thread yuping wang
Is there a workaround to shorten the 72 hours to something shorter? (You said by default; wondering if one can set a non-default value?) Thanks, Yuping On Jul 29, 2019, at 7:28 AM, Oleksandr Shulgin wrote: > On Mon, Jul 29, 2019 at 1:21 PM Rahul Reddy wrote: > > Decommissioned 2 nodes from

Re: Jmx metrics shows node down

2019-07-29 Thread yuping wang
We have the same issue. We observed that the JMX metric only cleared after exactly 72 hours too. On Jul 29, 2019, at 11:23 AM, Rahul Reddy wrote: And also the system.peers table doesn't have the information on the old nodes; only the ghost nodes remain in JMX > On Mon, Jul 29, 2019, 7:39 AM Rahul Reddy

Re: Jmx metrics shows node down

2019-07-29 Thread Rahul Reddy
And also the system.peers table doesn't have the information on the old nodes; only the ghost nodes remain in JMX On Mon, Jul 29, 2019, 7:39 AM Rahul Reddy wrote: > We have removed nodes from a cluster many times but never seen the JMX > down metric stay for 72 hours. So it has to be completely removed

Re: Jmx metrics shows node down

2019-07-29 Thread Rahul Reddy
We have removed nodes from a cluster many times but never seen the JMX down metric stay for 72 hours. So it has to be completely removed from gossip to show the metric as expected? This would be a problem for using the metric for on-call alerting On Mon, Jul 29, 2019, 7:28 AM Oleksandr Shulgin <

Re: Jmx metrics shows node down

2019-07-29 Thread Oleksandr Shulgin
On Mon, Jul 29, 2019 at 1:21 PM Rahul Reddy wrote: > > Decommissioned 2 nodes from the cluster; nodetool status doesn't list the > nodes, as expected, but JMX metrics still show those 2 nodes as down. > Nodetool gossip shows the 2 nodes in Left state. Why does my JMX still > show those nodes down

Jmx metrics shows node down

2019-07-29 Thread Rahul Reddy
Hello, Decommissioned 2 nodes from the cluster; nodetool status doesn't list the nodes, as expected, but JMX metrics still show those 2 nodes as down. Nodetool gossip shows the 2 nodes in Left state. Why does my JMX still show those nodes down even after 24 hours? Cassandra version 3.11.3. Anything

2.1 cassandra 1 node down produces replica shortfall

2019-05-17 Thread Carl Mueller
Being one of our largest and unfortunately heaviest multi-tenant clusters, and our last 2.1 prod cluster, we are encountering "not enough replicas" errors (need 2, only found 1) after bringing down only 1 node. 90-node cluster, 30 per DC; DCs are in Europe, Asia, and the US. AWS. Are there bugs for

Re: cqlsh COPY ... TO ... doesn't work if one node down

2018-07-01 Thread @Nandan@
The CQL COPY command will not work if you are trying to copy from all nodes, because the COPY command checks that all N nodes have UP and RUNNING status. If you want to complete it, you have 2 options: 1) Remove the DOWN node from the COPY command 2) Bring it back to UP and NORMAL status. On Mon, Jul 2, 2018 at

Re: cqlsh COPY ... TO ... doesn't work if one node down

2018-07-01 Thread Anup Shirolkar
Hi, The error shows that the cqlsh connection to the down node failed, so you should debug why that happened. Although you mentioned another node in the cqlsh command ('10.0.0.154'), my guess is that the down node was present in the connection pool, hence a connection to it was attempted. Ideally the

cqlsh COPY ... TO ... doesn't work if one node down

2018-06-29 Thread Dmitry Simonov
Hello! I have a Cassandra cluster with 5 nodes. There is a (relatively small) keyspace X with RF=5. One node goes down.
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load    Tokens  Owns (effective)  Host ID  Rack
UN  10.0.0.82  253.64

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
@cassandra.apache.org Subject: RE: Re: Re: A node down every day in a 6 nodes cluster If you think that will fix the problem, maybe you could add a little more memory to each machine as a short-term fix. From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] Sent: Wednesday, March 28, 2018 5:24 AM To: user

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
If you think that will fix the problem, maybe you could add a little more memory to each machine as a short-term fix. From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] Sent: Wednesday, March 28, 2018 5:24 AM To: user@cassandra.apache.org Subject: Re: Re: Re: A node down every day in a 6 nodes

Re: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Xiangfei Ni
Brotman <kenbrot...@yahoo.com.INVALID> Sent: March 28, 2018 20:16 To: user@cassandra.apache.org Subject: RE: Re: Re: A node down every day in a 6 nodes cluster David, Did you figure out what to do about the data model problem? It could be that your data files finally grew to the point that the data

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
model. Kenneth Brotman From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] Sent: Wednesday, March 28, 2018 4:46 AM To: 'user@cassandra.apache.org' Subject: RE: Re: Re: A node down every day in a 6 nodes cluster Was any change to hardware done around the time the problem started

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
-dt.com] Sent: Wednesday, March 28, 2018 4:40 AM To: user@cassandra.apache.org Subject: Re: Re: Re: A node down every day in a 6 nodes cluster Hi Kenneth, The cluster has been running for 4 months; the problem started last week. Best Regards, 倪项菲 / David Ni 中移德电网络科技有限公司

Re: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Xiangfei Ni
: Kenneth Brotman <kenbrot...@yahoo.com.INVALID> Sent: March 28, 2018 19:34 To: user@cassandra.apache.org Subject: RE: Re: Re: A node down every day in a 6 nodes cluster David, How long has the cluster been operating? How long has the problem been occurring? Kenneth Brotman From: Jeff Jirsa [mail

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
David, How long has the cluster been operating? How long has the problem been occurring? Kenneth Brotman From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Tuesday, March 27, 2018 7:00 PM To: Xiangfei Ni Cc: user@cassandra.apache.org Subject: Re: Re: Re: A node down every day in a 6

Re: Re: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Jeff Jirsa
m> > Sent: March 27, 2018 11:50 > To: Xiangfei Ni <xiangfei...@cm-dt.com> > Cc: user@cassandra.apache.org > Subject: Re: Re: A node down every day in a 6 nodes cluster > > Only one node having the problem is suspicious. It may be that your application > is improperly poo

Re: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Xiangfei Ni
, Wuhan, HuBei Mob: +86 13797007811 | Tel: +86 27 5024 2516 From: Xiangfei Ni <xiangfei...@cm-dt.com> Sent: March 28, 2018 9:45 To: Jeff Jirsa <jji...@gmail.com> Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes cluster Hi Jeff, Today another node was shu

Re: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Xiangfei Ni
: +86 27 5024 2516 From: Jeff Jirsa <jji...@gmail.com> Sent: March 27, 2018 11:50 To: Xiangfei Ni <xiangfei...@cm-dt.com> Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes cluster Only one node having the problem is suspicious. It may be that your application is improp

RE: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Rahul Singh
running node: > https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceLiveNode.html > > Kenneth Brotman > > > From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] > Sent: Tuesday, March 27, 2018 5:44 AM > To: user@cassandra.apache.org > Subject: Re: RE: Re: A nod

RE: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Kenneth Brotman
/operations/opsReplaceLiveNode.html Kenneth Brotman From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] Sent: Tuesday, March 27, 2018 5:44 AM To: user@cassandra.apache.org Subject: Re: RE: Re: A node down every day in a 6 nodes cluster Thanks, Kenneth, this is a production database

Re: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Xiangfei Ni
enneth Brotman From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] Sent: Tuesday, March 27, 2018 3:27 AM To: Jeff Jirsa Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes cluster Thanks Jeff, So your suggestion is to first resolve the data model issue which caus

RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Kenneth Brotman
David, Can you replace the misbehaving node to see if that resolves the problem? Kenneth Brotman From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] Sent: Tuesday, March 27, 2018 3:27 AM To: Jeff Jirsa Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes

Re: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Xiangfei Ni
5024 2516 From: Jeff Jirsa <jji...@gmail.com> Sent: March 27, 2018 11:50 To: Xiangfei Ni <xiangfei...@cm-dt.com> Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes cluster Only one node having the problem is suspicious. It may be that your application is improp

Re: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Xiangfei Ni
angfei Ni <xiangfei...@cm-dt.com> Cc: user@cassandra.apache.org Subject: Re: Re: A node down every day in a 6 nodes cluster Only one node having the problem is suspicious. It may be that your application is improperly pooling connections, or you have a hardware problem. I don't see anything

Re: Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread Xiangfei Ni
27 5024 2516 From: daemeon reiydelle <daeme...@gmail.com> Sent: March 27, 2018 11:42 To: user <user@cassandra.apache.org> Subject: Re: Re: A node down every day in a 6 nodes cluster Look for errors on your network interface. I think you have periodic errors in your network connectivity <

Re: Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread Jeff Jirsa
Wuhan, HuBei > > Mob: +86 13797007811 <+86%20137%209700%207811> | Tel: +86 27 5024 2516 > <+86%2027%205024%202516> > > > > *From:* Jeff Jirsa <jji...@gmail.com> > *Sent:* March 27, 2018 11:03 > *To:* user@cassandra.apache.org > *Subject:* Re: A node

Re: Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread daemeon reiydelle
| Tel: +86 27 5024 2516 > <+86%2027%205024%202516> > > > > *From:* Jeff Jirsa <jji...@gmail.com> > *Sent:* March 27, 2018 11:03 > *To:* user@cassandra.apache.org > *Subject:* Re: A node down every day in a 6 nodes cluster > > > > That warning isn't sufficient to

Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread Jeff Jirsa
That warning isn’t sufficient to understand why the node is going down Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is likely a good idea Are the nodes coming up on their own? Or are you restarting them? Paste the output of nodetool tpstats and nodetool cfstats

A node down every day in a 6 nodes cluster

2018-03-26 Thread Xiangfei Ni
Hi Cassandra experts, I am facing an issue: a node goes down every day in a 6-node cluster. The cluster is in just one DC. Every node has 4C 16G, and the heap configuration is MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m. Every node holds about 200G of data, the RF for the business CF is 3, and a node goes down one

Re: Not marking node down due to local pause

2017-10-20 Thread Alexander Dejanovski
Hi John, the other main source of STW pause in the JVM is the safepoint mechanism : http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html If you turn on full GC logging in your cassandra-env.sh file, you will find lines like this : 2017-10-09T20:13:42.462+: 4.890: Total time for
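The advice above boils down to grepping the GC log for the "Total time for which application threads were stopped" lines that `-XX:+PrintGCApplicationStoppedTime` produces, and flagging the long ones. A small sketch of that, with the exact line format treated as an assumption (it varies by JVM version), and a default cutoff of 5 seconds chosen to match the local-pause warning discussed in this thread:

```python
import re

# Sketch: pull safepoint/STW pause durations out of a HotSpot GC log.
# The line shape follows -XX:+PrintGCApplicationStoppedTime output; adapt
# the regex to your JVM version if it differs.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped:\s*([\d.]+)\s*seconds"
)

def long_pauses(log_lines, threshold_secs=5.0):
    """Return pause durations (seconds) at or above the threshold."""
    pauses = []
    for line in log_lines:
        m = STOPPED_RE.search(line)
        if m:
            secs = float(m.group(1))
            if secs >= threshold_secs:
                pauses.append(secs)
    return pauses

# Illustrative log lines (timestamps invented for the example):
sample = [
    "2017-10-09T20:13:42.462+0000: 4.890: Total time for which application "
    "threads were stopped: 0.0001330 seconds",
    "2017-10-09T20:15:01.118+0000: 83.546: Total time for which application "
    "threads were stopped: 7.2192776 seconds",
]
print(long_pauses(sample))  # → [7.2192776]
```

Any pause this reports that is not accounted for by GC collection time points at the safepoint mechanism (or swapping/VM stalls) rather than GC itself.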

Not marking node down due to local pause

2017-10-19 Thread John Sanda
I have a small, two-node cluster running Cassandra 2.2.1. I am seeing a lot of these messages in both logs: WARN 07:23:16 Not marking nodes down due to local pause of 7219277694 > 50 I am fairly certain that they are not due to GC. I am not seeing a whole lot of GC being logged, and nothing
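For context on what that warning means: before marking peers down, the failure detector checks whether the local process itself just experienced a long pause (GC, swap, VM freeze), because stale heartbeats would then be its own fault. The sketch below illustrates that idea only; the class, method names, and the 5-second threshold are assumptions for illustration, not Cassandra's actual code.

```python
import time

# Minimal sketch of the idea behind "Not marking nodes down due to local
# pause": if we ourselves were stalled, don't trust our view of peer silence.
MAX_LOCAL_PAUSE_SECS = 5.0  # assumed threshold for illustration

class PauseAwareDetector:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_tick = clock()

    def tick(self):
        """Call once per detector interval; returns the local pause length."""
        now = self.clock()
        pause = now - self.last_tick
        self.last_tick = now
        return pause

    def should_mark_down(self, peer_silent_for_secs):
        """Only trust peer silence if we ourselves were not paused."""
        local_pause = self.tick()
        if local_pause > MAX_LOCAL_PAUSE_SECS:
            print(f"Not marking nodes down due to local pause of {local_pause:.0f}s")
            return False
        return peer_silent_for_secs > MAX_LOCAL_PAUSE_SECS

# Simulate with a fake clock: a 7 s local stall suppresses the down-marking.
t = [0.0]
def fake_clock():
    return t[0]

d = PauseAwareDetector(clock=fake_clock)
t[0] = 7.0  # the process stalled for 7 seconds
print(d.should_mark_down(peer_silent_for_secs=10.0))  # → False
```

The pause value in the warning (7219277694 here) is the measured local stall; the question is then what caused it, if not GC.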

Re: Node down during move

2014-12-29 Thread Robert Coli
On Tue, Dec 23, 2014 at 12:29 AM, Jiri Horky ho...@avast.com wrote: just a follow up. We've seen this behavior multiple times now. It seems that the receiving node loses connectivity to the cluster and thus thinks that it is the sole online node, whereas the rest of the cluster thinks that it

Re: Node down during move

2014-12-23 Thread Jiri Horky
Hi, just a follow up. We've seen this behavior multiple times now. It seems that the receiving node loses connectivity to the cluster and thus thinks that it is the sole online node, whereas the rest of the cluster thinks that it is the only offline node, really just after the streaming is over.

Node down during move

2014-12-19 Thread Jiri Horky
Hi list, we added a new node to an existing 8-node cluster with C* 1.2.9 without vnodes, and because we are almost totally out of space, we are shuffling the tokens of one node after another (not in parallel). During one of these move operations, the receiving node died and thus the streaming failed:

Re: node down = log explosion?

2013-01-23 Thread aaron morton
so that hints are recorded if say the node has been down for more than 1 minute. Anyways I would say your test showed that the current cluster does not have sufficient capacity to handle the write load with one node down and HH enabled at the current level. You can either add more nodes, use

node down = log explosion?

2013-01-22 Thread Sergey Olefir
backup). In total there are 100 separate clients executing 1-2 batch updates per second. We wanted to test what happens if one node goes down, so we brought one node down in DC1 (i.e. the node that was handling half of the incoming writes). This led to a complete explosion of logs on the remaining alive

Re: node down = log explosion?

2013-01-22 Thread Rob Coli
not ideal. We wanted to test what happens if one node goes down, so we brought one node down in DC1 (i.e. the node that was handling half of the incoming writes). ... This led to a complete explosion of logs on the remaining alive node in DC1. I agree, this level of exception logging during

Re: node down = log explosion?

2013-01-22 Thread Sergey Olefir
counter increments at the rate of about 10k per second. Do you need highly performant counters that count accurately, without meaningful chance of over-count? If so, Cassandra's counters are probably not ideal. We wanted to test what happens if one node goes down, so we brought one node down

Re: node down = log explosion?

2013-01-22 Thread Rob Coli
On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir solf.li...@gmail.com wrote: Do you have a suggestion as to what could be a better fit for counters? Something that can also replicate across DCs and survive link breakdown between nodes (across DCs)? (and no, I don't need 100.00% precision

Re: node down = log explosion?

2013-01-22 Thread aaron morton
say your test showed that the current cluster does not have sufficient capacity to handle the write load with one node down and HH enabled at the current level. You can either add more nodes, use nodes with more cores, adjust the HH settings, or reduce the throughput. On the subject of bug

Re: node down = log explosion?

2013-01-22 Thread Sergey Olefir
that hints are recorded if say the node has been down for more than 1 minute. Anyways I would say your test showed that the current cluster does not have sufficient capacity to handle the write load with one node down and HH enabled at the current level. You can either add more nodes, use nodes

RE: Node down

2012-02-02 Thread Rene Kochen
view. Can it be that this stored ring view was out of sync with the actual (gossip) situation? Thanks! Rene From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, 1 February 2012 21:03 To: user@cassandra.apache.org Subject: Re: Node down Without knowing too much more information I

Re: Node down

2012-02-02 Thread aaron morton
[mailto:aa...@thelastpickle.com] Sent: Wednesday, 1 February 2012 21:03 To: user@cassandra.apache.org Subject: Re: Node down Without knowing too much more information I would try this… * Restart each node in turn, watch the logs to see what it says about the other. * If that restart did

Node down

2012-02-01 Thread Rene Kochen
I have a cluster with seven nodes. If I run the nodetool ring command on all nodes, I see the following: Node 1 says that node 2 is down. Node 2 says that node 1 is down. All other nodes say that everyone is up. Is this normal behavior? I see no network-related problems. Also no problems between

Re: Node down

2012-02-01 Thread aaron morton
Without knowing too much more information I would try this… * Restart each node in turn, watch the logs to see what it says about the other. * If that restart did not fix it, try using the -Dcassandra.load_ring_state=false JVM option when starting the node. That will tell it to ignore

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Alexandru Dan Sicoe
Hi guys, It's interesting to see this thread. I recently discovered a similar problem on my 3 node Cassandra 0.8.5 cluster. It was working fine, then I took a node down to see how it behaves. All of a sudden I couldn't write or read because of this exception being thrown: Exception in thread

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Peter Schuller
took a node down to see how it behaves. All of a sudden I couldn't write or [snip] me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be [snip]     Default replication factor = 1 So you have an RF=1 cluster (only one copy of data) and you bring a node down

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Peter Schuller
If you want to survive node failures, use an RF above 1. And then make sure to use an appropriate consistency level. To elaborate a bit: RF, or replication factor, is the *total* number of copies of any piece of data in the cluster. So with only one copy, the data will not be available when a
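The replica arithmetic Peter describes explains every UnavailableException in this thread: each consistency level maps to a number of replicas that must be alive out of the row's RF, and if fewer are up the coordinator fails fast. A sketch of that math, under the simplifying assumption of a single DC and ignoring CL.ANY's hint-only behavior:

```python
# Sketch of the replica math behind UnavailableException (single-DC
# simplification; function names are illustrative, not Cassandra APIs).

def required_replicas(cl, rf):
    """Live replicas needed for a request at consistency level `cl`."""
    cl = cl.upper()
    if cl == "ONE":
        return 1
    if cl == "TWO":
        return 2
    if cl == "QUORUM":
        return rf // 2 + 1  # strict majority of the replicas
    if cl == "ALL":
        return rf
    raise ValueError(f"unhandled consistency level: {cl}")

def is_available(cl, rf, live_replicas):
    return live_replicas >= required_replicas(cl, rf)

# RF=2 with one of the two replicas down: QUORUM (== ALL when RF=2) fails...
print(is_available("QUORUM", 2, 1))  # → False
# ...but ONE still succeeds, while RF=1 with its only replica down fails even at ONE.
print(is_available("ONE", 2, 1))     # → True
print(is_available("ONE", 1, 0))     # → False
```

Note that quorum(2) = 2//2 + 1 = 2, which is why QUORUM and ALL are the same thing at RF=2, as pointed out elsewhere in these threads.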

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Alexandru Dan Sicoe
Hi Peter, Thank you for your explanations. Even with RF=1 and one node down, I don't understand why I can't at least read the data on the nodes that are still up? Also, why can't I at least perform writes with consistency level ANY and failover policy ON_FAIL_TRY_ALL_AVAILABLE... shouldn't

Re: UnavailableException with 1 node down and RF=2?

2011-10-28 Thread Peter Schuller
 Thank you for your explanations. Even with a RF=1 and one node down I don't understand why I can't at least read the data in the nodes that are still up? You will be able to read data for row keys that do not live on the node that is down. But for any request to a row which is on the node

2 node cluster, 1 node down, overall failure

2011-10-27 Thread RobinUs2
-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-tp6936722p6936722.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread Jake Luciani
2.73 MB 55.00% 167057712653383445280042298172156091026

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread RobinUs2
The error I currently see when I take down node B: Error performing get_indexed_slices on NODE A IP:9160: exception 'cassandra_UnavailableException'

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread Jake Luciani

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread RobinUs2
I'm reading with: cassandra_ConsistencyLevel::ANY (phpcassa lib). Is there any way to verify that all the nodes know that they are RF=2?

Re: 2 node cluster, 1 node down, overall failure

2011-10-27 Thread RobinUs2
Thank you very much Jake! It solved the problem. All reads and writes are working now. Have a nice day!

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread Jonathan Ellis
responding any more. Did you find a solution for your problem? /I'm new to mailing lists, if it's inappropriate to reply here, please let me know../ http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html http://cassandra

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread Javier Canillas

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread Jonathan Ellis

Re: UnavailableException with 1 node down and RF=2?

2011-10-27 Thread R. Verlangen
, the other isn't responding any more. Did you find a solution for your problem? /I'm new to mailing lists, if it's inappropriate to reply here, please let me know../ http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html

One node down but it thinks its fine...

2011-07-13 Thread Ray Slakinski
One of our nodes, which happens to be the seed, thinks it's up and all the other nodes are down. However, all the other nodes think the seed is down instead. The logs for the seed node show everything is running as it should be. I've tried restarting the node, turning on/off gossip and thrift, and

Re: One node down but it thinks its fine...

2011-07-13 Thread samal
Check that the seed IP is the same on all nodes and is not a loopback IP on the cluster. On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski ray.slakin...@gmail.com wrote: One of our nodes, which happens to be the seed, thinks it's up and all the other nodes are down. However all the other nodes think the seed is

Re: One node down but it thinks its fine...

2011-07-13 Thread Sasha Dolgy
any firewall changes? ping is fine ... but if you can't get from node(a) to nodes(n) on the specific ports... On Wed, Jul 13, 2011 at 6:47 PM, samal sa...@wakya.in wrote: Check seed ip is same in all node and should not be loopback ip on cluster. On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski

Re: One node down but it thinks its fine...

2011-07-13 Thread Ray Slakinski
And fixed! A co-worker put in a bad host line entry last night that threw it all off :( Thanks for the assist guys. -- Ray Slakinski On Wednesday, July 13, 2011 at 1:32 PM, Ray Slakinski wrote: Was all working before, but we ran out of file handles and ended up restarting the nodes. No

RE: Reboot, now node down 0.8rc1

2011-05-24 Thread Scott McPheeters
@cassandra.apache.org Subject: Re: Reboot, now node down 0.8rc1 You could have removed the affected commit log file and then run a nodetool repair after the node had started. It would be handy to have some more context for the problem. Was this an upgrade from 0.7 or a fresh install? If you

Re: Reboot, now node down 0.8rc1

2011-05-24 Thread Sylvain Lebresne
@cassandra.apache.org Subject: Re: Reboot, now node down 0.8rc1 You could have removed the affected commit log file and then run a nodetool repair after the node had started. It would be handy to have some more context for the problem. Was this an upgrade from 0.7 or a fresh install? If you are running

RE: Reboot, now node down 0.8rc1

2011-05-24 Thread Scott McPheeters
@cassandra.apache.org Subject: Re: Reboot, now node down 0.8rc1 You could have removed the affected commit log file and then run a nodetool repair after the node had started. It would be handy to have some more context for the problem. Was this an upgrade from 0.7 or a fresh install? If you

Reboot, now node down 0.8rc1

2011-05-23 Thread Scott McPheeters
I have a test node system running release 0.8rc1. I rebooted node3 and now Cassandra is failing on startup. Any ideas? I am not sure where to begin. Debian 6, plenty of disk space, Cassandra 0.8rc1 INFO 13:48:58,192 Creating new commitlog segment

RE: Reboot, now node down 0.8rc1

2011-05-23 Thread Scott McPheeters
the node and bring it back? Or am I completely missing what the commitlog is? Scott -Original Message- From: Scott McPheeters [mailto:smcpheet...@healthx.com] Sent: Monday, May 23, 2011 2:18 PM To: user@cassandra.apache.org Subject: Reboot, now node down 0.8rc1 I have a test node system

Re: Reboot, now node down 0.8rc1

2011-05-23 Thread aaron morton
-Original Message- From: Scott McPheeters [mailto:smcpheet...@healthx.com] Sent: Monday, May 23, 2011 2:18 PM To: user@cassandra.apache.org Subject: Reboot, now node down 0.8rc1 I have a test node system running release 0.8rc1. I rebooted node3 and now Cassandra is failing

Determining the issues of marking node down

2011-04-30 Thread Rauan Maemirov
I have a test cluster with 3 nodes; earlier I installed OpsCenter to watch my cluster. Every day I see that the same one node goes down (at a different time, but every day). Then I just run `service cassandra start` to fix the problem. system.log doesn't show me anything strange. What are the

Marking each node down before rolling restart

2010-09-29 Thread Justin Sanders
I looked through the documentation but couldn't find anything. I was wondering if there is a way to manually mark a node down in the cluster instead of killing the cassandra process and letting the other nodes figure out the node is no longer up. The reason I ask is because we are having

Re: Marking each node down before rolling restart

2010-09-29 Thread Aaron Morton
:15 AM, Justin Sanders jus...@justinjas.com wrote: I looked through the documentation but couldn't find anything. I was wondering if there is a way to manually mark a node "down" in the cluster instead of killing the cassandra process and letting the other nodes figure out the node is no

Re: Marking each node down before rolling restart

2010-09-29 Thread Justin Sanders
It seems to be about 15 seconds after killing a node before the other nodes report it as being down. We are running a 9-node cluster with RF=3, all reads and writes at quorum. I was making the same assumption you are, that an operation would complete fine at quorum with only one node down, since

Re: Marking each node down before rolling restart

2010-09-29 Thread Aaron Morton
it being down. We are running a 9 node cluster with RF=3, all reads and writes at quorum. I was making the same assumption you are, that an operation would complete fine at quorum with only one node down since the other two nodes would be able to respond. Justin On Wed, Sep 29, 2010 at 5:58 PM, Aaron

node down window

2010-07-14 Thread B. Todd Burruss
There is a window of time between when a node goes down and when the rest of the cluster actually realizes that it is down. What happens to writes during this time frame? Does hinted handoff record these writes and then hand them off when the down node returns? Or does hinted handoff not kick in until
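The replies in this thread confirm the uncomfortable answer: hints are only recorded for replicas the coordinator already believes are down, so writes sent during the detection window miss the dead replica entirely and must be reconciled later by read repair or anti-entropy repair. A toy model of that window, with all names illustrative rather than Cassandra internals:

```python
# Sketch of the detection-window problem: a hint is stored only when the
# failure detector already considers the replica down. In the gap between
# a node dying and detection, the write is simply missed on that replica
# (it still succeeds on the others if the consistency level is met).

def handle_write(replica, believed_up, actually_up, hints):
    """Returns 'delivered', 'hinted', or 'missed' for one replica."""
    if not believed_up:
        hints.append(replica)  # detector says down -> store a hint for handoff
        return "hinted"
    if actually_up:
        return "delivered"
    return "missed"            # dead but not yet detected: the window

hints = []
# Node died but the detector hasn't noticed yet: the write is missed, no hint.
print(handle_write("node3", believed_up=True, actually_up=False, hints=hints))   # → missed
# After detection, the same write would be hinted for later handoff.
print(handle_write("node3", believed_up=False, actually_up=False, hints=hints))  # → hinted
print(hints)  # → ['node3']
```

This is why, as Jonathan Ellis notes further down, hinted handoff cannot close the window and periodic repair remains necessary.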

Re: node down window

2010-07-14 Thread Jonathan Ellis
On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss bburr...@real.com wrote: there is a window of time from when a node goes down and when the rest of the cluster actually realizes that it is down. what happens to writes during this time frame?  does hinted handoff record these writes and then

Re: node down window

2010-07-14 Thread B. Todd Burruss
Thx, but disappointing :) Is this just something we have to live with, periodically repairing the nodes? Or is there future work to tighten up the window? Thx On Wed, 2010-07-14 at 12:13 -0700, Jonathan Ellis wrote: On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss bburr...@real.com wrote:

Re: node down window

2010-07-14 Thread Jonathan Ellis
Coordination in a distributed system is difficult. I don't think we can fix HH's existing edge cases without introducing other more complicated edge cases. So weekly-or-so repair will remain a common maintenance task for the forseeable future. On Wed, Jul 14, 2010 at 4:17 PM, B. Todd Burruss

Re: UnavailableException with 1 node down and RF=2?

2010-07-01 Thread Jonathan Ellis
...@b3k.us wrote: .QUORUM or .ALL (they are the same with RF=2). On Wed, Jun 30, 2010 at 10:22 PM, James Golick jamesgol...@gmail.com wrote: 4 nodes, RF=2, 1 node down. How can I get an UnavailableException in that scenario? - J. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder

Re: UnavailableException with 1 node down and RF=2?

2010-07-01 Thread James Golick
nodes, RF=2, 1 node down. How can I get an UnavailableException in that scenario? - J. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com

Re: UnavailableException with 1 node down and RF=2?

2010-07-01 Thread Jonathan Ellis
...@b3k.us wrote: .QUORUM or .ALL (they are the same with RF=2). On Wed, Jun 30, 2010 at 10:22 PM, James Golick jamesgol...@gmail.com wrote: 4 nodes, RF=2, 1 node down. How can I get an UnavailableException in that scenario? - J. -- Jonathan Ellis Project Chair, Apache

UnavailableException with 1 node down and RF=2?

2010-06-30 Thread James Golick
4 nodes, RF=2, 1 node down. How can I get an UnavailableException in that scenario? - J.

Re: UnavailableException with 1 node down and RF=2?

2010-06-30 Thread Benjamin Black
.QUORUM or .ALL (they are the same with RF=2). On Wed, Jun 30, 2010 at 10:22 PM, James Golick jamesgol...@gmail.com wrote: 4 nodes, RF=2, 1 node down. How can I get an UnavailableException in that scenario? - J.

Re: UnavailableException with 1 node down and RF=2?

2010-06-30 Thread James Golick
Oops. I meant to say that I'm reading with CL.ONE. J. Sent from my iPhone. On 2010-07-01, at 1:39 AM, Benjamin Black b...@b3k.us wrote: .QUORUM or .ALL (they are the same with RF=2). On Wed, Jun 30, 2010 at 10:22 PM, James Golick jamesgol...@gmail.com wrote: 4 nodes, RF=2, 1 node down