Re: Compaction task priority

2022-09-02 Thread onmstester onmstester via user
Another thing that comes to mind: increase the minimum sstable count to compact from 4 to 32 for the big table that won't be read much, although you should watch out for the sstable count growing too high. Sent using https://www.zoho.com/mail/ On Fri, 02 Sep 2022 11:29:59 +0430
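A minimal sketch of what that threshold change could look like, assuming the table uses SizeTieredCompactionStrategy (the message does not say which strategy is in use). The keyspace and table names are hypothetical, and the helper only builds the CQL text; it does not connect to a cluster.

```python
def stcs_threshold_cql(keyspace: str, table: str,
                       min_threshold: int, max_threshold: int = 32) -> str:
    """Build an ALTER TABLE statement that sets STCS compaction thresholds.

    Assumption: the table uses SizeTieredCompactionStrategy. The names passed
    in are placeholders chosen for illustration.
    """
    # Sanity check for this sketch: a minimum above the maximum makes no sense.
    if min_threshold > max_threshold:
        raise ValueError("min_threshold must not exceed max_threshold")
    return (
        f"ALTER TABLE {keyspace}.{table} WITH compaction = "
        f"{{'class': 'SizeTieredCompactionStrategy', "
        f"'min_threshold': {min_threshold}, "
        f"'max_threshold': {max_threshold}}};"
    )


if __name__ == "__main__":
    # Hypothetical keyspace/table; raises the minimum from the default 4 to 32.
    print(stcs_threshold_cql("my_keyspace", "big_table", 32))
```

The generated statement would be run through cqlsh or a driver. As the message notes, a higher minimum means more sstables accumulate before a compaction starts, so the sstable count is worth monitoring afterwards.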

Re: Compaction task priority

2022-09-02 Thread onmstester onmstester via user
I was there too, and found no way around it except stopping big/unnecessary compactions manually (using nodetool stop) whenever they appear, via shell scripts run from crontab. Sent using https://www.zoho.com/mail/ On Fri, 02 Sep 2022 10:59:22 +0430 Gil Ganz wrote ---

Compaction task priority

2022-09-02 Thread Gil Ganz
Hey When deciding which sstables to compact together, how is the priority determined between tasks, and can I do something about it? In some cases (mostly after removing a node), it takes a while for compactions to keep up with the new data that came from removed nodes, and I see it is busy on

Re: netty connection reset by peer errors in logs

2022-09-01 Thread Gil Ganz
The reason I would like to suppress it is that I think this is due to network disconnects we know are happening, and it looks like that's not going to change. Since it doesn't happen that often and isn't causing a real issue, I would like to have cleaner logs if possible. On Thu, Sep 1, 2022 at 9:55 AM Erick

Re: netty connection reset by peer errors in logs

2022-09-01 Thread Erick Ramirez
That error message indicates that 2 nodes are unable to communicate with each other over the internode (gossip) port. It makes no sense to suppress it since it's an indication that there's a problem that you need to address. Cheers!

netty connection reset by peer errors in logs

2022-09-01 Thread Gil Ganz
Hey We have an issue in a few of our 4.0.4 clusters (on-prem clusters spanning multiple datacenters around the world), and our logs have many errors like this: ERROR [Messaging-EventLoop-3-26] 2022-09-01 05:57:28,142 InboundMessageHandler.java:300 -

[ANNOUNCE] Debian and RedHat package repositories are moving!

2022-08-26 Thread Mick Semb Wever
Your Debian `cassandra.sources.list` and RedHat `cassandra.repo` files must be updated to the new repository URLs. The Debian file is typically at `/etc/apt/sources.list.d/cassandra.sources.list`. The RedHat file is typically at `/etc/yum.repos.d/cassandra.repo`. For Debian the repository is now

New open-source CQL driver for Rust released - 0.5.0

2022-08-25 Thread Piotr Sarna via user
I'm pleased to announce ScyllaDB Rust Driver 0.5.0, an asynchronous CQL driver for Rust, fully compatible with Apache Cassandra™! Cool, ever growing open-source stats:  * over 38k downloads on crates;  * over 300 GitHub stars! === Notable changes ===  * Client-side timeouts are here! Request

Invitation to take the 2022 ASF Community Survey

2022-08-25 Thread Paulo Motta
Hello everyone, The 2022 ASF Community Survey is looking to gather scientific data that allows us to understand our community better, both in its demographic composition, and also in collaboration styles and preferences. We want to find areas where we can continue to do great work, and others

[RELEASE] Apache Cassandra 4.0.6 released

2022-08-25 Thread Mick Semb Wever
The Cassandra team is pleased to announce the release of Apache Cassandra version 4.0.6. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of source

RE: Erroneous node. - node is not a member of the

2022-08-24 Thread Marc Hoppins
Update again: The dc1-cass14 node stopped accepting/bootstrapping/streaming early on and now there is just a bunch of WARN [OptionalTasks:1] 2022-08-24 13:10:16,761 CassandraRoleManager.java:344 - CassandraRoleManager skipped default role setup: some nodes were not ready INFO

RE: Erroneous node. - node is not a member of the

2022-08-24 Thread Marc Hoppins
Update: I shut the server down and the node finally disappeared from the status. I then restarted the server on the similarly named node (dc1-cass14) and it came up...however, it is UJ. Was this due to the amount of time spent unavailable? M -Original Message- From: Marc Hoppins

RE: Erroneous node. - node is not a member of the

2022-08-24 Thread Marc Hoppins
Also, I just had some changes made to the cass.yml config so thought that if I did a rolling restart of the nodes it might help the problem. Now I have a startup problem with an existing node...with a similar name Original problem node = dc2-cass14 Existing node = dc1-cass14 and am getting: ERROR

Erroneous node. - node is not a member of the

2022-08-24 Thread Marc Hoppins
Hi all, I added a node but forgot to specify the correct rack so I stopped the join and removed it. When I tried adding it again it was taking a LONG time to join. I tried draining before stopping the service but that failed. I killed the process and cleared the directories but the cluster

Re: Hints not being sent from 3.0 to 4.0?

2022-08-23 Thread Jim Shaw
Is it over the max hint window? If so, it is better to do a full repair. Check table system.hints; do you see rows? As I remember, during an upgrade, transactions are stored in hints until the other cluster has finished upgrading, so for safety, change the default 3-hour hint window to a longer time just before

Hints not being sent from 3.0 to 4.0?

2022-08-23 Thread Morten A. Iversen via user
Hi, We are currently in the process of upgrading our environment from 3.0.27 to 4.0.4. However I see some issues with hints not being sent from v3 nodes to v4 nodes. We have a test environment with 2DCs, we are currently writing to DC1 and DC2 have been upgraded from version 3.0.27 -> 4.0.4

Cassandra Day Berlin, September 20th

2022-08-23 Thread Stefano Lottini
Hello, Cassandra community! The next Cassandra Day event will be an in-person, one-day event which will take place in the German capital on Tuesday, 20 September 2022, hosted and organized by DataStax. Cassandra Days focus on the open source Apache Cassandra project and the community that

Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-21 Thread Jim Shaw
Although it is not required to run upgradesstables, upgradesstables -a will rewrite the files and kick out tombstones; with SizeTieredCompactionStrategy, the largest files may wait a long time for the next compaction to kick out tombstones. So whether to run it really depends; usually upgrades

Re: cell vs row timestamp tie resolution

2022-08-21 Thread Jim Shaw
Andrey: in Cassandra every cell has a timestamp, and select writetime(..) shows it. Cassandra merges cells during compaction and, on read, sorts by timestamp. For your example, if you left-pad the writetime to the column value (writetime + cell value) and then sort, it should return what you see

Re: Question about num_tokens

2022-08-18 Thread Elliott Sims
I'm not sure I entirely agree with the docs there, as they don't quite match my experiences, but it's going to depend a lot on your specific needs and other parts of the configuration. I think data distribution with low num_tokens is generally considered to be less of a problem with larger

DecoderException when moving to Cassandra 4

2022-08-18 Thread Ryan Martin
Hi, We are upgrading our Cassandra from 3.11 to 4. When we connect to the new Cassandra instance with our client everything is working fine; however, we get this error in the logs (SSL is not enabled and we are using the default cassandra.yaml): ERROR [Messaging-EventLoop-3-1] 2022-08-18

Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-16 Thread Jai Bheemsen Rao Dhanwada
Thank you On Tue, Aug 16, 2022 at 11:48 AM C. Scott Andreas wrote: > No downside at all for 3.x -> 4.x (however, Cassandra 3.x reading 2.1 > SSTables incurred a performance hit). > > Many users of Cassandra don't run upgradesstables after 3.x -> 4.x > upgrades at all. It's not necessary to run

Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-16 Thread C. Scott Andreas
No downside at all for 3.x -> 4.x (however, Cassandra 3.x reading 2.1 SSTables incurred a performance hit). Many users of Cassandra don't run upgradesstables after 3.x -> 4.x upgrades at all. It's not necessary to run until a hypothetical future time if/when support for reading Cassandra 3.x

Re: Question about num_tokens

2022-08-16 Thread Jai Bheemsen Rao Dhanwada
Thanks for the response and details. I am just curious about the below statement mentioned in the doc. I am pretty confident that my clusters are going to grow to 100+ nodes (same DC or combining all DCs). I am just concerned that the doc says it is *not recommended for clusters over 50 nodes*.

cell vs row timestamp tie resolution

2022-08-16 Thread Andrey Zapariy
Hello Cassandra users! I'm dealing with the unexpected behavior of the tie resolution for the same timestamp inserts. At least, unexpected for me. The following simple repro under Cassandra 3.11.4 illustrates the question: CREATE KEYSPACE the_test WITH replication = {'class': 'SimpleStrategy',

Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-16 Thread Jai Bheemsen Rao Dhanwada
Thank you Erick, > it is going to be single-threaded by default so it will take a while to get through all the sstables on dense nodes Is there any downside if upgradesstables takes longer (for example 1-2 days), other than I/O? Also, when does upgradesstables get triggered? After every node is

Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-16 Thread Erick Ramirez
As convenient as it is, there are a few caveats and it isn't a silver bullet. The automatic feature will only kick in if there are no other compactions scheduled. Also, it is going to be single-threaded by default so it will take a while to get through all the sstables on dense nodes. In

Cassandra 4.0 upgrade - Upgradesstables

2022-08-15 Thread Jai Bheemsen Rao Dhanwada
Hello, I am evaluating the upgrade from 3.11.x to 4.0.x and as per CASSANDRA-14197 we don't need to run upgradesstables any more. We have tested this in a test environment and see that setting "-Dcassandra.automatic_sstable_upgrade=true"

Re: Understanding multi region read query and latency

2022-08-09 Thread Bowen Song via user
Adding sleep to solve racing conditions is a bad practice, and should be avoided if possible. Instead, use read and write CL that guarantees strong consistency when it is required/needed. On 09/08/2022 23:49, Jim Shaw wrote: Raphael:    Have you found  root cause ? If not, here are a few
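A minimal sketch of the usual rule of thumb behind "read and write CL that guarantees strong consistency": a read is guaranteed to overlap the latest acknowledged write when the replicas contacted for reads plus those contacted for writes exceed the replication factor. The helper names are hypothetical, and the sketch assumes a single datacenter with the plain (non-LOCAL_*) consistency levels.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level.

    Assumption: single datacenter, so only the non-LOCAL levels are modelled.
    """
    quorum = rf // 2 + 1
    table = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum, "ALL": rf}
    return table[level.upper()]


def is_strongly_consistent(read_cl: str, write_cl: str, rf: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return replicas_required(read_cl, rf) + replicas_required(write_cl, rf) > rf


if __name__ == "__main__":
    print(is_strongly_consistent("QUORUM", "QUORUM", 3))
    print(is_strongly_consistent("ONE", "ONE", 3))
```

With RF=3, QUORUM reads plus QUORUM writes touch 2 + 2 replicas, which exceeds 3, while ONE plus ONE does not, which is the situation where a sleep is sometimes (wrongly) used to paper over the race.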

Re: Understanding multi region read query and latency

2022-08-09 Thread Jim Shaw
Raphael: Have you found root cause ? If not, here are a few tips, based on what I experienced before, but may not be same as your case, just hope it is helpful. 1) app side called wrong code module get the cql from system.prepared_statements cql statement is helpful to developers to search

RE: RPM Installation on RHEL7 broken

2022-08-09 Thread Amit Patel via user
Thank you Yakir From: Stéphane Alleaume Sent: 09 August 2022 14:21 To: Yakir Gibraltar Cc: user@cassandra.apache.org; Amit Patel Subject: Re: RPM Installation on RHEL7 broken CAUTION: This email comes from outside Euroclear! Be vigilant! Thanks you very much :-) Kind regards Stéphane Le

Re: RPM Installation on RHEL7 broken

2022-08-09 Thread Stéphane Alleaume
Thank you very much :-) Kind regards Stéphane On Tue, 9 Aug 2022, 15:16, Yakir Gibraltar wrote: > The issue is this commit on 4.0.5: > https://github.com/apache/cassandra/commit/cd0a40d09e5c029e3cac260ecf4cb3dc02deabc7 > From: > Requires: jre >= 1.8.0 > To: > Requires: (jre-1.8.0 *or*

Re: RPM Installation on RHEL7 broken

2022-08-09 Thread Yakir Gibraltar
The issue is this commit on 4.0.5: https://github.com/apache/cassandra/commit/cd0a40d09e5c029e3cac260ecf4cb3dc02deabc7 From: Requires: jre >= 1.8.0 To: Requires: (jre-1.8.0 *or* jre-11) But support for “Boolean Dependencies” was added only in rpm version 4.13, and CentOS 7 ships rpm 4.11.3. This is my
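A minimal sketch of the version check implied above, using only the two version numbers quoted in the message (boolean dependencies from rpm 4.13, CentOS 7 on rpm 4.11.3). The function names are hypothetical and the parser assumes purely numeric, dot-separated versions.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as '4.11.3' into (4, 11, 3)."""
    return tuple(int(part) for part in version.split("."))


def supports_boolean_deps(rpm_version: str) -> bool:
    """True when the rpm version is at least 4.13, per the message above."""
    return parse_version(rpm_version) >= (4, 13)


if __name__ == "__main__":
    # Versions taken from the message: CentOS 7's rpm vs. the required minimum.
    for candidate in ("4.11.3", "4.13"):
        print(candidate, supports_boolean_deps(candidate))
```

On a real host the installed version would come from something like `rpm --version`; parsing that output is left out here to keep the sketch self-contained.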

RE: RPM Installation on RHEL7 broken

2022-08-09 Thread Amit Patel via user
Hi Stephane, I have followed the same instructions, but the new rpm version 4.0.5 is broken (bug) and there are no other packages in that repo (download base URL for RHEL), so I cannot install an older stable version. Kind regards, Amit Patel From: Stéphane Alleaume Sent: 09 August 2022 13:32 To:

Re: RPM Installation on RHEL7 broken

2022-08-09 Thread Stéphane Alleaume
Hi Hope it will help : https://cassandra.apache.org/doc/trunk/cassandra/getting_started/installing.html#installing-the-rpm-packages 1. Add the Apache repository of Cassandra to the file /etc/yum.repos.d/cassandra.repo (as the root user). The latest major version is 4.0 and the

RPM Installation on RHEL7 broken

2022-08-09 Thread Amit Patel via user
Hi All, We are facing the issue on RHEL7 as well. We have Java 8 installed on the system, but when I try yum install cassandra, or even localinstall (of the downloaded rpm), it gives a similar error as below. There are bug reports for this issue

Re: Understanding multi region read query and latency

2022-08-07 Thread Stéphane Alleaume
You're right too, this option is not new, sorry. Can this option be useful? On Sun, 7 Aug 2022, 22:18, Bowen Song via user wrote: > Do you mean "nodetool settraceprobability"? This is not exactly new, I > remember it was available on Cassandra 2.x. > On 07/08/2022 20:43, Stéphane

Re: Understanding multi region read query and latency

2022-08-07 Thread Bowen Song via user
Do you mean "nodetool settraceprobability"? This is not exactly new, I remember it was available on Cassandra 2.x. On 07/08/2022 20:43, Stéphane Alleaume wrote: I think perhaps you already know but i read you can now trace only a % of all queries, i will look to retrieve the name of this

Re: Understanding multi region read query and latency

2022-08-07 Thread Stéphane Alleaume
Thanks a lot Scott, I didn't know this fact. Kind regards Stéphane On Sun, 7 Aug 2022, 19:31, C. Scott Andreas wrote: > > but still as I understand the documentation the read repair should not > be in the blocking path of a query ? > > Read repair is in the blocking read path for the

Re: Understanding multi region read query and latency

2022-08-07 Thread Stéphane Alleaume
I think perhaps you already know, but I read you can now trace only a % of all queries; I will look up the name of this functionality (in a new Cassandra release). Hope it will help Kind regards Stéphane On Sun, 7 Aug 2022, 20:26, Raphael Mazelier wrote: > > "Read repair is in the

Re: Understanding multi region read query and latency

2022-08-07 Thread Raphael Mazelier
> "Read repair is in the blocking read path for the query, yep" OK interesting. This is not what I understood from the documentation. And I use localOne level consistency. I enabled tracing (see in the attachment of my first msg)/ but I didn't see read repair in the trace (and btw I tried to

Re: Understanding multi region read query and latency

2022-08-07 Thread C. Scott Andreas
> but still as I understand the documentation the read repair should not be in the blocking path of a query ?Read repair is in the blocking read path for the query, yep. At quorum consistency levels, the read repair must complete before returning a result to the client to ensure the data returned

Re: Understanding multi region read query and latency

2022-08-07 Thread Stéphane Alleaume
Read repair chance? On Sun, 7 Aug 2022, 19:25, Raphael Mazelier wrote: > Nope. And what really puzzles me is that the trace really shows the > difference between queries. The fast queries only request reads from one > replica, while slow queries request from multiple replicas (and not only

Re: Understanding multi region read query and latency

2022-08-07 Thread Raphael Mazelier
Nope. And what really puzzles me is that the trace really shows the difference between queries. The fast queries only request reads from one replica, while slow queries request from multiple replicas (and not only local to the dc). On 07/08/2022 14:02, Stéphane Alleaume wrote: Hi Is there

Exception encountered during startup: TruncateException

2022-08-06 Thread Bowen Song via user
Hello, I have Cassandra 4.0.1 on a server failing to start. The server was power cycled after it experienced an unrecoverable memory error detected by EDAC. The memory error was transitory, and AFAIK it has disappeared. But Cassandra is not starting. The logs are: INFO  [main]

Re: Understanding multi region read query and latency

2022-08-06 Thread Raphael Mazelier
Well, answering myself: this is not related to read_repair chance. Setting them to 0 also changes nothing. So the question remains: why from time to time does C* want to make multiple reads on a non-local dc? On 06/08/2022 12:31, Raphael Mazelier wrote: Well I tried (but already have some

Re: Understanding multi region read query and latency

2022-08-06 Thread Raphael Mazelier
Well I tried (but already have some whiteListFilter); it changed nothing, but it's more convenient than using whiteListFilter (speeding up the connection time). So still from time to time (depending on the frequency of my requests) I get a slow request where I notice in the trace that c* tries to

Re: Understanding multi region read query and latency

2022-08-05 Thread Bowen Song via user
The DCAwareRoundRobinPolicy/TokenAwareHostPolicy controls which Cassandra coordinator node the client sends queries to, not the nodes it connects to, nor the nodes that perform the actual read. A client sends a CQL read query to a coordinator node, and the coordinator node parses the CQL

Re: Understanding multi region read query and latency

2022-08-05 Thread Jim Shaw
I remember gocql.DataCentreHostFilter being used. Try adding it to see whether it reads from the local DC only in your case. Thanks, James On Fri, Aug 5, 2022 at 2:40 PM Raphael Mazelier wrote: > Hi Cassandra Users, > > I'm relatively new to Cassandra and first I have to say I'm really > impressed by

Understanding multi region read query and latency

2022-08-05 Thread Raphael Mazelier
Hi Cassandra Users, I'm relatively new to Cassandra and first I have to say I'm really impressed by the technology. Good design, and a lot of material for understanding the internals (the O'Reilly book helps a lot, as do thelastpickle blog posts). I have a multi-datacenter c* cluster (US,

Re: unsubscribe

2022-08-04 Thread Bowen Song via user
Please send an email to "user-unsubscr...@cassandra.apache.org" to unsubscribe from this mailing list. On 04/08/2022 18:29, Dathan Vance Pattishall wrote: unsubscribe

unsubscribe

2022-08-04 Thread Dathan Vance Pattishall
unsubscribe

RE: Service shutdown

2022-08-04 Thread Marc Hoppins
The only messages in OS system log were exactly the same as daemon.log. The hosts did not shut down, only the Cassandra service stopped. So dmesg has nothing. The amount of data being written is not that great and GC times are always <1s. The only visible error-type messages are related to

Re: Service shutdown

2022-08-04 Thread Bowen Song via user
Generally speaking, I've seen the Cassandra process stop for the following reasons: * OOM killer * JVM OOM * Received a signal, such as SIGTERM and SIGKILL * File IO error when disk_failure_policy or commit_failure_policy is set to die * Hardware issues, such as memory corruption,

Service shutdown

2022-08-04 Thread Marc Hoppins
Hulloa all, Service on two nodes stopped yesterday and I can find nothing to indicate why. I have checked Cassandra system.logs, gc.logs and debug.logs as well as OS logs and all I can see is the following - which is far from helpful: DAEMON.LOG Aug 3 11:39:12 cassandra19 systemd[1]:

Re: Cassandra Client compatibility

2022-08-02 Thread Erick Ramirez
In the context of driver compatibility, Cassandra "3.0+" means C* 3.0 and newer releases which include C* 3.0.x, 3.11.x, 4.0.x and [soon] 4.1.x. To answer your question directly, version 2.6 of the C++ driver works with C* 3.11.11 but we don't recommend you use it since it's an ancient release

Storage load reporting seems off in v4?

2022-08-01 Thread Richard Hesse
After upgrading to version 4.0.3 from 3.11.9 (and running upgradesstables), we've noticed that the storage load in one of our clusters seems off. That is, it looks like the counters have wrapped or something like that. Example: -- Address Load Tokens Owns (effective) UN

[Request] End user comments for Apache Cassandra 4.1

2022-07-27 Thread Chris Thornett
Hey everyone, We're pulling together comments from end users on the release of Apache Cassandra 4.1 (coming soon). If you would like to contribute, please email them to me on *chris at constantia dot io*. Essentially, we're looking for positive quotes on your decision to use Cassandra and

Re: Wrong Consistency level seems to be used

2022-07-21 Thread Jim Shaw
In my experience, the way to debug this kind of issue is to turn on tracing. The nice thing in Cassandra is that you can turn on tracing on only 1 node and with a small percentage, i.e. nodetool settraceprobability 0.05 --- only run on 1 node. Hope it helps. Regards, James On Thu, Jul 21, 2022 at 2:50 PM

Re: Wrong Consistency level seems to be used

2022-07-21 Thread Tolbert, Andy
I'd bet the JIRA that Paul is pointing to is likely what's happening here. I'd look for read repair errors in your system logs or in your metrics (if you have easy access to them). There are operations that can happen during the course of a query being executed that may happen at different

Re: Export as csv

2022-07-21 Thread Arvinder Dhillon
Yup, copy worked. Thanks. On Thu, Jul 21, 2022 at 10:55 AM Nitan Kainth wrote: > Use copy command or dsbulk > > > > > > Regards, > > Nitan K. > > 510-449-9629 > > > > > > *From: *Arvinder Dhillon > *Date: *Thursday, July 21, 2022 at 12:52 PM > *To: *user@cassandra.apache.org > *Subject:

Re: Export as csv

2022-07-21 Thread Nitan Kainth
Use copy command or dsbulk Regards, Nitan K. 510-449-9629 From: Arvinder Dhillon Date: Thursday, July 21, 2022 at 12:52 PM To: user@cassandra.apache.org Subject: Export as csv What tool do I use to dump the columns of a table having 40K partitions into a csv file? I tried cqlsh with CAPTURE,

Export as csv

2022-07-21 Thread Arvinder Dhillon
What tool do I use to dump the columns of a table having 40K partitions into a csv file? I tried cqlsh with CAPTURE, but it gets stuck after 100 rows and needs "enter" to dump the next 100, even with allow filtering. Thanks, Arvi

Re: Wrong Consistency level seems to be used

2022-07-21 Thread Paul Chandler
I came across this problem a few years ago, and had long conversations with Datastax support about it. In my case it turns out that the error message is misleading and I was pointed to the ticket: https://issues.apache.org/jira/browse/CASSANDRA-14715 I can't remember much about it now, but

Re: Wrong Consistency level seems to be used

2022-07-21 Thread pwozniak
Yes, I did it. Nothing like this in my code. Consistency level is set only in one place (shown below). On 7/21/22 4:08 PM, manish khandelwal wrote: Consistency can also be set on a statement basis. So please check in your code that you might be setting consistency 'ALL' for some queries. On

Re: Wrong Consistency level seems to be used

2022-07-21 Thread Bowen Song via user
It doesn't make any sense to see consistency level ALL if the code is not explicitly using it. My best guess is somewhere in the code the consistency level was overridden. On 21/07/2022 14:52, pwozniak wrote: Hi, we have the following code (java driver): cluster

Re: Wrong Consistency level seems to be used

2022-07-21 Thread manish khandelwal
Consistency can also be set on a statement basis. So please check in your code that you might be setting consistency 'ALL' for some queries. On Thu, Jul 21, 2022 at 7:23 PM pwozniak wrote: > Hi, > > we have the following code (java driver): > > cluster =

Wrong Consistency level seems to be used

2022-07-21 Thread pwozniak
Hi, we have the following code (java driver): cluster =Cluster.builder().addContactPoints(contactPoints).withPort(port) .withProtocolVersion(ProtocolVersion.V3) .withQueryOptions(new QueryOptions() .setConsistencyLevel(ConsistencyLevel.QUORUM))

Re: Adding nodes

2022-07-20 Thread Bowen Song via user
To unsubscribe, please send an email to user-unsubscr...@cassandra.apache.org On 20/07/2022 18:34, emmanuel warreng wrote: Unsubscribe On Thu, Jul 7, 2022, 16:49 Marc Hoppins wrote: Hi all, Cluster of 2 DC and 24 nodes DC1 (RF3) = 12 nodes, 16 tokens each DC2 (RF3) = 12

Re: Adding nodes

2022-07-20 Thread emmanuel warreng
Unsubscribe On Thu, Jul 7, 2022, 16:49 Marc Hoppins wrote: > Hi all, > > Cluster of 2 DC and 24 nodes > > DC1 (RF3) = 12 nodes, 16 tokens each > DC2 (RF3) = 12 nodes, 16 tokens each > > Adding 12 more nodes to DC1: I installed Cassandra (version is the same > across all nodes) but, after the

[RELEASE] Apache Cassandra 4.0.5 released

2022-07-18 Thread Mick Semb Wever
The Cassandra team is pleased to announce the release of Apache Cassandra version 4.0.5. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of source

Apache Cassandra(R) Corner podcast - Call for guests

2022-07-14 Thread Aaron Ploetz
Just wanted to reach out to the folks on this list. The Apache Cassandra Corner® podcast is all about sharing and discussing some of the great use cases out there, as well as the infrastructure engineering which happens around it. If you are using or supporting Apache Cassandra®, I'd love to

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
Your rack awareness problem is described in https://issues.apache.org/jira/browse/CASSANDRA-3810 from 2012. The fundamental problem is that Cassandra wont move data except during bootstrap, decom, and explicit moves. The implication here is exactly what you've encountered - if you tell cassandra

Re: Adding nodes

2022-07-12 Thread Bowen Song via user
You have some (many?) misunderstandings of how Cassandra works, and therefore many of your questions are hard to answer without first educating you and getting you to ask different but related and relevant questions instead. That's why you aren't getting any answers from us. We are not paid to do

RE: Adding nodes

2022-07-12 Thread Marc Hoppins
I posted system log data, GC log data, debug log data, nodetool data. I believe I had described the situation more than adequately. Yesterday, I was asking what I assumed to be reasonable questions regarding the method for adding new nodes to a new rack. Forgive me if it sounds unreasonable

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
On Tue, Jul 12, 2022 at 7:27 AM Marc Hoppins wrote: > > I was asking the questions but no one cared to answer. > This is probably a combination of "it is really hard to answer a question with insufficient data" and your tone. Nobody here gets paid to help you solve your company's problems

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
Cassandra isn't Hadoop. Most of the mistakes you're making come from treating a complex distributed system like a different complex distributed system without understanding the nuance. Racks vs DCs exist because you wouldn't ever want both copies of data on one rack, in case the top of rack switch or PDU

RE: Adding nodes

2022-07-12 Thread Durity, Sean R via user
In my experience C* is not cheaper storage than HDFS. If that is the goal, it may be painful. Each Cassandra DC has at least one full copy of the data set. For production data that I care about (that my app teams care about), we use RF=3 in each Cassandra DC. And I only use 1 Cassandra rack

Re: Adding nodes

2022-07-12 Thread Bowen Song via user
I think you are misinterpreting many concepts here. For a starter, a physical rack in a physical DC is not (does not have to be) a logical rack in a logical DC in Cassandra; and the allocate_tokens_for_local_replication_factor has nothing to do with replication factor (other than using it as

RE: Adding nodes

2022-07-12 Thread Marc Hoppins
There is likely going to be 2 racks in each DC. Adding the new node decided to quit after 12 hours. Node was overloaded and GC pauses caused the bootstrap to fail. I begin to see the pattern here. If replication is only within the same datacentre, and one starts off with only one rack then

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
I've noticed the joining node has a different rack than the rest of the nodes; is this intended? Will you add all new nodes to this rack and have RF=2 in that DC? In principle, you should have an equal number of servers (vnodes) in each rack, and have the number of racks = RF or 1. On 11/07/2022

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Sorry if this appears spammy but, being new to this, there are always questions. This is on the joining node: (prod) marc@ba-freddy14:/var/log/cassandra $ /opt/cassandra/bin/nodetool netstats -H|grep -i receiving

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
All clocks are fine. Why would time sync affect whether or not a node appears in the nodetool status when running the command on a different node? Either the node is up and visible or not. From 24 other nodes (including ba-freddy14 itself), it shows in the status. For those other 23

Re: Adding nodes

2022-07-11 Thread Joe Obernberger
I too came from HBase and discovered adding several nodes at a time doesn't work.  Are you absolutely sure that the clocks are in sync across the nodes?  This has bitten me several times. -Joe On 7/11/2022 6:23 AM, Bowen Song via user wrote: You should look for warning and error level logs

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
You should look for warning and error level logs in the system.log, not the debug.log or gc.log, and certainly not only the latest lines. BTW, you may want to spend some time investigating potential GC issues based on the GC logs you provided. I can see 1 full GC in the 3 hours since the node

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Maybe I am not being clear enough. The 90/120 seconds was for NEW NODES TO A NEW CLUSTER WITH NO DATA. Being that this tool/suite/application is new to both the database folk and us support folk and, given that we are currently using HBASE and thus can add several nodes at a time to a new

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
How long does it take to add a new node? I'm 100% sure neither 90s nor 120s is the answer. The answer is it varies. If you want to wait for a new node to finish being added, be explicit about it: wait until the node has fully joined the cluster. Don't put a fixed number of seconds in there. You can
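A minimal sketch of waiting explicitly instead of sleeping for a fixed time: poll `nodetool status` until the joining node's state code changes from UJ (up, joining) to UN (up, normal). The helper names, the polling interval, and the sample address are assumptions for illustration, and the parser assumes the usual layout in which each node line starts with the two-letter state followed by the address.

```python
import subprocess
import time


def node_state(status_output: str, address: str):
    """Return the state code (e.g. 'UN', 'UJ') for address, or None if absent."""
    for line in status_output.splitlines():
        fields = line.split()
        # Node lines look like: "<state> <address> <load> ..."; others are skipped.
        if len(fields) >= 2 and fields[1] == address:
            return fields[0]
    return None


def wait_until_joined(address: str, poll_seconds: float = 30,
                      timeout_seconds=None) -> bool:
    """Poll `nodetool status` until address reports UN; False on timeout."""
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while True:
        result = subprocess.run(["nodetool", "status"],
                                capture_output=True, text=True, check=True)
        if node_state(result.stdout, address) == "UN":
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

Something like `wait_until_joined("10.0.0.2", timeout_seconds=24 * 3600)` (hypothetical address) would then gate the next node's bootstrap. As discussed elsewhere in this thread, a single node's view can be stale, so a stricter version might check the state from more than one node.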

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Service still running. No errors showing. The latest info is in debug.log DEBUG [Streaming-EventLoop-4-3] 2022-07-11 12:00:38,902 NettyStreamingMessageSender.java:258 - [Stream #befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Sending keep-alive DEBUG

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
Checking on multiple nodes won't help if the joining node suffers from any of the issues I described, as it will likely be flipping up and down frequently, and the existing nodes in the cluster may never reach an agreement before the joining node stays up (or stays down) for a while. However,

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
I am beginning to wonder… If you recall, I stated that I had checked status on a bunch of other nodes from both datacentres and the joining node shows up. No errors are occurring anywhere; data is streaming; node is joining…but, as I also stated, on the initial node which I only used to run

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
“Where did you come up with the 90 seconds number?” The database folk came up with THAT number. For myself, I timed adding a new node at 120 seconds for the initial setup with no data in the cluster. “What exactly are you waiting for by doing that?” I wanted to see for myself how long it took

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
A node in joining state can disappear from the cluster from other nodes' perspective if the joining node stops sending/receiving gossip messages to other nodes. This can happen when the joining node is severely overloaded, has bad network connectivity or is stuck in long STW GC pauses.

Re: Adding nodes

2022-07-11 Thread Bowen Song via user
Sleeping/pausing for a fixed amount of time between operations is at best a hack to work around an unknown issue, but it's almost always better to be explicit about what you are waiting for. Where did you come up with the 90 seconds number? What exactly are you waiting for by doing that? If you

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Further oddities… I was sitting here watching our new new node being added (nodetool status being run from one of the seed nodes) and all was going well. Then I noticed that our new new node was no longer visible. I checked the service on the new new node and it was still running. So I

RE: Adding nodes

2022-07-11 Thread Marc Hoppins
Well then… I left this on Friday (still running) and came back to it today (Monday) to find the service stopped. So, I blitzed this node from the ring and began anew with a different new node. I rather suspect the problem was with trying to use Ansible to add these initially - despite the

Cassandra World Party 7/20 - Schedule Announced, Register Now!

2022-07-08 Thread Whitney True
Hello! The 2022 Cassandra World Party is less than 2 weeks away, and we hope you're planning to join us! In case you missed it, the schedule and moderators have been announced and posted to the

Re: Adding nodes

2022-07-08 Thread Jeff Jirsa
Having a node UJ but not sending/receiving other streams is an invalid state (unless 4.0 moved the streaming data out of netstats? I'm not 100% sure, but I'm 99% sure it should be there). It likely stopped the bootstrap process long ago with an error (which you may not have seen), and is running

Re: Adding nodes

2022-07-08 Thread Bowen Song via user
From the nodetool output you quoted, I seriously suspect your Cassandra nodes have at least one of the following issues: * Clock out of sync * Bad network connectivity between nodes * Long GC pauses * Broken disks * CPU bottleneck It's not normal to see over 2% dropped small

RE: Adding nodes

2022-07-08 Thread Marc Hoppins
Ifconfig shows RX of 1.1T. This doesn't seem to fit with the LOAD of 145GiB (nodetool status), unless I am reading that wrong...and the fact that this node still has a status of UJ. Netstats on this node shows (other than : Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch

Re: Adding nodes

2022-07-08 Thread Bowen Song via user
I would assume that's 85 GB (i.e. gigabytes) then. Which is approximately 79 GiB (i.e. gibibytes). This still sounds awfully slow - less than 1MB/s over a full day (24 hours). You said CPU and network aren't the bottleneck. Have you checked the disk IO? Also, be mindful with CPU usage. It can
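The unit conversion and throughput estimate above can be checked with a few lines of arithmetic. This is only a worked example of the figures quoted in the message (85 GB streamed over roughly 24 hours); the function names are hypothetical.

```python
GB = 10**9    # gigabyte (decimal)
GIB = 2**30   # gibibyte (binary)


def gb_to_gib(gigabytes: float) -> float:
    """Convert decimal gigabytes to binary gibibytes."""
    return gigabytes * GB / GIB


def avg_throughput_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average transfer rate in (decimal) megabytes per second."""
    return gigabytes * GB / (hours * 3600) / 10**6


if __name__ == "__main__":
    print(round(gb_to_gib(85), 1))                    # size in GiB
    print(round(avg_throughput_mb_per_s(85, 24), 2))  # rate in MB/s
```

Both results agree with the message: 85 GB is a little over 79 GiB, and spread over a full day it averages just under 1 MB/s.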
