Re: Bootstrap error - Cassandra 4.1.5

2024-08-15 Thread Joe Obernberger
hours or days before the final failure. If your cluster is already experiencing performance issues (e.g. due to CPU bottleneck or GC pauses), it's highly likely that these are related to the streaming failure. On 15/08/2024 13:41, Joe Obernberger wrote: Thank you Bowen - yeah the only ER

Re: Bootstrap error - Cassandra 4.1.5

2024-08-15 Thread Joe Obernberger
prevent or reduce the chance of it happening again in the future. On 14/08/2024 21:50, Joe Obernberger wrote: Hi all - when adding a node to our existing 15 node cluster, I get: DEBUG [NonPeriodicTasks:1] 2024-08-14 20:34:10,383 StreamCoordinator.java:152 - Finished connecting all sessions WARN  [N

Bootstrap error - Cassandra 4.1.5

2024-08-14 Thread Joe Obernberger
Hi all - when adding a node to our existing 15 node cluster, I get: DEBUG [NonPeriodicTasks:1] 2024-08-14 20:34:10,383 StreamCoordinator.java:152 - Finished connecting all sessions WARN  [NonPeriodicTasks:1] 2024-08-14 20:34:10,385 StreamResultFuture.java:242 - [Stream #d7bf9f60-5a5e-11ef-aa7

Re: [RELEASE] Apache Cassandra 5.0-rc1 released

2024-07-19 Thread Joe Obernberger
Great news.  Does anyone know if there have been updates to vectors support and search? -Joe On 7/18/2024 4:58 PM, Mick Semb Wever wrote: The Cassandra team is pleased to announce the release of Apache Cassandra version 5.0-rc1. Apache Cassandra is a fully distributed database. It is the r

Re: Cassandra 5.0 Beta1 - vector searching results

2024-03-27 Thread Joe Obernberger
-community-over-code-na-denver-2024-performance-track-paul-brebner-nagmc/?trackingId=PlmmMjMeQby0Mozq8cnIpA%3D%3D Regards, Paul Brebner *From: *Joe Obernberger *Date: *Friday, 22 March 2024 at 3:19 am *To: *user@cassandra.apache.org *Subject: *Cassandra 5.0 Beta1 - vector searching results EXTERNA

Cassandra 5.0 Beta1 - vector searching results

2024-03-21 Thread Joe Obernberger
Hi All - I'd like to share some initial results for the vector search on Cassandra 5.0 beta1.  3 node cluster running in kubernetes; fast Netapp storage. Have a table (doc.embeddings_googleflant5large) with definition: CREATE TABLE doc.embeddings_googleflant5large (     uuid text,     type tex

Startup errors - 4.1.3

2023-08-30 Thread Joe Obernberger
Hi all - I replaced a node in a 14 node cluster, and it rebuilt OK.  I started to see a lot of timeout errors, and discovered one of the nodes had this message constantly repeated: "waiting to acquire a permit to begin streaming" - so perhaps I hit this bug: https://www.mail-archive.com/commits

Re: Big Data Question

2023-08-21 Thread Joe Obernberger
ed is indistinguishable from magic." Magic is coming, and it's coming for all of us/ / / *Daemeon Reiydelle* *email: daeme...@gmail.com* *LI: https://www.linkedin.com/in/daemeonreiydelle/* *San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle* On Thu, Aug 17, 2023 at 1:53 PM Joe O

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
t you with incremental repairs? It will make repairs hell of a lot faster. Of course, occasional full repair is still needed, but that's another story. On 17/08/2023 21:36, Joe Obernberger wrote: Thank you.  Enjoying this conversation. Agree on blade servers, where each blade has a small

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
y and page cache. 1TB of RAM for 24GB heap * 40 instances is definitely not enough. You'll most likely need between 1.5 and 2 TB memory for 40x 24GB heap nodes. You may be better off with blade servers than single server with gigantic memory and disk sizes. On 17/08/2023 15:46, Joe Obernbe

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
itially thought. Your choice of compaction strategy and compression ratio can dramatically affect this calculation. On 16/08/2023 16:33, Joe Obernberger wrote: General question on how to configure Cassandra.  Say I have 1PByte of data to store.  The general rule of thumb is that each node (or at

Big Data Question

2023-08-16 Thread Joe Obernberger
General question on how to configure Cassandra.  Say I have 1PByte of data to store.  The general rule of thumb is that each node (or at least instance of Cassandra) shouldn't handle more than 2TBytes of disk.  That means 500 instances of Cassandra. Assuming you have very fast persistent stora
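The instance count in the question above follows from a simple division. A minimal sketch of that arithmetic, assuming decimal units (1 PByte = 1000 TBytes) and counting only the raw data volume stated in the post; the function name `instances_needed` is hypothetical, and replication factor and compaction headroom are deliberately left out because the snippet does not give them:

```python
import math


def instances_needed(total_tb: float, max_tb_per_instance: float) -> int:
    """Smallest number of instances so that none holds more than the per-instance cap."""
    return math.ceil(total_tb / max_tb_per_instance)


# 1 PByte (taken as 1000 TBytes) at the 2 TByte-per-instance rule of thumb
print(instances_needed(1000, 2))  # 500
```

Any real sizing would multiply the data volume by the replication factor before dividing, which is an assumption beyond what the post states.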

Re: Repair errors

2023-08-11 Thread Joe Obernberger
. Regards Manish On Mon, Aug 7, 2023 at 11:39 PM Joe Obernberger wrote: Thank you.  I've tried: nodetool repair --full nodetool repair -pr They all get to 57% on any of the nodes, and then fail. Interestingly the debug log only has INFO - there are no errors.

Re: Repair errors

2023-08-07 Thread Joe Obernberger
ecluster was not. Thanks Surbhi On Fri, Aug 4, 2023 at 8:59 AM Joe Obernberger wrote: Hi All - been using reaper to do repairs, but it has hung.  I tried to run: nodetool repair -pr on each of the nodes, but they all fail with some form of this error: error: Repair job

Repair errors

2023-08-04 Thread Joe Obernberger
Hi All - been using reaper to do repairs, but it has hung.  I tried to run: nodetool repair -pr on each of the nodes, but they all fail with some form of this error: error: Repair job has failed with the error message: Repair command #521 failed with error Did not get replies from all endpoints.

Pulling unreceived schema versions

2023-02-13 Thread Joe Obernberger
Hi all - I'm seeing this message: "Pulling unreceived schema versions..." in the debug log being repeated exactly every minute, but I can't find what this means? Thank you! -Joe

Re: Startup fails - 4.1.0

2023-02-03 Thread Joe Obernberger
and it is easier to remove all commit log files to get the node restarted. Sean R. Durity *From:*Joe Obernberger *Sent:* Friday, February 3, 2023 3:15 PM *To:* user@cassandra.apache.org *Subject:* [EXTERNAL] Startup fails - 4.1.0 Hi all - cluster had a power outage and one of the nodes in a 14

Startup fails - 4.1.0

2023-02-03 Thread Joe Obernberger
Hi all - cluster had a power outage and one of the nodes in a 14 nodes cluster isn't starting with: DEBUG [MemtableFlushWriter:1] 2023-02-03 13:52:45,468 ColumnFamilyStore.java:1329 - Flushed to [BigTableReader(path='/data/2/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b

Re: removenode stuck - cassandra 4.1.0

2023-01-23 Thread Joe Obernberger
ne of them isn't incrementing, that one is probably stuck. There's at least one bug in 4.1 that can cause (I think? rate limiters) to interact in a way that can cause this. https://issues.apache.org/jira/browse/CASSANDRA-18110 describes it and has a workaround. On Mon, Jan 23, 2023 at 9

removenode stuck - cassandra 4.1.0

2023-01-23 Thread Joe Obernberger
I had a drive fail (first drive in the list) on a Cassandra cluster.  I've stopped the node (as it no longer starts), and am trying to remove it from the cluster, but the removenode command is hung (been running for 3 hours so far): nodetool removenode status is always reporting the same token a

Re: Failed disks - correct procedure

2023-01-23 Thread Joe Obernberger
pping a replacement node is necessary to avoid this. — Scott On Jan 17, 2023, at 7:28 AM, Joe Obernberger wrote:  I come from the hadoop world where we have a cluster with probably over 500 drives.  Drives fail all the time; or well several a year anyway.  We remove that single drive

Re: Failed disks - correct procedure

2023-01-17 Thread Joe Obernberger
.  Usually the fix exceeds the wait time and the node is then out of the system anyway. -----Original Message- From: Joe Obernberger Sent: Monday, January 16, 2023 6:31 PM To: Jeff Jirsa ; user@cassandra.apache.org Subject: Re: Failed disks - correct procedure EXTERNAL I'm using 4.1.0-1.

Re: Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger
really bad state (lots of streaming = lots of compactions = slower reads) and you may be seeing some inconsistency if repairs weren't regularly running beforehand. How much data was on the drive that failed? How much data do you usually have per node? Thanks, Andy On Mon, Jan 16, 2023 at 1

Re: Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger
anges, but you also will want to do repairs ahead of time too. To be honest it's not something I've done recently, so I'm not as confident on executing that procedure. Thanks, Andy On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger wrote: Hi all - what is the correct procedure whe

Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger
Hi all - what is the correct procedure when handling a failed disk? Have a node in a 15 node cluster.  This node has 16 drives and cassandra data is split across them.  One drive is failing.  Can I just remove it from the list and cassandra will then replicate? If not - what? Thank you! -Joe

Re: Adding nodes

2022-07-11 Thread Joe Obernberger
I too came from HBase and discovered adding several nodes at a time doesn't work.  Are you absolutely sure that the clocks are in sync across the nodes?  This has bitten me several times. -Joe On 7/11/2022 6:23 AM, Bowen Song via user wrote: You should look for warning and error level logs i

Re: removing a drive - 4.0.1

2022-06-09 Thread Joe Obernberger
Jan 7, 2022 at 4:23 PM Joe Obernberger wrote: Hi - in order to get the node back up and running I did the following: Deleted all data on the node: Added: -Dcassandra.replace_address=172.16.100.39 to the cassandra.env.sh file, and started it

Re: Malformed IPV6 address

2022-04-27 Thread Joe Obernberger
Thank you. The -Dcom.sun.jndi.rmiURLParsing=legacy works for me. -Joe On 4/27/2022 4:28 AM, Erick Ramirez wrote: This issue was reported in https://community.datastax.com/questions/13764/ as well. TL;DR the URL parser for JNDI providers was made stricter in Oracle Java 8u331 and brackets are

Re: Malformed IPV6 address

2022-04-26 Thread Joe Obernberger
ue" There's a chance that fixes it (for an unpleasant reason). Did you get a specific stack trace / log message at all? or just that error? On Tue, Apr 26, 2022 at 1:47 PM Joe Obernberger wrote: Hi All - upgraded java recently (java-11-openjdk-11.0.15.0.9-2.el7_9.x86_64) ,

Malformed IPV6 address

2022-04-26 Thread Joe Obernberger
Hi All - upgraded java recently (java-11-openjdk-11.0.15.0.9-2.el7_9.x86_64) , and now getting: nodetool: Failed to connect to '127.0.0.1:7199' - URISyntaxException: 'Malformed IPv6 address at index 7: rmi://[127.0.0.1]:7199'. whenever running nodetool. What am I missing? Thanks! -Joe --

Re: about the performance of select * from tbl

2022-04-26 Thread Joe Obernberger
This would be a good use case for Spark + Cassandra. -Joe On 4/26/2022 8:48 AM, 18624049226 wrote: We have a business scenario. We must execute the following statement: select * from tbl; This CQL has no WHERE condition. What I want to ask is that if the data in this table is more than one

Re: Cassandra Management tools?

2022-03-01 Thread Joe Obernberger
umber of servers is obviously not practical due to the available screen space constraint. On 28/02/2022 21:59, Joe Obernberger wrote: Hi all - curious what tools are folks using to manage large Cassandra clusters?  For example, to do tasks such as nodetool cleanup after a node or nodes are add

Cassandra Management tools?

2022-02-28 Thread Joe Obernberger
Hi all - curious what tools are folks using to manage large Cassandra clusters?  For example, to do tasks such as nodetool cleanup after a node or nodes are added to the cluster, or simply rolling start/stops after an update to the config or a new version? We've used puppet before; is that what

Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
.  Fast! Good times ahead. -Joe On 2/8/2022 10:00 AM, Joe Obernberger wrote: Update - I believe that for large tables, the spark.cassandra.read.timeoutMS needs to be very long; like 4 hours or longer.  The job now runs much longer, but still doesn't complete.  I'm now facing th

Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
ts? Thanks all. -Joe On 2/7/2022 10:35 AM, Joe Obernberger wrote: Some more info.  Tried different GC strategies - no luck. It only happens on large tables (more than 1 billion rows). Works fine on a 300 million row table.  There is very high CPU usage during the run. I'

Re: Query timed out after PT2M

2022-02-07 Thread Joe Obernberger
/3/2022 9:30 PM, manish khandelwal wrote: It maybe the case you have lots of tombstones in this table which is making reads slow and timeouts during bulk reads. On Fri, Feb 4, 2022, 03:23 Joe Obernberger wrote: So it turns out that number after PT is increments of 60 seconds.  I chan

Re: Query timed out after PT2M

2022-02-04 Thread Joe Obernberger
as 40 cores.  Xmx is set to 32G. 13 node cluster. Any ideas on what else to try? -Joe On 2/4/2022 10:39 AM, Joe Obernberger wrote: Still no go.  Oddly, I can use trino and do a count OK, but with spark I get the timeouts.  I don't believe tombstones are an issue: nodetool cfstats doc

Re: Query timed out after PT2M

2022-02-04 Thread Joe Obernberger
w and timeouts during bulk reads. On Fri, Feb 4, 2022, 03:23 Joe Obernberger wrote: So it turns out that number after PT is increments of 60 seconds.  I changed the timeout to 96, and now I get PT16M (96/6).  Since I'm still getting timeouts, something else mus

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
calRunnable.run(FastThreadLocalRunnable.java:30) -Joe On 2/3/2022 3:30 PM, Joe Obernberger wrote: I did find this: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md And "spark.cassandra.read.timeoutMS" is set to 12. Running a test now, and I thi

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
I did find this: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md And "spark.cassandra.read.timeoutMS" is set to 12. Running a test now, and I think that is it.  Thank you Scott. -Joe On 2/3/2022 3:19 PM, Joe Obernberger wrote: Thank you S

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
're running a Spark job, I'd recommend using the DataStax Spark Cassandra Connector which distributes your query to executors addressing slices of the token range which will land on replica sets, avoiding the scatter-gather behavior that can occur if using the Java driver alone. Cheers, – Scott

Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
Hi all - using a Cassandra 4.0.1 and a spark job running against a large table (~8 billion rows) and I'm getting this error on the client side: Query timed out after PT2M On the server side I see a lot of messages like: DEBUG [Native-Transport-Requests-39] 2022-02-03 14:39:56,647 ReadCallback.j

Re: removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger
removed first item from the list system data regenerated in the new first directory in the list. And then merged??? when original first dir returned On Fri, Jan 7, 2022 at 4:23 PM Joe Obernberger wrote: Hi - in order to get the node back up and running I did the following: Deleted all

Re: removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger
-cassandra-node-already-exists can you be little narrate 'If I remove a drive other than the first one'? what does it means On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger wrote: Hi All - I have a 13 node cluster running Cassandra 4.0.1.  If I stop a node

removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger
Hi All - I have a 13 node cluster running Cassandra 4.0.1.  If I stop a node, edit the cassandra.yaml file, comment out the first drive in the list, and restart the node, it fails to start saying that a node already exists in the cluster with the IP address. If I put the drive back into the li

Re: Node failed after drive failed

2021-12-13 Thread Joe Obernberger
it common for the node to come down? / this actually depends on the disk_failure_policy in your cassandra.yaml file, read the comments in it will help you understand the available choices. Cheers, Bowen On 06/12/2021 14:11, Joe Obernberger wrote: Hi All - one node in an 11 node cluster expe

Re: Added node - now queries time out

2021-12-09 Thread Joe Obernberger
ye on the log. On 03/12/2021 20:51, Joe Obernberger wrote: Hi all - just added a node to an 11 node cluster (4.0.1) and it synced up OK, but now all queries are timing out. This time I made sure the clocks are synced!  :) Kinda desperate to get this to work again.  What can I check do? Just

Node failed after drive failed

2021-12-06 Thread Joe Obernberger
nd re-add the node?  When a drive fails with cassandra, is it common for the node to come down? Thank you! -Joe Obernberger

Re: Added node - now queries time out

2021-12-03 Thread Joe Obernberger
dd it, and keep an eye on the log. On 03/12/2021 20:51, Joe Obernberger wrote: Hi all - just added a node to an 11 node cluster (4.0.1) and it synced up OK, but now all queries are timing out. This time I made sure the clocks are synced!  :) Kinda desperate to get this to work again.  What can

Added node - now queries time out

2021-12-03 Thread Joe Obernberger
Hi all - just added a node to an 11 node cluster (4.0.1) and it synced up OK, but now all queries are timing out. This time I made sure the clocks are synced!  :) Kinda desperate to get this to work again.  What can I check do? Just added the .34 node.  One item of concern is the amount of load

Re: High read Latency

2021-11-29 Thread Joe Obernberger
  AND max_index_interval = 2048     AND memtable_flush_period_in_ms = 0     AND min_index_interval = 128     AND read_repair = 'BLOCKING'     AND speculative_retry = '99p'; -Joe On 11/29/2021 11:22 AM, Joe Obernberger wrote: I have an 11 node cluster and am experiencing hig

High read Latency

2021-11-29 Thread Joe Obernberger
I have an 11 node cluster and am experiencing high read latency on one table.  This table has ~112 million rows:  nodetool tablehistograms doc.origdoc doc/origdoc histograms Percentile  Read Latency Write Latency SSTables    Partition Size    Cell Count     (micros)

Re: 4.0.1 - adding a node

2021-11-01 Thread Joe Obernberger
Hi Erick - yes I do.  There is a good chance that I'll remove hercules from the cluster (4 tokens), it was the first system that I put cassandra on.  Chaos has less drives than the other systems. -joe On 10/29/2021 7:57 PM, Erick Ramirez wrote: Out of curiosity, what's up with hercules and cha

Re: 4.0.1 - adding a node

2021-10-29 Thread Joe Obernberger
. On Thu, Oct 28, 2021 at 12:04 PM Joe Obernberger wrote: I recently added a node to a cluster.  Immediately after adding the node, the cluster status (nyx is the new node): UJ nyx.querymasters.com 181.25 KiB 250 ? 07bccfce-45f

4.0.1 - adding a node

2021-10-28 Thread Joe Obernberger
I recently added a node to a cluster.  Immediately after adding the node, the cluster status (nyx is the new node): UJ  nyx.querymasters.com    181.25 KiB 250 ? 07bccfce-45f1-41a3-a5c4-ee748a7a9b98 rack1 UN  enceladus.querymasters.com  569.53 GiB  200 35.1%

Re: Tombstones? 4.0.1

2021-10-25 Thread Joe Obernberger
ing nulls in some columns? Are you TTL'ing everything ? On Mon, Oct 25, 2021 at 3:28 PM Joe Obernberger wrote: Update - after 10 days, I'm able to use the table again; prior to that all selects timed out. Are deletes basically forbidden with Cassandra?  If you have

Re: Tombstones? 4.0.1

2021-10-25 Thread Joe Obernberger
zero rows, after deleting them, I can no longer do a select from the table as it times out. Thank you! -Joe On 10/14/2021 3:38 PM, Joe Obernberger wrote: I'm not sure if tombstones is the issue; is it?  Grace is set to 10 days, that time has not passed yet. -Joe On 10/14/2021 1:37

Re: Tombstones? 4.0.1

2021-10-14 Thread Joe Obernberger
n Thu, Oct 14, 2021 at 8:49 AM Joe Obernberger wrote: Hi all - I have a table where I've needed to delete a number of rows. I've run repair, but I still can't select from the table. select * from doc.indexorganize limit 10; OperationTimedOut: errors={'172

Tombstones? 4.0.1

2021-10-14 Thread Joe Obernberger
Hi all - I have a table where I've needed to delete a number of rows.  I've run repair, but I still can't select from the table. select * from doc.indexorganize limit 10; OperationTimedOut: errors={'172.16.100.37:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host

Re: Latest Supported RedHat Linux version for Cassandra 3.11

2021-09-27 Thread Joe Obernberger
Just as a data point - I'm running 4.0.1 on Rocky Linux 8x and CentOS Stream 8.x. -Joe On 9/27/2021 12:09 PM, Saha, Sushanta K wrote: I am currently running Open Source Apache Cassandra 3.11.1 on RedHat 7.7. But, need to upgrade the OS to RedHat to 7.9 or 8.x. The site cassandra.apache.org/

Re: COUNTER timeout

2021-09-15 Thread Joe Obernberger
ure they are not the bottleneck On 15/09/2021 20:34, Joe Obernberger wrote: Thank you Erick - looking through all the logs on the nodes I found this: INFO  [CompactionExecutor:17551] 2021-09-15 15:13:20,524 CompactionTask.java:245 - Compacted (fb0cdca0-1658-11ec-9098-dd70c3a3487a) 4 sstables to [/d

Re: COUNTER timeout

2021-09-15 Thread Joe Obernberger
Thank you Erick - looking through all the logs on the nodes I found this: INFO  [CompactionExecutor:17551] 2021-09-15 15:13:20,524 CompactionTask.java:245 - Compacted (fb0cdca0-1658-11ec-9098-dd70c3a3487a) 4 sstables to [/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/n

COUNTER timeout

2021-09-14 Thread Joe Obernberger
I'm getting a lot of the following errors during ingest of data: com.datastax.oss.driver.api.core.servererrors.WriteTimeoutException: Cassandra timeout during COUNTER write query at consistency ONE (1 replica were required but only 0 acknowledged the write)     at com.datastax.oss.driver.a

Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger
0 ms and Mean cross-node dropped latency: 15137813 ms Can you please check that the NTP client is running on all servers and the clocks are in sync? Cheers, Bowen On 10/09/2021 16:18, Joe Obernberger wrote: Good idea. There are two seed nodes: I see this on one (note 172.16.100.44 i

Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger
s the issue fixed in 4.0.1 (CASSANDRA-16877) was, you may see some indication there. Sam On 10 Sep 2021, at 15:56, Joe Obernberger wrote: Thank you Jeff - yes, this is on the latest 4.0.1 nodetool version ReleaseVersion: 4.0.1 nodetool status Datacenter: datacenter1 === Sta

Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger
, and was released 3 days ago. I've never seen it on a 10 node cluster before, but I'd be trying that. On Fri, Sep 10, 2021 at 7:50 AM Joe Obernberger wrote: I have a 10 node cluster and am trying to add another node.  The new node is running Rocky Linux and I'm gettin

Unable to Gossip

2021-09-10 Thread Joe Obernberger
I have a 10 node cluster and am trying to add another node.  The new node is running Rocky Linux and I'm getting the unable to gossip with any peers error.  Firewall and SELinux are off.  I can ping all the other nodes OK.  I've checked everything I can think of (/etc/hosts, listen_address, bro

Re: New Servers - Cassandra 4

2021-08-02 Thread Joe Obernberger
get most of the redundancy of having lots of small nodes in few(er) rack units. SuperMicro has a chassis that can accommodate 14 servers in 4U: https://www.supermicro.com/en/products/superblade/enclosure#4U - Max On Aug 2, 2021, at 12:05 pm, Joe Obernberger wrote: Thank you Jeff.  Consider

Re: New Servers - Cassandra 4

2021-08-02 Thread Joe Obernberger
'd also need 24 IPs, and you'd need a NIC that could send/receive 24x the normal bandwidth. And the cost of rebuilding such a node would be 24x higher than normal (so consider how many of those you'd have in a cluster, and how often they'd fail). On Mon, Aug 2, 2021 at 11:0

New Servers - Cassandra 4

2021-08-02 Thread Joe Obernberger
We have a large amount of data to be stored in Cassandra, and if we were to purchase new hardware in limited space, what would make the most sense? Dell has machines with 24, 8TByte drives in a 2u configuration. Given Cassandra's limitations (?) to large nodes, would it make sense to run 24 copi

Re: [RELEASE] Apache Cassandra 4.0.0 released

2021-07-26 Thread Joe Obernberger
Whoo hoo!  Looking forward to trying it out! -Joe On 7/26/2021 4:03 PM, Brandon Williams wrote: The Cassandra team is pleased to announce the release of Apache Cassandra version 4.0.0. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high a

Re: [RELEASE] Apache Cassandra 4.0-rc2 released

2021-06-30 Thread Joe Obernberger
Downloading now!  Thank you!  yum update cassandra! -Joe On 6/30/2021 3:55 PM, Patrick McFadin wrote: Congrats to everyone that worked on this iteration. If you haven't looked at the CHANGES.txt there were some great catches in RC1. Just like it should happen! On Wed, Jun 30, 2021 at 12:29 P

Re: Which open source or free tool do you use to monitor cassandra clusters?

2021-06-16 Thread Joe Obernberger
I've been using Grafana+Prometheus and the jmx_prometheus_javaagent-0.15.0.jar agent on the cassandra cluster. Then use CassandraReaper for scheduled repairs. Used this guide: https://www.cloudwalker.io/2020/05/17/monitoring-cassandra-with-prometheus/ -Joe On 6/16/2021 11:21 AM, Surbhi Gupta

Re: multiple clients making schema changes at once

2021-06-03 Thread Joe Obernberger
How does this work?  I have a program that runs a series of alter table statements, and then does inserts.  In some cases, the insert happens immediately after the alter table statement and the insert fails because the schema (apparently) has not had time to propagate.  I get an Undefined colum

Re: Datastax error - failed to allocate direct memory

2021-05-24 Thread Joe Obernberger
Please dis-regard - this appears to be a netty issue not a datastax/cassandra issue.  My apologies! -joe On 5/24/2021 11:05 AM, Joe Obernberger wrote: I'm getting the following error using 4.0RC1.  I've increased direct memory to 1g with:  -XX:MaxDirectMemorySize=1024m The error com

Datastax error - failed to allocate direct memory

2021-05-24 Thread Joe Obernberger
I'm getting the following error using 4.0RC1.  I've increased direct memory to 1g with:  -XX:MaxDirectMemorySize=1024m The error comes from an execute statement on a static PreparedStatement.  It runs fine for a while, and then dies. Any ideas? 2021-05-24 11:03:10,342 ERROR [io.qua.ver.htt.run.

Re: RC1 - joining cluster

2021-05-12 Thread Joe Obernberger
. raft.so - Cassandra consulting, support, and managed services On Mon, May 10, 2021 at 10:23 PM Joe Obernberger wrote: Hi - I waited 3 hours.  It was syncing up data; I could see network traffic, but then it stopped.  I didn't check netstats, but

Re: Counter errors - RC1

2021-05-11 Thread Joe Obernberger
raft.so - Cassandra consulting, support, and managed services On Tue, May 11, 2021 at 7:44 AM Joe Obernberger wrote: Hi all - I'm getting the following error on RC1: WARN  [Messaging-EventLoop-3-23] 2021-05-10 17:29:12,431 NoSpamLogger.java:95 -

Counter errors - RC1

2021-05-10 Thread Joe Obernberger
Hi all - I'm getting the following error on RC1: WARN  [Messaging-EventLoop-3-23] 2021-05-10 17:29:12,431 NoSpamLogger.java:95 - /172.16.100.39:7000->/172.16.100.248:7000-URGENT_MESSAGES-e8d21588 dropping message of type FAILURE_RSP whose timeout expired before reaching the network ERROR [Co

Re: RC1 - joining cluster

2021-05-10 Thread Joe Obernberger
May 8, 2021 at 11:23 AM Joe Obernberger wrote: Whoops - had it in the wrong datacenter.  Same issue - new node is stuck in UJ, but I can start/stop OK with systemctl. Datacenter: datacenter1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Mov

Re: RC1 - joining cluster

2021-05-07 Thread Joe Obernberger
StorageService.java:1090 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS -Joe On 5/7/2021 5:37 PM, Joe Obernberger wrote: When I try to halt the joining node with systemctl stop cassandra, it hangs.  I don'

Re: RC1 - joining cluster

2021-05-07 Thread Joe Obernberger
  It hangs in UJ.  I deleted all data on the new node (not much there cuz it's new!), and tried again.  Same issue. In other news, java 11 is working.  :) -Joe On 5/7/2021 5:07 PM, Joe Obernberger wrote: Have an existing 5 node RC1 cluster and trying to join two more nodes to it. The ne

RC1 - joining cluster

2021-05-07 Thread Joe Obernberger
Have an existing 5 node RC1 cluster and trying to join two more nodes to it. The new node is stuck in the UJ status: Datacenter: datacenter1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving --  Address         Load        Tokens  Owns (effective)  H

Re: 4.0 best feature/fix?

2021-05-07 Thread Joe Obernberger
My bad.  It's V4, not v4.  :) Works fine with V4.  Spews errors on V5. -Joe On 5/7/2021 12:16 PM, Joe Obernberger wrote: So I'm confused. I get this on startup from the client: 2021-05-07 15:27:48,119 WARN [com.dat.oss.dri.int.cor.poo.ChannelPool] (s1-admin-1) [s

Re: 4.0 best feature/fix?

2021-05-07 Thread Joe Obernberger
driver version 3.11.0? Thanks, Sam On 7 May 2021, at 16:12, Joe Obernberger wrote: I can retry Java 11. I am seeing this error a lot - still debugging, but I'll throw it out there - using 4.11.1 driver and a 4 node RC1 cluster.  I'm seeing warning in the cassandra logs abou

Re: 4.0 best feature/fix?

2021-05-07 Thread Joe Obernberger
correctly) - There's a bunch of new defensive rate limiters and hot-tunable properties in the database that people will enjoy once they need to use them - JDK11 On Fri, May 7, 2021 at 6:05 AM Joe Obernberger wrote: Hi Sean - I'm using RC1 now in a research environment on

Re: 4.0 best feature/fix?

2021-05-07 Thread Joe Obernberger
Hi Sean - I'm using RC1 now in a research environment on bare metal.  The biggest drawback of Cassandra for me is that Cassandra has issues working with modern large servers - a server with >32TBytes of SSD seems to be a non-starter. I tried running Cassandra with java 11, and that doesn't app

Re: RC1 - Counters

2021-05-05 Thread Joe Obernberger
oe On 5/5/2021 10:26 AM, Bowen Song wrote: This sounds like the clock on your Cassandra servers are not in sync. Can you please ensure all Cassandra servers have their clock synced (usually via NTP) and retry this? On 05/05/2021 14:42, Joe Obernberger wrote: Want to add - I am seeing this

Re: RC1 - Counters

2021-05-05 Thread Joe Obernberger
-joe On 5/5/2021 9:35 AM, Joe Obernberger wrote: I'm seeing some odd behavior with RC1 and counters - from cqlsh: cqlsh> select * from doc.seq;  id   | doccount --+--    DS |    1  DS_1 |  844 (2 rows) cqlsh> update doc.seq set doccount=doccount+1 whe

RC1 - Counters

2021-05-05 Thread Joe Obernberger
I'm seeing some odd behavior with RC1 and counters - from cqlsh: cqlsh> select * from doc.seq;  id   | doccount --+--    DS |    1  DS_1 |  844 (2 rows) cqlsh> update doc.seq set doccount=doccount+1 where id='DS_1'; OperationTimedOut: errors={'172.16.100.208:9042': 'Client r

Re: [RELEASE] Apache Cassandra 4.0-rc1 released

2021-04-25 Thread Joe Obernberger
Can't wait to try it! -joe On 4/25/2021 11:44 AM, Patrick McFadin wrote: This is pretty exciting and a huge milestone for the project. Congratulations to all the contributors who worked hard at making this the release it needed to be and honoring the database that powers the world. Patrick

Re: Query timed out after PT1M

2021-04-13 Thread Joe Obernberger
https://en.wikipedia.org/wiki/ISO_8601#Durations If you need help from us to find out why did it happen, you will need to share a bit more information with us, such as the CQL query and the table definition. On 13/04/2021 16:53, Joe Obern

Re: Query timed out after PT1M

2021-04-13 Thread Joe Obernberger
https://en.wikipedia.org/wiki/ISO_8601#Durations If you need help from us to find out why did it happen, you will need to share a bit more information with us, such as the CQL query and the table definition. On 13/04/2021 16:53, Joe Obernb

Query timed out after PT1M

2021-04-13 Thread Joe Obernberger
I'm getting this error: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT1M but I can't find any documentation on this message.  Anyone know what this means?  I'm updating a counter value and then doing a select from the table.  The table that I'm selecting fro
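The replies in this thread point to ISO 8601 durations, so "PT1M" reads as a period of one minute. As a rough illustration only, the sketch below decodes such strings in Python; the helper `parse_duration` and its regular expression are hypothetical, cover only the day/hour/minute/second subset of ISO 8601, and are not part of any Cassandra driver:

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: optional days, then an optional T-section
# with hours, minutes and (possibly fractional) seconds.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a string such as 'PT1M' into a timedelta."""
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    parts = {name: float(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


print(parse_duration("PT1M").total_seconds())  # 60.0
print(parse_duration("PT2M").total_seconds())  # 120.0
```

In Java client code the standard library's `java.time.Duration.parse` accepts the same notation, so no hand-written parser is needed there.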

Re: Huge single-node DCs (?)

2021-04-09 Thread Joe Obernberger
https://www.backblaze.com/blog/vault-cloud-storage-architecture/ On 08/04/2021 17:50, Joe Obernberger wrote: I am also curious on this question.  Say your use case is to store 10PBytes of data in a new server room / data-center with new equipment,

Re: Huge single-node DCs (?)

2021-04-08 Thread Joe Obernberger
I am also curious on this question.  Say your use case is to store 10PBytes of data in a new server room / data-center with new equipment, what makes the most sense?  If your database is primarily write with little read, I think you'd want to maximize disk space per rack space.  So you may opt

Re: No node was available to execute query error

2021-03-17 Thread Joe Obernberger
ad idea. Ask yourself a question, what is really required to 'do something'? Do you really need *all* data each time? Is it possible to make 'do something' incremental, so you'll only need *some* data each time? On 15/03/2021 19:33, Joe Obernberger wrote: Thank yo

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger
table also can indirectly create large partitions in the index tables. On 15/03/2021 17:27, Joe Obernberger wrote: Great stuff - thank you.  I've spent the morning here redesigning with smaller partitions. If I have a large number of unique IDs that I want to regularly 'do somet

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger
hould stay below a few hundred MBs, and often no more than 100 MB. On 15/03/2021 14:36, Joe Obernberger wrote: Thank you Bowen - I'm redesigning the tables now.  When you give Cassandra two parts to the primary key like create table xyz (uuid text, source text, primary key (source, uu

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger
esign your table schemas, and avoid creating large or uneven partitions. On 12/03/2021 18:52, Joe Obernberger wrote: Thank you very much for helping me out on this!  The table fieldcounts is currently pretty small - 6.4 million rows. cfstats are: Total number of table
