Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-27 Thread James Berragan
Complex predicates on non-partition keys naturally require pulling the entire data set into the Spark DataFrame to perform the query. We have some optimizations around column filtering and partition key predicates, utilizing the Filter.db/Summary.db/Index.db files to only read the data it

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-27 Thread Jeremy Hanna
Thank you for the write-up and the efforts on CASSANDRA-16222. It sounds like you've been using this for some time. I understand from the rejected alternatives that the Spark Cassandra Connector was slower because it goes through the read and write path for C* rather than this backdoor

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-27 Thread James Berragan
On the Sidecar discussion, while Sidecar is the preferred mechanism for the reasons described, the API is sufficiently generic to plug in a user implementation (essentially provide a list of sstables for a token range, and a mechanism to open an InputStream on any SSTable file

Re: [EXTERNAL] Re: Cassandra CI Status 2023-01-07

2023-03-27 Thread Josh McKenzie
I'll take build lead for the next 2 weeks. On Sat, Mar 25, 2023, at 4:50 PM, Mick Semb Wever wrote: >> Here comes Cassandra CI status for 2023-3-13 - 2023-23-179 : >> >> *** CASSANDRA-18338 >> -

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-27 Thread Caleb Rackliffe
Minimizing uncertainty is a nice abstract goal. What I worry about is that we ultimately create more of it (and more work/thrashing for ourselves) by not basing Accord on TCM at the earliest responsible moment. Again, although I created this thread, the state of the world is telling me a decision

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-27 Thread Benedict
I don’t know that TCM is a greater source of uncertainty. There is a degree of uncertainty about that :) I just think it’s better not to compound uncertainties, at least while it is not costly to avoid it. On 27 Mar 2023, at 08:27, Henrik Ingo wrote: Seems like this thread is the more appropriate

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-27 Thread Henrik Ingo
Seems like this thread is the more appropriate one for Accord/TCM discussion. IMO the priority here should be: 1. Release CEP-15 as part of 5.0, this year, with or without CEP-21. 2. Minimize work arising from porting between branches. (e.g. first onto CEP-21, then to trunk, or vice versa. But

Re: [EXTERNAL] [DISCUSS] Next release date

2023-03-27 Thread Henrik Ingo
Not so fast... There's certainly value in spending that time stabilizing the already done features. It's valuable triaging information to say this used to work before CEP-21 and only broke after it. That said, having a very long freeze of trunk, or alternatively having a very long lived 5.0