Re: Spark Cassandra Python Connector

2016-06-20 Thread Jonathan Haddad
I wouldn't recommend the TargetHolding lib. It's only useful for working with RDDs, which are a terrible idea in Python, as the perf will make you cry with any reasonably sized dataset. The DataStax Spark Cassandra Connector works with Python + DataFrames without the crazy overhead of RDDs. Docs
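A minimal sketch of the DataFrame route described above, assuming the DataStax Spark Cassandra Connector is already on the Spark classpath and that `spark` is an existing SQLContext or SparkSession. The two helper functions and the keyspace/table names are hypothetical and only illustrate the shape of the call; they are not part of the connector.

```python
def cassandra_read_options(keyspace, table):
    """Build the option dict naming the Cassandra keyspace and table to read."""
    return {"keyspace": keyspace, "table": table}


def load_cassandra_table(spark, keyspace, table):
    """Load a Cassandra table as a DataFrame via the connector's data source.

    `spark` is assumed to be an existing SQLContext or SparkSession that has
    the connector available; this function does not create one.
    """
    return (
        spark.read
        .format("org.apache.spark.sql.cassandra")  # connector's DataFrame source
        .options(**cassandra_read_options(keyspace, table))
        .load()
    )
```

With a live session this would be called as `df = load_cassandra_table(spark, "my_keyspace", "my_table")`, after which filtering and aggregation go through the DataFrame API rather than Python-side RDD operations.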

Re: Incremental repairs in 3.0

2016-06-20 Thread Bryan Cheng
Sorry, meant to say "therefore manual migration procedure should be UNnecessary"
On Mon, Jun 20, 2016 at 3:21 PM, Bryan Cheng wrote:
> I don't use 3.x so hopefully someone with operational experience can chime
> in, however my understanding is: 1) Incremental repairs

Re: Incremental repairs in 3.0

2016-06-20 Thread Bryan Cheng
I don't use 3.x so hopefully someone with operational experience can chime in, however my understanding is: 1) Incremental repairs should be the default in the 3.x release branch and 2) sstable repairedAt is now properly set in all sstables as of 2.2.x for standard repairs and therefore manual

Re: Spark Cassandra Python Connector

2016-06-20 Thread Dennis Lovely
https://github.com/TargetHolding/pyspark-cassandra
On Mon, Jun 20, 2016 at 1:47 PM, Joaquin Alzola wrote:
> Hi List
>
> Is there a Spark Cassandra connector in python? Of course there is the one
> for scala ...
>
> BR
>
> Joaquin

Spark Cassandra Python Connector

2016-06-20 Thread Joaquin Alzola
Hi List,
Is there a Spark Cassandra connector in Python? Of course there is the one for Scala ...
BR
Joaquin

Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3

2016-06-20 Thread Atul Saroha
We tried this with 3.5, and heap usage there was also optimized, as in 3.7. However, we had to roll back from 3.5 to 3.0.3 due to CASSANDRA-11513.

Counter update write timeouts with Datastax Driver/Native protocol, not with Astyanax/Thrift

2016-06-20 Thread Steven Levitt
I've posted the following to the Datastax Java Driver user forum, but no one has responded, so I thought I'd try here, too. We have a service that writes to a few legacy (pre-CQL) counter column families in a Cassandra 2.1.11 cluster. We've been trying to migrate this service from Astyanax to the

Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3

2016-06-20 Thread Paulo Motta
You could also be hitting CASSANDRA-11739, which was fixed in 3.0.7 and could potentially cause OOMs for long-running repairs.
2016-06-20 13:26 GMT-03:00 Robert Stupp:
> One possibility might be CASSANDRA-11206 (Support large partitions on the
> 3.0 sstable format), which

Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3

2016-06-20 Thread Robert Stupp
One possibility might be CASSANDRA-11206 (Support large partitions on the 3.0 sstable format), which reduces heap usage for other operations (like repair, compactions) as well. You can verify that by setting column_index_cache_size_in_kb in cassandra.yaml to a really high value like 1000 - if you

Estimating partition size for C*2.X and C*3.X and Time Series Data Modelling.

2016-06-20 Thread G P
Hello, I'm currently enrolled in a master's degree and my thesis project involves the usage of Big Data tools in the context of Smart Grid applications. I explored several storage solutions and found Cassandra to be a good fit for my problem. The data is mostly Time Series data, incoming from