Cassandra time series + Spark

2015-03-23 Thread Rumph, Frens Jan
Hi, I'm working on a system that has to deal with time series data. I've been happy using Cassandra for time series, and Spark looks promising as a computational platform. I consider chunking time series in Cassandra necessary, e.g. by 3 weeks as KairosDB does. This allows an 8-byte chunk star

RDD partitions per executor in Cassandra Spark Connector

2015-03-02 Thread Rumph, Frens Jan
Hi all, I didn't find the *issues* button on https://github.com/datastax/spark-cassandra-connector/ so I'm posting here. Does anyone have an idea why token ranges are grouped into one partition per executor? I expected at least one per core. Any suggestions on how to work around this? Doing a repartition

PySpark Cassandra forked

2015-02-20 Thread Rumph, Frens Jan
Hi all, I wanted to let you know that I've forked PySpark Cassandra at https://github.com/TargetHolding/pyspark-cassandra. Unfortunately the original code didn't work for me and I couldn't figure out how it could work. But it inspired me, so I rewrote the majority of the project. The rewrite implements ful

PySpark and Cassandra

2015-02-16 Thread Rumph, Frens Jan
Hi, I'm trying to connect to Cassandra through PySpark using the spark-cassandra-connector from DataStax, based on the work of Mike Sukmanowsky. I can use Spark and Cassandra through the DataStax connector in Scala just fine. Where things fail in PySpark is that an exception is raised in org.apach