Hi,
At https://github.com/datastax/spark-cassandra-connector I see that you
are extending the API that Spark provides for interacting with RDDs in
order to leverage some native Cassandra features. We are using Apache
Cassandra together with PySpark for analytics, and since we run the
community version we rely on the classic API calls such as
sc.newAPIHadoopRDD, which means writing data converters in Scala. We
would like to use calls such as sc.cassandraTable, but I don't see these
methods anywhere in PySpark, and
https://github.com/datastax/spark-cassandra-connector does not even
mention access from Python.
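
For reference, our current setup looks roughly like the
cassandra_inputformat.py example bundled with Spark (the host, keyspace
and table names below are placeholders):

    # Read a Cassandra table through the Hadoop input format API;
    # this needs the Scala converter classes from Spark's examples jar.
    conf = {
        "cassandra.input.thrift.address": "cassandra-host",
        "cassandra.input.thrift.port": "9160",
        "cassandra.input.keyspace": "my_keyspace",
        "cassandra.input.columnfamily": "my_table",
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        "cassandra.input.page.row.size": "3",
    }
    rdd = sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "java.util.Map",
        "java.util.Map",
        keyConverter="org.apache.spark.examples.pythonconverters.CassandraCQLKeyConverter",
        valueConverter="org.apache.spark.examples.pythonconverters.CassandraCQLValueConverter",
        conf=conf)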
In
http://www.datastax.com/documentation/datastax_enterprise/4.7/datastax_enterprise/spark/sparkPySpark.html
I see, however, that you are using these methods in PySpark. Does this
mean that the Python interface to the Spark Cassandra Connector is
available only in DataStax Enterprise, and that we would have to buy it
to use that API and features such as server-side filtering from PySpark?
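
To illustrate, this is the kind of call we would like to make from
PySpark. It is hypothetical, mirroring the shape of your Scala API,
since nothing like cassandraTable is exposed in the PySpark we have:

    # Hypothetical PySpark mirror of the Scala connector's
    # sc.cassandraTable(...).where(...), with the predicate pushed
    # down to Cassandra (server-side filtering).
    rdd = sc.cassandraTable("my_keyspace", "my_table") \
            .where("user_id = ?", 42)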
Also, at
https://github.com/Parsely/pyspark-cassandra/blob/master/src/main/python/pyspark_cassandra.py
I see that there is some effort to expose a CassandraSparkContext to
Python. Does that mean those guys are duplicating your work?
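
If I read their source correctly, usage with their wrapper would look
roughly like this (connection host and table names are placeholders):

    from pyspark import SparkConf
    from pyspark_cassandra import CassandraSparkContext

    # CassandraSparkContext wraps the regular SparkContext and adds
    # a cassandraTable() method backed by your connector.
    conf = SparkConf() \
        .setAppName("cassandra-test") \
        .set("spark.cassandra.connection.host", "cassandra-host")
    sc = CassandraSparkContext(conf=conf)
    rdd = sc.cassandraTable("my_keyspace", "my_table")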
Regards,
Marek Wiewiórski
Opera Software