Hi,

At https://github.com/datastax/spark-cassandra-connector I see that you are extending the API Spark provides for RDDs in order to leverage some native Cassandra features. We are using Apache Cassandra together with PySpark for analytics, and since we are on the community version, we rely on the classic API calls such as sc.newAPIHadoopRDD, which means writing converters for our data in Scala (see the sketch below). We would like to use calls such as sc.cassandraTable instead, but I don't see these methods anywhere in PySpark, and https://github.com/datastax/spark-cassandra-connector does not mention access from Python at all.
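For reference, our current access path looks roughly like this (the addresses, keyspace, table, and converter class names below are placeholders for our own setup, not something from your project):

    from pyspark import SparkContext

    sc = SparkContext(appName="CassandraAnalytics")

    # Input settings for the Cassandra Hadoop input format;
    # host, keyspace, and table are placeholders for our cluster.
    conf = {
        "cassandra.input.thrift.address": "127.0.0.1",
        "cassandra.input.thrift.port": "9160",
        "cassandra.input.keyspace": "my_keyspace",
        "cassandra.input.columnfamily": "my_table",
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
    }

    rdd = sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlInputFormat",
        "java.util.Map",
        "java.util.Map",
        # Custom Scala converters we have to write and ship on the classpath
        keyConverter="com.example.converters.CqlKeyConverter",
        valueConverter="com.example.converters.CqlValueConverter",
        conf=conf,
    )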

However, in http://www.datastax.com/documentation/datastax_enterprise/4.7/datastax_enterprise/spark/sparkPySpark.html I see that you are using these methods from PySpark. Does that mean the Spark Cassandra Connector for Python is available only in DataStax Enterprise, and that we would have to buy it to use that API and features like server-side filtering from PySpark?
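Going by the connector's Scala API, this is the kind of thing we would like to write from PySpark (a sketch only; I don't know the exact Python signatures, and the keyspace, table, and column names are made up):

    # Server-side column selection and row filtering,
    # analogous to select()/where() in the Scala API
    rows = (sc.cassandraTable("my_keyspace", "my_table")
              .select("user_id", "event_time", "value")
              .where("user_id = ?", 42))
    print(rows.first())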

Also, at https://github.com/Parsely/pyspark-cassandra/blob/master/src/main/python/pyspark_cassandra.py I see there is an effort to expose CassandraSparkContext to Python. Does that mean the Parsely developers are duplicating your work?
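From a quick read of that file, usage appears to be roughly the following (again a sketch based on my reading; everything other than CassandraSparkContext and cassandraTable is my own placeholder):

    from pyspark import SparkConf
    from pyspark_cassandra import CassandraSparkContext

    conf = (SparkConf()
            .setAppName("PySparkCassandraTest")
            .set("spark.cassandra.connection.host", "127.0.0.1"))

    # CassandraSparkContext appears to extend SparkContext
    # with a cassandraTable() method
    csc = CassandraSparkContext(conf=conf)
    rdd = csc.cassandraTable("my_keyspace", "my_table")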

Regards,
Marek Wiewiórski
Opera Software
