Hi,
At https://github.com/datastax/spark-cassandra-connector I see that you
are extending the API that Spark provides for interacting with RDDs in
order to leverage some native Cassandra features. We are using Apache
Cassandra together with PySpark for analytics, and since we run the
community version we rely on the classic API calls such as
sc.newAPIHadoopRDD, which means writing data converters in Scala. We
would like to use calls such as sc.cassandraTable, but I don't see these
methods anywhere in PySpark, and
https://github.com/datastax/spark-cassandra-connector does not even
mention access from Python.
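
For reference, our current setup looks roughly like the
cassandra_inputformat.py example bundled with Spark (the host, keyspace
and table names below are placeholders):

    # Read a Cassandra table through the Hadoop input format API;
    # this needs the Scala converter classes from Spark's examples jar.
    conf = {
        "cassandra.input.thrift.address": "cassandra-host",
        "cassandra.input.thrift.port": "9160",
        "cassandra.input.keyspace": "my_keyspace",
        "cassandra.input.columnfamily": "my_table",
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        "cassandra.input.page.row.size": "3",
    }
    rdd = sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "java.util.Map",
        "java.util.Map",
        keyConverter="org.apache.spark.examples.pythonconverters.CassandraCQLKeyConverter",
        valueConverter="org.apache.spark.examples.pythonconverters.CassandraCQLValueConverter",
        conf=conf)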
In
http://www.datastax.com/documentation/datastax_enterprise/4.7/datastax_enterprise/spark/sparkPySpark.html
I see, however, that you are using these methods in PySpark. Does this
mean that the Python interface to the Spark Cassandra Connector is
available only in DataStax Enterprise, and that we would have to buy it
to use that API and features such as server-side filtering from PySpark?
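
To illustrate, this is the kind of call we would like to make from
PySpark. It is hypothetical, mirroring the shape of your Scala API,
since nothing like cassandraTable is exposed in the PySpark we have:

    # Hypothetical PySpark mirror of the Scala connector's
    # sc.cassandraTable(...).where(...), with the predicate pushed
    # down to Cassandra (server-side filtering).
    rdd = sc.cassandraTable("my_keyspace", "my_table") \
            .where("user_id = ?", 42)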
Also, at
https://github.com/Parsely/pyspark-cassandra/blob/master/src/main/python/pyspark_cassandra.py
I see that there is some effort to expose a CassandraSparkContext to
Python. Does that mean those guys are duplicating your work?
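
If I read their source correctly, usage with their wrapper would look
roughly like this (connection host and table names are placeholders):

    from pyspark import SparkConf
    from pyspark_cassandra import CassandraSparkContext

    # CassandraSparkContext wraps the regular SparkContext and adds
    # a cassandraTable() method backed by your connector.
    conf = SparkConf() \
        .setAppName("cassandra-test") \
        .set("spark.cassandra.connection.host", "cassandra-host")
    sc = CassandraSparkContext(conf=conf)
    rdd = sc.cassandraTable("my_keyspace", "my_table")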
Regards,
Marek Wiewiórski
Opera Software