Hi,
I'm trying to connect to Cassandra through PySpark using the
spark-cassandra-connector from DataStax, based on the work of Mike
Sukmanowsky.
I can use Spark and Cassandra through the DataStax connector in Scala just
fine. Where things fail in PySpark is that an exception is raised in
org.apach
Hi all,
Is it possible to use Cassandra as input data for PySpark? I found an
example for Java:
http://java.dzone.com/articles/sparkcassandra-stack-perform?page=0,0 and I
am looking for something similar for Python.
Thanks
Oleg.
In Spark 1.1, it is possible to read from Cassandra using Hadoop jobs. See
examples/src/main/python/cassandra_inputformat.py for an example. You may
need to write your own key/value converters.
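
For reference, a minimal sketch of what that approach looks like, assuming
the configuration keys and converter class names used by the bundled example
(please check them against your Spark and Cassandra versions). The two helper
function names are hypothetical, and the example converters have to be on the
classpath via the Spark examples jar.

```python
def build_cassandra_input_conf(host, keyspace, table, port="9160"):
    """Build the Hadoop job configuration for Cassandra's CQL input format.

    The keys are assumed to match those in cassandra_inputformat.py.
    """
    return {
        "cassandra.input.thrift.address": host,
        "cassandra.input.thrift.port": str(port),
        "cassandra.input.keyspace": keyspace,
        "cassandra.input.columnfamily": table,
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        "cassandra.input.page.row.size": "3",
    }


def read_cassandra_table(sc, host, keyspace, table):
    """Return an RDD of (key, value) pairs read through a Hadoop InputFormat.

    `sc` is an existing SparkContext. The converter classes turn the Java
    maps produced by the input format into Python-friendly objects; you may
    need to replace them with your own converters.
    """
    conf = build_cassandra_input_conf(host, keyspace, table)
    return sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "java.util.Map",
        "java.util.Map",
        keyConverter="org.apache.spark.examples.pythonconverters.CassandraCQLKeyConverter",
        valueConverter="org.apache.spark.examples.pythonconverters.CassandraCQLValueConverter",
        conf=conf,
    )
```

With a running cluster, something like
read_cassandra_table(sc, "localhost", "test", "users").collect() should
return the rows as key/value pairs. The host, keyspace and table names here
are placeholders.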
On Tue, Sep 2, 2014 at 11:10 AM, Oleg Ruchovets
wrote:
> Hi All ,
>Is it possible to have cassand
Hi,
I am evaluating different options for Spark + Cassandra and I have a couple
of additional questions.
My aim is to use Cassandra only, without Hadoop:
1) Is it possible to use only Cassandra as the input/output for
PySpark?
2) If I use Spark (Java, Scala), is it possible to use
Thanks for the clarification, Yadid. By "Hadoop jobs," I meant Spark jobs
that use Hadoop inputformats (as shown in the cassandra_inputformat.py
example).
Another option for accessing Cassandra from PySpark may open up in the
future, once SparkSQL supports Cassandra as a data source.
On Wed, Sep 10, 2014 at 11:37
Hey all,
Just thought I'd share this with the list in case anyone else would
benefit. I'm currently working on a proper integration of PySpark and
DataStax's new Cassandra-Spark connector, but that's ongoing.
In the meantime, I've basically updated the cassandra_inputformat.py and
cassandra_outpu
Nice!
- Helena
@helenaedelson
On Oct 29, 2014, at 12:01 PM, Mike Sukmanowsky
wrote:
> Hey all,
>
> Just thought I'd share this with the list in case any one else would benefit.
> Currently working on a proper integration of PySpark and DataStax's new
> Cassandra-Spark connector, but that'