Hi,

I am evaluating different options for Spark + Cassandra and have a couple of additional questions. My aim is to use Cassandra only, without Hadoop:

1) Is it possible to use only Cassandra as the input/output source for PySpark?
2) If I use Spark (Java, Scala), is it possible to use only Cassandra for input/output, without Hadoop?
3) I know there are several storage-level strategies. If my data set is quite big and I do not have enough memory to process it, can I use the DISK_ONLY option without Hadoop (having only Cassandra)?

Thanks
Oleg

On Wed, Sep 3, 2014 at 3:08 AM, Kan Zhang <[email protected]> wrote:

> In Spark 1.1, it is possible to read from Cassandra using Hadoop jobs. See
> examples/src/main/python/cassandra_inputformat.py for an example. You may
> need to write your own key/value converters.
>
> On Tue, Sep 2, 2014 at 11:10 AM, Oleg Ruchovets <[email protected]>
> wrote:
>
>> Hi All ,
>> Is it possible to have cassandra as input data for PySpark. I found
>> example for java -
>> http://java.dzone.com/articles/sparkcassandra-stack-perform?page=0,0 and
>> I am looking something similar for python.
>>
>> Thanks
>> Oleg.
