Re: pyspark and cassandra

Oleg Ruchovets Wed, 10 Sep 2014 08:32:43 -0700

Hi,
  I am evaluating different options for Spark + Cassandra and have a couple
of additional questions.
  My aim is to use Cassandra only, without Hadoop:
  1) Is it possible to use only Cassandra as the input/output source for
PySpark?
  2) If I use Spark (Java, Scala), is it possible to use only
Cassandra for input/output, without Hadoop?
  3) I know there are a couple of storage-level strategies. If my
data set is quite big and I do not have enough memory to process it, can I
use the DISK_ONLY option without Hadoop (having only Cassandra)?


Thanks
Oleg

On Wed, Sep 3, 2014 at 3:08 AM, Kan Zhang <[email protected]> wrote:

> In Spark 1.1, it is possible to read from Cassandra using Hadoop jobs. See
> examples/src/main/python/cassandra_inputformat.py for an example. You may
> need to write your own key/value converters.
>
>
> On Tue, Sep 2, 2014 at 11:10 AM, Oleg Ruchovets <[email protected]>
> wrote:
>
>> Hi All,
>>    Is it possible to use Cassandra as input data for PySpark? I found
>> an example for Java -
>> http://java.dzone.com/articles/sparkcassandra-stack-perform?page=0,0 and
>> I am looking for something similar for Python.
>>
>> Thanks
>> Oleg.
>>
>
>
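
For reference, the approach Kan describes (reading Cassandra through a Hadoop
InputFormat from PySpark) might look roughly like the sketch below. It is
modelled on examples/src/main/python/cassandra_inputformat.py from Spark 1.1
and is not a definitive implementation. The helper names cassandra_input_conf
and read_cassandra_rdd are hypothetical, the host, keyspace and column family
are placeholders, and the converter classes are the ones shipped in the Spark
examples jar, so that jar and the Cassandra jars are assumed to be on the
classpath. The final persist call only illustrates the DISK_ONLY storage level
from question 3.

```python
def cassandra_input_conf(host, keyspace, column_family, port="9160"):
    """Build the Hadoop job configuration read by Cassandra's CQL input format.

    Hypothetical helper; the property names follow the Spark 1.1 example script.
    """
    return {
        "cassandra.input.thrift.address": host,
        "cassandra.input.thrift.port": str(port),
        "cassandra.input.keyspace": keyspace,
        "cassandra.input.columnfamily": column_family,
        # Assumption: the cluster uses the default Murmur3 partitioner.
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        "cassandra.input.page.row.size": "3",
    }


def read_cassandra_rdd(sc, host, keyspace, column_family):
    """Return an RDD of (key, value) dicts read from a Cassandra column family.

    `sc` is an existing SparkContext. pyspark is imported lazily so that the
    configuration helper above can be used without Spark installed.
    """
    from pyspark import StorageLevel

    rdd = sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "java.util.Map",
        "java.util.Map",
        # Converters from the Spark examples jar; you may need your own.
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "CassandraCQLKeyConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "CassandraCQLValueConverter",
        conf=cassandra_input_conf(host, keyspace, column_family),
    )
    # Keep cached partitions on disk only, rather than in memory.
    return rdd.persist(StorageLevel.DISK_ONLY)
```

With a SparkContext named sc, a call such as
read_cassandra_rdd(sc, "localhost", "test", "users").collect() would return
the rows as key/value pairs. The Hadoop InputFormat classes are used here as
libraries on the classpath.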
