Hi Gerard/Akhil,
By "how do I specify a batch" I was trying to ask: when does the data in the JavaDStream get flushed into the Cassandra table? I read somewhere that streaming data gets written to Cassandra in batches. A batch could cover some particular time interval, or one particular run. That is what I was trying to understand: how to set that "batch" in my program. Because if a batch means one complete run of my streaming app, then note that in my app I hit Ctrl+C to kill the program. The program terminates ungracefully, so would the data still get inserted successfully into my Cassandra table?

For example, in Terminal-A I run a Kafka producer to stream in messages. In Terminal-B I run my streaming app. My app contains the line jssc.awaitTermination();, which keeps it running until I kill it. Eventually I hit Ctrl+C in the app's terminal, i.e. Terminal-B, and kill it. So it's a kind of ungraceful termination. In this case, will the data in my app's DStream get written into Cassandra?

Thanks and Regards,
Md. Aiman Sarosh.
Accenture Services Pvt. Ltd.
Mob #: (+91) - 9836112841.

________________________________
From: Gerard Maas <gerard.m...@gmail.com>
Sent: Thursday, December 4, 2014 10:22 PM
To: Akhil Das
Cc: Sarosh, M.; user@spark.apache.org
Subject: Re: Spark-Streaming: output to cassandra

I guess he's already doing so, given the 'saveToCassandra' usage. What I don't understand is the question "how do I specify a batch". That doesn't make much sense to me. Could you explain further?

-kr, Gerard.

On Thu, Dec 4, 2014 at 5:36 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

You can use DataStax's Cassandra connector: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md

Thanks
Best Regards

On Thu, Dec 4, 2014 at 8:21 PM, <m.sar...@accenture.com> wrote:

Hi,

I have written the code below, which streams data from Kafka and prints it to the console.
I want to extend this so that my data goes into a Cassandra table instead.

    JavaStreamingContext jssc =
            new JavaStreamingContext("local[4]", "SparkStream", new Duration(1000));

    JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, args[0], args[1], topicMap);
    System.out.println("Connection done!");

    JavaDStream<String> data = messages.map(new Function<Tuple2<String, String>, String>() {
        public String call(Tuple2<String, String> message) {
            return message._2();
        }
    });

    //data.print();  --> output to console
    data.foreachRDD(saveToCassandra("mykeyspace", "mytable"));

    jssc.start();
    jssc.awaitTermination();

How should I implement the line

    data.foreachRDD(saveToCassandra("mykeyspace", "mytable"));

so that the data goes into Cassandra in each batch? And how do I specify a batch? Because if I hit Ctrl+C on the console of the streaming job, nothing will be entered into Cassandra for sure, since the job is getting killed.

Please help.

Thanks and Regards,
Md. Aiman Sarosh.
Accenture Services Pvt. Ltd.
Mob #: (+91) - 9836112841.
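________________________________

[Editor's note] To tie the thread together: the "batch" is simply the batch interval passed to the JavaStreamingContext constructor (Duration(1000) = 1 second); every interval, the received messages become one RDD and the output operations run on it. The sketch below shows how the saveToCassandra line could be implemented with the connector's Java API (javaFunctions / writerBuilder / mapToRow, per the saving doc Akhil linked), and how a shutdown hook with a graceful stop lets Ctrl+C flush the in-flight batch instead of dropping it. The Msg bean, the body column, the table schema, and the connection host 127.0.0.1 are assumptions for illustration, not taken from the thread; this assumes spark-cassandra-connector 1.1.x on the classpath.

```java
// Sketch only: assumes spark-cassandra-connector 1.1.x, Cassandra on
// 127.0.0.1, and a hypothetical table created as:
//   CREATE TABLE mykeyspace.mytable (body text PRIMARY KEY);
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import scala.Tuple2;

public class StreamToCassandra {

    // JavaBean whose property name ("body") must match the table's column.
    public static class Msg implements Serializable {
        private String body;
        public Msg() {}
        public Msg(String body) { this.body = body; }
        public String getBody() { return body; }
        public void setBody(String body) { this.body = body; }
    }

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setMaster("local[4]")
                .setAppName("SparkStream")
                .set("spark.cassandra.connection.host", "127.0.0.1"); // assumption

        // Duration(1000) IS the batch: every 1000 ms the messages received
        // so far are packaged into one RDD and the output operations
        // (including the Cassandra write) run on that RDD.
        final JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(1000));

        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        topicMap.put(args[2], 1); // topic -> receiver threads (assumption)

        JavaPairReceiverInputDStream<String, String> messages =
                KafkaUtils.createStream(jssc, args[0], args[1], topicMap);

        JavaDStream<Msg> rows = messages.map(new Function<Tuple2<String, String>, Msg>() {
            public Msg call(Tuple2<String, String> t) {
                return new Msg(t._2());
            }
        });

        // Write each batch's RDD through the connector's Java API.
        rows.foreachRDD(new Function<JavaRDD<Msg>, Void>() {
            public Void call(JavaRDD<Msg> rdd) {
                javaFunctions(rdd)
                        .writerBuilder("mykeyspace", "mytable", mapToRow(Msg.class))
                        .saveToCassandra();
                return null;
            }
        });

        // On Ctrl+C, stop gracefully so the current batch finishes its
        // write before the context shuts down; a bare kill mid-batch can
        // lose the in-flight (not yet completed) batch.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                jssc.stop(true, true); // stopSparkContext, stopGracefully
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Batches that completed before the Ctrl+C are already in Cassandra either way; the graceful stop only protects the batch that is executing at the moment of shutdown.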