Hello, 
Please could you explain me what is exactly distributed when I launch a spark 
streaming job over YARN cluster ?
My code is something like :

JavaDStream<JMSEvent> customReceiverStream = 
ssc.receiverStream(streamConfig.getJmsReceiver());

JavaDStream<String> incoming_msg = customReceiverStream.map(
                                new Function<JMSEvent, String>()
                                {
                                        public String call(JMSEvent jmsEvent)
                                        {
                                                return jmsEvent.getText();
                                        }
                                }
                                );

incoming_msg.foreachRDD( new Function<JavaRDD<String>,  Void>() {
        public Void call(JavaRDD<String> rdd) throws Exception {
                rdd.foreachPartition(new VoidFunction<Iterator<String>>() {     
                                        
                        @Override
                        public void call(Iterator<String> msg) throws Exception 
{
                                while (msg.hasNext()) {
                                   // insert message in MongoDB
                                }
....

So, in this code , at what step is done the distribution over YARN :
- Does my receiver is distributed (and so all the rest also) ?
- Does the foreachRDD is distributed (and so all the rest also)?
- Does foreachPartition is distributed ?

Tks
Nicolas

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to