Hello,

Could you please explain exactly what is distributed when I launch a Spark Streaming job on a YARN cluster? My code looks something like this:

    JavaDStream<JMSEvent> customReceiverStream =
        ssc.receiverStream(streamConfig.getJmsReceiver());

    JavaDStream<String> incoming_msg = customReceiverStream.map(
        new Function<JMSEvent, String>() {
            public String call(JMSEvent jmsEvent) {
                return jmsEvent.getText();
            }
        }
    );

    incoming_msg.foreachRDD(
        new Function<JavaRDD<String>, Void>() {
            public Void call(JavaRDD<String> rdd) throws Exception {
                rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
                    @Override
                    public void call(Iterator<String> msg) throws Exception {
                        while (msg.hasNext()) {
                            // insert message in MongoDB
                        }
                        ....

So, in this code, at which step does the distribution over YARN happen?

- Is my receiver distributed (and therefore everything after it)?
- Is foreachRDD distributed (and therefore everything after it)?
- Is foreachPartition distributed?

Thanks,
Nicolas