The number of jobs per batch depends on the number of output operations (print, foreachRDD, saveAs*Files) and on the number of RDD actions inside those output operations.
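Here is a minimal, self-contained sketch (the class name and socket source are placeholders, not taken from your app) that should show three jobs per batch in the UI: one from the first output operation, two from the second:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object JobCountDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JobCountDemo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second batches, as in your setup

    val lines = ssc.socketTextStream("localhost", 9999) // any input source works the same way
    val words = lines.flatMap(_.split(" "))

    // One output operation containing one RDD action => ONE job per batch
    words.foreachRDD { rdd => println(rdd.count()) }

    // One output operation containing two RDD actions => TWO jobs per batch
    words.foreachRDD { rdd =>
      println(rdd.count())
      println(rdd.count()) // second action triggers a second job on the same RDD
    }

    ssc.start()
    ssc.awaitTermination()
  }
}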
For example:

dstream1.foreachRDD { rdd => rdd.count }  // ONE Spark job per batch
dstream1.foreachRDD { rdd => { rdd.count ; rdd.count } }  // TWO Spark jobs per batch
dstream1.foreachRDD { rdd => rdd.count } ; dstream2.foreachRDD { rdd => rdd.count }  // TWO Spark jobs per batch

Regards,
Yogesh Mahajan
SnappyData Inc (snappydata.io)

On Thu, Jan 28, 2016 at 4:30 PM, Sachin Aggarwal <different.sac...@gmail.com> wrote:

> Hi,
>
> I am executing a streaming word count with Kafka, using one test topic
> with 2 partitions. My cluster has three Spark executors.
>
> Each batch is 10 seconds.
>
> For every batch (e.g. batch time 02:51:00 below) I see 3 entries in the
> Spark UI, as shown below.
>
> My questions:
> 1) As the label of the first column says "Job Id", does Spark submit 3
> jobs for each batch?
> 2) When I decrease the number of executors/nodes, the job count also
> changes. What is the relation to the number of executors?
> 3) Only one job actually executes the stages; the other two show them as
> skipped. Why were the other jobs created?
>
> Job Id | Description | Submitted | Duration | Stages: Succeeded/Total | Tasks (for all stages): Succeeded/Total
> 221 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 46 ms | 1/1 (1 skipped) | 1/1 (3 skipped)
> 220 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 47 ms | 1/1 (1 skipped) | 4/4 (3 skipped)
> 219 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 48 ms | 2/2 | 4/4
>
> --
> Thanks & Regards
> Sachin Aggarwal
> 7760502772