Thank you, Silvio, for the update.

On Sat, Dec 26, 2015 at 1:14 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
> Skipped stages result from existing shuffle output of a stage when an
> action re-runs part of a lineage that has already been computed. The
> executors still have that stage's shuffle output in their local dirs, and
> Spark recognizes this, so rather than re-computing the stage it starts
> from the following stage. So this is a good thing: you're not
> re-computing a stage. In your case, it looks like the shuffle output for
> the userreqs RDD (reduceByKey) already exists, so Spark doesn't
> re-compute it.
>
> From: Prem Spark <sparksure...@gmail.com>
> Date: Friday, December 25, 2015 at 11:41 PM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: why one of Stage is into Skipped section instead of Completed
>
> > What does the Skipped stage below mean? Can anyone help clarify?
> > I was expecting all 3 stages to succeed, but only 2 of them completed
> > while one was skipped.
> >
> > Status: SUCCEEDED
> > Completed Stages: 2
> > Skipped Stages: 1
> >
> > Scala REPL code used:
> >
> > accounts is a basic RDD of comma-separated text lines read from
> > /loudacre/accounts/*.
> >
> > var accountsByID = accounts.
> >   map(line => line.split(',')).
> >   map(values => (values(0),values(4)+','+values(3)));
> >
> > var userreqs = sc.
> >   textFile("/loudacre/weblogs/*6").
> >   map(line => line.split(' ')).
> >   map(words => (words(2),1)).
> >   reduceByKey((v1,v2) => v1 + v2);
> >
> > var accounthits =
> >   accountsByID.join(userreqs).map(pair => pair._2)
> >
> > accounthits.
> >   saveAsTextFile("/loudacre/userreqs")
> >
> > scala> accounthits.toDebugString
> > res15: String =
> > (32) MapPartitionsRDD[24] at map at <console>:28 []
> >  |   MapPartitionsRDD[23] at join at <console>:28 []
> >  |   MapPartitionsRDD[22] at join at <console>:28 []
> >  |   CoGroupedRDD[21] at join at <console>:28 []
> >  +-(15) MapPartitionsRDD[15] at map at <console>:25 []
> >  |  |   MapPartitionsRDD[14] at map at <console>:24 []
> >  |  |   /loudacre/accounts/* MapPartitionsRDD[13] at textFile at <console>:21 []
> >  |  |   /loudacre/accounts/* HadoopRDD[12] at textFile at <console>:21 []
> >  |   ShuffledRDD[20] at reduceByKey at <console>:25 []
> >  +-(32) MapPartitionsRDD[19] at map at <console>:24 []
> >     |   MapPartitionsRDD[18] at map at <console>:23 []
> >     |   /loudacre/weblogs/*6 MapPartitionsRDD[17] at textFile at <console>:22 []
> >     |   /loudacre/weblogs/*6 HadoopRDD[16] at textFile at <con...
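The mechanism Silvio describes can be sketched as a toy model in plain Scala. This is not Spark's DAGScheduler API: `Stage`, `Cluster` and `runJob` are hypothetical names, and a stage is identified only by its name. The model assumes that a job walks back from its result stage, skips any shuffle map stage whose output already exists on the executors (along with that stage's ancestors), and computes everything else. The `main` method also assumes, following Silvio's guess, that some earlier action on userreqs (for example a hypothetical `userreqs.count()`) had already written the reduceByKey shuffle output before `accounthits.saveAsTextFile(...)` ran.

```scala
import scala.collection.mutable

// Toy model of Spark's stage skipping. Stage, Cluster and runJob are
// hypothetical names for illustration, not part of Spark's API.
object SkippedStagesModel {
  /** A shuffle map stage writes shuffle output; a result stage ends a job.
    * `parents` are the stages whose output this stage reads. */
  final case class Stage(name: String, parents: List[Stage], isShuffleMap: Boolean)

  /** Tracks which shuffle outputs already sit in executors' local dirs. */
  final class Cluster {
    private val shuffleOutputs = mutable.Set.empty[String]

    /** Runs one job ending in `result`; returns (completed, skipped) stage names. */
    def runJob(result: Stage): (List[String], List[String]) = {
      val completed = mutable.ListBuffer.empty[String]
      val skipped = mutable.ListBuffer.empty[String]

      // Ancestors of a skipped stage are skipped too: nothing upstream is needed.
      def markSkipped(stage: Stage): Unit =
        if (!skipped.contains(stage.name)) {
          skipped += stage.name
          stage.parents.foreach(markSkipped)
        }

      def compute(stage: Stage): Unit =
        if (stage.isShuffleMap && shuffleOutputs.contains(stage.name)) markSkipped(stage)
        else if (!completed.contains(stage.name)) {
          stage.parents.foreach(compute) // parent stages run first
          completed += stage.name
          // Shuffle output outlives the job, so later jobs can reuse it.
          if (stage.isShuffleMap) shuffleOutputs += stage.name
        }

      compute(result)
      (completed.toList, skipped.toList)
    }
  }

  def main(args: Array[String]): Unit = {
    val cluster = new Cluster
    // Map sides of the two shuffles visible in the toDebugString output.
    val weblogsMap = Stage("weblogs map (reduceByKey)", Nil, isShuffleMap = true)
    val accountsMap = Stage("accounts map (join)", Nil, isShuffleMap = true)

    // Hypothetical earlier action on userreqs, e.g. userreqs.count().
    println(cluster.runJob(Stage("userreqs result", List(weblogsMap), isShuffleMap = false)))
    // The accounthits.saveAsTextFile(...) job: 2 completed, 1 skipped.
    println(cluster.runJob(Stage("save result", List(weblogsMap, accountsMap), isShuffleMap = false)))
  }
}
```

In this model the second job completes the accounts map stage and the result stage and skips the weblogs map stage, which matches the "Completed Stages: 2, Skipped Stages: 1" status in the question. In the spark-shell, calling the same action twice on one reduceByKey RDD (for example `userreqs.count()` twice) should show the same pattern for the second job in the web UI.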