Forgot to mention, it works when using .foreachRDD(_.saveAsTextFile("")).
> On 11.03.2015, at 18:35, Marius Soutier <mps....@gmail.com> wrote:
>
> Hi,
>
> I've written a Spark Streaming job that inserts into a Parquet table, using
> stream.foreachRDD(_.insertInto("table", overwrite = true)). Now I've added
> checkpointing; everything works fine when starting from scratch. When
> starting from a checkpoint, however, the job doesn't work and produces the
> following exception in the foreachRDD:
>
> ERROR org.apache.spark.streaming.scheduler.JobScheduler: Error running job streaming job 1426093830000 ms.2
> org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
>         at org.apache.spark.rdd.RDD.sc(RDD.scala:90)
>         at org.apache.spark.rdd.RDD.<init>(RDD.scala:143)
>         at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
>         at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:114)
>         at MyStreamingJob$$anonfun$createComputation$3.apply(MyStreamingJob:167)
>         at MyStreamingJob$$anonfun$createComputation$3.apply(MyStreamingJob:167)
>         at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:535)
>         at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:535)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:42)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>
> Cheers
> - Marius
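
For anyone hitting this: the stack trace (SchemaRDD being constructed via SQLContext.createSchemaRDD inside the foreachRDD) suggests the SQLContext and its implicit conversion are being restored from the checkpoint together with the DStream closures, where they no longer hold a live driver-side SparkContext, which is why plain saveAsTextFile still works. A minimal sketch of the workaround described in the Spark Streaming programming guide, which lazily recreates the SQLContext from the RDD's own SparkContext inside foreachRDD; the Record case class, the socket source, and the createComputation name are placeholders, not Marius' actual job:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.StreamingContext

object MyStreamingJob {

  // Placeholder record type standing in for the job's real schema.
  case class Record(key: String, value: Int)

  // Lazily instantiated singleton: the SQLContext is rebuilt from the
  // RDD's SparkContext after checkpoint recovery instead of being
  // deserialized from the checkpoint, which is what trips SPARK-5063 here.
  object SQLContextSingleton {
    @transient private var instance: SQLContext = _
    def getInstance(sc: SparkContext): SQLContext = synchronized {
      if (instance == null) instance = new SQLContext(sc)
      instance
    }
  }

  def createComputation(ssc: StreamingContext): Unit = {
    val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
    val records = lines.map { line =>
      val Array(k, v) = line.split(",")
      Record(k, v.toInt)
    }
    records.foreachRDD { rdd =>
      // Fetch the SQLContext via rdd.sparkContext instead of closing over
      // one created on the driver when the job was first defined.
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.createSchemaRDD // implicit RDD[Product] => SchemaRDD
      rdd.insertInto("table", overwrite = true)
    }
  }
}

Note that inserting into a Parquet table still assumes the table is registered in whatever context you end up with (e.g. via a HiveContext), and the streaming context itself would be built with StreamingContext.getOrCreate as usual; the only point of the sketch is that nothing SQL-related gets captured in the checkpointed closure.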