Forgot to mention, it works when using .foreachRDD(_.saveAsTextFile("")).
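For reference, the usual workaround for SPARK-5063 after checkpoint recovery is to avoid capturing a SQLContext created at program start inside the foreachRDD closure (that reference is stale once the job is restored from a checkpoint) and instead re-derive it from the RDD's own SparkContext. A minimal sketch, assuming Spark 1.x APIs as in the stack trace below; class and table names are illustrative:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Lazily (re)creates a SQLContext from the live SparkContext, so the
// streaming closure never serializes a driver-side context reference.
object SQLContextSingleton {
  @transient private var instance: SQLContext = _
  def getInstance(rdd: RDD[_]): SQLContext = synchronized {
    if (instance == null) instance = new SQLContext(rdd.sparkContext)
    instance
  }
}

// Inside the streaming program (sketch only):
// stream.foreachRDD { rdd =>
//   val sqlContext = SQLContextSingleton.getInstance(rdd)
//   import sqlContext.createSchemaRDD
//   val schemaRdd = rdd.map(...)           // build the SchemaRDD here
//   schemaRdd.insertInto("table", overwrite = true)
// }
```

This keeps all RDD/SchemaRDD construction on the driver inside foreachRDD, with a context that is valid both on a fresh start and after checkpoint restore.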

> On 11.03.2015, at 18:35, Marius Soutier <mps....@gmail.com> wrote:
> 
> Hi,
> 
> I've written a Spark Streaming job that inserts into a Parquet table, using 
> stream.foreachRDD(_.insertInto("table", overwrite = true)). Now I've added 
> checkpointing; everything works fine when starting from scratch. When 
> starting from a checkpoint however, the job doesn’t work and produces the 
> following exception in the foreachRDD:
> 
> ERROR org.apache.spark.streaming.scheduler.JobScheduler: Error running job 
> streaming job 1426093830000 ms.2
> org.apache.spark.SparkException: RDD transformations and actions can only be 
> invoked by the driver, not inside of other transformations; for example, 
> rdd1.map(x => rdd2.values.count() * x) is invalid because the values 
> transformation and count action cannot be performed inside of the rdd1.map 
> transformation. For more information, see SPARK-5063.
>       at org.apache.spark.rdd.RDD.sc(RDD.scala:90)
>       at org.apache.spark.rdd.RDD.<init>(RDD.scala:143)
>       at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
>       at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:114)
>       at MyStreamingJob$$anonfun$createComputation$3.apply(MyStreamingJob:167)
>       at MyStreamingJob$$anonfun$createComputation$3.apply(MyStreamingJob:167)
>       at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:535)
>       at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:535)
>       at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:42)
>       at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>       at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> 
> 
> 
> 
> Cheers
> - Marius
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
