Thanks, the new guide did help: instantiating the SQLContext inside
foreachRDD did the trick for me, but the SQLContext singleton works as well.
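
In case it helps others, here is a minimal sketch of the singleton variant
(assuming the Spark 1.2/1.3-style SchemaRDD API; MyRecord, the table name,
and the object name are placeholders of mine):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object SQLContextSingleton {
  @transient private var instance: SQLContext = _
  def getInstance(sc: SparkContext): SQLContext = synchronized {
    if (instance == null) instance = new SQLContext(sc)
    instance
  }
}

case class MyRecord(key: String, value: Int) // placeholder schema

// stream: DStream[MyRecord]; fetch the SQLContext from the RDD at runtime
// instead of closing over one created before checkpoint recovery.
stream.foreachRDD { rdd =>
  val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
  sqlContext.createSchemaRDD(rdd).insertInto("table", overwrite = true)
}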

Now the only problem left is that spark.driver.port is not retained after 
starting from a checkpoint, so my Actor receivers are running on a random 
port...
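
One thing I'm going to try (untested, just a sketch; the port number and
checkpoint path below are made up) is pinning the port explicitly in the
conf used by the context-creating function, so recovery can't fall back to
a random one:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///checkpoints/my-job" // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf()
    .setAppName("MyStreamingJob")
    .set("spark.driver.port", "7078") // fixed port instead of a random one
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... set up the actor receivers and the foreachRDD computation here
  ssc
}

// Recovers from the checkpoint if one exists, otherwise creates a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)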


> On 12.03.2015, at 02:35, Tathagata Das <t...@databricks.com> wrote:
> 
> Can you show us the code that you are using?
> 
> This might help. This is the updated streaming programming guide for 1.3, 
> soon to be up; here is a quick preview:
> http://people.apache.org/~tdas/spark-1.3.0-temp-docs/streaming-programming-guide.html#dataframe-and-sql-operations
> 
> TD
> 
> On Wed, Mar 11, 2015 at 12:20 PM, Marius Soutier <mps....@gmail.com> wrote:
> Forgot to mention, it works when using .foreachRDD(_.saveAsTextFile("")).
> 
> > On 11.03.2015, at 18:35, Marius Soutier <mps....@gmail.com> wrote:
> >
> > Hi,
> >
> > I’ve written a Spark Streaming job that inserts into a Parquet table, using 
> > stream.foreachRDD(_.insertInto("table", overwrite = true)). Now I’ve added 
> > checkpointing; everything works fine when starting from scratch. When 
> > starting from a checkpoint however, the job doesn’t work and produces the 
> > following exception in the foreachRDD:
> >
> > ERROR org.apache.spark.streaming.scheduler.JobScheduler: Error running job 
> > streaming job 1426093830000 ms.2
> > org.apache.spark.SparkException: RDD transformations and actions can only 
> > be invoked by the driver, not inside of other transformations; for example, 
> > rdd1.map(x => rdd2.values.count() * x) is invalid because the values 
> > transformation and count action cannot be performed inside of the rdd1.map 
> > transformation. For more information, see SPARK-5063.
> >       at org.apache.spark.rdd.RDD.sc(RDD.scala:90)
> >       at org.apache.spark.rdd.RDD.<init>(RDD.scala:143)
> >       at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
> >       at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:114)
> >       at MyStreamingJob$$anonfun$createComputation$3.apply(MyStreamingJob:167)
> >       at MyStreamingJob$$anonfun$createComputation$3.apply(MyStreamingJob:167)
> >       at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:535)
> >       at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:535)
> >       at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:42)
> >       at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> >       at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> >
> >
> >
> >
> > Cheers
> > - Marius
> >
> 
> 
