HBaseContext makes use of broadcast variables, so to use it with checkpointing in a Spark Streaming application you have to wrap it in a singleton that can be recreated on restore from the checkpoint.
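The singleton pattern described above can be sketched roughly like this. This is an untested sketch, not the finished example from the linked repo: the object name `HBaseContextSingleton` is illustrative, and it assumes the Cloudera SparkOnHBase `HBaseContext(sc, config)` constructor shown in the stack trace below.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.spark.SparkContext
import com.cloudera.spark.hbase.HBaseContext

// Illustrative singleton wrapper (name is ours, not a library API).
// Because the instance lives in an object on each JVM rather than in a
// checkpointed closure, it is rebuilt lazily from the live SparkContext
// after a restore, so HBaseContext never sees a null SparkContext.
object HBaseContextSingleton {
  @transient private var instance: HBaseContext = _

  def getInstance(sc: SparkContext, config: Configuration): HBaseContext =
    synchronized {
      if (instance == null) {
        instance = new HBaseContext(sc, config)
      }
      instance
    }
}
```

Inside the streaming job you would then fetch it per batch, taking the SparkContext from the RDD handed to `foreachRDD` instead of capturing one at graph-construction time, e.g. `stream.foreachRDD { rdd => val hc = HBaseContextSingleton.getInstance(rdd.sparkContext, HBaseConfiguration.create()); ... }`.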
I have an example of this I was working on but have not finished for the hbase-downstreamer project: https://github.com/busbey/hbase-downstreamer/blob/hbase-spark-example/src/main/scala/org/hbase/downstreamer/spark/RecoverableNetworkWordCountHBase.scala

On Fri, Jan 27, 2017 at 12:23 AM, SivaRam Bollineni <shivaram.bollin...@gmail.com> wrote:
> hi,
>
> First of all, Thanks a ton for HBaseContext API, it's very useful and
> performant for our use-cases. We use HBaseContext in Spark Streaming with
> checkpoint. We have a problem in recovering from checkpoint. Below is the
> *NullPointerException* stack trace. I am sure it's because SparkContext
> object is passed as null to HBaseContext. SparkContext is not serializable
> and without SparkContext we can't create HBaseContext. Please help me in
> resolving this issue.
>
> java.lang.NullPointerException
>     at com.cloudera.spark.hbase.HBaseContext.<init>(HBaseContext.scala:69)
>     at com.abc.bbc.xyz.MySparkStreaming$$anonfun$createStreamingContext$1.apply(MySparkStreaming.scala:118)
>     at com.abc.bbc.xyz.MySparkStreaming$$anonfun$createStreamingContext$1.apply(MySparkStreaming.scala:94)
>     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:631)
>     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:631)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>     at scala.util.Try$.apply(Try.scala:161)
>     at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:207)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:207)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:207)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> Thanks !
> Siva