HBaseContext makes use of broadcast variables, so to use it with checkpointing in a Spark Streaming application you have to wrap it in a singleton that can be recreated on restore from the checkpoint.
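The singleton pattern described above can be sketched roughly like this. This is an untested sketch, not the finished example from the linked repo: the object name `HBaseContextSingleton` is illustrative, and it assumes the Cloudera SparkOnHBase `HBaseContext(sc, config)` constructor shown in the stack trace below.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.spark.SparkContext
import com.cloudera.spark.hbase.HBaseContext

// Illustrative singleton wrapper (name is ours, not a library API).
// Because the instance lives in an object on each JVM rather than in a
// checkpointed closure, it is rebuilt lazily from the live SparkContext
// after a restore, so HBaseContext never sees a null SparkContext.
object HBaseContextSingleton {
  @transient private var instance: HBaseContext = _

  def getInstance(sc: SparkContext, config: Configuration): HBaseContext =
    synchronized {
      if (instance == null) {
        instance = new HBaseContext(sc, config)
      }
      instance
    }
}
```

Inside the streaming job you would then fetch it per batch, taking the SparkContext from the RDD handed to `foreachRDD` instead of capturing one at graph-construction time, e.g. `stream.foreachRDD { rdd => val hc = HBaseContextSingleton.getInstance(rdd.sparkContext, HBaseConfiguration.create()); ... }`.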
I have an example of this I was working on but have not finished for the hbase-downstreamer project: https://github.com/busbey/hbase-downstreamer/blob/hbase-spark-example/src/main/scala/org/hbase/downstreamer/spark/RecoverableNetworkWordCountHBase.scala

On Fri, Jan 27, 2017 at 12:23 AM, SivaRam Bollineni <shivaram.bollin...@gmail.com> wrote:
> hi,
>
> First of all, Thanks a ton for HBaseContext API, it's very useful and
> performant for our use-cases. We use HBaseContext in Spark Streaming with
> checkpoint. We have a problem in recovering from checkpoint. Below is the
> *NullPointerException* stack trace. I am sure it's because SparkContext
> object is passed as null to HBaseContext. SparkContext is not serializable
> and without SparkContext we can't create HBaseContext. Please help me in
> resolving this issue.
>
> java.lang.NullPointerException
>     at com.cloudera.spark.hbase.HBaseContext.<init>(HBaseContext.scala:69)
>     at com.abc.bbc.xyz.MySparkStreaming$$anonfun$createStreamingContext$1.apply(MySparkStreaming.scala:118)
>     at com.abc.bbc.xyz.MySparkStreaming$$anonfun$createStreamingContext$1.apply(MySparkStreaming.scala:94)
>     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:631)
>     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:631)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>     at scala.util.Try$.apply(Try.scala:161)
>     at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:207)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:207)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:207)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> Thanks !
> Siva