Re: Why can't Spark Streaming recover from the checkpoint directory when using a third party library for processingmulti-line JSON?

2015-03-04 Thread Emre Sevinc
I've also tried the following: Configuration hadoopConfiguration = new Configuration(); hadoopConfiguration.set(multilinejsoninputformat.member, itemSet); JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDirectory, hadoopConfiguration, factory, false); but I

Re: Why can't Spark Streaming recover from the checkpoint directory when using a third party library for processingmulti-line JSON?

2015-03-04 Thread Tathagata Das
That could be a corner case bug. How do you add the 3rd party library to the class path of the driver? Through spark-submit? Could you give the command you used? TD On Wed, Mar 4, 2015 at 12:42 AM, Emre Sevinc emre.sev...@gmail.com wrote: I've also tried the following: Configuration

Re: Why can't Spark Streaming recover from the checkpoint directory when using a third party library for processingmulti-line JSON?

2015-03-04 Thread Emre Sevinc
I'm adding this 3rd party library to my Maven pom.xml file so that it's embedded into the JAR I send to spark-submit: dependency groupIdjson-mapreduce/groupId artifactIdjson-mapreduce/artifactId version1.0-SNAPSHOT/version exclusions exclusion

Why can't Spark Streaming recover from the checkpoint directory when using a third party library for processingmulti-line JSON?

2015-03-03 Thread Emre Sevinc
Hello, I have a Spark Streaming application (that uses Spark 1.2.1) that listens to an input directory, and when new JSON files are copied to that directory processes them, and writes them to an output directory. It uses a 3rd party library to process the multi-line JSON files (