Update: I am using Spark 2.0.2 and  Kafka 0.8.2 with Scala 2.10

On Mon, Feb 20, 2017 at 1:06 PM, Muhammad Haseeb Javed <
11besemja...@seecs.edu.pk> wrote:

> I am PhD student at Ohio State working on a study to evaluate streaming
> frameworks (Spark Streaming, Storm, Flink) using the the Intel HiBench
> benchmarks. But I think I am having a problem  with Spark. I have Spark
> Streaming application which I am trying to run on a 5 node cluster
> (including master). I have 2 zookeeper and 4 kafka brokers. However,
> whenever I run a Spark Streaming application I encounter the following
> error:
>
> java.lang.IllegalArgumentException: requirement failed: numRecords must not 
> be negative
>         at scala.Predef$.require(Predef.scala:224)
>         at 
> org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
>         at 
> org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:165)
>         at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>         at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>         at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>         at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>         at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>         at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
>         at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
>         at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
>         at scala.Option.orElse(Option.scala:289)
>
> The application starts fine, but as soon as the Kafka producers start
> emitting the stream data I start receiving the aforementioned error
> repeatedly.
>
> I have tried removing Spark Streaming checkpointing files as has been
> suggested in similar posts on the internet. However, the problem persists
> even if I start a Kafka topic and its corresponding consumer Spark
> Streaming application for the first time. Also the problem could not be
> offset related as I start the topic for the first time.
> Although the application seems to be processing the stream properly as I
> can see by the benchmark numbers generated. However, the numbers are way of
> from what I got for Storm and Flink, suspecting me to believe that there is
> something wrong with the pipeline and Spark is not able to process the
> stream as cleanly as it should. Any help in this regard would be really
> appreciated.
>

Reply via email to