Update: I am using Spark 2.0.2 and Kafka 0.8.2 with Scala 2.10 On Mon, Feb 20, 2017 at 1:06 PM, Muhammad Haseeb Javed < 11besemja...@seecs.edu.pk> wrote:
> I am PhD student at Ohio State working on a study to evaluate streaming > frameworks (Spark Streaming, Storm, Flink) using the the Intel HiBench > benchmarks. But I think I am having a problem with Spark. I have Spark > Streaming application which I am trying to run on a 5 node cluster > (including master). I have 2 zookeeper and 4 kafka brokers. However, > whenever I run a Spark Streaming application I encounter the following > error: > > java.lang.IllegalArgumentException: requirement failed: numRecords must not > be negative > at scala.Predef$.require(Predef.scala:224) > at > org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38) > at > org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:165) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333) > at scala.Option.orElse(Option.scala:289) > > The application starts fine, but as soon as the Kafka producers start > emitting the stream data I start receiving the aforementioned error > repeatedly. > > I have tried removing Spark Streaming checkpointing files as has been > suggested in similar posts on the internet. However, the problem persists > even if I start a Kafka topic and its corresponding consumer Spark > Streaming application for the first time. Also the problem could not be > offset related as I start the topic for the first time. > Although the application seems to be processing the stream properly as I > can see by the benchmark numbers generated. However, the numbers are way of > from what I got for Storm and Flink, suspecting me to believe that there is > something wrong with the pipeline and Spark is not able to process the > stream as cleanly as it should. Any help in this regard would be really > appreciated. >