There is no difference in performance even with the cache being disabled.

On Mon, Aug 28, 2017 at 7:43 AM, Cody Koeninger <c...@koeninger.org> wrote:
> So if you can run with the cache enabled for some time, does that
> significantly affect the performance issue you were seeing?
>
> Those settings seem reasonable enough. If preferred locations is
> behaving correctly you shouldn't need cached consumers for all 96
> partitions on any one executor, so that maxCapacity setting is
> probably unnecessary.
>
> On Fri, Aug 25, 2017 at 7:04 PM, swetha kasireddy
> <swethakasire...@gmail.com> wrote:
> > Because I saw some posts saying that having the consumer cache enabled
> > can cause a ConcurrentModificationException with reduceByKeyAndWindow.
> > I see those errors as well after running for some time with the cache
> > enabled, so I had to disable it. Please see the tickets below. We have
> > 96 partitions. So if I enable the cache, would the following settings
> > help to improve performance?
> >
> > "spark.streaming.kafka.consumer.cache.maxCapacity" -> Integer.valueOf(96),
> >
> > "spark.streaming.kafka.consumer.poll.ms" -> Integer.valueOf(1024),
> >
> > http://markmail.org/message/n4cdxwurlhf44q5x
> >
> > https://issues.apache.org/jira/browse/SPARK-19185
> >
> > On Fri, Aug 25, 2017 at 12:28 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >>
> >> Why are you setting consumer.cache.enabled to false?
> >>
> >> On Fri, Aug 25, 2017 at 2:19 PM, SRK <swethakasire...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > What would be the appropriate settings to run Spark with Kafka 10? My
> >> > job works fine with Spark with Kafka 8 and a Kafka 8 cluster, but it is
> >> > very slow with Kafka 10 using the experimental Kafka Direct APIs for
> >> > Kafka 10. I see the following error sometimes. Please see the Kafka
> >> > parameters and the consumer strategy for creating the stream below.
> >> > Any suggestions on how to run this with better performance would be
> >> > of great help.
> >> >
> >> > java.lang.AssertionError: assertion failed: Failed to get records for
> >> > test stream1 72 324027964 after polling for 120000
> >> >
> >> > val kafkaParams = Map[String, Object](
> >> >   "bootstrap.servers" -> kafkaBrokers,
> >> >   "key.deserializer" -> classOf[StringDeserializer],
> >> >   "value.deserializer" -> classOf[StringDeserializer],
> >> >   "auto.offset.reset" -> "latest",
> >> >   "heartbeat.interval.ms" -> Integer.valueOf(20000),
> >> >   "session.timeout.ms" -> Integer.valueOf(60000),
> >> >   "request.timeout.ms" -> Integer.valueOf(90000),
> >> >   "enable.auto.commit" -> (false: java.lang.Boolean),
> >> >   "spark.streaming.kafka.consumer.cache.enabled" -> "false",
> >> >   "group.id" -> "test1"
> >> > )
> >> >
> >> > val hubbleStream = KafkaUtils.createDirectStream[String, String](
> >> >   ssc,
> >> >   LocationStrategies.PreferConsistent,
> >> >   ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)
> >> > )
> >> >
> >> > --
> >> > View this message in context:
> >> > http://apache-spark-user-list.1001560.n3.nabble.com/Slower-performance-while-running-Spark-Kafka-Direct-Streaming-with-Kafka-10-cluster-tp29108.html
> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
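For reference, below is a minimal sketch of how these pieces could fit together, assuming the spark.streaming.kafka.consumer.* cache and poll settings are read from the SparkConf rather than from the kafkaParams map that is handed to the Kafka consumer. The broker address, topic name, batch interval, and object name are placeholders, not values from this thread.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-010-direct-sketch")
      // Assumption: the consumer-cache tuning knobs are Spark properties,
      // so they belong on the SparkConf, not inside kafkaParams.
      .set("spark.streaming.kafka.consumer.cache.enabled", "true")
      .set("spark.streaming.kafka.consumer.cache.maxCapacity", "96")
      .set("spark.streaming.kafka.consumer.poll.ms", "1024")

    val ssc = new StreamingContext(conf, Seconds(10)) // placeholder batch interval

    // Only genuine Kafka consumer properties go into kafkaParams.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092", // placeholder
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "test1",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Set("stream1"), kafkaParams) // placeholder topic
    )

    // Trivial action just to materialize each batch.
    stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()
  }
}

With LocationStrategies.PreferConsistent, each executor should only need cached consumers for the partitions scheduled on it, which is why a per-executor maxCapacity of 96 is likely more than necessary, as noted above.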