There is no difference in performance even with Cache being enabled.

On Mon, Aug 28, 2017 at 11:27 AM, swetha kasireddy <swethakasire...@gmail.com> wrote:

> There is no difference in performance even with Cache being disabled.
>
> On Mon, Aug 28, 2017 at 7:43 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> So if you can run with cache enabled for some time, does that
>> significantly affect the performance issue you were seeing?
>>
>> Those settings seem reasonable enough. If preferred locations is
>> behaving correctly, you shouldn't need cached consumers for all 96
>> partitions on any one executor, so that maxCapacity setting is
>> probably unnecessary.
>>
>> On Fri, Aug 25, 2017 at 7:04 PM, swetha kasireddy <swethakasire...@gmail.com> wrote:
>>
>>> Because I saw some posts saying that with the consumer cache enabled,
>>> reduceByKeyAndWindow can hit a ConcurrentModificationException. I see
>>> those errors as well after running for some time with the cache
>>> enabled, so I had to disable it. Please see the tickets below. We have
>>> 96 partitions. So if I enable the cache, would the following settings
>>> help to improve performance?
>>>
>>> "spark.streaming.kafka.consumer.cache.maxCapacity" -> Integer.valueOf(96),
>>> "spark.streaming.kafka.consumer.poll.ms" -> Integer.valueOf(1024),
>>>
>>> http://markmail.org/message/n4cdxwurlhf44q5x
>>> https://issues.apache.org/jira/browse/SPARK-19185
>>>
>>> On Fri, Aug 25, 2017 at 12:28 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>>> Why are you setting consumer.cache.enabled to false?
>>>>
>>>> On Fri, Aug 25, 2017 at 2:19 PM, SRK <swethakasire...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> What would be the appropriate settings to run Spark with Kafka 10?
>>>>> My job works fine with Spark with Kafka 8 and with a Kafka 8
>>>>> cluster, but it is very slow with Kafka 10 using the experimental
>>>>> Kafka Direct APIs for Kafka 10. I sometimes see the following
>>>>> error. Please see the Kafka parameters and the consumer strategy
>>>>> for creating the stream below. Any suggestions on how to run this
>>>>> with better performance would be of great help.
>>>>>
>>>>> java.lang.AssertionError: assertion failed: Failed to get records
>>>>> for test stream1 72 324027964 after polling for 120000
>>>>>
>>>>> val kafkaParams = Map[String, Object](
>>>>>   "bootstrap.servers" -> kafkaBrokers,
>>>>>   "key.deserializer" -> classOf[StringDeserializer],
>>>>>   "value.deserializer" -> classOf[StringDeserializer],
>>>>>   "auto.offset.reset" -> "latest",
>>>>>   "heartbeat.interval.ms" -> Integer.valueOf(20000),
>>>>>   "session.timeout.ms" -> Integer.valueOf(60000),
>>>>>   "request.timeout.ms" -> Integer.valueOf(90000),
>>>>>   "enable.auto.commit" -> (false: java.lang.Boolean),
>>>>>   "spark.streaming.kafka.consumer.cache.enabled" -> "false",
>>>>>   "group.id" -> "test1"
>>>>> )
>>>>>
>>>>> val hubbleStream = KafkaUtils.createDirectStream[String, String](
>>>>>   ssc,
>>>>>   LocationStrategies.PreferConsistent,
>>>>>   ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)
>>>>> )
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Slower-performance-while-running-Spark-Kafka-Direct-Streaming-with-Kafka-10-cluster-tp29108.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
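For readers following along: a minimal, standalone sketch (plain Scala, no Spark or Kafka dependency) of how the cache-tuning entries discussed in the thread could be merged into the existing kafkaParams map. The object name, the trimmed-down base map, and the choice to enable the cache here are illustrative assumptions, not settings confirmed by the thread; Cody's reply notes that maxCapacity is probably unnecessary when preferred locations behave correctly.

```scala
object CacheTuningSketch {
  def main(args: Array[String]): Unit = {
    // Trimmed-down stand-in for the kafkaParams map from the thread
    // (only entries that need no Kafka-client classes are kept here).
    val kafkaParams = Map[String, Object](
      "enable.auto.commit" -> (false: java.lang.Boolean),
      "group.id" -> "test1"
    )

    // Spark-side consumer-cache settings from the thread; 96 is the
    // partition count swetha mentions.
    val numPartitions = 96
    val cacheTuning = Map[String, Object](
      "spark.streaming.kafka.consumer.cache.enabled" -> "true",
      "spark.streaming.kafka.consumer.cache.maxCapacity" -> Integer.valueOf(numPartitions),
      "spark.streaming.kafka.consumer.poll.ms" -> Integer.valueOf(1024)
    )

    // ++ merges the maps; right-hand entries win on duplicate keys, so
    // tuning values override any defaults already present.
    val merged = kafkaParams ++ cacheTuning
    println(merged("spark.streaming.kafka.consumer.cache.maxCapacity"))
  }
}
```

Because the `spark.streaming.kafka.consumer.cache.*` keys are read by Spark's Kafka 0.10 integration rather than by the Kafka consumer itself, keeping them in a separate map and merging with `++` makes it easy to toggle the tuning block on and off while debugging, which is essentially the experiment the thread describes.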