Re: Spark Streaming : is spark.streaming.receiver.maxRate valid for DirectKafkaApproach

2016-05-10 Thread Cody Koeninger
Pretty much the same problems you'd expect any time you have skew in a distributed system: some leaders are going to be working harder than others and using more disk space, and some consumers are going to be working harder than others. It sounds like you're talking about differences in topics,

2016-05-10 Thread chandan prakash
Hey Cody, what kind of problems exactly? ...The data rates in my Kafka topics do vary significantly: out of 50 topics in total (3 partitions each), half generate data at a very high rate, say 1 lakh (100,000)/sec, while the other half generate data at a very low rate, say 1k/sec... I have
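To make the skew concrete, here is a rough back-of-envelope sketch in Python. It assumes the quoted rates are per topic and spread evenly over each topic's 3 partitions, and it uses a hypothetical cap of 5,000 records/sec for spark.streaming.kafka.maxRatePerPartition; the helper names are illustrative only. Because that setting is one per-partition limit for the whole stream, it throttles the fast partitions while leaving the slow ones untouched.

```python
# Back-of-envelope model of the skew described above (hypothetical helpers).
# Assumptions: the quoted rates are per topic, each topic has 3 partitions,
# and records are spread evenly across a topic's partitions.

PARTITIONS_PER_TOPIC = 3
FAST_TOPIC_RATE = 100_000  # records/sec ("1 lakh/sec")
SLOW_TOPIC_RATE = 1_000    # records/sec ("1k/sec")


def per_partition_rate(topic_rate: float, partitions: int = PARTITIONS_PER_TOPIC) -> float:
    """Records/sec arriving on each partition under an even spread."""
    return topic_rate / partitions


def read_rate(arrival_rate: float, max_rate_per_partition: float) -> float:
    """Records/sec consumed from one partition when a uniform cap applies."""
    return min(arrival_rate, max_rate_per_partition)


def lag_growth(arrival_rate: float, max_rate_per_partition: float) -> float:
    """Records/sec by which a partition falls behind (0 if the cap keeps up)."""
    return max(0.0, arrival_rate - max_rate_per_partition)


if __name__ == "__main__":
    cap = 5_000  # hypothetical spark.streaming.kafka.maxRatePerPartition value
    for label, topic_rate in (("fast", FAST_TOPIC_RATE), ("slow", SLOW_TOPIC_RATE)):
        arrival = per_partition_rate(topic_rate)
        print(f"{label}: arrives {arrival:.0f}/s, read {read_rate(arrival, cap):.0f}/s, "
              f"lag grows {lag_growth(arrival, cap):.0f}/s")
```

With these assumed numbers, each fast partition would fall behind by about 28,333 records/sec, while the slow partitions are never throttled.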

2016-05-10 Thread Soumitra Johri
I think a better partitioning scheme can help you too.

On Tue, May 10, 2016 at 10:31 AM Cody Koeninger wrote:
> maxRate is not used by the direct stream.
>
> Significant skew in rate across different partitions for the same
> topic is going to cause you all kinds of problems,

2016-05-10 Thread Cody Koeninger
maxRate is not used by the direct stream. Significant skew in rate across different partitions for the same topic is going to cause you all kinds of problems, not just with spark streaming. You can turn on backpressure, but you're better off addressing the underlying issue if you can. On Tue,

2016-05-10 Thread Soumitra Siddharth Johri
Also look at enabling backpressure (spark.streaming.backpressure.enabled). Both of these can be used to limit the rate.

> On May 10, 2016, at 8:02 AM, chandan prakash wrote:
>
> Hi,
> I am using Spark Streaming with Direct kafka approach.
> Want to limit number of event records
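
A simplified model of how the two settings combine, sketched in Python. The Spark documentation states that with backpressure enabled, the dynamically estimated rate is upper-bounded by spark.streaming.kafka.maxRatePerPartition when that is set. Splitting the estimate evenly across partitions is an assumption made here for illustration (this is not Spark's actual code), and the function name is hypothetical.

```python
from typing import Optional


def effective_rate_per_partition(
    max_rate_per_partition: Optional[float],
    estimated_total_rate: Optional[float],
    num_partitions: int,
) -> Optional[float]:
    """Simplified per-partition limit (records/sec); None means unlimited.

    max_rate_per_partition: spark.streaming.kafka.maxRatePerPartition, or None if unset.
    estimated_total_rate: backpressure's current estimate for the whole stream,
        or None if backpressure is disabled or has no estimate yet.
    Assumption: the estimate is split evenly across partitions.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if estimated_total_rate is not None and estimated_total_rate > 0:
        share = estimated_total_rate / num_partitions
        # The configured cap, when set, still bounds the backpressure estimate.
        if max_rate_per_partition is not None and max_rate_per_partition > 0:
            return min(max_rate_per_partition, share)
        return share
    # No estimate: fall back to the static cap (None if that is unset too).
    if max_rate_per_partition is not None and max_rate_per_partition > 0:
        return max_rate_per_partition
    return None


if __name__ == "__main__":
    # 150 partitions (50 topics x 3); the rates are hypothetical.
    print(effective_rate_per_partition(5_000, None, 150))
    print(effective_rate_per_partition(5_000, 300_000, 150))
    print(effective_rate_per_partition(None, None, 150))
```

Note that both knobs only bound how much each partition contributes to a batch; neither removes the underlying skew.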

Spark Streaming : is spark.streaming.receiver.maxRate valid for DirectKafkaApproach

2016-05-10 Thread chandan prakash
Hi, I am using Spark Streaming with the direct Kafka approach. I want to limit the number of event records coming into my batches, and I have a question regarding the following 2 parameters:

1. spark.streaming.receiver.maxRate
2. spark.streaming.kafka.maxRatePerPartition

The documentation (
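Since the direct stream does not use spark.streaming.receiver.maxRate, the relevant cap is spark.streaming.kafka.maxRatePerPartition, which the Spark docs describe as the maximum number of records per second read from each Kafka partition. A rough upper bound on batch size is therefore rate × batch interval (seconds) × number of partitions. The Python sketch below computes that bound; the function name, the 1,000 records/sec cap, and the 10-second interval are hypothetical, and rounding down to whole records is an assumption. The 150 partitions come from the 50 topics × 3 partitions mentioned in the thread.

```python
import math


def max_records_per_batch(max_rate_per_partition: int, batch_interval_s: float, num_partitions: int) -> int:
    """Upper bound on records in one batch under a per-partition rate cap.

    Each partition contributes at most rate * interval records per batch,
    rounded down to whole records here (an assumption).
    """
    per_partition = math.floor(max_rate_per_partition * batch_interval_s)
    return per_partition * num_partitions


if __name__ == "__main__":
    # Hypothetical values: cap of 1,000 records/sec, 10-second batches,
    # 50 topics x 3 partitions = 150 partitions.
    print(max_records_per_batch(1_000, 10, 150))
```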