Hey Cody,

What kind of problems exactly? The data rate across Kafka topics does vary significantly in my case. Out of 50 topics in total (with 3 partitions each), half generate data at a very high rate, say 1 lakh (100,000) records/sec, while the other half generate at a very low rate, say 1k records/sec. I have to process them together and insert into the same database table. Would it be better to have 2 separate Spark Streaming applications instead? I don't have control over the Kafka topics and partitions; they are a central system used by many other systems as well.
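To make the concern concrete, here is a rough sketch of the arithmetic as I understand it: with the direct stream, spark.streaming.kafka.maxRatePerPartition caps each partition separately, so the only total limit is a per-batch ceiling of rate x partitions x batch interval. The batch interval and the rate value below are assumptions picked purely for illustration, and the helper name is hypothetical; only the 50 topics x 3 partitions figure comes from my setup.

```python
def max_records_per_batch(max_rate_per_partition, num_partitions, batch_seconds):
    """Upper bound on records in one batch if every partition hits its cap."""
    return max_rate_per_partition * num_partitions * batch_seconds


# From this thread: 50 topics with 3 partitions each.
TOTAL_PARTITIONS = 50 * 3

# Assumed values, for illustration only.
BATCH_SECONDS = 10
MAX_RATE_PER_PARTITION = 1000  # records/sec allowed per Kafka partition

# The two settings discussed in this thread, as they would be passed to Spark.
conf = {
    "spark.streaming.kafka.maxRatePerPartition": str(MAX_RATE_PER_PARTITION),
    "spark.streaming.backpressure.enabled": "true",
}

ceiling = max_records_per_batch(MAX_RATE_PER_PARTITION, TOTAL_PARTITIONS, BATCH_SECONDS)
print(ceiling)
```

This is only a ceiling: the slow partitions will deliver far fewer records than the cap allows, so the real batch size depends on the mix of fast and slow topics.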
Regards,
Chandan

On Tue, May 10, 2016 at 8:01 PM, Cody Koeninger <c...@koeninger.org> wrote:

> maxRate is not used by the direct stream.
>
> Significant skew in rate across different partitions for the same
> topic is going to cause you all kinds of problems, not just with spark
> streaming.
>
> You can turn on backpressure, but you're better off addressing the
> underlying issue if you can.
>
> On Tue, May 10, 2016 at 8:08 AM, Soumitra Siddharth Johri
> <soumitra.siddha...@gmail.com> wrote:
> > Also look at back pressure enabled. Both of these can be used to
> > limit the rate
> >
> > On May 10, 2016, at 8:02 AM, chandan prakash
> > <chandanbaran...@gmail.com> wrote:
> >
> > Hi,
> > I am using Spark Streaming with Direct kafka approach.
> > Want to limit number of event records coming in my batches.
> > Have question regarding following 2 parameters :
> > 1. spark.streaming.receiver.maxRate
> > 2. spark.streaming.kafka.maxRatePerPartition
> >
> > The documentation
> > (http://spark.apache.org/docs/latest/streaming-programming-guide.html#deploying-applications)
> > says .....
> > " spark.streaming.receiver.maxRate for receivers and
> > spark.streaming.kafka.maxRatePerPartition for Direct Kafka approach "
> >
> > Does it mean that spark.streaming.receiver.maxRate is valid only for
> > Receiver based approach only ? (not the DirectKafkaApproach as well)
> >
> > If yes, then how do we control total number of records/sec in
> > DirectKafka ?.....because spark.streaming.kafka.maxRatePerPartition
> > only controls max rate per partition and not whole records . There
> > might be many partitions some with very fast rate and some with very
> > slow rate.
> >
> > Regards,
> > Chandan
> >
> > --
> > Chandan Prakash

--
Chandan Prakash