Pretty much the same problems you'd expect any time you have skew in a
distributed system: some leaders are going to be working harder than
others and have more disk space used, and some consumers are going to
be working harder than others.
It sounds like you're talking about differences in topics, not
Hey Cody,
What kind of problems exactly?
...the data rates of my Kafka topics do vary significantly. In my
case, out of a total of 50 topics (with 3 partitions each), half of the
topics generate data at a very high rate, say 1 lakh (100,000)/sec,
while the other half generate at a very low rate, say 1k/sec...
i have
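To put rough numbers on the skew described above, here is a small back-of-the-envelope sketch. It assumes the quoted figures are per-topic rates and that each topic's traffic is spread evenly over its 3 partitions; neither is stated in the message, and the function and constant names are made up for illustration.

```python
# Figures quoted in the message; treating them as per-topic rates is an assumption.
TOPICS = 50
PARTITIONS_PER_TOPIC = 3
HIGH_RATE = 100_000  # 1 lakh records/sec
LOW_RATE = 1_000     # 1k records/sec


def skew_summary(topics=TOPICS, partitions=PARTITIONS_PER_TOPIC,
                 high=HIGH_RATE, low=LOW_RATE):
    """Summarize the load split between the fast and slow halves of the topics."""
    half = topics // 2
    total = half * high + half * low
    return {
        "total_per_sec": total,
        # Fraction of all records that come from the fast half of the topics.
        "high_share": half * high / total,
        # Per-partition rates, assuming an even spread within each topic.
        "high_per_partition": high / partitions,
        "low_per_partition": low / partitions,
        # How much faster a fast topic is than a slow one.
        "topic_ratio": high / low,
    }


if __name__ == "__main__":
    for name, value in skew_summary().items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions the fast half of the topics accounts for about 99% of the records, which is why a single rate setting tuned for one half fits the other half poorly.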
I think a better partitioning scheme can help you too.
On Tue, May 10, 2016 at 10:31 AM Cody Koeninger wrote:
> maxRate is not used by the direct stream.
>
> Significant skew in rate across different partitions for the same
> topic is going to cause you all kinds of problems, not just with spark
>
maxRate is not used by the direct stream.
Significant skew in rate across different partitions for the same
topic is going to cause you all kinds of problems, not just with spark
streaming.
You can turn on backpressure, but you're better off addressing the
underlying issue if you can.
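For reference, a minimal sketch of the settings under discussion, assuming the Spark property names `spark.streaming.backpressure.enabled` and `spark.streaming.kafka.maxRatePerPartition` (the per-partition cap that applies to the direct stream, as opposed to `spark.streaming.receiver.maxRate`). The helper functions, the 1000 records/sec cap and the 10-second batch interval are hypothetical values chosen for illustration, not taken from the thread.

```python
# Hypothetical helpers; only the two property names come from Spark itself.

def rate_limit_conf(max_rate_per_partition, backpressure=True):
    """Return Spark properties that bound a direct Kafka stream's intake."""
    return {
        # Lets Spark adapt the ingestion rate to observed batch processing times.
        "spark.streaming.backpressure.enabled": str(backpressure).lower(),
        # Upper bound, in records per second per Kafka partition.
        "spark.streaming.kafka.maxRatePerPartition": str(max_rate_per_partition),
    }


def max_records_per_batch(max_rate_per_partition, num_partitions, batch_seconds):
    """Upper bound on records in one batch under the per-partition cap."""
    return max_rate_per_partition * num_partitions * batch_seconds


def as_submit_args(conf):
    """Render the properties as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = rate_limit_conf(max_rate_per_partition=1000)
    print(" ".join(as_submit_args(conf)))
    # 50 topics x 3 partitions, 10-second batches (assumed values)
    print(max_records_per_batch(1000, 50 * 3, 10))
```

Note that the cap is the same for every partition, so it bounds the fast topics but does nothing to even out the skew itself.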
On Tue, Ma
Also look at enabling backpressure. Both of these can be used to limit the rate.
> On May 10, 2016, at 8:02 AM, chandan prakash
> wrote:
>
> Hi,
> I am using Spark Streaming with the direct Kafka approach.
> I want to limit the number of event records coming in my batches.
> Have ques