Maybe reducing the batch duration would help :\
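
A minimal sketch of what that could look like, together with the rate-limit knobs I'm aware of. `spark.streaming.receiver.maxRate` (max records per second per receiver) was only added in Spark 1.0.1, so check your version; `fetch.message.max.bytes` is a standard Kafka 0.8 consumer setting that bounds a single fetch. The hostnames, group id, and topic below are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Cap the per-receiver ingest rate so the first batch after a restart
// cannot pull the whole Kafka backlog into the heap at once.
val conf = new SparkConf()
  .setAppName("ThrottledKafkaStream")
  .set("spark.streaming.receiver.maxRate", "10000") // records/sec, tune to your heap

// A shorter batch duration means each batch holds less data in memory.
val ssc = new StreamingContext(conf, Seconds(2))

val kafkaParams = Map(
  "zookeeper.connect"       -> "zk-host:2181",      // placeholder
  "group.id"                -> "my-consumer-group", // placeholder
  // Bounds the size of one fetch from a partition (Kafka-side limit).
  "fetch.message.max.bytes" -> (1024 * 1024).toString
)

val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc,
  kafkaParams,
  Map("myTopic" -> 1), // placeholder topic -> receiver thread count
  StorageLevel.MEMORY_AND_DISK_SER)
```

This is only a sketch under those assumptions, not something I've run against your workload.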

2014-07-01 17:57 GMT+01:00 Chen Song <chen.song...@gmail.com>:

> In my use case, if I need to stop spark streaming for a while, data would
> accumulate a lot on kafka topic-partitions. After I restart spark streaming
> job, the worker's heap will go out of memory on the fetch of the 1st batch.
>
> I am wondering if
>
> * Is there a way to throttle reading from kafka in spark streaming jobs?
> * Is there a way to control how far a Kafka DStream can read on a
> topic-partition (via an offset, for example)? By setting this to a small
> number, it would force the DStream to read less data initially.
> * Is there a way to limit the consumption rate on the Kafka side? (This
> one is not actually about Spark Streaming and doesn't seem to be a
> question for this group, but I am raising it here anyway.)
>
> I have looked at the code example below, but it doesn't seem to be supported.
>
> KafkaUtils.createStream ...
> Thanks, All
> --
> Chen Song
>
>
