Maybe reducing the batch duration would help.
2014-07-01 17:57 GMT+01:00 Chen Song <chen.song...@gmail.com>:

> In my use case, if I need to stop Spark Streaming for a while, data
> accumulates on the Kafka topic-partitions. After I restart the Spark
> Streaming job, the worker's heap runs out of memory on the fetch of the
> first batch.
>
> I am wondering:
>
> * Is there a way to throttle reading from Kafka in Spark Streaming jobs?
> * Is there a way to control how far the Kafka DStream can read on a
> topic-partition (via offset, for example)? Setting this to a small
> number would force the DStream to read less data initially.
> * Is there a way to limit the consumption rate on the Kafka side? (This
> one is not actually about Spark Streaming and doesn't seem to be a
> question for this group, but I am raising it here anyway.)
>
> I have looked at the code example below, but this doesn't seem to be
> supported:
>
> KafkaUtils.createStream ...
>
> Thanks, all.
> --
> Chen Song
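For the first question, one option worth trying (assuming the receiver-based `KafkaUtils.createStream` API quoted above, on a Spark 1.x build that supports `spark.streaming.receiver.maxRate`) is to cap the receiver's ingest rate in the SparkConf. The ZooKeeper quorum, group id, topic name, and rate value below are placeholders for illustration, not taken from the thread; treat this as a configuration sketch, not a tested recipe:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch: cap how fast each receiver ingests records so a restarted job
// does not pull the entire Kafka backlog into its first batch.
// spark.streaming.receiver.maxRate is records per second, per receiver.
val conf = new SparkConf()
  .setAppName("throttled-kafka-stream")
  .set("spark.streaming.receiver.maxRate", "1000") // placeholder; tune to your heap

val ssc = new StreamingContext(conf, Seconds(10))

// zkQuorum, groupId, and topic map are hypothetical values for illustration.
val stream = KafkaUtils.createStream(
  ssc,
  "zk1:2181",            // ZooKeeper quorum (placeholder)
  "my-consumer-group",   // consumer group id (placeholder)
  Map("my-topic" -> 1))  // topic -> number of receiver threads

stream.map(_._2).print()
ssc.start()
```

With the rate cap in place, the first batch after a restart is bounded by maxRate × batch duration per receiver rather than by the full accumulated backlog, which is what was blowing the worker heap.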