On 28 Mar 2014, at 01:13, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Seems like the configuration of the Spark worker is not right. Either the 
> worker has not been given enough memory or the allocation of the memory to 
> the RDD storage needs to be fixed. If configured correctly, the Spark workers 
> should not get OOMs.


Yes, it is easy to start from the latest offsets, reach a steady configuration, and have everything look fine.

Then a machine fails, and you stop receiving anything from Kafka.

Then you notice this and restart your app, hoping it will continue from the offsets stored in ZooKeeper.
BUT NO.
YOUR DEFAULT STREAM CONSUMERS JUST ERASED THE OFFSETS FROM ZOOKEEPER.
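
If it helps anyone hitting the same thing: this matches the receiver-based KafkaUtils.createStream path. As far as I can tell, when "auto.offset.reset" was present in the params, the receiver of that era cleaned that consumer group's offsets out of ZooKeeper before connecting, so a restart started from scratch instead of resuming. A minimal sketch of resuming from committed offsets, with the connection settings and topic name made up:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object ResumeFromZookeeper {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("kafka-resume-sketch"), Seconds(10))

        // Hypothetical connection settings.
        val kafkaParams = Map(
          "zookeeper.connect" -> "zk1:2181",
          "group.id"          -> "my-consumer-group"
          // Deliberately NOT setting "auto.offset.reset": with the
          // receiver of that era, setting it made the receiver delete
          // this group's offsets in ZooKeeper before connecting, so a
          // restart started from scratch instead of the committed
          // offsets.
        )

        val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Map("events" -> 1), StorageLevel.MEMORY_AND_DISK_SER_2)

        stream.map(_._2).count().print()
        ssc.start()
        ssc.awaitTermination()
      }
    }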

After we fixed the offset erasure, we start from some offsets in the past.
But there is no way to limit how many messages we pull from Kafka within a batch duration.
AND HERE WE OOM.

And it's just a pain. Complete pain.
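
For what it's worth, later Spark releases added a receiver-side rate cap for exactly this. A minimal sketch, assuming a Spark version that has spark.streaming.receiver.maxRate (records per second per receiver; it did not exist in the early releases):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical cap: 10,000 records/sec per receiver. With a 10-second
    // batch this bounds each batch at roughly 100,000 records per receiver,
    // no matter how far behind the committed offsets are.
    val conf = new SparkConf()
      .setAppName("bounded-ingest")
      .set("spark.streaming.receiver.maxRate", "10000")
    val ssc = new StreamingContext(conf, Seconds(10))

With a cap of R records per second and a batch of B seconds, each batch holds at most about R * B records per receiver, whatever the backlog, which is what keeps the workers from OOMing.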

And remember, only some machines actually consume, usually two or three, because of the broken high-level consumer in Kafka.
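
One commonly suggested workaround for spreading the load is to open several streams in the same consumer group and union them, so the partitions get distributed over more receivers. A sketch, with the stream count, topic, and connection details made up:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(
      new SparkConf().setAppName("parallel-receivers"), Seconds(10))

    // Hypothetical: one receiver per expected partition of the "events"
    // topic, unioned into a single DStream so consumption is spread across
    // more than two or three workers.
    val numStreams = 4
    val kafkaStreams = (1 to numStreams).map { _ =>
      KafkaUtils.createStream(ssc, "zk1:2181", "my-consumer-group", Map("events" -> 1))
    }
    val unioned = ssc.union(kafkaStreams)
    unioned.map(_._2).count().print()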
