On 28 Mar 2014, at 01:32, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> Yes, no one has reported this issue before. I just opened a JIRA on what I
> think is the main problem here:
> https://spark-project.atlassian.net/browse/SPARK-1340
> Some of the receivers don't get restarted.
> I have a bunch of refactoring in the NetworkReceiver ready to be posted as
> a PR that should fix this.
>
> Regarding the second problem, I have been thinking of adding flow control
> (i.e. limiting the rate of receiving) for a while, just haven't gotten
> around to it. I added another JIRA for tracking this issue:
> https://spark-project.atlassian.net/browse/SPARK-1341

Thank you, I will participate and can provide testing of the new code. Sorry
for the caps lock, I literally just debugged this the whole day.

> TD
>
>
> On Thu, Mar 27, 2014 at 3:23 PM, Evgeny Shishkin <itparan...@gmail.com> wrote:
>
> On 28 Mar 2014, at 01:11, Scott Clasen <scott.cla...@gmail.com> wrote:
>
> > Evgeniy Shishkin wrote
> >> So, at the bottom, the Kafka input stream just does not work.
> >
> > That was the conclusion I was coming to as well. Are there open tickets
> > around fixing this up?
>
> I am not aware of any. Actually, nobody complained about Spark + Kafka
> before, so I thought it just worked; then we tried to build something on
> it and almost failed.
>
> I think it is possible to steal/replicate how Twitter Storm works with
> Kafka. They do manual partition assignment; at least this would help to
> balance the load.
>
> There is another issue: ssc creates new RDDs every batch duration, always,
> even if the previous computation did not finish.
>
> But with Kafka, we could consume more RDDs later, after we finish the
> previous RDDs. That way it would be much simpler to avoid OOMs when
> starting from the beginning, because we can consume a lot of data from
> Kafka during one batch duration and then get an OOM.
>
> But we just cannot start slowly, cannot limit how much to consume during
> a batch.
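
The flow control TD describes (SPARK-1341) amounts to capping how fast a
receiver pushes records into each batch interval. A minimal sketch of the
general technique in plain Python, not Spark's actual receiver code; the
`RateLimiter` class and its `rate` parameter are hypothetical names, not a
Spark API:

```python
import time

class RateLimiter:
    """Token-bucket limiter: allows at most `rate` records per second,
    blocking the caller once the bucket runs dry."""

    def __init__(self, rate):
        self.rate = float(rate)      # tokens replenished per second
        self.capacity = float(rate)  # burst size = one second's worth
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self, n=1):
        # Refill tokens for the time elapsed since the last call,
        # then either spend `n` tokens or sleep until they accrue.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < n:
            time.sleep((n - self.tokens) / self.rate)
            self.tokens = 0.0
        else:
            self.tokens -= n

# Hypothetical use inside a receiver loop: cap at 10k records/sec,
# so one batch duration can pull at most ~10k * batch_seconds records.
limiter = RateLimiter(rate=10000)
# for record in kafka_iterator:
#     limiter.acquire()
#     store(record)
```

With a cap like this, "starting from the beginning" of a Kafka topic no
longer floods the first few batches; the backlog drains at a bounded rate
instead of OOM-ing the receivers.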
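
On the manual partition assignment point: a toy sketch of what Storm-style
static assignment could look like, spreading topic-partitions round-robin
over receivers so load is balanced deterministically. All names here are
illustrative, not Kafka or Spark API:

```python
def assign_partitions(partitions, receivers):
    """Map topic-partitions to receiver ids round-robin, so each receiver
    gets an even share instead of whatever a dynamic rebalance hands it."""
    assignment = {r: [] for r in receivers}
    for i, part in enumerate(sorted(partitions)):
        assignment[receivers[i % len(receivers)]].append(part)
    return assignment

# e.g. 8 partitions of a topic "events" spread over 3 receivers
parts = [("events", p) for p in range(8)]
print(assign_partitions(parts, ["recv-0", "recv-1", "recv-2"]))
```

Because the mapping is a pure function of the partition and receiver lists,
every worker can compute it independently, which is essentially how Storm's
Kafka spout divides partitions among spout tasks.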
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360p3379.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.