On 28 Mar 2014, at 01:32, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> Yes, no one has reported this issue before. I just opened a JIRA on what I
> think is the main problem here:
> https://spark-project.atlassian.net/browse/SPARK-1340
> Some of the receivers don't get restarted.
> I have a bunch of refactoring in the NetworkReceiver ready to be posted as
> a PR that should fix this.
>
> Regarding the second problem, I have been thinking of adding flow control
> (i.e. limiting the rate of receiving) for a while, just haven't gotten
> around to it. I added another JIRA for tracking this issue:
> https://spark-project.atlassian.net/browse/SPARK-1341

Thank you, I will participate and can provide testing of the new code. Sorry
for the caps lock, I literally just debugged this the whole day.

> TD
>
>
> On Thu, Mar 27, 2014 at 3:23 PM, Evgeny Shishkin <itparan...@gmail.com> wrote:
>
> On 28 Mar 2014, at 01:11, Scott Clasen <scott.cla...@gmail.com> wrote:
>
> > Evgeniy Shishkin wrote
> >> So, at the bottom, the Kafka input stream just does not work.
> >
> > That was the conclusion I was coming to as well. Are there open tickets
> > around fixing this up?
>
> I am not aware of any. Actually, nobody complained about Spark + Kafka
> before, so I thought it just worked; then we tried to build something on
> it and almost failed.
>
> I think it is possible to steal/replicate how Twitter Storm works with
> Kafka. They do manual partition assignment; at least this would help to
> balance the load.
>
> There is another issue: ssc creates new RDDs every batch duration, always,
> even if the previous computation did not finish.
>
> But with Kafka, we could consume more RDDs later, after we finish the
> previous RDDs. That way it would be much simpler to avoid OOMs when
> starting from the beginning, because we can consume a lot of data from
> Kafka during one batch duration and then get an OOM.
>
> But we just cannot start slowly, cannot limit how much to consume during
> a batch.
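
The flow control TD describes (SPARK-1341) amounts to capping how fast a
receiver pushes records into each batch interval. A minimal sketch of the
general technique in plain Python, not Spark's actual receiver code; the
`RateLimiter` class and its `rate` parameter are hypothetical names, not a
Spark API:

```python
import time

class RateLimiter:
    """Token-bucket limiter: allows at most `rate` records per second,
    blocking the caller once the bucket runs dry."""

    def __init__(self, rate):
        self.rate = float(rate)      # tokens replenished per second
        self.capacity = float(rate)  # burst size = one second's worth
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self, n=1):
        # Refill tokens for the time elapsed since the last call,
        # then either spend `n` tokens or sleep until they accrue.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < n:
            time.sleep((n - self.tokens) / self.rate)
            self.tokens = 0.0
        else:
            self.tokens -= n

# Hypothetical use inside a receiver loop: cap at 10k records/sec,
# so one batch duration can pull at most ~10k * batch_seconds records.
limiter = RateLimiter(rate=10000)
# for record in kafka_iterator:
#     limiter.acquire()
#     store(record)
```

With a cap like this, "starting from the beginning" of a Kafka topic no
longer floods the first few batches; the backlog drains at a bounded rate
instead of OOM-ing the receivers.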
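
On the manual partition assignment point: a toy sketch of what Storm-style
static assignment could look like, spreading topic-partitions round-robin
over receivers so load is balanced deterministically. All names here are
illustrative, not Kafka or Spark API:

```python
def assign_partitions(partitions, receivers):
    """Map topic-partitions to receiver ids round-robin, so each receiver
    gets an even share instead of whatever a dynamic rebalance hands it."""
    assignment = {r: [] for r in receivers}
    for i, part in enumerate(sorted(partitions)):
        assignment[receivers[i % len(receivers)]].append(part)
    return assignment

# e.g. 8 partitions of a topic "events" spread over 3 receivers
parts = [("events", p) for p in range(8)]
print(assign_partitions(parts, ["recv-0", "recv-1", "recv-2"]))
```

Because the mapping is a pure function of the partition and receiver lists,
every worker can compute it independently, which is essentially how Storm's
Kafka spout divides partitions among spout tasks.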
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360p3379.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.