Re: Kafka on yarn

Steve Morin Wed, 23 Jul 2014 17:16:29 -0700

Kam,
  Give it some time; I think it's becoming a more realistic possibility for
Kafka on YARN. New capabilities are coming to YARN/HDFS that allow for
node groups/labels that can work with locality, and, secondarily, new
HDFS functionality for in-memory files that, depending on the use case,
can be very interesting.
-Steve



On Wed, Jul 23, 2014 at 4:44 PM, Kam Kasravi <kamkasr...@yahoo.com.invalid>
wrote:

> Thanks Joe for the input related to Mesos, as well as for acknowledging the
> need for YARN to support this type of cluster allocation: long-running
> services with node-locality priority.
>
> Thanks Jay, that's an interesting fact I wasn't aware of, though I
> imagine there could be a long latency for the replica data to be
> transferred to the new broker (depending on the number and size of
> partitions). It does open up some possibilities for restarting brokers on
> app master restart using different containers (as well as some
> complications if an old container with old data were reallocated on
> restart). I had used ZooKeeper to store broker locations so that the app
> master, on restart, would look up this information and attempt to
> reallocate containers on those nodes. All that said, would this be part of
> Kafka or of some other framework? I can see Kafka benefiting from this,
> but at the same time Kafka's appeal, IMO, is its simplicity. Spark has
> chosen to include YARN support in its distribution; I'm not sure what the
> Kafka team thinks.
>
>
>
> On Wednesday, July 23, 2014 4:19 PM, Jay Kreps <jay.kr...@gmail.com>
> wrote:
>
>
>
> Hey Kam,
>
> It would be nice to have a way to get a failed node back with its
> original data, but this isn't strictly necessary; it is just a good
> optimization. As long as you run with replication, you can restart a
> broker elsewhere with no data, and it will restore its state from the
> other replicas.
>
> -Jay
>
>
> On Wed, Jul 23, 2014 at 3:47 PM, Kam Kasravi
> <kamkasr...@yahoo.com.invalid> wrote:
> > Hi
> >
> > Kafka-on-YARN requires YARN to consistently allocate a Kafka broker on a
> > particular node, since the broker always needs to use its local data.
> > YARN doesn't do this well unless you provide (override) the default
> > scheduler (CapacityScheduler or FairScheduler). SequenceIO did something
> > along these lines for a different use case. Unfortunately, replacing the
> > scheduler is a global operation that would affect all app masters.
> > Additionally, one could argue that the broker should be run as an OS
> > service and automatically restarted on failure if necessary. Slider
> > (incubating) did some of this groundwork, but YARN still has many
> > limitations in guaranteeing that a container is consistently allocated on
> > a particular node, especially on app master restart (e.g. when the
> > ResourceManager dies). That said, it might be worthwhile to enumerate all
> > of this here along with some possible solutions. If there is interest, I
> > could certainly list the relevant JIRAs, along with some additional JIRAs
> > that are required, IMO.
> >
> > Thanks
> > Kam
> >
> >
> > On Wednesday, July 23, 2014 2:37 PM, "hsy...@gmail.com" <hsy...@gmail.com> wrote:
> >
> >
> >
> > Hi guys,
> >
> > Kafka is getting more and more popular, and in most cases people run
> > Kafka as a long-running service in the cluster. Is there any discussion
> > of running Kafka on a YARN cluster, so that we can take advantage of its
> > convenient configuration/resource management and HA? I think there is
> > big potential and demand for that.
> > I found a project, https://github.com/kkasravi/kafka-yarn, but is there
> > an official roadmap/plan for this?
> >
> > Thank you very much!
> >
> > Best,
> > Siyuan
>
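As a rough illustration of the placement logic Kam describes (recording each broker's host in ZooKeeper so a restarted app master can request containers on the same nodes), combined with Jay's point that a replicated broker can be restarted elsewhere with no data, here is a minimal Python sketch. It does not call any real YARN or ZooKeeper API: `BrokerPlacementRegistry`, `ContainerRequest` and `plan_restart` are hypothetical names, and a plain dict stands in for the ZooKeeper store. The `strict_locality` field is meant to correspond roughly to the inverse of the `relaxLocality` flag on YARN's `AMRMClient.ContainerRequest`; whether the scheduler actually honors a strict request is exactly the limitation discussed above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ContainerRequest:
    """Hypothetical stand-in for a YARN container request."""
    broker_id: int
    host: Optional[str]     # None means "any node"
    strict_locality: bool   # True: only `host` is acceptable


class BrokerPlacementRegistry:
    """Hypothetical in-memory stand-in for the ZooKeeper store of broker hosts.

    A real deployment would persist this mapping (e.g. in ZooKeeper) so it
    survives an app master restart; a dict is used here only for illustration.
    """

    def __init__(self) -> None:
        self._hosts: Dict[int, str] = {}

    def record(self, broker_id: int, host: str) -> None:
        self._hosts[broker_id] = host

    def last_host(self, broker_id: int) -> Optional[str]:
        return self._hosts.get(broker_id)


def plan_restart(registry: BrokerPlacementRegistry,
                 broker_ids: List[int],
                 live_hosts: Set[str]) -> List[ContainerRequest]:
    """Build container requests after an app master restart.

    Brokers whose previous host is still alive are pinned there so they can
    reuse their local log data. All others get a relaxed request for any node
    and must rebuild their partitions from the other replicas.
    """
    requests = []
    for broker_id in broker_ids:
        host = registry.last_host(broker_id)
        if host is not None and host in live_hosts:
            requests.append(ContainerRequest(broker_id, host, strict_locality=True))
        else:
            requests.append(ContainerRequest(broker_id, None, strict_locality=False))
    return requests


if __name__ == "__main__":
    registry = BrokerPlacementRegistry()
    registry.record(0, "node-a")
    registry.record(1, "node-b")
    # node-b is gone, and broker 2 was never recorded.
    for req in plan_restart(registry, [0, 1, 2], live_hosts={"node-a", "node-c"}):
        print(req)
```

Pinning a broker to its previous host lets it reuse its local log; falling back to any node trades that for a full re-replication, which, as noted in the thread, may take a while depending on the number and size of partitions.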
