Kam,

Give it some time; I think Kafka on YARN is becoming a real possibility. New capabilities are coming to YARN/HDFS that allow for node groups/labels that can work with locality. Secondly, there is new functionality in HDFS around in-memory files that, depending on the use case, could be very interesting.

-Steve
On Wed, Jul 23, 2014 at 4:44 PM, Kam Kasravi <kamkasr...@yahoo.com.invalid> wrote:

> Thanks, Joe, for the input related to Mesos, and for acknowledging the
> need for YARN to support this type of cluster allocation: long-running
> services with node-locality priority.
>
> Thanks, Jay. That's an interesting fact I wasn't aware of, though I
> imagine there could be a long latency for the replica data to be
> transferred to the new broker (depending on the number and size of
> partitions). It does open up some possibilities to restart brokers on
> app master restart using different containers (as well as some
> complications if an old container with old data were reallocated on
> restart). I had used ZooKeeper to store broker locations, so on restart
> the app master would look up this information and attempt to reallocate
> containers on those nodes. All this said, would this be part of Kafka or
> some other framework? I can see Kafka benefiting from this, but at the
> same time Kafka's appeal, IMO, is its simplicity. Spark has chosen to
> include YARN support within its distribution; I'm not sure what the Kafka
> team thinks.
>
> On Wednesday, July 23, 2014 4:19 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Hey Kam,
> >
> > It would be nice to have a way to get a failed node back with its
> > original data, but this isn't strictly necessary; it is just a good
> > optimization. As long as you run with replication, you can restart a
> > broker elsewhere with no data, and it will restore its state from the
> > other replicas.
> >
> > -Jay
> >
> > On Wed, Jul 23, 2014 at 3:47 PM, Kam Kasravi <kamkasr...@yahoo.com.invalid> wrote:
> >
> > > Hi,
> > >
> > > Kafka-on-YARN requires YARN to consistently allocate a Kafka broker at a
> > > particular resource, since the broker always needs to use its local data.
> > > YARN doesn't do this well unless you provide (override) the default
> > > scheduler (CapacityScheduler or FairScheduler). SequenceIO did something
> > > along these lines for a different use case. Unfortunately, replacing the
> > > scheduler is a global operation that would affect all app masters.
> > > Additionally, one could argue that the broker should be run as an OS
> > > service and auto-restarted on failure if necessary. Slider (incubating)
> > > did some of this groundwork, but YARN still has lots of limitations in
> > > guaranteeing that a container is consistently allocated on a particular
> > > node, especially on app master restart (e.g. when the ResourceManager
> > > dies). That said, it might be worthwhile to enumerate all of this here
> > > with some possible solutions. If there is interest, I could certainly
> > > list the relevant JIRAs, along with some additional JIRAs that are
> > > required, IMO.
> > >
> > > Thanks
> > > Kam
> > >
> > > On Wednesday, July 23, 2014 2:37 PM, "hsy...@gmail.com" <hsy...@gmail.com> wrote:
> > >
> > > > Hi guys,
> > > >
> > > > Kafka is getting more and more popular, and in most cases people run
> > > > Kafka as a long-term service in the cluster. Is there a discussion of
> > > > running Kafka on a YARN cluster, so that we can use its convenient
> > > > configuration/resource management and HA? I think there is big
> > > > potential and demand for that.
> > > >
> > > > I found a project, https://github.com/kkasravi/kafka-yarn, but is there
> > > > an official roadmap/plan for this?
> > > >
> > > > Thank you very much!
> > > >
> > > > Best,
> > > > Siyuan
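A minimal sketch of the restart strategy discussed in this thread, written in Python for brevity. Assumptions: `plan_restart`, `can_restore_from_replicas`, and `Placement` are hypothetical names; the broker-to-host map is a plain dict standing in for the broker locations Kam describes storing in ZooKeeper (no ZooKeeper client is used); and the free-host list stands in for whatever nodes YARN would actually grant. Each broker is first pinned to its previous node so it can reuse its local data. A broker whose node is unavailable is moved to a free node only if every partition it hosts has at least one other replica, which is the condition Jay gives for restoring state with no local data.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    broker_id: int
    host: str
    reuses_local_data: bool  # True if the broker lands on the node holding its old log data


def can_restore_from_replicas(broker_id, partition_replicas):
    """Return True if every partition hosted on broker_id has at least one
    other replica, so an empty broker could re-fetch its data from the rest."""
    for replicas in partition_replicas.values():
        if broker_id in replicas and len(replicas) < 2:
            return False
    return True


def plan_restart(last_hosts, free_hosts, partition_replicas):
    """Hypothetical placement plan for brokers after an app master restart.

    last_hosts: broker_id -> host it last ran on (stand-in for ZooKeeper data).
    free_hosts: hosts where a container could currently be allocated.
    partition_replicas: partition name -> list of broker ids holding a replica.
    """
    free = list(free_hosts)
    plan = []
    pending = []
    # First pass: keep every broker whose original node is still available.
    for broker_id, host in sorted(last_hosts.items()):
        if host in free:
            free.remove(host)
            plan.append(Placement(broker_id, host, True))
        else:
            pending.append(broker_id)
    # Second pass: move the remaining brokers to any free node, but only if
    # replication lets them rebuild their state from other brokers.
    for broker_id in pending:
        if not free:
            raise RuntimeError(f"no free node for broker {broker_id}")
        if not can_restore_from_replicas(broker_id, partition_replicas):
            raise RuntimeError(f"broker {broker_id} has unreplicated partitions")
        plan.append(Placement(broker_id, free.pop(0), False))
    return plan
```

For example, with brokers 1, 2, and 3 last running on node-a, node-b, and node-c, and only node-a, node-c, and node-d free, brokers 1 and 3 keep their nodes while broker 2 moves to node-d and has to re-replicate its partitions. In an actual YARN app master, "prefer this node" would be expressed as a container request restricted to that host (for example, `AMRMClient.ContainerRequest` with a node list and `relaxLocality` set to false), and, as Kam points out, the scheduler still may not honor it, particularly after a ResourceManager restart.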