John, I believe you are 100% correct. In theory we should run MRv2 directly on Mesos, but the current implementation of MRv2 on YARN seems very complex and difficult to decouple from the resource manager/negotiator.
It could still be done, I guess, but probably as a completely independent, Hadoop-compatible MapReduce framework for Mesos. You could write it from scratch as a custom framework inspired by the MRv2 app master implementation.

On Jul 27, 2014 7:00 PM, "John Omernik" <j...@omernik.com> wrote:

> So excuse my naivety in this space, but my ignorance has never really
> stopped me from asking questions:
>
> I see YARN (Yet Another Resource Negotiator) as very similar to Mesos,
> i.e. something to manage resources on a cluster of machines. So when I hear
> talk of running "YARN" on Mesos it seems very redundant indeed, and I ask
> myself, what are we actually getting out of this setup?
>
> So, going to the map/reduce question, I see Map Reduce V1 and Map Reduce
> V2 like this: Map Reduce V2 is an application that runs on YARN. I.e. if
> you run a job, it creates an application master, that application master
> requests resources, and the job gets run. It differs from Map Reduce V1 in
> that there is no long running Job Tracker (other than the YARN Resource
> Manager, but that is managing resources for all applications, not just Map
> Reduce applications). Ok, so Mesos: why can't there be a Mesos application
> that is similar to a Map Reduce V2 application in YARN? Why do we need to
> run YARN on Mesos? That doesn't really make sense. Basically, for M/R V2 vs
> M/R V1, the only difference is that to mimic M/R V1 we need task trackers
> and job trackers running as Mesos applications (which we have). So in M/R
> v2, we just need the equivalent of an application master running on Yarn,
> requesting resources across the cluster.
>
> Fundamentally, YARN is confusing because I think they coupled running Map
> Reduce jobs with the resource manager and called it "Hadoop v2". By
> coupling the two, people look at YARN as Map Reduce V2, but it's not
> really.
> It's a way to run jobs on a cluster of machines (a la Mesos)
> with an "application" that is the equivalent of Map Reduce V1. The names
> being given seem confusing to me; it makes people who have invested
> in Hadoop (Map Reduce V1) very interested in YARN because it's called
> "Hadoop V2", while Mesos is seen as the "Other".
>
> Just for my sake I summarized a TL;DR form so if someone wants to correct
> my understanding they can:
>
> Mesos = Tool to manage resources
>
> YARN = Tool to manage resources; it's also called Hadoopv2
>
> Map Reduce V1 = Job trackers/Task Trackers, it's what we know. It can run
> on Hadoop clusters, and Mesos. It's also called Hadoopv1
>
> Map Reduce V2 = Application that can run on YARN that mimics Map Reduce
> V1 on a YARN Cluster. This + YARN has been called Hadoopv2.
>
> On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <maxime.brugi...@gmail.com> wrote:
>
>> When I said that running yarn over mesos did not make sense I meant that
>> running a resource manager in a resource manager was very sub-optimal. You
>> will eventually do static allocation of resources for the Yarn framework in
>> Mesos or have complex logic to determine how much resource should be given
>> to yarn. You will also have the same burden of managing 2 different
>> clusters instead of one, even if yarn is sort of hidden as a mesos framework.
>>
>> However yes, I believe it's easier to run yarn on mesos than to run mrv2 on
>> top of mesos. The solution I was discussing was obviously "ideal" and I
>> have looked at the MRAppMaster since, and it discouraged me :)
>>
>> On Jul 27, 2014 12:41 AM, "Rick Richardson" <rick.richard...@gmail.com> wrote:
>>
>>> FWIW I also think the fastest approach here is porting Yarn onto
>>> Mesos.
>>>
>>> In a perfect world, writing an implementation layer for the Yarn
>>> Interface on Mesos would certainly be the optimal approach, but looking at
>>> the MRv2 code, it is very very coupled to many Yarn modules.
>>>
>>> If someone wanted to take on the project of making a generic resource
>>> scheduler Interface for MRv2, that would be amazing :)
>>>
>>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yujie....@gmail.com> wrote:
>>>
>>>> I am interested in investigating the idea of YARN on top of Mesos. One
>>>> of the benefits I can think of is that we can get rid of the static
>>>> resource allocation between YARN and Mesos clusters. In that way, Mesos can
>>>> allocate those resources that are not used by YARN to other Mesos
>>>> frameworks like Aurora, Marathon, etc, to increase the resource utilization
>>>> of the entire data center. Also, we could avoid running each MRv2 job as a
>>>> framework which I think might cause some maintenance complexity (e.g. for
>>>> framework rate limiting, etc). Finally, YARN currently does not have good
>>>> isolation support. It only supports cpu isolation right now (using
>>>> cgroups). By porting YARN on top of Mesos, we might be able to leverage the
>>>> existing Mesos containerizer strategy to provide better isolation between
>>>> tasks. Maxime, I am curious why you think it does not make sense to run
>>>> YARN over Mesos? Since I am not super familiar with YARN, I might be missing
>>>> something.
>>>>
>>>> I have been thinking of making ResourceManager in YARN a Mesos
>>>> framework and making NodeManager a Mesos executor. The NodeManager will
>>>> launch containers using primitives provided by Mesos so that we have a
>>>> consistent containerizer layer. I haven't fully figured out how this could
>>>> be done yet (e.g., nested containers, communication between NodeManager and
>>>> ResourceManager, etc.), but I would love to explore this direction.
>>>> I would like to hear any feedback/suggestions you guys have about this
>>>> direction.
>>>>
>>>> Thanks,
>>>> - Jie
>>>>
>>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <maxime.brugi...@gmail.com> wrote:
>>>>
>>>>> We run both mesos and yarn in prod and it does not make sense to run
>>>>> yarn over mesos.
>>>>>
>>>>> However it would be interesting to find a way to run MRv2 jobs on
>>>>> mesos with some custom layer to swap yarn with mesos. Not sure how to
>>>>> start though... MRv2 contains a yarn application master that needs to be
>>>>> rewritten as a mesos framework scheduler. This is probably doable. However
>>>>> with MRv2 every map reduce job would be mapped as a new framework in
>>>>> Mesos. Not sure how many frameworks mesos can run and scale up to,
>>>>> especially short lived frameworks.
>>>>>
>>>>> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <t...@duedil.com> wrote:
>>>>>
>>>>>> Hey Luyi,
>>>>>>
>>>>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>>>>> MRv1. It also doesn't have great support for the HA jobtracker available
>>>>>> in newer versions of Hadoop, but I've been working on that the past few
>>>>>> weeks.
>>>>>>
>>>>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested
>>>>>> to find out more. Am I correct in thinking MRv2 will only run on top of
>>>>>> YARN?
>>>>>>
>>>>>> I wonder if anyone else on the mailing list is running YARN on top of
>>>>>> Mesos...
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>>> On Friday, 25 July 2014, Luyi Wang <wangluyi1...@gmail.com> wrote:
>>>>>>
>>>>>>> Checked the mesos github (https://github.com/mesos/hadoop). It
>>>>>>> listed support for MapReduce V1.
>>>>>>>
>>>>>>> How about the MR V2?
>>>>>>>
>>>>>>> Right now we are using cloudera to manage hadoop clusters, which use
>>>>>>> MRV2. We are planning to migrate all our services to mesos (still in the
>>>>>>> initial investigating stage).
>>>>>>> Good suggestions, advice and experiences are welcome.
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>> -Luyi.