Note that you could also launch the framework scheduler/JT itself via Marathon, and then it would run as a Mesos task on one of the slaves, automatically restarting (potentially elsewhere) if it dies. You could then use something like mesos-dns <https://github.com/mesosphere/mesos-dns> and point the mapred.job.tracker property at "marathon.hadoop.jobtracker:port", or whatever name it generates.
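As a rough sketch of the client-side setting, assuming the mesos-dns name from the example above and a placeholder port (both depend on how mesos-dns and the JobTracker are actually configured):

```xml
<!-- mapred-site.xml: point clients at the JT via its mesos-dns name.  -->
<!-- "marathon.hadoop.jobtracker" and port 9001 are illustrative only. -->
<property>
  <name>mapred.job.tracker</name>
  <value>marathon.hadoop.jobtracker:9001</value>
</property>
```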
For YARN/MR2 workloads, you might also want to check out Myriad
<https://github.com/mesos/myriad>

On Fri, Jan 30, 2015 at 5:17 AM, Alex <alex.m.lis...@gmail.com> wrote:
> Hi Tom,
>
> Thanks a lot for your reply, it's very helpful.
>
> On 01/29/2015 05:54 PM, Tom Arnfeld wrote:
>
> Hi Alex,
>
> Great to hear you're hoping to use Hadoop on Mesos. We've been running
> it for a good 6 months and it's been awesome.
>
> I'll answer the simpler question first: running multiple job trackers
> should be just fine, even multiple JTs with HA enabled (we do this). The
> Mesos scheduler for Hadoop will ship all configuration options needed for
> each TaskTracker within Mesos, so there's nothing you need to have
> specifically configured on each slave.
>
> # Slow slot allocations
>
> If you only have a few slaves, not many resources and a large amount of
> resources per slot, you might end up with a pretty small slot allocation
> (e.g. 5 mappers and 1 reducer). Because of the nature of Hadoop, slots are
> static for each TaskTracker, and the framework makes a *best effort* at
> figuring out what balance of map/reduce slots to launch on the cluster.
>
> Because of this, the current stable version of the framework has a few
> issues when running on small clusters, especially when you don't configure
> min/max slot capacity for each JobTracker. A few links below:
>
> - https://github.com/mesos/hadoop/issues/32
> - https://github.com/mesos/hadoop/issues/31
> - https://github.com/mesos/hadoop/issues/28
> - https://github.com/mesos/hadoop/issues/26
>
> Having said that, we've been working on a solution to this problem which
> enables Hadoop to launch different types of slots over the lifetime of a
> single job, meaning you could start with 5 maps and 1 reduce, and then end
> with 0 maps and 6 reduces. It's not perfect, but it's a decent optimisation
> if you still need to use Hadoop.
> - https://github.com/mesos/hadoop/pull/33
>
> You may also want to look into how large your executor URI is (the one
> containing the Hadoop distribution that gets downloaded for each task
> tracker) and how long that takes to download; it might be that the task
> trackers are taking a while to bootstrap.
>
> Do you have any idea of when your pull request will be merged? It looks
> pretty interesting, even if we're just playing around at this point. Is
> your hadoop-mesos-0.0.9.jar available for download somewhere, or do I have
> to build it myself? In the meantime, I'm adding more slaves to see if this
> makes the problem go away, at least for demos.
>
> # HA Hadoop JTs
>
> The framework currently does not support a full HA setup, however that's
> not a huge issue. The JT will automatically restart jobs where they left
> off on its own when a failover occurs, but for the time being all the
> task trackers will be killed and new ones spawned. Depending on your
> setup, this could be a fairly negligible amount of time.
>
> I'm not sure I understand. I know task trackers will get restarted; that's
> not what I'm worried about. The issue I see is with the JT: it's started on
> one master only. If that master goes down, the framework goes down. I was
> kind of hoping to be able to do something like this:
>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>zk://mesos01:2181,mesos02:2181,mesos03:2181/hadoop530</value>
> </property>
>
> Perhaps this doesn't actually work as I would expect. It doesn't look like
> there's been any progress on issue #28, unfortunately...
>
> # Multiple versions of hadoop on the cluster
>
> This is totally fine: each JT configuration can be given its own hadoop
> tar.gz file with the right version in it, and they will all happily share
> the Mesos cluster.
>
> I guess you have to have multiple startup scripts for this, and also
> multiple versions of Hadoop on the masters.
> Any pointers on how you've set this up would be much appreciated.
>
> Cheers,
> Alex
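To make the configuration advice in the thread concrete: the per-JobTracker knobs Tom refers to (the executor tarball and min/max slot capacity) live in each JT's mapred-site.xml. The fragment below is a hedged sketch; the mapred.mesos.* property names are the ones documented by the mesos/hadoop framework as best I recall, and the HDFS path, hostname, and slot counts are purely illustrative, so double-check against the framework README for your version:

```xml
<!-- Illustrative mapred-site.xml fragment for one JobTracker running -->
<!-- on the mesos/hadoop framework. Verify the mapred.mesos.*         -->
<!-- property names against the framework README for your version.    -->

<!-- Version-specific Hadoop tarball each TaskTracker downloads on    -->
<!-- startup; keeping it small speeds up TaskTracker bootstrap.       -->
<property>
  <name>mapred.mesos.executor.uri</name>
  <value>hdfs://namenode:9000/hadoop-2.0.0-mr1.tar.gz</value>
</property>

<!-- Floor on slot allocation, so a small cluster doesn't end up with -->
<!-- e.g. 5 mappers and only 1 reducer.                               -->
<property>
  <name>mapred.mesos.total.map.slots.minimum</name>
  <value>5</value>
</property>
<property>
  <name>mapred.mesos.total.reduce.slots.minimum</name>
  <value>2</value>
</property>
```

For the multiple-versions question, one plausible arrangement (the paths here are hypothetical, not from the thread) is a separate conf directory per Hadoop version, say /etc/hadoop-jt-a and /etc/hadoop-jt-b, each with a mapred-site.xml pointing mapred.mesos.executor.uri at its own tarball, plus one startup script per JT along the lines of `bin/hadoop --config /etc/hadoop-jt-a jobtracker`.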