Hi Tom,

Thanks a lot for your reply, it's very helpful.

On 01/29/2015 05:54 PM, Tom Arnfeld wrote:
> Hi Alex,
>
> Great to hear you're hoping to use Hadoop on Mesos. We've been running
> it for a good 6 months and it's been awesome.
>
> I'll answer the simpler question first: running multiple job trackers
> should be just fine, even multiple JTs with HA enabled (we do this).
> The Mesos scheduler for Hadoop will ship all configuration options
> needed for each TaskTracker within Mesos, so there's nothing you need
> to have specifically configured on each slave.
>
> # Slow slot allocations
>
> If you only have a few slaves, not many resources and a large amount
> of resources per slot, you might end up with a pretty small slot
> allocation (e.g. 5 mappers and 1 reducer). Because of the nature of
> Hadoop, slots are static for each TaskTracker and the framework makes a
> /best effort/ attempt to figure out what balance of map/reduce slots to
> launch on the cluster.
>
> Because of this, the current stable version of the framework has a few
> issues when running on small clusters, especially when you don't
> configure min/max slot capacity for each JobTracker. A few links below:
>
> - https://github.com/mesos/hadoop/issues/32
> - https://github.com/mesos/hadoop/issues/31
> - https://github.com/mesos/hadoop/issues/28
> - https://github.com/mesos/hadoop/issues/26
>
> Having said that, we've been working on a solution to this problem
> which enables Hadoop to launch different types of slots over the
> lifetime of a single job, meaning you could start with 5 maps and 1
> reduce, and then end with 0 maps and 6 reduce. It's not perfect, but
> it's a decent optimisation if you still need to use Hadoop.
>
> - https://github.com/mesos/hadoop/pull/33
>
> You may also want to look into how large your executor URI is (the one
> containing the Hadoop source that gets downloaded for each task
> tracker) and how long that takes to download. It might be that the
> task trackers are taking a while to bootstrap.

Do you have any idea of when your pull request will be merged? It looks
pretty interesting, even if we're just playing around at this point. Is
your hadoop-mesos-0.0.9.jar available for download somewhere, or do I
have to build it myself? In the meantime, I'm adding more slaves to see
if this makes the problem go away, at least for demos.
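
For reference, this is roughly what I understand by configuring min/max
slot capacity per JobTracker. It's only a sketch: the property names are
the ones I believe the hadoop-mesos README uses, and the values are
arbitrary, so please correct me if I've got them wrong:

<!-- Assumed property names, arbitrary example values -->
<property>
  <name>mapred.mesos.total.map.slots.minimum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.mesos.total.reduce.slots.minimum</name>
  <value>2</value>
</property>
<!-- Upper bounds, assuming the stock TaskTracker limits are what's meant -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>10</value>
</property>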

>
> # HA Hadoop JTs
>
> The framework currently does not support a full HA setup; however,
> that's not a huge issue. The JT will automatically restart jobs where
> they left off on its own when a failover occurs, but for the time
> being all the task trackers will be killed and new ones spawned.
> Depending on your setup, this could be a fairly negligible time.

I'm not sure I understand. I know task trackers will get restarted,
that's not what I'm worried about. The issue I see is with the JT: it's
started on one master only. If that master goes down, the framework goes
down. I was kind of hoping to be able to do something like this:

<property>
  <name>mapred.job.tracker</name>
  <value>zk://mesos01:2181,mesos02:2181,mesos03:2181/hadoop530</value>
</property>

Perhaps this doesn't actually work as I would expect. It doesn't look
like there's been any progress on issue #28, unfortunately...

>
> # Multiple versions of hadoop on the cluster
>
> This is totally fine: each JT configuration can be given its own
> hadoop tar.gz file with the right version in it, and they will all
> happily share the Mesos cluster.
>
I guess you have to have multiple startup scripts for this, and also
multiple versions of Hadoop on the masters. Any pointers on how you've
set this up would be much appreciated.
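
My guess is that each JobTracker would get its own mapred-site.xml
pointing at a different tarball, something like the following. I'm
assuming mapred.mesos.executor.uri is the right property here, and the
HDFS host, paths and file names are made up:

<!-- mapred-site.xml for JobTracker A (hypothetical tarball path) -->
<property>
  <name>mapred.mesos.executor.uri</name>
  <value>hdfs://namenode:9000/frameworks/hadoop-version-a.tar.gz</value>
</property>

<!-- mapred-site.xml for JobTracker B (hypothetical tarball path) -->
<property>
  <name>mapred.mesos.executor.uri</name>
  <value>hdfs://namenode:9000/frameworks/hadoop-version-b.tar.gz</value>
</property>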

Cheers,
Alex
