I’m investigating the possibility of using Mesos to solve the problem of
resource allocation between a Hadoop cluster and a set of Jenkins slaves (and I
like that it would make it easy to deploy other frameworks later). One of the
biggest open questions I can’t seem to find an answer to is how to manage
system dependencies across a wide variety of frameworks, and across the jobs
running within those frameworks.

I came across this thread
(http://www.mail-archive.com/user@mesos.apache.org/msg00301.html), and caching
executor files seems to be the running solution, though it isn’t implemented
yet. I too would really like to avoid shipping system dependencies (C
dependencies for Python packages, for example) along with every single job, and
I’m especially unsure how this would interact with the Hadoop and Jenkins Mesos
schedulers (since each Hadoop job may require its own system dependencies).

More importantly, the architecture of the machine submitting the job is often
different from that of the slaves, so we can’t simply ship all the built
dependencies with the task.

We’re solving this problem for Hadoop at the moment by installing every
dependency we require on every Hadoop TaskTracker node, which is far from
ideal. For Jenkins, we’re using Docker to isolate the execution of different
types of jobs, building all the system dependencies for a suite of jobs into
Docker images.

I like the idea of continuing down the path of Docker for process isolation and
system dependency management, but I don’t see any easy way for this to interact
with the existing Hadoop/Jenkins/etc. schedulers. I guess it would require us
to build our own schedulers/executors that wrap each process in a Docker
container.
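
For illustration, the sort of wrapping I have in mind is roughly the sketch
below. It’s only an assumption of how a custom executor might launch a task,
using plain subprocess calls rather than the real Mesos executor API; the
image name, command and resource flags are all made up for the example.

import subprocess

def launch_in_docker(image, command, cpus=1.0, mem_mb=1024):
    """Run an arbitrary task command inside a Docker container.

    The image carries the job's system dependencies; a custom Mesos
    executor would call something like this when it launches a task
    and report success/failure based on the exit code.
    """
    docker_cmd = [
        "docker", "run", "--rm",
        "--cpu-shares", str(int(cpus * 1024)),  # coarse CPU weighting
        "--memory", "%dm" % mem_mb,             # hard memory limit
        image,
        "/bin/sh", "-c", command,
    ]
    return subprocess.call(docker_cmd)

# e.g. a Jenkins build whose toolchain is baked into the image
exit_code = launch_in_docker("example/jenkins-build-env", "make test")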

I’d love to hear how others are solving this problem… and/or whether Docker 
seems like the wrong way to go.

—

Tom Arnfeld
Developer // DueDil
