Hi John,

There is no other blog post yet. I'm thinking of doing a series of posts,
but so far I haven't had time to get to them.

Running Spark in Docker containers makes distributing Spark versions easy:
upgrades are simple, and the image is automatically cached on the slaves,
so the same image runs right away on later launches. Most Docker
performance overhead usually comes from the network and the filesystem.
With the recent changes in Spark that make the Mesos sandbox the default
temp dir, the filesystem shouldn't be a big concern, since Spark is mostly
writing to the mounted-in Mesos sandbox. Mesos also uses host networking
by default, so the network isn't affected much either.
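For reference, wiring this up is mostly a matter of pointing the executors
at an image; something like the following in spark-defaults.conf (the
image name and volume mapping are placeholders):

    spark.mesos.executor.docker.image    mycompany/spark:1.5.0
    spark.mesos.executor.docker.volumes  /data:/data:ro

The slaves pull the image on first use and run it from cache after that.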

The main cluster-mode limitation is that you need to make the Spark job
files available somewhere all the slaves can reach remotely (HTTP, S3,
HDFS, etc.), or have them present locally on every slave at the same path.
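As a sketch, a cluster-mode submit against the Mesos dispatcher looks
something like this (the dispatcher host, class name, and jar URL are
placeholders; the jar just has to be fetchable from every slave):

    spark-submit \
      --deploy-mode cluster \
      --master mesos://dispatcher-host:7077 \
      --class com.example.MyApp \
      http://fileserver.example.com/jars/my-app.jar

In client mode the driver runs where you launch it, so a local jar path
works there.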

I'll put more effort into docs once I get my existing patches and testing
infra work done.

Let me know if you have more questions,

Tim

On Sat, Sep 19, 2015 at 5:42 AM, John Omernik <j...@omernik.com> wrote:

> I was searching in the 1.5.0 docs on the Docker on Mesos capabilities and
> just found you CAN run it this way.  Are there any user posts, blog posts,
> etc on why and how you'd do this?
>
> Basically, at first I was questioning why you'd run Spark in a Docker
> container, i.e., if you run with a tarballed executor, what are you really
> gaining?  And in this setup, are you losing out on performance somehow? (I
> am guessing smarter people than I have figured that out.)
>
> Then I came across a situation where I wanted to use a Python library with
> Spark, and it had to be installed on every node, and I realized one big
> advantage of dockerized Spark would be that Spark apps that needed other
> libraries could be contained and built well.
>
> OK, that's huge, let's do that.  For my next question: there are a lot of
> "questions" I have on how this actually works.  Does cluster mode/client
> mode apply here? If so, how?  Is there a good walkthrough on getting this
> set up? Limitations? Gotchas?  Should I just dive in and start working with
> it? Has anyone done any stories/rough documentation? This seems like a
> really helpful feature for scaling out Spark, and letting developers truly
> build what they need without tons of admin overhead, so I really want to
> explore.
>
> Thanks!
>
> John
>
