Howdy,

We have been trying to run Spark 0.9.1 standalone inside Docker containers
distributed over multiple hosts. This is complicated by Spark opening ephemeral /
dynamic ports for the workers and the CLI. To make sure our Docker setup doesn't
break Spark in unexpected ways and keeps the cluster secure, I am interested in
understanding more about Spark's network architecture. I'd appreciate it if you
could point us to any documentation!
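
To give a sense of what we have configured so far (and where we might be going
wrong), below is roughly our application-side setup. This is only a sketch: the
hostname and port numbers are placeholders, and we may well be missing options.

    import org.apache.spark.{SparkConf, SparkContext}

    // Rough sketch of the port pinning we attempted (values are placeholders).
    // As far as we can tell, only the driver port is settable from SparkConf in
    // 0.9.x; the master and worker listen ports come from SPARK_MASTER_PORT /
    // SPARK_WORKER_PORT in spark-env.sh, and everything else (executor Akka
    // endpoints, block manager, HTTP servers) still binds to ephemeral ports.
    val conf = new SparkConf()
      .setMaster("spark://spark-master:7077") // "spark-master" is a placeholder hostname
      .setAppName("docker-port-test")
      .set("spark.driver.port", "51000")      // arbitrary fixed port for the driver
    val sc = new SparkContext(conf)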

A couple of specific questions:

1. What are these ports being used for?

From reading the code and running a few experiments, it looks like they are used
for asynchronous communication, e.g. shuffling results between workers. Is there
anything else?
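
For concreteness, the experiments were along these lines: a toy job with a
shuffle stage, run while watching (with netstat/lsof on each container) which
ports the workers open. The sketch assumes the SparkContext from above; the data
is throwaway.

    // reduceByKey forces a shuffle, so results have to move between executors.
    val pairs = sc.parallelize(1 to 100000).map(i => (i % 10, 1))
    val counts = pairs.reduceByKey(_ + _)
    counts.collect().foreach(println)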

2. How do you secure the network?

Network administrators tend to secure and monitor the network at the port level.
If these ports are dynamic and opened at random, firewalls are not easily
configured and security alarms get raised. Is there an easy way to limit the
port range? (We did investigate setting the kernel parameter
ip_local_reserved_ports, but this is broken [1] under some versions of Linux
cgroups/LXC.)

Thanks,
Jacob

[1] https://github.com/lxc/lxc/issues/97 

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075
