Howdy,

We tried running Spark 0.9.1 standalone inside Docker containers distributed over multiple hosts. This is complicated because Spark opens up ephemeral / dynamic ports for the workers and the CLI. To make sure our Docker solution doesn't break Spark in unexpected ways and still gives us a secure cluster, I am interested in understanding more about Spark's network architecture. I'd appreciate it if you could point us to any documentation!
A couple of specific questions:

1. What are these ports being used for? From checking out the code and experimenting, it looks like asynchronous communication for shuffling results around. Is there anything else?

2. How do you secure the network? Network administrators tend to secure and monitor the network at the port level. If these ports are dynamic and open at random, firewalls are not easily configured and security alarms are raised. Is there a way to limit the range easily? (We did investigate setting the kernel parameter ip_local_reserved_ports, but this is broken [1] on some versions of Linux's cgroups.) See the P.S. below for the kind of pinning we have in mind.

Thanks,
Jacob

[1] https://github.com/lxc/lxc/issues/97

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075
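P.S. To make question 2 concrete, here is a minimal sketch of the kind of port pinning we are hoping for, using the driver port as the one example we know can be set via SparkConf. The master URL and the port number 51000 are just placeholders for our environment, and we are not sure whether equivalent settings exist for the ports the workers, executors, and shuffle machinery open; that is exactly what we would like to learn.

    import org.apache.spark.{SparkConf, SparkContext}

    object PortPinningSketch {
      def main(args: Array[String]) {
        // Pin the driver's listening port instead of letting it pick a random
        // ephemeral one, so a firewall rule can be written for it.
        // "spark://spark-master:7077" and 51000 are placeholders for our setup.
        val conf = new SparkConf()
          .setMaster("spark://spark-master:7077")
          .setAppName("port-pinning-sketch")
          .set("spark.driver.port", "51000")

        val sc = new SparkContext(conf)
        // ... run jobs as usual ...
        sc.stop()
      }
    }

If there is an analogous way to constrain the remaining dynamically chosen ports to a known range, that would let us write firewall rules for the whole cluster.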