Howdy Akhil,

Thanks - that did help!  And, it made me think about how the EC2 scripts
work [1] to set up security.  From my understanding of EC2 security groups
[2], this just sets up external access, right?  (This has no effect on
internal communication between the instances, right?)

I am still confused as to why I am seeing the workers open up new ports for
each job.

Jacob

[1] https://github.com/apache/spark/blob/master/ec2/spark_ec2.py#L230
[2]
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#default-security-group

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075



From:   Akhil Das <ak...@sigmoidanalytics.com>
To:     user@spark.apache.org
Date:   04/25/2014 12:51 PM
Subject:        Re: Securing Spark's Network
Sent by:        ak...@mobipulse.in



Hi Jacob,

This post might give you a brief idea about the ports being used

https://groups.google.com/forum/#!topic/spark-users/PN0WoJiB0TA





On Fri, Apr 25, 2014 at 8:53 PM, Jacob Eisinger <jeis...@us.ibm.com> wrote:
  Howdy,

  We tried running Spark 0.9.1 stand-alone inside docker containers
  distributed over multiple hosts. This is complicated due to Spark opening
  up ephemeral / dynamic ports for the workers and the CLI.  To ensure our
  docker solution doesn't break Spark in unexpected ways and maintains a
  secure cluster, I am interested in understanding more about Spark's
  network architecture. I'd appreciate it if you could you point us to any
  documentation!

  A couple specific questions:
     1. What are these ports being used for?
        Checking out the code / experiments, it looks like asynchronous
        communication for shuffling around results. Anything else?
     2. How do you secure the network?
        Network administrators tend to secure and monitor the network at
        the port level. If these ports are dynamic and open randomly,
        firewalls are not easily configured and security alarms are raised.
        Is there a way to limit the range easily? (We did investigate
        setting the kernel parameter ip_local_reserved_ports, but this is
        broken [1] on some versions of Linux's cgroups.)

  Thanks,
  Jacob

  [1] https://github.com/lxc/lxc/issues/97

  Jacob D. Eisinger
  IBM Emerging Technologies
  jeis...@us.ibm.com - (512) 286-6075

Reply via email to