Hi Dana,

Yes, we got VPC + EMR working, but I'm not the person who deployed it. It is
related to the subnet, as Alex points out.

I just want to add another point: spark-ec2 is worth keeping and improving
because it allows users to run any version of Spark (a nightly build, for
example). EMR does not allow you to do that without a manual process.

Best Regards,

Jerry

On Wed, Dec 2, 2015 at 1:02 PM, Alexander Pivovarov <apivova...@gmail.com>
wrote:

> Do you think it's a security issue if EMR is started in a VPC with a subnet
> that has Auto-assign Public IP: Yes?
>
> You can remove all Inbound rules that have a 0.0.0.0/0 Source in the master
> and slave Security Groups.
> Then the master and slave boxes will be accessible only to users who are on
> the VPN.
>
> On Wed, Dec 2, 2015 at 9:44 AM, Dana Powers <dana.pow...@gmail.com> wrote:
>
>> EMR was a pain to configure on a private VPC last I tried. Has anyone had
>> success with that? I found spark-ec2 easier to use w private networking,
>> but also agree that I would use for prod.
>>
>> -Dana
>> On Dec 1, 2015 12:29 PM, "Alexander Pivovarov" <apivova...@gmail.com>
>> wrote:
>>
>>> 1. EMR 4.2.0 has Zeppelin as an alternative to Databricks Notebooks
>>>
>>> 2. EMR has Ganglia 3.6.0
>>>
>>> 3. EMR has Hadoop fs settings to make S3 work fast (direct.EmrFileSystem)
>>>
>>> 4. EMR has S3 keys in the Hadoop configs
>>>
>>> 5. EMR allows you to resize the cluster on the fly.
>>>
>>> 6. EMR has the AWS SDK in the Spark classpath. This helps reduce the app
>>> assembly jar size
>>>
>>> 7. The ec2 script installs everything in /root; EMR has dedicated users:
>>> hadoop, zeppelin, etc. EMR is similar to Cloudera or Hortonworks
>>>
>>> 8. There are at least 3 spark-ec2 projects (in apache/spark, in mesos,
>>> in amplab). The master branch in spark has an outdated ec2 script. The
>>> other projects have broken links in their READMEs. WHAT A MESS!
>>>
>>> 9. The ec2 script has bad documentation and uninformative error messages.
>>> E.g. the README does not say anything about the --private-ips option. If
>>> you do not add the flag, it will connect to an empty-string host
>>> (localhost) instead of the master. Fixed only last week. Not sure if it is
>>> fixed in all branches
>>>
>>> 10. I think Amazon will add spark-jobserver to EMR soon.
>>>
>>> 11. You do not need to be an AWS expert to start an EMR cluster. Users can
>>> use the EMR web UI to start a cluster to run some jobs or work in Zeppelin
>>> during the day
>>>
>>> 12. An EMR cluster starts in about 8 min. The ec2 script takes longer and
>>> you need to be online.
>>> On Dec 1, 2015 9:22 AM, "Jerry Lam" <chiling...@gmail.com> wrote:
>>>
>>>> Simply put:
>>>>
>>>> EMR = Hadoop Ecosystem (Yarn, HDFS, etc) + Spark + EMRFS + Amazon EMR
>>>> API + Selected Instance Types + Amazon EC2 Friendly (bootstrapping)
>>>> spark-ec2 = HDFS + Yarn (Optional) + Spark (Standalone Default) + Any
>>>> Instance Type
>>>>
>>>> I use spark-ec2 for prototyping and I have never used it for production.
>>>>
>>>> just my $0.02
>>>>
>>>>
>>>>
>>>> On Dec 1, 2015, at 11:15 AM, Nick Chammas <nicholas.cham...@gmail.com>
>>>> wrote:
>>>>
>>>> Pinging this thread in case anyone has thoughts on the matter they want
>>>> to share.
>>>>
>>>> On Sat, Nov 21, 2015 at 11:32 AM Nicholas Chammas <[hidden email]>
>>>> wrote:
>>>>
>>>>> Spark has come bundled with spark-ec2
>>>>> <http://spark.apache.org/docs/latest/ec2-scripts.html> for many
>>>>> years. At the same time, EMR has been capable of running Spark for a
>>>>> while, and earlier this year it added "official" support
>>>>> <https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/>.
>>>>>
>>>>> If you're looking for a way to provision Spark clusters, there are
>>>>> some clear differences between these 2 options. I think the biggest one
>>>>> would be that EMR is a "production" solution backed by a company, whereas
>>>>> spark-ec2 is not really intended for production use (as far as I know).
>>>>>
>>>>> That particular difference in intended use may or may not matter to
>>>>> you, but I'm curious:
>>>>>
>>>>> What are some of the other differences between the 2 that do matter to
>>>>> you? If you were considering these 2 solutions for your use case at one
>>>>> point recently, why did you choose one over the other?
>>>>>
>>>>> I'd be especially interested in hearing about why people might choose
>>>>> spark-ec2 over EMR, since the latter option seems to have shaped up nicely
>>>>> this year.
>>>>>
>>>>> Nick
>>>>>
>>>>>
>>>> ------------------------------
>>>> View this message in context: Re: spark-ec2 vs. EMR
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/Re-spark-ec2-vs-EMR-tp25538.html>
>>>> Sent from the Apache Spark User List mailing list archive
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>>>>
>>>>
>>>>
>
