On Sun, Jan 15, 2017 at 11:09 AM, Andrew Holway <andrew.hol...@otternetworks.de> wrote:
> use yarn :)
>
> "spark-submit --master yarn"

Doesn't this require first copying out various Hadoop configuration XML
files from the EMR master node to the machine running spark-submit? Or is
there a well-known minimal set of host/port options to avoid that? I am
currently copying several XML files out and using them on a client running
spark-submit, but I feel uneasy about this, as the local values seem to
override the values on the cluster at runtime -- they are copied up with
the job.

> On Sun, Jan 15, 2017 at 7:55 PM, Darren Govoni <dar...@ontrenet.com> wrote:
>
>> So what was the answer?
>>
>> Sent from my Verizon, Samsung Galaxy smartphone
>>
>> -------- Original message --------
>> From: Andrew Holway <andrew.hol...@otternetworks.de>
>> Date: 1/15/17 11:37 AM (GMT-05:00)
>> To: Marco Mistroni <mmistr...@gmail.com>
>> Cc: Neil Jonkers <neilod...@gmail.com>, User <user@spark.apache.org>
>> Subject: Re: Running Spark on EMR
>>
>> Darn. I didn't respond to the list. Sorry.
>>
>> On Sun, Jan 15, 2017 at 5:29 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>
>>> Thanks, Neil. I followed the original suggestion from Andrew and
>>> everything is working fine now.
>>> kr
>>>
>>> On Sun, Jan 15, 2017 at 4:27 PM, Neil Jonkers <neilod...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Can you drop the URL:
>>>>
>>>> spark://master:7077
>>>>
>>>> That URL is only used when running Spark in standalone mode.
>>>>
>>>> Regards
>>>>
>>>> -------- Original message --------
>>>> From: Marco Mistroni
>>>> Date: 15/01/2017 16:34 (GMT+02:00)
>>>> To: User
>>>> Subject: Running Spark on EMR
>>>>
>>>> Hi all,
>>>> Could anyone assist here?
>>>> I am trying to run Spark 2.0.0 on an EMR cluster, but I am having
>>>> issues connecting to the master node. Below is a snippet of what I am
>>>> doing:
>>>>
>>>> sc = SparkSession.builder.master(sparkHost).appName("DataProcess").getOrCreate()
>>>>
>>>> sparkHost is passed as an input parameter. The idea is that I can run
>>>> the script locally on my local Spark instance as well as submit it to
>>>> any cluster I want.
>>>>
>>>> So far I have:
>>>> 1 - set up a cluster on EMR
>>>> 2 - connected to the master node
>>>> 3 - launched the command: spark-submit myscripts.py spark://master:7077
>>>>
>>>> But that results in a connection refused exception. I then tried
>>>> removing the .master call above and launching the script with the
>>>> following command:
>>>>
>>>> spark-submit --master spark://master:7077 myscript.py
>>>>
>>>> but I am still getting a connection refused exception.
>>>>
>>>> I am using Spark 2.0.0. Could anyone advise on how I should build the
>>>> Spark session, and how I can submit a Python script to the cluster?
>>>>
>>>> kr
>>>> marco
>>>
>>
>> --
>> Otter Networks UG
>> http://otternetworks.de
>> Gotenstraße 17
>> 10829 Berlin

--
Otter Networks UG
http://otternetworks.de
Gotenstraße 17
10829 Berlin
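The pattern the thread converges on can be sketched as follows: only call
.master() when a master URL is explicitly passed on the command line, and
otherwise leave it unset so that whatever spark-submit was given via
--master (e.g. yarn on EMR) takes effect. This is a minimal illustrative
sketch, not Marco's actual script; the helper names resolve_master and
build_session are made up for this example, and the SparkSession import is
done lazily so the resolution logic can be read (and run) without a Spark
installation.

```python
import sys


def resolve_master(argv, default=None):
    """Return the master URL from the command line, or `default` if absent.

    Returning None means "do not call .master() at all", which lets the
    value supplied via `spark-submit --master ...` win.
    """
    return argv[1] if len(argv) > 1 else default


def build_session(master, app_name="DataProcess"):
    # Lazy import: this sketch stays importable without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if master:  # only pin the master when one was explicitly requested
        builder = builder.master(master)
    return builder.getOrCreate()


# Hypothetical usage (not executed here):
#   Local run:          python myscript.py local[*]
#   On the EMR master:  spark-submit --master yarn myscript.py
#
# spark = build_session(resolve_master(sys.argv))
```

With this shape, the spark://master:7077 URL is never needed on EMR: that
URL is only meaningful for a standalone Spark cluster, while EMR runs Spark
on YARN, so `--master yarn` (issued on the master node, where the Hadoop
configuration is already present) is the natural submission path.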