Re: Zeppelin - Spark Driver location

ankit jain Wed, 14 Mar 2018 08:44:09 -0700

Hi Zhang,
I'm not clear on that. I thought spark-submit was run when we run a
paragraph, so how does the .sh file come into play?


Thanks
Ankit

On Tue, Mar 13, 2018 at 5:43 PM, Jeff Zhang <[email protected]> wrote:

>
> spark-submit is called in bin/interpreter.sh. I haven't tried standalone
> cluster mode. It is expected to run the driver on a separate host, but it
> is not guaranteed that Zeppelin supports this.
>
> On Wed, Mar 14, 2018 at 8:34 AM, Ankit Jain <[email protected]> wrote:
>
>> Hi Zhang,
>> What is the expected behavior with standalone cluster mode? Should we see
>> separate driver processes in the cluster (one per user) or multiple
>> SparkSubmit processes?
>>
>> I was digging through the Zeppelin code and didn't see where Zeppelin does
>> the spark-submit to the cluster. Can you please point me to it?
>>
>> Thanks
>> Ankit
>>
>> On Mar 13, 2018, at 5:25 PM, Jeff Zhang <[email protected]> wrote:
>>
>>
>> ZEPPELIN-2898 <https://issues.apache.org/jira/browse/ZEPPELIN-2898> is
>> for yarn cluster mode. Zeppelin has an integration test for yarn mode, so
>> that is guaranteed to work. There is no test for standalone, so I'm not
>> sure about the behavior of standalone mode.
>>
>>
>> On Wed, Mar 14, 2018 at 8:06 AM, Ruslan Dautkhanov <[email protected]> wrote:
>>
>>> https://github.com/apache/zeppelin/pull/2577 mentions yarn-cluster in
>>> its title, so I assume it's only for yarn-cluster.
>>> I have never used standalone-cluster myself.
>>>
>>> Which distro of Hadoop do you use?
>>> Cloudera dropped support for standalone in CDH 5.5 and will remove it in
>>> CDH 6.
>>> https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_deprecated.html
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <
>>> [email protected]> wrote:
>>>
>>>> Does this new feature work only for yarn-cluster, or for Spark
>>>> standalone too?
>>>>
>>>> On Tue, Mar 13, 2018 at 6:34 PM, Ruslan Dautkhanov <
>>>> [email protected]> wrote:
>>>>
>>>>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>
>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged at the
>>>>> end of September, so I'm not sure if you have that.
>>>>>
>>>>> Check out
>>>>> https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235
>>>>> for how to set this up.
>>>>>
>>>>>
>>>>> --
>>>>> Ruslan Dautkhanov
>>>>>
>>>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Zeppelin users!
>>>>>>
>>>>>> I am working with Zeppelin pointing to a Spark standalone cluster. I am
>>>>>> trying to figure out a way to make Zeppelin run the Spark driver outside
>>>>>> of the client process that submits the application.
>>>>>>
>>>>>> According to the documentation
>>>>>> (http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>>>>>
>>>>>> *For standalone clusters, Spark currently supports two deploy modes.
>>>>>> In client mode, the driver is launched in the same process as the client
>>>>>> that submits the application. In cluster mode, however, the driver is
>>>>>> launched from one of the Worker processes inside the cluster, and the
>>>>>> client process exits as soon as it fulfills its responsibility of
>>>>>> submitting the application without waiting for the application to 
>>>>>> finish.*
>>>>>>
>>>>>> The problem is that, even when I set the properties for a Spark
>>>>>> standalone cluster with the deploy mode set to cluster, the driver still
>>>>>> runs on the Zeppelin machine (according to the Executors page of the
>>>>>> Spark UI). These are the properties that I am setting for the Spark
>>>>>> interpreter:
>>>>>>
>>>>>> master: spark://<master-name>:7077
>>>>>> spark.submit.deployMode: cluster
>>>>>> spark.executor.memory: 16g
>>>>>>
>>>>>> Any ideas would be appreciated.
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> Details:
>>>>>> Spark version: 2.1.1
>>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>>
>>>>>


-- 
Thanks & Regards,
Ankit.
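
For reference, the three interpreter properties quoted at the bottom of the
thread correspond roughly to the spark-submit options --master,
--deploy-mode, and --conf. The sketch below only illustrates that mapping.
It is not the code Zeppelin runs: the helper name build_spark_submit_args is
hypothetical, and the application jar and main class that a real
spark-submit call needs are left out on purpose.

```python
import shlex


def build_spark_submit_args(props):
    """Map interpreter-style properties to spark-submit arguments.

    Hypothetical helper for illustration only; Zeppelin assembles its own
    command line in bin/interpreter.sh.
    """
    remaining = dict(props)  # copy so the caller's dict is not modified
    args = ["spark-submit"]

    master = remaining.pop("master", None)
    if master:
        args += ["--master", master]

    deploy_mode = remaining.pop("spark.submit.deployMode", None)
    if deploy_mode:
        args += ["--deploy-mode", deploy_mode]

    # Every other property is passed through as a generic --conf key=value.
    for key in sorted(remaining):
        args += ["--conf", f"{key}={remaining[key]}"]

    return args


# The properties from the original question; <master-name> is a placeholder.
props = {
    "master": "spark://<master-name>:7077",
    "spark.submit.deployMode": "cluster",
    "spark.executor.memory": "16g",
}

# shlex.join (Python 3.8+) quotes the placeholder so the line is shell-safe.
print(shlex.join(build_spark_submit_args(props)))
```

Comparing a command line like this with what bin/interpreter.sh actually
passes to spark-submit is one way to check whether the deploy mode reaches
Spark at all.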
