Perhaps the author is referring to Spark Streaming applications? They're examples of long-running applications.
You still have to implement the application/domain-level protocol yourself, as Sandy pointed out.

On Wed, Jul 9, 2014 at 11:03 AM, John Omernik <j...@omernik.com> wrote:

> So how do I do the "long-lived server continually satisfying requests" in
> the Cloudera application? I am very confused by that at this point.
>
> On Wed, Jul 9, 2014 at 12:49 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> Spark doesn't currently offer you anything special to do this. I.e., if
>> you want to write a Spark application that fires off jobs on behalf of
>> remote processes, you would need to implement the communication between
>> those remote processes and your Spark application code yourself.
>>
>> On Wed, Jul 9, 2014 at 10:41 AM, John Omernik <j...@omernik.com> wrote:
>>
>>> Thank you for the link. In that link the following is written:
>>>
>>> "For those familiar with the Spark API, an application corresponds to
>>> an instance of the SparkContext class. An application can be used for a
>>> single batch job, an interactive session with multiple jobs spaced
>>> apart, or a long-lived server continually satisfying requests."
>>>
>>> So, if I wanted to use "a long-lived server continually satisfying
>>> requests" and then start a shell that connected to that context, how
>>> would I do that in YARN? That's the problem I am having right now: I
>>> just want there to be a long-lived service that I can utilize.
>>>
>>> Thanks!
>>>
>>> On Wed, Jul 9, 2014 at 11:14 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>>
>>>> To add to Ron's answer, this post explains what it means to run Spark
>>>> against a YARN cluster, the difference between yarn-client and
>>>> yarn-cluster mode, and the reason spark-shell only works in
>>>> yarn-client mode.
>>>>
>>>> http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
>>>>
>>>> -Sandy
>>>>
>>>> On Wed, Jul 9, 2014 at 9:09 AM, Ron Gonzalez <zlgonza...@yahoo.com> wrote:
>>>>
>>>>> The idea behind YARN is that you can run different application types,
>>>>> like MapReduce, Storm, and Spark.
>>>>>
>>>>> I would recommend that you build your Spark jobs in the main method
>>>>> without specifying how they are deployed. Then you can use
>>>>> spark-submit to tell Spark how to deploy them, using yarn-cluster as
>>>>> the master. The key point here is that once you have YARN set up, the
>>>>> Spark client connects to it using the $HADOOP_CONF_DIR that contains
>>>>> the resource manager address. In particular, this directory needs to
>>>>> be accessible from the classpath of the submitter, since the client
>>>>> implicitly uses it when it instantiates a YarnConfiguration. If you
>>>>> want more details, read org.apache.spark.deploy.yarn.Client.scala.
>>>>>
>>>>> You should be able to download a standalone YARN cluster from any of
>>>>> the Hadoop providers, like Cloudera or Hortonworks. Once you have
>>>>> that, the Spark programming guide describes what I mention above in
>>>>> sufficient detail for you to proceed.
>>>>>
>>>>> Thanks,
>>>>> Ron
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>> > On Jul 9, 2014, at 8:31 AM, John Omernik <j...@omernik.com> wrote:
>>>>> >
>>>>> > I am trying to get my head around using Spark on YARN from the
>>>>> > perspective of a cluster. I can start a Spark shell in YARN with no
>>>>> > issues; it works easily. This is done in yarn-client mode, and it
>>>>> > all works well.
>>>>> >
>>>>> > In multiple examples, I see instances where people have set up
>>>>> > Spark clusters in standalone mode, and then in the examples they
>>>>> > "connect" to this cluster in standalone mode. This is often done
>>>>> > using the spark:// string for the connection. Cool.
>>>>> >
>>>>> > But what I don't understand is: how do I set up a YARN instance
>>>>> > that I can "connect" to? I.e., I tried running the Spark shell in
>>>>> > yarn-cluster mode, and it gave me an error telling me to use
>>>>> > yarn-client. I see information on using spark-class or
>>>>> > spark-submit. But what I'd really like is an instance I can connect
>>>>> > a spark-shell to, and have that instance stay up. I'd like to be
>>>>> > able to run other things on that instance, etc. Is that possible
>>>>> > with YARN? I know there may be long-running job challenges with
>>>>> > YARN, but I am just testing. I am curious whether I am looking at
>>>>> > something completely bonkers here, or just missing something
>>>>> > simple.
>>>>> >
>>>>> > Thanks!
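To make Sandy's point concrete: the "long-lived server continually satisfying requests" pattern just means your driver process stays up, holds a single SparkContext, and speaks a wire protocol that you design yourself for remote callers. The sketch below is a hypothetical illustration, not a Spark API: the port, the JSON message shape, and the job dispatch are all invented, and the PySpark calls are left as comments so the protocol shape is visible on its own.

```python
# long_lived_driver.py -- sketch of a driver that stays up and accepts
# job requests over a homemade newline-delimited JSON protocol. The
# message format and port are hypothetical; only the shape matters.
import json
import socketserver

def parse_request(line):
    """Decode one newline-delimited JSON request into (job_name, args)."""
    msg = json.loads(line)
    return msg["job"], msg.get("args", [])

class JobHandler(socketserver.StreamRequestHandler):
    def handle(self):
        job, args = parse_request(self.rfile.readline().decode())
        # In a real driver you would dispatch to Spark here, e.g.:
        #   result = self.server.jobs[job](self.server.sc, *args)
        result = {"job": job, "status": "accepted"}
        self.wfile.write((json.dumps(result) + "\n").encode())

def serve(port=9999):
    # The SparkContext would be created once, before serving, and shared
    # by every request, e.g. (requires PySpark, so left commented):
    #   from pyspark import SparkContext
    #   server.sc = SparkContext(appName="long-lived-driver")
    with socketserver.TCPServer(("", port), JobHandler) as server:
        server.serve_forever()
```

A caller (your "shell") would then open a socket to this driver and send requests, rather than creating its own SparkContext; that client side is equally your own code to write.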
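Ron's advice above, write the job's main without pinning a deployment mode and let spark-submit decide, can be sketched roughly like this. It is a hedged illustration: the app name and data are invented, and the PySpark import is deferred into main() so the aggregation helper can be read and tested without a cluster.

```python
# my_app.py -- a job whose main does not hard-code a master, so the
# same script can be submitted locally, to standalone, or to YARN.

def count_words(words):
    """Pure aggregation logic, kept separate so it is testable without Spark."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

def main():
    # Imported here so the module loads even where PySpark is absent.
    from pyspark import SparkConf, SparkContext
    # Note: no .setMaster(...) call; spark-submit supplies the master.
    sc = SparkContext(conf=SparkConf().setAppName("word-count-sketch"))
    pairs = (sc.parallelize(["a", "b", "a"])
               .map(lambda w: (w, 1))
               .reduceByKey(lambda x, y: x + y)
               .collect())
    print(dict(pairs))
    sc.stop()

# Submitted from a machine where $HADOOP_CONF_DIR points at the config
# containing the resource manager address (as Ron notes, the Spark
# client picks it up implicitly when it builds a YarnConfiguration):
#
#   spark-submit --master yarn-cluster my_app.py
```

With `--master yarn-cluster` the driver itself runs inside a YARN container, which is also why spark-shell refuses that mode: an interactive shell needs its driver local, i.e. yarn-client.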