Re: Spark on Yarn: Connecting to Existing Instance

John Omernik Wed, 09 Jul 2014 10:48:06 -0700

So basically, I have Spark on Yarn running (a spark shell). How do I connect
to it from another tool I am trying to test, which expects a spark://IP:7077
URL? If that won't work with spark shell or yarn-client mode, how do I set up
Spark on Yarn to be able to handle that?


Thanks!




On Wed, Jul 9, 2014 at 12:41 PM, John Omernik <j...@omernik.com> wrote:

> Thank you for the link.  In that link the following is written:
>
> For those familiar with the Spark API, an application corresponds to an
> instance of the SparkContext class. An application can be used for a
> single batch job, an interactive session with multiple jobs spaced apart,
> or a long-lived server continually satisfying requests
>
> So, if I wanted to use "a long-lived server continually satisfying
> requests" and then start a shell that connected to that context, how would
> I do that in Yarn? That's the problem I am having right now: I just want
> there to be a long-lived service that I can utilize.
>
> Thanks!
>
>
> On Wed, Jul 9, 2014 at 11:14 AM, Sandy Ryza <sandy.r...@cloudera.com>
> wrote:
>
>> To add to Ron's answer, this post explains what it means to run Spark
>> against a YARN cluster, the difference between yarn-client and yarn-cluster
>> mode, and the reason spark-shell only works in yarn-client mode.
>>
>> http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
>>
>> -Sandy
>>
>>
>> On Wed, Jul 9, 2014 at 9:09 AM, Ron Gonzalez <zlgonza...@yahoo.com>
>> wrote:
>>
>>> The idea behind YARN is that you can run different application types
>>> like MapReduce, Storm and Spark.
>>>
>>> I would recommend that you build your Spark jobs in the main method
>>> without specifying how they are deployed. Then you can use spark-submit to
>>> tell Spark how you want to deploy them, using yarn-cluster as the master.
>>> The key point here is that once you have YARN set up, the spark client
>>> connects to it using the $HADOOP_CONF_DIR that contains the resource
>>> manager address. In particular, this needs to be accessible from the
>>> classpath of the submitter since it implicitly uses this when it
>>> instantiates a YarnConfiguration instance. If you want more details, read
>>> org.apache.spark.deploy.yarn.Client.scala.
>>>
>>> You should be able to download a standalone YARN cluster from any of the
>>> Hadoop providers like Cloudera or Hortonworks. Once you have that, the
>>> spark programming guide describes what I mention above in sufficient detail
>>> for you to proceed.
>>>
>>> Thanks,
>>> Ron
>>>
>>> Sent from my iPad
>>>
>>> > On Jul 9, 2014, at 8:31 AM, John Omernik <j...@omernik.com> wrote:
>>> >
>>> > I am trying to get my head around using Spark on Yarn from a
>>> perspective of a cluster. I can start a Spark Shell no issues in Yarn.
>>> Works easily.  This is done in yarn-client mode and it all works well.
>>> >
>>> > In multiple examples, I see instances where people have set up Spark
>>> Clusters in Stand Alone mode, and then in the examples they "connect" to
>>> this cluster in Stand Alone mode. This is often done using the
>>> spark:// string for connection.  Cool.
>>> > But what I don't understand is how do I set up a Yarn instance that I
>>> can "connect" to? I.e. I tried running Spark Shell in yarn-cluster mode and
>>> it gave me an error, telling me to use yarn-client.  I see information on
>>> using spark-class or spark-submit.  But what I'd really like is an instance
>>> I can connect a spark-shell to, and have the instance stay up. I'd like to
>>> be able to run other things on that instance etc. Is that possible with Yarn?
>>> I know there may be long running job challenges with Yarn, but I am just
>>> testing, I am just curious if I am looking at something completely bonkers
>>> here, or just missing something simple.
>>> >
>>> > Thanks!
>>> >
>>> >
>>>
>>
>>
>
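
Ron's suggestion in the quoted reply (keep the deploy target out of the job's
main method and choose it at submit time) can be sketched roughly as follows.
This is a minimal illustration in Python, not code from the thread: the helper
names build_submit_command and hadoop_conf_dir, the jar name, and the class
name are hypothetical. It only assembles the spark-submit arguments and checks
that HADOOP_CONF_DIR is set; it does not launch anything.

```python
import os


def build_submit_command(app_jar, main_class, master="yarn-cluster", app_args=()):
    """Assemble a spark-submit argument list (hypothetical helper).

    The deploy target is passed via --master at submit time, so the job's
    own main method does not need to hard-code it.
    """
    return ["spark-submit", "--master", master, "--class", main_class,
            app_jar, *app_args]


def hadoop_conf_dir(env=None):
    """Return HADOOP_CONF_DIR, or raise if it is unset (hypothetical helper).

    Per the reply above, the submitting client finds the resource manager
    address through the configuration in this directory.
    """
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR")
    if not conf_dir:
        raise RuntimeError("HADOOP_CONF_DIR is not set")
    return conf_dir


# Example values only: "my-job.jar" and "com.example.MyJob" are placeholders.
command = build_submit_command("my-job.jar", "com.example.MyJob")
```

Under these assumptions, switching the same job to yarn-client would mean
changing only the master argument, for example
build_submit_command("my-job.jar", "com.example.MyJob", master="yarn-client"),
without touching the job code.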
