Re: running spark job with fat jar file

ayan guha Mon, 17 Jul 2017 12:58:40 -0700

Hi Mich - YARN uses a specific folder convention comprising application
id, container id, attempt number and so on. Once you run spark-submit
on YARN, you can see your application on the YARN ResourceManager UI page.
Once the app finishes, you can see all of its logs using


yarn logs -applicationId <app_id>

In this log, you can see all details of transient folders, what goes where
and so on.
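
As a minimal sketch of how one might pull the container ids out of that
output: this assumes log aggregation is enabled on the cluster and that the
yarn client is on the PATH. The function names are made up for illustration
and the application id in the example is a placeholder.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of YARN or Spark.

# is_app_id: succeeds when $1 looks like a YARN application id,
# i.e. application_<cluster timestamp>_<sequence number>.
is_app_id() {
  printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'
}

# list_containers: read log text on stdin and print each container id
# found in it once, sorted.
list_containers() {
  grep -oE 'container_(e[0-9]+_)?[0-9]+_[0-9]+_[0-9]+_[0-9]+' | sort -u
}

# fetch_logs: check the id, then print the aggregated logs of the
# application. Requires the yarn CLI; not called when this file is sourced.
fetch_logs() {
  is_app_id "$1" || { echo "not an application id: $1" >&2; return 1; }
  yarn logs -applicationId "$1"
}

# Example, on a node with the yarn client installed (placeholder id):
#   fetch_logs application_1500000000000_0001 | list_containers
```

Checking the id first just avoids a round trip to the cluster for an
obviously mistyped argument.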

These local folders are created on the local OS filesystem of each node, not
on HDFS. They are transient: once your job finishes, YARN cleans them up.
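
To make the folder convention concrete, here is a rough sketch of how the
working directory of one container is usually laid out under a NodeManager
local directory. The layout usercache/<user>/appcache/<application id>/
<container id> is the default convention as far as I know; the root
/yarn/nm, the user name and the ids in the example are assumed placeholders,
and the real root is whatever yarn.nodemanager.local-dirs is set to on your
cluster.

```shell
#!/bin/sh
# Sketch only: container_dir is a hypothetical helper, not a YARN command.

# container_dir: print the conventional working directory of one container.
#   $1  one entry of yarn.nodemanager.local-dirs (assumed, cluster specific)
#   $2  user who submitted the application
#   $3  application id
#   $4  container id
container_dir() {
  printf '%s/usercache/%s/appcache/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Example with placeholder values:
#   container_dir /yarn/nm mich application_1500000000000_0001 \
#       container_1500000000000_0001_01_000001
```

Because the path contains the name of the submitting user, the folder is
specific to that user, although it is an ordinary directory rather than a
hidden one.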

On Tue, Jul 18, 2017 at 5:46 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> great Ayan.
>
> Is that local folder on HDFS? Will that be a hidden folder specific to the
> user executing the spark job?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 17 July 2017 at 19:34, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> Here is my understanding:
>>
>> 1. For each container, a local folder will be created and the
>> application jar will be copied there.
>> 2. Jars listed in the --jars switch will be copied to the container and
>> added to the class path of the application.
>>
>> So, to your question: the jars passed with --jars do not need to be
>> copied to all nodes at submission time. YARN will take care of it.
>>
>> Best
>> Ayan
>>
>> On Tue, Jul 18, 2017 at 4:10 AM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>>
>>> Yes.
>>>
>>> On Mon, Jul 17, 2017 at 10:47 AM, Mich Talebzadeh
>>> <mich.talebza...@gmail.com> wrote:
>>> > Thanks Marcelo.
>>> >
>>> > Are these files distributed through HDFS?
>>> >
>>> >
>>> > On 17 July 2017 at 18:46, Marcelo Vanzin <van...@cloudera.com> wrote:
>>> >>
>>> >> The YARN backend distributes all files and jars you submit with your
>>> >> application.
>>> >>
>>> >> On Mon, Jul 17, 2017 at 10:45 AM, Mich Talebzadeh
>>> >> <mich.talebza...@gmail.com> wrote:
>>> >> > thanks guys.
>>> >> >
>>> >> > Just to clarify, let us assume I am running spark-submit as below:
>>> >> >
>>> >> > ${SPARK_HOME}/bin/spark-submit \
>>> >> >                 --packages ${PACKAGES} \
>>> >> >                 --driver-memory 2G \
>>> >> >                 --num-executors 2 \
>>> >> >                 --executor-memory 2G \
>>> >> >                 --executor-cores 2 \
>>> >> >                 --master yarn \
>>> >> >                 --deploy-mode client \
>>> >> >                 --conf "${SCHEDULER}" \
>>> >> >                 --conf "${EXTRAJAVAOPTIONS}" \
>>> >> >                 --jars ${JARS} \
>>> >> >                 --class "${FILE_NAME}" \
>>> >> >                 --conf "${SPARKUIPORT}" \
>>> >> >                 --conf "${SPARKDRIVERPORT}" \
>>> >> >                 --conf "${SPARKFILESERVERPORT}" \
>>> >> >                 --conf "${SPARKBLOCKMANAGERPORT}" \
>>> >> >                 --conf "${SPARKKRYOSERIALIZERBUFFERMAX}" \
>>> >> >                 ${JAR_FILE}
>>> >> >
>>> >> > ${JAR_FILE} is the fat jar in question. As I understand it, Spark
>>> >> > should distribute that ${JAR_FILE} to each container?
>>> >> >
>>> >> > Also, is --jars ${JARS} the list of ordinary jar files that need to
>>> >> > exist in the same directory on each executor node?
>>> >> >
>>> >> > cheers,
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On 17 July 2017 at 18:18, ayan guha <guha.a...@gmail.com> wrote:
>>> >> >>
>>> >> >> Hi Mich
>>> >> >>
>>> >> >> Your jar file can be anywhere in the file system, including HDFS.
>>> >> >>
>>> >> >> If using YARN, preferably use cluster mode for deployment.
>>> >> >>
>>> >> >> YARN will distribute the jar to each container.
>>> >> >>
>>> >> >> Best
>>> >> >> Ayan
>>> >> >>
>>> >> >> On Tue, 18 Jul 2017 at 2:17 am, Marcelo Vanzin <van...@cloudera.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> Spark distributes your application jar for you.
>>> >> >>>
>>> >> >>> On Mon, Jul 17, 2017 at 8:41 AM, Mich Talebzadeh
>>> >> >>> <mich.talebza...@gmail.com> wrote:
>>> >> >>> > hi guys,
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > An uber/fat jar file has been created to run with Spark in CDH
>>> >> >>> > YARN client mode.
>>> >> >>> >
>>> >> >>> > As usual, the job is submitted to the edge node.
>>> >> >>> >
>>> >> >>> > Does the jar file have to be placed in the same directory where
>>> >> >>> > Spark is running in the cluster to make it work?
>>> >> >>> >
>>> >> >>> > Also, what will happen if, say, out of 9 nodes running Spark, 3
>>> >> >>> > have not got the jar file? Will the job fail, or will it carry on
>>> >> >>> > on the remaining 6 nodes that have the jar file?
>>> >> >>> >
>>> >> >>> > thanks
>>> >> >>> >
>>> >> >>> >
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Marcelo
>>> >> >>>
>>> >> >>>
>>> >> >> --
>>> >> >> Best Regards,
>>> >> >> Ayan Guha
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Marcelo
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>


-- 
Best Regards,
Ayan Guha
