Hi Michael,

I think that is what I am trying to show here, as the documentation mentions: "NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager."
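
In other words, on YARN the executors do not honour spark.local.dir at all; they write under whatever LOCAL_DIRS the NodeManager hands to each container. A minimal sketch of the interplay (the paths below are placeholders, not recommendations):

    # spark-defaults.conf -- the scratch directories we ask for
    spark.local.dir /data1/spark-tmp,/data2/spark-tmp

    # On YARN this is ignored by the executors: the NodeManager exports
    # LOCAL_DIRS based on yarn.nodemanager.local-dirs, e.g.
    #   LOCAL_DIRS=/data1/yarn/local/usercache/msh/appcache/<application-id>
    # and Spark puts its blockmgr-* and shuffle files there instead.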
So, in a way I am supporting your statement :)

Regards,
Gourav

On Wed, Mar 28, 2018 at 10:00 AM, Michael Shtelma <mshte...@gmail.com> wrote:

> Hi,
>
> this property will be used in YARN mode only by the driver. The executors
> will use the directories coming from YARN for storing temporary files.
>
> Best,
> Michael
>
> On Wed, Mar 28, 2018 at 7:37 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> As per the documentation at
>> https://spark.apache.org/docs/latest/configuration.html:
>>
>> spark.local.dir (default: /tmp) -- Directory to use for "scratch" space
>> in Spark, including map output files and RDDs that get stored on disk.
>> This should be on a fast, local disk in your system. It can also be a
>> comma-separated list of multiple directories on different disks. NOTE:
>> In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS
>> (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by
>> the cluster manager.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Mon, Mar 26, 2018 at 8:28 PM, Michael Shtelma <mshte...@gmail.com> wrote:
>>
>>> Hi Keith,
>>>
>>> Thanks for the suggestion!
>>> I have solved this already. The problem was that the YARN process was
>>> not responding to start/stop commands and had not applied my
>>> configuration changes. I killed it and restarted my cluster, and after
>>> that YARN started using the yarn.nodemanager.local-dirs parameter
>>> defined in yarn-site.xml. After this change, -Djava.io.tmpdir for the
>>> Spark executors was set correctly, according to the
>>> yarn.nodemanager.local-dirs parameter.
>>>
>>> Best,
>>> Michael
>>>
>>> On Mon, Mar 26, 2018 at 9:15 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> Sorry for the late reply. I guess you may have to set it through the
>>>> Hadoop core-site.xml file. The property you need to set is
>>>> "hadoop.tmp.dir", which defaults to "/tmp/hadoop-${user.name}".
>>>>
>>>> Regards,
>>>> Keith.
>>>>
>>>> http://keith-chapman.com
>>>>
>>>> On Mon, Mar 19, 2018 at 1:05 PM, Michael Shtelma <mshte...@gmail.com> wrote:
>>>>
>>>>> Hi Keith,
>>>>>
>>>>> Thank you for the idea! I have tried it; the executor command now
>>>>> looks like this:
>>>>>
>>>>> /bin/bash -c /usr/java/latest//bin/java -server -Xmx51200m
>>>>> '-Djava.io.tmpdir=my_prefered_path'
>>>>> -Djava.io.tmpdir=/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_1521110306769_0041/container_1521110306769_0041_01_000004/tmp
>>>>>
>>>>> The JVM is using the second -Djava.io.tmpdir parameter and writing
>>>>> everything to the same directory as before.
>>>>>
>>>>> Best,
>>>>> Michael
>>>>>
>>>>> On Mon, Mar 19, 2018 at 6:38 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
>>>>>
>>>>>> Can you try setting spark.executor.extraJavaOptions to have
>>>>>> -Djava.io.tmpdir=someValue?
>>>>>>
>>>>>> Regards,
>>>>>> Keith.
>>>>>>
>>>>>> http://keith-chapman.com
>>>>>>
>>>>>> On Mon, Mar 19, 2018 at 10:29 AM, Michael Shtelma <mshte...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Keith,
>>>>>>>
>>>>>>> Thank you for your answer! I have done this, and it is working for
>>>>>>> the Spark driver. I would like to make something like this for the
>>>>>>> executors as well, so that the setting will be used on all the
>>>>>>> nodes where I have executors running.
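>>>>>>> For the driver I am doing roughly the following (the path, class,
>>>>>>> and jar names are placeholders, not my actual command):
>>>>>>>
>>>>>>>   # sets java.io.tmpdir for the driver JVM only
>>>>>>>   spark-submit \
>>>>>>>     --driver-java-options "-Djava.io.tmpdir=/my/preferred/path" \
>>>>>>>     --class com.example.MyJob \
>>>>>>>     my-job.jar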
>>>>>>> Best,
>>>>>>> Michael
>>>>>>>
>>>>>>> On Mon, Mar 19, 2018 at 6:07 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> You could either set spark.local.dir through the Spark conf or the
>>>>>>>> java.io.tmpdir system property.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Keith.
>>>>>>>>
>>>>>>>> http://keith-chapman.com
>>>>>>>>
>>>>>>>> On Mon, Mar 19, 2018 at 9:59 AM, Michael Shtelma <mshte...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi everybody,
>>>>>>>>>
>>>>>>>>> I am running a Spark job on YARN, and my problem is that the
>>>>>>>>> blockmgr-* folders are being created under
>>>>>>>>> /tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_id/*
>>>>>>>>> This folder can grow to a significant size and does not really
>>>>>>>>> fit into the /tmp file system for one job, which is a real
>>>>>>>>> problem for my installation.
>>>>>>>>> I have redefined hadoop.tmp.dir in core-site.xml and
>>>>>>>>> yarn.nodemanager.local-dirs in yarn-site.xml, pointing to another
>>>>>>>>> location, and expected that the block manager would create the
>>>>>>>>> files there and not under /tmp, but this is not the case: the
>>>>>>>>> files are still created under /tmp.
>>>>>>>>>
>>>>>>>>> I am wondering if there is a way to make Spark not use /tmp at
>>>>>>>>> all and configure it to create all the files somewhere else?
>>>>>>>>>
>>>>>>>>> Any assistance would be greatly appreciated!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Michael
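
For reference, the fix described upthread amounts to something like the following (a sketch with placeholder paths; as noted above, the NodeManagers have to be restarted before the change takes effect):

    <!-- yarn-site.xml: where the NodeManager places per-container scratch
         space; Spark's blockmgr-* directories end up under these dirs -->
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/data1/yarn/local,/data2/yarn/local</value>
    </property>

    <!-- core-site.xml: base directory for Hadoop's other temporary files;
         defaults to /tmp/hadoop-${user.name} -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/data1/hadoop-tmp</value>
    </property>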