Hi Michael,

I think that is what I am trying to show here, as the documentation mentions: "NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager."
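
In other words, on YARN the executors do not honour spark.local.dir at all; they write under whatever LOCAL_DIRS the NodeManager hands to each container. A minimal sketch of the interplay (the paths below are placeholders, not recommendations):

    # spark-defaults.conf -- the scratch directories we ask for
    spark.local.dir /data1/spark-tmp,/data2/spark-tmp

    # On YARN this is ignored by the executors: the NodeManager exports
    # LOCAL_DIRS based on yarn.nodemanager.local-dirs, e.g.
    #   LOCAL_DIRS=/data1/yarn/local/usercache/msh/appcache/<application-id>
    # and Spark puts its blockmgr-* and shuffle files there instead.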
So, in a way I am supporting your statement :)

Regards,
Gourav

On Wed, Mar 28, 2018 at 10:00 AM, Michael Shtelma <mshte...@gmail.com> wrote:

> Hi,
>
> this property will be used in YARN mode only by the driver. The executors
> will use the directories coming from YARN for storing temporary files.
>
> Best,
> Michael
>
> On Wed, Mar 28, 2018 at 7:37 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> As per the documentation at
>> https://spark.apache.org/docs/latest/configuration.html:
>>
>> spark.local.dir (default: /tmp) -- Directory to use for "scratch" space
>> in Spark, including map output files and RDDs that get stored on disk.
>> This should be on a fast, local disk in your system. It can also be a
>> comma-separated list of multiple directories on different disks. NOTE:
>> In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS
>> (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by
>> the cluster manager.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Mon, Mar 26, 2018 at 8:28 PM, Michael Shtelma <mshte...@gmail.com> wrote:
>>
>>> Hi Keith,
>>>
>>> Thanks for the suggestion!
>>> I have solved this already. The problem was that the YARN process was
>>> not responding to start/stop commands and had not applied my
>>> configuration changes. I killed it and restarted my cluster, and after
>>> that YARN started using the yarn.nodemanager.local-dirs parameter
>>> defined in yarn-site.xml. After this change, -Djava.io.tmpdir for the
>>> Spark executors was set correctly, according to the
>>> yarn.nodemanager.local-dirs parameter.
>>>
>>> Best,
>>> Michael
>>>
>>> On Mon, Mar 26, 2018 at 9:15 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> Sorry for the late reply. I guess you may have to set it through the
>>>> Hadoop core-site.xml file. The property you need to set is
>>>> "hadoop.tmp.dir", which defaults to "/tmp/hadoop-${user.name}".
>>>>
>>>> Regards,
>>>> Keith.
>>>>
>>>> http://keith-chapman.com
>>>>
>>>> On Mon, Mar 19, 2018 at 1:05 PM, Michael Shtelma <mshte...@gmail.com> wrote:
>>>>
>>>>> Hi Keith,
>>>>>
>>>>> Thank you for the idea! I have tried it; the executor command now
>>>>> looks like this:
>>>>>
>>>>> /bin/bash -c /usr/java/latest//bin/java -server -Xmx51200m
>>>>> '-Djava.io.tmpdir=my_prefered_path'
>>>>> -Djava.io.tmpdir=/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_1521110306769_0041/container_1521110306769_0041_01_000004/tmp
>>>>>
>>>>> The JVM is using the second -Djava.io.tmpdir parameter and writing
>>>>> everything to the same directory as before.
>>>>>
>>>>> Best,
>>>>> Michael
>>>>>
>>>>> On Mon, Mar 19, 2018 at 6:38 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
>>>>>
>>>>>> Can you try setting spark.executor.extraJavaOptions to have
>>>>>> -Djava.io.tmpdir=someValue?
>>>>>>
>>>>>> Regards,
>>>>>> Keith.
>>>>>>
>>>>>> http://keith-chapman.com
>>>>>>
>>>>>> On Mon, Mar 19, 2018 at 10:29 AM, Michael Shtelma <mshte...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Keith,
>>>>>>>
>>>>>>> Thank you for your answer! I have done this, and it is working for
>>>>>>> the Spark driver. I would like to make something like this for the
>>>>>>> executors as well, so that the setting will be used on all the
>>>>>>> nodes where I have executors running.
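>>>>>>> For the driver I am doing roughly the following (the path, class,
>>>>>>> and jar names are placeholders, not my actual command):
>>>>>>>
>>>>>>>   # sets java.io.tmpdir for the driver JVM only
>>>>>>>   spark-submit \
>>>>>>>     --driver-java-options "-Djava.io.tmpdir=/my/preferred/path" \
>>>>>>>     --class com.example.MyJob \
>>>>>>>     my-job.jar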
>>>>>>> Best,
>>>>>>> Michael
>>>>>>>
>>>>>>> On Mon, Mar 19, 2018 at 6:07 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> You could either set spark.local.dir through the Spark conf or the
>>>>>>>> java.io.tmpdir system property.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Keith.
>>>>>>>>
>>>>>>>> http://keith-chapman.com
>>>>>>>>
>>>>>>>> On Mon, Mar 19, 2018 at 9:59 AM, Michael Shtelma <mshte...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi everybody,
>>>>>>>>>
>>>>>>>>> I am running a Spark job on YARN, and my problem is that the
>>>>>>>>> blockmgr-* folders are being created under
>>>>>>>>> /tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_id/*
>>>>>>>>> This folder can grow to a significant size and does not really
>>>>>>>>> fit into the /tmp file system for one job, which is a real
>>>>>>>>> problem for my installation.
>>>>>>>>> I have redefined hadoop.tmp.dir in core-site.xml and
>>>>>>>>> yarn.nodemanager.local-dirs in yarn-site.xml, pointing to another
>>>>>>>>> location, and expected that the block manager would create the
>>>>>>>>> files there and not under /tmp, but this is not the case: the
>>>>>>>>> files are still created under /tmp.
>>>>>>>>>
>>>>>>>>> I am wondering if there is a way to make Spark not use /tmp at
>>>>>>>>> all and configure it to create all the files somewhere else?
>>>>>>>>>
>>>>>>>>> Any assistance would be greatly appreciated!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Michael
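
For reference, the fix described upthread amounts to something like the following (a sketch with placeholder paths; as noted above, the NodeManagers have to be restarted before the change takes effect):

    <!-- yarn-site.xml: where the NodeManager places per-container scratch
         space; Spark's blockmgr-* directories end up under these dirs -->
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/data1/yarn/local,/data2/yarn/local</value>
    </property>

    <!-- core-site.xml: base directory for Hadoop's other temporary files;
         defaults to /tmp/hadoop-${user.name} -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/data1/hadoop-tmp</value>
    </property>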