Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Liu
then point spark.local.dir to the ramdisk, which depends on your deployment strategy; for me it was through the SparkConf object before passing it to SparkContext:
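
The step described above could look roughly like the following in PySpark. This is only a sketch under assumptions: the helper name `spark_conf_for_ramdisk` is hypothetical, `/mnt/spark` is the path quoted elsewhere in this thread, and the tmpfs is assumed to be mounted there already. Depending on the deployment, the cluster manager may override `spark.local.dir` (for example via `SPARK_LOCAL_DIRS` in standalone mode or `LOCAL_DIRS` on YARN), so this may not be enough on its own.

```python
import os


def spark_conf_for_ramdisk(ramdisk="/mnt/spark"):
    """Return a SparkConf whose spark.local.dir points at `ramdisk`.

    `ramdisk` is assumed to be an already mounted tmpfs directory;
    this helper only checks that the directory exists.
    """
    if not os.path.isdir(ramdisk):
        raise FileNotFoundError(
            f"{ramdisk} does not exist; mount the ramdisk first"
        )
    # Imported lazily so the directory check above works without pyspark.
    from pyspark import SparkConf

    return SparkConf().set("spark.local.dir", ramdisk)


# Usage (requires pyspark and an existing /mnt/spark):
#   from pyspark import SparkContext
#   sc = SparkContext(conf=spark_conf_for_ramdisk())
```

The setting has to be in place before the SparkContext is created, which is why the conf object is built first and passed in.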

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Rudenko
conf.set("spark.local.dir","/mnt/spark") To validate that Spark is actually using your ramdisk (by default it uses /tmp), ls the ramdisk after running some jobs and you should see

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Liu
after running some jobs and you should see Spark directories (with date on directory name) on your ramdisk On Wed, 17 Oct
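
The check described above (listing the ramdisk after running some jobs) can also be scripted. A minimal sketch, assuming Spark's scratch directories start with `spark-` or `blockmgr-`; these prefixes are an assumption that varies by Spark version, so adjust them to whatever `ls` actually shows on your system. The function name `find_spark_dirs` is hypothetical.

```python
import os

# Assumed directory-name prefixes; not confirmed by this thread.
SPARK_DIR_PREFIXES = ("spark-", "blockmgr-")


def find_spark_dirs(local_dir, prefixes=SPARK_DIR_PREFIXES):
    """Return the sorted names of subdirectories of `local_dir`
    whose names start with one of `prefixes`."""
    prefixes = tuple(prefixes)  # str.startswith needs a tuple, not a list
    return sorted(
        entry.name
        for entry in os.scandir(local_dir)
        if entry.is_dir() and entry.name.startswith(prefixes)
    )


# Usage, after running some jobs:
#   print(find_spark_dirs("/mnt/spark"))
```

An empty result after a shuffle-heavy job suggests Spark is still writing somewhere else, such as /tmp.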

Re: Spark In Memory Shuffle / 5403

2018-10-19 Thread Peter Rudenko
---- On Wed, 17 Oct 2018 18:57:14 +0330 ☼ R Nair wrote: What are the steps to configure this? Thanks On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <onmstes...@zoho.com.invalid> wrote: Hi, I failed to configure Spark for in-memory shuffle, so currently I am just using a Linux memory-mapped directory (tmpfs) as Spark's working directory, so everything is fast

Re: Spark In Memory Shuffle / 5403

2018-10-18 Thread Peter Liu
ramdisk On Wed, 17 Oct 2018 18:57:14 +0330 ☼ R Nair wrote: What are the steps to configure this? Thanks On Wed, Oct 17, 2018

Re: Spark In Memory Shuffle

2018-10-18 Thread ☼ R Nair
2018, 9:39 AM onmstester onmstester <onmstes...@zoho.com.invalid> wrote: Hi, I failed to configure Spark for in-memory shuffle, so currently I am just using a Linux memory-mapped directory (tmpfs) as Spark's working directory, so everything is fast

Re: Spark In Memory Shuffle

2018-10-18 Thread onmstester onmstester
the steps to configure this? Thanks On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester wrote: Hi, I failed to configure Spark for in-memory shuffle, so currently I am just using a Linux memory-mapped directory (tmpfs) as Spark's working directory, so everything is fast

Re: Spark In Memory Shuffle

2018-10-17 Thread ☼ R Nair
What are the steps to configure this? Thanks On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester wrote: Hi, I failed to configure Spark for in-memory shuffle, so currently I am just using a Linux memory-mapped directory (tmpfs) as Spark's working directory, so everything is fast

Re: Spark In Memory Shuffle

2018-10-17 Thread Gourav Sengupta
Super duper, I also need to try this out. On Wed, Oct 17, 2018 at 2:39 PM onmstester onmstester wrote: Hi, I failed to configure Spark for in-memory shuffle, so currently I am just using a Linux memory-mapped directory (tmpfs) as Spark's working directory, so everything is fast

Re: Spark In Memory Shuffle

2018-10-17 Thread onmstester onmstester
Hi, I failed to configure Spark for in-memory shuffle, so currently I am just using a Linux memory-mapped directory (tmpfs) as Spark's working directory, so everything is fast On Wed, 17 Oct 2018 16:41:32 +0330 thomas lavocat wrote: Hi everyone, The possibility to have

Spark In Memory Shuffle

2018-10-17 Thread thomas lavocat
Hi everyone, The possibility to have in-memory shuffling is discussed in this issue: https://github.com/apache/spark/pull/5403. It was in 2015. In 2016, the paper "Scaling Spark on HPC Systems" says that Spark still shuffles using disks. I would like to know: What is the current state of