Shuffle will always spill the local dataset to disk. Changing memory settings 
does nothing to alter this, so you need to point spark.local.dir at a fast 
disk.
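
A minimal sketch of the relevant settings, assuming a fast local disk mounted 
at /mnt/fast-ssd (the mount path is hypothetical). spark.shuffle.compress and 
spark.shuffle.spill.compress default to true in recent Spark releases; they 
are shown here only because spill compression came up earlier in the thread:

```
# spark-defaults.conf (paths are examples; adjust to your mount points)
spark.local.dir               /mnt/fast-ssd/spark-local
spark.shuffle.compress        true
spark.shuffle.spill.compress  true
```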


> On Apr 6, 2016, at 12:32 PM, Lishu Liu <lishu...@gmail.com> wrote:
> 
> Thanks Michael. I use 5 m3.2xlarge nodes. Should I increase 
> spark.storage.memoryFraction? Also, I'm thinking maybe I should repartition 
> all_pairs so that each partition is small enough to be processed. 
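
A rough back-of-envelope calculation can suggest a partition count for 
all_pairs; the user/product counts and per-pair size below are hypothetical, 
not figures from this thread:

```python
# Sizing sketch for repartitioning all_pairs (hypothetical numbers).
n_users = 1_000_000
n_products = 50_000
bytes_per_pair = 16                          # rough serialized size of one (user, product) pair
target_partition_bytes = 128 * 1024 * 1024   # aim for ~128 MiB per partition

total_bytes = n_users * n_products * bytes_per_pair
num_partitions = -(-total_bytes // target_partition_bytes)  # ceiling division
print(num_partitions)  # -> 5961
```

With an estimate like this you could call all_pairs.repartition(num_partitions) 
before predictAll, so each task's spill stays small.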
> 
> On Tue, Apr 5, 2016 at 8:03 PM, Michael Slavitch <slavi...@gmail.com> wrote:
> Do you have enough disk space for the spill?  It seems it has lots of memory 
> reserved but not enough for the spill. You will need a disk that can handle 
> the entire data partition for each host. Compression of the spilled data 
> saves about 50% in most if not all cases.
> 
> Given the large data set, I would consider a 1TB SATA flash drive, formatted 
> as EXT4 or XFS, with exclusive access as spark.local.dir.  It will 
> slow things down but it won’t stop.  There are alternatives if you want to 
> discuss offline.
> 
> 
> > On Apr 5, 2016, at 6:37 PM, lllll <lishu...@gmail.com> wrote:
> >
> > I have a task to remap the indices in ALS prediction results back to the actual uuids.
> > But it consistently fails due to lost executors. I noticed there is a large
> > shuffle spill to memory, but I don't know how to reduce it.
> >
> > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/24.png>
> >
> > I've tried reducing the number of executors while giving each one more
> > memory.
> > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/31.png>
> >
> > But it still doesn't seem to be enough. I don't know what to do.
> >
> > Below is my code:
> > user = load_user()
> > product = load_product()
> > user.cache()
> > product.cache()
> > model = load_model(model_path)
> > all_pairs = user.map(lambda x: x[1]).cartesian(product.map(lambda x: x[1]))
> > all_prediction = model.predictAll(all_pairs)
> > user_reverse = user.map(lambda r: (r[1], r[0]))
> > product_reverse = product.map(lambda r: (r[1], r[0]))
> > user_reversed = all_prediction.map(lambda u: (u[0], (u[1], u[2]))) \
> >     .join(user_reverse) \
> >     .map(lambda r: (r[1][0][0], (r[1][1], r[1][0][1])))
> > both_reversed = user_reversed.join(product_reverse) \
> >     .map(lambda r: (r[1][0][0], r[1][1], r[1][0][1]))
> > both_reversed.map(lambda x: '{}|{}|{}'.format(x[0], x[1], x[2])) \
> >     .saveAsTextFile(recommendation_path)
> >
> > Both user and products are (uuid, index) tuples.
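
The chain of joins above can be sanity-checked in plain Python, with small 
lists and dicts standing in for the RDDs; the sample uuids and ratings below 
are made up for illustration:

```python
# Plain-Python stand-in for the RDD joins above (hypothetical sample data).
user = [("uuid-u1", 0), ("uuid-u2", 1)]        # (uuid, index) tuples
product = [("uuid-p1", 0), ("uuid-p2", 1)]

# predictAll output shape: (user_index, product_index, rating)
predictions = [(0, 0, 4.5), (1, 1, 3.2)]

# Equivalent of user_reverse / product_reverse: index -> uuid
user_reverse = {idx: uuid for uuid, idx in user}
product_reverse = {idx: uuid for uuid, idx in product}

# Equivalent of the two joins: remap both indices back to uuids
both_reversed = [(user_reverse[u], product_reverse[p], r)
                 for u, p, r in predictions]
print(both_reversed)  # [('uuid-u1', 'uuid-p1', 4.5), ('uuid-u2', 'uuid-p2', 3.2)]
```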
> >
> >
> >
> > --
> > View this message in context: 
> > http://apache-spark-user-list.1001560.n3.nabble.com/lost-executor-due-to-large-shuffle-spill-memory-tp26683.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
> 
> 
