Thanks Michael. I use 5 m3.2xlarge nodes. Should I increase spark.storage.memoryFraction? Also, I'm thinking I should repartition all_pairs so that each partition is small enough to be handled.
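For picking a repartition count, a rough back-of-envelope may help: the cartesian product has |user| × |product| rows, so the number of partitions needed to keep each one under a target row count follows directly. This is just a sketch with hypothetical counts (the real cardinalities aren't in the thread); substitute user.count() and product.count() from the actual RDDs.

```python
import math

def partitions_for_cartesian(n_users, n_products, target_rows_per_partition):
    """Partitions needed so each partition of the user x product
    cartesian RDD holds at most target_rows_per_partition rows."""
    total_pairs = n_users * n_products
    return math.ceil(total_pairs / target_rows_per_partition)

# Hypothetical example: 1M users x 100k products = 100 billion pairs;
# capping partitions at 5M rows each gives 20,000 partitions.
n = partitions_for_cartesian(1_000_000, 100_000, 5_000_000)
# Then, before prediction:
#   all_pairs = all_pairs.repartition(n)
#   all_prediction = model.predictAll(all_pairs)
```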
On Tue, Apr 5, 2016 at 8:03 PM, Michael Slavitch <slavi...@gmail.com> wrote:
> Do you have enough disk space for the spill? It seems it has lots of
> memory reserved but not enough for the spill. You will need a disk that can
> handle the entire data partition for each host. Compression of the spilled
> data saves about 50% in most if not all cases.
>
> Given the large data set I would consider a 1TB SATA flash drive,
> formatted as EXT4 or XFS, and give it exclusive access as spark.local.dir.
> It will slow things down but it won't stop. There are alternatives if you
> want to discuss offline.
>
>
> > On Apr 5, 2016, at 6:37 PM, lllll <lishu...@gmail.com> wrote:
> >
> > I have a task to remap the index to the actual uuid in ALS prediction
> > results. But it consistently fails due to lost executors. I noticed there's
> > large shuffle spill memory but I don't know how to improve it.
> >
> > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/24.png>
> >
> > I've tried to reduce the number of executors while assigning each
> > bigger memory.
> > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/31.png>
> >
> > But it still doesn't seem big enough. I don't know what to do.
> >
> > Below is my code:
> > user = load_user()
> > product = load_product()
> > user.cache()
> > product.cache()
> > model = load_model(model_path)
> > all_pairs = user.map(lambda x: x[1]).cartesian(product.map(lambda x: x[1]))
> > all_prediction = model.predictAll(all_pairs)
> > user_reverse = user.map(lambda r: (r[1], r[0]))
> > product_reverse = product.map(lambda r: (r[1], r[0]))
> > user_reversed = all_prediction.map(lambda u: (u[0], (u[1],
> >     u[2]))).join(user_reverse).map(lambda r: (r[1][0][0], (r[1][1],
> >     r[1][0][1])))
> > both_reversed = user_reversed.join(product_reverse).map(lambda r:
> >     (r[1][0][0], r[1][1], r[1][0][1]))
> > both_reversed.map(lambda x: '{}|{}|{}'.format(x[0], x[1],
> >     x[2])).saveAsTextFile(recommendation_path)
> >
> > Both user and product are (uuid, index) tuples.
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/lost-executor-due-to-large-shuffle-spill-memory-tp26683.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.