Thanks Michael. I use 5 m3.2xlarge nodes. Should I increase spark.storage.memoryFraction? Also, I'm thinking I should repartition all_pairs so that each partition is small enough to be handled.
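For picking a repartition count, a rough back-of-envelope may help: the cartesian product has |user| × |product| rows, so the number of partitions needed to keep each one under a target row count follows directly. This is just a sketch with hypothetical counts (the real cardinalities aren't in the thread); substitute user.count() and product.count() from the actual RDDs.

```python
import math

def partitions_for_cartesian(n_users, n_products, target_rows_per_partition):
    """Partitions needed so each partition of the user x product
    cartesian RDD holds at most target_rows_per_partition rows."""
    total_pairs = n_users * n_products
    return math.ceil(total_pairs / target_rows_per_partition)

# Hypothetical example: 1M users x 100k products = 100 billion pairs;
# capping partitions at 5M rows each gives 20,000 partitions.
n = partitions_for_cartesian(1_000_000, 100_000, 5_000_000)
# Then, before prediction:
#   all_pairs = all_pairs.repartition(n)
#   all_prediction = model.predictAll(all_pairs)
```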
On Tue, Apr 5, 2016 at 8:03 PM, Michael Slavitch <slavi...@gmail.com> wrote:
> Do you have enough disk space for the spill? It seems it has lots of
> memory reserved but not enough for the spill. You will need a disk that can
> handle the entire data partition for each host. Compression of the spilled
> data saves about 50% in most if not all cases.
>
> Given the large data set I would consider a 1TB SATA flash drive,
> formatted as EXT4 or XFS, and give it exclusive access as spark.local.dir.
> It will slow things down but it won't stop. There are alternatives if you
> want to discuss offline.
>
>
> > On Apr 5, 2016, at 6:37 PM, lllll <lishu...@gmail.com> wrote:
> >
> > I have a task to remap the index to the actual uuid in ALS prediction
> > results. But it consistently fails due to lost executors. I noticed there's
> > large shuffle spill memory but I don't know how to improve it.
> >
> > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/24.png>
> >
> > I've tried to reduce the number of executors while assigning each
> > bigger memory.
> > <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26683/31.png>
> >
> > But it still doesn't seem big enough. I don't know what to do.
> >
> > Below is my code:
> > user = load_user()
> > product = load_product()
> > user.cache()
> > product.cache()
> > model = load_model(model_path)
> > all_pairs = user.map(lambda x: x[1]).cartesian(product.map(lambda x: x[1]))
> > all_prediction = model.predictAll(all_pairs)
> > user_reverse = user.map(lambda r: (r[1], r[0]))
> > product_reverse = product.map(lambda r: (r[1], r[0]))
> > user_reversed = all_prediction.map(lambda u: (u[0], (u[1],
> >     u[2]))).join(user_reverse).map(lambda r: (r[1][0][0], (r[1][1],
> >     r[1][0][1])))
> > both_reversed = user_reversed.join(product_reverse).map(lambda r:
> >     (r[1][0][0], r[1][1], r[1][0][1]))
> > both_reversed.map(lambda x: '{}|{}|{}'.format(x[0], x[1],
> >     x[2])).saveAsTextFile(recommendation_path)
> >
> > Both user and product are (uuid, index) tuples.
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/lost-executor-due-to-large-shuffle-spill-memory-tp26683.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.