Sometimes the shuffle write of the flatMap stage is 14.8 GB, and sometimes it is 674.9 MB. Why does this happen? The training data is about 1.5 GB, and the number of features is 200.
Stage Id  Description                              Submitted            Duration  Tasks: Succeeded/Total  Shuffle Read  Shuffle Write
114       flatMap at ALS.scala:434                 2014/06/25 17:13:39  6.3 min   48/48                   611.7 MB      14.8 GB
115       groupByKey at ALS.scala:442              2014/06/25 17:13:34  4 s       48/48                   337.5 MB      1275.9 MB
116       flatMap at ALS.scala:434                 2014/06/25 17:09:02  4.5 min   48/48                   12.2 GB       674.9 MB
117       groupByKey at ALS.scala:442              2014/06/25 17:07:05  2.0 min   48/48                   7.4 GB        25.5 GB
118       flatMap at ALS.scala:434                 2014/06/25 17:00:41  6.4 min   48/48                   664.2 MB      14.8 GB
119       groupByKey at ALS.scala:442              2014/06/25 17:00:30  10 s      48/48                   337.4 MB      1275.9 MB
120       flatMap at ALS.scala:434                 2014/06/25 16:55:19  5.2 min   48/48                   12.2 GB       674.9 MB
121       groupByKey at ALS.scala:442              2014/06/25 16:54:02  1.3 min   48/48                   7.4 GB        25.5 GB
122       flatMap at ALS.scala:434                 2014/06/25 16:53:52  9 s       48/48                                 14.8 GB *
123       mapPartitionsWithIndex at ALS.scala:200  2014/06/25 16:53:40  12 s      48/48                   399.5 MB      737.4 MB
6         map at ALS.scala:183                     2014/06/25 16:53:01  39 s      20/20                                 799.4 MB *
3         map at ALS.scala:186                     2014/06/25 16:53:01  39 s      20/20                                 652.2 MB *

* Only one shuffle size was listed for this stage; it is shown here under Shuffle Write.

Stage links: 123 <http://10.71.123.101:4040/stages/stage?id=123>, 6 <http://10.71.123.101:4040/stages/stage?id=6>, 3 <http://10.71.123.101:4040/stages/stage?id=3>
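
To make the pattern in the stage listing easier to see, here is a minimal Python sketch that converts the reported sizes to MB and lists the shuffle write of each operation in submission order. The figures are copied from stages 114 to 121 above. The helper names (`to_mb`, `write_sizes`) and the 1024 MB per GB conversion are assumptions made for illustration, and the script does not touch Spark at all.

```python
# Shuffle sizes copied from stages 114-121 of the listing above:
# (stage id, operation, shuffle read, shuffle write)
STAGES = [
    (114, "flatMap", "611.7 MB", "14.8 GB"),
    (115, "groupByKey", "337.5 MB", "1275.9 MB"),
    (116, "flatMap", "12.2 GB", "674.9 MB"),
    (117, "groupByKey", "7.4 GB", "25.5 GB"),
    (118, "flatMap", "664.2 MB", "14.8 GB"),
    (119, "groupByKey", "337.4 MB", "1275.9 MB"),
    (120, "flatMap", "12.2 GB", "674.9 MB"),
    (121, "groupByKey", "7.4 GB", "25.5 GB"),
]


def to_mb(size):
    """Convert a string such as '14.8 GB' to MB (assumes 1 GB = 1024 MB)."""
    value, unit = size.split()
    factor = {"MB": 1.0, "GB": 1024.0}[unit]
    return float(value) * factor


def write_sizes(operation):
    """Return the shuffle-write sizes in MB for one operation, oldest stage first."""
    rows = [row for row in STAGES if row[1] == operation]
    # In this listing the higher stage ids carry the earlier submission times.
    rows.sort(key=lambda row: -row[0])
    return [to_mb(row[3]) for row in rows]


if __name__ == "__main__":
    for operation in ("flatMap", "groupByKey"):
        sizes = ", ".join(f"{mb:.1f} MB" for mb in write_sizes(operation))
        print(f"{operation}: {sizes}")
```

Listed this way, the flatMap shuffle write takes only two values, and they alternate from one flatMap stage to the next rather than varying at random. The groupByKey stages show the same alternation.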