How many users and items do you have?

Each iteration makes one pass over the users and one pass over the items, so
each ALS iteration actually runs two flatMap operations. I'd guess that you
have many more users than items (or vice versa), which is why one of the two
operations shuffles much more data than the other.
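To make the asymmetry concrete, here is a rough back-of-envelope sketch in Python. It rests on my assumptions, not on a description of the exact MLlib code: each factor vector is shipped once to every block on the other side that holds a rating for it, blocks are assigned by `id % num_blocks`, and each vector costs `rank * 8` bytes with serialization overhead and compression ignored. The function name `estimate_shuffle_bytes` is hypothetical.

```python
from collections import defaultdict

# Assumption: dense double-precision factors; serialization overhead and compression ignored.
BYTES_PER_DOUBLE = 8


def estimate_shuffle_bytes(ratings, num_blocks, rank, ship="user"):
    """Hypothetical estimate of the shuffle written when one side's factors are shipped.

    ratings    -- iterable of (user_id, item_id) pairs with non-negative int ids
    num_blocks -- number of blocks on the receiving side (assumed block = id % num_blocks)
    rank       -- number of latent features per factor vector
    ship       -- "user" ships user factors to item blocks, "item" the reverse
    """
    destinations = defaultdict(set)  # sender id -> set of receiving block ids
    for user, item in ratings:
        sender, receiver = (user, item) if ship == "user" else (item, user)
        destinations[sender].add(receiver % num_blocks)
    # Each factor vector is sent once to every distinct block that needs it.
    pairs = sum(len(blocks) for blocks in destinations.values())
    return pairs * rank * BYTES_PER_DOUBLE


# Toy data: 100 users, 4 items, every user rated every item.
ratings = [(u, i) for u in range(100) for i in range(4)]
print(estimate_shuffle_bytes(ratings, num_blocks=4, rank=200, ship="user"))  # 640000
print(estimate_shuffle_bytes(ratings, num_blocks=4, rank=200, ship="item"))  # 25600
```

With 100 users and 4 items, shipping user factors writes 25 times as much as shipping item factors, even though both passes cover the same ratings. Under these assumptions, the same kind of imbalance would show up as flatMap shuffle sizes that alternate between iterations' two halves.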


On Wed, Jun 25, 2014 at 11:39 AM, Lizhengbing (bing, BIPA) <zhengbing...@huawei.com> wrote:

>
> Sometimes the shuffle write of flatMap is 14.8 GB, and sometimes it is 674.9 MB.
> Why does this happen?
>
> The training data is about 1.5 GB, and the number of features (the rank) is 200.
>
> Stage Id  Description                              Submitted            Duration  Tasks: Succeeded/Total  Shuffle Read  Shuffle Write
> --------  ---------------------------------------  -------------------  --------  ----------------------  ------------  -------------
> 114       flatMap at ALS.scala:434                 2014/06/25 17:13:39  6.3 min   48/48                   611.7 MB      14.8 GB
> 115       groupByKey at ALS.scala:442              2014/06/25 17:13:34  4 s       48/48                   337.5 MB      1275.9 MB
> 116       flatMap at ALS.scala:434                 2014/06/25 17:09:02  4.5 min   48/48                   12.2 GB       674.9 MB
> 117       groupByKey at ALS.scala:442              2014/06/25 17:07:05  2.0 min   48/48                   7.4 GB        25.5 GB
> 118       flatMap at ALS.scala:434                 2014/06/25 17:00:41  6.4 min   48/48                   664.2 MB      14.8 GB
> 119       groupByKey at ALS.scala:442              2014/06/25 17:00:30  10 s      48/48                   337.4 MB      1275.9 MB
> 120       flatMap at ALS.scala:434                 2014/06/25 16:55:19  5.2 min   48/48                   12.2 GB       674.9 MB
> 121       groupByKey at ALS.scala:442              2014/06/25 16:54:02  1.3 min   48/48                   7.4 GB        25.5 GB
> 122       flatMap at ALS.scala:434                 2014/06/25 16:53:52  9 s       48/48                   -             14.8 GB
> 123       mapPartitionsWithIndex at ALS.scala:200  2014/06/25 16:53:40  12 s      48/48                   399.5 MB      737.4 MB
> 6         map at ALS.scala:183                     2014/06/25 16:53:01  39 s      20/20                   -             799.4 MB
> 3         map at ALS.scala:186                     2014/06/25 16:53:01  39 s      20/20                   -             652.2 MB
>
