Hi,
This has already been briefly discussed here in the past, but there seem to be
more open questions...
I am running a large ALS job with ~40GB of input data (~3 billion ratings).
The data is partitioned into 512 partitions and default parallelism is also
set to 512. ALS runs with rank=100 and 15 iterations, on Spark 1.2.0.
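For reference, the setup looks roughly like this (the path, the parsing, and
the lambda value are illustrative, not my exact code):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    val conf = new SparkConf()
      .setAppName("als-job")
      .set("spark.default.parallelism", "512")
    val sc = new SparkContext(conf)

    // ~40GB of input, ~3 billion ratings; file layout is illustrative
    val ratings = sc.textFile("hdfs:///data/ratings")
      .map { line =>
        val Array(user, item, score) = line.split(',')
        Rating(user.toInt, item.toInt, score.toDouble)
      }
      .repartition(512)                           // 512 partitions

    // rank=100, 15 iterations; lambda=0.01 is just a placeholder here
    val model = ALS.train(ratings, 100, 15, 0.01)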
The issue is the volume of temporary data written to disk during processing.
You can see the effect here: http://picpaste.com/disk-UKGFOlte.png
It accumulates 12TB(!) of data until it reaches the 90% disk-usage threshold,
at which point YARN kills the job.
I have the checkpoint directory set, so the temporary data should supposedly
be cleaned up, but I am not sure that is happening (although one drop is
visible in the graph).
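For clarity, the checkpoint directory is set along these lines (the path is
illustrative):

    // As I understand it, ALS checkpoints its intermediate RDDs every few
    // iterations once this is set, which should allow the accumulated
    // shuffle files to be cleaned up.
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")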
Is there any solution for this? 12TB of temporary data never being cleaned up
seems wrong.
Thanks,
Antony.
