Using MEMORY_AND_DISK_SER to persist the input RDD[Rating] seems to work right for me now. I'm testing on a larger dataset and will see how it goes.
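
For reference, a minimal sketch of that change. The input path, the "user,product,rating" file layout, and the ALS parameters below are assumptions for illustration only, not the settings from the actual job:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistRatingsSketch {
  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("PersistRatingsSketch"))

    // Assumed input: one "user,product,rating" triple per line; path passed as args(0).
    val ratings: RDD[Rating] = sc.textFile(args(0)).map { line =>
      val Array(user, product, rating) = line.split(",")
      Rating(user.toInt, product.toInt, rating.toDouble)
    }

    // Keep serialized partitions in memory and spill the ones that do not fit
    // to disk, so they can be read back instead of being recomputed.
    ratings.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // Placeholder hyperparameters: rank = 10, iterations = 10, lambda = 0.01.
    val model = ALS.train(ratings, 10, 10, 0.01)
    println(s"Trained a model of rank ${model.rank}")

    sc.stop()
  }
}
```

With a memory-only storage level, partitions that do not fit are dropped and recomputed when needed again, which would match the recalculation Daniel describes below.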
On Wed, Jun 11, 2014 at 9:56 AM, Neville Li <[email protected]> wrote:

> Does cache eviction affect the disk storage level too? I tried cranking up
> replication but am still seeing this.
>
>
> On Wednesday, June 11, 2014, Shuo Xiang <[email protected]> wrote:
>
>> Daniel,
>> Thanks for the explanation.
>>
>>
>> On Wed, Jun 11, 2014 at 8:57 AM, Daniel Darabos <[email protected]> wrote:
>>
>>> About more succeeded tasks than total tasks:
>>> - This can happen if you have enabled speculative execution. Some
>>> partitions can get processed multiple times.
>>> - More commonly, the result of the stage may be used in a later
>>> calculation, and has to be recalculated. This happens if some of the
>>> results were evicted from cache.
>>>
>>>
>>> On Wed, Jun 11, 2014 at 2:23 AM, Shuo Xiang <[email protected]> wrote:
>>>
>>>> Hi,
>>>> I am confused by some of the information shown in the SparkUI. The
>>>> following was observed while factorizing a large matrix using ALS:
>>>> 1. Some stages have more succeeded tasks than total tasks, which are
>>>> displayed in the 5th column.
>>>> 2. There are duplicate stages with exactly the same stageID (stage 1/3/7).
>>>> 3. Clicking into some stages, some executors cannot be addressed.
>>>> Does that mean an executor was lost, or does this not matter?
>>>>
>>>> Any explanation is appreciated!
