Persisting the input RDD[Rating] with MEMORY_AND_DISK_SER seems to work
for me now. I'm testing on a larger dataset and will see how it goes.
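
A minimal sketch of this kind of setup, assuming the Spark 1.0 MLlib API.
The object name, the input path argument, the comma-separated input format,
and the ALS parameters (rank 10, 10 iterations, lambda 0.01) are all
placeholders for illustration, not the actual job from this thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.storage.StorageLevel

// Hypothetical driver program; the master URL is expected to come from spark-submit.
object PersistRatingsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PersistRatingsSketch"))

    // Assumed input: one "user,product,rating" triple per line.
    val ratings = sc.textFile(args(0)).map { line =>
      val Array(user, product, rating) = line.split(',')
      Rating(user.toInt, product.toInt, rating.toDouble)
    }

    // Store partitions serialized in memory; partitions that do not fit
    // are spilled to disk rather than dropped and recomputed later.
    ratings.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // Placeholder hyperparameters: rank, iterations, lambda.
    val model = ALS.train(ratings, 10, 10, 0.01)
    println(s"Rank of trained model: ${model.rank}")

    sc.stop()
  }
}
```

With the default persist() (MEMORY_ONLY), partitions evicted from memory
have to be recomputed from their lineage when they are needed again, which
would be consistent with the re-run tasks discussed below.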


On Wed, Jun 11, 2014 at 9:56 AM, Neville Li <[email protected]> wrote:

> Does cache eviction affect the disk storage levels too? I tried cranking up
> replication but am still seeing this.
>
>
> On Wednesday, June 11, 2014, Shuo Xiang <[email protected]> wrote:
>
>> Daniel,
>>   Thanks for the explanation.
>>
>>
>> On Wed, Jun 11, 2014 at 8:57 AM, Daniel Darabos <
>> [email protected]> wrote:
>>
>>> About more succeeded tasks than total tasks:
>>>  - This can happen if you have enabled speculative execution. Some
>>> partitions can get processed multiple times.
>>>  - More commonly, the result of the stage may be used in a later
>>> calculation, and has to be recalculated. This happens if some of the
>>> results were evicted from cache.
>>>
>>>
>>> On Wed, Jun 11, 2014 at 2:23 AM, Shuo Xiang <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>   I'm confused by some of the information in the Spark UI. The
>>>> following was gathered while factorizing a large matrix using ALS:
>>>>   1. Some stages have more succeeded tasks than total tasks, as
>>>> displayed in the 5th column.
>>>>   2. There are duplicate stages with exactly the same stage ID
>>>> (stages 1/3/7).
>>>>   3. When clicking into some stages, some executors cannot be addressed.
>>>> Does that mean the executor was lost, or does this not matter?
>>>>
>>>>   Any explanation is appreciated!
>>>>
>>>>
>>>>
>>>
>>
