sorry i thought i gave an explanation
The issue you are encountering with incorrect record numbers in the
"ShuffleWrite Size/Records" column in the Spark DAG UI when data is read
from cache/persist is a known limitation. This discrepancy arises due to
the way Spark handles and reports shuffle
Just to further clarify that the Shuffle Write Size/Records column in
the Spark UI can be misleading when working with cached/persisted data
because it reflects the shuffled data size and record count, not the
entire cached/persisted data., So it is fair to say that this is a
limitation of the
Yep, the Spark UI's Shuffle Write Size/Records" column can sometimes show
incorrect record counts *when data is retrieved from cache or persisted
data*. This happens because the record count reflects the number of records
written to disk for shuffling, and not the actual number of records in the
Can anyone please assist me ?
On Fri, May 24, 2024 at 12:29 AM Prem Sahoo wrote:
> Does anyone have a clue ?
>
> On Thu, May 23, 2024 at 11:40 AM Prem Sahoo wrote:
>
>> Hello Team,
>> in spark DAG UI , we have Stages tab. Once you click on each stage you
>> can view the tasks.
>>
>> In each