>>> read of 50 GB. I don't understand this behaviour, and I think the
>>> performance is getting slow with so much shuffle read on the next
>>> transformation operations.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-shuffle-work-in-spark-tp584p25119.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To: user
Subject: Re: How does shuffle work in Spark?
Hi, thanks.
I don't understand: if the original data on the partitions is 3.5 GB, how does shuffling it expand to 50 GB, and why is that 50 GB then read for the next operations? I have an original data set of 100 GB, so will my data explode in the same way?
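
For context, here is a minimal sketch of the kind of job I mean (the code and the input path are hypothetical, not my actual job). A wide transformation such as groupByKey ships every record across the network, so the deserialized shuffle data can be much larger than the compact on-disk input, while reduceByKey combines values map-side first:

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ShuffleSketch"))

    // Hypothetical input, standing in for the 3.5 GB partitioned data set.
    val pairs = sc.textFile("hdfs:///data/events")
      .map(line => (line.split(",")(0), 1L))

    // groupByKey shuffles every individual (key, value) record; the
    // deserialized, uncompressed shuffle data can be far larger than the
    // compact on-disk input.
    pairs.groupByKey().mapValues(_.sum).count()

    // reduceByKey computes partial sums before the shuffle, so much less
    // data crosses the network and the next stage's shuffle read shrinks.
    pairs.reduceByKey(_ + _).count()

    sc.stop()
  }
}

Comparing the Shuffle Write and Shuffle Read columns for the two stages in the Spark UI makes the difference between the two versions visible.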