Thanks for your comment.
Which image or chart are you pointing to?
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Cache-Shuffle-Based-Operation-Before-Sort-tp17331p17438.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Caching the shuffled RDD before the sort process improves system performance. The SQL
planner could be made smart enough to cache the joined, aggregated, or sorted DataFrame
before executing the next sort.
For any sort, Spark creates two jobs: the first is responsible
for producing the range boundaries for the shuffl
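To make the two-job argument concrete, here is a toy model in plain Python, not Spark code. `ToyDataset`, `range_boundaries`, `range_sort` and `expensive_upstream` are hypothetical names for illustration only, and the sampling is a simplified stand-in for what Spark's range partitioning does. Job 1 samples the input to pick range boundaries; job 2 reads the input again to route each key to its range and sort it. Without caching, the upstream computation (standing in for the join/aggregate feeding the sort) runs once per job; with caching it runs once. In Spark itself the analogous step would be calling `.cache()` or `.persist()` on the RDD/DataFrame before sorting.

```python
import bisect
import random


class ToyDataset:
    """Hypothetical stand-in for an RDD: re-evaluates its lineage on every read unless cached."""

    def __init__(self, compute):
        self._compute = compute   # function producing the records (the "lineage")
        self._cached = None
        self.compute_count = 0    # how many times the lineage was evaluated

    def cache(self):
        self._cached = self._materialize()
        return self

    def _materialize(self):
        self.compute_count += 1
        return list(self._compute())

    def records(self):
        return self._cached if self._cached is not None else self._materialize()


def range_boundaries(ds, num_partitions, sample_size, seed=0):
    """Job 1: sample the keys and choose num_partitions - 1 split points."""
    data = ds.records()
    if not data:
        return []
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def range_sort(ds, num_partitions, sample_size=100):
    """Job 2: route each key to its range partition, then sort each partition locally."""
    bounds = range_boundaries(ds, num_partitions, sample_size)
    partitions = [[] for _ in range(num_partitions)]
    for key in ds.records():
        partitions[bisect.bisect_left(bounds, key)].append(key)
    return [sorted(p) for p in partitions]


def expensive_upstream():
    # Stands in for the shuffle/join/aggregate whose output feeds the sort.
    return (x * 7919 % 1000 for x in range(1000))


uncached = ToyDataset(expensive_upstream)
range_sort(uncached, num_partitions=4)
print(uncached.compute_count)  # 2: once for sampling, once for the sort itself

cached = ToyDataset(expensive_upstream).cache()
range_sort(cached, num_partitions=4)
print(cached.compute_count)    # 1: both jobs reuse the cached records
```

Because the partitions cover ordered, non-overlapping key ranges, concatenating the locally sorted partitions yields a globally sorted result; the only cost caching removes is the second evaluation of the upstream lineage.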
Hi,
My point for #2 is to distinguish how long it takes each
task to read data from disk versus how long it takes to transfer that data over the network to the target
node. As far as I know (correct me if I'm wrong), the block time to fetch data includes
both the remote node reading the data and transferring it to the requesting nod
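A toy sketch of the distinction being asked about: a single timer around the whole fetch (like the block time) mixes disk read and network transfer, whereas timing the two phases separately splits them. This is not how Spark instruments shuffle fetches; `timed_fetch` is a hypothetical helper, a local file stands in for the remote node's shuffle block, and a local socket pair stands in for the network.

```python
import os
import socket
import tempfile
import threading
import time


def timed_fetch(path):
    """Hypothetical helper: read a block from disk, then send it over a local socket pair.
    Returns (read_seconds, transfer_seconds, total_seconds, payload)."""
    server, client = socket.socketpair()
    try:
        t0 = time.perf_counter()
        with open(path, "rb") as f:      # phase 1: disk read on the "remote" node
            block = f.read()
        t1 = time.perf_counter()

        received = bytearray()

        def receive():
            # The "requesting node" drains the socket until the whole block arrives.
            while len(received) < len(block):
                chunk = client.recv(65536)
                if not chunk:
                    break
                received.extend(chunk)

        reader = threading.Thread(target=receive)
        reader.start()
        server.sendall(block)            # phase 2: transfer to the requesting node
        reader.join()
        t2 = time.perf_counter()
    finally:
        server.close()
        client.close()
    return t1 - t0, t2 - t1, t2 - t0, bytes(received)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1 << 20))       # a 1 MiB stand-in for one shuffle block
read_s, transfer_s, total_s, data = timed_fetch(tmp.name)
os.remove(tmp.name)
print(f"read={read_s:.4f}s transfer={transfer_s:.4f}s total={total_s:.4f}s")
```

Only `total_s` corresponds to a single combined fetch timer; the per-phase numbers are what a separate read-versus-transfer breakdown would need.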