Re: Cache Shuffle Based Operation Before Sort

2016-05-08 Thread Ali Tootoonchian
Thanks for your comment. Which image or chart are you referring to?

Cache Shuffle Based Operation Before Sort

2016-04-25 Thread Ali Tootoonchian
Caching the shuffle RDD before the sort improves system performance. The SQL planner could be smart enough to cache a join, aggregate, or sort DataFrame before executing the next sort. For any sort, Spark creates two jobs: the first one is responsible for producing the range boundaries for shuffl
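
To illustrate the two-pass behaviour described in this message, here is a minimal Python sketch. It is a simulation, not Spark code: `CountingSource`, `CachedSource`, and `range_partition_sort` are hypothetical names made up for illustration, and the sampling scheme is a simplification rather than Spark's actual range partitioner. It only shows that a range-partitioned sort reads its input twice (once to sample boundaries, once to route and sort), so an uncached upstream stage gets computed twice.

```python
import bisect
import random


class CountingSource:
    """Stand-in for an expensive upstream stage (e.g. a join or aggregate)."""

    def __init__(self, data):
        self._data = list(data)
        self.evaluations = 0  # how many times the upstream stage was recomputed

    def compute(self):
        self.evaluations += 1
        return list(self._data)


class CachedSource:
    """Wraps a source and keeps its output after the first computation."""

    def __init__(self, source):
        self._source = source
        self._cached = None

    def compute(self):
        if self._cached is None:
            self._cached = self._source.compute()
        return self._cached


def range_partition_sort(source, num_partitions, sample_size=20, seed=0):
    """Sort by sampling range boundaries first, then partitioning and sorting."""
    # Pass 1: read the input and sample it to choose range boundaries.
    rows = source.compute()
    rng = random.Random(seed)
    sample = sorted(rng.sample(rows, min(sample_size, len(rows))))
    boundaries = []
    if sample:
        step = len(sample) / num_partitions
        boundaries = [sample[int(step * i)] for i in range(1, num_partitions)]

    # Pass 2: read the input again, route each row to its range, sort locally.
    partitions = [[] for _ in range(num_partitions)]
    for row in source.compute():
        partitions[bisect.bisect_right(boundaries, row)].append(row)

    # Ranges are ordered, so concatenating sorted partitions is globally sorted.
    return [value for part in partitions for value in sorted(part)]
```

Sorting a plain `CountingSource` leaves its `evaluations` counter at 2, while sorting the same source wrapped in `CachedSource` leaves it at 1. In Spark itself the analogous step would be calling `cache()` or `persist()` on the RDD or DataFrame before the sort; whether the planner should do this automatically is the open question raised in the message.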

Re: Improving system design logging in spark

2016-04-21 Thread Ali Tootoonchian
Hi, My point for #2 is to distinguish between how long it takes each task to read data from disk and how long it takes to transfer it over the network to the target node. As far as I know (correct me if I'm wrong), the block fetch time includes both reading the data on the remote node and transferring it to the requesting nod