Hi, I've developed a POC Spark Streaming application, but it seems to perform better on my development machine than on our cluster. I submit it to YARN on our Cloudera cluster.
My first question is more specific, though. In the application UI (:4040), the streaming section shows that batch processing took 6 s. When I then look at the stages, I indeed see a stage with a duration of 5 s, for example:

1678 map at LogonAnalysis.scala:215 +details 2015/07/09 09:17:00 5 s 50/50 173.5 KB

But when I drill into the details of stage 1678, it tells me the task duration was 14 ms, and the aggregated metrics by executor report 1.0 s as the task time. What is responsible for the gap between 14 ms, 1 s, and 5 s?

Details for Stage 1678

* Total task time across all tasks: 0.8 s
* Shuffle write: 173.5 KB / 2031

Summary Metrics for 50 Completed Tasks

| Metric | Min | 25th percentile | Median | 75th percentile | Max |
|---|---|---|---|---|---|
| Duration | 14 ms | 14 ms | 15 ms | 15 ms | 24 ms |
| GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms |
| Shuffle Write Size / Records | 2.6 KB / 28 | 3.1 KB / 35 | 3.5 KB / 42 | 3.9 KB / 46 | 4.4 KB / 53 |

Aggregated Metrics by Executor

| Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records |
|---|---|---|---|---|---|---|
| 2 | xxxx:44231 | 1.0 s | 50 | 0 | 50 | 173.5 KB / 2031 |
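For what it's worth, I did a back-of-envelope check on these numbers (my own arithmetic, not something the UI reports): the 50 individual task durations do roughly add up to the reported total task time, so the missing ~4 s of the 5 s stage duration seems to be spent outside task execution entirely.

```python
# Rough reconciliation of the stage metrics above
# (assumption: the median task duration is representative of the mean).
tasks = 50
median_task_ms = 15                      # median Duration from the summary metrics
total_task_ms = tasks * median_task_ms   # 750 ms, close to the reported 0.8 s
                                         # total task time and the 1.0 s executor
                                         # task time (which adds per-task overhead)
print(total_task_ms)                     # 750
```

So the per-task work accounts for under 1 s; the remaining ~4 s must come from something else (scheduling, serialization, waiting, ...?), which is exactly what I'd like to understand.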