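Data skew? Maybe your partition key has some special value, such as null or an empty string, that occurs far more often than the others. All records with the same key go to the same partition, so adding more partitions would not spread them out.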
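A quick way to see why the partition count would not matter is a small sketch like the one below. It is plain Python, not Spark code: `stable_hash`, `partition_sizes`, `hottest_keys`, and the sample keys are all made up for illustration, and the CRC32-based hash is only a stand-in for a real partitioner.

```python
import zlib
from collections import Counter


def stable_hash(key):
    """Deterministic stand-in for a partitioner's hash function (illustrative only)."""
    return zlib.crc32(repr(key).encode("utf-8"))


def partition_sizes(keys, num_partitions):
    """Count how many records each partition receives under hash partitioning."""
    sizes = Counter()
    for key in keys:
        sizes[stable_hash(key) % num_partitions] += 1
    return sizes


def hottest_keys(keys, top=2):
    """Return the most frequent keys with their record counts."""
    return Counter(keys).most_common(top)


# Hypothetical data: 1,000 distinct ids plus many records with a null or empty key.
keys = [f"user{i}" for i in range(1000)] + [None] * 5000 + [""] * 3000

for n in (8, 64, 512):
    sizes = partition_sizes(keys, n)
    # The largest partition never drops below the hottest key's record count.
    print(n, "partitions -> largest partition holds", max(sizes.values()), "records")

print("hottest keys:", hottest_keys(keys))
```

On the real job, the same check would amount to counting records per key and looking at the top few, for example with something like `rdd.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)` in PySpark, assuming the data is a pair RDD. If a null or empty key dominates, filtering it out or handling it separately before the shuffle is the usual first thing to try.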
On Fri, Aug 14, 2015 at 11:01 AM, randylu <randyl...@gmail.com> wrote:

> It is strange that there are always two tasks slower than the others, and
> the corresponding partitions hold more data, no matter how many partitions
> I use.
>
> Executor ID  Address                  Task Time     Shuffle Read Size / Records
> 1            slave129.vsvs.com:56691  16 s       1  99.5 MB / 18865432
> *10          slave317.vsvs.com:59281  0 ms       0  413.5 MB / 311001318*
> 100          slave290.vsvs.com:60241  19 s       1  110.8 MB / 27075926
> 101          slave323.vsvs.com:36246  14 s       1  126.1 MB / 25052808
>
> The task time and record count of Executor 10 seem strange, and the CPUs on
> that node are all 100% busy.
>
> Has anyone met the same problem? Thanks in advance for any answer!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Always-two-tasks-slower-than-others-and-then-job-fails-tp24257.html

--
Best Regards

Jeff Zhang