It sounds like you have huge data skew?

On Thu, Jul 9, 2020 at 4:15 PM Bobby Evans <bo...@apache.org> wrote:
>
> Sadly there isn't a lot you can do to fix this. All of the operations take
> iterators of rows as input and produce iterators of rows as output. For
> efficiency reasons, the timing is not done for each individual row. If we did
> that, in many cases it would take longer to measure how long something took
> than it would to just do the operation. So most operators actually end up
> measuring the lifetime of the operator, which often is the time of the entire
> task minus how long it took for the first task to get to that operator. This
> is also true of WholeStageCodeGen.
>
> On Thu, Jul 9, 2020 at 11:55 AM Michal Sankot
> <michal.san...@spreaker.com.invalid> wrote:
>>
>> Hi,
>> I'm checking the execution of SQL queries in the Spark UI, trying to find a
>> bottleneck, and the values displayed in WholeStageCodegen blocks are
>> confusing.
>>
>> In the attached example the whole query took 6.6 minutes, yet the upper left
>> WholeStageCodegen block says that the median value is 7.8 minutes and the
>> maximum 7.27h :O
>>
>> What does it mean? Do those numbers have any real meaning? Is there a way
>> to find out how long individual blocks really took?
>>
>> Thanks,
>> Michal
>> Spark 2.4.4 on AWS EMR
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
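
A minimal sketch in Scala of the trade-off Bobby describes; this is an illustration, not Spark's actual instrumentation, and the names (timedPerRow, timedLifetime, addNanos, record) are made up for the example:

    object TimingSketch {

      // Per-row timing: two System.nanoTime() calls for every single row.
      // For cheap operators, reading the clock can cost more than the
      // operation being measured, which is why this approach is avoided.
      def timedPerRow[T](iter: Iterator[T], addNanos: Long => Unit): Iterator[T] =
        new Iterator[T] {
          override def hasNext: Boolean = iter.hasNext
          override def next(): T = {
            val start = System.nanoTime()
            val row = iter.next()                  // the operator's real work
            addNanos(System.nanoTime() - start)
            row
          }
        }

      // Lifetime timing: the clock starts when the wrapper is created and
      // stops once the iterator is exhausted, so the reported time also
      // includes time spent waiting on everything upstream of the operator,
      // not just the operator's own work.
      def timedLifetime[T](iter: Iterator[T], record: Long => Unit): Iterator[T] =
        new Iterator[T] {
          private val start = System.nanoTime()
          private var done = false
          override def hasNext: Boolean = {
            val more = iter.hasNext
            if (!more && !done) { done = true; record(System.nanoTime() - start) }
            more
          }
          override def next(): T = iter.next()
        }
    }

Under the lifetime approach, a WholeStageCodegen block's reported time reflects how long the generated stage's iterator was alive within each task, which is why the per-block numbers in the UI can look much larger than the operator's "own" work.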