It sounds like you may have huge data skew.
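As an aside, the lifetime-style measurement Bobby describes below can be sketched in plain Python. This is a hypothetical illustration of the idea, not Spark's actual code: an operator that wraps an iterator and times its whole lifetime ends up being charged for time its downstream consumer spends between rows.

```python
import time

def timed_iter(rows, stats, name):
    """Yield rows, recording the wall-clock span from the first pull
    to exhaustion -- lifetime timing, not per-row timing."""
    start = time.perf_counter()
    try:
        for row in rows:
            yield row
    finally:
        stats[name] = time.perf_counter() - start

stats = {}
fast_op = timed_iter(iter(range(5)), stats, "fast_op")

# A slow downstream consumer (think: a later operator or the sink).
for row in fast_op:
    time.sleep(0.02)  # work done *after* fast_op produced the row

# Most of the ~0.1 s of downstream sleeping is charged to fast_op,
# even though fast_op itself did essentially no work.
print(stats["fast_op"])
```

That is why a WholeStageCodegen block's reported time can exceed the query's wall-clock duration: the per-task lifetimes of many parallel tasks are being aggregated, not the block's own CPU time.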

On Thu, Jul 9, 2020 at 4:15 PM Bobby Evans <bo...@apache.org> wrote:
>
> Sadly there isn't a lot you can do to fix this. All of the operations take 
> iterators of rows as input and produce iterators of rows as output. For 
> efficiency reasons, the timing is not done for each individual row: in many 
> cases, measuring how long something took would take longer than the 
> operation itself. So most operators actually end up measuring the lifetime 
> of the operator, which is often the time of the entire task minus how long 
> it took for the task to first reach that operator. This is also true of 
> WholeStageCodegen.
>
> On Thu, Jul 9, 2020 at 11:55 AM Michal Sankot 
> <michal.san...@spreaker.com.invalid> wrote:
>>
>> Hi,
>> I'm checking execution of SQL queries in Spark UI, trying to find a
>> bottleneck and values that are displayed in WholeStageCodegen blocks are
>> confusing.
>>
>> In the attached example the whole query took 6.6 minutes, yet the upper-left
>> WholeStageCodegen block says the median value is 7.8 minutes and the
>> maximum 7.27h :O
>>
>> What does it mean? Do those numbers have any real meaning? Is there a way
>> to find out how long individual blocks really took?
>>
>> Thanks,
>> Michal
>> Spark 2.4.4 on AWS EMR
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
