Re: tez shuffle fetch phase has long pause

Kuhu Shukla Sun, 31 May 2020 20:44:58 -0700

Thank you for reporting this.

You might want to check the logs and other metrics of the node running the
upstream task to see whether it is healthy. The download rate seems quite
low to me, and depending on the fetch size, we can verify whether this is
caused by poor network bandwidth and/or bad node health.
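One way to act on the fetch-size point is to recompute `Rate` from `csize` and `TimeTaken` in a "Completed fetch" line. The numbers in the log quoted below are consistent with Rate = csize / 2^20 / (TimeTaken / 1000), i.e. MiB per second, but that formula is inferred from this one line and not taken from the Tez source. The helper name and regex below are hypothetical. Because `TimeTaken` is logged in whole milliseconds, a small fetch such as csize=24509 with TimeTaken=1 yields a rate driven mostly by the fetch size and timer resolution, so it is worth comparing Rate against csize before concluding the network is slow.

```python
import re

# Hypothetical pattern for a ShuffleManager "Completed fetch" line; field
# names come from the quoted log, the exact format is an assumption.
FETCH_RE = re.compile(r"csize=(\d+).*?TimeTaken=(\d+).*?Rate=([\d.]+)")


def fetch_rate_mib_per_s(line):
    """Return (recomputed MiB/s, logged Rate) for one completed-fetch line.

    Assumes Rate is csize bytes over TimeTaken milliseconds, in MiB/s.
    """
    m = FETCH_RE.search(line)
    if m is None:
        raise ValueError("not a completed-fetch line")
    csize = int(m.group(1))
    time_ms = max(int(m.group(2)), 1)  # guard against TimeTaken=0
    logged = float(m.group(3))
    recomputed = csize / (1024 * 1024) / (time_ms / 1000.0)
    return recomputed, logged
```

A fetch whose recomputed rate is low while csize is large and TimeTaken is long is a better hint of a bandwidth or node problem than a low rate on a tiny fetch.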


Let us know what you think and what you find.
Regards,
Kuhu

On Sun, May 31, 2020, 22:36 娄东风 <[email protected]> wrote:

> Hi,
>   I'm using Tez 0.9.1 and Hive 2.3.3, running TPC-DS query15 on 1TB of data.
> In Reduce5, I occasionally see a long pause during the fetch phase, so it's
> hard to capture a jstack at the right moment.
> Reduce5 depends on Map4 and Reduce2, and both of these vertices finished
> before 16:33:00.
> So the fetch task should not be waiting on upstream vertices.
> How do I find out what causes this long pause?
>   Thanks.
>
> 2020-05-30 16:36:29,531 [INFO] [Fetcher_B {Map_4} #12] 
> |ShuffleManager.fetch|: Completed fetch for attempt: {5, 0, 
> attempt_1590728138875_0202_1_01_000005_0_10003} to MEMORY, csize=24509, 
> dsize=66456, EndTime=1590827789531, TimeTaken=1, Rate=23.37 MB/s
> 2020-05-30 16:37:41,368 [INFO] [Fetcher_B {Map_4} #2] |HttpConnection.url|: 
> for 
> url=http://node-ana-coreLKpD0001:13562/mapOutput?job=job_1590728138875_0202&dag=1&reduce=245&map=attempt_1590728138875_0202_1_01_000027_0_10002,attempt_1590728138875_0202_1_01_000023_0_10003,attempt_1590728138875_0202_1_01_000030_0_10002,attempt_1590728138875_0202_1_01_000025_0_10003,attempt_1590728138875_0202_1_01_000029_0_10003
>  sent hash and receievd reply 0 ms
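To locate pauses like the one above (16:36:29,531 to 16:37:41,368, roughly 72 seconds with no fetcher output), a small scan over the task log for large gaps between consecutive timestamps can help. This is a sketch, assuming each log record starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the quoted lines; `find_pauses` and the 30-second threshold are hypothetical choices.

```python
from datetime import datetime

# Assumed record prefix, e.g. "2020-05-30 16:36:29,531" (23 characters).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def find_pauses(lines, threshold_s=30.0):
    """Yield (gap_seconds, previous_line, next_line) for gaps >= threshold_s.

    Lines that do not start with a timestamp (wrapped continuations,
    stack-trace lines) are skipped.
    """
    prev_ts, prev_line = None, None
    for line in lines:
        try:
            ts = datetime.strptime(line[:23], TS_FORMAT)
        except ValueError:
            continue  # not the start of a log record
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= threshold_s:
                yield gap, prev_line, line
        prev_ts, prev_line = ts, line
```

Run over the Reduce5 task attempt's log, the lines on either side of each gap show which fetcher went quiet and, via the following `HttpConnection.url` line (here node-ana-coreLKpD0001:13562), which upstream node's logs and metrics are worth checking.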
