Thank you for reporting this. You might want to check the logs and other metrics of the node running the upstream task to see whether it is healthy. The download rate seems quite low to me; depending on the fetch size, we can verify whether this is caused by poor network bandwidth, poor node health, or both.
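If it helps, here is a rough Python sketch for going through the Reduce5 container log. It is written only against the two log lines you quoted, so treat the regular expressions as assumptions about the format. It also assumes `TimeTaken` is in milliseconds and `MB` means 2^20 bytes, which reproduces the logged Rate=23.37 for csize=24509 and TimeTaken=1. `recompute_rate` and `find_pauses` are hypothetical helper names, not Tez APIs. The sketch recomputes a fetch's rate from its compressed size and flags gaps between consecutive timestamped lines above a threshold, such as the roughly 72 s gap between 16:36:29 and 16:37:41 in your excerpt.

```python
import re
from datetime import datetime

# Assumed format of the ShuffleManager "Completed fetch" line, taken from the quoted log.
FETCH_RE = re.compile(
    r"csize=(\d+), dsize=(\d+), EndTime=(\d+), TimeTaken=(\d+), Rate=([\d.]+) MB/s"
)
# Each quoted log line starts with a timestamp such as "2020-05-30 16:36:29,531".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

MIB = 1024 * 1024  # assumption: "MB" in the Rate field means 2**20 bytes


def recompute_rate(csize: int, time_taken_ms: int) -> float:
    """Compressed bytes per second in MiB/s, assuming TimeTaken is in milliseconds."""
    return csize / max(time_taken_ms, 1) * 1000 / MIB


def find_pauses(lines, threshold_s=30.0):
    """Return (prev_ts, ts, gap_seconds) for consecutive timestamped lines
    whose gap exceeds threshold_s. Lines without a leading timestamp are skipped."""
    pauses = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_s:
                pauses.append((prev, ts, gap))
        prev = ts
    return pauses


if __name__ == "__main__":
    # Two lines from the quoted log; the second line's URL is shortened here.
    sample = [
        "2020-05-30 16:36:29,531 [INFO] [Fetcher_B {Map_4} #12] |ShuffleManager.fetch|: "
        "Completed fetch for attempt: {5, 0, attempt_1590728138875_0202_1_01_000005_0_10003} "
        "to MEMORY, csize=24509, dsize=66456, EndTime=1590827789531, TimeTaken=1, Rate=23.37 MB/s",
        "2020-05-30 16:37:41,368 [INFO] [Fetcher_B {Map_4} #2] |HttpConnection.url|: "
        "for url=http://node-ana-coreLKpD0001:13562/mapOutput (shortened) "
        "sent hash and receievd reply 0 ms",
    ]
    for line in sample:
        m = FETCH_RE.search(line)
        if m:
            csize, time_taken = int(m.group(1)), int(m.group(4))
            print(f"logged {m.group(5)} MB/s, recomputed {recompute_rate(csize, time_taken):.2f} MiB/s")
    for prev, ts, gap in find_pauses(sample):
        print(f"pause of {gap:.3f} s between {prev} and {ts}")
```

Because `TimeTaken` has millisecond granularity, the rate for a small fetch like this 24 KB one is coarse, which is why the fetch size matters when judging it. Comparing the flagged gaps with the upstream node's metrics for the same time window should show whether the pauses line up with network or node problems.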
Let us know what you find.

Regards,
Kuhu

On Sun, May 31, 2020, 22:36 娄东风 <[email protected]> wrote:

> Hi,
> I'm using Tez 0.9.1 and Hive 2.3.3, running TPC-DS query15 at 1 TB.
> In Reduce5 I occasionally see a long pause during fetch, so it's hard to
> capture a jstack.
> Reduce5 depends on Map4 and Reduce2, and both of these vertices finished before
> 16:33:00, so the fetch task should not be waiting on upstream vertices.
> How do I find out what causes this long pause?
> Thanks.
>
> 2020-05-30 16:36:29,531 [INFO] [Fetcher_B {Map_4} #12] |ShuffleManager.fetch|: Completed fetch for attempt: {5, 0, attempt_1590728138875_0202_1_01_000005_0_10003} to MEMORY, csize=24509, dsize=66456, EndTime=1590827789531, TimeTaken=1, Rate=23.37 MB/s
> 2020-05-30 16:37:41,368 [INFO] [Fetcher_B {Map_4} #2] |HttpConnection.url|: for url=http://node-ana-coreLKpD0001:13562/mapOutput?job=job_1590728138875_0202&dag=1&reduce=245&map=attempt_1590728138875_0202_1_01_000027_0_10002,attempt_1590728138875_0202_1_01_000023_0_10003,attempt_1590728138875_0202_1_01_000030_0_10002,attempt_1590728138875_0202_1_01_000025_0_10003,attempt_1590728138875_0202_1_01_000029_0_10003 sent hash and receievd reply 0 ms
