Hi Weihua,

> After dumping the memory and analyzing it, I found:
> Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
> Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
> This is not consistent with my understanding of the Flink network 
> transmission mechanism.

It probably is consistent. The downstream receiver has announced all of its 
credits and is simply waiting for the data to arrive, while the upstream sender 
is waiting for the data to be sent down the stream.
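
To illustrate what those two counters track, here is a rough sketch in Java 
(not the actual Flink code - the field names mirror the real classes, but the 
logic is heavily simplified):

    // Rough sketch of the credit flow for one sender/receiver pair (illustration only).
    class CreditFlowSketch {

        // --- receiver side: the Sink's RemoteInputChannel ---
        int unannouncedCredit;       // freed buffers not yet announced to the sender

        void bufferRecycled() {
            unannouncedCredit++;     // every recycled buffer becomes one new credit
        }

        int announceCredits() {      // shipped to the producer as an AddCredit message
            int credits = unannouncedCredit;
            unannouncedCredit = 0;   // -> the 0 from your heap dump: nothing left to announce
            return credits;
        }

        // --- sender side: the Map's CreditBasedSequenceNumberingViewReader ---
        int numCreditsAvailable;     // announced credits not yet consumed by sent buffers

        void addCredits(int credits) {
            numCreditsAvailable += credits;
        }

        void sendBuffer() {
            numCreditsAvailable--;   // each transmitted buffer consumes one credit
        }
    }

Seeing 0 on both sides just means that at that instant nothing is left to 
announce and nothing is left to spend, which fits the picture of both tasks 
waiting on the exchange in between.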

The stack trace you posted confirms that the sink has an empty input buffer 
- it’s waiting for input data. Assuming rescale partitioning works as expected 
and node 242 is indeed connected to node 121, this implies the bottleneck is 
the data exchange between those two tasks. It could be:

- network bottleneck (slow network? packet losses?)
- machine swapping/long GC pauses (if the upstream node is experiencing long 
pauses, it might show up like this)
- CPU bottleneck in the network stack (frequent flushing? SSL? see the sketch 
below this list)
- some resource competition (too high parallelism for the given number of machines)
- Netty threads not keeping up
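
On the flushing point: it is worth checking which buffer timeout the job uses, 
since very low values make the output flusher ship many half-empty buffers and 
put extra load on the network stack. A minimal sketch of where that knob lives 
(the value shown is just the default, not a recommendation):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class BufferTimeoutSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // 100 ms is the default; 0 flushes after every record (expensive),
            // -1 flushes only when an output buffer is completely full.
            env.setBufferTimeout(100);
            // ... build and execute the job as usual
        }
    }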

It’s hard to say what the problem is without looking at resource usage 
(CPU/Network/Memory/Disk IO), GC logs, and code profiling results.

Piotrek

PS Zhijiang:

The RescalePartitioner in this case should connect just two upstream subtasks 
with one downstream sink: upstream subtasks N and N+1 should be connected to 
the sink with id N/2.
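
With the 2000 -> 1000 rescale from this job, that mapping works out as in the 
small illustration below (plain arithmetic, assuming the subtask indices line 
up as described above):

    public class RescaleMappingSketch {
        public static void main(String[] args) {
            // Map parallelism 2000 rescaled into Sink parallelism 1000:
            // upstream subtasks N and N+1 both feed the sink with id N/2.
            int mapIndex = 242;
            int sinkIndex = mapIndex / 2;
            // prints "Map (242) -> Sink (121)", matching the observed pair
            System.out.println("Map (" + mapIndex + ") -> Sink (" + sinkIndex + ")");
        }
    }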

> On 25 May 2020, at 04:39, Weihua Hu <huweihua....@gmail.com> wrote:
> 
> Hi, Zhijiang
> 
> I understand the normal credit-based backpressure mechanism: usually the 
> Sink's inPoolUsage will be full, and the task stack will also have some 
> information. 
> But this time it is not the same: the Sink's inPoolUsage is 0. 
> I also checked the stack. The Map is waiting in 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment, 
> while the Sink is waiting for data to process, which is not in line with expectations.
> 
> 
> <pasted graphic-2.tiff>
> 
> <pasted graphic-1.tiff>
> 
> 
> 
> Best
> Weihua Hu
> 
>> On 24 May 2020, at 21:57, Zhijiang <wangzhijiang...@aliyun.com 
>> <mailto:wangzhijiang...@aliyun.com>> wrote:
>> 
>> Hi Weihua,
>> 
>> From the info below, it is in line with the expectation for credit-based flow 
>> control. 
>> 
>> I guess one of the Sink subtasks causes the backpressure, so you will see 
>> that there are no available credits on the Sink side and
>> the outPoolUsage of the Map is almost 100%. It really reflects the credit-based 
>> state in the case of backpressure.
>> 
>> If you want to analyze the root cause of the backpressure, you can trace the 
>> task stack of the respective Sink subtask to find which operation costs the most,
>> then try increasing the parallelism or improving the UDF (if it is the 
>> bottleneck). In addition, I am not sure why you chose rescale to shuffle 
>> data among the operators. The
>> forward mode can give really good performance by default if you adjust 
>> them to the same parallelism.
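>> 
>> For illustration, the choice between the two exchange modes is made on the 
>> DataStream itself; a rough sketch only (source, MyMapper and MySink are 
>> placeholders, not code from this job):
>> 
>>     DataStream<String> mapped = source
>>             .rescale()               // distributes to a small fixed group of subtasks
>>             .map(new MyMapper())
>>             .setParallelism(2000);
>>     mapped
>>             .forward()               // 1-to-1 exchange, requires the same
>>             .addSink(new MySink())   // parallelism on both sides
>>             .setParallelism(2000);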
>> 
>> Best,
>> Zhijiang
>> ------------------------------------------------------------------
>> From:Weihua Hu <huweihua....@gmail.com <mailto:huweihua....@gmail.com>>
>> Send Time: Sunday, 24 May 2020, 18:32
>> To:user <user@flink.apache.org <mailto:user@flink.apache.org>>
>> Subject: Single task backpressure problem with Credit-based Flow Control
>> 
>> Hi, all
>> 
>> I ran into a weird single Task BackPressure problem.
>> 
>> JobInfo:
>>     DAG: Source (1000) -> Map (2000) -> Sink (1000), linked via rescale. 
>>     Flink version: 1.9.0
>>     
>> There is no related info in the jobmanager/taskmanager logs.
>> 
>> Through the metrics, I see that Map (242)'s outPoolUsage is full, but its 
>> downstream Sink (121)'s inPoolUsage is 0.
>> 
>> After dumping the memory and analyzing it, I found:
>> Sink (121)'s RemoteInputChannel.unannouncedCredit = 0,
>> Map (242)'s CreditBasedSequenceNumberingViewReader.numCreditsAvailable = 0.
>> This is not consistent with my understanding of the Flink network 
>> transmission mechanism.
>> 
>> Can someone help me? Thanks a lot.
>> 
>> 
>> Best
>> Weihua Hu
>> 
>> 
> 
