[jira] [Comment Edited] (FLINK-17823) Resolve the race condition while releasing RemoteInputChannel

Zhijiang (Jira) Tue, 02 Jun 2020 01:04:08 -0700


    [ https://issues.apache.org/jira/browse/FLINK-17823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113714#comment-17113714 ]


Zhijiang edited comment on FLINK-17823 at 6/2/20, 8:03 AM:
-----------------------------------------------------------

Merged in release-1.11: 3eb1075ded64da20e6f7a5bc268f455eaf6573eb

Merged in master: 8c7c7267be95cddd7122d2b97e5334f5db4cc37c


was (Author: zjwang):
Merged in release-1.11: 3eb1075ded64da20e6f7a5bc268f455eaf6573eb

Will merge to master later and update the info.

> Resolve the race condition while releasing RemoteInputChannel
> -------------------------------------------------------------
>
>                 Key: FLINK-17823
>                 URL: https://issues.apache.org/jira/browse/FLINK-17823
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.11.0
>            Reporter: Zhijiang
>            Assignee: Zhijiang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>
> RemoteInputChannel#releaseAllResources might be called by the canceler thread,
> while the task thread may concurrently call RemoteInputChannel#getNextBuffer.
> This can cause two potential problems:
>  * The task thread might get a null buffer after the canceler thread has
> already released all the buffers, which then causes a misleading NPE in
> getNextBuffer.
>  * The task thread and the canceler thread might pull the same buffer
> concurrently, which causes an unexpected exception when the same buffer is
> recycled twice.
> The solution is to properly synchronize the buffer queue in the release method
> so that the same buffer cannot be pulled by both the canceler thread and the
> task thread. In the getNextBuffer method, we add explicit checks to avoid the
> misleading NPE and throw meaningful exceptions instead.
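
The fix described above can be illustrated with a minimal Java sketch. It is a hypothetical, heavily simplified model, not Flink's actual RemoteInputChannel: the classes SimpleBuffer and SimpleRemoteChannel, the onBuffer method and the single-reference recycling are assumptions made for illustration only. It shows both parts of the described solution: releaseAllResources drains the queue while holding the same lock that getNextBuffer uses, so no buffer can be handed to both threads, and getNextBuffer checks the released state explicitly and throws an IllegalStateException with a clear message instead of running into a misleading NPE.

```java
import java.util.ArrayDeque;
import java.util.Optional;

/** Hypothetical buffer with a single reference; NOT Flink's Buffer class. */
final class SimpleBuffer {
    private int refCount = 1;

    synchronized void recycle() {
        if (refCount <= 0) {
            // Recycling the same buffer twice is the second problem described in the issue.
            throw new IllegalStateException("buffer recycled twice");
        }
        refCount--;
    }

    synchronized boolean isRecycled() {
        return refCount == 0;
    }
}

/** Hypothetical, simplified channel; NOT Flink's RemoteInputChannel. */
final class SimpleRemoteChannel {
    // Both receivedBuffers and released are guarded by the monitor of receivedBuffers.
    private final ArrayDeque<SimpleBuffer> receivedBuffers = new ArrayDeque<>();
    private boolean released;

    /** Network thread: enqueue a buffer, or recycle it right away if already released. */
    void onBuffer(SimpleBuffer buffer) {
        synchronized (receivedBuffers) {
            if (!released) {
                receivedBuffers.add(buffer);
                return;
            }
        }
        buffer.recycle();
    }

    /** Task thread: check the state and poll under the lock; fail clearly if released. */
    Optional<SimpleBuffer> getNextBuffer() {
        synchronized (receivedBuffers) {
            if (released) {
                throw new IllegalStateException("Queried for a buffer after the channel was released");
            }
            return Optional.ofNullable(receivedBuffers.poll());
        }
    }

    /** Canceler thread: mark released and drain the queue under the same lock. */
    void releaseAllResources() {
        ArrayDeque<SimpleBuffer> toRecycle;
        synchronized (receivedBuffers) {
            if (released) {
                return; // idempotent: a second call must not recycle anything again
            }
            released = true;
            toRecycle = new ArrayDeque<>(receivedBuffers);
            receivedBuffers.clear();
        }
        // Every buffer left the queue while the lock was held, so the task thread
        // can no longer poll any of them: each one is recycled exactly once here.
        for (SimpleBuffer buffer : toRecycle) {
            buffer.recycle();
        }
    }
}
```

In this sketch a buffer is owned either by the task thread (after a successful poll) or by the release path, never by both. Recycling happens outside the lock so that recycle logic does not run while the queue monitor is held; that is a design choice of the sketch, not necessarily of the merged fix.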



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
