Robert Metzger created FLINK-25316:
--------------------------------------

             Summary: BlobServer can get stuck during shutdown
                 Key: FLINK-25316
                 URL: https://issues.apache.org/jira/browse/FLINK-25316
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.15.0
            Reporter: Robert Metzger
             Fix For: 1.15.0


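The cluster shutdown can get stuck: the thread running ClusterEntrypoint.stopClusterServices() waits indefinitely in Thread.join() inside BlobServer.close():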
{code}
"AkkaRpcService-Supervisor-Termination-Future-Executor-thread-1" #89 daemon prio=5 os_prio=0 tid=0x0000004017d70000 nid=0x2ec in Object.wait() [0x000000402a9b5000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000d6c48368> (a org.apache.flink.runtime.blob.BlobServer)
        at java.lang.Thread.join(Thread.java:1252)
        - locked <0x00000000d6c48368> (a org.apache.flink.runtime.blob.BlobServer)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.flink.runtime.blob.BlobServer.close(BlobServer.java:319)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.stopClusterServices(ClusterEntrypoint.java:406)
        - locked <0x00000000d5d27350> (a java.lang.Object)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$shutDownAsync$4(ClusterEntrypoint.java:505
{code}

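This happens because the BlobServer.run() method ignores interrupts. The listener thread is still blocked in ServerSocket.accept(), which a thread interrupt does not wake up, so the join in BlobServer.close() never returns: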
{code}
"BLOB Server listener at 6124" #30 daemon prio=5 os_prio=0 tid=0x000000401c929800 nid=0x2b4 runnable [0x00000040263f9000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
        at java.net.ServerSocket.implAccept(ServerSocket.java:560)
        at java.net.ServerSocket.accept(ServerSocket.java:528)
        at org.apache.flink.util.NetUtils.acceptWithoutTimeout(NetUtils.java:143)
        at org.apache.flink.runtime.blob.BlobServer.run(BlobServer.java:268)
{code}

This issue was introduced in FLINK-24156.
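
The blocking behaviour seen in the two thread dumps can be reproduced outside of Flink. The following is only a minimal sketch in plain Java, not BlobServer code and not the fix for this issue: the class and method names (AcceptShutdownSketch, stopsOnInterrupt, stopsOnClose) are made up for illustration. It assumes a plain java.net.ServerSocket and a regular platform thread, and compares interrupting a thread that is blocked in accept() with closing the socket from another thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical demo class; none of these names exist in Flink.
class AcceptShutdownSketch {

    /** Starts a daemon thread that blocks in accept() until the call fails. */
    static Thread startListener(ServerSocket serverSocket) {
        Thread listener = new Thread(() -> {
            try {
                // Blocks here; Thread.interrupt() alone does not end this call.
                serverSocket.accept();
            } catch (IOException e) {
                // Reached once the socket has been closed by another thread.
            }
        }, "listener-sketch");
        listener.setDaemon(true);
        listener.start();
        return listener;
    }

    /** Opens a server socket on an ephemeral loopback port. */
    static ServerSocket newLoopbackSocket() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    /** Interrupts the listener only; returns true if it terminated in time. */
    static boolean stopsOnInterrupt(long timeoutMillis) {
        try (ServerSocket serverSocket = newLoopbackSocket()) {
            Thread listener = startListener(serverSocket);
            listener.interrupt();
            // Bounded join, so the demo itself cannot hang.
            listener.join(timeoutMillis);
            return !listener.isAlive();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Closes the socket first; returns true if the listener terminated in time. */
    static boolean stopsOnClose(long timeoutMillis) {
        try {
            ServerSocket serverSocket = newLoopbackSocket();
            Thread listener = startListener(serverSocket);
            // Closing the socket makes a pending accept() throw a SocketException.
            serverSocket.close();
            listener.join(timeoutMillis);
            return !listener.isAlive();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stops on interrupt: " + stopsOnInterrupt(500));
        System.out.println("stops on close:     " + stopsOnClose(5000));
    }
}
```

In this sketch, a join without a timeout after the interrupt-only path would wait forever, which matches the WAITING state of the termination thread above. Whether BlobServer should close its socket before joining, use a bounded join, or do something else is left to the actual fix.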



