Hi Barisa,
thanks for sharing this. I'm gonna add Till to this thread. He might have
some insights.

Best,
Matthias

On Wed, Feb 10, 2021 at 4:19 PM Barisa Obradovic <bbaj...@gmail.com> wrote:

> I'm trying to understand whether the behaviour of the Flink JobManager
> during a ZooKeeper upgrade is expected.
>
> I'm running Flink 1.11.2 in Kubernetes, with ZooKeeper server 3.5.4-beta.
> While I'm upgrading ZooKeeper, there is a 20-second ZooKeeper downtime.
> I'd expect either the Flink job to restart or a few warnings in the logs
> during those 20 seconds. Instead, I see the whole Flink JVM crash (and
> later the pod restart).
>
> I expected Flink to retry ZooKeeper requests internally, so I'm
> surprised it crashes. Is this expected, or is it a bug?
>
> From the logs
>
>
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
> [09-Feb-2021 11:30:00.197 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server zdzk.servicexxx/192.168.190.92:2181
> [09-Feb-2021 11:30:00.197 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Socket connection established to zdzk.servicexxx/192.168.190.92:2181, initiating session
> [09-Feb-2021 11:30:00.198 UTC] WARN org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Session 0x3012b0057140004 for server zdzk.servicexxx/192.168.190.92:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_192]
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_192]
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_192]
>     at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_192]
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_192]
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
> [09-Feb-2021 11:30:02.294 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server zdzk.servicexxx/192.168.190.92:2181
> [09-Feb-2021 11:30:02.295 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Socket connection established to zdzk.servicexxx/192.168.190.92:2181, initiating session
> [09-Feb-2021 11:30:02.295 UTC] WARN org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Session 0x3012b0057140004 for server zdzk.servicexxx/192.168.190.92:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_192]
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_192]
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_192]
>     at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_192]
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_192]
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
>     at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]
> [09-Feb-2021 11:30:03.841 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server zdzk.servicexxx/192.168.190.92:2181
> [09-Feb-2021 11:30:03.842 UTC] INFO org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Socket connection established to zdzk.servicexxx/192.168.190.92:2181, initiating session
> [09-Feb-2021 11:30:03.842 UTC] WARN org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Session 0x3012b0057140004 for server zdzk.servicexxx/192.168.190.92:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_192]
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_192]
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_192]
>     at sun.nio.ch.IOUtil.rea
>
>
>
> FYI: I've asked the same question on Stack Overflow:
>
> https://stackoverflow.com/questions/66120905/should-flink-job-manager-crash-during-zookeeper-upgrade
