Hey all,

Thanks for the ping, Matthias. To be honest, I'm not super familiar with the details of @Yang Wang <danrtsey...@gmail.com>'s operator :(. Can you share some of your FlinkApplication specs?

For the `kubectl logs` command, I believe that just reads what the container writes to stdout (and stderr). Which logging framework are you using in your application, and how have you configured it? There's a good guide to configuring the popular ones in the Flink docs [1]. For instance, if you're using the default Log4j 2 framework, you should configure a ConsoleAppender [2].

Hope that helps a bit,
Austin

[1]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/advanced/logging/
[2]: https://logging.apache.org/log4j/2.x/manual/appenders.html#ConsoleAppender

On Tue, May 4, 2021 at 1:59 AM Matthias Pohl <matth...@ververica.com> wrote:

> Hi Fuyao,
> sorry for not replying earlier. The stop-with-savepoint operation
> shouldn't only suspend but terminate the job. Is it that you might have a
> larger state that makes creating the savepoint take longer? Even though,
> considering that you don't experience this behavior with your 2nd solution,
> I'd assume that we could ignore this possibility.
>
> I'm gonna add Austin to the conversation as he worked with k8s operators
> as well already. Maybe, he can also give you more insights on the logging
> issue which would enable us to dig deeper into what's going on with
> stop-with-savepoint.
>
> Best,
> Matthias
>
> On Tue, May 4, 2021 at 4:33 AM Fuyao Li <fuyao...@oracle.com> wrote:
>
>> Hello,
>>
>> Update:
>>
>> I think stopWithSavepoint() only suspend the job. It doesn’t actually
>> terminate (./bin/flink cancel) the job. I switched to cancelWithSavepoint()
>> and it works here.
>>
>> Maybe stopWithSavepoint() should only be used to update the
>> configurations like parallelism? For updating the image, this seems to be
>> not suitable, please correct me if I am wrong.
>>
>> For the log issue, I am still a bit confused. Why it is not available in
>> kubectl logs. How should I get access to it?
>>
>> Thanks.
>>
>> Best,
>>
>> Fuyao
>>
>> *From: *Fuyao Li <fuyao...@oracle.com>
>> *Date: *Sunday, May 2, 2021 at 00:36
>> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
>> *Subject: *[External] : Re: StopWithSavepoint() method doesn't work in
>> Java based flink native k8s operator
>>
>> Hello,
>>
>> I noticed that first trigger a savepoint and then delete the deployment
>> might cause the duplicate data issue. That could pose a bad influence to
>> the semantic correctness. Please give me some hints on how to make the
>> stopWithSavepoint() work correctly with Fabric8io Java k8s client to
>> perform this image update operation. Thanks!
>>
>> Best,
>>
>> Fuyao
>>
>> *From: *Fuyao Li <fuyao...@oracle.com>
>> *Date: *Friday, April 30, 2021 at 18:03
>> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
>> *Subject: *[External] : Re: StopWithSavepoint() method doesn't work in
>> Java based flink native k8s operator
>>
>> Hello Community, Yang,
>>
>> I have one more question for logging. I also noticed that if I execute
>> kubectl logs command to the JM. The pods provisioned by the operator can’t
>> print out the internal Flink logs in the kubectl logs. I can only get
>> something like the logs below. No actual flink logs is printed here… Where
>> can I find the path to the logs? Maybe use a sidecar container to get it
>> out? How can I get the logs without checking the Flink WebUI? Also, the sed
>> error makes me confused here. In fact, the application is already up and
>> running correctly if I access the WebUI through Ingress.
>>
>> Reference:
>> https://github.com/wangyang0918/flink-native-k8s-operator/issues/4
>>
>> [root@bastion deploy]# kubectl logs -f flink-demo-594946fd7b-822xk
>>
>> sed: couldn't open temporary file /opt/flink/conf/sedh1M3oO: Read-only
>> file system
>> sed: couldn't open temporary file /opt/flink/conf/sed8TqlNR: Read-only
>> file system
>> /docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml:
>> Read-only file system
>> sed: couldn't open temporary file /opt/flink/conf/sedvO2DFU: Read-only
>> file system
>> /docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml:
>> Read-only file system
>> /docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp:
>> Read-only file system
>> Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH
>> -Xmx3462817376 -Xms3462817376 -XX:MaxMetaspaceSize=268435456
>> org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
>> -D jobmanager.memory.off-heap.size=134217728b -D
>> jobmanager.memory.jvm-overhead.min=429496736b -D
>> jobmanager.memory.jvm-metaspace.size=268435456b -D
>> jobmanager.memory.heap.size=3462817376b -D
>> jobmanager.memory.jvm-overhead.max=429496736b
>> ERROR StatusLogger No Log4j 2 configuration file found. Using default
>> configuration (logging only errors to the console), or user
>> programmatically provided configurations. Set system property
>> 'log4j2.debug' to show Log4j 2 internal initialization logging.
>> See https://logging.apache.org/log4j/2.x/manual/configuration.html
>> for instructions on how to configure Log4j 2
>> WARNING: An illegal reflective access operation has occurred
>> WARNING: Illegal reflective access by
>> org.apache.flink.api.java.ClosureCleaner
>> (file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to field
>> java.util.Properties.serialVersionUID
>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.flink.api.java.ClosureCleaner
>> WARNING: Use --illegal-access=warn to enable warnings of further illegal
>> reflective access operations
>> WARNING: All illegal access operations will be denied in a future release
>>
>> -------- The logs stops here, flink applications logs doesn’t get printed
>> here anymore---------
>>
>> ^C
>>
>> [root@bastion deploy]# kubectl logs -f flink-demo-taskmanager-1-1
>>
>> sed: couldn't open temporary file /opt/flink/conf/sedaNDoNR: Read-only
>> file system
>> sed: couldn't open temporary file /opt/flink/conf/seddze7tQ: Read-only
>> file system
>> /docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml:
>> Read-only file system
>> sed: couldn't open temporary file /opt/flink/conf/sedYveZoT: Read-only
>> file system
>> /docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml:
>> Read-only file system
>> /docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp:
>> Read-only file system
>> Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH
>> -Xmx697932173 -Xms697932173 -XX:MaxDirectMemorySize=300647712
>> -XX:MaxMetaspaceSize=268435456
>> org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner -D
>> taskmanager.memory.framework.off-heap.size=134217728b -D
>> taskmanager.memory.network.max=166429984b -D
>> taskmanager.memory.network.min=166429984b -D
>> taskmanager.memory.framework.heap.size=134217728b -D
>> taskmanager.memory.managed.size=665719939b -D taskmanager.cpu.cores=1.0 -D
>> taskmanager.memory.task.heap.size=563714445b -D
>> taskmanager.memory.task.off-heap.size=0b --configDir /opt/flink/conf
>> -Djobmanager.memory.jvm-overhead.min='429496736b'
>> -Dpipeline.classpaths='file:usrlib/quickstart-0.1.jar'
>> -Dtaskmanager.resource-id='flink-demo-taskmanager-1-1'
>> -Djobmanager.memory.off-heap.size='134217728b'
>> -Dexecution.target='embedded'
>> -Dweb.tmpdir='/tmp/flink-web-d7691661-fac5-494e-8154-896b4fe30692'
>> -Dpipeline.jars='file:/opt/flink/usrlib/quickstart-0.1.jar'
>> -Djobmanager.memory.jvm-metaspace.size='268435456b'
>> -Djobmanager.memory.heap.size='3462817376b'
>> -Djobmanager.memory.jvm-overhead.max='429496736b'
>> ERROR StatusLogger No Log4j 2 configuration file found. Using default
>> configuration (logging only errors to the console), or user
>> programmatically provided configurations. Set system property
>> 'log4j2.debug' to show Log4j 2 internal initialization logging.
>> See https://logging.apache.org/log4j/2.x/manual/configuration.html
>> for instructions on how to configure Log4j 2
>> WARNING: An illegal reflective access operation has occurred
>> WARNING: Illegal reflective access by
>> org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil
>> (file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to method
>> java.nio.DirectByteBuffer.cleaner()
>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil
>> WARNING: Use --illegal-access=warn to enable warnings of further illegal
>> reflective access operations
>> WARNING: All illegal access operations will be denied in a future release
>> Apr 29, 2021 12:58:34 AM oracle.simplefan.impl.FanManager configure
>> SEVERE: attempt to configure ONS in FanManager failed with
>> oracle.ons.NoServersAvailable: Subscription time out
>>
>> -------- The logs stops here, flink applications logs doesn’t get printed
>> here anymore---------
>>
>> Best,
>>
>> Fuyao
>>
>> *From: *Fuyao Li <fuyao...@oracle.com>
>> *Date: *Friday, April 30, 2021 at 16:50
>> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
>> *Subject: *[External] : StopWithSavepoint() method doesn't work in Java
>> based flink native k8s operator
>>
>> Hello Community, Yang,
>>
>> I am trying to extend the flink native Kubernetes operator by adding some
>> new features based on the repo [1]. I wrote a method to release the image
>> update functionality. [2] I added the
>>
>> triggerImageUpdate(oldFlinkApp, flinkApp, effectiveConfig);
>>
>> under the existing method.
>>
>> triggerSavepoint(oldFlinkApp, flinkApp, effectiveConfig);
>>
>> I wrote a function to accommodate the image change behavior.[2]
>>
>> Solution1:
>>
>> I want to use stopWithSavepoint() method to complete the task. However, I
>> found it will get stuck and never get completed. Even if I use get() for
>> the completeableFuture. It will always timeout and throw exceptions. See
>> solution 1 logs [3]
>>
>> Solution2:
>>
>> I tried to trigger a savepoint, then delete the deployment in the code
>> and then create a new application with new image. This seems to work fine.
>> Log link: [4]
>>
>> My questions:
>>
>> 1. Why solution 1 will get stuck? triggerSavepoint()
>>    CompleteableFuture could work here… Why stopWithSavepoint() will always
>>    get stuck or timeout? Very confused.
>> 2. For Fabric8io library, I am still new to it, did I do anything
>>    wrong in the implementation, maybe I should update the jobStatus? Please
>>    give me some suggestions.
>> 3. For work around solution 2, is there any bad influence I didn’t
>>    notice?
>>
>> [1] https://github.com/wangyang0918/flink-native-k8s-operator
>>
>> [2] https://pastebin.ubuntu.com/p/tQShjmdcJt/
>>
>> [3] https://pastebin.ubuntu.com/p/YHSPpK4W4Z/
>>
>> [4] https://pastebin.ubuntu.com/p/3VG7TtXXfh/
>>
>> Best,
>>
>> Fuyao
>>
>
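
On the logging question, here is a minimal sketch of a Log4j 2 configuration in properties syntax that sends everything at INFO and above to the console, which is what `kubectl logs` would then pick up. It assumes this file is the one the JVM actually loads as its Log4j 2 configuration (for example a log4j-console.properties in the Flink conf directory, or whatever -Dlog4j.configurationFile points at). The appender name and the pattern are only illustrative, so adjust them to your setup:

```properties
# Root logger: level and the appender it writes to (name is illustrative).
rootLogger.level = INFO
rootLogger.appenderRef.console.ref = ConsoleAppender

# Console appender, so log lines end up on the container's stdout.
appender.console.name = ConsoleAppender
appender.console.type = CONSOLE
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```

The "No Log4j 2 configuration file found" line in the output quoted above suggests that no such file is being picked up by the JobManager and TaskManager at the moment, so that would be the first thing I'd check.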