Hi Fuyao,
sorry for not replying earlier. The stop-with-savepoint operation should not
just suspend the job; it should terminate it. Could it be that your job has a
larger state, which makes creating the savepoint take longer? That said,
since you don't experience this behavior with your 2nd solution (which also
takes a savepoint), I'd assume that we can rule this possibility out.
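
To narrow down where the call hangs, it may help to bound the wait and treat
a timeout as "job possibly still running" rather than as a failure. Below is
a minimal sketch of that pattern, not the operator's actual code. The
SavepointStopper interface and the stopAndAwait helper are hypothetical
stand-ins for whatever client call the operator makes; only
java.util.concurrent is used, and the job ID and directory are made-up
values.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStopWithSavepoint {

    /** Hypothetical stand-in for the client call that stops a job with a savepoint. */
    interface SavepointStopper {
        CompletableFuture<String> stopWithSavepoint(String jobId, String targetDirectory);
    }

    /**
     * Waits at most timeoutSeconds for the savepoint path.
     * Returns an empty Optional on timeout so the caller can decide
     * whether to retry or fall back instead of deleting the deployment.
     */
    static Optional<String> stopAndAwait(SavepointStopper stopper, String jobId,
                                         String targetDirectory, long timeoutSeconds)
            throws InterruptedException, ExecutionException {
        CompletableFuture<String> future = stopper.stopWithSavepoint(jobId, targetDirectory);
        try {
            return Optional.ofNullable(future.get(timeoutSeconds, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            // The job may still be running here, so nothing is torn down.
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake stopper that completes immediately with a savepoint path.
        SavepointStopper completes =
                (id, dir) -> CompletableFuture.completedFuture(dir + "/savepoint-" + id);
        // Fake stopper whose future never completes, mimicking the hang.
        SavepointStopper hangs = (id, dir) -> new CompletableFuture<>();

        System.out.println(stopAndAwait(completes, "job-1", "/tmp/savepoints", 1));
        System.out.println(stopAndAwait(hangs, "job-1", "/tmp/savepoints", 1));
    }
}
```

If the empty result shows up consistently, the JobManager logs around the
time of the call would be the next place to look.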

I'm adding Austin to the conversation, as he has already worked with k8s
operators as well. Maybe he can also give you more insight into the logging
issue, which would enable us to dig deeper into what's going on with
stop-with-savepoint.

Best,
Matthias

On Tue, May 4, 2021 at 4:33 AM Fuyao Li <fuyao...@oracle.com> wrote:

> Hello,
>
>
>
> Update:
>
> I think stopWithSavepoint() only suspends the job. It doesn’t actually
> terminate the job (as ./bin/flink cancel does). I switched to
> cancelWithSavepoint() and it works here.
>
>
>
> Maybe stopWithSavepoint() should only be used to update configuration
> such as parallelism? It does not seem suitable for updating the image;
> please correct me if I am wrong.
>
>
>
> For the log issue, I am still a bit confused. Why is the log not available
> in kubectl logs? How should I get access to it?
>
>
>
> Thanks.
>
> Best,
>
> Fuyao
>
>
>
> *From: *Fuyao Li <fuyao...@oracle.com>
> *Date: *Sunday, May 2, 2021 at 00:36
> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
> *Subject: *[External] : Re: StopWithSavepoint() method doesn't work in
> Java based flink native k8s operator
>
> Hello,
>
>
>
> I noticed that first triggering a savepoint and then deleting the
> deployment might cause duplicate data, which could compromise semantic
> correctness. Please give me some hints on how to make
> stopWithSavepoint() work correctly with the Fabric8io Java k8s client to
> perform this image update operation. Thanks!
>
>
>
> Best,
>
> Fuyao
>
>
>
>
>
>
>
> *From: *Fuyao Li <fuyao...@oracle.com>
> *Date: *Friday, April 30, 2021 at 18:03
> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
> *Subject: *[External] : Re: StopWithSavepoint() method doesn't work in
> Java based flink native k8s operator
>
> Hello Community, Yang,
>
>
>
> I have one more question about logging. I noticed that when I run the
> kubectl logs command against the JM, the pods provisioned by the operator
> don’t print the internal Flink logs. I only get something like the output
> below; no actual Flink logs are printed. Where can I find the path to the
> logs? Should I use a sidecar container to get them out? How can I get the
> logs without checking the Flink WebUI? The sed errors also confuse me,
> since the application is already up and running correctly when I access
> the WebUI through Ingress.
>
>
>
> Reference:
> https://github.com/wangyang0918/flink-native-k8s-operator/issues/4
>
>
>
>
>
> [root@bastion deploy]# kubectl logs -f flink-demo-594946fd7b-822xk
>
>
>
> sed: couldn't open temporary file /opt/flink/conf/sedh1M3oO: Read-only
> file system
>
> sed: couldn't open temporary file /opt/flink/conf/sed8TqlNR: Read-only
> file system
>
> /docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml: Read-only
> file system
>
> sed: couldn't open temporary file /opt/flink/conf/sedvO2DFU: Read-only
> file system
>
> /docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml: Read-only
> file system
>
> /docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp:
> Read-only file system
>
> Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH
> -Xmx3462817376 -Xms3462817376 -XX:MaxMetaspaceSize=268435456
> org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
> -D jobmanager.memory.off-heap.size=134217728b -D
> jobmanager.memory.jvm-overhead.min=429496736b -D
> jobmanager.memory.jvm-metaspace.size=268435456b -D
> jobmanager.memory.heap.size=3462817376b -D
> jobmanager.memory.jvm-overhead.max=429496736b
>
> ERROR StatusLogger No Log4j 2 configuration file found. Using default
> configuration (logging only errors to the console), or user
> programmatically provided configurations. Set system property
> 'log4j2.debug' to show Log4j 2 internal initialization logging. See
> https://logging.apache.org/log4j/2.x/manual/configuration.html
> for instructions on how to configure Log4j 2
>
> WARNING: An illegal reflective access operation has occurred
>
> WARNING: Illegal reflective access by
> org.apache.flink.api.java.ClosureCleaner
> (file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to field
> java.util.Properties.serialVersionUID
>
> WARNING: Please consider reporting this to the maintainers of
> org.apache.flink.api.java.ClosureCleaner
>
> WARNING: Use --illegal-access=warn to enable warnings of further illegal
> reflective access operations
>
> WARNING: All illegal access operations will be denied in a future release
>
>
>
>
>
> -------- The logs stop here; the Flink application logs are not printed
> here anymore ---------
>
>
>
> ^C
>
> [root@bastion deploy]# kubectl logs -f flink-demo-taskmanager-1-1
>
> sed: couldn't open temporary file /opt/flink/conf/sedaNDoNR: Read-only
> file system
>
> sed: couldn't open temporary file /opt/flink/conf/seddze7tQ: Read-only
> file system
>
> /docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml: Read-only
> file system
>
> sed: couldn't open temporary file /opt/flink/conf/sedYveZoT: Read-only
> file system
>
> /docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml: Read-only
> file system
>
> /docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp:
> Read-only file system
>
> Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH
> -Xmx697932173 -Xms697932173 -XX:MaxDirectMemorySize=300647712
> -XX:MaxMetaspaceSize=268435456
> org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner -D
> taskmanager.memory.framework.off-heap.size=134217728b -D
> taskmanager.memory.network.max=166429984b -D
> taskmanager.memory.network.min=166429984b -D
> taskmanager.memory.framework.heap.size=134217728b -D
> taskmanager.memory.managed.size=665719939b -D taskmanager.cpu.cores=1.0 -D
> taskmanager.memory.task.heap.size=563714445b -D
> taskmanager.memory.task.off-heap.size=0b --configDir /opt/flink/conf
> -Djobmanager.memory.jvm-overhead.min='429496736b'
> -Dpipeline.classpaths='file:usrlib/quickstart-0.1.jar'
> -Dtaskmanager.resource-id='flink-demo-taskmanager-1-1'
> -Djobmanager.memory.off-heap.size='134217728b'
> -Dexecution.target='embedded'
> -Dweb.tmpdir='/tmp/flink-web-d7691661-fac5-494e-8154-896b4fe30692'
> -Dpipeline.jars='file:/opt/flink/usrlib/quickstart-0.1.jar'
> -Djobmanager.memory.jvm-metaspace.size='268435456b'
> -Djobmanager.memory.heap.size='3462817376b'
> -Djobmanager.memory.jvm-overhead.max='429496736b'
>
> ERROR StatusLogger No Log4j 2 configuration file found. Using default
> configuration (logging only errors to the console), or user
> programmatically provided configurations. Set system property
> 'log4j2.debug' to show Log4j 2 internal initialization logging. See
> https://logging.apache.org/log4j/2.x/manual/configuration.html
> for instructions on how to configure Log4j 2
>
> WARNING: An illegal reflective access operation has occurred
>
> WARNING: Illegal reflective access by
> org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil
> (file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to method
> java.nio.DirectByteBuffer.cleaner()
>
> WARNING: Please consider reporting this to the maintainers of
> org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil
>
> WARNING: Use --illegal-access=warn to enable warnings of further illegal
> reflective access operations
>
> WARNING: All illegal access operations will be denied in a future release
>
> Apr 29, 2021 12:58:34 AM oracle.simplefan.impl.FanManager configure
>
> SEVERE: attempt to configure ONS in FanManager failed with
> oracle.ons.NoServersAvailable: Subscription time out
>
>
>
>
>
> -------- The logs stop here; the Flink application logs are not printed
> here anymore ---------
>
>
>
>
>
> Best,
>
> Fuyao
>
>
>
>
>
> *From: *Fuyao Li <fuyao...@oracle.com>
> *Date: *Friday, April 30, 2021 at 16:50
> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
> *Subject: *[External] : StopWithSavepoint() method doesn't work in Java
> based flink native k8s operator
>
> Hello Community, Yang,
>
>
>
> I am trying to extend the Flink native Kubernetes operator by adding some
> new features based on the repo [1]. I wrote a method to implement the image
> update functionality [2]. I added the call
>
> triggerImageUpdate(oldFlinkApp, flinkApp, effectiveConfig);
>
>
>
> under the existing method call:
>
> triggerSavepoint(oldFlinkApp, flinkApp, effectiveConfig);
>
>
>
>
>
> I wrote a function to accommodate the image change behavior.[2]
>
>
>
> Solution1:
>
> I want to use the stopWithSavepoint() method to complete the task. However,
> it gets stuck and never completes. Even if I call get() on the
> CompletableFuture, it always times out and throws an exception. See the
> solution 1 logs [3].
>
>
>
> Solution2:
>
> I tried triggering a savepoint, then deleting the deployment in the code,
> and then creating a new application with the new image. This seems to work
> fine. Log link: [4]
>
>
>
> My questions:
>
>    1. Why does solution 1 get stuck? The triggerSavepoint()
>    CompletableFuture works here. Why does stopWithSavepoint() always get
>    stuck or time out? I am very confused.
>    2. I am still new to the Fabric8io library. Did I do anything
>    wrong in the implementation? Maybe I should update the jobStatus? Please
>    give me some suggestions.
>    3. For the workaround in solution 2, are there any negative side effects
>    I didn’t notice?
>
>
>
>
>
> [1] https://github.com/wangyang0918/flink-native-k8s-operator
>
> [2] https://pastebin.ubuntu.com/p/tQShjmdcJt/
>
> [3] https://pastebin.ubuntu.com/p/YHSPpK4W4Z/
>
> [4] https://pastebin.ubuntu.com/p/3VG7TtXXfh/
>
>
>
> Best,
>
> Fuyao
>
