Hello,

I noticed that first triggering a savepoint and then deleting the deployment 
might cause duplicate data, which would hurt semantic correctness. Please give 
me some hints on how to make stopWithSavepoint() work correctly with the 
Fabric8io Java k8s client to perform this image update operation. Thanks!

Best,
Fuyao



From: Fuyao Li <fuyao...@oracle.com>
Date: Friday, April 30, 2021 at 18:03
To: user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
Subject: [External] : Re: StopWithSavepoint() method doesn't work in Java based 
flink native k8s operator
Hello Community, Yang,

I have one more question about logging. I noticed that when I run kubectl logs 
against the JM, the pods provisioned by the operator do not print the internal 
Flink logs. I only get output like the logs below; no actual Flink logs are 
printed. Where can I find the path to the logs? Should I use a sidecar 
container to get them out? How can I get the logs without checking the Flink 
WebUI? The sed errors also confuse me, because the application is already up 
and running correctly when I access the WebUI through Ingress.

Reference: 
https://github.com/wangyang0918/flink-native-k8s-operator/issues/4


[root@bastion deploy]# kubectl logs -f flink-demo-594946fd7b-822xk

sed: couldn't open temporary file /opt/flink/conf/sedh1M3oO: Read-only file 
system
sed: couldn't open temporary file /opt/flink/conf/sed8TqlNR: Read-only file 
system
/docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml: Read-only file 
system
sed: couldn't open temporary file /opt/flink/conf/sedvO2DFU: Read-only file 
system
/docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml: Read-only file 
system
/docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH -Xmx3462817376 
-Xms3462817376 -XX:MaxMetaspaceSize=268435456 
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint 
-D jobmanager.memory.off-heap.size=134217728b -D 
jobmanager.memory.jvm-overhead.min=429496736b -D 
jobmanager.memory.jvm-metaspace.size=268435456b -D 
jobmanager.memory.heap.size=3462817376b -D 
jobmanager.memory.jvm-overhead.max=429496736b
ERROR StatusLogger No Log4j 2 configuration file found. Using default 
configuration (logging only errors to the console), or user programmatically 
provided configurations. Set system property 'log4j2.debug' to show Log4j 2 
internal initialization logging. See 
https://logging.apache.org/log4j/2.x/manual/configuration.html
 for instructions on how to configure Log4j 2
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner 
(file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to field 
java.util.Properties.serialVersionUID
WARNING: Please consider reporting this to the maintainers of 
org.apache.flink.api.java.ClosureCleaner
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release


-------- The logs stop here; the Flink application logs are not printed 
after this point ---------

^C
[root@bastion deploy]# kubectl logs -f flink-demo-taskmanager-1-1
sed: couldn't open temporary file /opt/flink/conf/sedaNDoNR: Read-only file 
system
sed: couldn't open temporary file /opt/flink/conf/seddze7tQ: Read-only file 
system
/docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml: Read-only file 
system
sed: couldn't open temporary file /opt/flink/conf/sedYveZoT: Read-only file 
system
/docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml: Read-only file 
system
/docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp: Read-only 
file system
Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH -Xmx697932173 
-Xms697932173 -XX:MaxDirectMemorySize=300647712 -XX:MaxMetaspaceSize=268435456 
org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner -D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.network.max=166429984b -D 
taskmanager.memory.network.min=166429984b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=665719939b -D taskmanager.cpu.cores=1.0 -D 
taskmanager.memory.task.heap.size=563714445b -D 
taskmanager.memory.task.off-heap.size=0b --configDir /opt/flink/conf 
-Djobmanager.memory.jvm-overhead.min='429496736b' 
-Dpipeline.classpaths='file:usrlib/quickstart-0.1.jar' 
-Dtaskmanager.resource-id='flink-demo-taskmanager-1-1' 
-Djobmanager.memory.off-heap.size='134217728b' -Dexecution.target='embedded' 
-Dweb.tmpdir='/tmp/flink-web-d7691661-fac5-494e-8154-896b4fe30692' 
-Dpipeline.jars='file:/opt/flink/usrlib/quickstart-0.1.jar' 
-Djobmanager.memory.jvm-metaspace.size='268435456b' 
-Djobmanager.memory.heap.size='3462817376b' 
-Djobmanager.memory.jvm-overhead.max='429496736b'
ERROR StatusLogger No Log4j 2 configuration file found. Using default 
configuration (logging only errors to the console), or user programmatically 
provided configurations. Set system property 'log4j2.debug' to show Log4j 2 
internal initialization logging. See 
https://logging.apache.org/log4j/2.x/manual/configuration.html
 for instructions on how to configure Log4j 2
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by 
org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil 
(file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to method 
java.nio.DirectByteBuffer.cleaner()
WARNING: Please consider reporting this to the maintainers of 
org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
Apr 29, 2021 12:58:34 AM oracle.simplefan.impl.FanManager configure
SEVERE: attempt to configure ONS in FanManager failed with 
oracle.ons.NoServersAvailable: Subscription time out


-------- The logs stop here; the Flink application logs are not printed 
after this point ---------


Best,
Fuyao


From: Fuyao Li <fuyao...@oracle.com>
Date: Friday, April 30, 2021 at 16:50
To: user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
Subject: [External] : StopWithSavepoint() method doesn't work in Java based 
flink native k8s operator
Hello Community, Yang,

I am trying to extend the Flink native Kubernetes operator by adding some new 
features based on the repo [1]. I wrote a method to provide the image update 
functionality [2]. I added the call

triggerImageUpdate(oldFlinkApp, flinkApp, effectiveConfig);

directly below the existing call

triggerSavepoint(oldFlinkApp, flinkApp, effectiveConfig);


The function that handles the image change behavior is in [2].

Solution 1:
I want to use the stopWithSavepoint() method to complete the task. However, I 
found that it gets stuck and never completes. Even if I call get() on the 
CompletableFuture, it always times out and throws an exception. See the 
solution 1 logs [3].
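
To make the symptom concrete, here is a minimal sketch of the blocking pattern 
described above. It is not the operator code from [2]. The class name, the 
Result type, and awaitSavepoint() are hypothetical, and the Supplier only 
stands in for the real stopWithSavepoint() call, which is assumed here to 
return a CompletableFuture<String> holding the savepoint path. The sketch 
bounds the wait with get(timeout, unit) so that a future that never completes 
surfaces as a timeout instead of hanging the operator thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class StopWithSavepointSketch {

    /** Outcome of a bounded wait: a savepoint path on success, an error text otherwise. */
    public static final class Result {
        public final String savepointPath; // null when the call did not succeed
        public final String error;         // null when the call succeeded

        Result(String savepointPath, String error) {
            this.savepointPath = savepointPath;
            this.error = error;
        }

        public boolean succeeded() {
            return savepointPath != null;
        }
    }

    /**
     * Invokes the stop call (a stand-in for stopWithSavepoint()) and waits
     * at most the given timeout for the returned future to complete.
     */
    public static Result awaitSavepoint(
            Supplier<CompletableFuture<String>> stopCall, long timeout, TimeUnit unit) {
        CompletableFuture<String> future = stopCall.get();
        try {
            return new Result(future.get(timeout, unit), null);
        } catch (TimeoutException e) {
            // The future never completed in time: this is the "stuck" case.
            future.cancel(true);
            return new Result(null, "timed out after " + timeout + " " + unit);
        } catch (ExecutionException e) {
            // The stop call itself failed; report the underlying cause.
            return new Result(null, "failed: " + e.getCause());
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
            return new Result(null, "interrupted");
        }
    }

    public static void main(String[] args) {
        // An already completed future stands in for a cluster that responds.
        Result ok = awaitSavepoint(
                () -> CompletableFuture.completedFuture("hypothetical-savepoint-path"),
                1, TimeUnit.SECONDS);
        System.out.println(ok.succeeded() + " " + ok.savepointPath);

        // A future that nobody completes reproduces the hang described above.
        Result stuck = awaitSavepoint(CompletableFuture::new, 200, TimeUnit.MILLISECONDS);
        System.out.println(stuck.succeeded() + " " + stuck.error);
    }
}
```

With this shape the operator can at least tell a timeout apart from a failed 
stop call and decide whether to retry or fall back to another approach.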

Solution 2:
I tried to trigger a savepoint, then delete the deployment in the code, and 
then create a new application with the new image. This seems to work fine. 
Log link: [4]

My questions:

  1.  Why does solution 1 get stuck? The triggerSavepoint() CompletableFuture 
works here, so why does stopWithSavepoint() always get stuck or time out? I am 
very confused.
  2.  I am still new to the Fabric8io library. Did I do anything wrong in the 
implementation? Maybe I should update the jobStatus? Please give me some 
suggestions.
  3.  For the workaround in solution 2, is there any negative side effect that 
I didn’t notice?


[1] https://github.com/wangyang0918/flink-native-k8s-operator
[2] https://pastebin.ubuntu.com/p/tQShjmdcJt/
[3] https://pastebin.ubuntu.com/p/YHSPpK4W4Z/
[4] https://pastebin.ubuntu.com/p/3VG7TtXXfh/

Best,
Fuyao
