Re: Profiling on flink jobs

2023-12-01 Thread Matthias Pohl via user
I missed the Reply All button in my previous message. Here's my previous email for the sake of transparency sent to the user ML once more: Hi Oscar, sorry for the late reply. I didn't see that you posted the question at the beginning of the month already. I used jmap [1] in the past to get some s

Re: Doubts about state and table API

2023-11-29 Thread Matthias Pohl via user
Hi Oscar, could you provide the Java code to illustrate what you were doing? The difference between version A and B might be especially helpful. I assume you already looked into the FAQ about operator IDs [1]? Adding the JM and TM logs might help as well to investigate the issue, as Yu Chen mentio

Re: Java 17 as default

2023-11-29 Thread Matthias Pohl via user
The 1.18 Docker images were pushed on Oct 31. This also included Java 17 images [1]. [1] https://hub.docker.com/_/flink/tags?page=1&name=java17 On Wed, Nov 15, 2023 at 7:56 AM Tauseef Janvekar wrote: > Dear Team, > > I saw the documentation for 1.18 and Java 17 is not supported and the > image

Re: [DISCUSS][FLINK-33240] Document deprecated options as well

2023-10-30 Thread Matthias Pohl via user
Thanks for your proposal, Zhanghao Chen. I think it adds more transparency to the configuration documentation. +1 from my side on the proposal On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen wrote: > Hi Flink users and developers, > > Currently, Flink won't generate doc for the deprecated options

Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Matthias Pohl via user
Congratulations and good luck with pushing the project forward. On Mon, Mar 27, 2023 at 2:35 PM Jing Ge via user wrote: > Congrats! > > Best regards, > Jing > > On Mon, Mar 27, 2023 at 2:32 PM Leonard Xu wrote: > >> Congratulations! >> >> >> Best, >> Leonard >> >> On Mar 27, 2023, at 5:23 PM, Y

Re: Issue with the flink version 1.10.1

2023-03-27 Thread Matthias Pohl via user
Hi Kiran, it's really hard to come up with an answer based on your description. Usually, it helps to share some logs with the exact error that's appearing and a clear description on what you're observing and what you're expecting. A plain "no jobs are running" is too general to come up with a concl

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-27 Thread Matthias Pohl via user
Here are a few things I noticed from the 1.17 release retrospectively which I want to share (other release managers might have a different view or might disagree): - Google Meet might not be the best choice for the release sync. We need to be able to invite attendees even if the creator of the mee

Re: [ANNOUNCE] Apache Flink 1.17.0 released

2023-03-23 Thread Matthias Pohl via user
Thanks for making this release getting over the finish line. One additional thing: Feel free to reach out to the release managers (or respond to this thread) with feedback on the release process. Our goal is to constantly improve the release process. Feedback on what could be improved or things th

Re: Job Cancellation Failing

2023-02-21 Thread Matthias Pohl via user
I noticed a test instability that sounds quite similar to what you're experiencing. I created FLINK-31168 [1] to follow-up on this one. [1] https://issues.apache.org/jira/browse/FLINK-31168 On Mon, Feb 20, 2023 at 4:50 PM Matthias Pohl wrote: > What do you mean by "earlier it use

Re: Job Cancellation Failing

2023-02-20 Thread Matthias Pohl via user
What do you mean by "earlier it used to fail due to ExecutionGraphStore not existing in /tmp" folder? Did you get the error message "Could not create executionGraphStorage directory in /tmp." and creating this folder fixed the issue? It also looks like the stacktrace doesn't match any of the 1.15

Re: Blob server connection problem

2023-01-24 Thread Matthias Pohl via user
We had issues like that in the past (e.g. FLINK-24923 [1], FLINK-10683 [2]). The error you're observing is caused by an unexpected byte being read from the socket. The BlobServer protocol expects either 0 (for put messages) or 1 (for get messages) being retrieved as a header for new message blocks

Re: How does Flink plugin system work?

2023-01-02 Thread Matthias Pohl via user
.java#L457 > [3] > https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/plugins/ > > Matthias Pohl via user 于2023年1月2日周一 20:27写道: > >> Hi Ruibin, >> could you switch to using the currently supported way for instantiating >> reporters using t

Re: The use of zookeeper in flink

2023-01-02 Thread Matthias Pohl via user
And I screwed up the reply again. -.- Here's my previous response for the ML thread and not only spoon_lz: Hi spoon_lz, Thanks for reaching out to the community and sharing your use case. You're right about the fact that Flink's HA feature relies on the leader election. The HA backend not being re

Re: How does Flink plugin system work?

2023-01-02 Thread Matthias Pohl via user
Hi Ruibin, could you switch to using the currently supported way for instantiating reporters using the factory configuration parameter [1][2]? Based on the ClassNotFoundException, your suspicion might be right that the plugin didn't make it onto the classpath. Could you share the startup logs of t

Re: Cleanup for high-availability.storageDir

2022-12-08 Thread Matthias Pohl via user
ds, and I believe that doesn't count as Cancelled, so > the artifacts for blobs and submitted job graphs are not cleaned up. I > imagine the same logic Gyula mentioned before applies, namely keep the > latest one and clean the older ones. > > Regards, > Alexis. > > Am D

Re: How's JobManager bring up TaskManager in Application Mode or Session Mode?

2022-11-28 Thread Matthias Pohl via user
Hi Mark, the JobManager is not necessarily in charge of spinning up TaskManager instances. It depends on the resource provider configuration you choose. Flink differentiates between active and passive Resource Management (see the two available implementations of ResourceManager [1]). Active Resour

Re: [Security] - Critical OpenSSL Vulnerability

2022-11-01 Thread Matthias Pohl via user
The Docker image for Flink 1.12.7 uses an older base image which comes with openssl 1.1.1k. There was a previous post in the OpenSSL mailing list reporting a low vulnerability being fixed with 3.0.6 and 1.1.1r (both versions being explicitly mentioned) [1]. Therefore, I understand the post in a way

Re: Watermark generating mechanism in Flink SQL

2022-10-17 Thread Matthias Pohl via user
Hi Hunk, there is documentation about watermarking in FlinkSQL [1]. There is also a FlinkSQL cookbook entry about watermarking [2]. Essentially, you define the watermark strategy in your CREATE TABLE statement and specify the lateness for a given event (not the period in which watermarks are automa

Re: jobmaster's fatal error will kill the session cluster

2022-10-17 Thread Matthias Pohl via user
[?:?] > at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) > ~[flink-scala_2.12-1.15.0.jar:1.15.0] > at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) > ~[flink-scala_2.12-1.15.0.jar:1.15.0] > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:

Re: Sometimes checkpoints to s3 fail

2022-10-14 Thread Matthias Pohl via user
Hi Evgeniy, is it Ceph which you're using as a S3 server? All the Google search entries point to Ceph when looking for the error message. Could it be that there's a problem with the version of the underlying system? The stacktrace you provided looks like Flink struggles to close the File and, there

Re: jobmaster's fatal error will kill the session cluster

2022-10-14 Thread Matthias Pohl via user
Hi Jie Han, welcome to the community. Just a little side note: These kinds of questions are more suitable to be asked in the user mailing list. The dev mailing list is rather used for discussing feature development or project-related topics. See [1] for further details. About your question: The st

Re: Cancel a job in status INITIALIZING

2022-09-26 Thread Matthias Pohl via user
Can you provide the JobManager logs for this case. It sounds odd that the job was stuck in the INITIALIZING phase. Matthias On Wed, Sep 21, 2022 at 11:50 AM Christian Lorenz via user < user@flink.apache.org> wrote: > Hi, > > > > we’re running a Flink Cluster in standalone/session mode. During a

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
Yes, the JobManager will failover in HA mode and all jobs would be recovered. On Mon, Sep 26, 2022 at 2:06 PM ramkrishna vasudevan < ramvasu.fl...@gmail.com> wrote: > Thanks @Matthias Pohl . This is informative. So > generally in a session cluster if I have more than one job and

Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
with the following configs > enabled: > > SHUTDOWN_ON_APPLICATION_FINISH = false > SUBMIT_FAILED_JOB_ON_APPLICATION_ERROR = true > > I think jobmanager pod would not restart but simply go to a terminal > failed state right? > > Gyula > > On Mon, Sep 26, 2022 at 12:31

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
Sep 26, 2022 at 3:11 PM ramkrishna vasudevan < > ramvasu.fl...@gmail.com> wrote: > >> Thank you very much for the reply. I have lost the k8s cluster in this >> case before I could capture the logs. I will try to repro this and get back >> to you. >> >> Regards >&

Re: JobManager restarts on job failure

2022-09-26 Thread Matthias Pohl via user
Thanks Evgeniy for reaching out to the community and Gyula for picking it up. I haven't looked into the k8s operator in much detail, yet. So, help me out if I miss something here. But I'm afraid that this is not something that would be fixed by upgrading to 1.15. The issue here is that we're recove

Re: Jobmanager fails to come up if the job has an issue

2022-09-26 Thread Matthias Pohl via user
Hi Ramkrishna, thanks for reaching out to the Flink community. Could you share the JobManager logs to get a better understanding of what's going on? I'm wondering why the JobManager is failing when the actual problem is that the job is struggling to access a folder. It sounds like there are multipl

Re: Classloading issues with Flink Operator / Kubernetes Native

2022-09-16 Thread Matthias Pohl via user
Are you deploying the job in session or application mode? Could you provide the stacktrace. I'm wondering whether that would be helpful to pin a code location for further investigation. So far, I couldn't come up with a definite answer about placing the jar in the lib directory. Initially, I would

Re: New licensing for Akka

2022-09-09 Thread Matthias Pohl via user
Looks like there will be a bit of a grace period till Sep 2023 for vulnerability fixes in akka 2.6.x [1] [1] https://discuss.lightbend.com/t/2-6-x-maintenance-proposal/9949 On Wed, Sep 7, 2022 at 4:30 PM Robin Cassan via user wrote: > Thanks a lot for your answers, this is reassuring! > > Cheer

Re: New licensing for Akka

2022-09-07 Thread Matthias Pohl via user
There is some more discussion going on in the related PR [1]. Based on the current state of the discussion, akka 2.6.20 will be the last version under Apache 2.0 license. But, I guess, we'll have to see where this discussion is heading considering that it's kind of fresh. [1] https://github.com/ak

Re: Slow Tests in Flink 1.15

2022-09-06 Thread Matthias Pohl via user
Hi David, I guess, you're referring to [1]. But as Chesnay already pointed out in the previous thread: It would be helpful to get more insights into what exactly your tests are executing (logs, code, ...). That would help identifying the cause. > Can you give us a more complete stacktrace so we can

Re: flink ci build run longer than the maximum time of 310 minutes.

2022-09-05 Thread Matthias Pohl via user
1 > commits > <https://github.com/SwimSweet/flink/compare/release-1.15...apache:flink:release-1.15> > behind > <https://github.com/SwimSweet/flink/compare/release-1.15...apache:flink:release-1.15> > apache:release-1.15 > also appear in my pr change files. How can I

Re: flink ci build run longer than the maximum time of 310 minutes.

2022-09-02 Thread Matthias Pohl via user
Not sure whether that applies to your case, but there was a recent issue [1] where the e2e_1_ci job ran into a timeout. If that's what you were observing, rebasing your branch might help. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-29161 On Fri, Sep 2, 2022 at 10:51 AM Martijn

Re: Failing to maven compile install Flink 1.15

2022-08-22 Thread Matthias Pohl via user
Hi hjw, it would be interesting to know the exact Maven commands you used for the successful run (where you compiled the flink-client module individually) and the failed run (where you tried to build everything at once) and probably a more complete version of the Maven output. The path D:\learn\Co

Re: Jobmanager trying to be registered for Zombie Job

2022-04-25 Thread Matthias Pohl
& thanks a lot for your help too! > > It's not quite clear to me, the bug was already there since 1.13.6 but not > reported yet (FLINK-27354 is a new ticket)? > > Best, Peter > > > On Mon, Apr 25, 2022 at 5:48 PM Matthias Pohl > wrote: > >> Thanks again

Re: Jobmanager trying to be registered for Zombie Job

2022-04-25 Thread Matthias Pohl
find more details on the investigation in FLINK-27354 [1] itself. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-27354 On Mon, Apr 25, 2022 at 2:00 PM Matthias Pohl wrote: > Thanks Peter, we're looking into it... > > On Mon, Apr 25, 2022 at 11:54 AM Peter S

Re: Jobmanager trying to be registered for Zombie Job

2022-04-25 Thread Matthias Pohl
It can be seen on jm 1 that > the job starts crashing and recovering a few times. This happens > until 2022-04-20 12:12:14,607. After that the above described behavior can > be seen. > > I hope this helps. > > Best, Peter > > On Fri, Apr 22, 2022 at 12:06 PM Matthias Poh

Re: Jobmanager trying to be registered for Zombie Job

2022-04-22 Thread Matthias Pohl
e.org/jira/browse/FLINK-27354 On Fri, Apr 22, 2022 at 11:54 AM Matthias Pohl wrote: > Just by looking through the code, it appears that these logs could be > produced while stopping the job. The ResourceManager sends a confirmation > of the JobMaster being disconnected at the end back t

Re: Jobmanager trying to be registered for Zombie Job

2022-04-22 Thread Matthias Pohl
s) is that stopping the JobMaster didn't finish for some reason. For that it would be helpful to look at the logs to see whether there is some other issue that causes the JobMaster to stop entirely. On Fri, Apr 22, 2022 at 10:14 AM Matthias Pohl wrote: > ...if possible it would be good t

Re: Jobmanager trying to be registered for Zombie Job

2022-04-22 Thread Matthias Pohl
...if possible it would be good to get debug rather than only info logs. Did you encounter anything odd in the TaskManager logs as well. Sharing those might be of value as well. On Fri, Apr 22, 2022 at 8:57 AM Matthias Pohl wrote: > Hi Peter, > thanks for sharing. That doesn't sound

Re: Jobmanager trying to be registered for Zombie Job

2022-04-21 Thread Matthias Pohl
Hi Peter, thanks for sharing. That doesn't sound right. May you provide the entire jobmanager logs? Best, Matthias On Thu, Apr 21, 2022 at 6:08 PM Peter Schrott wrote: > Hi Flink-Users, > > I am not sure if this does something to my cluster or not. But since > updating to Flink 1.15 (atm rc4) I

Re: Adjusted frame length exceeds 2147483647

2022-03-17 Thread Matthias Pohl
gt; This issue did not repeat, so it may be a network issue > > On Thu, Mar 17, 2022 at 6:12 PM Matthias Pohl wrote: > >> Hi Ori, >> that looks odd. The message seems to exceed the maximum size >> of 2147483647 bytes (2GB). I couldn't find anything similar in the ML or

Re: how to set kafka sink ssl properties

2022-03-17 Thread Matthias Pohl
Could you share more details on what's not working? Is the ssl.trustore.location accessible from the Flink nodes? Matthias On Thu, Mar 17, 2022 at 4:00 PM HG wrote: > Hi all, > I am probably not the smartest but I cannot find how to set ssl-properties > for a Kafka Sink. > My assumption was tha

Re: Adjusted frame length exceeds 2147483647

2022-03-17 Thread Matthias Pohl
Hi Ori, that looks odd. The message seems to exceed the maximum size of 2147483647 bytes (2GB). I couldn't find anything similar in the ML or in Jira that supports a bug in Flink. Could it be that there was some network issue? Matthias On Tue, Mar 15, 2022 at 6:52 AM Ori Popowski wrote: > I am

Re: Flink failure rate restart not work as expect

2022-03-01 Thread Matthias Pohl
put the detail logs to this thread when it happen again, > since it happen sometime, like between two weeks, if one of our cluster > machine go down. > -- > *发件人:* Matthias Pohl > *发送时间:* 2022年3月1日 17:57 > *收件人:* Alexander Preuß > *抄送:* 刘 家锹 ; user@f

Re: Flink failure rate restart not work as expect

2022-03-01 Thread Matthias Pohl
Hi, I second Alex' observation - based on the logs it looks like the task restart functionality worked as expected: It tried to restart the tasks until it reached the limit of 4 attempts due to the missing TaskManager. The job-cluster shut down with an error code. At this point, YARN should pick it

Re: [DISCUSS] Future of Per-Job Mode

2022-01-24 Thread Matthias Pohl
Hi all, I agree with Xintong's comment: Reducing the number of deployment modes would help users. There is a clearer distinction between session mode and the two other deployment modes (i.e. application and job mode). The difference between application and job mode is not that easy to grasp for new

Re: Flink Kinesis Producer con't connect with AWS credentials

2022-01-07 Thread Matthias Pohl
cessWindowFunctionWrapper) -> Sink: Unnamed (1/1)#0] WARN > o.a.f.k.s.c.a.s.k.producer.KinesisProducerConfiguration - Property > aws.credentials.provider.basic.accesskeyid ignored as there is no > corresponding set method in KinesisProducerConfiguration > > On Mon, Jan 3, 2022 at 5:34 PM Mat

Re: Flink connection to remote server

2022-01-03 Thread Matthias Pohl
Hi Mariam, a quick mailing list query and Jira query didn't reveal any pointers for Flink with Milvus, unfortunately. But have you had a look at Flink's AsyncIO API [1]? I haven't worked with it, yet. But it sounds like something that might help you accessing an external system. Matthias [1] http

Re: Flink Kinesis Producer con't connect with AWS credentials

2022-01-03 Thread Matthias Pohl
Hi Daniel, I'm assuming you already looked into the Flink documentation for this topic [1]? I'm gonna add Fabian to this thread. Maybe, he's able to help out here. Matthias [1] https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/connectors/kinesis.html#kinesis-producer On Fri, Dec 31,

Re: JsonRowSerializationSchema unable to parse TIMESTAMP_LTZ fields

2022-01-03 Thread Matthias Pohl
For documentation purposes: Surendra started a discussion in FLINK-25411 [1]. [1] https://issues.apache.org/jira/browse/FLINK-25411 On Wed, Dec 22, 2021 at 9:51 AM Surendra Lalwani wrote: > > Hi Team, > > JsonRowSerializationSchema is unable to parse fields with type > TIMESTAMP_LTZ, seems like

Re: log4j2 upgrade requirement

2022-01-03 Thread Matthias Pohl
Hi Puneet, Flink logs things like the job name which can be specified by the user. Hence, a user could (as far as I understand) add a job name containing malicious content. This is where the Flink cluster's log4j version comes into play. Therefore, it's not enough to provide only an updated log4j d

Re: Scala Case Class Serialization

2021-12-07 Thread Matthias Pohl
Hi Lars, not sure about the out-of-the-box support for case classes with primitive member types (could you refer to the section which made you conclude this?). I haven't used Scala with Flink, yet. So maybe, others can give more context. But have you looked into using the TypeInfoFactory to define

Re: Query regarding exceptions API(/jobs/:jobid/exceptions)

2021-11-30 Thread Matthias Pohl
Agarwal wrote: > Hi Matthias, > > We have created a JIRA ticket for this issue. Please find the jira id below > > https://issues.apache.org/jira/browse/FLINK-25096 > > Thanks > Mahima > > On Mon, Nov 29, 2021 at 2:24 PM Matthias Pohl > wrote: > >> Thanks M

Re: Query regarding exceptions API(/jobs/:jobid/exceptions)

2021-11-29 Thread Matthias Pohl
87f6492bcba074' on > file system 'file'. > > Caused by: java.lang.IllegalStateException: Failed to rollback to > checkpoint/savepoint > file:/mnt/c/Users/abc/Documents/checkpoints/a737088e21206281db87f6492bcba074/chk-144. > Thanks and Regards > Mahima Agarwal >

Re: Query regarding exceptions API(/jobs/:jobid/exceptions)

2021-11-25 Thread Matthias Pohl
Just to add a bit of context: The first-level members all-exceptions, root-exceptions, truncated and timestamp have been around for a longer time. The exceptionHistory was added in Flink 1.13. As part of this change, the aforementioned members were deprecated (see [1]). We kept them for backwards-c

Re: Recommended metaspace memory config for 16GB hosts.

2021-11-23 Thread Matthias Pohl
88 GB > Non-Heap 416 MB 404 MB 2.23 GB > Total 5.28 GB 2.55 GB 7.10 GB > Outside JVM > Type > Count > Used > Capacity > Direct 32,836 1.01 GB 1.01 GB > Mapped 0 0 B 0 B > > > > On Tue, 23 Nov 2021 at 02:23, Matthias Pohl > wrote: > >> In general, runnin

Re: Recommended metaspace memory config for 16GB hosts.

2021-11-22 Thread Matthias Pohl
TL > > Kafka -> JSon Validation (Jackson) -> filter -> JDBC to database. > > On Mon, 22 Nov 2021 at 10:24, Matthias Pohl > wrote: > >> Hi John, >> have you had a look at the memory model for Flink 1.10? [1] Based on the >> documentation, you could try i

Re: Flink on Native Kubernetes S3 checkpointing error

2021-11-22 Thread Matthias Pohl
heckpointing is working fine. > So, in short, on AWS use EKS v1.20+ for IAM Pod Identity Webhook. > > Thanks, > Hemant > > On Mon, Nov 22, 2021 at 7:26 PM Matthias Pohl > wrote: > >> Hi bat man, >> this feature seems to be tied to a certain AWS SDK version [1]

Re: Flink CLI - pass command line arguments to a pyflink job

2021-11-22 Thread Matthias Pohl
Hi Kamil, afaik, the parameter passing should work as normal by just appending them to the Flink job submission similar to the Java job submission: ``` $ ./flink run --help Action "run" compiles and runs a program. Syntax: run [OPTIONS] [...] ``` Matthias On Mon, Nov 22, 2021 at 3:58 PM Kamil

Re: Recommended metaspace memory config for 16GB hosts.

2021-11-22 Thread Matthias Pohl
Hi John, have you had a look at the memory model for Flink 1.10? [1] Based on the documentation, you could try increasing the Metaspace size independently of the Flink memory usage (i.e. flink.size). The heap Size is a part of the overall Flink memory. I hope that helps. Best, Matthias [1] https:

Re: Table API Filesystem connector - disable interval rolling policy

2021-11-22 Thread Matthias Pohl
Hi Kamil, by looking at the code I'd say that the only option you have is to increase the parameter you already mentioned to a very high number. But I'm not sure about the side effects. I'm gonna add Francesco to this thread. Maybe he has better ideas on how to answer your question. Best, Matthias

Re: Flink on Native Kubernetes S3 checkpointing error

2021-11-22 Thread Matthias Pohl
Hi bat man, this feature seems to be tied to a certain AWS SDK version [1] which you already considered. But I checked the version used in Flink 1.13.1 for the s3 filesystem. It seems like the version that's used (1.11.788) is good enough to provide this feature (which was added in 1.11.704): ``` $

Re: Kubernetes HA: New jobs stuck in Initializing for a long time after a certain number of existing jobs are running

2021-11-22 Thread Matthias Pohl
Hi Joey, that looks like a cluster configuration issue. The 192.168.100.79:6123 is not accessible from the JobManager pod (see line 1224f in the provided JM logs): 2021-11-19 04:06:45,049 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed w

Re: FlinkJobNotFoundException

2021-10-14 Thread Matthias Pohl
2021 at 3:12 PM Gusick, Doug S wrote: > Hi Matthias, > > > > Do you have any update here? > > > > Thank you, > > Doug > > > > *From:* Gusick, Doug S [Engineering] > *Sent:* Thursday, October 7, 2021 9:03 AM > *To:* Hailu, Andreas [Engine

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-30 Thread Matthias Pohl
ts=4 ; > > where $PORT1 refers to my second host open port, mapped to 6123 on the > Docker container (first port is mapped to 8081). > I can see in the log that $HOST and $PORT1 resolve to the correct values, > 10.0.20.25 > and 31608 > > On Wed, Sep 29, 2021 at 9:41 AM Matthias

Re: Start Flink cluster, k8s pod behavior

2021-09-30 Thread Matthias Pohl
Hi Qihua, I guess, looking into kubectl describe and the JobManager logs would help in understanding what's going on. Best, Matthias On Wed, Sep 29, 2021 at 8:37 PM Qihua Yang wrote: > Hi, > I deployed flink in session mode. I didn't run any jobs. I saw below logs. > That is normal, same as Fli

Re: FlinkJobNotFoundException

2021-09-29 Thread Matthias Pohl
ger logs there. Please let me know if you need any more information. > > > > Thank you, > > Doug > > > > *From:* Matthias Pohl > *Sent:* Wednesday, September 29, 2021 12:00 PM > *To:* Gusick, Doug S [Engineering] > *Cc:* user@flink.apach

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-29 Thread Matthias Pohl
...and if possible, it would be helpful to provide debug logs as well. On Wed, Sep 29, 2021 at 6:33 PM Matthias Pohl wrote: > May you provide the entire JobManager logs so that we can see what's going > on? > > On Wed, Sep 29, 2021 at 12:42 PM Javier Vegas wrote: > >&

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-29 Thread Matthias Pohl
4) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Unknown Source) > > On Wed, Sep 29, 2021 at 2:36 AM Matthias Pohl > wrote: > >> The port has its separate configurat

Re: FlinkJobNotFoundException

2021-09-29 Thread Matthias Pohl
Hi Doug, thanks for reaching out to the community. First of all, 1.9.2 is quite an old Flink version. You might want to consider upgrading to a newer version. The community only offers support for the two most-recent Flink versions. Newer version might include fixes for your issue. But back to you

Re: flink rest endpoint creation failure

2021-09-29 Thread Matthias Pohl
Hi Curt, could you elaborate a bit more on your setup? Maybe, provide commands you used to deploy the jobs and the Flink/YARN logs. What's puzzling me is your statement about "two JobManagers spinning up" and "everything's working fine if two TaskManagers are running on different instances". - When

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-29 Thread Matthias Pohl
on.doAs(UserGroupInformation.java:1762) > at > org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at > org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:186) > ... 2 common frames omitte

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-29 Thread Matthias Pohl
onnection problem, or is this a red > herring and the problem is a different one? > > > Thanks, > > > Javier Vegas > > > > > > > > > On Tue, Sep 28, 2021 at 10:23 AM Javier Vegas wrote: > >> Thanks, Matthias! >> >> There are lots of apps de

Re: Unable to connect to Mesos on mesos-appmaster.sh start

2021-09-28 Thread Matthias Pohl
Hi Javier, I don't see anything that's configured in the wrong way based on the jobmanager logs you've provided. Have you been able to deploy other applications to this Mesos cluster? Do the Mesos master logs reveal anything? The variable resolution on the TaskManager side is a valid concern shared

Re: hdfs lease issues on flink retry

2021-09-20 Thread Matthias Pohl
+ *"s"*, *" "*).replace(*" "*, *"0"*) > + Integer.*toString*(taskNumber + 1) > + *"_0"*); > > > > > > So basically I am creating a directory named *attempt__0123_r_0001_0 *instead > of *attempt___r_0001_0*.

Re: Fast serialization for Kotlin data classes

2021-09-16 Thread Matthias Pohl
LINK-16686 > > > > Regards, > > Alexis. > > > > *From:* Matthias Pohl > *Sent:* Donnerstag, 16. September 2021 13:12 > *To:* Alex Cruise > *Cc:* Flink ML > *Subject:* Re: Fast serialization for Kotlin data classes > > > > Hi Alex, >

Re: Fast serialization for Kotlin data classes

2021-09-16 Thread Matthias Pohl
Hi Alex, have you had a look at TypeInfoFactory? That might be the best way to come up with a custom serialization mechanism. See the docs [1] for further details. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/serialization/types_ser

Re: [ANNOUNCE] Flink mailing lists archive service has migrated to Apache Archive service

2021-09-15 Thread Matthias Pohl
Thanks Leonard for the announcement. I guess that is helpful. @Robert is there any way we can change the default setting to something else (e.g. greater than 0 days)? Only having the last month available as a default is kind of annoying considering that the time setting is quite hidden. Matthias

Re: KafkaSource builder and checkpointing with parallelism > kafka partitions

2021-09-15 Thread Matthias Pohl
Hi Lars, I guess you are looking for execution.checkpointing.checkpoints-after-tasks-finish.enabled [1]. This configuration parameter is going to be introduced in the upcoming Flink 1.14 release. Best, Matthias [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#execu

Re: hdfs lease issues on flink retry

2021-09-07 Thread Matthias Pohl
Just for documentation purposes: I created FLINK-24147 [1] to cover this issue. [1] https://issues.apache.org/jira/browse/FLINK-24147 On Thu, Aug 26, 2021 at 6:14 PM Matthias Pohl wrote: > I see - I should have checked my mailbox before answering. I received the > email and was able to

Re: FLINK-14316 happens on version 1.13.2

2021-09-07 Thread Matthias Pohl
Hi Xiangyu, thanks for reaching out to the community. Could you share the entire TaskManager and JobManager logs with us? That might help investigating what's going on. Best, Matthias On Fri, Sep 3, 2021 at 10:07 AM Xiangyu Su wrote: > Hi Yun, > Thanks alot. > I am running a test, and facing th

Re: Savepoint failure along with JobManager crash

2021-08-31 Thread Matthias Pohl
Hi Prasanna, thanks for reaching out to the community. What you're experiencing is that the savepoint was created but the job itself ended up in an inconsistent state with Executions being cancelled instead of being finished. This should have triggered a global failover resulting in a job restart.

Re: Table API demo problem

2021-08-31 Thread Matthias Pohl
I missed the point that it's the purpose of the walkthrough to have the functionality being implemented by the user. So, FLINK-24076 is actually not valid. I initially thought of it as some kind of demo implementation. Sorry for the confusion. On Tue, Aug 31, 2021 at 11:15 AM Matthias Pohl

Re: Table API demo problem

2021-08-31 Thread Matthias Pohl
Hi Manraj, the error messages about libjemalloc.so are caused by Flink 1.13.1 that has been published with the wrong architecture accidentally. I created FLINK-24075 [1] to cover this issue. As a workaround, you could upgrade the base image to Flink 1.13.2 until the Flink 1.13.1 images are republis

Re: Queries regarding Flink upgrade strategies

2021-08-27 Thread Matthias Pohl
when during > upgrade Jobmanagers & Taskmanger pods are on different versions. Also the > impact of recreate strategy in the same context. > > Regards, > Amit > > On Fri, Aug 27, 2021 at 5:32 PM Matthias Pohl > wrote: > >> The upgrade approach mentioned in my previo

Re: Queries regarding Flink upgrade strategies

2021-08-27 Thread Matthias Pohl
t; (physical/virtual) deployment. > I want to understand the upgrade strategies on kubernetes deployments > where Flink is running in pods. If you could help in that area it would be > great. > > Regards, > Amit Bhatia > > On Thu, Aug 26, 2021 at 5:25 PM Matthias Pohl >

Re: hdfs lease issues on flink retry

2021-08-26 Thread Matthias Pohl
I see - I should have checked my mailbox before answering. I received the email and was able to login. On Thu, Aug 26, 2021 at 6:12 PM Matthias Pohl wrote: > The link doesn't work, i.e. I'm redirected to a login page. It would be > also good to include the Flink logs and make

Re: hdfs lease issues on flink retry

2021-08-26 Thread Matthias Pohl
r you. Please let us know if > you’re not able to see the files. > > > > > > *From:* Matthias Pohl > *Sent:* Thursday, August 26, 2021 9:47 AM > *To:* Shah, Siddharth [Engineering] > *Cc:* user@flink.apache.org; Hailu, Andreas [Engineering] < > andreas.ha...@ny

Re: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)

2021-08-26 Thread Matthias Pohl
Hi Jonas, have you included the s3 credentials in the Flink config file like it's described in [1]? I'm not sure about this hive.s3.use-instance-credentials being a valid configuration parameter. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems

Re: hdfs lease issues on flink retry

2021-08-26 Thread Matthias Pohl
Hi Siddharth, thanks for reaching out to the community. This might be a bug. Could you share your Flink and YARN logs? This way we could get a better understanding of what's going on. Best, Matthias On Tue, Aug 24, 2021 at 10:19 PM Shah, Siddharth [Engineering] < siddharth.x.s...@gs.com> wrote:

Re: Disabling autogenerated uid/hash doesn't work when using file source

2021-08-26 Thread Matthias Pohl
Hi Vishal, you're right: the FileSource itself doesn't provide these methods. But you could get them through the DataStreamSource (which implements SingleOutputStreamOperator and provides these two methods [1,2]). It is returned by StreamExecutionEnvironment.fromSource [3]. fromSource would need th

Re: Flink Avro Timestamp Precision Issue

2021-08-26 Thread Matthias Pohl
Hi Akshay, thanks for reaching out to the community. There was a similar question on the mailing list earlier this month [1]. Unfortunately, it just doesn't seem to be supported, yet. The feature request was already created with FLINK-23589 [2]. Best, Matthias [1] https://lists.apache.org/thread.

Re: Kinesis Producer not working with Flink 1.11.2

2021-08-26 Thread Matthias Pohl
Hi Sanket, have you considered reaching out to the Kinesis community? I might be wrong but it looks like a Kinesis issue. Best, Matthias On Tue, Aug 24, 2021 at 7:13 PM Sanket Agrawal wrote: > Hi, > > > > We are trying to use Kinesis along with Flink(1.11.2) and JDK 11 on EMR > cluster(6.2). Wh

Re: checkpoints/.../shared cleanup

2021-08-26 Thread Matthias Pohl
Hi Alexey, thanks for reaching out to the community. I have a question: What do you mean by "the shared subfolder still grows"? As far as I understand, the shared folder contains the state of incremental checkpoints. If you cancel the corresponding job and start a new job from one of the retained i

Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-08-26 Thread Matthias Pohl
Hi Denis, I did a bit of digging: It looks like there is no way to specify them independently. You can find documentation about pod templates for TaskManager and JobManager [1]. But even there it states that for cpu and memory, the resource specs are overwritten by the Flink configuration. The code

Re: Queries regarding Flink upgrade strategies

2021-08-26 Thread Matthias Pohl
Hi Amit, upgrading Flink versions means that you should stop your jobs with a savepoint first. A new cluster with the new Flink version can be deployed next. Then, this cluster can be used to start the jobs from the previously created savepoints. Each job should pick up the work from where it stopp

Re: Bulk Scheduler timeout when creating several jobs in flink kubernetes HA deployment

2021-08-26 Thread Matthias Pohl
Hi Gil, could you provide the complete logs (TaskManager & JobManager) for us to investigate it? The error itself and the behavior you're describing sounds like expected behavior if there are not enough slots available for all the submitted jobs to be handled in time. Have you tried increasing the

Re: 1.13 Flamegraphs

2021-08-13 Thread Matthias Pohl
Hi Mason, I'm adding Alex to the thread as he might be able to help answer this question in the most precise way next week. Best, Matthias On Fri, Aug 6, 2021 at 7:43 PM Mason Chen wrote: > Hi all, > > Does the sample processing also sample threads that do not belong to the > Flink framework? F

Re: ProcessFunctionTestHarnesses for testing Python functions

2021-08-13 Thread Matthias Pohl
Hi Bogdan, it does not look like it is by just doing a brief check of the code. But maybe Dian can give a more detailed answer here. I'm gonna add him to this thread. Best, Matthias On Wed, Jun 9, 2021 at 3:47 PM Bogdan Sulima wrote: > Hi all, > > in Java/Scala i was using ProcessFunctionTestHa

  1   2   3   >