Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Jingsong Li
Hi Yingjie, Thanks for your explanation. I have no more questions. +1 On Tue, Dec 14, 2021 at 3:31 PM Yingjie Cao wrote: > > Hi Jingsong, > > Thanks for your feedback. > > >>> My question is, what is the maximum parallelism a job can have with the > >>> default configuration? (Does this break o

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao
Hi Jingsong, Thanks for your feedback. >>> My question is, what is the maximum parallelism a job can have with the default configuration? (Does this break out of the box) Yes, you are right, these two options are related to network memory and framework off-heap memory. Generally, these changes w

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Jingsong Li
Hi Yingjie, +1 for this FLIP. I'm pretty sure this will greatly improve the ease of batch jobs. Looks like "taskmanager.memory.framework.off-heap.batch-shuffle.size" and "taskmanager.network.sort-shuffle.min-buffers" are related to network memory and framework.off-heap.size. My question is, wha

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao
Hi Jiangang, Thanks for your suggestion. >>> The config can affect the memory usage. Will the related memory configs be changed? I think we will not change the default network memory settings. My best expectation is that the default value can work for most cases (though may not the best) and for

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-13 Thread Yingjie Cao
Hi Yun, Thanks for your feedback. I think setting taskmanager.network.sort-shuffle.min-parallelism to 1 and using sort-shuffle for all cases by default is a good suggestion. I am not choosing this value mainly because two reasons: 1. The first one is that it increases the usage of network memory

Re: using flink retract stream and rockdb, too many intermediate result of values cause checkpoint too heavy to finish

2021-12-13 Thread Caizhi Weng
Hi! Changes of input tables will cause corresponding changes in output table Which sink are you using? If it is an upsert sink then Flink SQL planner will filter out UPDATE_BEFORE messages automatically. Also if your sink supports something like "ignore delete messages" it can also filter out de

Re: Sending an Alert to Slack, AWS sns, mattermost

2021-12-13 Thread Robert Cullen
Yes, That's the correct use case. Will this work with the DataStream API? UDFs are for the Table API, correct? Is there a custom sink that can be applied? Such as this Fraud Detection example [1]. But in this use case instead of sending the alert to a log it sends the message to a webhook? [1]

Re: Confusion about rebalance bytes sent metric in Flink UI

2021-12-13 Thread Caizhi Weng
Hi! This doesn't seem to be the expected behavior. Rebalance shuffle should send records to one of the parallelism, not all. If possible could you please explain what your Flink job is doing and preferably share your user code so that others can look into this case? tao xiao 于2021年12月11日周六 01:1

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-13 Thread narasimha
Thanks TImo, that was helpful. On Mon, Dec 13, 2021 at 7:19 PM Prasanna kumar < prasannakumarram...@gmail.com> wrote: > Chesnay Thank you for the clarification. > > On Mon, Dec 13, 2021 at 6:55 PM Chesnay Schepler > wrote: > >> The flink-shaded-zookeeper jars do not contain log4j. >> >> On 13/12

Re: Sending an Alert to Slack, AWS sns, mattermost

2021-12-13 Thread Caizhi Weng
Hi! Could you please elaborate more on your use case? Do you want to check the records in a data stream and if some condition is met then send an alert? If that is the case, you can use a UDF for checking and sending alerts. See [1] for detailed explanation about UDF. [1] https://nightlies.apach

Re: CoGroupedStreams and disableAutoGeneratedUIDs

2021-12-13 Thread Dan Hill
Thanks! Filed. https://issues.apache.org/jira/browse/FLINK-25285 Thias, I'll try your recommendation. On Sun, Dec 12, 2021 at 11:38 PM Timo Walther wrote: > Hi Dan, > > if there is no way of setting a uid(), then it sounds like a bug in the > API that should be fixed. Feel free to open an is

Sending an Alert to Slack, AWS sns, mattermost

2021-12-13 Thread Robert Cullen
Hello, I'm looking for some guidance on how to send alert notifications from a DataStream to a Slack channel and possibly other alerting tools (ie. AWS sns, mattermost) -- Robert Cullen 240-475-4490

Re: Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-13 Thread Timothy James
Thank you Timo. Hi Arvid! I note that that ticket proposes two alternatives for solution. The first > Either we allow a properties map similar to Kafka or Kinesis properties to our connectors. seems to solve our problem. The second, much more detailed, appears unrelated to our needs: > Or so

Re: Unable to create new native thread error

2021-12-13 Thread David Morávek
Maybe PID limit for cgroup could also come into play with kubernetes [1][2] (*/sys/fs/cgroup/pids/pids.max*)? How many threads does your TM create before crashing? [1] https://kubernetes.io/docs/concepts/policy/pid-limiting/#pod-pid-limits [2] https://access.redhat.com/discussions/4713291 On Mon,

IOException or StacklessClosedChannelException on flink-connector-kinesis triggers job to restart

2021-12-13 Thread Leon Xu
Hi Flink users, I use flink-1.12.5 kinesis connector to consume data from kinesis. >From time to time I am getting IOException or StacklessClosedChannelException, which will fail the Flink operator when it by default reaches 10 times and trigger the entire job to restart. I have two questions: I

Re: Unable to create new native thread error

2021-12-13 Thread Ilan Huchansky
Hi David, We already increased the max number of threads: sudo cat /proc/sys/kernel/threads-max 3094538 Our cluster runs over docker, the machines hosting the dockers are dedicated only to the cluster. The configuration of the docker are pulled from the hosts so same number of threads is confi

Re: Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-13 Thread Timo Walther
Hi Timothy, unfortunetaly, this is not supported yet. However, the effort will be tracked under the following ticket: https://issues.apache.org/jira/browse/FLINK-19589 I will loop-in Arvid (in CC) which might help you in contributing the missing functioniality. Regards, Timo On 10.12.21

Re: Latency monitoring in Flink 1.14.0

2021-12-13 Thread Timo Walther
It turned out this was a bug and will be fixed in the next (non-log4j) patch version: https://issues.apache.org/jira/browse/FLINK-23704 Regards, Timo On 13.12.21 14:11, Timo Walther wrote: Hi Morgan, I was assuming that it is caused by some invalid metrics configuration. But I wasn't aware

Re: Stateful functions module configurations (module.yaml) per deployment environment

2021-12-13 Thread Deniz Koçak
Hello Igal, First of all, thanks for your effort and feedback on that issue. We followed the steps you specified and it seems to be working, in order to briefly summarize what we have done (nothing different actually you specified on your e-mail) 1) Compiled `statefun-flink-distribution` artifact

Re: Unable to create new native thread error

2021-12-13 Thread David Morávek
Hi Ilan, can you please check number of threads on the task-managers / OS? As far as I remember this happens when system can not create any more threads (there is a system wide limit */proc/sys/kernel/threads-max* [1]). Please not that the limit might be exhausted by other processes. [1] https://m

Re: FW: EXT: Re: Issues in Batch Jobs Submission for a Session Cluster

2021-12-13 Thread David Morávek
Hi, As far as I understand "java.io.IOException: Cannot allocate memory" happens when JVM is not able to allocate a new memory from the OS. If that's that case, I'd suggest increasing JVM overhead [1], because that's basically a pool of memory, that is free to be allocated by native libraries. Ho

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-13 Thread Prasanna kumar
Chesnay Thank you for the clarification. On Mon, Dec 13, 2021 at 6:55 PM Chesnay Schepler wrote: > The flink-shaded-zookeeper jars do not contain log4j. > > On 13/12/2021 14:11, Prasanna kumar wrote: > > Does Zookeeper have this vulnerability dependency ? I see references to > log4j in Shaded Zo

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-13 Thread Chesnay Schepler
The flink-shaded-zookeeper jars do not contain log4j. On 13/12/2021 14:11, Prasanna kumar wrote: Does Zookeeper have this vulnerability dependency ? I see references to log4j in Shaded Zookeeper jar included as part of the flink distribution. On Mon, Dec 13, 2021 at 1:40 PM Timo Walther wrot

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-13 Thread Prasanna kumar
Does Zookeeper have this vulnerability dependency ? I see references to log4j in Shaded Zookeeper jar included as part of the flink distribution. On Mon, Dec 13, 2021 at 1:40 PM Timo Walther wrote: > While we are working to upgrade the affected dependencies of all > components, we recommend user

Re: Latency monitoring in Flink 1.14.0

2021-12-13 Thread Timo Walther
Hi Morgan, I was assuming that it is caused by some invalid metrics configuration. But I wasn't aware that this worked before and didn't read that you switched to the new Kafka connector. Indeed, this might be the reason. I will loop-in experts on this topic. Regards, Timo On 13.12.21 13:3

Re: Latency monitoring in Flink 1.14.0

2021-12-13 Thread Geldenhuys, Morgan Karl
Hi Timo, Thank you for the reply. Not really sure how that link helps besides explaining what a histogram is or accessing the metrics through the UI which is not what im interested in. With flink 1.12 and 1.13 the latency metric was working great, however, with 1.14 and the new KafkaSource/Kaf

using flink retract stream and rockdb, too many intermediate result of values cause checkpoint too heavy to finish

2021-12-13 Thread vtygoss
Hi, community! I meet a problem in the procedure of building a streaming production pipeline using Flink retract stream and hudi-hdfs/kafka as storage engine and rocksdb as statebackend. In my scenario, - During a patient's hospitalization, multiple measurements of vital signs are recorded

Re: Flink 1.13.3, k8s HA - ResourceManager was revoked leadership

2021-12-13 Thread David Morávek
Hi Alexey, please be aware that the json-based logs in the mail may not make it pass the spam filter (at least for gmail they did not) :( K8s based leader election is based on optimistic locking of the underlying config-map (~ periodically updating the lease annotation of the config-map). If JM f

Re: Latency monitoring in Flink 1.14.0

2021-12-13 Thread Timo Walther
Hi Morgan, did you see this: https://stackguides.com/questions/68917956/read-flink-latency-tracking-metric-in-datadog Also `metrics.latency.granularity` must be set in the Flink configuration. Not sure if `-D` forwards this properly. Timo On 10.12.21 18:31, Geldenhuys, Morgan Karl wrote

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-13 Thread Timo Walther
While we are working to upgrade the affected dependencies of all components, we recommend users follow the advisory of the Apache Log4j Community. Also Ververica platform can be patched with a similar approach: To configure the JVMs used by Ververica Platform, you can pass custom Java options