How to do windowing on flink retract stream?

2021-12-15 Thread casel.chen
业务需要按每分钟统计不同交易状态的交易数,接了业务mysql库binlog到flink计算。这是一个retract回撤流。是不能够直接使用Tumble window计算的。 1. 那么是不是只能用全量窗口的方式实现?即group by交易时间落到其时长为一分钟的窗口,再配合state TTL来过期不需要的状态? 2. 这样一来的话每来一笔交易都会更新状态,如果直接输出到下游mysql保存的话会对mysql造成很大写压力,那么是不是可以再接一个Tumble window获取每个指标的最新统计值输出? 3.

RE: unexpected result of interval join when using sql

2021-12-15 Thread Schwalbe Matthias
Probably an oversight ... did you actually mean to publish your password? Better change it the sooner possible ... Thias From: cy Sent: Donnerstag, 16. Dezember 2021 06:55 To: user@flink.apache.org Subject: unexpected result of interval join when using sql Hi Flink 1.14.0 Scala 2.12 I'm

Re: Confusion about rebalance bytes sent metric in Flink UI

2021-12-15 Thread Arvid Heise
Ah yes I see it now as well. Yes you are right, each record should be replicated 9 times to send to one of the instances each. Your upstream is not inflating the record size? The number of records seems to work decently. @pnowojski FYI. On Thu, Dec 16, 2021 at 2:20 AM tao xiao wrote: > Hi

????

2021-12-15 Thread ??????
flinksql??adhoc??cep flink??etl

unexpected result of interval join when using sql

2021-12-15 Thread cy
Hi Flink 1.14.0 Scala 2.12 I'm using flink sql interval join ability, here is my table schema and sql create table `queue_3_ads_ccops_perf_o_ebs_volume_capacity` ( `dtEventTime` timestamp(3), `dtEventTimeStamp` bigint, `sourceid` string, `cluster_name` string, `poolname` string,

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread Parag Somani
Thank you Chesnay for expediting this fix...! Can you suggest, when can I get binaries for 1.14.2 flink version? On Thu, Dec 16, 2021 at 5:52 AM Chesnay Schepler wrote: > We will push docker images for all new releases, yes. > > On 16/12/2021 01:16, Michael Guterl wrote: > > Will you all be

Re: Flink 1.13.3, k8s HA - ResourceManager was revoked leadership

2021-12-15 Thread Yang Wang
Could you please check whether the JobManager has a long fullGC, which will cause the leadership lost? BTW, increasing the timeout should help. high-availability.kubernetes.leader-election.lease-duration: 60s high-availability.kubernetes.leader-election.renew-deadline: 60s Best, Yang Alexey

Re: Confusion about rebalance bytes sent metric in Flink UI

2021-12-15 Thread tao xiao
Hi Arvid The second picture shows the metrics of the upstream operator. The upstream has 150 parallelisms as you can see in the first picture. I expect the bytes sent is about 9 * bytes received as we have 9 downstream operators connecting. Hi Caizhi, Let me create a minimal reproducible DAG and

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread Chesnay Schepler
We will push docker images for all new releases, yes. On 16/12/2021 01:16, Michael Guterl wrote: Will you all be pushing Docker images for the 1.11.6 release? On Wed, Dec 15, 2021 at 3:26 AM Chesnay Schepler wrote: The current ETA is 40h for an official announcement. We are

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread Michael Guterl
Will you all be pushing Docker images for the 1.11.6 release? On Wed, Dec 15, 2021 at 3:26 AM Chesnay Schepler wrote: > The current ETA is 40h for an official announcement. > We are validating the release today (concludes in 16h), publish it > tonight, then wait for mirrors to be sync (about a

Re: Rescaling feature disabled for Flink 1.14

2021-12-15 Thread Chesnay Schepler
Note that this REST/CLI command is not related to Reactive Mode in any way. On 15/12/2021 23:53, Chesnay Schepler wrote: Yes, this feature was indeed removed, and has been since 1.9.0. See https://lists.apache.org/thread/oby7fmz9crphonxw3l0g8b9zvybg3sno for some background. On 15/12/2021

Re: Rescaling feature disabled for Flink 1.14

2021-12-15 Thread Chesnay Schepler
Yes, this feature was indeed removed, and has been since 1.9.0. See https://lists.apache.org/thread/oby7fmz9crphonxw3l0g8b9zvybg3sno for some background. On 15/12/2021 23:47, Geldenhuys, Morgan Karl wrote: Greetings all, I am trying to test the rescaling feature of Flink 1.14, however

Rescaling feature disabled for Flink 1.14

2021-12-15 Thread Geldenhuys, Morgan Karl
Greetings all, I am trying to test the rescaling feature of Flink 1.14, however when i send a rest request to the endpoint I receive the following message: "org.apache.flink.runtime.rest.handler.RestHandlerException: Rescaling is temporarily disabled. See FLINK-12312. \tat

Re: IOException/StacklessClosedChannelException on flink-connector-kinesis trigger job to restart

2021-12-15 Thread Leon Xu
Hi Arvid, Thanks for your reply. In our use case, we are running Flink on EMR cluster. When we are reading data from a kinesis stream starting from TRIM_HORIZON offset through an EFO consumer, the flink source will get these IOException or StacklessClosedChannelException, which flink will

Information request: Reactive mode and Rescaling

2021-12-15 Thread Geldenhuys, Morgan Karl
Greetings, I would like to find out more about Flink's new reactive mode as well as the rescaling feature regarding fault tolerance. For the following question lets assume checkpointing is enabled using HDFS. So first question, if I have a job where the source(s) and sink(s) are configured

Re: Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-15 Thread Arvid Heise
Hi Timothy, The issue would require a refactor FileSystems abstraction to allow multiple FileSystems objects of the same FileSystem type being configured differently. While this would improve the code quality and enable such use cases, I currently have no capacity to work on it or guide it. If

Re: Looking for MultipleLinearRegression in Flink

2021-12-15 Thread Arvid Heise
Hi, Could you please check if it's in AI-Flow [1]? Maybe @姜鑫 can help. [1] https://github.com/flink-extended/ai-flow/ On Wed, Dec 15, 2021 at 8:35 AM thekingofcity wrote: > Hi, > > I'm looking for multiple linear regression in recent Flink versions. I do > find it in Flink 1.2 but have no

Re: FileSource with Parquet Format - parallelism level

2021-12-15 Thread Arvid Heise
Hi Krzysztof, yes you are correct if you use the new FileSource: * Please note that file blocks are only exposed by some file systems, such as HDFS. File systems * that do not expose block information will not create multiple file splits per file, but keep the * files as one source split. For

Re: Sending an Alert to Slack, AWS sns, mattermost

2021-12-15 Thread Arvid Heise
I recommend using AsyncIO [1] and RichAsyncFunction instead. SinkFunction will be removed at the end of Flink 1.X and can quickly turn into a bottleneck if used on many requests. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/ On Tue, Dec 14, 2021

Re: using flink retract stream and rockdb, too many intermediate result of values cause checkpoint too heavy to finish

2021-12-15 Thread Arvid Heise
Can you please describe your actual use case? What do you want to achieve low-latency or high-throughput? What are the consumers of the produced dataset? It sounds to me as if this is classical sensor aggregation. I have not heard of any sensor aggregation that doesn't use windowing. So you'd

Re: Confusion about rebalance bytes sent metric in Flink UI

2021-12-15 Thread Arvid Heise
Hi, Could you please clarify which operator we see in the second picture? If you are showing the upstream operator, then this has only parallelism 1, so there shouldn't be multiple subtasks. If you are showing the downstream operator, then the metric would refer to the HASH and not REBALANCE.

Re: IOException/StacklessClosedChannelException on flink-connector-kinesis trigger job to restart

2021-12-15 Thread Arvid Heise
Hi Leon, The kinesis consumer allows you to specify how many of these exceptions can be thrown without a job failure. This is independent from Flink's restart policy. So in your case, if you feel like the Kinesis consumer is giving up too quickly, you could simply increase

Re: UDF and Broadcast State Pattern

2021-12-15 Thread Krzysztof Chmielewski
Thank you very much for the clarification Seth. Best Regards Krzysztof Chmielewski śr., 15 gru 2021, 16:12 użytkownik Seth Wiesman napisał: > Hi Krzysztof, > > There is a difference in semantics here between yourself and Caizhi. SQL > UDFs can be used statefully - see AggregateFunction and >

Re: UDF and Broadcast State Pattern

2021-12-15 Thread Seth Wiesman
Hi Krzysztof, There is a difference in semantics here between yourself and Caizhi. SQL UDFs can be used statefully - see AggregateFunction and TableAggregateFunction for examples. You even have access to ListView and MapView which are backed by ListState and MapState accordingly. These functions

Re: Periodic Job Failure

2021-12-15 Thread Chesnay Schepler
How are you deploying the job and the external services? Is the period in which this happens usually the same? Is it just a connection issue with external services, or are there other errors as well? On 15/12/2021 15:47, Julian Cardarelli wrote: Hello – We have a job that seems to stop

Periodic Job Failure

2021-12-15 Thread Julian Cardarelli
Hello - We have a job that seems to stop working after some period of time - perhaps 10-12 days. The job itself appears in the running state, but for some reason it just stops communicating to external services. I know this e-mail will be like "we don't know what's wrong with your code." I

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread Chesnay Schepler
The 1.12.6 release was cancelled as a new log4j CVE was discovered during the release finalization. We will only release 1.12.7. Our recommendation is to upgrade to 1.12.7 once it is released. On 15/12/2021 14:03, V N, Suchithra (Nokia - IN/Bangalore) wrote: Thanks Chesney for info. I can

RE: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread V N, Suchithra (Nokia - IN/Bangalore)
Thanks Chesney for info. I can see 1.12.5 is the last release in 1.12.x flink versions. Flink 1.12.6 contains log4j 2.15 and flink 1.12.7 contains log4j 2.16. As per the Apache community it is recommended to upgrade to log4j 2.16. Is there a dependency to release flink 1.12.7 after the release

Re: Stateful functions module configurations (module.yaml) per deployment environment

2021-12-15 Thread Igal Shilman
Hello Deniz, Glad to hear that it worked for you! as this is a feature that might benefit others in the community I've just merged this to our main branch, which means that feature releases will have that feature :-) Currently there are no plans to backport this to 3.1 however. Cheers, Igal. On

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread Chesnay Schepler
The current ETA is 40h for an official announcement. We are validating the release today (concludes in 16h), publish it tonight, then wait for mirrors to be sync (about a day), then we announce it. On 15/12/2021 12:08, V N, Suchithra (Nokia - IN/Bangalore) wrote: Hello, Could you please

RE: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hello, Could you please tell when we can expect Flink 1.12.7 release? We are waiting for the CVE fix. Regards, Suchithra From: Chesnay Schepler Sent: Wednesday, December 15, 2021 4:04 PM To: Richard Deurwaarder Cc: user Subject: Re: CVE-2021-44228 - Log4j2 vulnerability We will also

RE: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hello, Could you please tell when we can expect Flink 1.12.7 release? We are waiting for the CVE fix. Regards, Suchithra From: Chesnay Schepler Sent: Wednesday, December 15, 2021 4:04 PM To: Richard Deurwaarder Cc: user Subject: Re: CVE-2021-44228 - Log4j2 vulnerability We will also update

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread Chesnay Schepler
We will also update the docker images. On 15/12/2021 11:29, Richard Deurwaarder wrote: Thanks for picking this up quickly! I saw you've made a second minor upgrade to upgrade to log4j2 2.16 which is perfect. Just to clarify: Will you also push new docker images for these releases as well?

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-15 Thread Richard Deurwaarder
Thanks for picking this up quickly! I saw you've made a second minor upgrade to upgrade to log4j2 2.16 which is perfect. Just to clarify: Will you also push new docker images for these releases as well? In particular flink 1.11.6 (Sorry we must upgrade soon! :() On Tue, Dec 14, 2021 at 2:33 AM

Re: stateSerializer(1.14.0) not compatible with previous stateSerializer(1.13.1)

2021-12-15 Thread 李诗君
Hi, Im using Flink-SQL, so maybe it is the default kryo serializer. > 2021年12月10日 下午4:15,Roman Khachatryan 写道: > > Hi, > > Compatibility might depend on specific serializers, > could you please share which serializers you use to access the state? > > Regards, > Roman > > On Fri, Dec 10, 2021

Re: UDF and Broadcast State Pattern

2021-12-15 Thread Krzysztof Chmielewski
Thank you, yes I was thinking about simply running my own thread in UDF and consume some queue something like that. Having some background with DataStreamAPI i was hoping that I can reuse same mechanisms (like Broadcast State Pattern or CoProcessFunction) in Flink SQL. However it seems there is a

Re: Direct buffer memory in job with hbase client

2021-12-15 Thread Xintong Song
Hi Anton, You may want to try increasing the task off-heap memory, as your tasks are using hbase client which needs off-heap (direct) memory. The default task off-heap memory is 0 because most tasks do not use off-heap memory. Unfortunately, I cannot advise on how much task off-heap memory your

Re: Hybrid Source with Parquet Files from GCS + KafkaSource

2021-12-15 Thread Fabian Paul
Hi Megh, Flink offers the ParquetVectorizedInputFormat which is already heavily optimized. Unfortunately, you need to need to implement some of the methods depending on your type. In general, the BulkFormat gives you more control and allows more optimizations but is harder to implement. Best,