Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Till Rohrmann
Congrats, Dian! Cheers, Till On Fri, Aug 28, 2020 at 8:33 AM Wei Zhong wrote: > Congratulations Dian! > > > 在 2020年8月28日,14:29,Jingsong Li 写道: > > > > Congratulations , Dian! > > > > Best, Jingsong > > > > On Fri, Aug 28, 2020 at 11:06 AM Walter Peng > wrote: > > c

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Wei Zhong
Congratulations Dian! > 在 2020年8月28日,14:29,Jingsong Li 写道: > > Congratulations , Dian! > > Best, Jingsong > > On Fri, Aug 28, 2020 at 11:06 AM Walter Peng > wrote: > congrats! > > Yun Tang wrote: > > Congratulations , Dian! > > > -- > Best, Jingsong Lee

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Jingsong Li
Congratulations , Dian! Best, Jingsong On Fri, Aug 28, 2020 at 11:06 AM Walter Peng wrote: > congrats! > > Yun Tang wrote: > > Congratulations , Dian! > -- Best, Jingsong Lee

Re: Not able to Assign Watermark in Flink 1.11

2020-08-27 Thread Khachatryan Roman
Hi Anuj Jain, You need to provide the type parameter when calling WatermarkStrategy.forBoundedOutOfOrderness like this: bookingFlowConsumer.assignTimestampsAndWatermarks(WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofMinutes(15)) Regards, Roman On Fri, Aug 28, 2020 at 6:49 AM aj wrote:

user@flink.apache.org

2020-08-27 Thread Danny Chan
Hi, Sofya T. Irwin ~ Can you share your case why you need a timed-window join there ? Now the sql timed window join is not supported yet, and i want to hear your voice if it is necessary to support in SQL. Sofya T. Irwin 于2020年7月30日周四 下午10:44写道: > Hi, > I'm trying to investigate a SQL job usi

Re: Different deserialization schemas for Kafka keys and values

2020-08-27 Thread Manas Kale
Hi Robert, Thanks for the info! On Thu, Aug 27, 2020 at 8:01 PM Robert Metzger wrote: > Hi, > > Check out the KafkaDeserializationSchema ( > https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html#the-deserializationschema) > which allows you to deserialize the key

Not able to Assign Watermark in Flink 1.11

2020-08-27 Thread aj
I am getting this error when trying to assign watermark in Flink 1.11 *"Cannot resolve method 'withTimestampAssigner(anonymous org.apache.flink.api.common.eventtime.SerializableTimestampAssigner)'"* FlinkKafkaConsumer bookingFlowConsumer = new FlinkKafkaConsumer(topics, new KafkaGenericAvroD

Re: Performance issue associated with managed RocksDB memory

2020-08-27 Thread Yun Tang
Hi Juha Thanks for your enthusiasm to dig this problem and sorry for jumping in late for this thread to share something about write buffer manager in RocksDB. First of all, the reason why you meet the poor performance is due to writer buffer manager has been assigned a much lower limit (due to

Re: Debezium Flink EMR

2020-08-27 Thread Jark Wu
Hi, This is a known issue in 1.11.0, and has been fixed in 1.11.1. Best, Jark On Fri, 28 Aug 2020 at 06:52, Rex Fenley wrote: > Hi again! > > I'm tested out locally in docker on Flink 1.11 first to get my bearings > before downgrading to 1.10 and figuring out how to replace the Debezium > con

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Walter Peng
congrats! Yun Tang wrote: Congratulations , Dian!

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Yun Tang
Congratulations , Dian! Best Yun Tang From: Yang Wang Sent: Friday, August 28, 2020 10:28 To: Arvid Heise Cc: Benchao Li ; dev ; user-zh ; Dian Fu ; user Subject: Re: [ANNOUNCE] New PMC member: Dian Fu Congratulations Dian ! Best, Yang Arvid Heise mailto:a

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Yang Wang
Congratulations Dian ! Best, Yang Arvid Heise 于2020年8月28日周五 上午1:39写道: > Congrats Dian :) > > On Thu, Aug 27, 2020 at 5:01 PM Benchao Li wrote: > >> Congratulations Dian! >> >> Cranmer, Danny 于2020年8月27日周四 下午10:55写道: >> >>> Congratulations Dian! :D >>> >>> On 27/08/2020, 15:25, "Robert M

Re: OOM error for heap state backend.

2020-08-27 Thread Congxian Qiu
Hi The stack said that the job failed when restoring from checkpoint/savepoint. If encounter this when in failover, maybe you can try to find out the root cause which caused the job failover. For the stack, it because when restoring `HeapPriorityQueue`, there would ensure there are enough siz

Re: Debezium Flink EMR

2020-08-27 Thread Rex Fenley
Hi again! I'm tested out locally in docker on Flink 1.11 first to get my bearings before downgrading to 1.10 and figuring out how to replace the Debezium connector. However, I'm getting the following error ``` Provided trait [BEFORE_AND_AFTER] can't satisfy required trait [ONLY_UPDATE_AFTER]. This

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
Ah, I think the "Result Updating" is what got me -- INNER joins do the job! On Thu, Aug 27, 2020 at 3:38 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > oops, the example query should actually be: > > SELECT table_1.a, table_1.b, table_2.c > FROM table_1 > LEFT OUTER JOIN table_2 ON

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
oops, the example query should actually be: SELECT table_1.a, table_1.b, table_2.c FROM table_1 LEFT OUTER JOIN table_2 ON table_1.b = table_2.b; and duplicate results should actually be: Record(a = "data a 1", b = "data b 1", c = "data c 1") Record(a = "data a 1", b = "data b 1", c = null) Reco

Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
Hey all, I've got a Flink 1.10 Streaming SQL job using the Blink Planner that is reading from a few CSV files and joins some records across them into a couple of data streams (yes, this could be a batch job won't get into why we chose streams unless it's relevant). These joins are producing some d

Re: Failures due to inevitable high backpressure

2020-08-27 Thread Arvid Heise
Hi Hubert, The most straight-forward reason for backpressure is under-provisioning of the cluster. An application over time usually needs gradually more resources. If the user base of your company grows, so does the amount of messages (be it click stream, page impressions, or any kind of transacti

Re: Debezium Flink EMR

2020-08-27 Thread Rex Fenley
Thanks! On Thu, Aug 27, 2020 at 5:33 AM Jark Wu wrote: > Hi, > > Regarding the performance difference, the proposed way will have one more > stateful operator (deduplication) than the native 1.11 cdc support. > The overhead of the deduplication operator is just similar to a simple > group by agg

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-08-27 Thread Arvid Heise
Hi Averell, This is a known bug [1] caused by the used AWS S3 library not respecting the classloader [2]. The best solution is to upgrade to 1.10.1 (or take the s3-hadoop jar from 1.10.1). Don't try to put Xerces manually anywhere. [1] https://issues.apache.org/jira/browse/FLINK-16014 [2] https:

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Arvid Heise
Congrats Dian :) On Thu, Aug 27, 2020 at 5:01 PM Benchao Li wrote: > Congratulations Dian! > > Cranmer, Danny 于2020年8月27日周四 下午10:55写道: > >> Congratulations Dian! :D >> >> On 27/08/2020, 15:25, "Robert Metzger" wrote: >> >> CAUTION: This email originated from outside of the organizatio

Re: Example flink run with security options? Running on k8s in my case

2020-08-27 Thread Nico Kruber
Actually, your curl command may be incorrect since you didn't specify https as the protocol: Its man page says: > If you specify URL without protocol:// prefix, curl will attempt to guess > what protocol you might want. It will then default to HTTP but try other > protocols based on often-used hos

Re: SDK vs Connectors

2020-08-27 Thread Robert Metzger
Hi Prasanna, (General remark: For such questions, please send the email only to user@flink.apache.org. There's no need to email to dev@ as well.) I don't think Flink can do much if the library you are using isn't throwing exceptions. Maybe the library has other means of error reporting (a callbac

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Benchao Li
Congratulations Dian! Cranmer, Danny 于2020年8月27日周四 下午10:55写道: > Congratulations Dian! :D > > On 27/08/2020, 15:25, "Robert Metzger" wrote: > > CAUTION: This email originated from outside of the organization. Do > not click links or open attachments unless you can confirm the sender and

Re: OOM error for heap state backend.

2020-08-27 Thread Robert Metzger
Hi Vishwas, Your scenario sounds like RocksDB would actually be recommended. I would always suggest to start with RocksDB, unless your state is really small compared to the available memory, or you need to optimize for performance. But maybe your job is running fine with RocksDB (performance wise)

Re: Default Flink Metrics Graphite

2020-08-27 Thread Robert Metzger
I don't think these error messages give us a hint why you can't see the metrics (because they are about registering metrics, not reporting them) Are you sure you are using the right configuration parameters for Flink 1.10? That all required JARs are in the lib/ folder (on all machines) and that yo

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-08-27 Thread Robert Metzger
Hi, I guess you've loaded the S3 filesystem using the s3 FS plugin. You need to put the right jar file containing the SAX2 driver class into the plugin directory where you've also put the S3 filesystem plugin. You can probably find out the name of the right sax2 jar file from your local setup wher

Re: Different deserialization schemas for Kafka keys and values

2020-08-27 Thread Robert Metzger
Hi, Check out the KafkaDeserializationSchema ( https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html#the-deserializationschema) which allows you to deserialize the key and value bytes coming from Kafka. Best, Robert On Thu, Aug 27, 2020 at 1:56 PM Manas Kale wr

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Robert Metzger
Congratulations Dian! On Thu, Aug 27, 2020 at 3:09 PM Congxian Qiu wrote: > Congratulations Dian > Best, > Congxian > > > Xintong Song 于2020年8月27日周四 下午7:50写道: > > > Congratulations Dian~! > > > > Thank you~ > > > > Xintong Song > > > > > > > > On Thu, Aug 27, 2020 at 7:42 PM Jark Wu wrote: > >

Re: Resource leak in DataSourceNode?

2020-08-27 Thread Robert Metzger
Hi Mark, Thanks a lot for your message and the good investigation! I believe you've found a bug in Flink. I filed an issue for the problem: https://issues.apache.org/jira/browse/FLINK-19064. Would you be interested in opening a pull request to fix this? Otherwise, I'm sure a committer will pick u

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Congxian Qiu
Congratulations Dian Best, Congxian Xintong Song 于2020年8月27日周四 下午7:50写道: > Congratulations Dian~! > > Thank you~ > > Xintong Song > > > > On Thu, Aug 27, 2020 at 7:42 PM Jark Wu wrote: > > > Congratulations Dian! > > > > Best, > > Jark > > > > On Thu, 27 Aug 2020 at 19:37, Leonard Xu wrote: >

Re: Async IO with SQL API

2020-08-27 Thread Jark Wu
Hi, Sorry for the late reply. AFAIK, it's impossible to do Async IO on pure Table API / SQL in 1.9 old planner. A doable way is convert the Table into DataStream and apply AsyncFunction on it. Best, Jark On Thu, 20 Aug 2020 at 00:35, Spurthi Chaganti wrote: > Thank you Till for your response.

Re: Debezium Flink EMR

2020-08-27 Thread Jark Wu
Hi, Regarding the performance difference, the proposed way will have one more stateful operator (deduplication) than the native 1.11 cdc support. The overhead of the deduplication operator is just similar to a simple group by aggregate (max on each non-key column). Best, Jark On Tue, 25 Aug 2020

Re: [Survey] Demand collection for stream SQL window join

2020-08-27 Thread Jark Wu
Thanks for the survey! I'm also interested on the use cases of DataStream window join. Best, Jark On Thu, 27 Aug 2020 at 14:40, Danny Chan wrote: > Hi, users, here i want to collect some use cases about the window join[1], > which is a supported feature on the data stream. The purpose is to ma

Different deserialization schemas for Kafka keys and values

2020-08-27 Thread Manas Kale
Hi, I have a kafka topic on which the key is serialized in a custom format and the value is serialized as JSON. How do I create a FlinkKafakConsumer that has different deserialization schemas for the key and value? Here's what I tried: FlinkKafkaConsumer> advancedFeatureData = new FlinkKafkaConsum

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Xintong Song
Congratulations Dian~! Thank you~ Xintong Song On Thu, Aug 27, 2020 at 7:42 PM Jark Wu wrote: > Congratulations Dian! > > Best, > Jark > > On Thu, 27 Aug 2020 at 19:37, Leonard Xu wrote: > > > Congrats, Dian! Well deserved. > > > > Best > > Leonard > > > > > 在 2020年8月27日,19:34,Kurt Young

Re: JSON to Parquet

2020-08-27 Thread Averell
Hi Dawid, Thanks for the suggestion. So, basically I'll need to use the JSON connector to get the JSON strings into Rows, and from Rows to Parquet records using the parquet connecter? I have never tried the TableAPI in the past, have been using the StreamingAPI only. Will follow your suggestion no

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Jark Wu
Congratulations Dian! Best, Jark On Thu, 27 Aug 2020 at 19:37, Leonard Xu wrote: > Congrats, Dian! Well deserved. > > Best > Leonard > > > 在 2020年8月27日,19:34,Kurt Young 写道: > > > > Congratulations Dian! > > > > Best, > > Kurt > > > > > > On Thu, Aug 27, 2020 at 7:28 PM Rui Li wrote: > > > >>

SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-08-27 Thread Averell
Hello, I have a Flink 1.10 job which runs in AWS EMR, checkpointing to S3a as well as writing output to S3a using StreamingFileSink. The job runs well until I add the Java Hadoop properties: /-Dfs.s3a.acl.default= BucketOwnerFullControl/. Since after that, the checkpoint process fails to complete

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Leonard Xu
Congrats, Dian! Well deserved. Best Leonard > 在 2020年8月27日,19:34,Kurt Young 写道: > > Congratulations Dian! > > Best, > Kurt > > > On Thu, Aug 27, 2020 at 7:28 PM Rui Li wrote: > >> Congratulations Dian! >> >> On Thu, Aug 27, 2020 at 5:39 PM Yuan Mei wrote: >> >>> Congrats! >>> >>> On T

RE: Example flink run with security options? Running on k8s in my case

2020-08-27 Thread Adam Roberts
Hey folks, outside of Kubernetes things are great yep, with the same generated files.   So to share what I'm doing a little more... and I've modified things to be more inline with the current docs   keytool -genkeypair -alias flink.internal -keystore internal.keystore -dname "CN=flink.internal" -st

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Kurt Young
Congratulations Dian! Best, Kurt On Thu, Aug 27, 2020 at 7:28 PM Rui Li wrote: > Congratulations Dian! > > On Thu, Aug 27, 2020 at 5:39 PM Yuan Mei wrote: > >> Congrats! >> >> On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang wrote: >> >>> Congratulations Dian! >>> >>> Best, >>> Xingbo >>> >>> ji

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Rui Li
Congratulations Dian! On Thu, Aug 27, 2020 at 5:39 PM Yuan Mei wrote: > Congrats! > > On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang wrote: > >> Congratulations Dian! >> >> Best, >> Xingbo >> >> jincheng sun 于2020年8月27日周四 下午5:24写道: >> >>> Hi all, >>> >>> On behalf of the Flink PMC, I'm happy to

Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Yangze Guo
Congrats Dian! Best, Yangze Guo On Thu, Aug 27, 2020 at 6:26 PM Zhu Zhu wrote: > > Congratulations Dian! > > Thanks, > Zhu > > Zhijiang 于2020年8月27日周四 下午6:04写道: > > > Congrats, Dian! > > > > -- > > From:Yun Gao > > Send Time:2020年8

Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Zhu Zhu
Congratulations Dian! Thanks, Zhu Zhijiang 于2020年8月27日周四 下午6:04写道: > Congrats, Dian! > > -- > From:Yun Gao > Send Time:2020年8月27日(星期四) 17:44 > To:dev ; Dian Fu ; user < > user@flink.apache.org>; user-zh > Subject:Re: Re: [ANNOUNC

Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Zhijiang
Congrats, Dian! -- From:Yun Gao Send Time:2020年8月27日(星期四) 17:44 To:dev ; Dian Fu ; user ; user-zh Subject:Re: Re: [ANNOUNCE] New PMC member: Dian Fu Congratulations Dian ! Best Yun ---

Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Yun Gao
Congratulations Dian ! Best Yun -- Sender:Marta Paes Moreira Date:2020/08/27 17:42:34 Recipient:Yuan Mei Cc:Xingbo Huang; jincheng sun; dev; Dian Fu; user; user-zh Theme:Re: [ANNOUNCE] New PMC member: Dian Fu Congrats, Dian! On

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Paul Lam
Congrats, Dian! Best, Paul Lam > 2020年8月27日 17:42,Marta Paes Moreira 写道: > > Congrats, Dian! > > On Thu, Aug 27, 2020 at 11:39 AM Yuan Mei > wrote: > Congrats! > > On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang > wrote: > Congratulatio

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Marta Paes Moreira
Congrats, Dian! On Thu, Aug 27, 2020 at 11:39 AM Yuan Mei wrote: > Congrats! > > On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang wrote: > >> Congratulations Dian! >> >> Best, >> Xingbo >> >> jincheng sun 于2020年8月27日周四 下午5:24写道: >> >>> Hi all, >>> >>> On behalf of the Flink PMC, I'm happy to annou

Re: [DISCUSS] Remove Kafka 0.10.x connector (and possibly 0.11.x)

2020-08-27 Thread Paul Lam
Hi, I think it’s okay, given that we can either migrate to the universal connector or still use the compatible 0.10/0.11 connector of 1.11 release as Chesnay mentioned when upgrading to 1.12. IIUC, the migration process to the universal connector would be (please correct me if I’m wrong): 1. S

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Yuan Mei
Congrats! On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang wrote: > Congratulations Dian! > > Best, > Xingbo > > jincheng sun 于2020年8月27日周四 下午5:24写道: > >> Hi all, >> >> On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now >> part of the Apache Flink Project Management Committee (PM

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Xingbo Huang
Congratulations Dian! Best, Xingbo jincheng sun 于2020年8月27日周四 下午5:24写道: > Hi all, > > On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now part > of the Apache Flink Project Management Committee (PMC). > > Dian Fu has been very active on PyFlink component, working on various >

[ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread jincheng sun
Hi all, On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now part of the Apache Flink Project Management Committee (PMC). Dian Fu has been very active on PyFlink component, working on various important features, such as the Python UDF and Pandas integration, and keeps checking an

Re: [DISCUSS] Remove Kafka 0.10.x connector (and possibly 0.11.x)

2020-08-27 Thread Aljoscha Krettek
@Konstantin: Yes, I'm talking about dropping those modules. We don't have any special code for supporting Kafka 0.10/0.11 in the "modern" connector, that comes from the Kafka Consumer/Producer code we're using. @Paul: The modern Kafka connector works with Kafka brokers as far back as 0.10, wou

Re: [ANNOUNCE] Apache Flink 1.10.2 released

2020-08-27 Thread Zhijiang
Congrats, thanks for the release manager work Zhu Zhu and everyone involved in! Best, Zhijiang -- From:liupengcheng Send Time:2020年8月26日(星期三) 19:37 To:dev ; Xingbo Huang Cc:Guowei Ma ; user-zh ; Yangze Guo ; Dian Fu ; Zhu Zhu ; us