Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-05 Thread Martijn Visser
I also noticed that we two replies in a separate thread on the User mailing list, which can be found at https://lists.apache.org/thread/m5ntl3cj81wg7frbfqg9v75c7hqnxtls. I've included Clayton and David in this email, to at least centralize the conversation once more :) On Wed, Oct 5, 2022 at

Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-05 Thread David Anderson
I want to clarify one point here, which is that modifying jobs written in Scala to use Flink's Java API does not require porting them to Java. I can readily understand why folks using Scala might rather use Java 17 than Java 11, but sticking to Scala will remain an option even if Flink's Scala API

Re: ClassNotFoundException when loading protobuf message class in Flink SQL

2022-10-05 Thread James McGuire via user
Thanks for the tip, I will use the flink-sql-connector-kafka jar instead. I was able to get it to work by moving the SimpleTest.jar inside the lib folder. I am not sure why this worked but passing in the jar with the --jar flag did not work. Thanks, James On Tue, Oct 4, 2022 at 7:32 PM Benchao

Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-05 Thread Martijn Visser
@Maciek I saw that I missed replying to your question: > Could you please remind what was the conclusion of discussion on upgrading Scala to 2.12.15/16? > https://lists.apache.org/thread/hwksnsqyg7n3djymo7m1s7loymxxbc3t - I couldn't find any follow-up vote? There is a vote thread, but that

Re: KafkaSource, consumer assignment, and Kafka ‘stretch’ clusters for multi-DC streaming apps

2022-10-05 Thread Martijn Visser
Hi Andrew, While definitely no expert on this topic, my first thought was if this idea could be solved with the idea that was proposed in FLIP-246 https://cwiki.apache.org/confluence/display/FLINK/FLIP-246%3A+Multi+Cluster+Kafka+Source I'm also looping in Mason Chen who was the initiator of that

Re: KafkaSource, consumer assignment, and Kafka ‘stretch’ clusters for multi-DC streaming apps

2022-10-05 Thread Andrew Otto
(Ah, note that I am considering very simple streaming apps here, e.g. event enrichment apps. No windowing or complex state. The only state is the Kafka offsets, which I suppose would also have to be managed from Kafka, not from Flink state.) On Wed, Oct 5, 2022 at 9:54 AM Andrew Otto wrote:

KafkaSource, consumer assignment, and Kafka ‘stretch’ clusters for multi-DC streaming apps

2022-10-05 Thread Andrew Otto
Hi all, *tl;dr: Would it be possible to make a KafkaSource that uses Kafka's built in consumer assignment for Flink tasks?* At the Wikimedia Foundation we are evaluating whether we can use a Kafka 'stretch' cluster to simplify the multi-datacenter

Re: How to rebalance a Flink streaming table?

2022-10-05 Thread Yaroslav Tkachenko
Hey Pavel, I was looking for something similar a while back and the best thing I came up with was using the DataStream API to do all the shuffling and THEN converting the stream to a table using fromDataStream/fromChangelogStream. On Wed, Oct 5, 2022 at 4:54 AM Pavel Penkov wrote: > I have a

Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-05 Thread Gaël Renoux
> I'm curious what target Scala versions people are currently interested in. > I would've expected that everyone wants to migrate to Scala 3, for which several wrapper projects around Flink already exist The Scala 3 tooling is still subpar (we're using IntelliJ), so I'm not sure I would move my

Support Checkpointing in MongoDB Changestream Source(s)

2022-10-05 Thread Armin Schnabel
Dear Flink community, my Flink pipeline listens on inserts into MongoDB GridFS and processes the inserted document which is then written into a MongoDB Sink again, which is working so far. Now I want to enable Checkpointing but after reading the documentations, training slides, stackoverflow,

Re: Flink FaultTolerant at operator level

2022-10-05 Thread Krzysztof Chmielewski
I had a similar use case. What we did is that we decided that data for enrichment must be versioned, for example our enrichment data was "refreshed" once a day and we kept old data. During the enrichment process we lookup data for given version based on record's metadata. Regards. Krzysztof

Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-05 Thread Chesnay Schepler
> It's possible that for the sake of the Scala API, we would occasionally require some changes in the Java API. As long as those changes are not detrimental to Java users, they should be considered. That's exactly the model we're trying to get to. Don't fix scala-specific issues with scala

Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-05 Thread Gaël Renoux
Hello everyone, I've already answered a bit on Twitter, I'll develop my thoughts a bit here. For context, my company (DataDome) has a significant codebase on Scala Flink (around 110K LOC), having been using it since 2017. I myself am an enthusiastic Scala developer (I don't think I'd like moving

How to rebalance a Flink streaming table?

2022-10-05 Thread Pavel Penkov
I have a table that reads a Kafka topic and effective parallelism is equal to the number of Kafka partitions. Is there a way to reshuffle the data like with DataStream API to increase effective parallelism?

Old s3 files referenced in sink's state after migration from 1.14 to 1.15

2022-10-05 Thread Vararu, Vadim
Hi all, We have some jobs that write parquet files in s3, bucketing by processing time in a structure like /year/month/day/hour. On 13th of September, we have migrated our Flink runtime 1.14.5 to 1.15.2 and now we have some jobs crashing at checkpointing because of being unable to find some

Flink FaultTolerant at operator level

2022-10-05 Thread Great Info
I have flink job and the current flow looks like below Source_Kafka->*operator-1*(key by partition)->*operator-2*(enrich the record)-*Sink1-Operator* & *Sink2-Operator * With this flow the current problem is at operator-2, the core logic runs here is to lookup some reference status data from

RE: Re:Question about Flink Broadcast State event ordering

2022-10-05 Thread Qing Lim
Oh, thank you for your explanation! From: 仙路尽头谁为峰 Sent: 05 October 2022 09:13 To: Qing Lim Cc: User Subject: 回复: Re:Question about Flink Broadcast State event ordering Hi Qing: The key point is that the broadcast side may have different partitions that interleaves. If you can make sure

回复: Re:Question about Flink Broadcast State event ordering

2022-10-05 Thread 仙路尽头谁为峰
Hi Qing: The key point is that the broadcast side may have different partitions that interleaves. If you can make sure those messages you want to be ordered go into the same partition, then I think the order can be reserved. Best regards! 从 Windows 版邮件发送 发件人: Qing Lim 发送时间: 2022年10月5日 15:16

Re:如何处理Flink KafkaSource的异常的数据

2022-10-05 Thread RS
Hi, 当前的SQL是不支持的,需要的话,可以自己实现一个connector或者UDF,把错误数据输出到其他地方 Thanks 在 2022-09-29 10:02:34,"Summer" 写道: > >你好,我想问一下,如果来源于Kakfka的一条数据出现错误,会导致任务执行失败,日志抛出该条错误数据。 > > >为保证任务执行,需要在*** WITH内加'value.json.ignore-parse-errors' = 'true', >'value.json.fail-on-missing-field' = 'false' > > > >

Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-05 Thread Maciek Próchniak
Hi Martin, Could you please remind what was the conclusion of discussion on upgrading Scala to 2.12.15/16? https://lists.apache.org/thread/hwksnsqyg7n3djymo7m1s7loymxxbc3t - I couldn't find any follow-up vote? If it's acceptable to break binary compatibility by such an upgrade, then

RE: Re:Question about Flink Broadcast State event ordering

2022-10-05 Thread Qing Lim
Hi, thanks for answering my question. Is there anyway to make the order reflecting the upstream? I wish to broadcast messages that has deletion semantic, so ordering matters here. I guess worst case I can use some logical timestamp to reason about order at downstream. From: xljtswf2022 Sent: