Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread 刘建刚
Good work for Flink's batch processing! A remote shuffle service can resolve the container-lost problem and reduce the running time of batch jobs after a failover. We have investigated this kind of component a lot and welcome Flink's native solution. We will try it and help improve it. Thanks, Liu Jiangang

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Yun Gao
Many thanks for all the warm responses! We greatly welcome more use cases and co-work on Flink Remote Shuffle and batch processing with Flink~ Best, Yun

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-11-30 Thread Hang Ruan
Hi, In versions 1.12.0-1.12.5, committing offsets to Kafka when checkpointing is enabled is the default behavior in KafkaSourceBuilder, and it cannot be changed through KafkaSourceBuilder. With FLINK-24277, the behavior becomes configurable. This problem
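For readers on a later release where FLINK-24277 has landed, the behavior becomes an ordinary connector property. A minimal, unverified sketch (topic and broker names are placeholders; double-check that your version actually includes the fix, and note the property key is spelled `commit.offsets.on.checkpoint`, singular):

```java
// Sketch only: disable offset committing on checkpoints via a property,
// assuming a Flink version that includes FLINK-24277.
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")          // placeholder
        .setTopics("my-topic")                        // placeholder
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .setProperty("commit.offsets.on.checkpoint", "false")
        .build();
```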

Re: Does Flink ever delete any sink S3 files?

2021-11-30 Thread Yun Gao
Hi Dan, The file sink first writes records to temporary files, namely .part-*, and commits them when a checkpoint succeeds by renaming them to the final files, namely part-*. Best, Yun
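As a sketch of the setup being described (bucket path and encoder are placeholders), a row-format file sink that only publishes files on checkpoints looks roughly like this; downstream S3 listeners should ignore the dot-prefixed in-progress files and react only to the renamed `part-*` files:

```java
// Sketch: StreamingFileSink that rolls part files on each checkpoint.
// Until the checkpoint completes, data lives in hidden ".part-*" files.
StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("s3://my-bucket/output"),   // placeholder path
                      new SimpleStringEncoder<String>("UTF-8"))
        .withRollingPolicy(OnCheckpointRollingPolicy.build())
        .build();
```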

Does Flink ever delete any sink S3 files?

2021-11-30 Thread Dan Hill
Hi. I'm debugging an issue with a system that listens for files written to S3. I'm assuming Flink does not modify sink objects after they've been written. I've seen a minicluster test locally write a ".part-" file. I wanted to double-check that S3 sinks won't write and then delete files.

Re: REST API for detached minicluster based integration test

2021-11-30 Thread Caizhi Weng
Hi! I see. So to test your watermark strategy you would like to fetch the watermarks downstream. I would suggest taking a look at org.apache.flink.streaming.api.operators.AbstractStreamOperator. This class has a processWatermark method, which is called when a watermark flows through this operator
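The interception pattern being suggested can be illustrated with a minimal stand-in (plain Java, not Flink's real classes): a pass-through step records every watermark before forwarding it, which is the same shape as an `AbstractStreamOperator` subclass that overrides `processWatermark(mark)`, stores `mark.getTimestamp()`, and then calls `super.processWatermark(mark)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Illustrative stand-in only, not Flink API: records watermarks as they
// flow through, then delegates downstream for assertions in a test.
class WatermarkRecorder {
    final List<Long> seen = new ArrayList<>();
    private final LongConsumer downstream;

    WatermarkRecorder(LongConsumer downstream) {
        this.downstream = downstream;
    }

    void processWatermark(long watermarkTimestamp) {
        seen.add(watermarkTimestamp);          // capture for later assertions
        downstream.accept(watermarkTimestamp); // forward, like super.processWatermark
    }
}
```

In a real integration test, the harness around the operator would play the role of `downstream` here.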

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Jingsong Li
Amazing! Thanks Yingjie and all contributors for your great work. Best, Jingsong

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Yun Tang
Great news! Thanks to everyone who contributed to this project. Best, Yun Tang

Re: REST API for detached minicluster based integration test

2021-11-30 Thread Jin Yi
Thanks for the reply Caizhi! We're on Flink 1.12.3. In the test, I'm using a custom watermark strategy derived from BoundedOutOfOrdernessWatermarks that emits watermarks using processing time after a period of no events, to keep the timer-reliant operators happy. Basically, it's using eve
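The strategy described here, bounded-out-of-orderness watermarks that fall back to the processing-time clock once the stream goes idle, can be sketched as a plain class. All names below are invented, and the injectable clock is a testing convenience; Flink's real hooks would be `WatermarkGenerator#onEvent` and `#onPeriodicEmit`:

```java
import java.util.function.LongSupplier;

// Illustrative sketch, not Flink API. onEvent() tracks the max event
// timestamp; currentWatermark() (think onPeriodicEmit) advances on the
// processing-time clock once no events arrive for idleTimeoutMs.
class IdleAwareWatermarks {
    private final long maxOutOfOrdernessMs;
    private final long idleTimeoutMs;
    private final LongSupplier processingClock; // injectable for testing

    private long maxTimestampSeen = Long.MIN_VALUE;
    private long lastEventProcessingTime;
    private long lastEmitted = Long.MIN_VALUE;

    IdleAwareWatermarks(long maxOutOfOrdernessMs, long idleTimeoutMs,
                        LongSupplier processingClock) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.idleTimeoutMs = idleTimeoutMs;
        this.processingClock = processingClock;
        this.lastEventProcessingTime = processingClock.getAsLong();
    }

    void onEvent(long eventTimestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
        lastEventProcessingTime = processingClock.getAsLong();
    }

    long currentWatermark() {
        long now = processingClock.getAsLong();
        long candidate = (maxTimestampSeen == Long.MIN_VALUE)
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrdernessMs - 1;
        if (now - lastEventProcessingTime >= idleTimeoutMs) {
            // Idle: keep advancing on processing time so event-time timers fire.
            candidate = Math.max(candidate, now - maxOutOfOrdernessMs - 1);
        }
        lastEmitted = Math.max(lastEmitted, candidate); // watermarks never regress
        return lastEmitted;
    }
}
```

Note the `Math.max` against the last emitted value: mixing the two clocks must never let the watermark move backwards.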

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-11-30 Thread Marco Villalobos
Thanks! However, I noticed that KafkaSourceOptions.COMMIT_OFFSETS_ON_CHECKPOINT does not exist in Flink 1.12. Is that property supported with the string "commit.offsets.on.checkpoints"? How do I configure that behavior so that offsets get committed on checkpoints in Flink 1.12 when using the Kaf

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Till Rohrmann
Great news, Yingjie. Thanks a lot for sharing this information with the community and kudos to all the contributors of the external shuffle service :-) Cheers, Till

Re: Query regarding exceptions API(/jobs/:jobid/exceptions)

2021-11-30 Thread Matthias Pohl
Thanks for sharing this information. I verified that it's a bug in Flink. The issue is that the exceptions you're observing happen while the job is being initialized; we're not setting the exception history properly in that case. Matthias

Re: FLink Accessing two hdfs cluster

2021-11-30 Thread David Morávek
Hi chenqizhu, this exception doesn't seem to come from Flink, but rather from the YARN container bootstrap. When a YARN container starts up, it needs to download resources from HDFS (your job archives / configuration / distributed cache / ...) which are necessary for starting the user application (

Re: how to run streaming process after batch process is completed?

2021-11-30 Thread Alexander Preuß
Hi Vtygoss, Can you explain a bit more about your ideal pipeline? Is the batch data bounded, or could you also process it in streaming execution mode? And is the streaming data derived from the batch data, or do you just want to ensure that the batch has finished before running the process

[ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 Thread Yingjie Cao
Hi dev & users, We are happy to announce the open source of the remote shuffle project [1] for Flink. The project originated at Alibaba, and the main motivation is to improve batch data processing in both performance and stability, and to further embrace cloud native. For more features of the project,

Re: How to Fan Out to 100s of Sinks

2021-11-30 Thread Fabian Paul
Hi Shree, I think for every Iceberg Table you have to instantiate a different sink in your program. You basically have one operator before your sinks that decides where to route the records. You probably end up with one Iceberg sink for each of your customers. Maybe you can take a look at the Demu
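The routing idea in this reply, one operator in front of N per-customer sinks, can be sketched with plain collections. Customer IDs and the String record type are invented; in the real job each bucket would be a separate Iceberg sink rather than an in-memory list:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of demultiplexing records to per-customer sinks. Each "sink"
// is just a list so the routing logic stands alone for illustration.
class CustomerRouter {
    private final Map<String, List<String>> sinks = new HashMap<>();

    // The routing step: choose the target sink from a field of the record.
    void route(String customerId, String record) {
        sinks.computeIfAbsent(customerId, id -> new ArrayList<>()).add(record);
    }

    List<String> sinkFor(String customerId) {
        return sinks.getOrDefault(customerId, List.of());
    }
}
```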

how to run streaming process after batch process is completed?

2021-11-30 Thread vtygoss
Hi, community! With Flink, I want to unify batch processing and stream processing in a data production pipeline. A batch process is used to process inventory data, then a streaming process is used to process incremental data. But I've hit a problem: there is no state in batch mode, and the result is wrong if i
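One pattern worth mentioning for this shape of pipeline, if upgrading is an option, is Flink's HybridSource (added in 1.14, FLIP-150): the job reads the bounded inventory source to completion and then switches to the unbounded incremental source within a single streaming job, so operator state built from the inventory data carries over. A rough fragment (both sources are placeholders):

```java
// Sketch: read the bounded (inventory) source first, then switch to the
// unbounded (incremental) source. Requires Flink 1.14+.
HybridSource<String> source =
        HybridSource.<String>builder(boundedFileSource)   // placeholder
                    .addSource(unboundedKafkaSource)      // placeholder
                    .build();
```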

Re: Parquet schema per bucket in Streaming File Sink

2021-11-30 Thread Francesco Guardiani
Hi Zack, > I want to customize this job to "explode" the map as column names and values You can do this in a SELECT statement by manually extracting the map values using the map access built-in
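A minimal illustration of the map-access syntax (the table, column, and key names below are invented for the example):

```sql
-- Hypothetical table `events` with a MAP<STRING, STRING> column `props`:
-- pull individual keys out as top-level columns.
SELECT
  props['user_id'] AS user_id,
  props['country'] AS country
FROM events;
```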

FLink Accessing two hdfs cluster

2021-11-30 Thread chenqizhu
Hi, Flink version 1.13 supports configuration of Hadoop properties in flink-conf.yaml via flink.hadoop.*. There is a requirement to write checkpoints to an HDFS cluster with SSDs (called B cluster) to speed up checkpoint writing, but this HDFS cluster is not the default HDFS in the Flink client (called Ac
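For context, the flink-conf.yaml side of this setup can look roughly like the fragment below. The nameservice and host names are invented, and the exact HA keys depend on your Hadoop deployment; these are the standard HDFS client properties prefixed with `flink.hadoop.` so Flink passes them through:

```yaml
# Make both nameservices known to the Hadoop client that Flink uses:
flink.hadoop.dfs.nameservices: clusterA,clusterB
flink.hadoop.dfs.ha.namenodes.clusterB: nn1,nn2
flink.hadoop.dfs.namenode.rpc-address.clusterB.nn1: b-nn1.example.com:8020
flink.hadoop.dfs.namenode.rpc-address.clusterB.nn2: b-nn2.example.com:8020
flink.hadoop.dfs.client.failover.proxy.provider.clusterB: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

# Point checkpoints at the SSD-backed cluster:
state.checkpoints.dir: hdfs://clusterB/flink/checkpoints
```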