Hello Flink Community,
Flink version: 1.16.1, ZooKeeper for HA.
My Flink application reads raw Parquet files hosted in S3, applies
transformations, and rewrites them to S3 under a different location.
Below is my code to read the Parquet files from S3:
```
final Configuration configuration = new Configuration();
```
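For context, a minimal sketch of how such a read can look with Flink 1.16's FileSource and the flink-parquet format; the schema, field names, and bucket path below are placeholder assumptions, not the poster's actual code:
```
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Placeholder schema for the raw Parquet files.
final RowType rowType = RowType.of(
        new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
        new String[] {"id", "name"});

final ParquetColumnarRowInputFormat<FileSourceSplit> format =
        new ParquetColumnarRowInputFormat<>(
                new Configuration(),          // Hadoop config (carries the s3a settings)
                rowType,
                InternalTypeInfo.of(rowType), // TypeInformation for the produced RowData
                500,                          // batch size
                false,                        // isUtcTimestamp
                true);                        // isCaseSensitive

final FileSource<RowData> source =
        FileSource.forBulkFileFormat(format, new Path("s3://my-bucket/raw/")).build();

final DataStream<RowData> input =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source");
```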
Hi
Here is an analysis article about OOM kills that I came across earlier; I'm not sure whether it will be useful to you:
http://www.whitewood.me/2021/01/02/%E8%AF%A6%E8%A7%A3-Flink-%E5%AE%B9%E5%99%A8%E5%8C%96%E7%8E%AF%E5%A2%83%E4%B8%8B%E7%9A%84-OOM-Killed/
On Thu, Feb 18, 2021 at 9:01 AM lian wrote:
> Hi experts,
> 1. Background: we use Flink SQL to implement a retract stream, using LAST_VALUE, with a second-level
> aggregation doing a SUM; it mainly relies on
27/2020 7:02 PM, Eleanore Jin wrote:
>
> I have noticed this: if I add Thread.sleep(1500); after the PATCH call
> returns 202, then the directory gets cleaned up; meanwhile, it
> shows the job-manager pod in Completed state before getting terminated:
> see screenshot: htt
the job? Is there a way to check whether the cancel has completed, so that
stopping the TM and JM can be done afterwards?
Thanks a lot!
Eleanore
On Sun, Sep 27, 2020 at 9:37 AM Eleanore Jin wrote:
> Hi Congxian,
> I am making a REST call to get the checkpoint config: curl -X GET \
>
> http://lo
> If `RETAIN_ON_CANCELLATION` is set, then the
> checkpoint will be kept when canceling a job.
>
> PS the image did not show
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#retained-checkpoints
> Best,
> Congxian
>
>
> Eleanore Jin wrote on Sep 27, 2020
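A minimal sketch of the retained-checkpoints setting referenced above, assuming the Flink 1.10-era CheckpointConfig API:
```
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(3_000);
// Keep the latest checkpoint when the job is canceled, so it remains usable for recovery.
env.getCheckpointConfig().enableExternalizedCheckpoints(
        ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
```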
Hi experts,
I am running Flink 1.10.2 on Kubernetes as a per-job cluster. Checkpointing is
enabled, with a 3 s interval, 1 s minimum pause, and 10 s timeout. I'm
using FsStateBackend, and snapshots are persisted to Azure Blob Storage
(Microsoft's cloud storage service).
The checkpointed state is just the source Kafka topic offsets.
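A sketch of that checkpoint configuration in code, assuming the Flink 1.10 APIs; the Azure container and account names are placeholders:
```
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(3_000);                                  // 3 s interval
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1_000);  // 1 s minimum pause
env.getCheckpointConfig().setCheckpointTimeout(10_000);          // 10 s timeout
// Snapshots go to Azure Blob Storage via the wasbs:// filesystem.
env.setStateBackend(new FsStateBackend(
        "wasbs://<container>@<account>.blob.core.windows.net/checkpoints"));
```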
…the barrier message and finishes processing all data that came before the barrier.
> So when the checkpoint mechanism commits offsets, it can guarantee that the earlier data has already been written to the sink, i.e. at least once.
>
> Eleanore Jin wrote on Fri, Aug 28, 2020 at 1:17 AM:
>
> > Thanks everyone for the answers,
> >
> > I am using Apache Beam with Flink as the runner; this is the Kafka connector provided by Beam.
> > If you look at the source code, it does have state checkpointed: Beam
ache.org]
> Sent: Thursday, August 27, 2020, 10:06
> To: user-zh
> Subject: Re: On sink failures and not consuming Kafka messages
>
> Hi Eleanore, the explanation shizk233 gave is already quite comprehensive.
>
> As for the follow-up question you raised, I think that understanding is not quite correct.
> With checkpointing enabled, even though the Kafka producer does not use two-phase commit, it still guarantees
> that all current data is flushed out when a checkpoint succeeds. If the flush fails, that should cause the checkpoint to fail
…can also be rolled back, which guarantees consistency.
>
> In that case, checkpointing alone is at least once, while checkpointing + upsert or checkpointing + 2PC is exactly once.
>
> 3. Kafka auto commit
> The state in a checkpoint snapshot is not just the source offsets but also the state of every operator in the dataflow; the checkpoint
> mechanism only commits offsets once the same checkpoint n has completed across the whole dataflow.
> Kafka auto commit cannot guarantee this kind of global consistency, because auto commit is automatic and does not wait for the same checkpoint n to complete across the whole dataflow.
>
> Eleanore
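To make the "checkpointing + 2PC" case above concrete, a hedged sketch using the universal Kafka connector's two-phase-commit producer (Flink 1.9+); topic and broker names are placeholders:
```
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
// Must not exceed the broker's transaction.max.timeout.ms.
props.setProperty("transaction.timeout.ms", "900000");

FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
        "output-topic",
        (KafkaSerializationSchema<String>) (element, timestamp) ->
                new ProducerRecord<>("output-topic", element.getBytes(StandardCharsets.UTF_8)),
        props,
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE); // two-phase commit tied to checkpoints
```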
Hi Benchao
Could you explain why, if the sink does not do two-phase commit, the semantics are only at least once? For example, if both source and sink
are Kafka and the sink is not two-phase, then the checkpointed state is just the source offsets; in that case it
seems no different from using Kafka auto-commit of offsets.
Could you elaborate? Thanks!
Eleanore
On Tue, Aug 25, 2020 at 9:59 PM Benchao Li wrote:
> In that case you need to enable checkpointing to guarantee no data loss. If the sink does not do two-phase commit
A question for everyone:
I am using Beam 2.23.0 with Flink runner 1.8.2. I want to experiment with enabling checkpointing and Beam KafkaIO
EOS (exactly-once semantics), then adding or removing source/sink operators
and restoring the job from a savepoint. I am running Kafka and a Flink cluster (1 job manager, 1 task
manager) on my machine.
Below are the different scenarios I tried:
1. After the savepoint, add a source topic
Before the savepoint: read from input
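A minimal sketch of the pipeline shape being tested, assuming KafkaIO's withEOS sink mode; topic names, servers, and the pipeline options are placeholders:
```
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

Pipeline p = Pipeline.create(); // options would be configured for the Flink runner

PCollection<KV<String, String>> records = p.apply(
        KafkaIO.<String, String>read()
                .withBootstrapServers("localhost:9092")
                .withTopic("input")
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata());

records.apply(
        KafkaIO.<String, String>write()
                .withBootstrapServers("localhost:9092")
                .withTopic("output")
                .withKeySerializer(StringSerializer.class)
                .withValueSerializer(StringSerializer.class)
                .withEOS(1, "eos-sink-group")); // exactly-once sink, 1 shard
```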
eporter
> [2].
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#availability
>
> Best,
> Yang
>
> Eleanore Jin wrote on Wed, Aug 5, 2020 at 11:52 PM:
>
>> Hi Yang and Till,
>>
>> Thanks a lot for the help! I have a similar question as Till
set the restartPolicy and
>>>> backoffLimit,
>>>> this is not a clean and correct way to go. We should terminate the
>>>> jobmanager process with a zero exit code in such a situation.
>>>>
>>>> @Till Rohrmann I just have one concern. Is it a
t .spec.template.spec.restartPolicy = "Never" and
>>> spec.backoffLimit = 0.
>>> Refer here[1] for more information.
>>>
>>> Then, when the jobmanager fails for any reason, the K8s Job will
>>> be marked failed, and K8s will not restart the
clusterframework/ApplicationStatus.java#L32
>
> On Fri, Jul 31, 2020 at 6:26 PM Eleanore Jin
> wrote:
>
>> Hi Experts,
>>
>> I have a flink cluster (per job mode) running on kubernetes. The job is
>> configured with restart strategy
>>
>> restart-strate
Hi Experts,
I have a Flink cluster (per-job mode) running on Kubernetes. The job is
configured with the restart strategy
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
So after 3 retries, the job will be marked as FAILED, and hence the pods
are not running. However
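For reference, the same restart strategy set programmatically, assuming the RestartStrategies API of that Flink version:
```
import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
        3,                                // restart attempts
        Time.of(10, TimeUnit.SECONDS)));  // delay between attempts
```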
space is unbounded, and we can't intervene to clean up
> stale state.
>
> Regards,
> David
>
> On Wed, Jul 1, 2020 at 2:26 AM Eleanore Jin
> wrote:
>
>> Hi experts,
>>
>> I am going through Ververica flink training, and when doing the lab with
>
Hi experts,
I am going through the Ververica Flink training, and when doing the lab on
windows (https://training.ververica.com/exercises/windows): basically it
requires computing, within each hour, which driver earns the most in tips.
The logic is to
0. keyBy driverId
1. create a 1-hour window based on event time
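A hedged sketch of that windowing logic; the TaxiFare type and its fields follow the training exercise and are assumptions here, as is the input stream `fares`:
```
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// given: DataStream<TaxiFare> fares (from the exercise's source)
// Sum tips per driver per hour-long event-time window.
DataStream<Tuple3<Long, Long, Float>> hourlyTips = fares
        .keyBy((TaxiFare fare) -> fare.driverId)
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        .process(new ProcessWindowFunction<TaxiFare, Tuple3<Long, Long, Float>, Long, TimeWindow>() {
            @Override
            public void process(Long driverId, Context ctx, Iterable<TaxiFare> faresInWindow,
                                Collector<Tuple3<Long, Long, Float>> out) {
                float sumOfTips = 0F;
                for (TaxiFare fare : faresInWindow) {
                    sumOfTips += fare.tip;
                }
                out.collect(Tuple3.of(ctx.window().getEnd(), driverId, sumOfTips));
            }
        });

// The driver with the highest tip total in each hour.
DataStream<Tuple3<Long, Long, Float>> hourlyMax = hourlyTips
        .windowAll(TumblingEventTimeWindows.of(Time.hours(1)))
        .maxBy(2);
```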
8284.n3.nabble.com/DISCUSS-ARM-support-for-Flink-td30298.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/datastream_api.html#data-sources
> [3]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html
> [4]
> https:
Hi Community,
Currently we are running Flink in 'hub' data centers where data is ingested
into the platform via Kafka; the Flink job reads from Kafka, does the
transformations, and publishes to another Kafka topic.
I would also like to see if the same logic (read input message -> do
transformatio
e to wait for input from the
> other side.
> Since the error is not about lack of memory, the buffering in Flink state
> might not be the problem here.
>
> Best, Fabian
>
>
>
>
>
> On Sun., May 3, 2020 at 03:39, Eleanore Jin <
> eleanore@gmail.com&g
Hi all,
I just wonder whether it is possible to use the Flink metrics endpoint to allow
Prometheus to scrape user-defined metrics.
Context:
In addition to Flink metrics, we also collect some application-level
metrics using OpenCensus, and we run the OpenCensus agent as a sidecar in the
Kubernetes pod to collect metri
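As a sketch, a user-defined metric can be registered through Flink's own metric system, which a PrometheusReporter configured in flink-conf.yaml would then expose for scraping alongside the system metrics; the class and metric names here are placeholders:
```
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class CountingMapper extends RichMapFunction<String, String> {
    private transient Counter processed;

    @Override
    public void open(Configuration parameters) {
        // Registered under this operator's metric group; exported by the configured reporter.
        processed = getRuntimeContext().getMetricGroup().counter("processedRecords");
    }

    @Override
    public String map(String value) {
        processed.inc();
        return value;
    }
}
```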
Hi All,
I am using Apache Beam with Flink (1.8.2). In my job, I am using a Beam
side input (which translates into a Flink non-keyed broadcast stream) to
filter the data from the main stream.
I have been experiencing OOM: GC overhead limit exceeded continuously.
After doing some experiments, I observed the followi
Hi All,
Currently I am running a Flink job cluster (v1.8.2) on Kubernetes with 4
pods, each pod with parallelism 4.
The Flink job reads from a source topic with 96 partitions and does a
per-element filter; the filter value comes from a broadcast topic, and it
always uses the latest message as the
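A hedged sketch of that filter-via-broadcast pattern; the stream names and the matching rule are placeholder assumptions:
```
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

// given: DataStream<String> mainStream, DataStream<String> broadcastTopicStream
final MapStateDescriptor<String, String> ruleDesc =
        new MapStateDescriptor<>("filterRule", Types.STRING, Types.STRING);

BroadcastStream<String> ruleStream = broadcastTopicStream.broadcast(ruleDesc);

DataStream<String> filtered = mainStream
        .connect(ruleStream)
        .process(new BroadcastProcessFunction<String, String, String>() {
            @Override
            public void processElement(String value, ReadOnlyContext ctx, Collector<String> out)
                    throws Exception {
                String latestRule = ctx.getBroadcastState(ruleDesc).get("latest");
                if (latestRule == null || value.contains(latestRule)) {
                    out.collect(value); // filter against the latest broadcast value
                }
            }

            @Override
            public void processBroadcastElement(String rule, Context ctx, Collector<String> out)
                    throws Exception {
                // Always keep only the latest broadcast message, as described above.
                ctx.getBroadcastState(ruleDesc).put("latest", rule);
            }
        });
```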
Hi Kurt,
Thanks, I will look into it and come back if I have questions.
Best
Eleanore
On Sun, Apr 19, 2020 at 7:18 PM Kurt Young wrote:
> You can take a look at this JIRA: https://issues.apache.org/jira/browse/FLINK-14807
>
> Best,
> Kurt
>
>
> On Mon, Apr 20, 2020 at 7:07 AM Eleanore Jin
> wrote:
>
> > Hi,
> > I just
Hi,
I just read a write-up about a use case for Flink in OLAP (
https://ververica.cn/developers/olap-engine-performance-optimization-and-application-cases/),
and one point it mentions:
[image: image.png]
This optimization stems from a characteristic of OLAP: OLAP sends the final computed results to the client, relayed by the JobManager to the Client.
I would like to ask how this is implemented: forwarding the results to the Client through the JobManager.
Thanks!
Eleanore
Hi All,
My Flink application is set up to let the user start and stop it.
The Flink job runs in a job cluster (the application jar is available to
Flink upon startup). Stopping a running application means exiting the
program. Restarting a stopped job means spinning up a new job cluster wit
for you, who
>>> is one of the community experts in this area.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Wed, Mar 11, 2020 at 10:37 AM Eleanore Jin
>>> wrote:
>>>
>>>
g., the existing TMs might have some file already localized,
> or some memory buffers already promoted to the JVM tenured area, while the
> new TMs have not.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Mar 11, 2020 at 9:25 AM Eleanore Jin
> wrote:
>
>> Hi E
Hi Experts,
I have my Flink application running on Kubernetes, initially with 1 Job
Manager and 2 Task Managers.
Then we have a custom operator that watches the CRD; when the CRD
replicas change, it patches the Flink Job Manager deployment
parallelism and max parallelism according to th
> On Tue, Mar 10, 2020 at 5:43 PM Eleanore Jin
> wrote:
>
>> Hi All,
>>
>> I am using Apache Beam to construct the pipeline, and this pipeline is
>> running with Flink Runner.
>>
>> Both Source and Sink are Kafka topics, I have enabled Beam Exactly onc
Hi All,
I am using Apache Beam to construct the pipeline, and this pipeline is
running with Flink Runner.
Both Source and Sink are Kafka topics, I have enabled Beam Exactly once
semantics.
I believe the way it works in Beam is:
the messages will be cached and not processed by the KafkaExactlyOnceSin
if (SOME_CONDITION) {
>     throw new RuntimeException("Lets test checkpointing");
> }
> return value;
> }
> });
>
> ~ Abhinav Bajaj
>
> *From: *Eleanor
Hi,
I have a Flink application with checkpointing enabled, and I am running it locally
using MiniCluster.
I just wonder whether there is a way to simulate a failure and verify that the
Flink job restarts from the checkpoint.
Thanks a lot!
Eleanore
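A minimal sketch of the failure-injection idea from the quoted snippet, assuming a MiniCluster setup with checkpointing and a restart strategy enabled; the input stream and the failure condition are placeholders:
```
import org.apache.flink.streaming.api.datastream.DataStream;

// given: DataStream<String> input
// Throw once somewhere in the pipeline; on restart, the job should resume
// from the latest completed checkpoint instead of reprocessing everything.
DataStream<String> mapped = input.map(value -> {
    if (value.startsWith("poison")) { // placeholder failure condition
        throw new RuntimeException("Lets test checkpointing");
    }
    return value;
});
```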