Hello Flink Community,
Flink version: 1.16.1, ZooKeeper for HA.
My Flink application reads raw Parquet files hosted in S3, applies
transformations, and rewrites them to S3 under a different location.
Below is my code to read the Parquet files from S3:
```
final Configuration configuration = new Configuration();
```
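For context, a minimal sketch of how such a read can look with Flink 1.16's FileSource and the flink-parquet format; the schema, field names, and bucket path below are placeholder assumptions, not the poster's actual code:
```
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Placeholder schema for the raw Parquet files.
final RowType rowType = RowType.of(
        new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
        new String[] {"id", "name"});

final ParquetColumnarRowInputFormat<FileSourceSplit> format =
        new ParquetColumnarRowInputFormat<>(
                new Configuration(),          // Hadoop config (carries the s3a settings)
                rowType,
                InternalTypeInfo.of(rowType), // TypeInformation for the produced RowData
                500,                          // batch size
                false,                        // isUtcTimestamp
                true);                        // isCaseSensitive

final FileSource<RowData> source =
        FileSource.forBulkFileFormat(format, new Path("s3://my-bucket/raw/")).build();

final DataStream<RowData> input =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source");
```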
Hi
Here is an analysis article about OOM kills that I came across earlier; I'm not sure whether it will be useful to you:
http://www.whitewood.me/2021/01/02/%E8%AF%A6%E8%A7%A3-Flink-%E5%AE%B9%E5%99%A8%E5%8C%96%E7%8E%AF%E5%A2%83%E4%B8%8B%E7%9A%84-OOM-Killed/
On Thu, Feb 18, 2021 at 9:01 AM lian wrote:
> Hi experts,
> 1. Background: we use Flink SQL to implement a retract stream, using LAST_VALUE, with a second-level
> aggregation doing a SUM; it mainly relies on
27/2020 7:02 PM, Eleanore Jin wrote:
>
> I have noticed this: if I add Thread.sleep(1500); after the PATCH call
> returns 202, then the directory gets cleaned up; meanwhile, it
> shows the job-manager pod in Completed state before getting terminated:
> see screenshot: htt
the job? Is there a way to check whether the cancel has completed, so that
stopping the TM and JM can be done afterwards?
Thanks a lot!
Eleanore
On Sun, Sep 27, 2020 at 9:37 AM Eleanore Jin wrote:
> Hi Congxian,
> I am making a REST call to get the checkpoint config: curl -X GET \
>
> http://lo
> If `RETAIN_ON_CANCELLATION` is set, then the
> checkpoint will be kept when canceling a job.
>
> PS the image did not show
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#retained-checkpoints
> Best,
> Congxian
>
>
> Eleanore Jin wrote on Sep 27, 2020
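A minimal sketch of the retained-checkpoints setting referenced above, assuming the Flink 1.10-era CheckpointConfig API:
```
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(3_000);
// Keep the latest checkpoint when the job is canceled, so it remains usable for recovery.
env.getCheckpointConfig().enableExternalizedCheckpoints(
        ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
```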
Hi experts,
I am running Flink 1.10.2 on Kubernetes as a per-job cluster. Checkpointing is
enabled, with a 3 s interval, 1 s minimum pause, and 10 s timeout. I'm
using FsStateBackend, and snapshots are persisted to Azure Blob Storage
(Microsoft's cloud storage service).
The checkpointed state is just the source Kafka topic offsets.
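A sketch of that checkpoint configuration in code, assuming the Flink 1.10 APIs; the Azure container and account names are placeholders:
```
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(3_000);                                  // 3 s interval
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(1_000);  // 1 s minimum pause
env.getCheckpointConfig().setCheckpointTimeout(10_000);          // 10 s timeout
// Snapshots go to Azure Blob Storage via the wasbs:// filesystem.
env.setStateBackend(new FsStateBackend(
        "wasbs://<container>@<account>.blob.core.windows.net/checkpoints"));
```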
…the barrier message and finishes processing all data that came before the barrier.
> So when the checkpoint mechanism commits offsets, it can guarantee that the earlier data has already been written to the sink, i.e. at least once.
>
> Eleanore Jin wrote on Fri, Aug 28, 2020 at 1:17 AM:
>
> > Thanks everyone for the answers,
> >
> > I am using Apache Beam with Flink as the runner; this is the Kafka connector provided by Beam.
> > If you look at the source code, it does have state checkpointed: Beam
ache.org]
> Sent: Thursday, August 27, 2020, 10:06
> To: user-zh
> Subject: Re: On sink failures and not consuming Kafka messages
>
> Hi Eleanore, the explanation shizk233 gave is already quite comprehensive.
>
> As for the follow-up question you raised, I think that understanding is not quite correct.
> With checkpointing enabled, even though the Kafka producer does not use two-phase commit, it still guarantees
> that all current data is flushed out when a checkpoint succeeds. If the flush fails, that should cause the checkpoint to fail
…can also be rolled back, which guarantees consistency.
>
> In that case, checkpointing alone is at least once, while checkpointing + upsert or checkpointing + 2PC is exactly once.
>
> 3. Kafka auto commit
> The state in a checkpoint snapshot is not just the source offsets but also the state of every operator in the dataflow; the checkpoint
> mechanism only commits offsets once the same checkpoint n has completed across the whole dataflow.
> Kafka auto commit cannot guarantee this kind of global consistency, because auto commit is automatic and does not wait for the same checkpoint n to complete across the whole dataflow.
>
> Eleanore
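To make the "checkpointing + 2PC" case above concrete, a hedged sketch using the universal Kafka connector's two-phase-commit producer (Flink 1.9+); topic and broker names are placeholders:
```
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
// Must not exceed the broker's transaction.max.timeout.ms.
props.setProperty("transaction.timeout.ms", "900000");

FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
        "output-topic",
        (KafkaSerializationSchema<String>) (element, timestamp) ->
                new ProducerRecord<>("output-topic", element.getBytes(StandardCharsets.UTF_8)),
        props,
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE); // two-phase commit tied to checkpoints
```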
Hi Benchao
Could you explain why, if the sink does not do two-phase commit, the semantics are only at least once? For example, if both source and sink
are Kafka and the sink is not two-phase, then the checkpointed state is just the source offsets; in that case it
seems no different from using Kafka auto-commit of offsets.
Could you elaborate? Thanks!
Eleanore
On Tue, Aug 25, 2020 at 9:59 PM Benchao Li wrote:
> In that case you need to enable checkpointing to guarantee no data loss. If the sink does not do two-phase commit
A question for everyone:
I am using Beam 2.23.0 with Flink runner 1.8.2. I want to experiment with enabling checkpointing and Beam KafkaIO
EOS (exactly-once semantics), then adding or removing source/sink operators
and restoring the job from a savepoint. I am running Kafka and a Flink cluster (1 job manager, 1 task
manager) on my machine.
Below are the different scenarios I tried:
1. After the savepoint, add a source topic
Before the savepoint: read from input
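A minimal sketch of the pipeline shape being tested, assuming KafkaIO's withEOS sink mode; topic names, servers, and the pipeline options are placeholders:
```
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

Pipeline p = Pipeline.create(); // options would be configured for the Flink runner

PCollection<KV<String, String>> records = p.apply(
        KafkaIO.<String, String>read()
                .withBootstrapServers("localhost:9092")
                .withTopic("input")
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata());

records.apply(
        KafkaIO.<String, String>write()
                .withBootstrapServers("localhost:9092")
                .withTopic("output")
                .withKeySerializer(StringSerializer.class)
                .withValueSerializer(StringSerializer.class)
                .withEOS(1, "eos-sink-group")); // exactly-once sink, 1 shard
```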
eporter
> [2].
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#availability
>
> Best,
> Yang
>
> Eleanore Jin wrote on Wed, Aug 5, 2020 at 11:52 PM:
>
>> Hi Yang and Till,
>>
>> Thanks a lot for the help! I have a similar question as Till
set the restartPolicy and
>>>> backoffLimit,
>>>> this is not a clean and correct way to go. We should terminate the
>>>> jobmanager process with a zero exit code in such a situation.
>>>>
>>>> @Till Rohrmann I just have one concern. Is it a
t .spec.template.spec.restartPolicy = "Never" and
>>> spec.backoffLimit = 0.
>>> Refer here[1] for more information.
>>>
>>> Then, when the jobmanager fails for any reason, the K8s Job will
>>> be marked failed, and K8s will not restart the
clusterframework/ApplicationStatus.java#L32
>
> On Fri, Jul 31, 2020 at 6:26 PM Eleanore Jin
> wrote:
>
>> Hi Experts,
>>
>> I have a flink cluster (per job mode) running on kubernetes. The job is
>> configured with restart strategy
>>
>> restart-strate
Hi Experts,
I have a Flink cluster (per-job mode) running on Kubernetes. The job is
configured with the restart strategy
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
So after 3 retries, the job will be marked as FAILED, and hence the pods
are not running. However
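For reference, the same restart strategy set programmatically, assuming the RestartStrategies API of that Flink version:
```
import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
        3,                                // restart attempts
        Time.of(10, TimeUnit.SECONDS)));  // delay between attempts
```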
space is unbounded, and we can't intervene to clean up
> stale state.
>
> Regards,
> David
>
> On Wed, Jul 1, 2020 at 2:26 AM Eleanore Jin
> wrote:
>
>> Hi experts,
>>
>> I am going through Ververica flink training, and when doing the lab with
>
Hi experts,
I am going through the Ververica Flink training, and when doing the lab on
windows (https://training.ververica.com/exercises/windows): basically it
requires computing, within each hour, which driver earns the most in tips.
The logic is to
0. keyBy driverId
1. create a 1-hour window based on event time
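A hedged sketch of that windowing logic; the TaxiFare type and its fields follow the training exercise and are assumptions here, as is the input stream `fares`:
```
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// given: DataStream<TaxiFare> fares (from the exercise's source)
// Sum tips per driver per hour-long event-time window.
DataStream<Tuple3<Long, Long, Float>> hourlyTips = fares
        .keyBy((TaxiFare fare) -> fare.driverId)
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        .process(new ProcessWindowFunction<TaxiFare, Tuple3<Long, Long, Float>, Long, TimeWindow>() {
            @Override
            public void process(Long driverId, Context ctx, Iterable<TaxiFare> faresInWindow,
                                Collector<Tuple3<Long, Long, Float>> out) {
                float sumOfTips = 0F;
                for (TaxiFare fare : faresInWindow) {
                    sumOfTips += fare.tip;
                }
                out.collect(Tuple3.of(ctx.window().getEnd(), driverId, sumOfTips));
            }
        });

// The driver with the highest tip total in each hour.
DataStream<Tuple3<Long, Long, Float>> hourlyMax = hourlyTips
        .windowAll(TumblingEventTimeWindows.of(Time.hours(1)))
        .maxBy(2);
```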
8284.n3.nabble.com/DISCUSS-ARM-support-for-Flink-td30298.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/datastream_api.html#data-sources
> [3]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html
> [4]
> https:
Hi Community,
Currently we are running Flink in 'hub' data centers where data is ingested
into the platform via Kafka; the Flink job reads from Kafka, does the
transformations, and publishes to another Kafka topic.
I would also like to see if the same logic (read input message -> do
transformatio
e to wait for input from the
> other side.
> Since the error is not about lack of memory, the buffering in Flink state
> might not be the problem here.
>
> Best, Fabian
>
>
>
>
>
> On Sun., May 3, 2020 at 03:39, Eleanore Jin <
> eleanore@gmail.com&g
Hi all,
I just wonder whether it is possible to use the Flink metrics endpoint to allow
Prometheus to scrape user-defined metrics.
Context:
In addition to Flink metrics, we also collect some application-level
metrics using OpenCensus, and we run the OpenCensus agent as a sidecar in the
Kubernetes pod to collect metri
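As a sketch, a user-defined metric can be registered through Flink's own metric system, which a PrometheusReporter configured in flink-conf.yaml would then expose for scraping alongside the system metrics; the class and metric names here are placeholders:
```
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class CountingMapper extends RichMapFunction<String, String> {
    private transient Counter processed;

    @Override
    public void open(Configuration parameters) {
        // Registered under this operator's metric group; exported by the configured reporter.
        processed = getRuntimeContext().getMetricGroup().counter("processedRecords");
    }

    @Override
    public String map(String value) {
        processed.inc();
        return value;
    }
}
```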
Hi All,
I am using Apache Beam with Flink (1.8.2). In my job, I am using a Beam
side input (which translates into a Flink non-keyed broadcast stream) to
filter the data from the main stream.
I have been experiencing OOM: GC overhead limit exceeded continuously.
After doing some experiments, I observed the followi
Hi All,
Currently I am running a Flink job cluster (v1.8.2) on Kubernetes with 4
pods, each pod with parallelism 4.
The Flink job reads from a source topic with 96 partitions and does a
per-element filter; the filter value comes from a broadcast topic, and it
always uses the latest message as the
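A hedged sketch of that filter-via-broadcast pattern; the stream names and the matching rule are placeholder assumptions:
```
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

// given: DataStream<String> mainStream, DataStream<String> broadcastTopicStream
final MapStateDescriptor<String, String> ruleDesc =
        new MapStateDescriptor<>("filterRule", Types.STRING, Types.STRING);

BroadcastStream<String> ruleStream = broadcastTopicStream.broadcast(ruleDesc);

DataStream<String> filtered = mainStream
        .connect(ruleStream)
        .process(new BroadcastProcessFunction<String, String, String>() {
            @Override
            public void processElement(String value, ReadOnlyContext ctx, Collector<String> out)
                    throws Exception {
                String latestRule = ctx.getBroadcastState(ruleDesc).get("latest");
                if (latestRule == null || value.contains(latestRule)) {
                    out.collect(value); // filter against the latest broadcast value
                }
            }

            @Override
            public void processBroadcastElement(String rule, Context ctx, Collector<String> out)
                    throws Exception {
                // Always keep only the latest broadcast message, as described above.
                ctx.getBroadcastState(ruleDesc).put("latest", rule);
            }
        });
```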
Hi Kurt,
Thanks, I will look into it and come back if I have questions.
Best
Eleanore
On Sun, Apr 19, 2020 at 7:18 PM Kurt Young wrote:
> You can take a look at this JIRA: https://issues.apache.org/jira/browse/FLINK-14807
>
> Best,
> Kurt
>
>
> On Mon, Apr 20, 2020 at 7:07 AM Eleanore Jin
> wrote:
>
> > Hi,
> > I just
Hi,
I just read a write-up about a use case for Flink in OLAP (
https://ververica.cn/developers/olap-engine-performance-optimization-and-application-cases/),
and one point it mentions:
[image: image.png]
This optimization stems from a characteristic of OLAP: OLAP sends the final computed results to the client, relayed by the JobManager to the Client.
I would like to ask how this is implemented: forwarding the results to the Client through the JobManager.
Thanks!
Eleanore
Hi All,
My Flink application is set up to let the user start and stop it.
The Flink job runs in a job cluster (the application jar is available to
Flink upon startup). Stopping a running application means exiting the
program. Restarting a stopped job means spinning up a new job cluster wit
for you, who
>>> is one of the community experts in this area.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Wed, Mar 11, 2020 at 10:37 AM Eleanore Jin
>>> wrote:
>>>
>>>
g., the existing TMs might have some file already localized,
> or some memory buffers already promoted to the JVM tenured area, while the
> new TMs have not.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Mar 11, 2020 at 9:25 AM Eleanore Jin
> wrote:
>
>> Hi E
Hi Experts,
I have my Flink application running on Kubernetes, initially with 1 Job
Manager and 2 Task Managers.
Then we have a custom operator that watches the CRD; when the CRD
replicas change, it patches the Flink Job Manager deployment
parallelism and max parallelism according to th
> On Tue, Mar 10, 2020 at 5:43 PM Eleanore Jin
> wrote:
>
>> Hi All,
>>
>> I am using Apache Beam to construct the pipeline, and this pipeline is
>> running with Flink Runner.
>>
>> Both Source and Sink are Kafka topics, I have enabled Beam Exactly onc
Hi All,
I am using Apache Beam to construct the pipeline, and this pipeline is
running with Flink Runner.
Both Source and Sink are Kafka topics, I have enabled Beam Exactly once
semantics.
I believe the way it works in Beam is:
the messages will be cached and not processed by the KafkaExactlyOnceSin
if (SOME_CONDITION) {
>     throw new RuntimeException("Lets test checkpointing");
> }
> return value;
> }
> });
>
> ~ Abhinav Bajaj
>
> *From: *Eleanor
Hi,
I have a Flink application with checkpointing enabled, and I am running it locally
using MiniCluster.
I just wonder whether there is a way to simulate a failure and verify that the
Flink job restarts from the checkpoint.
Thanks a lot!
Eleanore
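A minimal sketch of the failure-injection idea from the quoted snippet, assuming a MiniCluster setup with checkpointing and a restart strategy enabled; the input stream and the failure condition are placeholders:
```
import org.apache.flink.streaming.api.datastream.DataStream;

// given: DataStream<String> input
// Throw once somewhere in the pipeline; on restart, the job should resume
// from the latest completed checkpoint instead of reprocessing everything.
DataStream<String> mapped = input.map(value -> {
    if (value.startsWith("poison")) { // placeholder failure condition
        throw new RuntimeException("Lets test checkpointing");
    }
    return value;
});
```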