Re: How to know (in code) how many times the job restarted?

2021-06-18 Thread Roman Khachatryan
> do you mean inside the processElement() method? I used a simple mapper with Thread.sleep before ExceptionSimulatorProcess. > what is 0ms pause? do you mean > env.getCheckpointConfig().setMinPauseBetweenCheckpoints(0); ? Yes, env.getCheckpointConfig().setMinPauseBetweenCheckpoints(0); > How do

Upcoming Meetup talk on using Flink & Pinot - Pinot vs Elasticsearch, a Tale of Two PoCs

2021-06-18 Thread Ken Krugler
Hi all, Next Tuesday I’m be showing about how we use Flink + Pinot to provide a new and improved analytics engine for Adbeat . https://www.meetup.com/apache-pinot/events/277817649/ — Ken PS - Some caveats - it’s mostly about Pinot and Elasticsearch, and we’re using Flink

Re: How to know (in code) how many times the job restarted?

2021-06-18 Thread Felipe Gutierrez
On Fri, Jun 18, 2021 at 5:40 PM Roman Khachatryan wrote: > I tried to run the test that you mentioned > (WordCountFilterQEPTest#integrationTestWithPoisonPillRecovery) in rev. > 6f08d0a. > > In IDE, I see that: > - checkpoint is never triggered (sentence is too short, checkpoint > pause and interv

Re: How to know (in code) how many times the job restarted?

2021-06-18 Thread Roman Khachatryan
I tried to run the test that you mentioned (WordCountFilterQEPTest#integrationTestWithPoisonPillRecovery) in rev. 6f08d0a. In IDE, I see that: - checkpoint is never triggered (sentence is too short, checkpoint pause and interval are too large) - exception is never thrown, so the job never restarte

Monitoring Exceptions using Bugsnag

2021-06-18 Thread Kevin Lam
Hi all, I'm interested in instrumenting an Apache Flink application so that we can monitor exceptions. I was wondering what the best practices are here? Is there a good way to observe all the exceptions inside of a Flink application, including Flink internals? We are currently thinking of using B

Re: Recommendation for dealing with Job Graph incompatibility across varying Flink versions

2021-06-18 Thread Sonam Mandal
Hi Paul, Thanks for getting back to me. I did take a look at the Google GO operator, and they use the /bin/flink client for job submission. My understanding is that in this scenario users must ensure that their job jar is compatible with the Flink version, and the client will just take care of

Re: Please advise bootstrapping large state

2021-06-18 Thread Marco Villalobos
It was not clear to me that JdbcInputFormat was part of the DataSet api. Now I understand. Thank you. On Fri, Jun 18, 2021 at 5:23 AM Timo Walther wrote: > Hi Marco, > > as Robert already mentioned, the BatchTableEnvironment is simply build > on top of the DataSet API, partitioning functionali

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
I'm glad I could help, I hope it will solve your problem :) Best, Piotrek pt., 18 cze 2021 o 14:38 Felipe Gutierrez napisał(a): > On Fri, Jun 18, 2021 at 1:41 PM Piotr Nowojski > wrote: > >> Hi, >> >> Keep in mind that this is a quite low level approach to this problem. It >> would be much bet

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Felipe Gutierrez
On Fri, Jun 18, 2021 at 1:41 PM Piotr Nowojski wrote: > Hi, > > Keep in mind that this is a quite low level approach to this problem. It > would be much better to make sure that after recovery watermarks are still > being emitted. > yes. Indeed it looks like a very low level. I did a small test

Re: Handling Large Broadcast States

2021-06-18 Thread Timo Walther
Hi Rion, as far as I know we also don't support broadcast streaming joins in Table API/SQL. Are you sure that you need a broadcast pattern? Or would a regular hash join using connect() with a CoProcessFunction also work for you? Maybe with an artifical key to spread the load more evently?

[DISCUSS] Apache Flink (1.14) Release cycle duration

2021-06-18 Thread Johannes Moser
Dear Flink Users, We are already working on the Flink 1.14. release and currently, following the usual release cycle, aiming for a release early September. You can follow the process in the Apache Flink wiki [1]. Within the dev mailing list the discussion appeared if we would want to push the re

Re: Please advise bootstrapping large state

2021-06-18 Thread Timo Walther
Hi Marco, as Robert already mentioned, the BatchTableEnvironment is simply build on top of the DataSet API, partitioning functionality is also available in DataSet API. So using the JdbcInputFormat directly should work in DataSet API. Otherwise I would recommend to use some initial pipeline

Re: Handling Large Broadcast States

2021-06-18 Thread Piotr Nowojski
Hi, As far as I know there are no plans to support other state backends with BroadcastState. I don't know about any particular technical limitation, it probably just hasn't been done. Also I don't know how much effort that would be. Probably it wouldn't be easy. Timo, can you chip in how for exa

Re: The memory usage of the job is very different between Flink1.9 and Flink1.12

2021-06-18 Thread Piotr Nowojski
Hi, always when upgrading I would suggest to check release notes first [1] Best, Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html#memory-management pt., 18 cze 2021 o 12:24 Haihang Jing napisał(a): > Ask a question, the same business logic

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
Hi, Keep in mind that this is a quite low level approach to this problem. It would be much better to make sure that after recovery watermarks are still being emitted. If you are using a built-in source, it's probably easier to do it in a custom operator. I would try to implement a custom one base

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Felipe Gutierrez
Hello Piotrek, On Fri, Jun 18, 2021 at 11:48 AM Piotr Nowojski wrote: > Hi, > > As far as I can tell timers should be checkpointed and recovered. What may > be happening is that the state of the last seen watermarks by operators on > different inputs and different channels inside an input is not

Re: PyFlink LIST type problem

2021-06-18 Thread Dian Fu
Hi Laszlo, It seems because the json format supports object array type and doesn’t support list type. However, it still hasn’t provided object array type in PyFlink Datastream API [1]. I have created a ticket as a following up. For now, I guess you could implement it yourself and could take a

The memory usage of the job is very different between Flink1.9 and Flink1.12

2021-06-18 Thread Haihang Jing
Ask a question, the same business logic, the same resource configuration, the memory usage of the job is very different between Flink1.9 and Flink1.12. Using jemalloc analysis, it is found that the UncompressBlockContentsForCompressionType method of rocksdb takes up more memory and runs the same ti

Re: How to know (in code) how many times the job restarted?

2021-06-18 Thread Felipe Gutierrez
I investigated a little bit more. I created the same POC on a Flink version 1.13. I have this ProcessFunction where I want to count the times it recovers. I tested with ListState and ValueState and it seems that during the integration test (only for integration test) the process is hanging on the o

Re: Task is always created state after submit a example job

2021-06-18 Thread Piotr Nowojski
Hi, I would start by looking at the Job Manager and Task Manager logs. Take a look if Task Managers connected/registered in the Job Manager and if so, if there were no problems when submitting the job. It seems like either there are not enough slots, or slots are actually not available. Best, Pio

Re: flink kafka producer avro serializer problem

2021-06-18 Thread Arvid Heise
Hi Kevin, Could you please expand why using Avro maps is not an option? [1] Of course, you wouldn't get some advantages like safe schema evolution but I can't see how that's supposed to work in your case anyways. Note that a map is also inefficient as keys are duplicated in each record. Dynamical

Re: Flow of events when Flink Iterations are used in DataStream API

2021-06-18 Thread Piotr Nowojski
Hi, In old Flink versions (prior to 1.9) that would be the case. If operator D emitted a record to Operator B, but Operator B hasn't yet processed when checkpoint is happening, this record would be lost during recovery. Operator D would be recovered with it's state as it was after emitting this re

回复: Flink sql case when problem

2021-06-18 Thread Jacky Yin 殷传旺
Hello Jing, Regarding the convention(from 'IN' to 'OR') threshold, could you please kindly explain it with more details? Is it the count of the items of the 'IN' clause? BR, Jacky 发件人: JING ZHANG 发送时间: 2021年6月18日 15:19 收件人: Leonard Xu 抄送: 纳兰清风 ; User-Flink 主

Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
Hi, As far as I can tell timers should be checkpointed and recovered. What may be happening is that the state of the last seen watermarks by operators on different inputs and different channels inside an input is not persisted. Flink is assuming that after the restart, watermark assigners will emi

Re: Discard checkpoint files through a single recursive call

2021-06-18 Thread Piotr Nowojski
Hi, Unfortunately at the moment I think there are no plans to push for this. I would suggest you to bump/cast a vote on https://issues.apache.org/jira/browse/FLINK-13856 in order to allows us more accurately prioritise efforts. Best, Piotrek śr., 16 cze 2021 o 05:46 Jiahui Jiang napisał(a): >

Re: Web UI shows my AssignTImestamp is in high back pressure but in/outPoolUsage are both 0.

2021-06-18 Thread Piotr Nowojski
Hi Haocheng, Regarding the first part, yes. For a very long time there was a trivial bug that was displaying the maximum "backpressure status" ("HIGH" in your case) from all of the subtasks, for every subtask, instead of showing the subtask's individual status. [1] It is/will be fixed in Flink 1.

Re: Recommendation for dealing with Job Graph incompatibility across varying Flink versions

2021-06-18 Thread Paul K Moore
Hi Sonam, I am not a long-standing Flink user (3 months only) so perhaps others will have a more authoritative view. I would say that I am using Flink in k8s, and have had some good success with the Google Flink operator (https://github.com/GoogleCloudPlatform/flink-on-k8s-operator). This inc

Re: How to set state.backend.rocksdb.latency-track-enabled

2021-06-18 Thread Yun Tang
Hi Chen-Che, The PR-16177 [1] is the documentation for state access latency tracking, thought it has not been merged, you could still refer it for more details. [1] https://github.com/apache/flink/pull/16177 Best Yun Tang From: Chen-Che Huang Sent: Friday, Ju

Re: How to set state.backend.rocksdb.latency-track-enabled

2021-06-18 Thread Chen-Che Huang
Hi Yangze, Got it. I'll evaluate to enable this feature and see whether I can gain some insights. Many thanks for your reply. On 2021/06/18 07:52:53, Yangze Guo wrote: > Hi, Chen-Che, > > IIUC, the "state.backend.rocksdb.latency-track-enabled" is just a > reject alternative and has been incor

Re: How to set state.backend.rocksdb.latency-track-enabled

2021-06-18 Thread Yangze Guo
Hi, Chen-Che, IIUC, the "state.backend.rocksdb.latency-track-enabled" is just a reject alternative and has been incorrectly written to the release note. You can refer to the [1] instead. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/#state-backends-laten

How to set state.backend.rocksdb.latency-track-enabled

2021-06-18 Thread Chen-Che Huang
Hi, The 1.13 release note (https://flink.apache.org/news/2021/05/03/release-1.13.0.html) mentions that we can set state.backend.rocksdb.latency-track-enabled to obtain some rockdb metrics with a marginal impact. However, I couldn't see state.backend.rocksdb.latency-track-enabled in https://ci

Re: How to know (in code) how many times the job restarted?

2021-06-18 Thread Felipe Gutierrez
No, it didn't work. The "context.isRestored()" returns true when I run the application on the Flink standalone-cluster and it is recovering after a failure. When I do the same on a integration test it does not returns true after a failure. I mean, I can log the exception that is causing the failur

Re: Flink sql case when problem

2021-06-18 Thread JING ZHANG
Hi houying, The root cause of `CodeGenException` is comparing Integer with Varchar (b is VARCHAR, '' and '0' are VARCHAR). The Problem could be solved by updating type of b from INTEGER to VARCHAR. Note, comparing INTEGER with VARCHAR may introduce other unexpected results. For example in your abov