Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-08 Thread Piotr Nowojski
Re-adding the user mailing list. Hey Alex, in that case I can see two scenarios that could lead to missing files. Keep in mind that incremental checkpoints reference previous checkpoints in order to minimise the size of the checkpoint (roughly speaking, only changes since the previous checkpoint

Re: Re: Add control mode for flink

2021-06-08 Thread 刘建刚
Thanks for the reply. It is a good question. There are multiple choices as follows: 1. We can persist control signals in HighAvailabilityServices and replay them after failover. 2. Only tell the users that the control signals take effect after they are checkpointed. Steven Wu [via Apach

Re: ByteSerializationSchema in PyFlink

2021-06-08 Thread Wouter Zorgdrager
Hi Dian, all, The way I resolved it right now is to write my own custom serializer, which only maps from bytes to bytes. See the code below: public class KafkaBytesSerializer implements SerializationSchema, DeserializationSchema { @Override public byte[] deserialize(byte[] bytes) throws IOEx
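The snippet above is cut off mid-method. A minimal self-contained sketch of what such a pass-through class might look like follows; the class name matches the quoted fragment, but the generic parameter and the choice of `PrimitiveArrayTypeInfo` are my assumptions, not confirmed by the original mail:

```java
import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;

/** Pass-through (de)serializer: hands Kafka's raw bytes on unchanged. */
public class KafkaBytesSerializer
        implements SerializationSchema<byte[]>, DeserializationSchema<byte[]> {

    @Override
    public byte[] serialize(byte[] element) {
        return element; // no conversion: bytes in, bytes out
    }

    @Override
    public byte[] deserialize(byte[] message) throws IOException {
        return message; // no conversion on the consumer side either
    }

    @Override
    public boolean isEndOfStream(byte[] nextElement) {
        return false; // unbounded Kafka stream
    }

    @Override
    public TypeInformation<byte[]> getProducedType() {
        return PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO;
    }
}
```

Such a schema can then be passed to a Kafka consumer/producer so that PyFlink receives the raw bytes and does its own decoding on the Python side.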

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

2021-06-08 Thread Till Rohrmann
Thanks for the update Yingjie. Would it make sense to write a short blog post about this feature including some performance improvement numbers? I think this could be interesting to our users. Cheers, Till On Mon, Jun 7, 2021 at 4:49 AM Jingsong Li wrote: > Thanks Yingjie for the great effort!

Re: State migration for sql job

2021-06-08 Thread Kurt Young
What kind of expectation do you have after you add the "max(a)" aggregation: a. Keep summing a and start to calculate max(a) after you added. In other words, max(a) won't take the history data into account. b. First process all the historical data to get a result of max(a), and then start to compu

Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-08 Thread Chirag Dewan
Hi, Although this looks like a problem to me, I still can't conclude it. I tried reducing my TM replicas from 2 to 1 with 4 slots and 4 cores each. I was hoping that with a single TM there would be no file write conflicts. But that doesn't seem to be the case, as I still get the: Caused by: org.apache.fl

How to unsubscribe?

2021-06-08 Thread Geldenhuys, Morgan Karl
How can I unsubscribe from this mailing list? The volume is just getting too much at the moment. Following the steps described on the website (https://flink.apache.org/community.html) did not appear to do anything. Sorry for the spam and thanks in advance.

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

2021-06-08 Thread Yingjie Cao
Hi Till, Thanks for the suggestion. The blog post is already on the way. Best, Yingjie Till Rohrmann 于2021年6月8日周二 下午5:30写道: > Thanks for the update Yingjie. Would it make sense to write a short blog > post about this feature including some performance improvement numbers? I > think this could

Re: How to unsubscribe?

2021-06-08 Thread Leonard Xu
Hi Morgan, just send an email with any content to user-unsubscr...@flink.apache.org to unsubscribe from the Flink user mailing list, and likewise send an email with any content to dev-unsubscr...@flink.apache.org

Jupyter PyFlink Web UI

2021-06-08 Thread maverick
Hi, I've got a question. I'm running PyFlink code from a Jupyter Notebook, starting the TableEnvironment with the following code: env_settings = EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build() table_env = TableEnvironment.create(env_settings) How can I enable the Web UI in thi

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

2021-06-08 Thread Till Rohrmann
Great :-) On Tue, Jun 8, 2021 at 1:11 PM Yingjie Cao wrote: > Hi Till, > > Thanks for the suggestion. The blog post is already on the way. > > Best, > Yingjie > > Till Rohrmann 于2021年6月8日周二 下午5:30写道: > >> Thanks for the update Yingjie. Would it make sense to write a short blog >> post about thi

Re: Query related to Minimum scrape interval for Prometheus and fetching metrics of all vertices in a job through Flink Rest API

2021-06-08 Thread Ashutosh Uttam
Thanks Matthias. We are using Prometheus for fetching metrics. Is there any recommended scrape interval? Also, is there any impact if lower scrape intervals are used? Regards, Ashutosh On Fri, May 28, 2021 at 7:17 PM Matthias Pohl wrote: > Hi Ashutosh, > you can set the metrics update interval

Re: Allow setting job name when using StatementSet

2021-06-08 Thread Yuval Itzchakov
Yup, that worked. Thank you guys for pointing it out! On Tue, Jun 8, 2021, 09:33 JING ZHANG wrote: > I agree with Nico, I just added the link of pipeline.name here. > > Nicolaus Weidner 于2021年6月7日周一 > 下午11

Re: Query related to Minimum scrape interval for Prometheus and fetching metrics of all vertices in a job through Flink Rest API

2021-06-08 Thread Chesnay Schepler
There is no recommended scrape interval because it is largely dependent on your requirements. For example, if you're fine with reacting to problems within an hour, then a 5s scrape interval doesn't make sense. The lower the interval, the more resources must of course be spent on serving the pro
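As a concrete illustration of the trade-off Chesnay describes, a Prometheus scrape configuration might look like the fragment below. The job name, targets, and port are assumptions on my part (9249 is the default port of Flink's PrometheusReporter, but yours may differ); pick the interval to match how quickly you need to detect problems:

```yaml
scrape_configs:
  - job_name: 'flink'
    # 15s is a common middle ground: fine-grained enough for dashboards
    # without hammering the metric endpoint of every TaskManager.
    scrape_interval: 15s
    static_configs:
      - targets: ['flink-jobmanager:9249', 'flink-taskmanager:9249']
```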

Re: State migration for sql job

2021-06-08 Thread aitozi
Thanks for JING's and Kurt's replies. I think we prefer to choose option (a), which will not take the history data into account. IMO, if we want to process all the historical data, we have to store the original data, which may be a big overhead for the backend. But if we just aggregate after the new adde

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-08 Thread Yun Gao
Hi Thomas, I tried but could not reproduce the exception yet. I have filed an issue for the exception first [1]. [1] https://issues.apache.org/jira/browse/FLINK-22928 --Original Mail -- Sender: Thomas Wang Send Date: Tue Jun 8 07:45:52 2021 Recipients: Yun Gao

Using s3 bucket for high availability

2021-06-08 Thread Kurtis Walker
Hello, I’m trying to set up my Flink native Kubernetes cluster with high availability. Here’s the relevant config:

Re: Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-08 Thread Yun Gao
Hi Chirag, As far as I know, if you are running a single job, having all the pods share the same state.checkpoints.dir configuration should be as expected, and it is not necessary to configure the RocksDB local dir since Flink will choose a default dir. Regarding the latest exception, I t
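Yun's description can be sketched as a flink-conf.yaml fragment. The bucket path is a placeholder, not taken from the original thread:

```yaml
state.backend: rocksdb
# Shared by all pods of the job; must be a durable filesystem such as S3/HDFS.
state.checkpoints.dir: s3://my-bucket/checkpoints
# state.backend.rocksdb.localdir is intentionally left unset:
# Flink picks a local working directory on each TaskManager by default.
```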

Re: Using s3 bucket for high availability

2021-06-08 Thread Kurtis Walker
Sorry, fat-fingered the send before I finished writing… Hello, I’m trying to set up my Flink native Kubernetes cluster with high availability. Here’s the relevant config: kubernetes.service-account: flink-service-account high-availability: org.apache.flink.kubernetes.highavailability.Kube
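For reference, a typical Kubernetes HA configuration backed by an S3 bucket looks roughly like the sketch below. The bucket path and cluster id are placeholders; the factory class name is my completion of the truncated line above, matching the documented Flink option:

```yaml
kubernetes.service-account: flink-service-account
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
# Durable storage for HA metadata (job graphs, completed checkpoints).
high-availability.storageDir: s3://my-bucket/flink-ha
kubernetes.cluster-id: my-flink-cluster
```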

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-08 Thread Kezhu Wang
Could it be same as FLINK-21028[1] (titled as “Streaming application didn’t stop properly”, fixed in 1.11.4, 1.12.2, 1.13.0) ? [1]: https://issues.apache.org/jira/browse/FLINK-21028 Best, Kezhu Wang On June 8, 2021 at 22:54:10, Yun Gao (yungao...@aliyun.com) wrote: Hi Thomas, I tried but do

📝2 weeks left to submit your talks for Flink Forward Global 2021!

2021-06-08 Thread Caito Scherr
Hi there, The Call for Presentations [1] for Flink Forward Global 2021 closes in just 2 weeks on Monday, June 21! Are you working on an inspiring Flink story, real-world application, or use case? Now is a good time to finalize and submit your talk ideas to get the chance to present them to the Fli

Re: Questions about implementing a flink source

2021-06-08 Thread Evan Palmer
Hello again, Thank you for all of your help so far, I have a few more questions if you have the time :) 1. Deserialization Schema There's been some debate within my team about whether we should offer a DeserializationSchema and SerializationSchema in our source and sink. If we include don't inc

Persisting state in RocksDB

2021-06-08 Thread Paul K Moore
Hi all, First post here, so please be kind :) Firstly some context; I have the following high-level job topology: (1) FlinkPulsarSource -> (2) RichAsyncFunction -> (3) SinkFunction 1. The FlinkPulsarSource reads event notifications about article updates from a Pulsar topic 2. The RichAsyncFunc

Records Are Never Emitted in a Tumbling Event Window When Each Key Only Has One Record

2021-06-08 Thread Joseph Lorenzini
Hi all, I have observed behavior when joining two keyed streams together where events are never emitted. The source of each stream is a different Kafka topic. I am curious to know if this is expected and if there’s a way to work around it. I am using a tumbling event window. All record
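A common cause of this symptom is event-time watermarking: with only one record per key, or an idle Kafka partition, the watermark never advances past the window end, so the tumbling window never fires. A hedged sketch of the usual mitigation, declaring a watermark idleness timeout, is shown below; the event type, field, and durations are illustrative, not from the original post:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class SparseKeyWatermarks {

    /** Minimal event type assumed for illustration. */
    public static class MyEvent {
        public long timestampMillis;
    }

    public static WatermarkStrategy<MyEvent> strategy() {
        return WatermarkStrategy
                .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((event, ts) -> event.timestampMillis)
                // Keeps the watermark moving when a partition goes quiet,
                // so windows over keys with a single record can still close.
                .withIdleness(Duration.ofSeconds(30));
    }
}
```

Both input streams of a join need advancing watermarks, since the operator takes the minimum of its inputs' watermarks.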

Re: ByteSerializationSchema in PyFlink

2021-06-08 Thread Dian Fu
Hi Wouter, Great to hear and thanks for sharing! Regards, Dian > 2021年6月8日 下午4:44,Wouter Zorgdrager 写道: > > Hi Dian, all, > > The way I resolved it right now is to write my own custom serializer, which only > maps from bytes to bytes. See the code below: > public class KafkaBytesSerializer

Re: Jupyter PyFlink Web UI

2021-06-08 Thread Dian Fu
Hi Maciej, You could try if the following works: ``` table_env.get_config().get_configuration().set_string("rest.bind-port", "xxx") ``` Regards, Dian > 2021年6月8日 下午8:26,maverick 写道: > > Hi, > I've got a question. I'm running PyFlink code from Jupyter Notebook starting > TableEnvironment with

[table-walkthrough exception] Unable to create a source for reading table...

2021-06-08 Thread Lingfeng Pu
Hi, I'm following the tutorial to run the "flink-playground/table-walkthrough" project in IDEA. However, I got the following exception: Exception in thread "main" org.apache.flink.table.api.ValidationException: Unable to create a source for reading table 'default_catalog.default_database.trans

Re: Re: Add control mode for flink

2021-06-08 Thread Steven Wu
Option 2 is probably not feasible, as a checkpoint may take a long time or may fail. Option 1 might work, although it complicates job recovery and checkpointing. After checkpoint completion, we need to clean up those control signals stored in the HA service. On Tue, Jun 8, 2021 at 1:14 AM 刘建刚 wrote:

Re: Add control mode for flink

2021-06-08 Thread Paul Lam
+1 for this feature. Setting up a separate control stream is too much for many use cases; it would be very helpful if users could leverage the built-in control flow of Flink. My 2 cents: 1. @Steven IMHO, producing control events from the JobMaster is similar to triggering a savepoint. The REST api is no

Re: Add control mode for flink

2021-06-08 Thread Steven Wu
> producing control events from JobMaster is similar to triggering a savepoint. Paul, here is where I see the difference. Upon job or jobmanager recovery, we don't need to recover and replay the savepoint trigger signal. On Tue, Jun 8, 2021 at 8:20 PM Paul Lam wrote: > +1 for this feature. Setti

Re: Add control mode for flink

2021-06-08 Thread Xintong Song
> > 2. There are two kinds of existing special elements: special stream > records (e.g. watermarks) and events (e.g. checkpoint barriers). They all > flow through the whole DAG, but events need to be acknowledged by > downstream and can overtake records, while stream records do not. So I’m > wond

Re: How to configure column width in Flink SQL client?

2021-06-08 Thread Svend
Thanks for the feedback, Ingo. Do you think a PR would be welcome to make that parameter configurable? At the place where I work, UUIDs are often used as column values and they are 36 characters long => very often a very useful piece of information is not readable to us. I had a quick look, the

Re: How to configure column width in Flink SQL client?

2021-06-08 Thread Ingo Bürk
Hi Svend, I think it definitely makes sense to open a JIRA issue for it to discuss it also with the people working on the SQL client. Thanks for taking care of this! Regards Ingo On Wed, Jun 9, 2021 at 7:25 AM Svend wrote: > Thanks for the feed-back Ingo, > > Do you think a PR would be welcom

Re: Jupyter PyFlink Web UI

2021-06-08 Thread Maciej Bryński
Nope. I found the following solution. conf = Configuration() env = StreamExecutionEnvironment(get_gateway().jvm.org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf._j_configuration)) env_settings = EnvironmentSettings.new_instance().in_strea