Production Readiness of File Source

2021-03-16 Thread Chirag Dewan
Hi, I am intending to use the File source for a production use case. I have a few use cases that are currently not supported like deleting a file once it's processed.  So I was wondering if we can use this in production or write my own implementation? Is there any recommendations around this?

Re: Checkpoint fail due to timeout

2021-03-16 Thread Alexey Trenikhun
In my opinion looks similar. Were you able to tune-up Flink to make it work? I'm stuck with it, I wanted to scale up hoping to reduce backpressure, but to rescale I need to take savepoint, which never completes (at least takes longer than 3 hours). From:

ClassCastException after upgrading Flink application to 1.11.2

2021-03-16 Thread soumoks
Hi, We have upgraded an application originally written for Flink 1.9.1 with Scala 2.11 to Flink 1.11.2 with Scala 2.12.7 and we are seeing the following error at runtime. 2021-03-16 20:37:08 java.lang.RuntimeException at

Re:Re: pyflink使用的一些疑问

2021-03-16 Thread xiaoyue
Hi, Xingbo 想跟您了解一下关于sql_query执行上的细节,flink1.12版本底层执行sql语句的过程中,是否有谓词下退的优化? 从相关的代码测试结果看: 1. pyflink1.11版本的connector定义支持参数read.query来获取数据,执行效率很高,猜测这部分执行交由数据库完成; 2. pyflink1.12版本取消了read.query参数,当定义多个数据源执行join等操作时,耗时很明显(pyflink)

FlinkSql 的一些疑问

2021-03-16 Thread songzhongs...@kedacom.com
请教如下问题: 1. TableEnvironment 的 Checkpoint 以及 RestartStrategy 如何设置? 是仅有通过设置 StreamTableEnvironment ,再 create TableEnv 的方式吗? 2. 使用catalog 时, 对于已经存在的物理表,如何指定 表字段为EventTime 以及Watermark 如何设置。还需要CreateTable 吗? songzhongs...@kedacom.com

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-16 Thread Yun Tang
Hi Alexey, I believe your exception messages are printed from Flink-1.12.2 not Flink-1.12.1 due to the line number of method calling. Could you share exception message of Flink-1.12.1 when rescaling? Moreover, I hope you could share more logs during restoring and rescaling. I want to see

Re: Prefix Seek RocksDB

2021-03-16 Thread Yun Tang
Hi Rex, Prefix seek iterator has not ever been used in Flink when seeking. I hope you could first read more details about this from RocksDB wiki as prefix extractor could impact the performance. Best Yun Tang From: Rex Fenley Sent: Wednesday, March 17, 2021

Re: Pyflink 提交到本地集群报错

2021-03-16 Thread Huilin_WU
你好,谢谢你的回复,现在更新到V1.12就可以直接运行了 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-16 Thread Vishal Santoshi
Neither those are metrics metrics on a ValueState, which is updated at least once every call to process. The metric is the the number of these ValueStates scoped to a key ( am using session windows ). On Mon, Mar 15, 2021 at 11:29 PM Yun Tang wrote: > Hi, > > Could you describe what you

How to advance Kafka offsets manually with checkpointing enabled on Flink (TableAPI)

2021-03-16 Thread Rex Fenley
Hello, I'm wondering how, in the event of a poison pill record on Kafka, to advance a partition's checkpointed offsets by 1 when using the TableAPI/SQL. It is my understanding that when checkpointing is enabled Flink uses its own checkpoint committed offsets and not the offsets committed to

Re: Prefix Seek RocksDB

2021-03-16 Thread Rex Fenley
Thanks for the input, I'll look more into that. Does your answer then imply that Joins and Aggs do not inherently always use prefix seeks? I'd imagine that the join key on join and groupby key on aggs would always be used as prefix keys. Is this not the case? Also, is there good information on

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
I've made a handful of tweaks to it to try and get them to pick up as expected (i.e. adding logging to every available overload for the interceptors, etc) using something similar to the following: fun create(): InterceptingTaskMetricGroup { val operatorGroup = object:

custom metrics within a Trigger

2021-03-16 Thread Aleksander Sumowski
Hi all, I'd like to measure how many events arrive within allowed lateness grouped by particular feature of the event. We assume particular type of events have way more late arrivals and would like to verify this. The natural place to make the measurement would be our custom trigger within

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Chesnay Schepler
Actually you'd have to further subclass the operatorMetricGroup such that addGroup works as expected. This is admittedly a bit of a drag :/ On 3/16/2021 4:35 PM, Chesnay Schepler wrote: The test harness is fully independent of the MiniClusterResource; it isn't actually running a job. That's

Re: Editing job graph at runtime

2021-03-16 Thread Jessy Ping
Hi Timo/Team, Thanks for the reply. Just take the example from the following pseduo code, Suppose , this is the current application logic. firstInputStream = addSource(...)* //Kafka consumer C1* secondInputStream = addSource(...) *//Kafka consumer C2* outputStream = firstInputStream,keyBy(a ->

Re: Python StreamExecutionEnvironment from_collection Kafka example

2021-03-16 Thread Robert Cullen
Thanks All, I've added python and pyflink to the TM image which fixed the problem. Now however submitting a python script to the cluster successfully is sporadic; sometimes it completes but most of the time it just hangs. Not sure what is causing this. On Mon, Mar 15, 2021 at 9:47 PM Xingbo

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Chesnay Schepler
The test harness is fully independent of the MiniClusterResource; it isn't actually running a job. That's why your metrics never arrive at the reporter. You can either: a) use the test harness with a custom MetricGroup implementation that intercepts registered metrics, set in the

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
In this case, I was using a harness to test the function. Although, I could honestly care less about the unit-test surrounding metrics, I'm much more concerned with having something that will actually run and work as intended within a job. The only real concern I have or problem that I want to

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Chesnay Schepler
Are you actually running a job, or are you using a harness for testing your function? On 3/16/2021 3:24 PM, Rion Williams wrote: Hi Chesnay, Thanks for the prompt response and feedback, it's very much appreciated. Please see the inline responses below to your questions: *Was there

Re: [Flink SQL] Leniency of JSON parsing

2021-03-16 Thread Timo Walther
Hi Sebastian, you can checkout the logic your self by looking into https://github.com/apache/flink/blob/master/flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/debezium/DebeziumJsonDeserializationSchema.java and

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Rion Williams
Hi Chesnay, Thanks for the prompt response and feedback, it's very much appreciated. Please see the inline responses below to your questions: *Was there anything in the logs (ideally on debug)?* I didn't see anything within the logs that seemed to indicate anything out of the ordinary. I'm

Re: Checkpoint fail due to timeout

2021-03-16 Thread 陳昌倬
On Tue, Mar 16, 2021 at 02:32:54AM +, Alexey Trenikhun wrote: > Hi Roman, > I took thread dump: > "Source: digital-itx-eastus2 -> Filter (6/6)#0" Id=200 BLOCKED on > java.lang.Object@5366a0e2 owned by "Legacy Source Thread - Source: > digital-itx-eastus2 -> Filter (6/6)#0" Id=202 > at >

Re:Re: Find many strange measurements in metrics database of influxdb

2021-03-16 Thread Tim yu
Hi Timo, The measurement of influxdb is like the table of mysql, "from table1" is a very strange table name. Maybe this is a bug. Regards, Tim -- - Original Message - From: "Timo Walther" To: user@flink.apache.org Sent: Tue, 16 Mar 2021 13:30:41 +0100 Subject: Re: Find many

Re: Time Temporal Join

2021-03-16 Thread Timo Walther
Hi Satyam, first of all your initial join query can also work, you just need to make sure that no time attribute is in the SELECT clause. As the exception indicates, you need to cast all time attributes to TIMESTAMP. The reason for this is some major design issue that is also explained here

Re: Find many strange measurements in metrics database of influxdb

2021-03-16 Thread Timo Walther
Hi Tim, "from table1" might be the operator that reads "table1" also known as the table scan operator. Could you share more of the metrics and their values? Most of them should be explained in https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/metrics/#system-metrics Regards,

Re: Editing job graph at runtime

2021-03-16 Thread Timo Walther
Hi Jessy, to be precise, the JobGraph is not used at runtime. It is translated into an ExecutionGraph. But nevertheless such patterns are possible but require a bit of manual implementation. Option 1) You stop the job with a savepoint and restart the application with slightly different

Editing job graph at runtime

2021-03-16 Thread Jessy Ping
Hi Team, Is it possible to edit the job graph at runtime ? . Suppose, I want to add a new sink to the flink application at runtime that depends upon the specific parameters in the incoming events.Can i edit the jobgraph of a running flink application ? Thanks Jessy

Re: flink sql 的 count(distinct )问题

2021-03-16 Thread Benchao Li
Hi, 你可以理解为用的是MapState来保存的状态。 op <520075...@qq.com> 于2021年3月16日周二 下午3:00写道: > 各位大佬好,想问下flinksql里的count (distinct)默认是用哪种state保存的状态 -- Best, Benchao Li

Re: Handle late message with flink SQL

2021-03-16 Thread Timo Walther
Hi, your explanation makes sense but I'm wondering how the implementation would look like. This would mean bigger changes in a Flink fork, right? Late data handling in SQL is a frequently asked question. Currently, we don't have a good way of supporting it. Usually, we recommend to use

[HEADS UP] Flink Community Survey

2021-03-16 Thread Ana Vasiliuk
Hi everyone! We're constantly working to improve the Flink community experience and need your help! Please take 2 min to share your thoughts with us via a short survey [1]. Your feedback will help us understand where we stand on communication between community members, what activities you prefer

Re: Unit Testing for Custom Metrics in Flink

2021-03-16 Thread Chesnay Schepler
Was there anything in the logs (ideally on debug)? Have you debugged the execution and followed the counter() calls all the way to the reporter? Do you only see JobManager metrics, or is there somewhere also something about the TaskManager? I can see several issues with your code, but none

使用codehaus.janino动态生成类,在map函数中自动映射json对象,找不到我自动生成的类

2021-03-16 Thread hdxg1101300...@163.com
你好: 我这边根据数据字典 动态生产类然后通过map函数对我的json字符串映射到我动态生成的类中; public static Class getClazz(String className,String cls) throws Exception { SimpleCompiler compiler = new SimpleCompiler(); compiler.cook(cls);

Re: Is it possible to mount node local disk for task managers in a k8s application cluster?

2021-03-16 Thread Chen-Che Huang
Hi Yang, Thanks for the reply. Looking forward to 1.13 :) Best wishes, Chen-Che On 2021/03/16 07:41:18, Yang Wang wrote: > I think the pod template[1] is what you are looking for. It will be > released in 1.13. > > [1]. >

Re: Temproal Tables

2021-03-16 Thread superainbower
这个问题我解决了,这样定义应该是可以 On 03/16/2021 15:11, superainbower wrote: 请教下大家,官网中对于时态表的定义的案例是基于debezium的,我现在具基于canal这样定义有问题吗?定义如下 create table produce( id string, name string, price decimal(10,4) update_time timestamp(3) metadata from ‘timestamp’ virtual, primary key(id) not enforced, watermark for

Re: Is it possible to mount node local disk for task managers in a k8s application cluster?

2021-03-16 Thread Yang Wang
I think the pod template[1] is what you are looking for. It will be released in 1.13. [1]. https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template Best, Yang Chen-Che Huang 于2021年3月16日周二 下午1:26写道: > Hi, > > We use the per-job

Re: Flink sql 实现全局row_number()分组排序

2021-03-16 Thread Tian Hengyu
咋么有人啊~~~ -- Sent from: http://apache-flink.147419.n8.nabble.com/

(无主题)

2021-03-16 Thread 黄志高
各位大佬,在做远程提交任务到flink的standalone模式,抛以下异常 Caused by: java.lang.RuntimeException: java.lang.InterruptedException at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:277) at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:981) at

Temproal Tables

2021-03-16 Thread superainbower
请教下大家,官网中对于时态表的定义的案例是基于debezium的,我现在具基于canal这样定义有问题吗?定义如下 create table produce( id string, name string, price decimal(10,4) update_time timestamp(3) metadata from ‘timestamp’ virtual, primary key(id) not enforced, watermark for update_time as update_time )with( ‘connector’=‘Kafka’,

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-16 Thread Alexey Trenikhun
Also restore from same savepoint without change in parallelism works fine. From: Alexey Trenikhun Sent: Monday, March 15, 2021 9:51 PM To: Yun Tang ; Tzu-Li (Gordon) Tai ; user@flink.apache.org Subject: Re: EOFException on attempt to scale up job with RocksDB

flink sql ?? count(distinct )????

2021-03-16 Thread op
??flinksqlcount (distinct??state??

Re: Attach remote debugger to task executor

2021-03-16 Thread Reggie Quimosing
Thanks Julian, that worked! I totally missed this in the documentation. On Mon, Mar 15, 2021 at 4:06 PM Jaffe, Julian wrote: > You can use `env.java.opts.taskmanager` to specify java options for the > task managers specifically. Be aware you may want to set `suspend=n` or be > sure to attach

Re: pyflink使用的一些疑问

2021-03-16 Thread Xingbo Huang
Hi, 补充回答两点 1. 现在Table上是支持sliding window和Tumpling Window的Pandas UDAF[1]的, 在1.13会支持session window的UDAF的支持。对于datastream上window的支持,对于上述几种window,你可以转到table上去操作,对于自定义window,datastream会在1.13支持。 2. 关于性能问题,如果你不使用Python UDFs的话,本质就是跑的Java的代码,python起的作用只是在客户端编译JobGraph的作用,所以不存在说Python

Re: Can I use PyFlink together with PyTorch/Tensorflow/PyTorch

2021-03-16 Thread Xingbo Huang
Hi Yik San, Thanks for the investigation of PyFlink together with all these ML libs. IMO, you could refer to the flink-ai-extended project that supports the Tensorflow on Flink, PyTorch on Flink etc, whose repository url is https://github.com/alibaba/flink-ai-extended. Flink AI Extended is a