Re: accuracy validation of streaming pipeline

2022-05-23 Thread Shengkai Fang
It's a good question. Let me ping @Leonard to share more thoughts. Best, Shengkai vtygoss 于2022年5月20日周五 16:04写道: > Hi community! > > > I'm working on migrating from full-data-pipeline(with spark) to > incremental-data-pipeline(with flink cdc), and i met a problem about > accuracy validation

Re: Re: Some question with Flink state

2022-05-23 Thread lxk7...@163.com
图片好像有点问题,重新上传一下 lxk7...@163.com From: Hangxiang Yu Date: 2022-05-24 12:09 To: user-zh Subject: Re: Re: Some question with Flink state 你是用data stream作业吗,相同key如果分配到了不同的并行度,有可能是和你写的key selector相关(你可以参照下KeySelector的comments去看是否符合它的规范); 或者方便的话你可以分享下你的key selector相关的逻辑和使用state的逻辑; On Tue, May 24,

Re: Job Logs - Yarn Application Mode

2022-05-23 Thread Shengkai Fang
If you find the JM in the yarn web ui, I think you can also find the webui to access the Flink web ui with the JM. Best, Shengkai

Re: Re: Some question with Flink state

2022-05-23 Thread lxk7...@163.com
以下是我的代码部分 这是最新的一版,根据测试的时候没有啥问题 但是之前使用value state的时候能从数据上看出不对 lxk7...@163.com From: Hangxiang Yu Date: 2022-05-24 12:09 To: user-zh Subject: Re: Re: Some question with Flink state 你是用data stream作业吗,相同key如果分配到了不同的并行度,有可能是和你写的key selector相关(你可以参照下KeySelector的comments去看是否符合它的规范);

Re: Window aggregation fails after upgrading to Flink 1.15

2022-05-23 Thread Shengkai Fang
Glad to see you find the root cause. I think we can shade the janino dependency if it influences the usage. WDYT, godfrey? Best, Shengkai Pouria Pirzadeh 于2022年5月21日周六 00:59写道: > Thanks for help; I digged into it and the issue turned out to be the > version of Janino: > flink-table has pinned

Re: Application mode -yarn dependancy error

2022-05-23 Thread Shengkai Fang
Hi. I think you should send the mail to the user mail list or stack overflow, which is about the usage and help. The dev mail list focus on the design of the Flink itself. Could you share more details for your problems, including - which version you use. - how you use the Flink, including you

Re: Re: Some question with Flink state

2022-05-23 Thread Hangxiang Yu
你是用data stream作业吗,相同key如果分配到了不同的并行度,有可能是和你写的key selector相关(你可以参照下KeySelector的comments去看是否符合它的规范); 或者方便的话你可以分享下你的key selector相关的逻辑和使用state的逻辑; On Tue, May 24, 2022 at 9:59 AM lxk7...@163.com wrote: > 好的,我看这里面邮件都是英文,所以用英文问了个问题。 > >

Re: TolerableCheckpointFailureNumber not always applying

2022-05-23 Thread Hangxiang Yu
In my opinion, some exceptions in the async phase like timeout may happen related to the network, state size which will change, so maybe next time these failures will not occur. So the config makes sense for these. But this failure in the sync phase usually means the program will always fail and

Re: OutputTag alternative with pyflink 1.15.0

2022-05-23 Thread Lakshya Garg
Thanks for the helpful links Yuxia. On Mon, May 23, 2022 at 2:31 PM yuxia wrote: > Yes, you're right. > > Hopefully, the master branch supported it [1]. But It haven't been > released. If you want to use output tag in python in 1.15, you can apply > this patch[1] to your Flink 1.15 and build it

Re: Json Deserialize in DataStream API with array length not fixed

2022-05-23 Thread Shengkai Fang
Hi. In the SQL, you can just specify the `array_coordinates` type ARRAY[1]. For example, ``` CREATE TABLE source( `array_coordinates` ARRAY> ) WITH ( 'format' = 'json' ) ``` [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/ Zain Haider Nemati

Re: Application mode deployment through API call

2022-05-23 Thread Shengkai Fang
Hi, all. > is there any plan in the Flink community to provide an easier way of deploying Flink with application mode on YARN Yes. Jark has already opened a ticket about how to use the sql client to submit the SQL in application mode[1]. What's more, in FLIP-222 we are able to manage the jobs in

Re: Re: Some question with Flink state

2022-05-23 Thread lxk7...@163.com
好的,我看这里面邮件都是英文,所以用英文问了个问题。 我再描述一下我的问题,我使用键控状态,用的value-state。按理来说,相同的key应该会被分到同一个并行度处理。但是但我使用多个并行度的时候,我发现好像相同的key并没有分配到同一个并行度处理。具体现象在于,我的程序是对同一个用户点击的商品进行累加,在数据里这个商品已经是第二个了,但是程序里这个商品的状态是空,所以导致最终累加的结果是1,而正确结果应该是2。所以我猜测是不是算子对于value-state都是独有的。

Re:Re: pyflink报错:Java gateway process exited before sending its port number

2022-05-23 Thread RS
Hi, 感谢,查询pyflink目录下,里面确实存在多个版本的jar包,我清理了下,可以运行起来了, 看来是PyCharm的bug了,安装新版本的时候没有成功清理旧的版本 Thanks~ 在 2022-05-23 19:27:42,"Dian Fu" 写道: >>> java.lang.NoSuchMethodError: >org.apache.flink.util.NetUtils.getAvailablePort()I > >你的环境是不是不太干净?可以检查一下 PyFlink 安装目录下(site-packages/pyflink/lib) 的那些 jar 包的版本。

Re: TolerableCheckpointFailureNumber not always applying

2022-05-23 Thread Gaël Renoux
Got it, thank you. I misread the documentation and thought the async referred to the task itself, not the process of taking a checkpoint. I guess there is currently no way to make a job never fail on a failed checkpoint? Gaël Renoux - Lead R Engineer E - gael.ren...@datadome.co W -

Re: Some question with Flink state

2022-05-23 Thread Hangxiang Yu
Hello, All states will not be shared in different parallelisms. BTW, English questions could be sent to u...@flink.apache.org. Best, Hangxiang. On Mon, May 23, 2022 at 4:03 PM lxk7...@163.com wrote: > > Hi everyone >I was used Flink keyed-state in my Project.But I found some questions >

Re: TolerableCheckpointFailureNumber not always applying

2022-05-23 Thread Hangxiang Yu
Hi, Gaël Renoux. As you could see in [1], There are some descriptions about the config: "This only applies to the following failure reasons: IOException on the Job Manager, failures in the async phase on the Task Managers and checkpoint expiration due to a timeout. Failures originating from the

Re: Flink Kubernetes Operator: Specifying env variables from ConfigMaps

2022-05-23 Thread Gyula Fóra
Hi Mads! I think you need to use the podTemplate for this. You can either do it in the top level spec or customize it for tm/jm respectively. Keep in mind that pod templates are merged with the base flink template so it's enough to specify the fields relevant for you (in these case the env

SV: SV: How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Christopher Gustafson
Oh sorry no I was only looking at the JobManager logs, now I found them, thanks! Från: Chesnay Schepler Skickat: den 23 maj 2022 15:29:45 Till: Christopher Gustafson; user@flink.apache.org Ämne: Re: SV: How to enable statefun metrics for the Slf4j reporter Just

Re: SV: How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Chesnay Schepler
Just to double-check, you are checking the taskmanager logs, correct? On 23/05/2022 15:24, Christopher Gustafson wrote: Yes, Flink metrics are showing up as usual, but none of the ones that are listed in the StateFun documentation.

SV: How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Christopher Gustafson
Yes, Flink metrics are showing up as usual, but none of the ones that are listed in the StateFun documentation. Från: Chesnay Schepler Skickat: den 23 maj 2022 14:29:15 Till: Christopher Gustafson; user@flink.apache.org Ämne: Re: How to enable statefun metrics

Flink Kubernetes Operator: Specifying env variables from ConfigMaps

2022-05-23 Thread Mads Ellersgaard Kalør
Hi, We use a number of environment variables to configure our Flink pipelines, such as Kafka connection info, hostnames for external services etc. This works well when running a standalone Kubernetes deployment or on a local environment in Docker, but I cannot find any documentation about how

Re: flink sql api, exception when setting "table.exec.state.ttl"

2022-05-23 Thread Chesnay Schepler
You're probably mixing Flink versions. From the stack trace we can see that Flink classes are being loaded from 2 different jars (rocketmq-flink-1.0.0-SNAPSHOT.jar/flink-dist_2.12-1.13.5.jar); I'd suggest to resolve that first and see if the error persists. On 23/05/2022 14:32, 李诗君 wrote:

flink sql api, exception when setting "table.exec.state.ttl"

2022-05-23 Thread 李诗君
flink version: 1.13.5 java code: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); EnvironmentSettings settings = EnvironmentSettings.newInstance() .useBlinkPlanner() .inStreamingMode() .build();

Re: How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Chesnay Schepler
You shouldn't have to do more than that. Flink metrics are showing up as expected? Including metrics from tasks? On 23/05/2022 14:03, Christopher Gustafson wrote: Hi! I am trying to enable the StateFun metrics in the documentation to be logged using the Slf4j reporter but I cannot figure

Re: Applying backpressure to limit state memory consumption

2022-05-23 Thread Robin Cassan
Thanks Yu'an for your answer! Our issue lies in the fact that the window size is variable with the incoming traffic and would like a solution to avoid filling our window state in case of spikes. Even with RocksDB we would eventually be limited by the disk size (which, admittedly, is usually

How to enable statefun metrics for the Slf4j reporter

2022-05-23 Thread Christopher Gustafson
Hi! I am trying to enable the StateFun metrics in the documentation to be logged using the Slf4j reporter but I cannot figure out how to do it, and the documentation is pretty vague if you are not familiar with the Flink metrics beforehand. Could someone show me how to enable it, i.e what

Re: pyflink报错:Java gateway process exited before sending its port number

2022-05-23 Thread Dian Fu
>> java.lang.NoSuchMethodError: org.apache.flink.util.NetUtils.getAvailablePort()I 你的环境是不是不太干净?可以检查一下 PyFlink 安装目录下(site-packages/pyflink/lib) 的那些 jar 包的版本。 On Mon, May 23, 2022 at 4:22 PM RS wrote: > Hi, > 在Pycharm中,测试Pyflink示例代码,启动运行报错,代码为官方文档中的代码 > 参考官方文档: >

Re: Flink UI in Application Mode

2022-05-23 Thread Zain Haider Nemati
Hi David, Thanks for your response. When submitting a job in application mode it gives a url at the end but that is different i.e. on different ports when you submit different jobs in application mode. Is there a port/ui where I can see the consolidated list of jobs running instead of checking

Re:Re:Re:在自定义表聚合函数 TableAggregateFunction 使用 emitUpdateWithRetract 异常

2022-05-23 Thread sjf0115
Flink 版本:1.13.5 函数完整代码如下: ``` public class Top2RetractTableAggregateFunction extends TableAggregateFunction, Top2RetractTableAggregateFunction.Top2RetractAccumulator> { private static final Logger LOG = LoggerFactory.getLogger(Top2RetractTableAggregateFunction.class); // Top2

Re: Request for Review: FLINK-27507 and FLINK-27509

2022-05-23 Thread David Anderson
I've taken care of this. David On Sun, May 22, 2022 at 4:12 AM Shubham Bansal wrote: > Hi Everyone, > > I am not sure who to reach out for the reviews of these changesets, so I > am putting this on the mailing list here. > > I have raised the review for > FLINK-27507 -

Re:Re:在自定义表聚合函数 TableAggregateFunction 使用 emitUpdateWithRetract 异常

2022-05-23 Thread sjf0115
函数代码如下:```public class Top2RetractTableAggregateFunction extends TableAggregateFunctionTuple2Long, Integer, Top2RetractTableAggregateFunction.Top2RetractAccumulator {private static final Logger LOG = LoggerFactory.getLogger(Top2RetractTableAggregateFunction.class);// Top2 聚合中间结果数据结构

TolerableCheckpointFailureNumber not always applying

2022-05-23 Thread Gaël Renoux
Hello everyone, We're having an issue on our Flink job: it restarted because it failed a checkpoint, even though it shouldn't have. We've set the tolerableCheckpointFailureNumber to 1 million to never have the job restart because of this. However, the job did restart following a checkpoint

Re: OutputTag alternative with pyflink 1.15.0

2022-05-23 Thread yuxia
Yes, you're right. Hopefully, the master branch supported it [1]. But It haven't been released. If you want to use output tag in python in 1.15, you can apply this patch[1] to your Flink 1.15 and build it by yourself[3]. BTW, if you don't want to bother to build. You can use java/scala api.

Re: Flink UI in Application Mode

2022-05-23 Thread David Morávek
Hi Zain, you can find a link to web-ui either in the CLI output after the job submission or in the YARN ResourceManager web ui [1]. With YARN Flink needs to choose the application master port at random (could be somehow controlled by setting _yarn.application-master.port_) as there might be

pyflink报错:Java gateway process exited before sending its port number

2022-05-23 Thread RS
Hi, 在Pycharm中,测试Pyflink示例代码,启动运行报错,代码为官方文档中的代码 参考官方文档:https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/table_api_tutorial/ 报错如下: Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError:

Some question with Flink state

2022-05-23 Thread lxk7...@163.com
Hi everyone I was used Flink keyed-state in my Project.But I found some questions that make me confused. when I used value-state in multi parallelism the value is not I wanted. So I guess that value-state is in every parallelism. every parallelism saved their only value which means

Re: Kinesis Sink - Data being received with intermittent breaks

2022-05-23 Thread Danny Cranmer
Hi Zain, Glad you found the problem, good luck! Thanks, Danny Cranmer On Fri, May 20, 2022 at 10:05 PM Zain Haider Nemati wrote: > Hi Danny, > I looked into it in a bit more thorough detail, the bottleneck seems to be > the transform function which is at 100% and causing back pressuring. Im >