Re: [DISCUSS] Change default planner to blink planner in 1.11

2020-03-31 Thread Jark Wu
+1 to making the blink planner the default. We should give the blink planner more exposure to encourage users to try out new features and to lead users to migrate to it. Glad to see the blink planner has been used in production since 1.9! @Benchao Best, Jark On Wed, 1 Apr 2020 at 11:31, Benchao Li

Re: [DISCUSS] Change default planner to blink planner in 1.11

2020-03-31 Thread Benchao Li
Hi Kurt, It's exciting to hear that the community aims to make the Blink planner the default in 1.11. We have been using the blink planner for stream processing since 1.9; it works very well and covers many use cases in our company. So +1 from our side to making it the default in 1.11. Kurt Young

[Feedback wanted] Make the blink planner the default planner in 1.11

2020-03-31 Thread Kurt Young
Hi everyone, As you all know, the Blink planner is a brand-new Table API & SQL translation and optimization engine introduced in Flink 1.9, and we already made it the default planner for the SQL CLI in 1.10. Since the community decided long ago not to add any new features to the old optimizer, the old Flink planner, in terms of both functionality and performance, already lacks many of the community's latest SQL features, such as the new type system, broader DDL support, and the new TableSource and TableSink interfaces coming in 1.11 together with the ability to interpret Binlog-style changelogs.

[DISCUSS] Change default planner to blink planner in 1.11

2020-03-31 Thread Kurt Young
Hi Dev and User, The Blink planner for the Table API & SQL was introduced in Flink 1.9 and is already the default planner for the SQL client in Flink 1.10. And since we already decided not to introduce any new features to the original Flink planner, it already lacks so many great features that the
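
For anyone who wants to opt in ahead of 1.11, a minimal sketch of selecting the blink planner explicitly with the 1.10 Table API (class names below are the 1.10 ones):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class BlinkPlannerOptIn {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Request the blink planner explicitly; before 1.11 the old planner is still the
            // default for the Table API, so this opt-in is required for the newer SQL features.
            EnvironmentSettings settings = EnvironmentSettings.newInstance()
                    .useBlinkPlanner()
                    .inStreamingMode()
                    .build();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);
        }
    }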

Re: flink 1.10 support LONG as watermark?

2020-03-31 Thread jingjing bai
Thanks a lot! Jark Wu wrote on Wed, Apr 1, 2020 at 1:13 AM: > Hi Jing, > > I created https://issues.apache.org/jira/browse/FLINK-16889 to support > converting from BIGINT to TIMESTAMP. > > Best, > Jark > > On Mon, 30 Mar 2020 at 20:30, jingjing bai > wrote: > >> Hi jarkWu! >> >> Is there a FLIP to do so?

Re: Reply: How to effectively clear state in a ProcessWindowFunction

2020-03-31 Thread Yun Tang
Hi, I find it odd that your program runs at all from a fresh start without a checkpoint: your value state descriptor defines no default value, so the #value() call returns null, and it is strange that the first call to #update, which first reads from state, still succeeds. I suggest debugging locally in your IDE; change the clear condition so it does not wait until the next day, so that you can reproduce the problem locally. Best, Yun Tang From: 守护 <346531...@qq.com> Sent: Tuesday, March

Re: flink 1.10 support LONG as watermark?

2020-03-31 Thread Jark Wu
Hi Jing, I created https://issues.apache.org/jira/browse/FLINK-16889 to support converting from BIGINT to TIMESTAMP. Best, Jark On Mon, 30 Mar 2020 at 20:30, jingjing bai wrote: > Hi jarkWu! > > Is there a FLIP to do so? I'm very glad to learn from the idea. > > > Best, > jing > > Jark Wu
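
Until FLINK-16889 lands, a common workaround is a computed column that converts the BIGINT epoch field into a TIMESTAMP(3) and carries the watermark. A hedged sketch against the 1.10 API (table name, fields, and connector options are made-up placeholders):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class BigintWatermarkSketch {
        public static void main(String[] args) {
            TableEnvironment tableEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
            // FROM_UNIXTIME expects epoch seconds; TO_TIMESTAMP turns the string into a
            // TIMESTAMP(3), which is what the WATERMARK clause requires.
            tableEnv.sqlUpdate(
                    "CREATE TABLE events (" +
                    "  user_id STRING," +
                    "  ts BIGINT," +
                    "  row_ts AS TO_TIMESTAMP(FROM_UNIXTIME(ts))," +
                    "  WATERMARK FOR row_ts AS row_ts - INTERVAL '5' SECOND" +
                    ") WITH (" +
                    "  'connector.type' = 'kafka'," +
                    "  'connector.version' = 'universal'," +
                    "  'connector.topic' = 'events'," +
                    "  'connector.properties.bootstrap.servers' = 'localhost:9092'," +
                    "  'format.type' = 'json'" +
                    ")");
        }
    }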

Fwd: Complex graph-based sessionization (potential use for stateful functions)

2020-03-31 Thread Robert Metzger
Forwarding Seth's answer to the list -- Forwarded message - From: Seth Wiesman Date: Tue, Mar 31, 2020 at 4:47 PM Subject: Re: Complex graph-based sessionization (potential use for stateful functions) To: Krzysztof Zarzycki Cc: user , Hi Krzysztof, This is a great use case

Flink in EMR configuration problem

2020-03-31 Thread Antonio Martínez Carratalá
Hi, I'm trying to run a job on a Flink cluster in Amazon EMR from Java code, but I'm having some problems. This is how I create the cluster: StepConfig copyJarStep = new StepConfig()
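
The quoted code is cut off, but for context this is roughly the AWS SDK for Java (v1) pattern being described — a hedged sketch, with cluster name, release label, instance types, and jar path as made-up placeholders rather than the actual values from this thread:

    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
    import com.amazonaws.services.elasticmapreduce.model.*;

    public class EmrFlinkClusterSketch {
        public static void main(String[] args) {
            AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();

            // Step that submits a Flink job via the EMR command runner.
            StepConfig flinkJobStep = new StepConfig()
                    .withName("flink-job")
                    .withActionOnFailure("CONTINUE")
                    .withHadoopJarStep(new HadoopJarStepConfig()
                            .withJar("command-runner.jar")
                            .withArgs("flink", "run", "-m", "yarn-cluster",
                                      "/home/hadoop/my-flink-job.jar"));

            RunJobFlowRequest request = new RunJobFlowRequest()
                    .withName("flink-cluster")
                    .withReleaseLabel("emr-5.29.0")  // an EMR release that bundles Flink 1.9
                    .withApplications(new Application().withName("Flink"))
                    .withServiceRole("EMR_DefaultRole")
                    .withJobFlowRole("EMR_EC2_DefaultRole")
                    .withInstances(new JobFlowInstancesConfig()
                            .withInstanceCount(2)
                            .withMasterInstanceType("m5.xlarge")
                            .withSlaveInstanceType("m5.xlarge")
                            .withKeepJobFlowAliveWhenNoSteps(false))
                    .withSteps(flinkJobStep);

            RunJobFlowResult result = emr.runJobFlow(request);
            System.out.println("Started cluster: " + result.getJobFlowId());
        }
    }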

Re: Run several jobs in parallel in same EMR cluster?

2020-03-31 Thread Antonio Martínez Carratalá
I could not make it work the way I wanted with taskmanager.numberOfTaskSlots set to 2, but I found a way to run the jobs in parallel: simply creating a cluster for each job, since they are independent. Thanks. On Mon, Mar 30, 2020 at 4:22 PM Gary Yao wrote: > Can you try to set config option

Flink Kafka Consumer Throughput reduces over time

2020-03-31 Thread Arpith techy
Currently I have a Flink consumer with the following properties. Flink consumes records at around 400 messages/sec at the start of the program, but later, as numBuffersOut exceeds 100, the data rate falls to 200 messages/sec. I've set parallelism to only 1; it's an Avro-based consumer, and checkpointing is disabled.

flink 1.10: storing the catalog in Hive

2020-03-31 Thread 宇张
Hi: We would like to use Hive to store Flink catalog data. When catalog metadata is saved or deleted, how can we verify that we have permission to operate on the Hive metadata?
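
For context, a minimal sketch of registering a HiveCatalog in Flink 1.10 (catalog name, database, conf dir, and Hive version below are placeholders); the metadata create/drop calls then go through the Hive Metastore client, so any Hive-side authorization would apply there:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.catalog.hive.HiveCatalog;

    public class HiveCatalogSketch {
        public static void main(String[] args) {
            TableEnvironment tableEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());
            // Placeholder values; point the conf dir at the directory containing hive-site.xml.
            HiveCatalog hive = new HiveCatalog("myhive", "default", "/etc/hive/conf", "2.3.4");
            tableEnv.registerCatalog("myhive", hive);
            tableEnv.useCatalog("myhive");  // DDL now persists metadata in the Hive Metastore
        }
    }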

Latency tracking together with broadcast state can cause job failure

2020-03-31 Thread Lasse Nedergaard
Hi, In both Flink 1.9.2 and 1.10 we have struggled with random deserialization and index-out-of-range exceptions in one of our jobs. We also get out-of-memory exceptions. We have now identified the cause as latency tracking combined with broadcast state. When we do integration testing
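
For anyone wanting to rule this out in their own jobs, a minimal sketch of turning latency tracking off programmatically (an interval of 0 disables it; a positive value in milliseconds enables it):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LatencyTrackingToggle {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Latency markers flow through the topology like records; setting the interval to 0
            // disables them, which avoids the broadcast-state interaction described above.
            env.getConfig().setLatencyTrackingInterval(0L);
        }
    }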

Re: Question about the flink 1.6 memory config

2020-03-31 Thread Xintong Song
The container cut-off accounts not only for metaspace but also for native memory footprint such as thread stacks, the code cache, and the compressed class space. If you run streaming jobs with the RocksDB state backend, it also accounts for RocksDB's memory usage. The consequence of less cut-off depends on your

Is there a good account authentication approach for the Flink dashboard?

2020-03-31 Thread 陈伟
Hi all: While setting up our Flink platform I noticed that the dashboard has no account or permission control; anyone who knows the address can access it and submit jobs, which feels like a significant risk. How do you all solve this in production? I've seen suggestions online to put authentication in front of it with Nginx, but that feels a bit inelegant. Does Flink officially provide an authentication plugin, or is there a more elegant solution? Looking forward to your replies.

Question about the flink 1.6 memory config

2020-03-31 Thread LakeShen
Hi community, I am now optimizing the Flink 1.6 task memory configuration. Looking at the source code: first, the Flink task configures the cut-off memory, cut-off memory = Math.max(600, containerized.heap-cutoff-ratio * TaskManager memory), where containerized.heap-cutoff-ratio defaults to 0.25. For
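
To make the formula concrete, a small worked example (the 4096 MB container size is illustrative): with the default ratio of 0.25, cut-off = Math.max(600, 0.25 * 4096) = 1024 MB, leaving 3072 MB for the JVM:

    public class CutoffSketch {
        public static void main(String[] args) {
            // Illustrative re-statement of the 1.6 cut-off rule, not Flink's actual code.
            long taskManagerMemMb = 4096;  // container size in MB (example value)
            double cutoffRatio = 0.25;     // containerized.heap-cutoff-ratio default
            long cutoffMb = Math.max(600L, (long) (cutoffRatio * taskManagerMemMb));
            System.out.println("cut-off = " + cutoffMb + " MB");                          // 1024
            System.out.println("left for JVM = " + (taskManagerMemMb - cutoffMb) + " MB"); // 3072
        }
    }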

Re: Correct way to e2e test a Flink application?

2020-03-31 Thread Laurent Exsteens
Hi Tzu-Li, thanks a lot for your answer. I will try this! However, I was looking for something that fully simulates a Flink cluster, including job manager to task manager serialization issues and full isolation of the task managers (I guess with the MiniClusterResource we are still on the
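
For reference, the harness being discussed looks roughly like this in 1.10 (a sketch following the Flink testing docs); as noted, everything runs in a single JVM, so cross-process serialization and isolation issues will not surface here:

    import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
    import org.apache.flink.test.util.MiniClusterWithClientResource;
    import org.junit.ClassRule;

    public class MyJobITCase {
        @ClassRule
        public static final MiniClusterWithClientResource FLINK_CLUSTER =
                new MiniClusterWithClientResource(
                        new MiniClusterResourceConfiguration.Builder()
                                .setNumberTaskManagers(1)
                                .setNumberSlotsPerTaskManager(2)
                                .build());

        // Tests obtain their environment via StreamExecutionEnvironment.getExecutionEnvironment(),
        // which submits jobs to the shared mini cluster above.
    }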

Re: State & Generics

2020-03-31 Thread Laurent Exsteens
Hello Mike, thanks for the info. I tried to do something similar in Java. Not there yet, but I think it should be feasible. However, like you said, that means additional operations for each event. Yesterday I managed to find another solution: create the type information outside of the class and pass
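
A minimal sketch of that last approach, with made-up names: the TypeInformation is built at a call site where the type parameter is concrete, then handed into the generic class's state descriptor, sidestepping erasure:

    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;

    class MyEvent { }  // placeholder payload type

    class GenericDeduplicator<T> {
        private final ValueStateDescriptor<T> descriptor;

        GenericDeduplicator(TypeInformation<T> typeInfo) {
            // Receiving TypeInformation explicitly avoids relying on reflection inside
            // the generic class, where the type parameter has been erased.
            this.descriptor = new ValueStateDescriptor<>("last-seen", typeInfo);
        }
    }

    class Usage {
        // At the call site the type is concrete, so a TypeHint can capture it:
        GenericDeduplicator<MyEvent> dedup =
                new GenericDeduplicator<>(TypeInformation.of(new TypeHint<MyEvent>() {}));
    }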

Reply: How to effectively clear state in a ProcessWindowFunction

2020-03-31 Thread 守护
Hi: When if(stateDate.equals("") || stateDate.equals(date)) holds, pv_st.clear() is called. 1. … pv_st … 2. After state.clear(), the value read back is null … null … pv_st.update(pv_st.value() +

Re: How to effectively clear state in a ProcessWindowFunction

2020-03-31 Thread Yun Tang
Hi, From the code, if(stateDate.equals("") || stateDate.equals(date)) — I cannot tell where the stateDate variable gets its value, so it is unclear whether this condition ever takes effect. Also, after state.clear(), the next read returns null, and I don't see anywhere in the code snippet that checks for that. Best, Yun Tang From: 守护 <346531...@qq.com> Sent: Tuesday, March 31, 2020 12:33 To: user-zh Subject:
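
A minimal sketch of the missing null check (state name and types are made up): value() returns null both on first access and after clear(), so the read has to be guarded before updating:

    import org.apache.flink.api.common.state.ValueState;

    class PvCounter {
        // Inside e.g. a ProcessWindowFunction, assuming a ValueState<Long> named pvState:
        void bumpPv(ValueState<Long> pvState) throws Exception {
            Long current = pvState.value();              // null on first access and after clear()
            long pv = (current == null) ? 0L : current;  // guard before unboxing
            pvState.update(pv + 1);
        }
    }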

Re: Out-of-order handling with keyBy

2020-03-31 Thread tingli ke
Hi, let me add more detail about my scenario, as shown in the figure below: 1. Partition 1 of Kafka topic A contains data for three users. 2. Flink generates watermarks w1, w2, w3, ... for this partition. Here are my questions: 1. Do the watermarks w1, w2, w3, ... only guarantee ordered processing of the data of user1, user2, and user3 as a whole? 2. After a keyBy on user1, user2, and user3, can the watermarks w1, w2, w3, ... guarantee ordered processing for each of user1, user2, and user3 individually? Looking forward to a reply! [image: image.png] jun su wrote on Tue, Mar 31, 2020

Reply: How to define DDL for dynamic nested fields in Flink SQL

2020-03-31 Thread 111
Hi, Right — I tried this earlier but hadn't written the field access, so no values showed up, and I assumed MAP wasn't supported. Accessing it as data['a'] works fine. Best, Xinghalo On 2020-03-31 at 14:59, Benchao Li wrote: You can try defining the data field as a map type. 111 wrote on Tue, Mar 31, 2020 at 2:56 PM: Hi, We are using streamsets as our CDC tool, and the records written to Kafka have a nested, variable structure, e.g.: {database:a, table: b, type:update, data:{a:1,b:2,c:3}} {database:a, table: c,

Re: [Third-party Tool] Flink memory calculator

2020-03-31 Thread Yangze Guo
Hi there. In the latest version, the calculator supports dynamic options. You can append all your dynamic options to the end of "bin/calculator.sh [-h]". Since "-tm" will eventually be deprecated, please replace it with "-Dtaskmanager.memory.process.size=". Best, Yangze Guo On Mon, Mar 30,

Re: How to define DDL for dynamic nested fields in Flink SQL

2020-03-31 Thread Benchao Li
You can try defining the data field as a map type. 111 wrote on Tue, Mar 31, 2020 at 2:56 PM: > Hi, > We are using streamsets as our CDC tool, and the records written to Kafka have a nested, variable structure, e.g.: > {database:a, table: b, type:update, data:{a:1,b:2,c:3}} > {database:a, table: c, type:update, data:{c:1,d:2}} > How should we define DDL for this kind of type? > > Best, > Xinghalo > > -- Benchao Li School of
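
A hedged sketch of this suggestion in 1.10 DDL, embedded as a Java string (topic and connector options are placeholders): declare data as MAP<STRING, STRING> and read individual keys with the bracket syntax, where missing keys simply come back as NULL:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class MapDdlSketch {
        public static void main(String[] args) {
            TableEnvironment tableEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
            tableEnv.sqlUpdate(
                    "CREATE TABLE cdc_events (" +
                    "  `database` STRING," +     // reserved words need backticks
                    "  `table` STRING," +
                    "  `type` STRING," +
                    "  data MAP<STRING, STRING>" +
                    ") WITH (" +
                    "  'connector.type' = 'kafka'," +
                    "  'connector.version' = 'universal'," +
                    "  'connector.topic' = 'cdc'," +
                    "  'connector.properties.bootstrap.servers' = 'localhost:9092'," +
                    "  'format.type' = 'json'" +
                    ")");
            // Per-key access; e.g. data['a'] is NULL for rows whose payload lacks key 'a'.
            tableEnv.sqlQuery("SELECT data['a'], data['c'] FROM cdc_events");
        }
    }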

How to define DDL for dynamic nested fields in Flink SQL

2020-03-31 Thread 111
Hi, We are using streamsets as our CDC tool, and the records written to Kafka have a nested, variable structure, e.g.: {database:a, table: b, type:update, data:{a:1,b:2,c:3}} {database:a, table: c, type:update, data:{c:1,d:2}} How should we define DDL for this kind of type? Best, Xinghalo

Re: Log file environment variable 'log.file' is not set.

2020-03-31 Thread Robert Metzger
Hey Vitaliy, Check this documentation on how to use Flink with Hadoop: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/hadoop.html For your setup, I would recommend referencing the Hadoop jars from your Hadoop vendor by setting export HADOOP_CLASSPATH=`hadoop