date:20200929

退订

2020-09-29 Thread 提运亨

退订

回复: Flink 1.11 table.executeInsert 程序退出

2020-09-29 Thread 史正超

这个是一个已知问题，可以看看这个jira: https://issues.apache.org/jira/browse/FLINK-18545 规避这个问题的话，可以不用执行 tableEnv.execute("jobname"); 直接用 executeSql 就可以了，遇到INSERT语句就能生成job了。发件人: HunterXHunter <1356469...@qq.com> 发送时间: 2020年9月30日 2:32 收件人: user-zh@flink.apache.org 主题: Flink 1.11

回复: 回复：关于flink sql cdc

2020-09-29 Thread 史正超

HI, Kyle Zhang, 我刚才重现了你的问题，虽然你的mysql binlog设置是ROW格式，但是不排除其它session更改了binlog_format格式。重现步骤： 1. 登录mysql客户端(注意用cmd登录) 执行语句， SET SESSION binlog_format='MIXED'; SET SESSION tx_isolation='REPEATABLE-READ'; COMMIT; 2. 随便update或者insert一条语句。然后就得到了和你一样的错误： 2020-09-30 10:46:37.607

Flink 1.11 table.executeInsert 程序退出

2020-09-29 Thread HunterXHunter

当我在使用 StreamTableEnvironment Api的时候； Table a = getStreamTable(getKafkaDataStream("test", "localhost:9092", "latest"),"topic,offset,msg"); tableEnv.createTemporaryView("test", a); tableEnv.executeSql(DDLSourceSQLManager.createCustomPrintlnRetractSinkTbl("printlnSink_retract"));

Re: Re: Flink消费kafka，突然间任务输出结果从原来的每秒几十万降低到几万每秒

2020-09-29 Thread Yang Peng

感谢回复，我们看了consumer的lag很小而且监控显示数据流入量也没明显变化但是感觉这部分数据只是offset被更新了但是数据没有消费到，这个问题之前没有遇到过这是突发发现的而且任务重启了没法jstack判断了 hailongwang <18868816...@163.com> 于2020年9月29日周二下午10:35写道： > > > > 不过也比较奇怪，Source 数据的 format的话，应该不会使得 CPU 降低，这期间 Iowait 高吗 > 也可以 jstack 采下堆栈看下，GC等看下。 > 至于 Source format

flink1.11.1 kafka eos checkpoints??????Timeout expired after 600000milliseconds

2020-09-29 Thread ????????

flink 1.9kafkaSemantic??EXACTLY_ONCE1.11?? ,checkpoints??InitProducerId??TransactionTimeout??MaxBlockMS org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60milliseconds while awaiting InitProducerId

回复：关于flink sql cdc

2020-09-29 Thread 谢治平

能不能退掉邮箱信息，退出 | | 谢治平 | | 邮箱：xiezhiping...@163.com | 签名由网易邮箱大师定制在2020年09月30日 09:24，Kyle Zhang 写道： show variables like '%binlog_format%'确实是ROW On Tue, Sep 29, 2020 at 7:39 PM Kyle Zhang wrote: > Hi,all > 今天在使用sql cdc中遇到以一个问题，版本1.11.2，idea中运行，我的ddl是 > CREATE TABLE mysql_binlog ( > id

Re: 关于flink sql cdc

2020-09-29 Thread Kyle Zhang

show variables like '%binlog_format%'确实是ROW On Tue, Sep 29, 2020 at 7:39 PM Kyle Zhang wrote: > Hi,all > 今天在使用sql cdc中遇到以一个问题，版本1.11.2，idea中运行，我的ddl是 > CREATE TABLE mysql_binlog ( > id INT NOT NULL, > emp_name STRING, > age INT > ) WITH ( > 'connector' = 'mysql-cdc', > 'hostname' =

Re: 关于flink sql cdc

2020-09-29 Thread Kyle Zhang

代码部分基本没有什么东西 public class CDC { public static void main(String[] args) throws Exception { final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); env.setParallelism(1);

Re: Re: sql-cli执行sql报错

2020-09-29 Thread hl9...@126.com

没有修改kafka，就用官方的jar。后来我用1.11.2版本重新尝试了下，成功了，没有任何错误。这个问题就不纠结了 hl9...@126.com 发件人： Benchao Li 发送时间： 2020-09-29 18:17 收件人： user-zh 主题： Re: Re: sql-cli执行sql报错这个错误看起来比较奇怪。正常来讲flink-sql-connector-kafka_2.11-1.10.2.jar里面应该都是shaded之后的class了，但是却报了一个非shaded的ByteArrayDeserializer。

Flink 1.12 snapshot throws ClassNotFoundException

2020-09-29 Thread Lian Jiang

Hi, I use Flink source master to build a snapshot and use the jars in my project. The goal is to avoid hacky deserialization code caused by avro 1.8 in old Flink versions since Flink 1.12 uses avro 1.10. Unfortunately, the code throws below ClassNotFoundException. I have verified that the

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-09-29 Thread Austin Cawley-Edwards

Hey Till, Thanks for the reply -- I'll try to see if I can reproduce this in a small repo and share it with you. Best, Austin On Tue, Sep 29, 2020 at 3:58 AM Till Rohrmann wrote: > Hi Austin, > > could you share with us the exact job you are running (including the > custom window trigger)?

Re: Efficiently processing sparse events in a time windows

2020-09-29 Thread Steven Murdoch

Thanks David, this is very helpful. I'm glad that it's not just that I had missed something obvious from the (generally very clear) documentation. I found various features that felt almost right (e.g. the priority queue behind Timers) but nothing that did the job. The temporal state idea does

Re: Savepoint incomplete when job was killed after a cancel timeout

2020-09-29 Thread Paul Lam

Hi Till, Thanks a lot for the pointer! I tried to restore the job using the savepoint in a dry run, and it worked! Guess I've misunderstood the configuration option, and confused by the non-existent paths that the metadata contains. Best, Paul Lam Till Rohrmann 于2020年9月29日周二下午10:30写道： >

flink 1.11.2 Table sql聚合后设置并行度为2及以上，数据无法输出

2020-09-29 Thread 王刚

这个问题我们之前使用sql窗口的时候也遇到过，当时是在1.7版本的tablesource后面加了个rebanlance算子让数据少的kafka分区的subtask watermark均衡下发送自autohome 发件人: Benchao Li mailto:libenc...@apache.org>> 发送时间: 2020-09-29 18:10:42 收件人: user-zh mailto:user-zh@flink.apache.org>> 主题: Re: flink 1.11.2 Table

Re: Re: 回复： BLinkPlanner sql join状态清理

2020-09-29 Thread 刘大龙

Hi, MiniBatch Agg目前没有实现State TTl，我提了个PR修复这个问题，参考https://github.com/apache/flink/pull/11830 @Jark，辛苦有空时帮忙reveiw一下代码，这个问题越来越多用户用户遇到了。 > -原始邮件- > 发件人: "刘建刚" > 发送时间: 2020-09-29 18:27:47 (星期二) > 收件人: user-zh > 抄送: > 主题: Re: 回复： BLinkPlanner sql join状态清理 > >

Re:Re: Flink消费kafka，突然间任务输出结果从原来的每秒几十万降低到几万每秒

2020-09-29 Thread hailongwang

不过也比较奇怪，Source 数据的 format的话，应该不会使得 CPU 降低，这期间 Iowait 高吗也可以 jstack 采下堆栈看下，GC等看下。至于 Source format 能力的话，可以自己测试下单个线程的QPS多少，然后乘以 Partition个数就是了。 Best, Hailong Wang 在 2020-09-29 20:06:50，"Yang Peng" 写道： >感谢回复，我重启完任务之后消费恢复了，我查看了我们的监控（监控kafkamanager上groupid消费速度）发现消费速度并没有下降，目前分区是90

Re: Savepoint incomplete when job was killed after a cancel timeout

2020-09-29 Thread Till Rohrmann

Thanks for sharing the logs with me. It looks as if the total size of the savepoint is 335kb for a job with a parallelism of 60 and a total of 120 tasks. Hence, the average size of a state per task is between 2.5kb - 5kb. I think that the state size threshold refers to the size of the per task

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-09-29 Thread Till Rohrmann

For 1. I was wondering whether we can't write the leader connection information directly when trying to obtain the leadership (trying to update the leader key with one's own value)? This might be a little detail, though. 2. Alright, so we are having a similar mechanism as we have in ZooKeeper

Re: Flink消费kafka，突然间任务输出结果从原来的每秒几十万降低到几万每秒

2020-09-29 Thread Yang Peng

感谢回复，我重启完任务之后消费恢复了，我查看了我们的监控（监控kafkamanager上groupid消费速度）发现消费速度并没有下降，目前分区是90 flinkkafkaconsumer消费的并行度也是90 应该不是分区的问题，至于2这个source序列化耗时严重这个有什么方式可以查看吗？ hailongwang <18868816...@163.com> 于2020年9月29日周二下午8:59写道： > > > > Hi Yang Peng: > 根据你的描述，可以猜测瓶颈是在 Source 上，有可能以下情况： > 1. Kafka 集群和Flink

Re:Flink消费kafka，突然间任务输出结果从原来的每秒几十万降低到几万每秒

2020-09-29 Thread hailongwang

Hi Yang Peng: 根据你的描述，可以猜测瓶颈是在 Source 上，有可能以下情况： 1. Kafka 集群和Flink 集群之间的带宽被其它打满了。 2. Source 的序列化耗时严重，导致拉取变慢。可以尝试着扩kafka 分区，加大Source并发看下。 Best， Hailong Wang 在 2020-09-29 19:44:44，"Yang Peng" 写道： >请教大家一个问题，flink实时任务消费kafka写入到kafka，Flink版本是1.9.1 线上kafka集群为1.1.1 >kafka集群为容器化集群部署在K8s上，任务运行了很久

Flink消费kafka，突然间任务输出结果从原来的每秒几十万降低到几万每秒

2020-09-29 Thread Yang Peng

请教大家一个问题，flink实时任务消费kafka写入到kafka，Flink版本是1.9.1 线上kafka集群为1.1.1 kafka集群为容器化集群部署在K8s上，任务运行了很久今天突然发现任务的数据产出降低了很多，发现cp 没有问题 kafka消费没有积压，也没有反压，也没有任何异常日志，kafka集群也没有异常，flink集群也没有异常，但是发现监控上tm的cpu负载也降低了 tm上网卡流量也降低了，除此之外没有其他异常信息，大家又遇到这种情况的吗、

Re: Re: flink使用在docker环境中部署出现的两个问题

2020-09-29 Thread cxydeve...@163.com

你好,我这边看到您在另一个问题[1]中有做了相关的回答, 我在k8s上部署是遇到相同的问题,相同的错误,您这边是否有空帮忙试试看是不是flink-docker的bug, 还是我的什么配置错了我也发出了一个新的问题[2] [1]:http://apache-flink.147419.n8.nabble.com/flink-1-11-on-kubernetes-td4586.html#a4692

Re: Savepoint incomplete when job was killed after a cancel timeout

2020-09-29 Thread Till Rohrmann

Hi Paul, could you share with us the logs of the JobManager? They might help to better understand in which order each operation occurred. How big are you expecting the size of the state to be? If it is smaller than state.backend.fs.memory-threshold, then the state data will be stored in the

Re: Flink Batch 模式下，The rpc invocation size 113602196 exceeds the maximum akka framesize

2020-09-29 Thread zheng faaron

Hi，可以检查一下这个参数是否设置正确，也可以在jobmanager页面上看下是否有这个参数。我之前遇到过类似问题，设置这个参数可以解决问题。 Best, Faaron Zheng From: jy l Sent: Monday, September 28, 2020 4:57:46 PM To: user-zh@flink.apache.org Subject: Re: Flink Batch 模式下，The rpc invocation size 113602196 exceeds the maximum

Re: flink1.11.2基于官网在k8s上部署是正常的,但是加了volume配置之后报错Read-only file system

2020-09-29 Thread cxydeve...@163.com

官网例子[1]没有改动之前,是可以正常启动的原来的配置文件内容如下 ... "volumes": [ { "name": "flink-config-volume", "configMap": { "name": "flink-config", "items": [ { "key": "flink-conf.yaml", "path": "flink-conf.yaml" },

Savepoint incomplete when job was killed after a cancel timeout

2020-09-29 Thread Paul Lam

Hi, We have a Flink job that was stopped erroneously with no available checkpoint/savepoint to restore, and are looking for some help to narrow down the problem. How we ran into this problem: We stopped the job using cancel with savepoint command (for compatibility issue), but the command

关于flink sql cdc

2020-09-29 Thread Kyle Zhang

Hi,all 今天在使用sql cdc中遇到以一个问题，版本1.11.2，idea中运行，我的ddl是 CREATE TABLE mysql_binlog ( id INT NOT NULL, emp_name STRING, age INT ) WITH ( 'connector' = 'mysql-cdc', 'hostname' = 'xxx', 'port' = '3306', 'username' = 'root', 'password' = 'root', 'database-name' = 'test', 'table-name' =

Re: 回复： flink sql count问题

2020-09-29 Thread Robin Zhang

Hi lemon，不是很理解你的疑问是什么，flink是事件驱动的，所以，来一条数据，就会被处理，走你的逻辑，就会产生一个结果，如果是第一次出现的key，只有一条数据，如果是状态中已经存在的key，会在控制台输出两条数据，一条true的是最终sink的结果。所以，每次输出一条结果有什么问题吗？ Best, Robin lemon wrote >

回复：flink1.11实时写入hive，写入速度很慢，checkpoint为60秒，并行度为1

2020-09-29 Thread Jun Zhang

你的kafka的分区数是多少，把flink的并行度加大到kafka的分区数。 BestJun -- 原始邮件 -- 发件人: me

flink1.11实时写入hive，写入速度很慢，checkpoint为60秒，并行度为1

2020-09-29 Thread me

flink1.11实时写入hive，写入速度很慢，checkpoint为60秒，并行度为1 tableEnv.executeSql("insert into dwd_security_log select * from " + table) 实际写入hive之后，查看hdfs上写入的文件为19M,这是60秒内写入hive的，flink流式写入hive通过checkpotin来把数据刷入hive中。请问大家只有有什么提升写入速度的参数或者方式吗？

Re: pyflink1.11 window groupby出错

2020-09-29 Thread Xingbo Huang

Hello, 现在的descriptor的方式存在很多bug，社区已经在进行重构了。当前你可以使用DDL[1]的方式来解决问题。 [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/#how-to-use-connectors Best, Xingbo 刘乘九于2020年9月29日周二下午5:46写道： > 各位大佬，我想尝试下pyflink 进行时间窗口下的指标统计，写了一个demo发现table APi 的group >

Re: 怎么样在Flink中使用java代码提交job到yarn

2020-09-29 Thread todd

是不是和你上层依赖的jar冲突了?exclude剔除不了冲突jar吗 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: 回复： BLinkPlanner sql join状态清理

2020-09-29 Thread 刘建刚

miniBatch下是无法ttl的，这个是修复方案：https://github.com/apache/flink/pull/11830 Benchao Li 于2020年9月29日周二下午5:18写道： > Hi Ericliuk, > > 这应该是实现的bug，你可以去社区建一个issue描述下这个问题。 > 有时间的话也可以帮忙修复一下，没有时间社区也会有其他小伙伴帮忙来修复的~ > > Ericliuk 于2020年9月29日周二下午4:59写道： > > >

Re: Apache Qpid connector.

2020-09-29 Thread Master Yoda

Hi Austin, thanks for the response. Yes, the protocol is AMQP. I will try out the RabbitMQ connector with Qpid thanks, Parag On Sat, Sep 26, 2020 at 12:22 AM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hey (Master) Parag, > > I don't know anything about Apache Qpid, but from the

Re: Re: sql-cli执行sql报错

2020-09-29 Thread Benchao Li

这个错误看起来比较奇怪。正常来讲flink-sql-connector-kafka_2.11-1.10.2.jar里面应该都是shaded之后的class了，但是却报了一个非shaded的ByteArrayDeserializer。我感觉这个应该是你自己添加了一下比较特殊的逻辑导致的。可以介绍下你对kafka connector做了哪些改造么？ hl9...@126.com 于2020年9月28日周一下午6:06写道： > 按照您的方法重试了下，又报了另一个错误： > Flink SQL> CREATE TABLE tx ( > >

Re: flink 1.11.2 Table sql聚合后设置并行度为2及以上，数据无法输出

2020-09-29 Thread Benchao Li

这个问题的原因应该是你的kafka partition数量应该是大于1的，并且不是所有partition都有数据导致的。你可以检查下你的kafka topic。目前来讲，只要你的每个kafka 的partition都有数据，那么watermark应该是可以正常产生的。跟并行度无关。 Asahi Lee <978466...@qq.com> 于2020年9月27日周日下午6:05写道： > 你好！ > 我使用flink >

Re: 使用异步IO时，数据写入到capacity数后，卡住不再消费source端数据了。

2020-09-29 Thread 王敏超

嗯嗯，是的。安装大佬的方法，的确成功了。再次感谢大佬 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: 怎么样在Flink中使用java代码提交job到yarn

2020-09-29 Thread Dream-底限

可以直接用yarnclient直接提交，flinkonyarn也是yarnclient提交的吧，不过感觉自己实现一遍挺麻烦的，我们最后也选的是process的方式 xiao cai 于2020年9月29日周二下午5:54写道： > 这个我们有尝试，遇到了classpath的问题，导致包冲突，无法启动进程，你们有遇到过相关的情况吗？ > > > 原始邮件 > 发件人: todd > 收件人: user-zh > 发送时间: 2020年9月29日(周二) 17:36 > 主题: Re: 怎么样在Flink中使用java代码提交job到yarn > > >

Re: 怎么样在Flink中使用java代码提交job到yarn

2020-09-29 Thread xiao cai

这个我们有尝试，遇到了classpath的问题，导致包冲突，无法启动进程，你们有遇到过相关的情况吗？原始邮件发件人: todd 收件人: user-zh 发送时间: 2020年9月29日(周二) 17:36 主题: Re: 怎么样在Flink中使用java代码提交job到yarn https://github.com/todd5167/flink-spark-submiter 可以参考这个案例，用ClusterCLient提交。 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Flink配置hdfs状态后端遇到的问题

2020-09-29 Thread jester_jim

Hi Robin Zhang, 其实在只要不是在根目录下创建文件夹，只要在我指定的目录下创建即可，我其实是有权限的，Hadoop管理员给我分配了一个目录，我想把目录设置到分配的目录，但是一直报这个错，想问一下除了创建job信息，Flink还有什么机制会去hdfs上创建文件或文件夹的？祝好！在 2020年9月29日 17:15，Robin Zhang 写道： Hi jester_jim，配置文件中指定的checkpoints（以后简称ckp）的目录只是一个父目录，flink在首次触发每个job的ckp时会在这个父目录下新建多级文件夹，命名为指定的job名字/job

Re: 使用异步IO时，数据写入到capacity数后，卡住不再消费source端数据了。

2020-09-29 Thread Benchao Li

你的timeout方法应该要正确的处理ResultFuture，比如ResultFuture.complete或者completeExceptionally，如果你什么都没做，那么这个异步请求就还没有真的结束。王敏超于2020年9月29日周二下午5:43写道： > AsyncDataStream > //顺序异步IO > .orderedWait(input, new AsyncDatabaseRequest(), 5000, > TimeUnit.MILLISECONDS, 1000) > >

?????? flink sql count????

2020-09-29 Thread lemon

count0 flinkcount??ifwhere?? ?? selectcount(if(name like '%',1 , null)) where name like'%' or name

Re: Poor performance with large keys using RocksDB and MapState

2020-09-29 Thread ירון שני

Thanks Yun!, I used this option, and it greatly helped 2:44 val be = new RocksDBStateBackend("file:///tmp")class MyConfig extends DefaultConfigurableOptionsFactory { override def createColumnOptions(currentOptions:

pyflink1.11 window groupby出错

2020-09-29 Thread 刘乘九

各位大佬，我想尝试下pyflink 进行时间窗口下的指标统计，写了一个demo发现table APi 的group 方法报错，网上搜索了一下相关内容也没有解决问题，想请各位大佬帮帮忙看一下是哪里写错了？错误信息： py4j.protocol.Py4JJavaError: An error occurred while calling o95.select. : org.apache.flink.table.api.ValidationException: A group window expects a time attribute for grouping in a

使用异步IO时，数据写入到capacity数后，卡住不再消费source端数据了。

2020-09-29 Thread 王敏超

AsyncDataStream //顺序异步IO .orderedWait(input, new AsyncDatabaseRequest(), 5000, TimeUnit.MILLISECONDS, 1000) 当我没重写timeout方法的时候，会执行这个报错信息 resultFuture.completeExceptionally(new TimeoutException("Async function call has timed out.")) 当我重写了timeout方法，如下，程序就卡住了，求大佬解答。 override def

Re:Re: Re: Re: 了解下大家生产中都用什么任务调度系统呢

2020-09-29 Thread Michael Ran

~.~ 不是有几百个star 嘛。海豚这个到apache 社区会强大些在 2020-09-29 16:45:30，"赵一旦" 写道： >看了下，hera算了，虽然文档看起来还行，但是5个star，不敢用。 >海豚这个看起来还不错，可以试试看。 > >Michael Ran 于2020年9月29日周二上午10:43写道： > >> ~。~ hera、海豚都行 >> 在 2020-09-29 09:58:45，"chengyanan1...@foxmail.com" < >> chengyanan1...@foxmail.com> 写道： >> > >> >Apache

Re: 怎么样在Flink中使用java代码提交job到yarn

2020-09-29 Thread todd

https://github.com/todd5167/flink-spark-submiter 可以参考这个案例，用ClusterCLient提交。 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: 回复: Re: 怎么样在Flink中使用java代码提交job到yarn

2020-09-29 Thread todd

https://github.com/todd5167/flink-spark-submiter 可以参考下 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink sql count问题

2020-09-29 Thread Robin Zhang

Hi lemon, 内部判断if函数可以替换为case when Best， Robin lemon wrote > 请教各位： > 我有一个sql任务需要进行count，在count中有一个表达式，只想count符合条件的记录， > 之前在hive中是这么写的：count(if(name like '南京%',1 , null))，但是flink > sql中count不能为null，有什么别的方法能实现该功能吗？ > 使用的是flink1.10.1 blink > -- Sent from:

Re: 如何在流式数据源上使用分析函数LAG和EAD函数

2020-09-29 Thread Robin Zhang

Hi Benchao，感谢回复，解决了我最近的疑惑。 Best， Robin Benchao Li-2 wrote > Hi Robin, > > 目前LAG/LEAD函数在流式场景下的实现的确是有bug的，那个实现只能在批式场景下work， > 是线上其实没有考虑流式的场景。所以你看到的结果应该是它只能返回当前数据。 > 这个问题我也是最近才发现的，刚刚建了一个issue[1] 来跟踪这个问题。 > 当前如果你想实现类似功能，可以先自己写一个udaf来做。 > > [1]

Re: 回复： BLinkPlanner sql join状态清理

2020-09-29 Thread Benchao Li

Hi Ericliuk, 这应该是实现的bug，你可以去社区建一个issue描述下这个问题。有时间的话也可以帮忙修复一下，没有时间社区也会有其他小伙伴帮忙来修复的~ Ericliuk 于2020年9月29日周二下午4:59写道： > 我最近也遇到了这个问题，看了下，blink，并且配置minibatch优化后就会不使用IdleStateRetention相关配置了。 > < > http://apache-flink.147419.n8.nabble.com/file/t491/Xnip2020-09-29_16-55-32.png> > > > 不太清楚为什么用了mini

Re: Flink配置hdfs状态后端遇到的问题

2020-09-29 Thread Robin Zhang

Hi jester_jim，配置文件中指定的checkpoints（以后简称ckp）的目录只是一个父目录，flink在首次触发每个job的ckp时会在这个父目录下新建多级文件夹，命名为指定的job名字/job id.所以，并不是新建父目录就可以，依然会存在权限问题。祝好，Robin Zhang Flink中文社区的各位大佬你们好：本人是小白，由于对Flink不太了解，想学，然后搭了个Flink standalone（1.11.2

Re: 如何在流式数据源上使用分析函数LAG和EAD函数

2020-09-29 Thread Benchao Li

Hi Robin, 目前LAG/LEAD函数在流式场景下的实现的确是有bug的，那个实现只能在批式场景下work，是线上其实没有考虑流式的场景。所以你看到的结果应该是它只能返回当前数据。这个问题我也是最近才发现的，刚刚建了一个issue[1] 来跟踪这个问题。当前如果你想实现类似功能，可以先自己写一个udaf来做。 [1] https://issues.apache.org/jira/browse/FLINK-19449 Robin Zhang 于2020年9月29日周二下午2:04写道： > 环境： flink 1.10，使用flinkSQL > >

Re: 回复： BLinkPlanner sql join状态清理

2020-09-29 Thread Ericliuk

我最近也遇到了这个问题，看了下，blink，并且配置minibatch优化后就会不使用IdleStateRetention相关配置了。不太清楚为什么用了mini batch就没读取这个配置。一个属于状态清理，一个属于agg优化，优化配置要影响到原来的状态query状态清理么？ -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Re: Re: 了解下大家生产中都用什么任务调度系统呢

2020-09-29 Thread 赵一旦

看了下，hera算了，虽然文档看起来还行，但是5个star，不敢用。海豚这个看起来还不错，可以试试看。 Michael Ran 于2020年9月29日周二上午10:43写道： > ~。~ hera、海豚都行 > 在 2020-09-29 09:58:45，"chengyanan1...@foxmail.com" < > chengyanan1...@foxmail.com> 写道： > > > >Apache DolphinScheduler 你值得拥有 > > > >https://dolphinscheduler.apache.org/zh-cn/ > > > > > > >

创建BatchTableEnvironment报错

2020-09-29 Thread hl9...@126.com

flink 1.11.2版本，我写了个WordCountTable例子，代码如下： public class WordCountTable { public static void main(String[] args) throws Exception { ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

Re: Flink Batch Processing

2020-09-29 Thread Timo Walther

Hi Sunitha, currently, not every connector can be mixed with every API. I agree that it is confusing from time to time. The HBase connector is an InputFormat. DataSet, DataStream and Table API can work with InputFormats. The current Hbase input format might work best with Table API. If you

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-09-29 Thread Till Rohrmann

Hi Austin, could you share with us the exact job you are running (including the custom window trigger)? This would help us to better understand your problem. I am also pulling in Klou and Timo who might help with the windowing logic and the Table to DataStream conversion. Cheers, Till On Mon,

Re: Flink Batch Processing

2020-09-29 Thread Till Rohrmann

Hi Sunitha, here is some documentation about how to use the Hbase sink with Flink [1, 2]. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hbase.html [2] https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-hbase-connector.html Cheers, Till On

Re: Reading from HDFS and publishing to Kafka

2020-09-29 Thread Aljoscha Krettek

Hi, I actually have no experience running a Flink job on K8s against a kerberized HDFS so please take what I'll say with a grain of salt. The only thing you should need to do is to configure the path of your keytab and possibly some other Kerberos settings. For that check out [1] and [2].

Re: Flink Batch Processing

2020-09-29 Thread s_penakalap...@yahoo.com

Hi Piotrek, Thank you for the reply. Flink changes are good, However Flink is changing so much that we are unable to get any good implementation examples either on Flink documents or any other website. Using HBaseInputFormat I was able to read the data as a DataSet<>, now I see that DataSet

Re: Checkpoint metadata deleted by Flink after ZK connection issues

2020-09-29 Thread Till Rohrmann

Great, thanks Klou! Cheers, Till On Mon, Sep 28, 2020 at 5:07 PM Kostas Kloudas wrote: > Hi all, > > I will have a look. > > Kostas > > On Mon, Sep 28, 2020 at 3:56 PM Till Rohrmann > wrote: > > > > Hi Cristian, > > > > thanks for reporting this issue. It looks indeed like a very critical >

flinksql多sink案例

2020-09-29 Thread todd

最近在看flinksql优化部分的代码，Flink在对多sink情况下进行公共子图的优化。所以想问下finksql执行多sink的案例，在org.apache.flink.table.planner.plan.optimize.RelNodeBlockPlanBuilder#buildRelNodeBlockPlan中convertedRelNodes.size才能大于1. -- Sent from: http://apache-flink.147419.n8.nabble.com/

flink消费kafka，事务关闭问题

2020-09-29 Thread 宁吉浩

hi，all 最近在使用 flink 读写kafka，频繁输出procuder close 日志但是flink checkpoint观察一段时间没有失败，数据也写入到kafka了,观察kafka server.log 也没发现报错目前写入kafka的数据是否有丢失，暂时还没校验，有人可以给我解释下下列的日志是什么原因吗? 频繁输出下列日志 : 2020-09-29 14:13:28,916 INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 2.2.0 2020-09-29 14:13:28,916

Re: checkpoint rocksdb hdfs 如何协调，保证数据不丢失

2020-09-29 Thread 王冶

Hi~ 按你的问题顺序回答如下: 1. Flink中的RocksDB是支持保存到hdfs的，且支持的非常好，将rocksdb的存储路径设置为hdfs路径即可。 2. in-flight的数据是保存在本地磁盘的，仅当checkpoint的时候，才会将本地的状态拷贝到hdfs。而且checkpoint本身不会因为远程拷贝影响计算速度。 3.

Re:Re: flink基于源码如何编译打包生成flink-table-blink.jar

2020-09-29 Thread XiaChang

Hi Xingbo Huang 我试一下，非常感谢在 2020-09-29 13:53:02，"Xingbo Huang" 写道： >Hi XiaChang > >你可以在flink-table目录下执行打包命令。然后flink-table-uber-blink的target目录下生成的flink-table-uber-blink_2.11-1.12-SNAPSHOT.jar这个包就是你要的flink-table-blink_2.11-1.12-SNAPSHOT.jar > >Best, >Xingbo > >zilong xiao 于2020年9月29日周二

Flink配置hdfs状态后端遇到的问题

2020-09-29 Thread jester_jim

Flink中文社区的各位大佬你们好：本人是小白，由于对Flink不太了解，想学，然后搭了个Flink standalone（1.11.2 jdk1.8）集群，集群本身运行没有什么问题，作业运行也没什么问题。但是最近有用到状态后端，在配置hdfs的时候遇到了一个无法理解的问题，我在issues也没找到解决方法。问题大概是，我在Flink配置文件配置了HDFS状态后端，但是呢Hadoop（CDH2.5.0）是生产系统的集群，有kerberos认证，目录权限不全开放。Flink的配置信息如下： state.backend: filesystem # Directory for

Re: 回复: 怎么样在Flink中使用java代码提交job到yarn

2020-09-29 Thread chengyanan1...@foxmail.com

zeeplin的相关文档可以参考： https://www.yuque.com/jeffzhangjianfeng/gldg8w 这里写的很详细，收藏依赖慢慢看发件人： xiao cai 发送时间： 2020-09-29 13:13 收件人： user-zh 主题：回复: Re: 怎么样在Flink中使用java代码提交job到yarn 非常感谢建议，有zeeplin api的相关文档吗原始邮件发件人: chengyanan1...@foxmail.com 收件人: user-zh 发送时间: 2020年9月29日(周二) 09:54 主题: 回复:

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-09-29 Thread Yang Wang

Hi Till, thanks for your valuable feedback. 1. Yes, leader election and storing leader information will use a same ConfigMap. When a contender successfully performs a versioned annotation update operation to the ConfigMap, it means that it has been elected as the leader. And it will write the

如何在流式数据源上使用分析函数LAG和EAD函数

2020-09-29 Thread Robin Zhang

环境： flink 1.10，使用flinkSQL kafka输入数据如： {"t":"2020-04-01T05:00:00Z", "id":"1", "speed":1.0} {"t":"2020-04-01T05:05:00Z", "id":"1", "speed":2.0} {"t":"2020-04-01T05:10:00Z", "id":"1", "speed":3.0} {"t":"2020-04-01T05:15:00Z", "id":"1", "speed":4.0} {"t":"2020-04-01T05:20:00Z", "id":"1", "speed":5.0}

71 matches

Mail list logo