如何优雅的开发Flink SQL作业

2021-01-03 Thread HideOnBushKi
大佬们好,现在业务需求多,生产作业开始变得繁重,请教3个生产业务场景,主要是想看下各位有什么优雅的思路 1.kakfa table和group_ID应该怎么去维护呢?一个业务一张表吗? 2.如何做到复用表的效果? 3.如果新增一个业务需求,用到之前的一张kafka table,似乎要在一个程序里。执行多次 executeSql("sql 1")才不会乱序,但是这样的话,程序的耦合度会很高,请问该怎么优雅的处理这些场景呢? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Facing issues on kafka while running a job that was built with 1.11.2-scala-2.11 version onto flink version 1.11.2-scala-2.12

2021-01-03 Thread Yun Gao
Hi Narasimha, Since the Kafka-connect itself is purely implemented with Java, thus I guess that with high probabililty it is not the issue of scala version. I think may first have a check of the kafka cluster's status ? Best, Yun

Re: Tumbling Time Window

2021-01-03 Thread Yun Gao
Hi Navneeth For me I think you may start with using the window function and an example for the custom window function could be found in [1]. From the description I think it should be a standard Tumbling window, if implementing with the customized process function, it would end up have a

Re: flink1.12????OSError: Expected IPC message of type schema but got record batch

2021-01-03 Thread ??????

回复: flink1.12错误OSError: Expected IPC message of type schema but got record batch

2021-01-03 Thread 刘海
这个应该和内存有关,我之前试过,存储的状态无限增长,导致运行几分钟后任务结束,并抛出异常,可以尝试一下加大内存和清理状态 | | 刘海 | | liuha...@163.com | 签名由网易邮箱大师定制 在2021年1月4日 11:35,咿咿呀呀<201782...@qq.com> 写道: 我按照您这个修改了,跟我之前的也是一样的。能运行的通,输出的结果也是正确的,现在最大的问题是——运行一段时间后(3分钟左右)就出现了OSError: Expected IPC message of type schema but got record batch这个错误 -- Sent

Re: flink1.12错误OSError: Expected IPC message of type schema but got record batch

2021-01-03 Thread 咿咿呀呀
我按照您这个修改了,跟我之前的也是一样的。能运行的通,输出的结果也是正确的,现在最大的问题是——运行一段时间后(3分钟左右)就出现了OSError: Expected IPC message of type schema but got record batch这个错误 -- Sent from: http://apache-flink.147419.n8.nabble.com/

回复: Flink SQL DDL Schema csv嵌套json

2021-01-03 Thread amen...@163.com
Flink版本 1.12.0 发件人: amen...@163.com 发送时间: 2021-01-03 16:09 收件人: user-zh 主题: Flink SQL DDL Schema csv嵌套json hi everyone, zhangsan|man|28|goodst...@gmail.com|{"math":98, "language":{"english":89, "french":95}}|china|beijing 这是一条来自kafka消息队列中的数据,当我创建kafka ddl为之定义schema时,报出异常信息: Exception in

Re:flink1.11 mysql cdc checkpoint 失败后程序自动恢复,同步数据出现重复

2021-01-03 Thread smailxie
在程序自动重启恢复的时候,binlog可能被MySQL服务器删除了,导致debeziume connector读取了新的快照。 参考连接:https://debezium.io/documentation/reference/1.3/connectors/mysql.html#mysql-purges-binlog-files_debezium -- Name:谢波 Mobile:13764228893 在 2021-01-04 10:38:30,"lingchanhu" 写道: >sourcr:mysql-cdc

flink1.11 mysql cdc checkpoint 失败后程序自动恢复,同步数据出现重复

2021-01-03 Thread lingchanhu
sourcr:mysql-cdc sink:elasticsearch 问题描述: 从mysql中同步表数据至elasticsearch后,进行新增再删除的某条数据出现问题,导致sink失败(没加primary key)。checkpoint失败,程序自动恢复重启后,checkpoint 成功,但是elasticsearch 中的数据是mysql 表中的两倍,出现重复同步情况。 程序的自动恢复不应该是从当前checkpoint 中记录的binlog 位置再同步么?为什么会再重头同步一次呢? (ddl 中写死了server-id, "

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.2.2 released

2021-01-03 Thread Xingbo Huang
@Gordon Thanks a lot for the release and for being the release manager. And thanks to everyone who made this release possible! Best, Xingbo Till Rohrmann 于2021年1月3日周日 下午8:31写道: > Great to hear! Thanks a lot to everyone who helped make this release > possible. > > Cheers, > Till > > On Sat, Jan

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.2.2 released

2021-01-03 Thread Xingbo Huang
@Gordon Thanks a lot for the release and for being the release manager. And thanks to everyone who made this release possible! Best, Xingbo Till Rohrmann 于2021年1月3日周日 下午8:31写道: > Great to hear! Thanks a lot to everyone who helped make this release > possible. > > Cheers, > Till > > On Sat, Jan

Re: flink1.12????OSError: Expected IPC message of type schema but got record batch

2021-01-03 Thread ??????
??Xingbo

Re: crontab通过脚本启动flink-job失败,flink-sql-parquet_2.11-1.12.0.jar does not exist

2021-01-03 Thread 冯嘉伟
hi! java.io.FileNotFoundException: File file:/home/xjia/.flink/... 可以看出,从本地加载jar包,而不是hdfs。 我觉得可能是hadoop环境的问题,导致读取的scheme是file,使用 echo $HADOOP_CLASSPATH 检查你的环境。 Important Make sure that the HADOOP_CLASSPATH environment variable is set up (it can be checked by running echo $HADOOP_CLASSPATH). If

Re: flink1.12错误OSError: Expected IPC message of type schema but got record batch

2021-01-03 Thread Xingbo Huang
Hi, 不好意思,这么晚才回复。因为你这个报错是发生在数据反序列的过程中,还没有到你写的函数体的具体内容。我看你pandas udf的声明没有问题,那就得看下你的如何使用的了。我写了一个简化版本的,Array[String]作为输入,并且作为输出的,运行起来没有问题。 from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import StreamTableEnvironment, EnvironmentSettings, DataTypes from pyflink.table.udf

Re: flink1.12错误OSError: Expected IPC message of type schema but got record batch

2021-01-03 Thread 咿咿呀呀
社区的各位大神,有没有碰到这个问题的,请教。 -- Sent from: http://apache-flink.147419.n8.nabble.com/

回复:FlinkSQL 下推的值类型与字段类型不对应

2021-01-03 Thread automths
谢谢你的回答。 但是我的col1,col2就已经是SMALLINT类型的了,我的问题是where条件中值下推过程中是Integer类型的,我希望值也是SMALLINT的。 祝好! | | automths | | autom...@163.com | 在2020年12月31日 18:17,whirly 写道: Hi. 查询语句中可以使用 cast 内置函数将值强制转换为指定的类型,如 select CAST(A.`col1` AS SMALLINT) as col1 from table 参考:

Flink SQL>查询的hive表数据全部为NULL

2021-01-03 Thread Jacob
Dear All, Flink SQL>select * from table1; 在Flink客户端查询hive数据,结果每个字段全为NULL,但数据条数是对的,select count是正确的,查具体数据就是NULL,不知何故?同样的查询在hive客户端是可以查到数据的 hive表时orc文件load的数据。 - Thanks! Jacob -- Sent from: http://apache-flink.147419.n8.nabble.com/

?????? flink 1.12 Cancel Job??????????(??)

2021-01-03 Thread ??????
??yarn-cluster23:50??/opt/module/hadoop3.2.1/bin/yarn application -kill application_1609656886263_0043??kill??job1:30??kill-job??state?? ----

Re: CICD

2021-01-03 Thread Navneeth Krishnan
Thanks Vikash for the response. Yes thats very much feasible but we are planning to move to job/application cluster model where in the artifacts are bundled inside the container. When there is a new container image then we might have to do the following. - Take a savepoint - Upgrade the JM and TM

Re: CICD

2021-01-03 Thread Vikash Dat
Could you not use the JM web address to utilize the rest api? You can start/stop/save point/restore + upload new jars via the rest api. While I did not run on ECS( ran on EMR) I was able to use the rest api to do deployments. On Sun, Jan 3, 2021 at 19:09 Navneeth Krishnan wrote: > Hi All, > >

CICD

2021-01-03 Thread Navneeth Krishnan
Hi All, Currently we are using flink in session cluster mode and we manually deploy the jobs i.e. through the web UI. We use AWS ECS for running the docker container with 2 services definitions, one for JM and other for TM. How is everyone managing the CICD process? Is there a better way to run a

Tumbling Time Window

2021-01-03 Thread Navneeth Krishnan
Hello All, First of all Happy New Year!! Thanks for the excellent community support. I have a job which requires a 2 seconds tumbling time window per key, For each user we wait for 2 seconds to collect enough data and proceed to further processing. My question is should I use the regular DSL

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.2.2 released

2021-01-03 Thread Till Rohrmann
Great to hear! Thanks a lot to everyone who helped make this release possible. Cheers, Till On Sat, Jan 2, 2021 at 3:37 AM Tzu-Li (Gordon) Tai wrote: > The Apache Flink community released the second bugfix release of the > Stateful Functions (StateFun) 2.2 series, version 2.2.2. > > *We

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.2.2 released

2021-01-03 Thread Till Rohrmann
Great to hear! Thanks a lot to everyone who helped make this release possible. Cheers, Till On Sat, Jan 2, 2021 at 3:37 AM Tzu-Li (Gordon) Tai wrote: > The Apache Flink community released the second bugfix release of the > Stateful Functions (StateFun) 2.2 series, version 2.2.2. > > *We

Flink SQL DDL Schema csv嵌套json

2021-01-03 Thread amen...@163.com
hi everyone, zhangsan|man|28|goodst...@gmail.com|{"math":98, "language":{"english":89, "french":95}}|china|beijing 这是一条来自kafka消息队列中的数据,当我创建kafka ddl为之定义schema时,报出异常信息: Exception in thread "main" java.lang.IllegalArgumentException: Only simple types are supported in the second level nesting of