Re: Re: Flink SQL parallelism configuration question

2020-12-27 Thread hailongwang
Given how keyGroups are implemented, the parallelism is ideally a power of 2. On 2020-12-28 10:38:23, "赵一旦" wrote: >What I meant was whether the parallelism needs to be a divisor of 128. > >Shengkai Fang wrote on Mon, Dec 28, 2020, 10:38: >> hi, if the hot spot comes from one key carrying most of the data, re-partitioning still won't fix it. >> In my view the best solution is window-based mini-batch plus local-global agg; the community is working on this class of problem, see [1] >>
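The divisor question in this thread can be checked directly. Flink assigns each key to one of `maxParallelism` key groups and then maps key groups to subtasks with an integer-division formula (mirrored below; this is a simplified standalone sketch of `KeyGroupRangeAssignment`'s index computation, with the murmur-hash key-to-group step omitted):

```java
import java.util.Arrays;

public class KeyGroupSkew {
    // Mirrors KeyGroupRangeAssignment#computeOperatorIndexForKeyGroup:
    // a key group maps to subtask keyGroup * parallelism / maxParallelism.
    public static int operatorIndex(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Count how many key groups land on each subtask.
    public static int[] groupsPerSubtask(int maxParallelism, int parallelism) {
        int[] counts = new int[parallelism];
        for (int kg = 0; kg < maxParallelism; kg++) {
            counts[operatorIndex(kg, maxParallelism, parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // parallelism 32 divides maxParallelism 128: every subtask gets exactly 4 key groups
        System.out.println(Arrays.toString(groupsPerSubtask(128, 32)));
        // parallelism 30 does not divide 128: some subtasks get 5 key groups, others 4,
        // so even with perfectly uniform keys there is built-in skew
        System.out.println(Arrays.toString(groupsPerSubtask(128, 30)));
    }
}
```

This is why a parallelism that divides the max parallelism (or a power of 2, given the default max parallelism of 128) distributes key groups evenly.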

Compiling Error for Flink 1.11.3

2020-12-27 Thread Le Xu
Hello! I was trying to compile flink 1.11.3 from github (branch release-1.11.3) but I'm getting the following error saying that it cannot find symbol (adding full trace at the end of the email). Here is my output for mvn -v -- I'm using maven 3.2.5. Apache Maven 3.2.5

Re: flink1.12 error OSError: Expected IPC message of type schema but got record batch

2020-12-27 Thread 小学生
My pyarrow version is 0.17.1.

Chaining 2 flink jobs through a KAFKA topic with checkpoint enabled

2020-12-27 Thread Daniel Peled
Hello, We have 2 flink jobs that communicate with each other through a KAFKA topic. Both jobs use checkpoints with EXACTLY ONCE semantic. We have seen the following behaviour and we want to make sure and ask if this is the expected behaviour or maybe it is a bug. When the first job produces a

Re: Problem writing to ClickHouse with flink-connector-clickhouse

2020-12-27 Thread 张锴
Try a different third-party client: https://github.com/blynkkk/clickhouse4j (groupId cc.blynk.clickhouse, artifactId clickhouse4j, version 1.4.4). DanielGu <610493...@qq.com> wrote on Mon, Dec 28, 2020, 00:22: > Used Alibaba's package to write to clickhouse > Aliyun flink-connector-clickhouse writing to ClickHouse > < >

Re: Mailing list unsubscribe

2020-12-27 Thread Xingbo Huang
Hi, to unsubscribe, send an email to user-zh-unsubscr...@flink.apache.org; see [1] for details. [1] https://flink.apache.org/zh/community.html#section-1 Best, Xingbo ㊣ 俊 猫 ㊣ <877144...@qq.com> wrote on Sun, Dec 27, 2020, 11:15: > Hello, please unsubscribe me

Re: Registering RawType with LegacyTypeInfo via Flink CREATE TABLE DDL

2020-12-27 Thread Yuval Itzchakov
Hi Danny, Yes, I tried implementing the DataTypeFactory for the UDF using TypeInformationRawType (which is deprecated BTW, and there's no support for RawType in the conversion), didn't help. I did manage to get the conversion working using TableEnvironment.toAppendStream (I was previously

Re: flink1.12 error OSError: Expected IPC message of type schema but got record batch

2020-12-27 Thread Xingbo Huang
Hi, this error comes from pyarrow failing to deserialize data. Could you share your pyarrow version? You can check it with pip list | grep pyarrow. Best, Xingbo 小学生 <201782...@qq.com> wrote on Mon, Dec 28, 2020, 10:37: > Could anyone help: after I use a vectorized udf in pyflink, the program errors out after running for a while, and I couldn't find anything similar online > Caused by: java.lang.RuntimeException: Error received from SDK harness for > instruction 8:

Re: PyFlink UDF Permission Denied

2020-12-27 Thread Xingbo Huang
Hi Andrew, According to the error, you can try to check the file permission of "/test/python-dist-78654584-bda6-4c76-8ef7-87b6fd256e4f/python-files/site-packages/site-packages/pyflink/bin/pyflink-udf-runner.sh" Normally, the permission of this script would be -rwxr-xr-x Best, Xingbo Andrew

Re: Choosing the right flink state for a business requirement

2020-12-27 Thread news_...@163.com
This only works if records enter Kafka in order, which real scenarios cannot guarantee: an earlier heartbeat record may land in the kafka queue later. news_...@163.com From: 张锴 Sent: 2020-12-28 13:35 To: user-zh Subject: Choosing the right flink state for a business requirement Could you experts help me think through the following requirement? Requirement: our company does livestreaming, and we need real-time per-user online-duration statistics. The source is kafka; a heartbeat record is produced every minute, and the data contains an eventTime field. For example, user A

Re: Realtime Data processing from HBase

2020-12-27 Thread s_penakalap...@yahoo.com
Thanks Deepak.  Does this mean streaming from HBase is not possible using the current Streaming API? Could you also shed some light on HBase checkpointing? I referred to the below URL to implement checkpointing; however, in the example I see count is passed in the SourceFunction ( SourceFunction)

Choosing the right flink state for a business requirement

2020-12-27 Thread 张锴
Could you experts help me think through the following requirement? Requirement: our company does livestreaming, and we need real-time per-user online-duration statistics. The source is kafka; a heartbeat record is produced every minute, and the data contains an eventTime field. For example, if user A produces heartbeats at minutes 1, 2 and 3, his stay lasted 3 minutes; if he comes back at minutes 5 and 6, that stay is 2 minutes. Any run of consecutive heartbeat records should be merged into one record, per room and per user. My idea: key by room ID and user ID, then process, keeping state; extract the minute from each record's eventTime
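The merging step described above ("any run of consecutive per-minute heartbeats becomes one record") can be sketched independently of Flink. A minimal in-memory version for one (room, user) key, assuming sorted input; the real Flink job would keep the equivalent of the start/last minute in keyed state, and out-of-order arrival (raised in the reply) is exactly what this sketch does not handle:

```java
import java.util.ArrayList;
import java.util.List;

public class SessionMerge {
    // A closed stay: heartbeats at [start, end], duration = end - start + 1 minutes.
    public static class Session {
        public final long start, end;
        Session(long start, long end) { this.start = start; this.end = end; }
        public long durationMinutes() { return end - start + 1; }
    }

    // Merge sorted heartbeat minutes into runs of consecutive minutes.
    public static List<Session> merge(long[] minutes) {
        List<Session> out = new ArrayList<>();
        if (minutes.length == 0) return out;
        long start = minutes[0], last = minutes[0];
        for (int i = 1; i < minutes.length; i++) {
            if (minutes[i] == last + 1) {
                last = minutes[i];              // still consecutive: extend the stay
            } else {
                out.add(new Session(start, last)); // gap: close the current stay
                start = last = minutes[i];
            }
        }
        out.add(new Session(start, last));
        return out;
    }

    public static void main(String[] args) {
        // user A: heartbeats at minutes 1,2,3 then 5,6 -> stays of 3 min and 2 min
        for (Session s : merge(new long[]{1, 2, 3, 5, 6})) {
            System.out.println(s.start + "-" + s.end + ": " + s.durationMinutes() + " min");
        }
    }
}
```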

Re: Realtime Data processing from HBase

2020-12-27 Thread Deepak Sharma
I would suggest another approach here. 1. Write a job that reads from hbase, checkpoints, and pushes the data to a broker such as Kafka. 2. A Flink streaming job would be the second job, reading from kafka and processing the data. With this separation of concerns, maintaining it would be simpler.

Re: Realtime Data processing from HBase

2020-12-27 Thread s_penakalap...@yahoo.com
Hi Team, Kindly help me with some inputs. I am using Flink 1.12. Regards, Sunitha. On Thursday, December 24, 2020, 08:34:00 PM GMT+5:30, s_penakalap...@yahoo.com wrote: Hi Team, I recently encountered one usecase in my project as described below: My data source is HBase. We receive huge

Throwing Recoverable Exceptions from Tasks

2020-12-27 Thread Chirag Dewan
Hi, I am building an alerting system where based on some input events I need to raise an alert from the user defined aggregate function.  My first approach was to use an asynchronous REST API to send alerts outside the task slot. But this obviously involves IO from within the task and if I

Re: Flink SQL parallelism configuration question

2020-12-27 Thread 赵一旦
What I meant was whether the parallelism needs to be a divisor of 128. Shengkai Fang wrote on Mon, Dec 28, 2020, 10:38: > hi, if the hot spot comes from one key carrying most of the data, re-partitioning still won't fix it. > In my view the best solution is window-based mini-batch plus local-global agg; the community is working on this class of problem, see [1] > > [1]https://issues.apache.org/jira/browse/FLINK-19604 > > 赵一旦 wrote on Mon, Dec 28, 2020, 10:31: > >

Re: flink1.12.0 native HA on k8s produces a large number of jobmanager-leader ConfigMaps after running for a while

2020-12-27 Thread Yang Wang
Thanks for trying the K8s HA mode. Are you using Session mode or Application mode? * In Application mode, once the flink job reaches a terminal state (FAILED, CANCELED, SUCCEED), all HA-related ConfigMaps are cleaned up automatically; you can cancel the job in the webui or with flink cancel and observe that nothing should be left behind *

Re: flink 1.12.0 kubernetes-session deployment problem

2020-12-27 Thread Yang Wang
There are two problems in your workflow: 1. Image not found. This is probably related to minikube's driver setting; with hyperkit or another vm driver, you need to minikube ssh into the VM and check whether the image actually exists there. 2. The JM link is unreachable. 2020-12-27 22:08:12,387 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster session001 successfully, JobManager Web Interface:

Re: Flink SQL parallelism configuration question

2020-12-27 Thread Shengkai Fang
hi, if the hot spot comes from one key carrying most of the data, re-partitioning still won't fix it. In my view the best solution is window-based mini-batch plus local-global agg; the community is working on this class of problem, see [1] [1]https://issues.apache.org/jira/browse/FLINK-19604 赵一旦 wrote on Mon, Dec 28, 2020, 10:31: > One more question. For a window operator, the max parallelism of the keyBy partitioning is set to the downstream operator's max parallelism. > >

flink1.12 error OSError: Expected IPC message of type schema but got record batch

2020-12-27 Thread 小学生
Could anyone help: after I use a vectorized udf in pyflink, the program errors out after running for a while. Caused by: java.lang.RuntimeException: Error received from SDK harness for instruction 8: Traceback (most recent call last): File

Re: table rowtime timezome problem

2020-12-27 Thread Leonard Xu
Hi, Jiazhi > When DataStream is converted to table, eventTime is converted to > rowTime. Rowtime is 8 hours slow. How to solve this problem? The reason is that the only data type that can be used to define an event time attribute in Table/SQL is TIMESTAMP(3), and the TIMESTAMP type isn't related to your
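The 8-hour gap described above comes from rendering an epoch-based instant as a zone-less SQL TIMESTAMP(3) wall-clock value in UTC while the user expects local time. A standalone JDK sketch of the mismatch, assuming Asia/Shanghai (UTC+8) as the session time zone:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class RowtimeShift {
    public static void main(String[] args) {
        Instant event = Instant.parse("2020-12-27T12:00:00Z");
        // SQL TIMESTAMP(3) carries no zone. If the instant is rendered as a
        // UTC wall-clock value...
        LocalDateTime asUtc = LocalDateTime.ofInstant(event, ZoneOffset.UTC);
        // ...while the user expects local (Asia/Shanghai) wall-clock time...
        LocalDateTime asShanghai = LocalDateTime.ofInstant(event, ZoneId.of("Asia/Shanghai"));
        // ...the displayed rowtime appears "slow" by exactly the zone offset.
        System.out.println(ChronoUnit.HOURS.between(asUtc, asShanghai)); // prints 8
    }
}
```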

flink 1.12 insert into hive table cannot find .staging_xxxx file

2020-12-27 Thread macdoor
flink 1.12 standalone cluster; a scheduled batch-mode insert overwrite into a hive table randomly fails with a ".staging_ file not found" error. Full error: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at

flink1.12.0 native HA on k8s produces a large number of jobmanager-leader ConfigMaps after running for a while

2020-12-27 Thread tao7
Hi all, after deploying flink1.12 HA on k8s the native-k8s way for a while, jobmanager-leader has produced a large number of ConfigMaps. Are all of these ConfigMaps actually needed? How does everyone clean them up and maintain them? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Registering RawType with LegacyTypeInfo via Flink CREATE TABLE DDL

2020-12-27 Thread Danny Chan
> SQL parse failed. Encount What syntax did you use ? > TypeConversions.fromDataTypeToLegacyInfo cannot convert a plain RAW type back to TypeInformation. Did you try to construct type information by a new fresh TypeInformationRawType ? Yuval Itzchakov 于2020年12月24日周四 下午7:24写道: > An expansion

Re: Flink SQL parallelism configuration question

2020-12-27 Thread 赵一旦
One more question. For a window operator, the max parallelism of the keyBy partitioning is set to the downstream operator's max parallelism. Suppose my window parallelism is 30; by default the window's max parallelism is 128. Thinking in averages, doesn't the mechanism itself then make data skew quite likely? Only a parallelism like 32 balances evenly against 128, right? Shengkai Fang wrote on Sun, Dec 27, 2020, 15:46: > You can set it via this configuration [1] > > [1] > >

Re: checkpoint persistence problem

2020-12-27 Thread 赵一旦
First, make sure you are within the retained-checkpoints limit. Second, the job must not have been stopped: flink deletes all checkpoints when a job is stopped, but not when it is cancelled. Yun Tang wrote on Sun, Dec 27, 2020, 17:55: > Hi > > Since the UI already shows success, it definitely succeeded and was saved to HDFS; check the parent directory, as the chk-x directory may be deleted once newer checkpoints complete > > Best > Yun Tang > > From: chen310 <1...@163.com> > Sent: Friday,

Re: Flink reads data from JDBC table only on startup

2020-12-27 Thread Danny Chan
Hi Taras ~ There is a lookup cache for temporal join, but it is disabled by default, see [1]. That means, by default, FLINK SQL would look up the external database for each record from the JOIN LHS. Did you use the temporal table join syntax or normal stream-stream join syntax ? The temporal table join
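Conceptually, the lookup cache mentioned above is a bounded, TTL-expiring cache sitting in front of the per-record JDBC lookup: rows are served from memory until they expire or are evicted, and only misses hit the database. A standalone sketch of that idea (this is an illustration, not Flink's actual implementation; the connector's behavior is governed by its `lookup.cache.*` options):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LookupCache<K, V> {
    private final long ttlMillis;
    private final Map<K, Entry<V>> cache;

    static class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    public LookupCache(int maxRows, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // access-order LinkedHashMap evicts the least-recently-used row beyond maxRows
        this.cache = new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > maxRows;
            }
        };
    }

    // Returns the cached row, or null if absent/expired (the caller then queries the DB).
    public V get(K key, long nowMillis) {
        Entry<V> e = cache.get(key);
        if (e == null || nowMillis - e.loadedAt > ttlMillis) {
            cache.remove(key);
            return null;
        }
        return e.value;
    }

    public void put(K key, V value, long nowMillis) {
        cache.put(key, new Entry<>(value, nowMillis));
    }

    public static void main(String[] args) {
        LookupCache<String, String> cache = new LookupCache<>(2, 60_000);
        cache.put("user-1", "rowA", 0);
        System.out.println(cache.get("user-1", 1_000));   // fresh: served from cache
        System.out.println(cache.get("user-1", 120_000)); // past TTL: null, re-query the DB
    }
}
```

With the cache disabled (the default the reply describes), every LHS record triggers a database query, which is why the table appears to be read continuously rather than only on startup.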

Re: Long latency when consuming a message from KAFKA and checkpoint is enabled

2020-12-27 Thread Danny Chan
Hi, Nick ~ The behavior is as expected, because Kafka source/sink relies on the Checkpoints to complement the exactly-once write semantics, a checkpoint snapshot the states on a time point which is used for recovering, the current internals for Kafka sink is that it writes to Kafka but only
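The behavior described above follows from Kafka's transactional writes: with EXACTLY_ONCE, the producer's transaction commits only when a checkpoint completes, so a downstream consumer configured with `isolation.level=read_committed` only sees records after that commit, and observed latency is bounded by the checkpoint interval. A minimal sketch of the consumer-side setting, using plain `java.util.Properties` (the broker address and group id are hypothetical placeholders):

```java
import java.util.Properties;

public class ReadCommittedConfig {
    public static Properties consumerProps() {
        Properties props = new Properties();
        // hypothetical broker address and group id, for illustration only
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "downstream-job");
        // With the default "read_uncommitted", the consumer would also see records
        // from transactions that may later abort. "read_committed" holds records
        // back until the upstream checkpoint (= transaction commit) completes,
        // which is what makes the two-job pipeline end-to-end exactly-once.
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("isolation.level"));
    }
}
```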

Re: Command: Flink savepoint -d reported an error

2020-12-27 Thread 赢峰
Was this ever resolved? Mine also reports: Missing required argument: savepoint path. Usage: bin/flink savepoint -d | | 赢峰 | | si_ji_f...@163.com | Signature customized by NetEase Mail Master On 09/30/2019 15:06, pengchengl...@163.com wrote:

Re: test

2020-12-27 Thread liang zhao
Please don't send these meaningless emails. > On Dec 27, 2020, at 23:19, 蒋德祥 wrote: > >

Problem writing to ClickHouse with flink-connector-clickhouse

2020-12-27 Thread DanielGu
Used Alibaba's package to write to clickhouse: Aliyun flink-connector-clickhouse writing to ClickHouse. A test write to clickhouse returned the following with no error, but nothing was actually written; I don't know where to start troubleshooting, any advice from the experts appreciated

test

2020-12-27 Thread 蒋德祥

flink 1.12.0 kubernetes-session deployment problem

2020-12-27 Thread 陈帅
This is my first attempt at deploying flink on k8s; the version is 1.12.0, jdk is 1.8.0_275, scala is 2.12.12, and my mac has a single-node minikube environment. Steps: git clone https://github.com/apache/flink-docker cd flink-docker/1.12/scala_2.12-java8-debian docker build --tag flink:1.12.0-scala_2.12-java8 . cd flink-1.12.0 ./bin/kubernetes-session.sh \

PyFlink UDF Permission Denied

2020-12-27 Thread Andrew Kramer
Hi, I am using Flink in Zeppelin and trying to execute a UDF defined in Python. The problem is I keep getting the following permission denied error in the log: Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException:

Re: checkpoint persistence problem

2020-12-27 Thread Yun Tang
Hi Since the UI already shows success, it definitely succeeded and was saved to HDFS; check the parent directory, as the chk-x directory may be deleted once newer checkpoints complete Best Yun Tang From: chen310 <1...@163.com> Sent: Friday, December 25, 2020 16:01 To: user-zh@flink.apache.org Subject: checkpoint persistence problem Question: flink sql is configured to retain checkpoints after the job dies