Re: Re: Handling "Global" Updating State

2021-05-18 Thread Yun Gao
Hi Rion, Sorry for the late reply. Another, simpler method might indeed be to have the operator read the data directly from Kafka in initializeState in order to initialize the state. Best, Yun -- Original Mail -- Sender: Rion Williams Send Date: Mon May 17 19:53:35 2021

When will Flink support reading and writing ACID Hive tables?

2021-05-18 Thread youngysh
Hi, when we use Flink 1.12 to read an ACID Hive table we get the error "Reading or writing ACID table %s is not supported". We tried modifying the source code to lift this restriction, but then ran into further errors, such as a failed cast from BytesColumnVector to LongColumnVector. Background: we want to use Flink for ETL and data-migration work in production, and the corresponding Hive versions are all around Hive 3.0, or Hive

Re: Which connectors can give me a stream that is not update/delete but still has a primary key?

2021-05-18 Thread HunterXHunter
Try changing the json format to debezium-json.
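For reference, a minimal PyFlink sketch of that suggestion — the table name, topic, columns, and broker address are placeholders, and it assumes the topic actually carries Debezium-formatted change events:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = TableEnvironment.create(settings)

# Hypothetical source table: the same Kafka topic as before, but declared with
# the 'debezium-json' format instead of plain 'json'.
t_env.execute_sql("""
    CREATE TABLE orders_cdc (
        id BIGINT,
        name STRING,
        ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'debezium-json'
    )
""")
```

Whether the resulting changelog stream then satisfies the deduplicate/primary-key requirement from the original question still has to be verified against that use case.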

Re: Root Exception can not be shown on Web UI in Flink 1.13.0

2021-05-18 Thread Gary Wu
Thanks! I have updated the details and the task manager log in https://issues.apache.org/jira/browse/FLINK-22688. Regards, -Gary On Tue, 18 May 2021 at 16:22, Matthias Pohl wrote: > Sorry for not getting back earlier. I missed that thread. It looks like > some wrong assumption on our end. Hence,

DataStream API Batch Execution Mode restarting...

2021-05-18 Thread Marco Villalobos
I have a DataStream job running in Batch Execution mode within YARN on EMR. The job failed an hour in, two times in a row, because the task manager heartbeat timed out. Can somebody point me to how to restart a job in this situation? I can't find that section of the documentation. Thank you.

Re: How to count the amount of data Flink has successfully consumed from Kafka in the last n hours?

2021-05-18 Thread silence
You can, when the metrics are reported (or before they are persisted), subtract consecutive reported numRecordsOut values of the source to get the per-interval delta, and then accumulate the deltas by time period when presenting the result.
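A rough sketch of that arithmetic in plain Python — the samples and the sampling mechanism are hypothetical; numRecordsOut is a monotonically increasing counter, so the amount consumed in a period is the sum of the deltas between consecutive samples inside that period:

```python
def consumed_in_window(samples, start, end):
    """samples: (timestamp, numRecordsOut) pairs from the source, ordered by time.
    Returns the number of records emitted between start and end (epoch seconds)."""
    total = 0
    for (prev_ts, prev_val), (ts, val) in zip(samples, samples[1:]):
        if prev_ts >= start and ts <= end:
            # Guard against counter resets (e.g. a job restart): ignore negative deltas.
            total += max(val - prev_val, 0)
    return total

# Hypothetical hourly samples: (epoch seconds, counter value)
samples = [(0, 1000), (3600, 4000), (7200, 9500), (10800, 12000)]
print(consumed_in_window(samples, 3600, 10800))  # (9500-4000) + (12000-9500) = 8000
```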

Re: How to count the amount of data Flink has successfully consumed from Kafka in the last n hours?

2021-05-18 Thread zzzyw
Yes, we integrate this information into our monitoring system. One of its items counts, for an arbitrary time range (for example the last day or the last week), the amount of data successfully consumed from Kafka.

Re: How to count the amount of data Flink has successfully consumed from Kafka in the last n hours?

2021-05-18 Thread yidan zhao
What step size? A sliding "last 24 hours" evaluated at any moment? zzzyw wrote on Tue, May 18, 2021 at 9:33 PM: > Hi all, > I need to count the amount of data Flink has successfully consumed from Kafka in the last n hours (for example 24 hours?). Is there a good way to do this?

DataStream Batch Execution Mode and large files.

2021-05-18 Thread Marco Villalobos
Hi, I am using the DataStream API in Batch Execution Mode, and my "source" is an S3 bucket with about 500 GB of data spread across many files. Where does Flink store the results of processed/produced data between tasks? There is no way that 500 GB will fit in memory, so I am very curious

Guidance for Integration Tests with External Technologies

2021-05-18 Thread Rion Williams
Hey all, I’ve been taking a very TDD-oriented approach to developing many of the Flink apps I’ve worked on, but recently I’ve encountered a problem that has me scratching my head. A majority of my integration tests leverage a few external technologies such as Kafka and typically a relational

Re: RabbitMQ source does not stop unless message arrives in queue

2021-05-18 Thread Austin Cawley-Edwards
Hey all, Thanks for the details, John! Hmm, that doesn't look too good either, but it is probably a different issue with the RMQ source/sink. Hopefully, the new FLIP-27 sources will help you guys out there! The upcoming HybridSource in FLIP-150 [1] might also be interesting to you in finely

Re: Prometheus Reporter Enhancement

2021-05-18 Thread Chesnay Schepler
There is already a ticket for this. Note that this functionality should be implemented in a generic fashion to be usable for all reporters. https://issues.apache.org/jira/browse/FLINK-17495 On 5/18/2021 8:16 PM, Andrew Otto wrote: Sounds useful! On Tue, May 18, 2021 at 2:02 PM Mason Chen

Re: Prometheus Reporter Enhancement

2021-05-18 Thread Andrew Otto
Sounds useful! On Tue, May 18, 2021 at 2:02 PM Mason Chen wrote: > Hi all, > > Would people appreciate enhancements to the prometheus reporter to include > extra labels via a configuration, as a contribution to Flink? I can see it > being useful for adding labels that are not job specific, but

Prometheus Reporter Enhancement

2021-05-18 Thread Mason Chen
Hi all, Would people appreciate enhancements to the prometheus reporter to include extra labels via a configuration, as a contribution to Flink? I can see it being useful for adding labels that are not job specific, but infra specific. The change would be nicely integrated with Flink’s

Re: After upgrade from 1.11.2 to 1.13.0 parameter taskmanager.numberOfTaskSlots set to 1.

2021-05-18 Thread Alexey Trenikhun
If flink-conf.yaml is read-only, will Flink complain but still work fine? From: Chesnay Schepler Sent: Wednesday, May 12, 2021 5:38 AM To: Alex Drobinsky Cc: user@flink.apache.org Subject: Re: After upgrade from 1.11.2 to 1.13.0 parameter

Issue reading from S3

2021-05-18 Thread Angelo G.
Hi, I'm trying to read from and write to S3 with Flink 1.12.2. I'm submitting the job to local cluster (tar.gz distribution). I do not have a Hadoop installation running in the same machine. S3 (not Amazon) is running in a remote location and I have access to it via endpoint and access/secret

Fastest way for decent lookup JOIN?

2021-05-18 Thread Theo Diefenthal
Hi there, I have the following (probably very common) use case: I have some lookup data (100 million records) which changes only slowly (in the range of some thousands of records per day). My event stream is in the order of tens of billions of events per day, and each event needs to be enriched from the 100
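One common pattern for this kind of enrichment (not necessarily what was finally recommended in this thread) is a processing-time lookup join against the slowly changing table, with the JDBC connector's lookup cache sized so hot keys stay in memory. A hedged PyFlink sketch follows; the table names, fields, and connection options are hypothetical, and whether a lookup join is fast enough at tens of billions of events per day is exactly the question of the thread rather than something this snippet settles:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = TableEnvironment.create(settings)

# Hypothetical event stream; 'datagen' stands in for the real Kafka source.
t_env.execute_sql("""
    CREATE TABLE events (
        customer_id BIGINT,
        amount DOUBLE,
        proc_time AS PROCTIME()
    ) WITH ('connector' = 'datagen')
""")

# Slowly changing lookup table via JDBC, with a lookup cache so repeated keys
# are served from memory instead of hitting the database on every event.
t_env.execute_sql("""
    CREATE TABLE dim_customers (
        customer_id BIGINT,
        segment STRING
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://dim-db:3306/dims',
        'table-name' = 'customers',
        'lookup.cache.max-rows' = '1000000',
        'lookup.cache.ttl' = '10min'
    )
""")

# Processing-time lookup join: each event is enriched with the dimension row
# as of the moment it is processed.
t_env.execute_sql("""
    SELECT e.customer_id, e.amount, d.segment
    FROM events AS e
    JOIN dim_customers FOR SYSTEM_TIME AS OF e.proc_time AS d
    ON e.customer_id = d.customer_id
""").print()
```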

Re: Helm chart for Flink

2021-05-18 Thread Austin Cawley-Edwards
Hey all, Yeah, I'd be interested to see the Helm pre-upgrade hook setup, though I'd agree with you, Alexey, that it does not provide enough control to be a stable solution. @Pedro Silva I don't know if there are talks for an official operator yet, but Kubernetes support is important to the

Which connectors can give me a stream that is not update/delete but still has a primary key?

2021-05-18 Thread LittleFall
Background: I want to try Flink SQL's deduplicate on a stream with a *primary key*, and I found that: 1. if I obtain a stream via mysql-cdc, it fails with "Deduplicate doesn't support consuming update and delete changes"; 2. if I obtain a stream via Kafka with the json format, deduplicate itself does not complain, but I cannot define a primary key, and it fails with "The Kafka table '...' with 'json' format doesn't support defining PRIMARY KEY constraint on the

Re: SIGSEGV error

2021-05-18 Thread Till Rohrmann
Great to hear that you fixed the problem by specifying an explicit serializer for the state. Cheers, Till On Tue, May 18, 2021 at 9:43 AM Joshua Fan wrote: > Hi Till, > I also tried the job without gzip, it ran into the same error. > But the problem is solved now. I was about to give up to

How to count the amount of data Flink has successfully consumed from Kafka in the last n hours?

2021-05-18 Thread zzzyw
Hi all, I need to count the amount of data Flink has successfully consumed from Kafka in the last n hours (for example 24 hours?). Is there a good way to do this?

Re: PyFlink UDF: No match found for function signature XXX

2021-05-18 Thread Yik San Chan
With the help from Dian and friends, it turns out the root cause is: when `create_temporary_function` is called, the current catalog is the default catalog; however, when `execute_sql(TRANSFORM)` runs, the current catalog is "hive". A function defined as a temporary function in catalog "default" is not accessible from
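A minimal sketch of two ways this can be addressed, assuming a recent PyFlink version that exposes create_temporary_system_function and a Hive catalog registered under the name "hive"; the table names are hypothetical:

```python
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.udf import udf

settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = TableEnvironment.create(settings)

@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
def add(i, j):
    return i + j

# Option 1: a temporary *system* function is not tied to any catalog, so it stays
# visible even after switching to the "hive" catalog.
t_env.create_temporary_system_function("add", add)

# Option 2: switch to the target catalog first, then register the catalog-scoped
# temporary function there (assumes the "hive" catalog is already registered).
# t_env.use_catalog("hive")
# t_env.create_temporary_function("add", add)

# Hypothetical query; mysink/mysource stand in for the tables from the thread.
# t_env.execute_sql("INSERT INTO mysink SELECT add(a, b) FROM mysource")
```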

Flink upgraded from 1.10.0 to 1.12.0

2021-05-18 Thread 王炳焱
When I upgraded from Flink 1.10.0 to Flink 1.12.0, I was unable to restore from a savepoint and got the following error: 2021-05-14 22:02:44,716 WARN org.apache.flink.metrics.MetricGroup [] - The operator name Calc(select=[((CAST((log_info get_json_object2

Re: Re: Re: Chinese characters are garbled when Flink SQL writes to MySQL

2021-05-18 Thread 王炳焱
Then you need to check what character encoding each column of your database table uses. Have you adjusted the encoding? It works on my side. On 2021-05-18 18:19:31, "casel.chen" wrote: > My URL connection string already uses useUnicode=true&characterEncoding=UTF-8, but the output is still garbled. > On 2021-05-18 17:21:12, "王炳焱" <15307491...@163.com> wrote:

Flink upgraded to version 1.12.0 reports an error when starting from a savepoint

2021-05-18 Thread 王炳焱
When I upgraded from Flink 1.10 to Flink 1.12, a Flink 1.10 SQL API job could not be restored from its savepoint. The error is as follows: 2021-05-14 22:02:44,716 WARN org.apache.flink.metrics.MetricGroup [] - The operator name Calc(select=[((CAST((log_info get_json_object2 _UTF-16LE'eventTime')) / 1000) FROM_UNIXTIME

Stop command failure

2021-05-18 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hi, The stop command is failing with the error below on Apache Flink 1.12.3. Could you please help? log":"[Flink-RestClusterClient-IO-thread-2] org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel Force-closing a channel whose registration task was not accepted by an event loop:

Re: PyFlink UDF: No match found for function signature XXX

2021-05-18 Thread Yik San Chan
Hi Dian, I changed the udf to:

```python
@udf(
    input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
    result_type=DataTypes.BIGINT(),
)
def add(i, j):
    return i + j
```

But I still get the same error. On Tue, May 18, 2021 at 5:47 PM Dian Fu wrote: > Hi Yik San, > > The expected input types for

Re: Re: Chinese characters are garbled when Flink SQL writes to MySQL

2021-05-18 Thread casel.chen
My URL connection string already uses useUnicode=true&characterEncoding=UTF-8, but the output is still garbled. On 2021-05-18 17:21:12, "王炳焱" <15307491...@163.com> wrote: > When you define the MySQL table in Flink SQL, configure url=jdbc:mysql://127.0.0.1:3306/database?useUnicode=true&characterEncoding=UTF-8, like this: CREATE > TABLE jdbc_sink(id INT COMMENT '订单id',

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-18 Thread Ingo Bürk
Hi Jin, 1) As far as I know the order is only guaranteed for events from the same partition. If you want events across partitions to remain in order you may need to use parallelism 1. I'll attach some links here which might be useful:
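A hedged PyFlink sketch of the parallelism-1 suggestion; the topic, group id, and broker address are placeholders, and it assumes a PyFlink version (roughly 1.12/1.13) that still ships FlinkKafkaConsumer. Forcing parallelism 1 trades throughput for keeping records from flowing through more than one subtask:

```python
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer

env = StreamExecutionEnvironment.get_execution_environment()
# Parallelism 1 for the whole job: a single subtask reads all partitions, so
# records are never interleaved across parallel subtasks.
env.set_parallelism(1)

consumer = FlinkKafkaConsumer(
    "events",                      # hypothetical topic
    SimpleStringSchema(),
    {"bootstrap.servers": "localhost:9092", "group.id": "ordered-reader"},
)

env.add_source(consumer).print()
env.execute("ordered-kafka-read")
```

Note that even with a single subtask, Kafka itself only guarantees order within a partition; the snippet only removes reordering introduced by parallel processing, as the reply describes.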

(no subject)

2021-05-18 Thread maozhaolin
Why does submitting a Flink job in YARN mode always report that the following does not exist? org.apache.flink.metrics.MetricGroup.addGroup(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/flink/metrics/MetricGroup; | mao18698726900 | email: mao18698726...@163.com |

Re: Getting error in pod template

2021-05-18 Thread Yang Wang
Could you share how you are starting the Flink native K8s application with the pod template? Usually it looks like the following commands. And you need to have the Flink binary on your local machine. Please note that the pod template works with native K8s mode only, and you could not use the

Re: Re: Re: About watermark and window

2021-05-18 Thread yidan zhao
It is not a question of forwarding or not; processWatermark should directly process (fire) both windows. 曲洋 wrote on Mon, May 10, 2021 at 10:31 AM: > Right, the two windows exist at the same time: (3,1,2) and (6,5,4) are two windows, and then watermark(7) arrives, but I don't know whether this watermark triggers > both windows to compute, or only the first one, because a watermark is also a special kind of record. I looked at the source code and couldn't find whether, after one operator has run processWatermark, > the watermark is forwarded further to the other concurrently existing window so that it is triggered as well.

Re: PyFlink UDF: No match found for function signature XXX

2021-05-18 Thread Dian Fu
Hi Yik San, The expected input types for add are DataTypes.INT; however, the schema of aiinfra.mysource is: a bigint and b bigint. Regards, Dian > On May 18, 2021 at 5:38 PM, Yik San Chan wrote: > > Hi, > > I have a PyFlink script that fails to use a simple UDF. The full script can > be found below: >

PyFlink UDF: No match found for function signature XXX

2021-05-18 Thread Yik San Chan
Hi, I have a PyFlink script that fails to use a simple UDF. The full script can be found below:

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import (
    DataTypes,
    EnvironmentSettings,
    SqlDialect,
    StreamTableEnvironment,
)
from pyflink.table.udf import udf
```

Does flink mysql cdc support MySQL's JSON format?

2021-05-18 Thread 董建
Does flink mysql cdc support MySQL's JSON format?

Re: Chinese characters are garbled when Flink SQL writes to MySQL

2021-05-18 Thread 王炳焱
When you define the MySQL table in Flink SQL, configure url=jdbc:mysql://127.0.0.1:3306/database?useUnicode=true&characterEncoding=UTF-8, like this: CREATE TABLE jdbc_sink(id INT COMMENT '订单id', goods_name VARCHAR(128) COMMENT '商品名称', price DECIMAL(32,2) COMMENT '商品价格', user_name VARCHAR(64) COMMENT '用户名称') WITH (
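For reference, a hedged PyFlink reconstruction of this advice — the database coordinates and credentials are placeholders, and the '&' between the two URL parameters (apparently swallowed by the mailing-list archive) is required; the target MySQL table and column charsets also need to be utf8/utf8mb4 for Chinese characters to survive:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = TableEnvironment.create(settings)

# JDBC sink with an explicit UTF-8 connection charset; note the '&' separator
# between useUnicode and characterEncoding.
t_env.execute_sql("""
    CREATE TABLE jdbc_sink (
        id INT COMMENT '订单id',
        goods_name VARCHAR(128) COMMENT '商品名称',
        price DECIMAL(32, 2) COMMENT '商品价格',
        user_name VARCHAR(64) COMMENT '用户名称'
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://127.0.0.1:3306/database?useUnicode=true&characterEncoding=UTF-8',
        'table-name' = 'jdbc_sink',
        'username' = 'db_user',      -- placeholder
        'password' = 'db_password'   -- placeholder
    )
""")
```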

Re: Root Exception can not be shown on Web UI in Flink 1.13.0

2021-05-18 Thread Matthias Pohl
Sorry for not getting back earlier. I missed that thread. It looks like some wrong assumption on our end. Hence, Yangze and Guowei are right. I'm gonna look into the issue. Matthias On Fri, May 14, 2021 at 4:21 AM Guowei Ma wrote: > Hi, Gary > > I think it might be a bug. So would you like to

Re: SIGSEGV error

2021-05-18 Thread Joshua Fan
Hi Till, I also tried the job without gzip, and it ran into the same error. But the problem is solved now. Just as I was about to give up on it, I found the mail at http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/JVM-crash-SIGSEGV-in-ZIP-GetEntry-td17326.html. So I think maybe it

Re: Flink Python API and HADOO_CLASSPATH

2021-05-18 Thread Eduard Tudenhoefner
Hi Dian, thanks a lot for the explanation and help. Option 2) is what I needed and it works. Regards Eduard On Tue, May 18, 2021 at 6:21 AM Dian Fu wrote: > Hi, > > 1) The cause of the exception: > The dependencies added via pipeline.jars / pipeline.classpaths will be > used to construct user

Re: SIGSEGV error

2021-05-18 Thread Till Rohrmann
Hi Joshua, could you try whether the job also fails when not using the gzip format? This could help us narrow down the culprit. Moreover, you could try to run your job and Flink with Java 11 now. Cheers, Till On Tue, May 18, 2021 at 5:10 AM Joshua Fan wrote: > Hi all, > > Most of the posts

Re: How to change the record name of avro schema

2021-05-18 Thread 김영우
Arvid, I found a Jira issue related to my problem: https://issues.apache.org/jira/browse/FLINK-18096. I added a comment, and I think Seth's idea is way better than just renaming the current record name in the Avro schema. Thanks, Youngwoo On Mon, May 17, 2021 at 8:37 PM Youngwoo Kim (김영우) wrote: >