Late data acquisition

2020-04-28 Thread lec ssmi
Hi: can we get data that is later than the watermark in SQL? Best, Lec Ssmi

Re: ML/DL via Flink

2020-04-28 Thread Becket Qin
Hi Max, Thanks for the question and for sharing your findings. To be honest, I was not aware of some of the projects until I saw your list. First, to answer your questions: > (i) Has anyone used them? While I am not sure about the number of users of every listed project, Alink is definitely used by

Re: flink memory settings issue - metaspace overflow

2020-04-28 Thread 了不起的盖茨比
Thanks, I'll go take another look at the GC log. -- Original message -- From: Xintong Song

Re: Did the subscription succeed?

2020-04-28 Thread Forward Xu
It worked. You're welcome to send more questions. 了不起的盖茨比 <573693...@qq.com> wrote on Wed, Apr 29, 2020 at 12:16 PM: >

Re: flink memory settings issue - metaspace overflow

2020-04-28 Thread Xintong Song
The full GC should not have appeared only after you increased the memory; you can confirm this in the GC log. However, increasing the memory can make a single full GC take longer, which in turn can cause TM heartbeat timeouts. Likewise, the metaspace OOM may also be caused by GC slowing down. The JVM has dedicated threads for GC, and it usually starts GC once usage of heap/direct/metaspace reaches a certain threshold, before those areas are full; if GC is slower than memory allocation, an OOM can still occur. In our experience, 64G for one TM is fairly large; if it is mostly Java heap memory, you may need to tune the GC strategy specifically. Thank you~

Re: flink memory settings issue - metaspace overflow

2020-04-28 Thread 了不起的盖茨比
[preview garbled; mentions memory, full GC, the taskmanager, the job, and metaspace] ---- Original message ----: "Xintong

Did the subscription succeed?

2020-04-28 Thread 了不起的盖茨比

Re: Configuring taskmanager.memory.task.off-heap.size in Flink 1.10

2020-04-28 Thread Xintong Song
Hi Jiahui, 'taskmanager.memory.task.off-heap.size' accounts for the off-heap memory reserved for your job / operators. There are other configuration options accounting for the off-heap memory usages for other purposes, e.g., 'taskmanager.memory.framework.off-heap'. The default
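
For illustration, a minimal sketch of setting these options programmatically; in a real deployment they would normally live in flink-conf.yaml, and the '.size' suffix on the framework key is an assumption based on the 1.10 option naming:

```java
import org.apache.flink.configuration.Configuration;

public class OffHeapConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Off-heap memory reserved for user code in tasks / operators:
        conf.setString("taskmanager.memory.task.off-heap.size", "128m");
        // Off-heap memory reserved for Flink's own framework usage:
        conf.setString("taskmanager.memory.framework.off-heap.size", "128m");
        System.out.println(conf);
    }
}
```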

Re: upgrade flink from 1.9.1 to 1.10.0 on EMR

2020-04-28 Thread Yang Wang
Hi Anuj, I think the exception you came across is still because the Hadoop version is 2.4.1. I have checked the Hadoop code, and the code lines are exactly the same. For 2.8.1, I also checked the ruleParse. It should work. /** * A pattern for parsing a auth_to_local rule. */ private static final

flink sql job parallelism

2020-04-28 Thread lucas.wu
Hi all: I have recently been using Flink SQL for some computation tasks and found that after a SQL statement is parsed into an execution plan, it may produce multiple jobs, but those jobs all share the same parallelism. For now there seems to be no way to adjust the parallelism of these jobs individually. Does anyone know a way to adjust the parallelism of the jobs produced from parsed SQL?
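
There is no per-job knob for SQL-derived jobs as of 1.10, but the parallelism can at least be set job-wide. A minimal sketch, assuming the Blink planner; the 'table.exec.resource.default-parallelism' key is an assumption based on the 1.10 Blink planner options:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class SqlParallelismSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Applies uniformly to every job the parsed SQL plan produces.
        env.setParallelism(4);

        EnvironmentSettings settings =
            EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);

        // Default parallelism for the exec nodes the Blink planner generates.
        tEnv.getConfig().getConfiguration()
            .setString("table.exec.resource.default-parallelism", "4");
    }
}
```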

Re: K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-28 Thread Yang Wang
Hi Averell, Hadoop directly supports S3AFileSystem. When you deploy a Flink job on YARN, the hadoop classpath is added to the JobManager/TaskManager automatically. That means you can use the "s3a" scheme without putting "flink-s3-fs-hadoop.jar" in the plugin directory. In a K8s deployment, we

Configuring taskmanager.memory.task.off-heap.size in Flink 1.10

2020-04-28 Thread Jiahui Jiang
Hello! We are migrating our pipeline from Flink 1.8 to 1.10 in Kubernetes. On the first try, we simply copied the old 'taskmanager.heap.size' value over to 'taskmanager.memory.flink.size'. This caused the cluster to OOM. Eventually we had to allocate a small amount of memory to

Re: [subject garbled]

2020-04-28 Thread Jiacheng Jiang
java.lang.OutOfMemoryError: unable to create new native thread [preview garbled; mentions Xmx and MaxDirectMemorySize] ---- Original message ----: "LakeShen" http://apache-flink.147419.n8.nabble.com/flink-kafka-td2386.html [2]

Re: flink memory settings issue - metaspace overflow

2020-04-28 Thread Xintong Song
A metaspace OOM is usually caused by the JVM loading too many classes. The TM memory was increased from 1568m to 65536m; did the number of slots increase as well? That could cause more classes to be loaded at runtime, and with the metaspace size unchanged an OOM can be triggered. The community has already received a lot of feedback that the default metaspace size in 1.10.0 may not be reasonable; the default will be increased in 1.10.1. In the meantime, you can try raising taskmanager.memory.metaspace.size to 256m. Thank you~ Xintong Song On Tue, Apr 28, 2020 at 7:21

Flink Weekly | Weekly Community Update - 2020/04/29

2020-04-28 Thread Benchao Li
Hi all, this is the fourteenth issue of Flink Weekly, compiled by 李本超 and reviewed by 云邪. It covers recent community development progress, mailing-list Q&A, and the latest community news and recommended technical articles. DEV/Release: Dian Fu announced the release of 1.9.3 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-9-3-released-td40730.html Yu Li started the vote on 1.10.1 RC1

Re: join state TTL

2020-04-28 Thread LakeShen
Thank you for the clarification, Jark. Jark Wu wrote on Wed, Apr 29, 2020 at 10:39 AM: > If 'uu' in stream A is not updated for more than 24 hours, then it will be > cleared. (blink planner) > The state expiration strategy is "not updated for more than x time". > > Best, > Jark > > On Wed, 29 Apr 2020 at

Re: join state TTL

2020-04-28 Thread Jark Wu
If 'uu' in stream A is not updated for more than 24 hours, then it will be cleared. (blink planner) The state expiration strategy is "not updated for more than x time". Best, Jark On Wed, 29 Apr 2020 at 10:19, LakeShen wrote: > Hi Jark, > > I am a little confused about how double stream

Re: flink backpressure issue

2020-04-28 Thread 阿华田
OK, thanks. | 王志华 | a15733178...@163.com | Signature customized by NetEase Mail Master. On Apr 29, 2020 at 10:29, Junzhong Qin wrote: You can try Jsoniter: https://jsoniter.com/index.cn.html 阿华田 wrote on Wed, Apr 29, 2020 at 10:07 AM: We did track that down; the JSON parsing was the time-consuming part. The old version used gson, and we switched to fastjson, which sped up parsing quite a bit. For large volumes of JSON, fastjson still seems to be the way to go. | 王志华 | a15733178...@163.com |

Re: flink backpressure issue

2020-04-28 Thread Junzhong Qin
You can try Jsoniter: https://jsoniter.com/index.cn.html 阿华田 wrote on Wed, Apr 29, 2020 at 10:07 AM: > > We did track that down; the JSON parsing was the time-consuming part. The old version used gson, and we switched to fastjson, which sped up parsing quite a bit. For large volumes of JSON, fastjson still seems to be the way to go. > > | 王志华 | a15733178...@163.com | > > On Apr 29, 2020 at 10:02, LakeShen wrote: > Hi 阿华, > >

Re: Committing offsets to Kafka takes longer than the checkpoint interval.

2020-04-28 Thread LakeShen
Hi 首维, Which Flink version are you using, and what is your checkpoint interval set to? Please provide these two pieces of information. Best, LakeShen 刘首维 wrote on Tue, Apr 28, 2020 at 6:28 PM: > Hi all, > > Today I found a job whose log keeps printing the following warning > "Committing offsets to Kafka takes longer than the checkpoint interval. > Skipping commit of previous offsets because newer complete

Re: flink backpressure issue

2020-04-28 Thread 阿华田
We did track that down; the JSON parsing was the time-consuming part. The old version used gson, and we switched to fastjson, which sped up parsing quite a bit. For large volumes of JSON, fastjson still seems to be the way to go. | 王志华 | a15733178...@163.com | Signature customized by NetEase Mail Master. On Apr 29, 2020 at 10:02, LakeShen wrote: Hi 阿华, Data latency may come from a time-consuming step somewhere in your logic, such as querying MySQL or some particularly complex processing. Check your code for expensive logic, and add logging around the places you suspect are slow to see the processing time. Best, LakeShen 阿华田
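
For reference, a minimal sketch of the fastjson parsing path mentioned above (the payload and field names are made up):

```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;

public class FastjsonSketch {
    public static void main(String[] args) {
        // Parse a JSON string into a JSONObject and read typed fields.
        JSONObject obj = JSON.parseObject("{\"user\":\"flink\",\"count\":3}");
        System.out.println(obj.getString("user") + " -> " + obj.getIntValue("count"));
    }
}
```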

Re: flink backpressure issue

2020-04-28 Thread LakeShen
Hi 阿华, Data latency may come from a time-consuming step somewhere in your logic, such as querying MySQL or some particularly complex processing. Check your code for expensive logic, and add logging around the places you suspect are slow to see the processing time. Best, LakeShen 阿华田 wrote on Wed, Apr 29, 2020 at 9:21 AM: > OK, thanks. > > | 王志华 | a15733178...@163.com | > > On Apr 29, 2020 at 09:08, zhisheng wrote: > hi, >

Re: Blink window and cube

2020-04-28 Thread 刘建刚
Thank you. I created an issue: https://issues.apache.org/jira/browse/FLINK-17446 > On Apr 28, 2020 at 7:57 PM, Jark Wu-3 [via Apache Flink User Mailing List archive.] wrote: > > Thanks for reporting this. I think this is a missing feature. We

Re: flink backpressure issue

2020-04-28 Thread 阿华田
OK, thanks. | 王志华 | a15733178...@163.com | Signature customized by NetEase Mail Master. On Apr 29, 2020 at 09:08, zhisheng wrote: hi, Data latency does not necessarily produce backpressure. For example, take a Flink job writing to HBase with source parallelism 5 and sink parallelism 10: the downstream writes are very fast, possibly faster than Flink consumes from Kafka, so no backpressure appears. 1. Check the job's parallelism (it can be set equal to the number of Kafka partitions); 2. Is backpressure being observed through the Flink Web UI or through metrics?

Re: flink backpressure issue

2020-04-28 Thread zhisheng
hi, Data latency does not necessarily produce backpressure. For example, take a Flink job writing to HBase with source parallelism 5 and sink parallelism 10: the downstream writes are very fast, possibly faster than Flink consumes from Kafka, so no backpressure appears. 1. Check the job's parallelism (it can be set equal to the number of Kafka partitions); 2. Is backpressure being observed through the Flink Web UI or through metrics? 3. For data latency, monitor the consumer groups of the Kafka topics being consumed and add Lag alerts, so you learn about latency promptly. Best! zhisheng 阿华田

Re: Publishing Sink Task watermarks outside flink

2020-04-28 Thread Shubham Kumar
Hi Timo, Yeah, I got the idea of getting access to timers through a process function and had the same result you explained, namely that a side output doesn't guarantee that the data has been written out to the sink (so maybe Fabian in that post pointed out something else which I am missing). If I am

Re: Possible Bug

2020-04-28 Thread Robert Metzger
Hey, thanks a lot for filing a ticket! I put a link into SO. It might take a few days till there's a response on the ticket. On Tue, Apr 28, 2020 at 10:42 PM Marie May wrote: > Thanks for responding Robert. I reported the issue here: > > issues.apache.org/jira/browse/FLINK-17444 > > I do not

Possible Bug

2020-04-28 Thread Marie May
Hello, I am running into the same issue this person posted here: https://stackoverflow.com/questions/61246683/flink-streamingfilesink-on-azure-blob-storage I see no one has answered, so I thought maybe I could report it as a bug, but the site said to mail here first if unsure whether it's a bug or not

Re: RocksDB default logging configuration

2020-04-28 Thread Bajaj, Abhinav
Thanks Yun for your response. It seems creating the RocksDBStateBackend from the job requires providing the checkpoint URL, whereas the savepoint URL seems to default to "state.savepoints.dir" from flink-conf.yaml. I was expecting similar behavior to create the RocksDBStateBackend without
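
For reference, a minimal sketch of creating the backend from the job as discussed; the checkpoint URI is a required constructor argument, while the savepoint directory falls back to flink-conf.yaml (the path below is hypothetical):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The checkpoint URI must be passed explicitly; "state.savepoints.dir"
        // from flink-conf.yaml is used for savepoints.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
    }
}
```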

Presentation - Real World Architectural Patterns with Apache Pulsar and Flink

2020-04-28 Thread Devin Bost
If anyone missed my presentation on Real-World Architectural Patterns with Apache Pulsar that covers a use case involving Apache *Flink* for distributed tracing, please check out the recording here: https://youtu.be/pmaCG1SHAW8 Devin G. Bost

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Gyula Fóra
Hi Timo, Row will definitely work at this point, thank you for helping out. I opened a jira ticket: https://issues.apache.org/jira/browse/FLINK-17442 Gyula On Tue, Apr 28, 2020 at 6:48 PM Timo Walther wrote: > Hi Gyula, > > does `toAppendStream(Row.class)` work for you? The

Re: ML/DL via Flink

2020-04-28 Thread Timo Walther
Hi Max, as far as I know a better ML story for Flink is in the making. I will loop in Becket in CC who may give you more information. Regards, Timo On 28.04.20 07:20, m@xi wrote: Hello Flinkers, I am building a *streaming* prototype system on top of Flink and I want ideally to enable ML

Re: Publishing Sink Task watermarks outside flink

2020-04-28 Thread Timo Walther
Hi Shubham, you can call stream.process(...). The context of ProcessFunction gives you access to the TimerService, which lets you access the current watermark. I'm assuming you are using the Table API? As far as I remember, watermarks are travelling through the stream even if there is no
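
A minimal sketch of the approach described above, assuming a generic event type; it tags each record with the operator's current watermark:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Emits (event, currentWatermark) pairs; the watermark is read from the
// TimerService exposed by the ProcessFunction context.
public class WatermarkTagger<T> extends ProcessFunction<T, Tuple2<T, Long>> {
    @Override
    public void processElement(T value, Context ctx, Collector<Tuple2<T, Long>> out) {
        out.collect(Tuple2.of(value, ctx.timerService().currentWatermark()));
    }
}
```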

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Timo Walther
Hi Gyula, does `toAppendStream(Row.class)` work for you? The other methods take TypeInformation and might cause this problem. It is definitely a bug. Feel free to open an issue under: https://issues.apache.org/jira/browse/FLINK-12251 Regards, Timo On 28.04.20 18:44, Gyula Fóra wrote: Hi
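
A minimal sketch of the suggested workaround, assuming the table's first column is ARRAY<STRING> (method and variable names are illustrative):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RowConversionSketch {
    // Convert via Row to bypass TypeInformation extraction for array columns,
    // then pull the typed array out of the Row manually.
    public static DataStream<String[]> firstColumn(StreamTableEnvironment tEnv, Table table) {
        DataStream<Row> rows = tEnv.toAppendStream(table, Row.class);
        return rows.map(r -> (String[]) r.getField(0))
                   .returns(Types.OBJECT_ARRAY(Types.STRING));
    }
}
```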

Re: "Fill in" notification messages based on event time watermark

2020-04-28 Thread Timo Walther
Hi Manas, Reg. 1: I would recommend using a debugger in your IDE and checking which watermarks are travelling through your operators. Reg. 2: All event-time operations are performed only once the watermark has arrived from all parallel instances. So roughly speaking, in machine time you can

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Gyula Fóra
Hi Timo, I am trying to convert simply back to a DataStream. Let's say: DataStream<Tuple2<String[], Integer[]>> I can convert the DataStream into a table without a problem, the problem is getting a DataStream back. Thanks Gyula On Tue, Apr 28, 2020 at 6:32 PM Timo Walther wrote: > Hi Gyula, > > are you coming from

Re: Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Timo Walther
Hi Gyula, are you coming from DataStream API or are you trying to implement a source/sink? It looks like the array is currently serialized with Kryo. I would recommend to take a look at this class: org.apache.flink.table.types.utils.LegacyTypeInfoDataTypeConverter This is the current

Converting String/boxed-primitive Array columns back to DataStream

2020-04-28 Thread Gyula Fóra
Hi All! I have a Table with columns of ARRAY<STRING> and ARRAY<INT>, is there any way to convert it back to the respective java arrays? String[] and Integer[] It only seems to work for primitive types (non null), date, time and decimal. For String for instance I get the following error: Query schema: [f0:

Re: "Fill in" notification messages based on event time watermark

2020-04-28 Thread Manas Kale
Hi David and Piotrek, Thank you both for your inputs. I tried an implementation with the algorithm Piotrek suggested and David's example. Although notifications are being generated with the watermark, subsequent transition events are being received after the watermark has crossed their timestamps.

Re: Blink window and cube

2020-04-28 Thread Jark Wu
Thanks for reporting this. I think this is a missing feature. We need to do something in the optimizer to make this possible. Could you please help to create a JIRA issue for this? Best, Jark On Tue, 28 Apr 2020 at 14:55, 刘建刚 wrote: > Hi, I find that blink planner supports CUBE. CUBE can

Re: join state TTL

2020-04-28 Thread Jark Wu
Hi Lec, StateTtlConfig in DataStream API is a configuration on specific state, not a job level configuration. TableConfig#setIdleStateRetentionTime in TableAPI is a job level configuration which will enable state ttl for all non-time-based operator states. In blink planner, the underlying of
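
A minimal sketch of the two levels of configuration described above (the retention times are illustrative):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.TableConfig;

public class StateTtlSketch {
    public static void configure(TableConfig tableConfig) {
        // Job-level idle state retention for the Table API (min, max):
        tableConfig.setIdleStateRetentionTime(Time.hours(24), Time.hours(48));

        // Per-state TTL in the DataStream API ("not updated for more than x time"):
        StateTtlConfig ttl = StateTtlConfig.newBuilder(Time.hours(24)).build();
        ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>("uu-state", Long.class);
        descriptor.enableTimeToLive(ttl);
    }
}
```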

flink memory settings issue - metaspace overflow

2020-04-28 Thread [name garbled]
[garbled]: 124G; taskmanager.memory.process.size: 65536m, es5-connector sink, [garbled] full gc, java.lang.OutOfMemoryError: Metaspace; taskmanager.memory.process.size: 1568m, es5-connector sink

Re: Flink Forward 2020 Recorded Sessions

2020-04-28 Thread Sivaprasanna
Awesome, thanks for the update! On Tue, Apr 28, 2020 at 3:43 PM Marta Paes Moreira wrote: > Hi again, > > You can find the first wave of recordings on Youtube already [1]. The > remainder will come over the course of the next few weeks. > > [1] >

Publishing Sink Task watermarks outside flink

2020-04-28 Thread Shubham Kumar
Hi everyone, I have a flink application with kafka sources which calculates some stats based on them and pushes the results to JDBC. Now, I want to know up to what timestamp the data has been completely pushed to JDBC (i.e. no more data with a timestamp smaller than or equal to this will be pushed). There doesn't seem

Committing offsets to Kafka takes longer than the checkpoint interval.

2020-04-28 Thread 刘首维
Hi all, Today I found a job whose log keeps printing the following warning: "Committing offsets to Kafka takes longer than the checkpoint interval. Skipping commit of previous offsets because newer complete checkpoint offsets are available. This does not compromise Flink's checkpoint integrity." This caused the job to get stuck and stop consuming the Kafka topic.

Re: Flink Forward 2020 Recorded Sessions

2020-04-28 Thread Marta Paes Moreira
Hi again, You can find the first wave of recordings on Youtube already [1]. The remainder will come over the course of the next few weeks. [1] https://www.youtube.com/playlist?list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7 On Fri, Apr 24, 2020 at 3:23 PM Sivaprasanna wrote: > Cool. Thanks for the

flink dataset: concatenating grouped column values

2020-04-28 Thread hery168
col1 col2 pid
1.0 2.0 1
2.0 2.0 1
1.0 2.0 1
3.0 2.0 1
1.0 2.0 1
1.0 2.0 2
1.0 2.0 2
1.0 2.0 2
1.0 2.0 2
1.0 2.0 2
Could anyone tell me how, using a Flink DataSet, to group by the pid column and then concatenate the contents of the col1 column within each group, e.g. 1.0#2.0#1.0#3.0? How can this be implemented? (See the sketch below.)
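
One possible implementation with the DataSet API, assuming Tuple3<Double, Double, Integer> rows for (col1, col2, pid); a sketch, not the only way:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.util.Collector;

public class GroupConcatSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple3<Double, Double, Integer>> rows = env.fromElements(
            Tuple3.of(1.0, 2.0, 1), Tuple3.of(2.0, 2.0, 1),
            Tuple3.of(1.0, 2.0, 1), Tuple3.of(3.0, 2.0, 1));

        // Group by pid (field 2) and join the col1 values (field 0) with '#'.
        DataSet<Tuple2<Integer, String>> concatenated = rows
            .groupBy(2)
            .reduceGroup((Iterable<Tuple3<Double, Double, Integer>> values,
                          Collector<Tuple2<Integer, String>> out) -> {
                Integer pid = null;
                StringBuilder sb = new StringBuilder();
                for (Tuple3<Double, Double, Integer> t : values) {
                    pid = t.f2;
                    if (sb.length() > 0) sb.append('#');
                    sb.append(t.f0);
                }
                out.collect(Tuple2.of(pid, sb.toString()));
            })
            .returns(Types.TUPLE(Types.INT, Types.STRING));

        concatenated.print(); // e.g. (1,1.0#2.0#1.0#3.0)
    }
}
```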

Re: Issue

2020-04-28 Thread Till Rohrmann
Hi Pavan, please post these kinds of questions to the user ML. I've cross-linked it now. Image attachments will be filtered out; consequently, we cannot see what you have posted. Moreover, it would be good if you could provide the community with a few more details about what the custom way is and what

Re: Re: Re: Issue with FlinkSQL Upsert/Retraction writes to MySQL

2020-04-28 Thread 1101300123
A result-correctness problem? I don't quite understand. For example, my results are (true,(Mary,1)) (true,(Bob,1)) (false,(Mary,1)) (true,(Mary,2)) (true,(Liz,1)) (false,(Bob,1)) (true,(Bob,2)) If I only do upserts there seems to be no problem, and handling it as a retract stream is fine too; actually I still don't quite understand upsert streams. On Apr 28, 2020 at 14:34, Jark Wu wrote: It can improve efficiency to some extent, but it may cause result-correctness problems. Best, Jark On Tue, 28 Apr 2020 at 14:16, 1101300123

Re: Issue with FlinkSQL Upsert/Retraction writes to MySQL

2020-04-28 Thread 1101300123
If my sink is MySQL with a primary-key index, can I assume that the handling logic for retract and upsert is the same? The upstream records flagged false are invalidated records, so deleting the invalidated data or updating it makes no difference. Actually I still have some doubts about retract streams vs. upsert streams. https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/dev/table/streaming/dynamic_tables.html Append-only stream: A dynamic table that is only modified by

Re: Blink SQL java.lang.ArrayIndexOutOfBoundsException

2020-04-28 Thread 刘建刚
Thank you very much. It solved my problem. > On Apr 22, 2020 at 5:15 PM, Jingsong Li [via Apache Flink User Mailing List archive.] > wrote: > > Hi, > > Sorry for the mistake, [1] is related, but this bug has been fixed totally in > [2], so the safe versions should be 1.9.3+ and 1.10.1+, so there is

Blink window and cube

2020-04-28 Thread 刘建刚
Hi, I find that the blink planner supports CUBE. CUBE can be used together with a field but not with a window. For example, the following SQL is not supported: SELECT A, B, sum(C) FROM person GROUP BY cube(A, B), TUMBLE(curTimestamp, interval '1' minute) The following error is reported. Is

join state TTL

2020-04-28 Thread lec ssmi
Hi: When a stream is joined with another stream, the cached stream data is saved as state and deleted as the watermark advances. I found that there is also a parameter that can set the state expiration time, such as StateTtlConfig in the DataStream API and TableConfig in the Table API. This

Re: Re: Re: Issue with FlinkSQL Upsert/Retraction writes to MySQL

2020-04-28 Thread Jark Wu
It can improve efficiency to some extent, but it may cause result-correctness problems. Best, Jark On Tue, 28 Apr 2020 at 14:16, 1101300123 wrote: > I know; what I mean is that since I implement the upsert myself, can I simply drop the false-flagged records without processing them, and would that improve efficiency somewhat? > > On Apr 28, 2020 at 14:11, Jark Wu wrote: > An UpsertSink can still receive delete messages, so as you can see the input of an UpsertSink is Tuple2, where false represents a delete and true represents an upsert message. > > Best, >

Re: Re: Re: Issue with FlinkSQL Upsert/Retraction writes to MySQL

2020-04-28 Thread 1101300123
I know; what I mean is that since I implement the upsert myself, can I simply drop the false-flagged records without processing them, and would that improve efficiency somewhat? On Apr 28, 2020 at 14:11, Jark Wu wrote: An UpsertSink can still receive delete messages, so as you can see the input of an UpsertSink is Tuple2, where false represents a delete and true represents an upsert message. Best, Jark On Tue, 28 Apr 2020 at 14:05, 1101300123 wrote: The source code says: /** * Get dialect upsert statement,

Re: Re: Re: Issue with FlinkSQL Upsert/Retraction writes to MySQL

2020-04-28 Thread Jark Wu
An UpsertSink can still receive delete messages, so as you can see the input of an UpsertSink is Tuple2, where false represents a delete and true represents an upsert message. Best, Jark On Tue, 28 Apr 2020 at 14:05, 1101300123 wrote: > The source code says: > /** > * Get dialect upsert statement, the database has its own upsert syntax, > such as Mysql > * using DUPLICATE KEY
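
For illustration, a minimal sketch of how a sink consuming such a Tuple2 stream might branch; the class and the SQL named in the comments are illustrative, not Flink's built-in JDBC upsert sink:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.types.Row;

// Sketch only: true -> upsert (e.g. INSERT ... ON DUPLICATE KEY UPDATE in MySQL),
// false -> delete by key. Simply dropping false records can leave stale rows behind.
public class UpsertSinkSketch extends RichSinkFunction<Tuple2<Boolean, Row>> {
    @Override
    public void invoke(Tuple2<Boolean, Row> value, Context context) {
        if (value.f0) {
            System.out.println("UPSERT " + value.f1);
        } else {
            System.out.println("DELETE " + value.f1);
        }
    }
}
```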

Re: Re: Re: Issue with FlinkSQL Upsert/Retraction writes to MySQL

2020-04-28 Thread 1101300123
The source code says: /** * Get dialect upsert statement, the database has its own upsert syntax, such as Mysql * using DUPLICATE KEY UPDATE, and PostgresSQL using ON CONFLICT... DO UPDATE SET.. * * @return None if dialect does not support upsert statement, the writer will degrade to * the use of