Re: flink checkpoint savepoint问题

2020-04-21 Thread Yun Tang
Hi 原因是因为新增字段或者修改字段类型后,新的serializer无法(反)序列化原先存储的数据,对于这种有字段增改需求的场景,目前Flink社区主要借助于Pojo或者avro来实现 [1],建议对相关的state schema做重新规划,以满足这种有后续升级需求的场景。 [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/schema_evolution.html 祝好 唐云 From: xyq Sent:

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-21 Thread Yun Tang
rs/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L115 [3] https://github.com/apache/flink/blob/99cbaa929ff9f2f5c387cbf4f76a0166f83a3a8c/flink-runtime/src/main/java/org/apache/flink/runtime/state/PartitionableL

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-20 Thread Yun Tang
ts.html#allowing-non-restored-state Best Yun Tang From: Oleg Vysotsky Sent: Tuesday, April 21, 2020 6:45 To: Jacob Sevart ; Timo Walther ; user@flink.apache.org Cc: Long Nguyen ; Gurpreet Singh Subject: Re: Checkpoints for kafka source sometimes get 55 GB si

Re: flink-1.10 checkpoint 偶尔报 NullPointerException

2020-04-20 Thread Yun Tang
Hi 这个NPE有点奇怪,从executeCheckpointing方法[1]里面其实比较难定位究竟是哪一个变量或者变量的取值是null。 一种排查思路是打开 org.apache.flink.streaming.runtime.tasks 的DEBUG level日志,通过debug日志缩小范围,判断哪个变量是null 这个异常出现的时候,相关task上面的日志有什么异常么,触发这个NPE的条件是什么,稳定复现么? [1]

Re: Flink incremental checkpointing - how long does data is kept in the share folder

2020-04-20 Thread Yun Tang
say some files are not mentioned in the metadata file but are related to the checkpoint? How to judge that they are related to specific checkpoint? BTW, my name is "Yun" which means cloud in Chinese, not the delicious "Yum"  Best Yun Tang F

Re: Modelling time for complex events generated out of simple ones

2020-04-19 Thread Yun Tang
when generating them. I think there is no absolute rules and all depends on your actual scenarios. Best Yun Tang From: Salva Alcántara Sent: Monday, April 20, 2020 2:03 To: user@flink.apache.org Subject: Modelling time for complex events generated out of simple

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-18 Thread Yun Tang
" source? Best Yun Tang From: Jacob Sevart Sent: Saturday, April 18, 2020 9:22 To: Oleg Vysotsky Cc: Timo Walther ; user@flink.apache.org ; Long Nguyen Subject: Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fa

Re: Checkpoint Space Usage Debugging

2020-04-17 Thread Yun Tang
Hi Kent You can view checkpoint details via web UI to know how much checkpointed data uploaded for each operator, and you can compare the state size as time goes on to see whether they upload checkpointed data in stable range. Best Yun Tang From: Kent Murra

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Yun Tang
/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/BufferingDoFnRunner.java#L95 Best Yun Tang From: Stephen Patel Sent: Thursday, April 16, 2020 22:30 To: Yun Tang Cc: user@flink.apache.org Subject

Re: 请问有没有什么方法可以把checkpoint打到集群外的hdfs?

2020-04-16 Thread Yun Tang
Hi 如果旧作业开启了incremental checkpoint,并从那边进行恢复的话,需要注意的是旧的checkpoint目录下的文件是不能删除的,这个是incremental checkpoint语义导致的,如果想要切割掉对旧目录的依赖,需要执行一次savepoint,并启动新作业从savepoint进行恢复。 祝好 唐云 From: zhisheng Sent: Thursday, April 16, 2020 16:37 To: user-zh Subject: Re:

Re: Streaming Job eventually begins failing during checkpointing

2020-04-15 Thread Yun Tang
().getBroadcastState ? Did you pass a different operator state descriptor each time? Best Yun Tang From: Stephen Patel Sent: Thursday, April 16, 2020 2:09 To: user@flink.apache.org Subject: Streaming Job eventually begins failing during checkpointing I've got a flink (1.8.0

Re: [PROPOSAL] Contribute training materials to Apache Flink

2020-04-15 Thread Yun Tang
could be run well as expected. Best Yun Tang From: Till Rohrmann Sent: Wednesday, April 15, 2020 16:08 To: dev Cc: Eduardo Winpenny Tejedor ; Seth Wiesman ; Niels Basjes ; user Subject: Re: [PROPOSAL] Contribute training materials to Apache Flink Hi David

Re: Quick survey on checkpointing performance

2020-04-15 Thread Yun Tang
async time would not be too large, the most common reason is operator receiving the barrier late which lead to the end-to-end duration large. I hope you could offer the UI of your checkpoint details for further investigation. [1] https://issues.apache.org/jira/browse/FLINK-13390 Best Yun Tang

Re: Flink incremental checkpointing - how long does data is kept in the share folder

2020-04-13 Thread Yun Tang
'_metadata' to know which files belonging to that checkpoint. [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#directory-structure Best Yun Tang From: Shachar Carmeli Sent: Sunday, April 12, 2020 15:32 To: user

Re: FlinkRuntimeException: Unexpected list element deserialization failure

2020-04-09 Thread Yun Tang
org/apache/flink/contrib/streaming/state/RocksDBListState.java#L151 Best Yun Tang From: anaray Sent: Friday, April 10, 2020 1:25 To: user@flink.apache.org Subject: FlinkRuntimeException: Unexpected list element deserialization failure Hi flink team, I see below

Re: Possible memory leak in JobManager (Flink 1.10.0)?

2020-04-09 Thread Yun Tang
would occupy about 585MB memory, which is close to your observed scenario. >From my point of view, the checkpoint interval of one second is really too >often and would not make much sense in production environment. Best Yun Tang From: Till Rohrmann Sent: Th

Re: Possible memory leak in JobManager (Flink 1.10.0)?

2020-04-08 Thread Yun Tang
/ClusterEntrypoint.java#L260 [2] https://github.com/apache/flink/blob/d7e247209358779b6485062b69965b83043fb59d/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L234 Best Yun Tang From: Marc LEGER Sent: Wednesday

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-08 Thread Yun Tang
Excited to see the stateful functions release! Thanks for the great work of manager Gordon and everyone who ever contributed to this. Best Yun Tang From: Till Rohrmann Sent: Wednesday, April 8, 2020 14:30 To: dev Cc: Oytun Tez ; user Subject: Re: [ANNOUNCE

Re: Using MapState clear, put methods in snapshotState within KeyedCoProcessFunction, valid or not?

2020-04-07 Thread Yun Tang
e "models" is just a HashMap[(String, String), Model], and I don't know why we need to couple all models to just one specific key. Best Yun Tang From: Salva Alcántara Sent: Sunday, April 5, 2020 20:22 To: user@flink.apache.org Subject: Re: Using MapStat

Re: New kafka producer on each checkpoint

2020-04-07 Thread Yun Tang
/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L871 Best Yun Tang From: Maxim Parkachov Sent: Monday, April 6, 2020 23:16 To: user@flink.apache.org Subject: New kafka producer on each checkpoint Hi everyone, I'm trying to test exactly once functionality

Re: Flink incremental checkpointing - how long does data is kept in the share folder

2020-04-07 Thread Yun Tang
/src/main/java/org/apache/flink/runtime/checkpoint/Checkpoints.java#L96 Best Yun Tang From: Shachar Carmeli Sent: Tuesday, April 7, 2020 16:19 To: user@flink.apache.org Subject: Flink incremental checkpointing - how long does data is kept in the share folder We

Re: 关于使用RocksDBStateBackend TTL 配置的问题

2020-04-03 Thread Yun Tang
Hi 只是配置state.backend.rocksdb.ttl.compaction.filter.enabled 还需要相关的state descriptor也配置上state ttl config,不确定这里所谓的“不理想”的效果是没有及时删除,还是彻底没有删除? 目前RocksDB的后台清理确实需要依赖于compaction的执行,换言之,如果有部分数据一直没有进入compaction,确实存在理论上的可能性一直不会因为过期而删除,但是这个可能性很低不应该对你的使用体验带来很大的影响。 现在的这两种策略是更新时间戳的策略,只要不再访问,到时间都会因为TTL而自动后台清除的。

Re: Latency tracking together with broadcast state can cause job failure

2020-04-01 Thread Yun Tang
Hi Lasse Never meet this problem before, but can you share some exception stack trace so that we could take a look. The simple project to reproduce is also a good choice. Best Yun Tang From: Lasse Nedergaard Sent: Tuesday, March 31, 2020 19:10 To: user

Re: 回复: ProcessWindowFunction中如何有效清除state呢

2020-03-31 Thread Yun Tang
) + c_st),是和这个能有关系吗 --原始邮件------ 发件人:"Yun Tang"

Re: ProcessWindowFunction中如何有效清除state呢

2020-03-31 Thread Yun Tang
Hi 从代码看 if(stateDate.equals("") || stateDate.equals(date)) 无法判断究竟是从哪里获取到stateDate变量赋值,不清楚你这里里面的判断逻辑是否能生效。 其次,state.clear() 之后,再次获取时,返回值会是null,代码片段里面也没有看出来哪里有对这个的校验。 祝好 唐云 From: 守护 <346531...@qq.com> Sent: Tuesday, March 31, 2020 12:33 To: user-zh Subject:

Re: Re:Re: flink savepoint问题

2020-03-30 Thread Yun Tang
Hi 首先,如果这个问题很容易复现的话,我们需要定位到是什么导致了OOMkilled。 1. 打开block-cache usage [1] 观察metrics中block cache的使用量。 2. 麻烦回答一下几个问题,有助于进一步定位 * 单个TM有几个slot * 单个TM的managed memory配置了多少 * 一共声明了多少个keyed state,(如果使用了window,也相当于会使用一个state),其中有多少个map state,是否经常遍历那个map state *

Re: Log file environment variable 'log.file' is not set.

2020-03-29 Thread Yun Tang
understand hdfs paths. [1] https://github.com/apache/flink/blob/ae3b0ff80b93a83a358ab474060473863d2c30d6/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/BootstrapTools.java#L420 Best Yun Tang From: Vitaliy Semochkin Sent: Sunday, March 29

Re: [Third-party Tool] Flink memory calculator

2020-03-29 Thread Yun Tang
Very interesting and convenient tool, just a quick question: could this tool also handle deployment cluster commands like "-tm" mixed with configuration in `flink-conf.yaml` ? Best Yun Tang From: Yangze Guo Sent: Friday, March 27, 2020 18:00 To: user

Re: [Third-party Tool] Flink memory calculator

2020-03-29 Thread Yun Tang
Very interesting and convenient tool, just a quick question: could this tool also handle deployment cluster commands like "-tm" mixed with configuration in `flink-conf.yaml` ? Best Yun Tang From: Yangze Guo Sent: Friday, March 27, 2020 18:00 To: u

Re: savepoint - checkpoint - directory

2020-03-26 Thread Yun Tang
Hi Fanbin To resume from checkpoint, you should provide at least the directory named as /path/chk-x or /path/chk-x/_metadata. The sub-dir named as “shared” is used to store incremental checkpoint content. You could refer to [1] for more information. BTW, stop with savepoint could help reduce

Re: Question about RocksDBStateBackend Compaction Filter state cleanup

2020-03-17 Thread Yun Tang
Yun Tang From: LakeShen Sent: Tuesday, March 17, 2020 15:30 To: dev ; user-zh ; user Subject: Question about RocksDBStateBackend Compaction Filter state cleanup Hi community , I see the flink RocksDBStateBackend state cleanup,now the code like

Re: Question about RocksDBStateBackend Compaction Filter state cleanup

2020-03-17 Thread Yun Tang
Yun Tang From: LakeShen Sent: Tuesday, March 17, 2020 15:30 To: dev ; user-zh ; user Subject: Question about RocksDBStateBackend Compaction Filter state cleanup Hi community , I see the flink RocksDBStateBackend state cleanup,now the code like

Re: Understanding n LIST calls as part of checkpointing

2020-03-08 Thread Yun Tang
lls should not come from Flink if you're using Flink-1.5+ [1] https://issues.apache.org/jira/browse/FLINK-8540 Best Yun Tang From: Piyush Narang Sent: Saturday, March 7, 2020 6:15 To: user Subject: Understanding n LIST calls as part of checkpointing Hi folks,

Re: 关于在读和写频率都很高的情况下怎么优化rocksDB

2020-02-26 Thread Yun Tang
Hi 你的单节点rocksDB state size多大呢?(可以通过打开相关metrics [1] 或者登录到RocksDB所在机器观察一下RocksDB目录的size) 造成反压是如何确定一定是rocksDB 状态大导致的呢?看你的IO情况绝对值很大,但是百分比倒不是很高。是否用jstack观察过TM的进程,看一下是不是task主线程很容易打在RocksDB的get等读操作上。 RocksDB本质上还是面向磁盘的kv存储,如果是每次读写都更新的话,block

Re: The mapping relationship between Checkpoint subtask id and Task subtask id

2020-02-13 Thread Yun Tang
Hi Yes, you are right. Just simply use checkpoint subtask_id -1 would find the corresponding task subtask_id. Best Yun Tang From: 杨东晓 Sent: Friday, February 14, 2020 10:11 To: user Subject: The mapping relationship between Checkpoint subtask id and Task

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Yun Tang
Great work, thanks Gary & Yu ! Best Yun Tang From: Wyatt Chun Sent: Wednesday, February 12, 2020 21:36 To: Yu Li Cc: user Subject: Re: [ANNOUNCE] Apache Flink 1.10.0 released Sounds great. Congrats & Thanks! On Wed, Feb 12, 2020 at 9:31 PM Yu Li ma

Re: Difference between JobManager and JobMaster

2020-01-30 Thread Yun Tang
/jobmaster/JobMaster.java [3] https://github.com/apache/flink/blob/release-1.7/flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala [4] https://issues.apache.org/jira/browse/FLINK-4319 Best Yun Tang From: Lu Weizheng Sent: Friday

Re: Task-manager kubernetes pods take a long time to terminate

2020-01-30 Thread Yun Tang
://github.com/apache/flink/blob/7e1a0f446e018681cb537dd936ae54388b5a7523/flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java#L158 Best Yun Tang From: Li Peng Sent: Thursday, January 30, 2020 9:24 To: user Subject: Task-manager kubernetes

Re: Flink RocksDB logs filling up disk space

2020-01-28 Thread Yun Tang
://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-offset-committing-behaviour-configuration Best Yun Tang From: Ahmad Hassan Sent: Tuesday, January 28, 2020 17:43 To: user Subject: Re: Flink RocksDB logs filling up disk space

Re: Blocking KeyedCoProcessFunction.processElement1

2020-01-27 Thread Yun Tang
Hi Alexey If possible, I think you could move some RDBMS maintenance operations to the #open method of RichFunction to reduce the possibility of blocking processing records. Best Yun Tang From: Alexey Trenikhun Sent: Tuesday, January 28, 2020 15:15 To: Yun

Re: Flink RocksDB logs filling up disk space

2020-01-27 Thread Yun Tang
of checkpoint, even you could avoid to record too many logs, and I don't think current checkpoint configuration is appropriate. Best Yun Tang From: Ahmad Hassan Sent: Monday, January 27, 2020 20:22 To: user Subject: Re: Flink RocksDB logs filling up disk space

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-27 Thread Yun Tang
che/flink/streaming/api/operators/co/CoBroadcastWithKeyedOperator.java#L101 Best Yun Tang From: Jin Yi Sent: Monday, January 27, 2020 14:50 To: Yun Tang Cc: user ; user-zh@flink.apache.org Subject: Re: [State Processor API] how to convert savepoint back to broadcast

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-27 Thread Yun Tang
che/flink/streaming/api/operators/co/CoBroadcastWithKeyedOperator.java#L101 Best Yun Tang From: Jin Yi Sent: Monday, January 27, 2020 14:50 To: Yun Tang Cc: user ; user...@flink.apache.org Subject: Re: [State Processor API] how to convert savepoint back to broadcast

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-26 Thread Yun Tang
Hi Yi Can the official doc of writing broad cast state [1] satisfies your request? [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html#broadcast-state-1 Best Yun Tang From: Jin Yi Sent: Thursday, January 23, 2020 8:12

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-26 Thread Yun Tang
Hi Yi Can the official doc of writing broad cast state [1] satisfies your request? [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html#broadcast-state-1 Best Yun Tang From: Jin Yi Sent: Thursday, January 23, 2020 8:12

Re: blink(基于flink1.5.1版本)可以使用两个hadoop集群吗?

2020-01-26 Thread Yun Tang
Hi Yong 首先,这封邮件就不要抄送开发者邮件列表了,中文的邮件只需要发中文邮件列表。 Flink当然可以用两个YARN集群,关键在于Flink提交作业到YARN的时候,读取的HADDOP配置是什么,其实官方文档[1] 有相关的介绍,主要是 YARN_CONF_DIR, HADOOP_CONF_DIR or HADOOP_CONF_PATH 这些环境变量的配置是什么,在你提交的终端内配置一个你搭建的集群环境变量即可。 [1]

Re: Blocking KeyedCoProcessFunction.processElement1

2020-01-26 Thread Yun Tang
because your processing logic has some problem to stuck. On the other hand, since processing checkpoint and records hold the same lock, we cannot process checkpoint when the record processing logic did not release the lock. Best Yun Tang From: Alexey Trenikhun Sent

Re: Custom Metrics outside RichFunctions

2020-01-21 Thread Yun Tang
Hi David FlinkKafkaConsumer in itself is RichParallelSourceFunction, and you could call function below to register your metrics group: getRuntimeContext().getMetricGroup().addGroup("MyMetrics").counter("myCounter") Best Yun Tang From: D

Re: Influxdb reporter not honouring the metrics scope

2020-01-21 Thread Yun Tang
, there exists no good solution for Flink-1.9 currently. [1] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#reporter Best Yun Tang From: Gaurav Singhania Sent: Monday, January 20, 2020 13:04 To: user@flink.apache.org Subject

Re: Flink 1.6, increment Checkpoint, the shared dir stored the last year checkpoint state

2020-01-19 Thread Yun Tang
be removed by checkpoint coordinator but takes too long to complete before job shut down. 3. This file is still useful. This is possible in theory because some specific rocksDB sst file might not be selected during compactions for a long time. Best Yun Tang From

Re: Flink 1.6, increment Checkpoint, the shared dir stored the last year checkpoint state

2020-01-19 Thread Yun Tang
be removed by checkpoint coordinator but takes too long to complete before job shut down. 3. This file is still useful. This is possible in theory because some specific rocksDB sst file might not be selected during compactions for a long time. Best Yun Tang From

Re: Flink 增量 Checkpoint ,容错恢复后,随着时间的推移,之前的 Checkpoint 状态没有清理

2020-01-19 Thread Yun Tang
hdfs:xxx/taskowned > 如果有什么理解错误,请指出,非常感谢。 > > 祝好, > 沈磊 > > Yun Tang 于2020年1月19日周日 下午4:11写道: > >> Hi >> >> 目前Flink的checkpoint目录都会有一个job-id 的子目录,所有chk- 和 shared 目录都在该目录下存储。 >> 如果没有开启增量checkpoint::在确保当前作业的checkpoint有最新完成的情况下,直接删除掉其他job-id的子目录即可。 >> 如

Re: Flink 增量 Checkpoint ,容错恢复后,随着时间的推移,之前的 Checkpoint 状态没有清理

2020-01-18 Thread Yun Tang
Hi 如果你从chk-94040 进行checkpoint恢复的话,这个checkpoint是不会被删除清理的,这个行为是by design的。原因是因为从checkpoint resume在行为上被认为从Savepoint resume行为是一致的,也复用了一套代码 [1],Savepoint的生命周期由用户把控,Flink框架自行不会去删除。 因此,加载的checkpoint被赋予了savepoint的property [2]。 这个CheckpointProperties#SAVEPOINT 里面的 discardSubsumed

Re: 回复: 怎么不使用checkpoint

2020-01-16 Thread Yun Tang
Hi 默认情况下,不调用env.enableCheckpoint,也就是不会启用checkpoint的。默认情况下的restart strategy就是NoRestart,也就是不会自动failover的。 祝好 唐云 获取 Outlook for Android From: sun <1392427...@qq.com> Sent: Friday, January 17, 2020 1:43:53 PM To: user-zh Subject: 回复:

Re: 从checkpoint恢复任务失败

2020-01-16 Thread Yun Tang
Hi 确定每次恢复的时候没有其他异常么,之前有用户遇到是因为其他异常,触发cancel task的逻辑,导致清理了本地下载的文件,所以在进行硬链的时候会遇到no such file的异常。 祝好 唐云 From: claylin <1012539...@qq.com> Sent: Thursday, January 16, 2020 22:00 To: user-zh Subject: 从checkpoint恢复任务失败

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-16 Thread Yun Tang
Hi Stephan, I am +1 for the change which stores timers in RocksDB by default. Some users hope the checkpoint could be completed as fast as possible, which also need the timer stored in RocksDB to not affect the sync part of checkpoint. Best Yun Tang From

Re: Flink Batch mode checkpointing

2020-01-16 Thread Yun Tang
://flink.apache.org/roadmap.html#batch-and-streaming-unification Best Yun Tang From: Soheil Pourbafrani Sent: Friday, January 17, 2020 1:46 To: user Subject: Flink Batch mode checkpointing Hi, While in Streaming mode I'm using the Flink checkpointing and restart strategy, I

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Yun Tang
Congratulations, Dian! Best Yun Tang From: Benchao Li Sent: Thursday, January 16, 2020 22:27 To: Congxian Qiu Cc: d...@flink.apache.org ; Jingsong Li ; jincheng sun ; Shuo Cheng ; Xingbo Huang ; Wei Zhong ; Hequn Cheng ; Leonard Xu ; Jeff Zhang ; user

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Yun Tang
Congratulations, Dian! Best Yun Tang From: Benchao Li Sent: Thursday, January 16, 2020 22:27 To: Congxian Qiu Cc: d...@flink.apache.org ; Jingsong Li ; jincheng sun ; Shuo Cheng ; Xingbo Huang ; Wei Zhong ; Hequn Cheng ; Leonard Xu ; Jeff Zhang ; user

Re: When I use flink 1.9.1 and produce data to Kafka 1.1.1, The streamTask checkpoint error .

2020-01-15 Thread Yun Tang
(tolerableDeclinedCheckpointNumber); Best Yun Tang From: jose farfan Sent: Wednesday, January 15, 2020 23:21 To: ouywl Cc: user ; user...@flink.apache.org Subject: Re: When I use flink 1.9.1 and produce data to Kafka 1.1.1, The streamTask checkpoint error . Hi I have the same

Re: When I use flink 1.9.1 and produce data to Kafka 1.1.1, The streamTask checkpoint error .

2020-01-15 Thread Yun Tang
(tolerableDeclinedCheckpointNumber); Best Yun Tang From: jose farfan Sent: Wednesday, January 15, 2020 23:21 To: ouywl Cc: user ; user-zh@flink.apache.org Subject: Re: When I use flink 1.9.1 and produce data to Kafka 1.1.1, The streamTask checkpoint error . Hi I have the same

Re: 关于使用Flink RocksDBStateBackend问题

2020-01-14 Thread Yun Tang
Hi 使用自定义options factory的话,我们会认为是高级用户,自然也就完全交由用户进行配置,至于write buffer size如何配置,可以参考PredefinedOptions [1] 的使用方法。 [1]

Re: 咨询一下 RocksDB 状态后端的调优经验

2020-01-13 Thread Yun Tang
Hi Dong RocksDB无论如何都是要使用native内存的,您的YARN pmem-check相比JVM heap的buffer空间是多少,是否合适呢? FLINK-7289的基本所需task都已经完成在release-1.10 分支中了,您可以直接使用release-1.10 分支打包,最近也要发布1.10的rc版本,欢迎试用该功能。 如果你的所有checkpoint size是50GB,其实不是很大,但是如果单个state backend有50GB的话,对于Flink这种低延迟流式场景是稍大的,建议降低单并发state数据量。

Re: 回复:flink checkpoint不生成

2020-01-13 Thread Yun Tang
Hi , 你好 恭喜解决问题,不过关于社区邮件列表的使用有几点小建议: 1. 如果是全中文的邮件,就不要抄送英文社区邮件列表 (u...@flink.apache.org)了,毕竟社区有很多看不懂中文的开发者。中文邮件列表(user-zh)还是很活跃的,相信大家可以一起帮助解决问题。 2. 因为开源社区邮件列表对附件图片支持不友好,建议使用超链接的方式,有助于更快收到回复和解答。 祝好 唐云 From: 起子 Sent: Monday, January 13, 2020 18:01 To:

Re: Using redis cache in flink

2020-01-12 Thread Yun Tang
static variable with atomic reference or synchronization when calling RichFunction#open to initialize and remember to release resources when calling RichFunction#close . Best Yun Tang From: Navneeth Krishnan Sent: Monday, January 13, 2020 11:22 To: Yun Tang Cc

Re: Apache Flink - Sharing state in processors

2020-01-12 Thread Yun Tang
://github.com/apache/flink/blob/7a6ca9c03f67f488e40a114e94c389a5cfb67836/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java#L58 Best Yun Tang From: M Singh Sent: Friday, January 10, 2020 23:29 To: User

Re: Please suggest helpful tools

2020-01-10 Thread Yun Tang
you. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/checkpoint_monitoring.html [2] https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/back_pressure.html Best Yun Tang From: Eva Eva Sent: Friday, January 10, 2020 10

Re: flink遇到 valueState 自身的 NPE

2020-01-09 Thread Yun Tang
/state/ttl/AbstractTtlDecorator.java#L96 祝好 唐云 From: Kevin Liao Sent: Friday, January 10, 2020 1:08 To: Yun Tang Cc: user-zh@flink.apache.org Subject: Re: flink遇到 valueState 自身的 NPE 谢答,首先贴的代码确实是运行的程序 此外刚刚又通过打印 log 确认了 uniqMark == null 是 false 我现在的怀疑点是这个地方

Re: flink遇到 valueState 自身的 NPE

2020-01-09 Thread Yun Tang
Hi Kevin State TTL 是清理的state中的数据条目entry,不是清理state在map函数中的对象本身。所以无论如何,作为value state对象的uniqMark 是不会因为TTL而变成null的。

Re: flink算子状态查看

2020-01-08 Thread Yun Tang
Hi 没开启Checkpoint但是想知道状态存储的用量的话,对于FsStateBackend来说没有什么好办法;但是对于RocksDBStateBackend来说可以通过开启RocksDB native metrics [1] 的方式来观察memtable 以及 sst文件的 size,来近似估算整体状态存储数据量。 [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#rocksdb-native-metrics 祝好 唐云

Re: Using redis cache in flink

2020-01-08 Thread Yun Tang
/operators/aggregate/MiniBatchLocalGroupAggFunction.java#L89 From: Navneeth Krishnan Sent: Wednesday, January 8, 2020 15:36 To: Yun Tang Cc: user Subject: Re: Using redis cache in flink Hi Yun, Thanks, the way I want to use redis is like a cache not as state backend

Re: Using redis cache in flink

2020-01-07 Thread Yun Tang
Yun Tang From: Navneeth Krishnan Sent: Wednesday, January 8, 2020 12:33 To: user Subject: Using redis cache in flink Hi All, I want to use redis as near far cache to store data which are common across slots i.e. share data across slots. This data is required

Re: FLINK 不同 StateBackend ProcessWindowFunction的差别

2020-01-07 Thread Yun Tang
Hi 使用iterator.remove() 去除state中已经计算过的数据不是一个标准做法,标准的做法应该是 clear掉相应的state [1] 至于为什么使用MemoryStateBackend会去除数据是因为 get 返回的结果是backend中on heap直接存储的对象[2],存在修改的副作用。 而RocksDB state backend get返回的结果是反序列化的list,而不是RocksDB自身存储的数据 [3],也就不存在修改的副作用了。 [1]

Flink Weekly | 每周社区动态更新 - 2020/01/07

2020-01-06 Thread Yun Tang
大家好 2020年转眼就来了,先恭喜大家新年快乐,Flink社区也会在新的一年中继续陪伴大家,一起来把Flink做大做好。 本周社区主要新闻是Flink-1.10.0的发布进展,将blink planner设置为SQL client默认planner的讨论,以及如何支持SQL client gateway的FLIP。 Flink开发进展 == * [**Release**]

Re: 回复:使用influxdb作为flink metrics reporter

2020-01-05 Thread Yun Tang
; 定制 在2020年01月04日 00:56,Yun Tang<mailto:myas...@live.com> 写道: Hi 张江 * Retention policy 需要现在InfluxDB端创建,InfluxDBReporter不会自行创建不存在的 retention policy. * kafka的一些metrics在使用influxDB reporter的时候,会出现一些cast exception,可以参考 [1],在Flink-1.9 版本下可以忽略这些异常。 [1] https://issues.apache.org/jira/brow

Re: 使用influxdb作为flink metrics reporter

2020-01-03 Thread Yun Tang
Hi 张江 * Retention policy 需要现在InfluxDB端创建,InfluxDBReporter不会自行创建不存在的 retention policy. * kafka的一些metrics在使用influxDB reporter的时候,会出现一些cast exception,可以参考 [1],在Flink-1.9 版本下可以忽略这些异常。 [1] https://issues.apache.org/jira/browse/FLINK-12147 祝好 唐云 From: 张江

Re: 回复:如何获取算子处理一条数据记录的时间

2020-01-02 Thread Yun Tang
Hi,张江 Flink官方支持追踪record的latency,你可以参考[1] 启用这个功能,不过这个功能会极大地降低你的处理性能,只能用作debug使用。 如果想知道真实使用场景下的性能指标,可以参考latency的metrics [2] 来衡量operator的处理性能。 [1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#latency-tracking [2]

Re: Submit high version compiled code jar to low version flink cluster?

2019-12-30 Thread Yun Tang
/FLINK-13910 Best Yun Tang From: tison Sent: Monday, December 30, 2019 15:44 To: wangl...@geekplus.com.cn Cc: user Subject: Re: Submit high version compiled code jar to low version flink cluster? It possibly fails with incompatibility. Flink doesn't promise

Re: Connect RocksDB which created by Flink checkpoint

2019-12-30 Thread Yun Tang
. The RocksDB folder lies in Flink temporary dir [1] which looks like flink-io- and the configuration is located in the file named as 'LOG' with RocksDB directory. [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#io-tmp-dirs Best Yun Tang

Re: Exactly-once ambiguities

2019-12-29 Thread Yun Tang
system should do. If you really need to the exactly once in event-time processing in this scenario, I suggest to run a batch job later to consume all data source and use that result as a credible one. For processing-time data, use Flink to generate a credible result is enough. Best Yun Tang

Re: yarn per job 模式这个报错原因是什么?随机出现

2019-12-20 Thread Yun Tang
Hi 这个异常是因为无法绑定随机端口,在出问题的JM机器上检查一下 netstat,看是不是有大量的连接占用了很多端口。一般这种问题都是因为大量对外连接未关闭导致的,找到是什么类型的进程占用了大量端口。 祝好 唐云 From: rockey...@163.com Sent: Friday, December 20, 2019 15:04 To: user-zh Subject: yarn per job 模式这个报错原因是什么?随机出现 嗨,大家好,flink per

Re: MapState with List Type for values

2019-12-18 Thread Yun Tang
/runtime/operators/join/stream/StreamingJoinOperator.java#L182 Best Yun Tang From: Aaron Langford Sent: Thursday, December 19, 2019 2:22 To: Yun Tang Cc: user@flink.apache.org Subject: Re: MapState with List Type for values So the suggestion as I read

Re: MapState with List Type for values

2019-12-18 Thread Yun Tang
us map state as < , 1> . If you could follow this logic, the previous serialize/deserialize of Seq could be greatly reduced. Best Yun Tang From: Aaron Langford Sent: Wednesday, December 18, 2019 6:47 To: user@flink.apache.org Subject: MapState with List Type

Re: Restore metrics on broadcast state after restart

2019-12-18 Thread Yun Tang
/state.html#checkpointedfunction Best Yun Tang From: Gaël Renoux Sent: Tuesday, December 17, 2019 23:22 To: user Subject: Restore metrics on broadcast state after restart Hi everyone I have an KeyedBroadcastProcessFunction with a broadcast state (a bunch

Re: Scala case class TypeInformation and Serializer

2019-12-11 Thread Yun Tang
Hi Would you please give related code? I think it might due to insufficient hint to type information. Best Yun Tang From: 杨光 Date: Wednesday, December 11, 2019 at 7:20 PM To: user Subject: Scala case class TypeInformation and Serializer Hi, I'm working on write a flink stream job

Re: flink savepoint checkpoint

2019-12-10 Thread Yun Tang
Hi Checkpoint 是自动的,你可以配置retain checkpoint[1] 然后从checkpoint 恢复[2],可以不需要一定触发Savepoint。 [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#retained-checkpoints [2]

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Yun Tang
Sure, /opt/flink/conf is mounted as a volume from the configmap. Best Yun Tang From: Li Peng Date: Wednesday, December 11, 2019 at 9:37 AM To: Yang Wang Cc: vino yang , user Subject: Re: Flink on Kubernetes seems to ignore log4j.properties 1. Hey Yun, I'm calling /opt/flink/bin/standalone

Re: Flink State 过期清除 TTL 问题

2019-12-10 Thread Yun Tang
ink run -s ,从已有 savepoint 目录中恢复的数据所有的 updateTime 都变成了当前时间? 谢谢, 王磊 wangl...@geekplus.com.cn Sender: Yun Tang Send Time: 2019-11-01 01:38 Receiver: user-zh@flink.apache.org Subject: Re: Flink State 过期清除 TTL 问题 Hi 王磊 从你的配置以及使用Flink-1.7

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread Yun Tang
the taskmanager. [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html [2] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html#session-cluster-resource-definitions Best Yun Tang From: Li Peng Date: Tuesday, December 10, 2019 at 10:09

Re: Using MapState clear, put methods in snapshotState within KeyedCoProcessFunction, valid or not?

2019-12-02 Thread Yun Tang
regression as we need more steps here. Best Yun Tang On 12/2/19, 9:29 PM, "Salva Alcántara" wrote: Hi Yun, Thanks for your reply. You mention that " ‘snapshotState’ and ‘initializeState’ interfaces are used mainly to snapshot and initialize for

Re: Using MapState clear, put methods in snapshotState within KeyedCoProcessFunction, valid or not?

2019-12-01 Thread Yun Tang
/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.java#L324 Best Yun Tang From: Congxian Qiu Date: Monday, December 2, 2019 at 10:41 AM To: Salva Alcántara Cc: user Subject: Re: Using MapState clear, put methods in snapshotState within KeyedCoProcessFunction, valid or not? Hi From

Re: 回复: 回复: 本地checkpoint 文件190G了

2019-12-01 Thread Yun Tang
Hi 为什么你知道本地checkpoint文件达到190GB了,具体是哪个目录撑到了190GB? 如果没有启用 state.backend.local-recovery: * 使用FsSateBackend/Memory StateBackend, 本地不应该有什么checkpoint文件残留,因为执行checkpoint时,直接写HDFS了 * 使用 RocksDB state backend,无论是否开启incremental

Re: How to recover state from savepoint on embedded mode?

2019-11-25 Thread Yun Tang
What is the embedded mode mean here? If you refer to SQL embedded mode, you cannot resume from savepoint now; if you refer to local standalone cluster, you could use `bin/flink run -s` to resume on a local cluster. Best Yun Tang From: Reo Lei Date: Tuesday, November 26, 2019 at 12:37 AM

Re: flink session cluster ha on k8s

2019-11-25 Thread Yun Tang
-release-1.9/ops/jobmanager_high_availability.html#config-file-flink-confyaml [2] https://issues.apache.org/jira/browse/FLINK-11105 Best Yun Tang From: 曾祥才 Date: Monday, November 25, 2019 at 9:28 AM To: User-Flink Subject: flink session cluster ha on k8s hi, is there any example about ha on k8s

Re: Savepoints and checkpoints

2019-11-21 Thread Yun Tang
/flink/flink-docs-stable/ops/state/savepoints.html#what-happens-if-i-delete-an-operator-that-has-state-from-my-job Best Yun Tang From: "min@ubs.com" Date: Thursday, November 21, 2019 at 5:19 PM To: "user@flink.apache.org" Subject: Savepoints and checkpoints Hi,

Re: Cron style for checkpoint

2019-11-21 Thread Yun Tang
Hi Shuwen Conceptually, checkpoints in Flink behaves more like a system mechanism to achieve fault tolerance and transparent for users. On the other hand, savepoint in Flink behaves more like a user control behavior, can savepoint not satisfy your demands for crontab? Best Yun Tang From

Re: RocksDB state on HDFS seems not being cleanned up

2019-11-18 Thread Yun Tang
Yes, state processor API cannot read window state now, here is the track of this issue [1] [1] https://issues.apache.org/jira/browse/FLINK-13095 Best Yun Tang From: shuwen zhou Date: Monday, November 18, 2019 at 12:31 PM To: user Subject: Fwd: RocksDB state on HDFS seems not being cleanned

Re: Monitor rocksDB memory usage

2019-11-07 Thread Yun Tang
Hi Lu I think RocksDB native metrics [1] could help. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#rocksdb-native-metrics Best Yun Tang From: Lu Niu Date: Friday, November 8, 2019 at 8:18 AM To: user Subject: Monitor rocksDB memory usage Hi, I read

Re: 广播状态是否可以设置ttl过期时间

2019-11-07 Thread Yun Tang
Hi Broadcast State 可以看做一种operator state,只能在DefaultOperatorStateBackend里面创建 [1],TTL state目前仅是对keyed state来说的 [2]。 [1] https://github.com/apache/flink/blob/809533e5b5c686e2d21b64361d22178ccb92ec26/flink-runtime/src/main/java/org/apache/flink/runtime/state/DefaultOperatorStateBackend.java#L149

Re: RocksDB state on HDFS seems not being cleanned up

2019-11-07 Thread Yun Tang
Yes, just sum all file size within checkpoint meta to get the full checkpoint size (this would omit some byte stream state handles, but nearly accurate). BTW, I think user-mail list is the better place for this email-thread, already sent this mail to user-mail list. Best Yun Tang From: shuwen

<    1   2   3   4   5   6   >