Re: Re: Flink job fails with an exception when submitted to the cluster

2019-11-04 Thread Zhong venb
That file does not exist!!! I don't know where it came from; I resubmitted the job and still could not find the path of the temporary file mentioned in the error. -Original Message- From: 李军 Sent: 2019-11-05 15:12 To: user-zh@flink.apache.org Subject: Re: Flink job fails with an exception when submitted to the cluster file:

Re: low performance in running queries

2019-11-04 Thread Piotr Nowojski
Hi, Unfortunately your VisualVM snapshot doesn’t contain the profiler output. It should look like this [1]. > Checking the timeline of execution shows that the source operation is done in > less than a second while Map and Reduce operations take long running time. It could well be that the

Re: Flink SQL GroupBy Exception

2019-11-04 Thread Terry Wang
Hi, Polarisary~ The reason should be that `uid, device_id` cannot be automatically derived from the type of your Kafka actionStream; you should check it and make sure actionStream returns a suitable type. Best, Terry Wang > On 2019-11-05 15:11, Polarisary wrote: > > Hi ALL, > I have a problem
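A minimal sketch of Terry's suggestion, assuming the raw Kafka stream is a DataStream<String> of comma-separated records (rawKafkaStream and the field layout are illustrative placeholders):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;

// Give the stream an explicit composite type so that the field names
// "uid, device_id" can be derived when registering the table.
DataStream<Tuple2<Long, String>> actionStream = rawKafkaStream
        .map(line -> {
            String[] fields = line.split(",");
            return Tuple2.of(Long.parseLong(fields[0]), fields[1]);
        })
        .returns(Types.TUPLE(Types.LONG, Types.STRING));

// The rowtime attribute additionally requires timestamps/watermarks
// to be assigned upstream.
tEnv.registerDataStream("mytable", actionStream, "uid, device_id, rowtime.rowtime");
```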

Re: Flink job fails with an exception when submitted to the cluster

2019-11-04 Thread 李军
file: '/tmp/blobStore-0d69900e-5299-4be9-b3bc-060d06559034/job_e8fccb398c2d9de108051beb06ec64cc/blob_p-d1e0b6ace7b204eb42f56ce87b96bff39cc58289-d0b8e666fc70af746ebbd73ff8b38354' (valid JAR) Is this file different from the one on the other nodes? On 2019-11-05 15:04, Zhong venb wrote: Hi,

Flink SQL GroupBy Exception

2019-11-04 Thread Polarisary
Hi ALL, I have a problem when using Flink SQL. My code looks like this: ``` tEnv.registerDataStream("mytable", actionStream, "uid, device_id, rowtime.rowtime"); ``` actionStream is a Kafka consumer, but this cannot run; the exception is as follows: ```

Flink job fails with an exception when submitted to the cluster

2019-11-04 Thread Zhong venb
Hi, I have run into a problem: a Flink job consuming from Kafka compiles and runs fine in IDEA, but after packaging and deploying it to the cluster it fails at runtime. I have already put the corresponding JAR into Flink's lib directory, and the job submission itself reports no error! Could someone help analyze the cause? Thanks!!! Environment: Flink: 1.7.2, Kafka: 1.1.0, Scala: 2.11.8. The error message is: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class:

Re: Ask timed out problem on macOS

2019-11-04 Thread tison
This problem is actually quite common, and there are many possible causes. For example, check the cluster logs under the log directory to see whether the Dispatcher came up properly, whether you have changed flink-conf in a way that makes the timeout too short (e.g. 1 ms), or whether resources are insufficient. There have also been cases that no longer reproduced after upgrading the JDK minor version. Best, tison. On Tue, Nov 5, 2019 at 2:43 PM, jeff kit wrote: > Hello. > My local Flink is the official binary package, not one I built myself. > I believe my case is in the minority; the vast majority of Macs can run it. > > On Tue, Nov 5, 2019 at 2:24 PM Biao Liu

Re: Ask timed out problem on macOS

2019-11-04 Thread jeff kit
Hello. My local Flink is the official binary package, not one I built myself. I believe my case is in the minority; the vast majority of Macs can run it. On Tue, Nov 5, 2019 at 2:24 PM Biao Liu wrote: > Hello, > > macOS can run Flink; I just tried it myself, and copying your commands works. > I suggest checking your local environment again. Is your local Flink built by yourself? If that doesn't work, how about trying the binary package provided by Flink [1]? > > [1] https://flink.apache.org/downloads.html > > Thanks, > Biao

Re: Ask timed out problem on macOS

2019-11-04 Thread Biao Liu
Hello, macOS can run Flink; I just tried it myself, and copying your commands works. I suggest checking your local environment again. Is your local Flink built by yourself? If that doesn't work, how about trying the binary package provided by Flink [1]? [1] https://flink.apache.org/downloads.html Thanks, Biao /'bɪ.aʊ/ On Tue, 5 Nov 2019 at 12:30, jeff kit wrote: > Hi everyone, > I ran into a problem running Flink's official Quick >

Re: Can a Trigger know the state of the data in a Window?

2019-11-04 Thread Utopia
Sorry for not describing it clearly. Our business scenario requires a SessionWindow, and I wonder whether the elements of the current Window can be accessed inside a Trigger. Best regards Utopia On 2019-11-05 14:16 +0800, Biao Liu wrote: > Hello, > > Can countWindow [1] meet your needs? > > [1] >

Re: Can a Trigger know the state of the data in a Window?

2019-11-04 Thread Biao Liu
Hello, can countWindow [1] meet your needs? [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/java/org/apache/flink/streaming/api/datastream/KeyedStream.html#countWindow-long- Thanks, Biao /'bɪ.aʊ/ On Tue, 5 Nov 2019 at 14:01, Utopia wrote: > Hi everyone, > > I want to use information about the data in a Window, such as the number of elements, to decide whether
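For reference, a minimal self-contained sketch of the countWindow suggestion (the key selector and reduce logic are illustrative):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CountWindowExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(Tuple2.of("a", 1L), Tuple2.of("a", 1L), Tuple2.of("b", 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(t -> t.f0)
                // Fire as soon as 2 elements have arrived for a key, i.e. the
                // window is driven by element count, not by time.
                .countWindow(2)
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
                .print();

        env.execute("count-window-example");
    }
}
```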

Fwd: Does changing the number of YARN containers affect recovery from state?

2019-11-04 Thread Biao Liu
Hi, this topic should be sent to the user-zh mailing list. Just forwarding it there. Thanks, Biao /'bɪ.aʊ/ -- Forwarded message - From: Yun Tang Date: Tue, 5 Nov 2019 at 13:20 Subject: Re: Does changing the number of YARN containers affect recovery from state? To: wangl...@geekplus.com.cn , user <

Re: Does changing the number of YARN containers affect recovery from state?

2019-11-04 Thread Yun Tang
Hi, first check whether the job keeps failing over and whether there are "maximum parallelism" related exceptions; if so, the changed parallelism is incompatible and the job has never actually recovered from the checkpoint. If the job did recover from the checkpoint successfully, then check whether the task side is still redistributing state because of the parallelism change; if your state is large, this step can be quite time-consuming. In that case the source typically consumes data but cannot send it downstream, so the whole job looks stuck. You can run jstack on the task side to inspect the call stacks and see whether a restore-related stack is hanging.
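As background for the "maximum parallelism" check above, a minimal sketch of pinning it explicitly so rescaling between submissions stays state-compatible (the value 128 is an illustrative assumption):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Fix the maximum parallelism (number of key groups) up front. State in a
// savepoint can only be redistributed across a different operator
// parallelism if the maximum parallelism is unchanged.
env.setMaxParallelism(128);
```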

Ask timed out problem on macOS

2019-11-04 Thread jeff kit
Hi everyone, I ran into a problem running Flink's official Quick Start. To avoid asking a stupid question, I first tried many things, such as switching Flink versions from 1.7 through 1.8 to 1.9; on my own Mac OS X the problem always occurs, while on other operating systems such as Windows everything is normal. This is probably not a common problem but rather an issue with my local environment, yet after several days of trying I still have not found a solution, so I am asking for help here. Steps: 1. ./bin/start-cluster.sh # start Flink. 2. ./bin/flink run examples/batch/WordCount.jar #

Does changing the number of YARN containers affect recovery from state?

2019-11-04 Thread wangl...@geekplus.com.cn
The job consumes data from RocketMQ for processing. The maximum parallelism in the code is 8. When submitting with -ys 4, 2 containers are automatically allocated as TaskManagers. After running for a while, the job is stopped with a savepoint. When restoring from the savepoint with -ys 2, 4 containers are allocated as TaskManagers, but after submission the program no longer consumes data from RocketMQ; the consumption TPS stays at 0. What could be the reason? Thanks, 王磊 wangl...@geekplus.com.cn

Re: [metrics] The Availability and Checkpointing metric groups are not shown

2019-11-04 Thread Biao Liu
Hello, the JM metrics should also be reported directly. Consider narrowing down the problem: is it the metrics or the reporter? For example, add an slf4j reporter [1] and check whether the corresponding metrics appear in the JM log; if they do, the problem is in the reporter. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/metrics.html#slf4j-orgapacheflinkmetricsslf4jslf4jreporter Thanks, Biao

Re: Re: How to periodically write state to external storage

2019-11-04 Thread Biao Liu
Hello, I have some questions about your problem description. > Every incoming message updates the state value; if every message also triggers a write to external storage, the downstream cannot keep up. > Is there a way to periodically read the state and write it to external storage? What do you mean here? Updating the state value and writing to an external system should be two independent events. State is used internally by Flink; data for external systems is usually written out through a sink and has no direct relationship with state. From your description it only looks like the writes to MySQL (through a sink?) cannot keep up. How about batching the writes, e.g. handling that in the sink? If I misunderstood your problem, please describe it in more detail. Thanks, Biao /'bɪ.aʊ/
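A minimal sketch of the batched-write idea, assuming a plain JDBC connection to MySQL (URL, table, and flush threshold are illustrative assumptions):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Buffers records and writes them to MySQL in batches instead of one row
// per invoke(), reducing the load on the downstream database.
public class BatchingMysqlSink extends RichSinkFunction<String> {
    private static final int BATCH_SIZE = 500; // illustrative flush threshold
    private transient Connection connection;
    private transient List<String> buffer;

    @Override
    public void open(Configuration parameters) throws Exception {
        connection = DriverManager.getConnection("jdbc:mysql://host:3306/db", "user", "pwd");
        buffer = new ArrayList<>();
    }

    @Override
    public void invoke(String value, Context context) throws Exception {
        buffer.add(value);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    private void flush() throws Exception {
        try (PreparedStatement stmt =
                connection.prepareStatement("INSERT INTO t_values (v) VALUES (?)")) {
            for (String v : buffer) {
                stmt.setString(1, v);
                stmt.addBatch();
            }
            stmt.executeBatch();
        }
        buffer.clear();
    }

    @Override
    public void close() throws Exception {
        // Note: for exactly-once, the buffer would also need to be
        // checkpointed; this sketch may lose buffered rows on failure.
        if (buffer != null && !buffer.isEmpty()) {
            flush();
        }
        if (connection != null) {
            connection.close();
        }
    }
}
```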

Re: Checkpoint in FlinkSQL

2019-11-04 Thread vino yang
Hi Simon, Absolutely, yes. Before using Flink SQL, you need to initialize a StreamExecutionEnvironment instance [1], then call StreamExecutionEnvironment#setStateBackend or StreamExecutionEnvironment#enableCheckpointing to specify the configuration you want. [1]:
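A minimal sketch of that setup on the Java Table API (the checkpoint interval and backend path are illustrative assumptions):

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpointing and the state backend are configured on the underlying
// StreamExecutionEnvironment before creating the table environment.
env.enableCheckpointing(60_000); // checkpoint every 60s (illustrative)
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints")); // illustrative path

StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
// ... register tables and run SQL as usual.
```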

Re: Checkpoint in FlinkSQL

2019-11-04 Thread Yun Tang
Hi Simon, If you are using the Table API, you can set the state backend via the environment, e.g. `env.setStateBackend()`. If you just launch a cluster with the SQL client, you can configure the state backend and checkpoint options [1] in `flink-conf.yaml` before launching the cluster. [1]

Checkpoint in FlinkSQL

2019-11-04 Thread Simon Su
Hi All, Does current Flink support setting checkpoint properties while using Flink SQL? For example, the state backend choice, the checkpoint interval, and so on... Thanks, Simon

Re: Flink SQL UpsertTable semantics problem

2019-11-04 Thread Wenlong Lyu
You can try accumulating a small batch in the sink; most retract messages and the adds that follow them can be merged and eliminated, so they won't put pressure on the downstream system. > On 2019-11-05 10:18, hb <343122...@163.com> wrote: > > > > It is because retract mode, compared with upsert, sends a lot of false intermediate data; I want to use upsert because it is more efficient for the downstream (k/v system). > Will this bug be fixed in the next release? > On 2019-11-05 09:07:56, "Wenlong Lyu" wrote: >> Hi, >>
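A standalone sketch of the compaction idea, assuming changelog messages arrive as (isAdd, record) pairs the way Flink's retract streams encode them (the record shape and key extraction are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;

// Compacts a small batch of (isAdd, record) changelog messages by primary
// key: a retract followed by an add for the same key collapses into one
// upsert; a retract with no subsequent add becomes a delete (null value).
public class MiniBatchCompactor {
    private final Map<String, String> pending = new HashMap<>();

    public void accept(String key, Tuple2<Boolean, String> change) {
        if (change.f0) {
            pending.put(key, change.f1); // add/update wins over an earlier retract
        } else {
            pending.put(key, null);      // retract -> tentative delete
        }
    }

    // On flush, only the net effect per key is sent downstream.
    public Map<String, String> drain() {
        Map<String, String> result = new HashMap<>(pending);
        pending.clear();
        return result;
    }
}
```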

Re: Re: Flink SQL UpsertTable semantics problem

2019-11-04 Thread hb
It is because retract mode, compared with upsert, sends a lot of false intermediate data; I want to use upsert because it is more efficient for the downstream (k/v system). Will this bug be fixed in the next release? On 2019-11-05 09:07:56, "Wenlong Lyu" wrote: > Hi, > hb, this is due to a current bug in the optimizer: it only considers that the sink is an upsertSink and does not take the filter into account, so in the optimized result the agg no longer emits the retract(1,3) message. If you change your sink to a RetractSink there should be no problem. > >> On 2019-11-04 11:06, hb

Re: Flink SQL UpsertTable semantics problem

2019-11-04 Thread Wenlong Lyu
Hi, hb, this is due to a current bug in the optimizer: it only considers that the sink is an upsertSink and does not take the filter into account, so in the optimized result the agg no longer emits the retract(1,3) message. If you change your sink to a RetractSink there should be no problem. > On 2019-11-04 11:06, hb <343122...@163.com> wrote: > > The SQL is as follows: > INSERT INTO upsertTable > SELECT * FROM ( > SELECT cnt0 as id, count(id) as cnt FROM >

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #2

2019-11-04 Thread Becket Qin
Hi Thomas, Event time alignment is absolutely one of the important considerations of FLIP-27. That said, we are not implementing it in FLIP-27, but we will make sure such a feature can easily be added in the future. The design was to make the communication between SplitEnumerator and SourceReader

FLINK WEEKLY 2019/44

2019-11-04 Thread tison
FLINK WEEKLY 2019/44 User questions: Flink state TTL-based expiration and cleanup; the answer covers the relevant configuration options and the cleanup timing each setting implies. How to filter out abnormal timestamps?

Ordered events in broadcast state

2019-11-04 Thread Filip Niksic
Hi all, The documentation for the broadcast state explicitly says that the order of broadcast events may differ across tasks, so the state updates should not depend on a particular order. [1] But what to do in the use cases where the order matters? Is there some way to enforce the order even at

Re: Using RocksDB as lookup source in Flink

2019-11-04 Thread Yun Tang
Hi Srikanth, As RocksDB is a single-node DB, just like InfluxDB, I recommend referring to an implementation of an InfluxDB sink. [1] [1] https://github.com/apache/bahir-flink/tree/master/flink-connector-influxdb Best Yun Tang From: OpenInx Date: Monday, November 4, 2019 at 6:28 PM
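Following that pattern, a minimal sketch of a RocksDB sink using the embedded org.rocksdb client (the local path and key/value record shape are illustrative assumptions):

```java
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

// Writes each (key, value) record into a local RocksDB instance, following
// the same RichSinkFunction pattern as the bahir InfluxDB sink. Note that
// each parallel subtask opens its own local, independent DB.
public class RocksDbSink extends RichSinkFunction<Tuple2<String, String>> {
    private transient RocksDB db;
    private transient Options options;

    @Override
    public void open(Configuration parameters) throws Exception {
        RocksDB.loadLibrary();
        options = new Options().setCreateIfMissing(true);
        db = RocksDB.open(options, "/tmp/flink-rocksdb-sink"); // illustrative path
    }

    @Override
    public void invoke(Tuple2<String, String> value, Context context) throws Exception {
        db.put(value.f0.getBytes(StandardCharsets.UTF_8),
               value.f1.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() {
        if (db != null) db.close();
        if (options != null) options.close();
    }
}
```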

Re: Finite source without blocking save-points

2019-11-04 Thread bupt_ljy
Oh, the parallelism problem didn't bother me because we used to set the parallelism of the rule source to one :o). Maybe a more elegant way is to shard the rule emitting by RuntimeContext#getIndexOfThisSubtask. Best, Jiayi Liao Original Message Sender: Gaël Renoux Recipient: bupt_ljy Cc:
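A minimal sketch combining both suggestions: a parallel source that emits the one-off startup message only from subtask 0 and then stays alive, so the job never has a finished source blocking savepoints (the payload is illustrative):

```java
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

// Emits a single startup message from exactly one subtask, then idles so
// the source never finishes and savepoints remain possible.
public class StartupMessageSource extends RichParallelSourceFunction<String> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        if (getRuntimeContext().getIndexOfThisSubtask() == 0) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect("startup-message"); // illustrative payload
            }
        }
        // Stay alive instead of returning, so the task stays RUNNING.
        while (running) {
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```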

Re: Finite source without blocking save-points

2019-11-04 Thread Gaël Renoux
Hi Jiayi, This would allow me to call the Kafka producer without risking a race condition, but it comes with its own problem: unless the source has a parallelism of 1, it will trigger multiple times. I can create a specific source that doesn't produce anything, has a parallelism of 1, and calls

Re: Finite source without blocking save-points

2019-11-04 Thread bupt_ljy
Hi Gael, I had a similar situation before. Actually you don't need to accomplish this in such a complicated way. I guess you already have a rules source, and you can send the rules in the #open function at startup if your rules source inherits from #RichParallelSourceFunction. Best, Jiayi Liao

Finite source without blocking save-points

2019-11-04 Thread Gaël Renoux
Hello everyone, I have a job which runs continuously, but it also needs to send a single specific Kafka message on startup. I tried the obvious approach of using StreamExecutionEnvironment.fromElements and adding a Kafka sink; however, that's not possible: the source being finished, it becomes

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #2

2019-11-04 Thread Thomas Weise
Hi Becket, Thanks for the reply, it is good to know that there is activity on FLIP-27. A while ago I was wondering if event time alignment is on the radar [1], can you please clarify that? There is a parallel discussion of adding it to the existing Kafka consumer [2], could you please take a

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #2

2019-11-04 Thread Becket Qin
Hi Steven and Thomas, Sorry about missing the update on FLIP-27. I am working on the implementation of FLIP-27 at this point. It is about 70% done. Right now I am integrating the source coordinator into the job master. Hopefully I can get the basics of the Kafka connector working end to end by this

Re: low performance in running queries

2019-11-04 Thread Habib Mostafaei
Hi, On 11/1/2019 4:40 PM, Piotr Nowojski wrote: Hi, More important would be the code profiling output. I think VisualVM allows to share the code profiling result as “snapshots”? If you could analyse or share this, it would be helpful. Enclosed is a snapshot of VisualVM. From the attached

Re: is Flink a database ?

2019-11-04 Thread Piotr Nowojski
Hi :) What do you mean by “a database”? A SQL like query engine? Flink is already that [1]. A place where you store the data? Flink kind of is that as well [2] and many users are using Flink as the source of truth, not just as a data processing framework. With Flink Table API/SQL [1], you can

is Flink a database ?

2019-11-04 Thread Hanan Yehudai
This seems like a controversial subject... on purpose :) I have my data lake in Parquet files – should I use Flink batch mode to run ad hoc queries over historical batches? Or should I use a dedicated "database", e.g. Drill / Dremio / Hive and their likes? What advantage will Flink give me

Re: Using RocksDB as lookup source in Flink

2019-11-04 Thread OpenInx
Hi, The Kafka table source & sink connector has been implemented (at least Flink 1.9 supports this), but the RocksDB connector is not supported yet; you may need to implement it yourself. Here [1] we have a brief wiki showing which interfaces we need to implement, but it seems it's not detailed enough

Using RocksDB as lookup source in Flink

2019-11-04 Thread srikanth flink
Hi there, Can someone help me implement a Flink Kafka source feeding a Flink RocksDB sink, where I could use a UDF to look up RocksDB in SQL queries? Context: I get a list of IP addresses in a stream which I wish to store in RocksDB, so that the other stream can perform a lookup to match the IP address.

Re: Error:java: invalid flag: --add-exports=java.base/sun.net.util=ALL-UNNAMED

2019-11-04 Thread OpenInx
Hi, I met the same problem before. After some digging, I found that IDEA detects the JDK version and chooses whether to use the JDK 11 options to run the Flink Maven build. If you are in a JDK 11 environment, it will add the --add-exports option when building with Maven in IDEA. In my case, I was

Re: Error:java: invalid flag: --add-exports=java.base/sun.net.util=ALL-UNNAMED

2019-11-04 Thread Till Rohrmann
Try to reimport the Maven project. This should resolve the issue. Cheers, Till On Mon, Nov 4, 2019 at 10:34 AM 刘建刚 wrote: > Hi, I am using flink 1.9 in idea. But when I run a unit test in idea, > IDEA reports the following error: "Error:java: invalid flag: >

Error:java: invalid flag: --add-exports=java.base/sun.net.util=ALL-UNNAMED

2019-11-04 Thread 刘建刚
Hi, I am using Flink 1.9 in IDEA, but when I run a unit test in IDEA, it reports the following error: "Error:java: invalid flag: --add-exports=java.base/sun.net.util=ALL-UNNAMED". Everything is OK when I use Flink 1.6. I am using JDK 1.8. Is it related to the Java version?

Re: [DISCUSS] Semantic and implementation of per-job mode

2019-11-04 Thread tison
Hi Peter, I checked out your proposal FLIP-85 and think that we are heading in a very similar direction. As in your proposal, we can create the PackagedProgram on the server side, and thus, if we can configure the environment properly, we can directly invoke the main method. In addition to your design

Re: Checkpoint failed all the time

2019-11-04 Thread sllence
Thanks -Original Message- From: Yun Tang Sent: 2019-11-04 15:26 To: user-zh@flink.apache.org Subject: Re: Checkpoint failed all the time Sure, this feature has been implemented in FLINK-12364 [1]; all you need to do is set the tolerable checkpoint failure number via something like
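For reference, a minimal sketch of that setting on the CheckpointConfig (the interval and the threshold of 3 are illustrative assumptions):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // illustrative interval

// Tolerate up to 3 checkpoint failures before failing the job,
// instead of failing on the first declined checkpoint.
env.getCheckpointConfig().setTolerableCheckpointFailureNumber(3);
```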