Re: Re: flink hive Streaming查询不到数据的问题

2020-10-28 Thread Jingsong Li
注意时区哦,SQL层默认使用UTC的long值 On Thu, Oct 29, 2020 at 12:12 PM hdxg1101300...@163.com < hdxg1101300...@163.com> wrote: > 我把sink.partition-commit.trigger 设置成process-time 可以看到数据; > 但是我后来设置source 产生出watermark 还是不行; > val dataStream = streamEnv.addSource(new MySource) > >

Re: Re: flink hive Streaming查询不到数据的问题

2020-10-28 Thread hdxg1101300...@163.com
我把sink.partition-commit.trigger 设置成process-time 可以看到数据; 但是我后来设置source 产生出watermark 还是不行; val dataStream = streamEnv.addSource(new MySource) .assignTimestampsAndWatermarks(WatermarkStrategy.forMonotonousTimestamps[UserInfo]() .withTimestampAssigner(new

Re: flink1.11 kafka connector

2020-10-28 Thread Jark Wu
目前还不支持,可以去社区开个 issue,看能不能赶上1.12 Best, Jark On Thu, 29 Oct 2020 at 11:26, Dream-底限 wrote: > hi、 > 我看了一下官方提供的kafka sink,对于数据发送方式为两种:对于第二种情况,有办法保证对于指定主键的变化过程发送到同一个kafka > partiiton吗,或者说社区对于原生hash(key)到kafka分区映射的方式有支持计划吗 > >- fixed:每个Flink分区最多只能有一个Kafka分区。 >-

Re: How to deploy dynamically generated flink jobs?

2020-10-28 Thread Alexander Bagerman
I did try it but this option seems to be for a third party jar. In my case I would need to specify/ship a jar that contains the code where job is being constracted. I'm not clear of 1. how to point to the containg jar 2. how to test such a submission from my project running in Eclipse Alex On

Re: How to deploy dynamically generated flink jobs?

2020-10-28 Thread Yun Gao
Hi Alexander, The signature of the createRemoteEnvironment is public static StreamExecutionEnvironment createRemoteEnvironment( String host, int port, String... jarFiles); Which could also ship the jars to execute to remote cluster. Could you have a try to also pass the jar files to the

Re: No pooled slot available and request to ResourceManager for new slot failed

2020-10-28 Thread Yangze Guo
Hi, 你job的并发是多少?一共请求了多少个slot? 方便的话最好发一下jm的日志来帮助排查 Best, Yangze Guo On Thu, Oct 29, 2020 at 10:07 AM marble.zh...@coinflex.com.INVALID wrote: > > 大家好。 > > 只有一个job,设置了jm/tm各总内存为3G,一个taskmanager,总共10个slot,为什么还是报这个错? > > Caused by: >

How to deploy dynamically generated flink jobs?

2020-10-28 Thread Alexander Bagerman
Hi, I am trying to build a functionality to dynamically configure a flink job (Java) in my code based on some additional metadata and submit it to a flink running in a session cluster. Flink version is 1.11.2 The problem I have is how to provide a packed job to the cluster. When I am trying the

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-28 Thread Jingsong Li
+1 to remove the Bucketing Sink. Thanks for the effort on ORC and `HadoopPathBasedBulkFormatBuilder`, I think it's safe to get rid of the old Bucketing API with them. Best, Jingsong On Thu, Oct 29, 2020 at 3:06 AM Kostas Kloudas wrote: > Thanks for the discussion! > > From this thread I do

Re: How to understand NOW() in SQL when using Table & SQL API to develop a streaming app?

2020-10-28 Thread Jark Wu
issue created: https://issues.apache.org/jira/browse/FLINK-19861 On Wed, 28 Oct 2020 at 11:00, Danny Chan wrote: > Our behavior also conflicts with the SQL standard, we should also mention > this in the document. > > Till Rohrmann 于2020年10月27日周二 下午10:37写道: > >> Thanks for the clarification.

No pooled slot available and request to ResourceManager for new slot failed

2020-10-28 Thread marble.zh...@coinflex.com.INVALID
大家好, 我已经分配了8个taskmanager.numberOfTaskSlots,但还是遇到如下exception, 我为job/task分配了每个3G的总内存。有没有什么建议?, 谢谢 Caused by: org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException: Could not fulfill slot request 294b93e601744edd7be66dec41e8d8ed. Requested resource profile

No pooled slot available and request to ResourceManager for new slot failed

2020-10-28 Thread marble.zh...@coinflex.com.INVALID
大家好。 只有一个job,设置了jm/tm各总内存为3G,一个taskmanager,总共10个slot,为什么还是报这个错? Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: No pooled slot available and request to ResourceManager for new slot failed ... 27 more 有没有一些建议,谢谢。 -- Sent from:

flink??????????????????

2020-10-28 Thread ????????????
?? Linuxflink??31.9.3 ??1?? ./start-cluster.sh Name or service not knownname ce-hjjcgl-svr-02??21??3?? [appuser@ce-hjjcgl-svr-01 bin]$ ./start-cluster.sh Starting cluster.

维表选择

2020-10-28 Thread zjfpla...@hotmail.com
Hi, 请问各位,维表选择问题:Temporal table和RichAsyncFunction各自适用领域,优缺点,以及任务停止后启动,sv,cv会不会有什么问题 最好有生产环境的实际使用情况来说下,非常感谢 zjfpla...@hotmail.com

Re: Kubernetes Job Cluster, does it autoterminate?

2020-10-28 Thread Matthias Pohl
Hi Ruben, thanks for reaching out to us. Flink's native Kubernetes Application mode [1] might be what you're looking for. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#flink-kubernetes-application On Wed, Oct 28, 2020 at

Re: Could you add some example about this document? Thanks`1

2020-10-28 Thread Robert Metzger
Hi, from the messages you've sent on the user@ mailing list in the recent weeks, I see that you are in the process of learning Flink. The Flink community won't be able to provide you with full, runnable examples for every method Flink provides. Rather, we have a few running examples, and

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-28 Thread Kostas Kloudas
Thanks for the discussion! >From this thread I do not see any objection with moving forward with removing the sink. Given this I will open a voting thread tomorrow. Cheers, Kostas On Wed, Oct 28, 2020 at 6:50 PM Stephan Ewen wrote: > > +1 to remove the Bucketing Sink. > > It has been very

Re: [EXTERNAL] Re: Native K8S Jobmanager restarts and job never recovers

2020-10-28 Thread Bohinski, Kevin
Hi Yang, Thanks again for all the help! We are still seeing this with 1.11.2 and ZK. Looks like others are seeing this as well and they found a solution https://translate.google.com/translate?hl=en=zh-CN=https://cloud.tencent.com/developer/article/1731416=search Should this solution be added

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-28 Thread Stephan Ewen
+1 to remove the Bucketing Sink. It has been very common in the past to remove code that was deprecated for multiple releases in favor of reducing baggage. Also in cases that had no perfect drop-in replacement, but needed users to forward fit the code. I am not sure I understand why this case is

Re:LocalBufferPoo死锁

2020-10-28 Thread hailongwang
Hi, 这个应该是下游算子有压力,可以根据 Inpool 指标查看哪个算子有瓶颈,然后对应的进行处理。 Best, Hailong Wang 在 2020-10-27 18:57:55,"1548069580" <1548069...@qq.com> 写道 >各位好: >最近遇到一个问题,上游有反压的情况下任务运行一段时间后出现上下游数据均停滞的情况,通过jstack命令发现,source算子阻塞了,同时观察到下游也在等待数据。堆栈如下: >"Legacy Source Thread - Source: Custom Source (1/2)" #95 prio=5

Re:flinkSQL针对join操作设置不同窗口

2020-10-28 Thread hailongwang
Hi s_hongliang, 1、如果用 DataStream API 的话,可以需要使用 State 对需要被关联的表进行存储,并且设置 TTL。 2、如果使用 SQL 的话: 2.1、可以将需要被关联的数据存入Hbase 或者 Mysql,然后保证只有当天的数据,在 SQL 中使用 Temporal Table[1] 关联。 2.2、使用 temporal-table-function[2] ,设置StateRetentionTime同时过滤掉关联上昨天的数据。 [1]

Re:关于并行下watermark不生成

2020-10-28 Thread hailongwang
Hi BenChen, 1. 可以保证需要 watermark 算子之前的算子和前面的算子不是 Forward 。 2. 如果是自己实现的 Connector 的话,可能定时检测调用 SourceFunction#markAsTemporarilyIdle 来标记为 idle,我看目前 Kafka 是刚启动时候进行检测。 Best, Hailong Wang 在 2020-10-28 17:54:22,"BenChen" 写道: >Hi

Re:tumbling的窗口更新随着job运行时间越长,delay越久,sliding不会

2020-10-28 Thread hailongwang
Hi marble, 看到你是在 window 内一直使用 agg 累加的,所以可以使用 filesystem backend 加速,但是可能内存会相对耗的比较多。因为rocksdb backend的话,每一条数据都会有一次put 和 get 的 IO 操作,故会比较慢些。 至于你提到的为什么 24h size,2s slide 的窗口没有延迟,5 min,1s 的连续 trigger 缺延迟了。这两者的行为不一样,其实没有什么可比的。 对于第二种,trigger 是依靠 timer 注册触发的,这样的话每秒都需要进行触发(如果是 process time),这样可能会太密集了。

Re: NoResourceAvailableException

2020-10-28 Thread Khachatryan Roman
Hi Alexander, Thanks for sharing, I see a lot of exceptions in the logs, particularly *Caused by: java.net.BindException: Could not start actor system on any port in port range 6123 which means that there's probably more than one instance running and is likely the root cause. So it makes sense

Re: Building Flink on VirtualBox VM failing

2020-10-28 Thread Khachatryan Roman
The values printed by the OOM killer seem indeed strange. But from the line above the memory usage seems fine: rss=2440960. Running the given command I see only one forked process. Probably, this is an issue of OOM killer running in VM on Wwindows host. Can you try with OOM killer disabled?

Re: 官方后续会有支持kafka lag metric的计划吗

2020-10-28 Thread silence
hi zhisheng 我找到两篇相关的参考博客你看一下 https://blog.csdn.net/a1240466196/article/details/107853926 https://www.jianshu.com/p/c7515bdde1f7 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: 官方后续会有支持kafka lag metric的计划吗

2020-10-28 Thread zhisheng
hi, silence 对于你提到的第一种方案,我觉得在 flink 里面是做不到的,因为 flink 只可以拿得到消费数据的 offset 信息,但是拿不到 kafka 里面该 topic 具体分区最新的 offset 值,那么也就无法相减得到每个分区的 lag,从而无法获取整个 topic 的 lag。 对于第二种方案,我觉得是可行的,可以在自己作业里面埋点(当前系统时间与消费到的数据的事件时间的差值),然后每个并行度分别上报,最后监控页面可以看到作业分区延迟最大是多长时间。 Best! zhisheng silence 于2020年10月28日周三 下午7:55写道: >

Re: 关于并行下watermark不生成

2020-10-28 Thread zhisheng
hi,Benchen 可以考虑在 source 算子后面加一个 rebalance() Best! zhisheng Shubin Ruan 于2020年10月28日周三 下午7:36写道: > 可以考虑在数据源处进行处理: > > > 设置个时间阈值,若检测到某个 key 下的数据超过时间阈值还未更新,则根据系统的 processing time 按照某种逻辑生成1条水印发送到下游。 > 在 2020-10-28 18:54:22,"BenChen" 写道: > >Hi >

Re: Flink是否可以动态调整任务并行度

2020-10-28 Thread zhisheng
应该不支持 ZT.Ren <18668118...@163.com> 于2020年10月28日周三 下午3:53写道: > 印象中,Flink1.9之后的某个版本支持动态调整并行度,但忘记怎么使用了。有没有哪位同学帮忙指点下,多谢

官方后续会有支持kafka lag metric的计划吗

2020-10-28 Thread silence
目前消费kafka会有lag的情况发生,因此想基于flink metric进行上报监控kakfa的消费延时情况 主要是两种情况: 1、当前group消费的offset和对应topic最大offset之间的差值,也就是积压的数据量 2、当前消费的最新记录的timestamp和系统时间之间的差值,也就是消费的时间延时 kafka lag的监控对实时任务的稳定运行有着非常重要的作用, 网上也检索到了一些基于源码修改的实现,但修改源码的话也不利于后面flink版本的升级,还是希望官方可以考虑支持一下 -- Sent from:

Re:通过算子的构造方法传递变量失效

2020-10-28 Thread Shubin Ruan
hi,可以把部分代码贴出来看一下吗? 在 2020-10-28 17:29:58,"freeza1...@outlook.com" 写道: >hi all: >我定义了1个flatMap,通过构造方法传递了1个int类型的变量, 我在最外层定义了2条流,流定义的时候.flatMap(int)传入了这个变量, > 目前有2个不同的flatmap,构造方法传入的这个int变量为2个不同的值, >当有数据流过这2个算子的时候,发现该int变量并没有发生变化,请如何给算子传递变量。 > > > >freeza1...@outlook.com

Re:关于并行下watermark不生成

2020-10-28 Thread Shubin Ruan
可以考虑在数据源处进行处理: 设置个时间阈值,若检测到某个 key 下的数据超过时间阈值还未更新,则根据系统的 processing time 按照某种逻辑生成1条水印发送到下游。 在 2020-10-28 18:54:22,"BenChen" 写道: >Hi >all,在Flink1.11里面新增了WatermarkStetagy来处理某个并行度下没有数据导致watermark不触发的问题,在1.10里面Flink有什么机制解决这个问题吗?谢谢 > > >| | >BenChen >| >| >haibin...@163.com >| >签名由网易邮箱大师定制 >

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-28 Thread Chesnay Schepler
Then we can't remove it, because there is no way for us to ascertain whether anyone is still using it. Sure, the user ML is the best we got, but you can't argue that we don't want any users to be affected and then use an imperfect mean to find users. If you are fine with relying on the user

Re:退订

2020-10-28 Thread Shubin Ruan
发送邮件到 user-zh-unsubscr...@flink.apache.org 即可完成退订。 在 2020-10-28 19:20:27,"李国鹏" 写道: >退订

Re:退订

2020-10-28 Thread hailongwang
Hi, 退订需要发送邮件到 user-zh-unsubscr...@flink.apache.org 更多详细情况可以参考[1] [1] https://flink.apache.org/community.html#mailing-lists Best, Hailong Wang 在 2020-10-28 18:20:27,"李国鹏" 写道: >退订

退订

2020-10-28 Thread 李国鹏
退订

Re: RestClusterClient and classpath

2020-10-28 Thread Flavio Pompermaier
I'm runnin the code from Eclipse, the jar exists and it contains the classes Flink is not finding..maybe I can try to use IntelliJ in the afternoon On Wed, Oct 28, 2020 at 12:13 PM Chesnay Schepler wrote: > @Kostas: Ah, I missed that. > > @Flavio: the only alternative I can think your jar does

Checkpoint size的问题

2020-10-28 Thread gsralex
Hi, All Checkpoint 一般Web UI显示的是400MB左右,但是查看HDFS实际的大小,不到1MB(_metadata) ,想问下这之间size的偏差为什么这么大?

Re: RestClusterClient and classpath

2020-10-28 Thread Chesnay Schepler
@Kostas: Ah, I missed that. @Flavio: the only alternative I can think your jar does not contain the classes, or does not exist at all on the machine your application is run on. On 10/28/2020 12:08 PM, Kostas Kloudas wrote: Hi all, I will have a look in the whole stack trace in a bit.

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-28 Thread Chesnay Schepler
The alternative could also be to use a different argument than "no one uses it", e.g., we are fine with removing it at the cost of friction for some users because there are better alternatives. On 10/28/2020 10:46 AM, Kostas Kloudas wrote: I think that the mailing lists is the best we can do

Re: RestClusterClient and classpath

2020-10-28 Thread Kostas Kloudas
Hi all, I will have a look in the whole stack trace in a bit. @Chesnay Schepler I think that we are setting the correct classloader during jobgraph creation [1]. Is that what you mean? Cheers, Kostas [1]

Re: adding core-site xml to flink1.11

2020-10-28 Thread Shachar Carmeli
10x On 2020/10/27 10:42:40, Robert Metzger wrote: > Hi, > > it seems that this is what you have to do for now. However, I see that it > would be nice if Flink would allow reading from multiple configuration > files, so that you can have a "common configuration" and a "per cluster" >

关于并行下watermark不生成

2020-10-28 Thread BenChen
Hi all,在Flink1.11里面新增了WatermarkStetagy来处理某个并行度下没有数据导致watermark不触发的问题,在1.10里面Flink有什么机制解决这个问题吗?谢谢 | | BenChen | | haibin...@163.com | 签名由网易邮箱大师定制

Fwd: Kubernetes Job Cluster, does it autoterminate?

2020-10-28 Thread Ruben Laguna
Hi, First time user , I'm just evaluating Flink at the moment, and I was reading https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#deploy-job-cluster and I don't fully understand if a Job Cluster will autoterminate after the job is completed (for at batch job)

Re: RestClusterClient and classpath

2020-10-28 Thread Flavio Pompermaier
Always the same problem. Caused by: java.lang.ClassNotFoundException: it.test.XXX at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589) at

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-28 Thread Ori Popowski
Hi Xintong, See here: # Top memory users ps auxwww --sort -rss | head -10 USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND yarn 20339 35.8 97.0 128600192 126672256 ? Sl Oct15 5975:47 /etc/alternatives/jre/bin/java -Xmx54760833024 -Xms54760833024 -XX:Max root

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-28 Thread Kostas Kloudas
I think that the mailing lists is the best we can do and I would say that they seem to be working pretty well (e.g. the recent Mesos discussion). Of course they are not perfect but the alternative would be to never remove anything user facing until the next major release, which I find pretty

Re: Native memory allocation (mmap) failed to map 1006567424 bytes

2020-10-28 Thread Xintong Song
Hi Ori, The error message suggests that there's not enough physical memory on the machine to satisfy the allocation. This does not necessarily mean a managed memory leak. Managed memory leak is only one of the possibilities. There are other potential reasons, e.g., another process/container on

Re: RestClusterClient and classpath

2020-10-28 Thread Chesnay Schepler
hmm..it appears as if PackagedProgramUtils#createJobGraph does some things outside the usercode classlodaer (getPipelineFromProgram()), specifically the call to the main method. @klou This seems like wrong behavior? @Flavio What you could try in the meantime is wrap the call to

tumbling的窗口更新随着job运行时间越长,delay越久,sliding不会

2020-10-28 Thread marble.zh...@coinflex.com.INVALID
大家好。 我用的tumbling window, ds.keyBy(CandleView::getMarketCode) .timeWindow(Time.minutes(5L)) .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(1))) .aggregate(new OhlcAggregateFunction(), new OhlcWindowFunction()) .addSink(new

通过算子的构造方法传递变量失效

2020-10-28 Thread freeza1...@outlook.com
hi all: 我定义了1个flatMap,通过构造方法传递了1个int类型的变量, 我在最外层定义了2条流,流定义的时候.flatMap(int)传入了这个变量, 目前有2个不同的flatmap,构造方法传入的这个int变量为2个不同的值, 当有数据流过这2个算子的时候,发现该int变量并没有发生变化,请如何给算子传递变量。 freeza1...@outlook.com

Re: RestClusterClient and classpath

2020-10-28 Thread Flavio Pompermaier
Any help here? How can I understand why the classes inside the jar are not found when creating the PackagedProgram? On Tue, Oct 27, 2020 at 11:04 AM Flavio Pompermaier wrote: > In the logs I see that the jar is the classpath (I'm trying to debug the > program from the IDE)..isn'it? > >

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-28 Thread Chesnay Schepler
If the conclusion is that we shouldn't remove it if _anyone_ is using it, then we cannot remove it because the user ML obviously does not reach all users. On 10/28/2020 9:28 AM, Kostas Kloudas wrote: Hi all, I am bringing the up again to see if there are any users actively using the

Re: flink1.11日志上报

2020-10-28 Thread m13162790856
我们这边也是这样搜集日志上报 es 保留最近一个月的数据不回保留全部数据 在 2020年10月27日 20:48,zhisheng 写道: 弱弱的问一下,你们集群作业数量大概多少?因为用户可能打印原始数据在日志里面,这个数据量确实还是很大的,全部将日志打到 ES 每月需要多少成本啊? Storm☀️ 于2020年10月27日周二 下午8:37写道: > 我们也是用的kafkaappender进行日志上报,然后在ES中提供日志检索 > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/ >

the remote task manager was lost

2020-10-28 Thread guanxianchun
flink版本: flink-1.11 taskmanager memory: 8G jobmanager memory: 2G akka.ask.timeout:20s akka.retry-gate-closed-for: 5000 client.timeout:600s 运行一段时间后报the remote task manager was lost ,错误信息如下: 2020-10-28 00:25:30,608 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Completed

Re: [BULK]Re: [SURVEY] Remove Mesos support

2020-10-28 Thread Till Rohrmann
Hi Oleksandr, yes you are right. The biggest problem is at the moment the lack of test coverage and thereby confidence to make changes. We have some e2e tests which you can find here [1]. These tests are, however, quite coarse grained and are missing a lot of cases. One idea would be to add a

Re: flink hive Streaming查询不到数据的问题

2020-10-28 Thread Jingsong Li
Hi, 你的Source看起来并没有产出watermark,所以: 你可以考虑使得Source产出正确的watermark,或者使用'sink.partition-commit.trigger'的默认值proc-time。 Best, Jingsong On Wed, Oct 28, 2020 at 4:13 PM hdxg1101300...@163.com < hdxg1101300...@163.com> wrote: > 你好: > 我现在在使用flink 1.11.2版本 hive1.1.0 版本; > 当我在使用flink hive

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-28 Thread Kostas Kloudas
Hi all, I am bringing the up again to see if there are any users actively using the BucketingSink. So far, if I am not mistaken (and really sorry if I forgot anything), it is only a discussion between devs about the potential problems of removing it. I totally understand Chesnay's concern about

flink hive Streaming查询不到数据的问题

2020-10-28 Thread hdxg1101300...@163.com
你好: 我现在在使用flink 1.11.2版本 hive1.1.0 版本; 当我在使用flink hive streaming的使用发现按照 示例写数据到hive 可以看到指定目录下已经生产了文件,但是使用hive查询没有数据; 好像是分区信息没有提交到hive meta store;但是官网已经说实现了这个功能;我操作却不行 下面是我的代码 object StreamMain { def main(args: Array[String]): Unit = { val streamEnv =

回复: Re: 关于flink-sql 维表join问题

2020-10-28 Thread 史 正超
我最近的写的业务和你差不多,不过我关联的是两张表,一张mysql的维表,一张binlog的流表。最开始我都是left join ,发现只有binlog流表有数据时才计算。 后面 我做嵌套的查询,先与mysql维表inner join(直接join),然后再套一层query 再与流表left join,现在情况正常。就算binlog的流表没有数据也有计算到。 发件人: Jark Wu 发送时间: 2020年10月28日 7:24 收件人: user-zh 主题: Re: Re: 关于flink-sql 维表join问题

flinkSQL针对join操作设置不同窗口

2020-10-28 Thread 奔跑的小飞袁
hello 我们这有一种业务场景是关于两个动态表的join,其中一张表是完全的动态表,去关联另一张动态表中当天的数据,请问这种情况的下join场景支持吗 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Flink是否可以动态调整任务并行度

2020-10-28 Thread ZT.Ren
印象中,Flink1.9之后的某个版本支持动态调整并行度,但忘记怎么使用了。有没有哪位同学帮忙指点下,多谢

Re: flinksql 不支持 % 运算

2020-10-28 Thread Danny Chan
%是非标准的 SQL 语法,不推荐使用。 Benchao Li 于2020年10月26日周一 下午9:26写道: > 1.11的话通过配置是无法实现的。可以把这个pr[1] cherry-pick到1.11的分支上编译一下来实现1.11上使用% > > [1] https://github.com/apache/flink/pull/12818 > > 夜思流年梦 于2020年10月26日周一 下午4:16写道: > > > flink 版本1.11 > > 目前flink-sql 好像不支持取余运算,会报错: > > 比如:SELECT * FROM Orders WHERE a

Re: 请问批处理有反压嘛?

2020-10-28 Thread Danny Chan
有的,反压机制借助于 runtime 的网络 buffer,和批流无关。 请叫我雷锋 <854194...@qq.com> 于2020年10月27日周二 下午8:02写道: > 如题

Re: Working with bounded Datastreams - Flink 1.11.1

2020-10-28 Thread Danny Chan
In SQL, you can use the over window to deduplicate the messages by the id [1], but i'm not sure if there are same semantic operators in DataStream. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#deduplication s_penakalap...@yahoo.com 于2020年10月28日周三

Re: Re: 关于flink-sql 维表join问题

2020-10-28 Thread Jark Wu
因为我理解你是想一天一个全量用户绩效结果表,不覆盖前一天的绩效结果,在我看来这就是批的需求了,因为需要重新读取 source 数据,而这从需求上就是有一天的 delay。 如果你能接受覆盖之前的绩效,那么可能可以使用 mysql-cdc connector 去获取 users 的全量+增量流式 source。 比如 mysql_cdc_users 就是去读取 mysql 中的 users 表,然后可以 left join 上绩效流等等,覆盖更新用户的绩效。 insert into mysql_users_latest_kpi select * from mysql_cdc_users

Re:Re: 关于flink-sql 维表join问题

2020-10-28 Thread 夜思流年梦
批处理的确是可以解决这类问题,只不过不是实时的了,主要是想使用flink-sql解决这类报表问题; 另外问一句, flink-sql 有打算解决这类问题吗?我感觉这个场景还挺多的呢 维表 left join 一张流表, 维表全量数据关联流表,既能获取到实时流表的统计数据,又能保证维表的数据是一个实时更新(或者定期更新)的状态 在 2020-10-27 17:24:05,"Jark Wu" 写道: >我觉得这个更像是一个周期性调度的批处理需求。因为你流处理,只能一直读取员工表的增量,没法每天读个全量。 >是不是用 flink batch +