Re: Re: flink sql processing-time timezone issue

2020-05-11 Thread 刘大龙
Hi, a simple workaround is to manually add 8 hours to proctime yourself; the timezone setting currently doesn't seem to take effect. > -Original Message- > From: hb <343122...@163.com> > Sent: 2020-05-12 13:02:44 (Tuesday) > To: user-zh@flink.apache.org > Cc: > Subject: Re: flink sql processing-time timezone issue > > Can anyone help take a look at this issue? > > > > On 2020-05-01 16:26:35, "hb" <343122...@163.com> wrote: > > > > ``` code
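The "+8 hours" workaround can be illustrated with plain java.time (a sketch of the arithmetic only, not tied to any Flink API; the +8 offset assumes Beijing time):

```java
import java.time.Duration;
import java.time.Instant;

public class ProctimeShift {
    // Flink's proctime is effectively UTC; shifting it by +8h yields
    // the Beijing (UTC+8) wall-clock reading of the same instant.
    static Instant toBeijingWallClock(Instant utcProctime) {
        return utcProctime.plus(Duration.ofHours(8));
    }

    public static void main(String[] args) {
        Instant utc = Instant.parse("2020-05-01T16:26:35Z");
        // Prints 2020-05-02T00:26:35Z, read as local wall-clock time.
        System.out.println(toBeijingWallClock(utc));
    }
}
```

In SQL this corresponds to adding an 8-hour interval to the proctime column in the query itself.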

Re: Flink Session Window emitting a record when the window is created

2020-05-11 Thread gang.gou
OK, I'll look into it some more, thanks. On 2020/5/12 12:01 PM, "Benchao Li" wrote: It looks like what you are using is not the blink planner, because the Trigger in the blink planner is different from the Trigger in DataStream. So if you are using the legacy planner, this approach may work. (I'm not familiar with the legacy planner.) But if you switch to the blink planner later, this won't be possible. gang.gou wrote on Tue, May 12, 2020 at 11:18 AM: >

Re: flink sql processing-time timezone issue

2020-05-11 Thread hb
Can anyone help take a look at this issue? On 2020-05-01 16:26:35, "hb" <343122...@163.com> wrote: ``` code val env = StreamExecutionEnvironment.getExecutionEnvironment val settings: EnvironmentSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build() val tEnv: StreamTableEnvironment =

Re: Preserve record orders after WINDOW function

2020-05-11 Thread Jark Wu
Yes, that's right. On Tue, 12 May 2020 at 10:55, Jiahui Jiang wrote: > Thank you for confirming! > > Just want to make sure my understanding of the internal implementation is > correct: > > When applying an over window and ordered by processing time using SQL, the > datastream plan it

Re: Flink Session Window emitting a record when the window is created

2020-05-11 Thread Benchao Li
It looks like what you are using is not the blink planner, because the Trigger in the blink planner is different from the Trigger in DataStream. So if you are using the legacy planner, this approach may work. (I'm not familiar with the legacy planner.) But if you switch to the blink planner later, this won't be possible. gang.gou wrote on Tue, May 12, 2020 at 11:18 AM: > I want to implement this requirement with a trigger; the idea is to override EventTimeTrigger > so that a purge is fired when the first record enters the system, tracking the state with a ValueStatus >

Re: Broadcast state vs data enrichment

2020-05-11 Thread Manas Kale
Sure. Apologies for not making this clear enough. > each operator only stores what it needs. Let's imagine this setup: BROADCAST STREAM config-stream | |

Re: Prometheus pushgateway monitoring Flink metrics issue

2020-05-11 Thread 李佳宸
Thanks very much~~~ But indeed, when RandomJobNameSuffix is true I have no problem, which is strange. Also, I found the prometheus reporter exposes far fewer metrics than pushgateway; have you seen that too? 972684638 wrote on Tue, May 12, 2020 at 10:22 AM: > I'm not sure whether this counts as a BUG, but I have indeed run into the problem you describe, and resolved it after a period of troubleshooting. > >

Re: Flink Session Window emitting a record when the window is created

2020-05-11 Thread gang.gou
I want to implement this requirement with a trigger; the idea is to override EventTimeTrigger so that a purge is fired when the first record enters the system, tracking the state with a ValueStatus. The problem I'm running into now is that it fires repeatedly, and on the trigger at window close, the records received by the WindowFunction are also wrong. Am I using it incorrectly? The custom trigger: public class SessionComputeTrigger extends Trigger { private static final long serialVersionUID = 1L; static

Re: Preserve record orders after WINDOW function

2020-05-11 Thread Jiahui Jiang
Thank you for confirming! Just want to make sure my understanding of the internal implementation is correct: When applying an over window ordered by processing time using SQL, the datastream plan it translates into doesn't actually have any order-by logic. It just sequentially processes all

Re: 1.10: array index out of bounds when implementing top-1 with flinkSQL's row_number() function, asking the community for help

2020-05-11 Thread Jark Wu
Thanks for reporting the problem. This should be a bug; I've opened an issue to track it: https://issues.apache.org/jira/browse/FLINK-17625 Best, Jark On Tue, 12 May 2020 at 09:25, 1101300123 wrote: > > > My SQL statement is as follows, some fields omitted > select > a.contact_id, > ... > a.code_contact_channel > from > ( > select > contact_id, >

Re: Blink Planner fails to generate RowtimeAttribute for custom TableSource

2020-05-11 Thread Jark Wu
Thanks for the investigation, and I think yes, this is a bug and is going to be fixed in FLINK-16160. Best, Jark On Tue, 12 May 2020 at 02:28, Khachatryan Roman wrote: > Hi Yuval, > > Thanks for reporting this issue. I'm pulling in Timo and Jark who are > working on the SQL component. They

Re: Prometheus pushgateway monitoring Flink metrics issue

2020-05-11 Thread 972684638
I'm not sure whether this counts as a BUG, but I have indeed run into the problem you describe, and resolved it after a period of troubleshooting. … metrics.reporter.promgateway.randomJobNameSuffix … pushgateway … GET … POST

Re: Flink Memory analyze on AWS EMR

2020-05-11 Thread Xintong Song
Hi Jacky, Could you search for "Application Master start command:" in the debug log and post the result and a few lines before & after that? This is not included in the clip of attached log file. Thank you~ Xintong Song On Tue, May 12, 2020 at 5:33 AM Jacky D wrote: > hi, Robert > > Thanks

Re: Preserve record orders after WINDOW function

2020-05-11 Thread Jark Wu
Hi Jiahui, Yes, if they arrive at the same millisecond, they are preserved in arrival order. Best, Jark On Mon, 11 May 2020 at 23:17, Jiahui Jiang wrote: > Hello! I'm writing a SQL query with an OVER window function ordered by > processing time. > > I'm wondering since timestamp is only
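The tie-breaking guarantee Jark describes can be demonstrated with a stable sort over timestamps — a plain-Java sketch of the semantics (hypothetical Event type), not Flink internals:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieOrder {
    record Event(long tsMillis, String payload) {}

    // A stable sort keeps the original (arrival) order for equal timestamps,
    // which mirrors how same-millisecond records keep their arrival order.
    static List<Event> sortByTime(List<Event> arrived) {
        List<Event> out = new ArrayList<>(arrived);
        out.sort(Comparator.comparingLong(Event::tsMillis));
        return out;
    }

    public static void main(String[] args) {
        List<Event> arrived = List.of(
            new Event(100L, "a"),   // arrives first
            new Event(100L, "b"),   // same millisecond, arrives second
            new Event(99L,  "c"));
        // "c" sorts first; "a" stays before "b" despite the equal timestamp.
        System.out.println(sortByTime(arrived));
    }
}
```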

Re: Troubleshooting severe Flink backpressure

2020-05-11 Thread 1101300123
Check the downstream operator's processing code; it is often a case of insufficient downstream processing capacity. On May 12, 2020 09:32, shimin huang wrote: Hello aven.wu: You can look at each operator's metrics, such as buffers.outPoolUsage, buffers.inPoolUsage, buffers.inputFloatingBuffersUsage, buffers.inputExclusiveBuffersUsage. - If a subtask's sender-side buffer usage is high, it is being rate-limited by downstream backpressure; if a subtask's receiver-side buffer

Re: Troubleshooting severe Flink backpressure

2020-05-11 Thread shimin huang
Hello aven.wu: You can look at each operator's metrics, such as buffers.outPoolUsage, buffers.inPoolUsage, buffers.inputFloatingBuffersUsage, buffers.inputExclusiveBuffersUsage. - If a subtask's sender-side buffer usage is high, it is being rate-limited by downstream backpressure; if a subtask's receiver-side buffer usage is high, it is propagating the backpressure upstream. - outPoolUsage and inPoolUsage both low or both high respectively indicate that the current
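The diagnostic rules above can be written down as a tiny classifier (a sketch; the 0.5 "high" threshold is an illustrative assumption, not a Flink constant):

```java
public class BackpressureHint {
    static final double HIGH = 0.5; // illustrative cutoff for "high" usage

    // outPoolUsage high -> this subtask is throttled by downstream backpressure.
    // inPoolUsage high  -> this subtask is propagating backpressure upstream.
    // Both high         -> pressure is passing through; the bottleneck is further downstream.
    static String diagnose(double outPoolUsage, double inPoolUsage) {
        boolean outHigh = outPoolUsage > HIGH;
        boolean inHigh = inPoolUsage > HIGH;
        if (outHigh && inHigh) return "pressure passing through; bottleneck further downstream";
        if (outHigh) return "throttled by downstream";
        if (inHigh) return "propagating backpressure upstream";
        return "no backpressure";
    }
}
```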

1.10: array index out of bounds when implementing top-1 with flinkSQL's row_number() function, asking the community for help

2020-05-11 Thread 1101300123
My SQL statement is as follows, some fields omitted: select a.contact_id, ... a.code_contact_channel from ( select contact_id, service_no, ... code_contact_channel, row_number() over(partition by contact_id,service_no order by operate_time desc) as rn from table1 )a join (
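What the rn = 1 pattern computes — keep the latest row per (contact_id, service_no) — can be sketched in plain Java (hypothetical Row type; this mirrors the SQL semantics, not the planner's operator):

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopOne {
    record Row(String contactId, String serviceNo, long operateTime, String channel) {}

    // Equivalent of: row_number() over (partition by contact_id, service_no
    //                                   order by operate_time desc) = 1
    static Collection<Row> latestPerKey(List<Row> rows) {
        Map<List<String>, Row> best = new LinkedHashMap<>();
        for (Row r : rows) {
            best.merge(List.of(r.contactId(), r.serviceNo()), r,
                (existing, incoming) ->
                    incoming.operateTime() > existing.operateTime() ? incoming : existing);
        }
        return best.values();
    }
}
```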

Prometheus pushgateway monitoring Flink metrics issue

2020-05-11 Thread 李佳宸
Hello! While using prometheus to monitor flink I found a problem; I'm not sure whether it is a bug. Details below. Version info: Flink 1.9.1, Prometheus 2.18, pushgateway 1.2.0. Problem: after configuring metrics.reporter.promgateway.randomJobNameSuffix to false, some metrics are not pushed to the pushgateway correctly. Specifically, some metrics (mainly jobmanager-related ones such as flink_jobmanager_Status_JVM_CPU_Load

Re: Flink Memory analyze on AWS EMR

2020-05-11 Thread Jacky D
hi, Robert Thanks so much for the quick reply; I changed the log level to debug and attached the log file. Thanks Jacky Robert Metzger wrote on Mon, May 11, 2020 at 4:14 PM: > Thanks a lot for posting the full output. > > It seems that Flink is passing an invalid list of arguments to the JVM. > Can you > -

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread hemant singh
Hello Roman, PFB my response - As I understand, each protocol has a distinct set of event types (where event type == metrics type); and a distinct set of devices. Is this correct? Yes, correct. Distinct events and devices. Each device emits these events. > Based on data protocol I have 4-5

ProducerRecord with Kafka Sink for 1.8.0

2020-05-11 Thread Nick Bendtner
Hi guys, I use the 1.8.0 version of flink-connector-kafka. Do you have any recommendations on how to produce a ProducerRecord from a kafka sink? Looking to add support for kafka headers, therefore thinking about ProducerRecord. If you have any thoughts they are highly appreciated. Best, Nick.

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread Khachatryan Roman
Hi Hemant, As I understand, each protocol has a distinct set of event types (where event type == metrics type); and a distinct set of devices. Is this correct? > Based on data protocol I have 4-5 topics. Currently the data for a single event is being pushed to a partition of the kafka

Re: Not able to implement a use case

2020-05-11 Thread Jaswin Shah
If I go with the table APIs, can I write the streams to hive, or is it only for batch processing as of now? Get Outlook for Android From: Khachatryan Roman Sent: Tuesday, May 12, 2020 1:49:10 AM To: Jaswin Shah Cc: user@flink.apache.org

Re: Not able to implement a use case

2020-05-11 Thread Khachatryan Roman
Hi Jaswin, Currently, DataStream API doesn't support outer joins. As a workaround, you can use coGroup function [1]. Hive is also not supported by DataStream API though it's supported by Table API [2]. [1]
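The coGroup workaround of [1] boils down to full-outer-join semantics: unlike a plain join, you also see keys that matched on only one side. A plain-Java sketch over two keyed collections (hypothetical key/value shapes, not the DataStream API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OuterJoinSketch {
    // Emits "key: left|right" per key; "null" stands in for the missing side,
    // which is what coGroup lets you observe and an inner join would drop.
    static List<String> fullOuterJoin(Map<String, String> left, Map<String, String> right) {
        Set<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            out.add(k + ": " + left.getOrDefault(k, "null")
                      + "|" + right.getOrDefault(k, "null"));
        }
        return out;
    }
}
```

In a CoGroupFunction the two iterables per key play the role of the two map lookups here; an empty iterable on one side marks the discrepant records.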

Re: Flink Memory analyze on AWS EMR

2020-05-11 Thread Robert Metzger
Thanks a lot for posting the full output. It seems that Flink is passing an invalid list of arguments to the JVM. Can you - set the root log level in conf/log4j-yarn-session.properties to DEBUG - then launch the YARN session - share the log file of the yarn session on the mailing list? I'm

Re: Flink Memory analyze on AWS EMR

2020-05-11 Thread Jacky D
Hi, Robert Yes, I tried to retrieve more log info from the yarn UI; the full logs are shown below. This happens when I try to create a flink yarn session on EMR after setting up the JITWatch configuration. 2020-05-11 19:06:09,552 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error

Re: Flink Memory analyze on AWS EMR

2020-05-11 Thread Robert Metzger
Hey Jacky, The error says "The YARN application unexpectedly switched to state FAILED during deployment.". Have you tried retrieving the YARN application logs? Does the YARN UI / resource manager logs reveal anything on the reason for the deployment to fail? Best, Robert On Mon, May 11, 2020

Fwd: Flink Memory analyze on AWS EMR

2020-05-11 Thread Jacky D
-- Forwarded message - From: Jacky D Date: Mon, May 11, 2020 at 3:12 PM Subject: Re: Flink Memory analyze on AWS EMR To: Khachatryan Roman Hi, Roman Thanks for the quick response. I tried without the logFile option but failed with the same error. I'm currently using flink 1.6

Re: Flink Memory analyze on AWS EMR

2020-05-11 Thread Khachatryan Roman
Hi Jacky, Did you try it without -XX:LogFile=${FLINK_LOG_PREFIX}.jit ? Probably, Flink can't write to this location. Also, you can try other tools described at https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/application_profiling.html Regards, Roman On Mon, May 11, 2020 at

Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread hemant singh
Hi, I have different events from a device which constitute different metrics for the same device. Each of these events is produced by the device at intervals of a few milliseconds to a minute. Event1(Device1) -> Stream1 -> Metric 1 Event2 (Device1) -> Stream2 -> Metric 2 ... .. ...

Re: Testing jobs locally against secure Hadoop cluster

2020-05-11 Thread Khachatryan Roman
Hi Őrhidi, Can you please provide some details about the errors you get? Regards, Roman On Mon, May 11, 2020 at 9:32 AM Őrhidi Mátyás wrote: > Dear Community, > > I'm having troubles testing jobs against a secure Hadoop cluster. Is that > possible? The mini cluster seems to not load any

Re: Blink Planner fails to generate RowtimeAttribute for custom TableSource

2020-05-11 Thread Khachatryan Roman
Hi Yuval, Thanks for reporting this issue. I'm pulling in Timo and Jark who are working on the SQL component. They might be able to help you with your problem. Regards, Roman On Mon, May 11, 2020 at 9:10 AM Yuval Itzchakov wrote: > Hi, > While migrating from Flink 1.9 -> 1.10 and from the

Re: Broadcast state vs data enrichment

2020-05-11 Thread Khachatryan Roman
Hi Manas, The approaches you described look the same: > each operator only stores what it needs. > each downstream operator will "strip off" the config parameter that it needs. Can you please explain the difference? Regards, Roman On Mon, May 11, 2020 at 8:07 AM Manas Kale wrote: > Hi, > I

Re: MongoSink

2020-05-11 Thread Khachatryan Roman
Hi Aissa, What is BSONWritable you pass from map to sink? I guess it's not serializable which causes Flink to use kryo, which fails. Regards, Roman On Sun, May 10, 2020 at 10:42 PM Aissa Elaffani wrote: > Hello Guys, > I am trying to sink my data to MongoDB, But i got some errors. I am >

Not able to implement a use case

2020-05-11 Thread Jaswin Shah
Hi, I want to implement the below use case in my application: I am doing an interval join between two data streams and then, in a process function, catching the discrepant results of the join. Joining is done on the key orderId. Now, I want to identify all the messages in both datastreams which are

Preserve record orders after WINDOW function

2020-05-11 Thread Jiahui Jiang
Hello! I'm writing a SQL query with an OVER window function ordered by processing time. I'm wondering, since the timestamp only has millisecond granularity: for a query using an over window and sorted on the processing time column, for example, ``` SELECT col1, max(col2) OVER (PARTITION BY col1, ORDER
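For reference, what such a query computes row-by-row is a per-partition running max — sketched in plain Java (illustrative only, not the planner's actual operator):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RunningMax {
    // For each incoming (col1, col2) in processing order, emit the max col2
    // seen so far for that col1 -- the unbounded-preceding OVER window.
    static List<Integer> overMax(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> maxSoFar = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : stream) {
            int m = Math.max(maxSoFar.getOrDefault(e.getKey(), Integer.MIN_VALUE),
                             e.getValue());
            maxSoFar.put(e.getKey(), m);
            out.add(m);
        }
        return out;
    }
}
```

Because the state is only ever read and updated in input order, no sorting step is involved, which is why arrival order decides ties.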

Flink Memory analyze on AWS EMR

2020-05-11 Thread Jacky D
hi, All I'm encountering a memory issue with my flink job on AWS EMR (current flink version 1.6.2), and I would like to find the root cause, so I'm trying JITWatch; it works on my local standalone cluster but I can not use it on EMR. After I add the following config to flink-conf.yaml: env.java.opts:
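For comparison, the JITWatch setup described in Flink's application-profiling docs looks roughly like the following flink-conf.yaml fragment (the flag set follows those docs; whether EMR's JVM accepts all of them, in particular PrintAssembly, is a separate question):

```yaml
env.java.opts: "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"
```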

Troubleshooting severe Flink backpressure

2020-05-11 Thread aven . wu
Hello everyone, I ran into a Flink backpressure problem today: the job is completely stuck and no longer processing data. From the monitoring page it appears that keyProcess-> sink: alarm state has trouble processing data, causing backpressure on the upstream ruleProcess. KeyProcess defines a MapState, and every incoming record reads and updates the state's contents. The Sink writes to kafka; kafka has already been ruled out as the cause. http://qiniu.lgwen.cn/13F2DE58-98C1-4851-B54A-6BDC3C646169.png, http://qiniu.lgwen.cn/image/jvm.png

Re: Doubts about the watermark alignment logic in the flink source code

2020-05-11 Thread Yun Tang
Hi, precisely because the minimum across input channels is taken, if one upstream never receives real data, the watermark it sends downstream stays Long.MIN_VALUE, which prevents windows from being triggered. The community works around this with idle sources [1]. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_time.html#idling-sources Best, Yun Tang From: Benchao Li Sent:

Re: Flink REST API side effect?

2020-05-11 Thread Chesnay Schepler
yes, that is correct. On 11/05/2020 14:28, Tomasz Dudziak wrote: Thanks for reply. So do I understand correctly if I say that whenever you query job status it gets cached for a configurable amount of time and subsequent queries within that time period will not show any change?

RE: Flink REST API side effect?

2020-05-11 Thread Tomasz Dudziak
Thanks for reply. So do I understand correctly if I say that whenever you query job status it gets cached for a configurable amount of time and subsequent queries within that time period will not show any change? From: Chesnay Schepler Sent: 11 May 2020 13:20 To: Tomasz Dudziak ;

Re: Flink REST API side effect?

2020-05-11 Thread Chesnay Schepler
This is expected, the backing data structure is cached for a while so we never hammer the JobManager with requests. IIRC this is controlled via "web.refresh-interval", with the default being 3 seconds. On 11/05/2020 14:10, Tomasz Dudziak wrote: Hi, I found an interesting behaviour of the
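The cache interval can be tuned in flink-conf.yaml (the value is in milliseconds; 3000 being the default matches the 3 seconds mentioned above):

```yaml
web.refresh-interval: 3000
```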

Flink REST API side effect?

2020-05-11 Thread Tomasz Dudziak
Hi, I found an interesting behaviour of the REST API in my automated system tests using that API where I was getting status of a purposefully failing job. If you query job details immediately after job submission, subsequent queries will return its status as RUNNING for a moment until Flink's

Re: Re: flink-1.10 on yarn log output issue

2020-05-11 Thread 1101300123
…yarn… On May 11, 2020 at 18:17, 1193216154 <1193216...@qq.com> wrote: …conf…log4j… -Original Message- From: "1101300123"

Re: Re: flink-1.10 on yarn log output issue

2020-05-11 Thread 1193216154
…conf…log4j… -Original Message- From: "1101300123"

Re: Re: flink-1.10 on yarn log output issue

2020-05-11 Thread 1101300123
This logging problem is driving me crazy too; I simply can't find where the logs are, and the .err file doesn't contain my own print output either. On May 11, 2020 14:15, guaishushu1...@163.com wrote: The Yarn webUI can't find the log content either; the logs all go into the .err file, and neither flink nor yarn shows them. guaishushu1...@163.com From: LakeShen Sent: 2020-05-09 11:18 To: user-zh Subject: Re: flink-1.10 on yarn log output issue For Yarn logs, just go to the Yarn Web UI

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-11 Thread Tzu-Li (Gordon) Tai
In that case, the most possible cause would be https://issues.apache.org/jira/browse/FLINK-16313, which is included in Flink 1.10.1 (to be released) The release candidates for Flink 1.10.1 is currently ongoing, would it be possible for you to try that out and see if the error still occurs? On

Re: Doubts about the watermark alignment logic in the flink source code

2020-05-11 Thread Benchao Li
Hi, I think your understanding is correct. The watermark takes the minimum across the input channels as the current subtask's watermark. 1193216154 <1193216...@qq.com> wrote on Mon, May 11, 2020 at 3:17 PM: > Hi all, recently while reading source-code analyses of watermark propagation, I have some doubts about the watermark alignment logic. The code is as follows: > > public void inputWatermark(Watermark watermark, int channelIndex) { > // ignore the input
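The alignment rule Benchao confirms — the subtask watermark is the minimum over per-channel watermarks — can be sketched as follows (a simplified plain-Java sketch loosely modeled on StatusWatermarkValve; channel-idleness handling is omitted):

```java
import java.util.Arrays;

public class WatermarkValveSketch {
    final long[] channelWatermarks;
    long lastOutput = Long.MIN_VALUE;

    WatermarkValveSketch(int channels) {
        channelWatermarks = new long[channels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    // Returns the new overall watermark, or null if it did not advance.
    Long inputWatermark(long watermark, int channelIndex) {
        if (watermark <= channelWatermarks[channelIndex]) return null; // stale, ignore
        channelWatermarks[channelIndex] = watermark;
        long min = Arrays.stream(channelWatermarks).min().getAsLong();
        if (min > lastOutput) {
            lastOutput = min;
            return min; // overall watermark advanced to the channel minimum
        }
        return null;
    }
}
```

This also shows the idle-source problem from the other reply: a channel stuck at Long.MIN_VALUE pins the minimum forever.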

Re: Flink Session Window emitting a record when the window is created

2020-05-11 Thread Benchao Li
There is probably no way to do this at the moment. Or you could implement it with two queries? For example, one query computes the first_value; the second is the real session window. gang.gou wrote on Mon, May 11, 2020 at 3:07 PM: > Hi, > > I want to use Flink's Session Window > to compute a user's online session duration and count the currently active users. At the moment a record containing the window start and end time is only emitted when the window ends. How can I emit a record when the window is created, and then update that record when it ends? Thanks! > > -- Benchao Li School of

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-11 Thread luisfaamaral
Thanks Gordon and Seth for the reply. So.. the main project contains the below flink dependencies... And the state processor project contains the following: 1.9.0 At first sight I'd say all the libraries in both projects match the 1.9.0 flink libraries. -- Sent from:

Flink PB serialization question

2020-05-11 Thread chanamper
Hi all, a question about Flink serialization. DataStream windowCounts = env .addSource() .transform(); In my flink job the object type is Record, and the Head field of Record is PB-encoded data; I registered a PB serializer for the Head class: env.getConfig().registerTypeWithKryoSerializer( Head.class, ProtobufSerializer.class);

Re: flink on yarn consuming kerberos-enabled kafka

2020-05-11 Thread Jacky Lau
To consume from a kerberos-enabled kafka environment, you need to add three kafka parameters to the DDL properties. Sent from my iPhone -- Original Message -- From: 蒋佳成(Jiacheng Jiang) <920334...@qq.com> Sent: May 9, 2020 16:06 To: user-zh

Testing jobs locally against secure Hadoop cluster

2020-05-11 Thread Őrhidi Mátyás
Dear Community, I'm having troubles testing jobs against a secure Hadoop cluster. Is that possible? The mini cluster seems to not load any security modules. Thanks, Matyas

Doubts about the watermark alignment logic in the flink source code

2020-05-11 Thread 1193216154
Hi all, recently while reading source-code analyses of watermark propagation, I have some doubts about the watermark alignment logic. The code is as follows: public void inputWatermark(Watermark watermark, int channelIndex) { // ignore the input watermark if its input channel, or all input channels are idle (i.e. overall the valve is idle).

Blink Planner fails to generate RowtimeAttribute for custom TableSource

2020-05-11 Thread Yuval Itzchakov
Hi, While migrating from Flink 1.9 -> 1.10 and from the old planner to Blink, I've run into an issue where the Blink Planner doesn't take into account the RowtimeAttribute defined on a custom table source. I've opened an issue: https://issues.apache.org/jira/browse/FLINK-17600 and was wondering if

Flink Session Window emitting a record when the window is created

2020-05-11 Thread gang.gou
Hi, I want to use Flink's Session Window to compute a user's online session duration and count the currently active users. At the moment a record containing the window start and end time is only emitted when the window ends. How can I emit a record when the window is created, and then update that record when it ends? Thanks!

Re: flink1.9, FSstatebackend, checkpoint failure

2020-05-11 Thread Congxian Qiu
Hi, from the log you provided, the error is "No lease on /flink/checkpoints/OnlineOrderAddressMatch/8183afb213bfbd32b49e3a6bab977f7c/chk-25/d993f40c-a8a1-4582-ad7b-ac47a2b163a0 (inode 25791993): File does not exist. Holder DFSClient_NONMAPREDUCE_1212990289_95 does not have any open files." In other words, the file no longer exists; you need to find out why. Perhaps you can start from

Re: Setting the MySQL write parallelism parameter

2020-05-11 Thread Leonard Xu
Hi, currently no; SQL does not yet support setting the parallelism of an individual operator, only of the whole job. Best, Leonard Xu > On May 11, 2020 at 14:36, Senior.Hu <463302...@qq.com> wrote: > > Hi All, when writing to MySQL with Flink1.10 SQL, deadlock problems occur frequently. > I currently want to solve this by limiting the write parallelism in the CREATE TABLE DDL, but I couldn't find a corresponding parameter for the JDBC > Connector on the website. Is such a parameter supported?

Setting the MySQL write parallelism parameter

2020-05-11 Thread Senior.Hu
Hi All, when writing to MySQL with Flink1.10 SQL, deadlock problems occur frequently. I currently want to solve this by limiting the write parallelism in the CREATE TABLE DDL, but I couldn't find a corresponding parameter for the JDBC Connector.

Re: Re: flink-1.10 on yarn log output issue

2020-05-11 Thread guaishushu1...@163.com
The Yarn webUI can't find the log content either; the logs all go into the .err file, and neither flink nor yarn shows them. guaishushu1...@163.com From: LakeShen Sent: 2020-05-09 11:18 To: user-zh Subject: Re: flink-1.10 on yarn log output issue For Yarn logs, just go to the Yarn Web UI using the job's Application ID. Best, LakeShen guaishushu1...@163.com wrote on Fri, May 8, 2020 at 3:43 PM: >

Broadcast state vs data enrichment

2020-05-11 Thread Manas Kale
Hi, I have a single broadcast message that contains configuration data consumed by different operators. For eg: config = { "config1" : 1, "config2" : 2, "config3" : 3 } Operator 1 will consume config1 only, operator 2 will consume config2 only etc. - Right now in my implementation the config