Re: Implementation of setBufferTimeout(timeoutMillis)

2020-08-30 Thread Yun Gao
Hi Pankaj, I think it should be in org.apache.flink.runtime.io.network.api.writer.RecordWriter$OutputFlusher. Best, Yun -- From: Pankaj Chand Date: 2020/08/31 02:40:15 To: user Subject: Implementation of

Packaging multiple Flink jobs from a single IntelliJ project

2020-08-30 Thread Manas Kale
Hi, I have an IntelliJ project that has multiple classes with main() functions. I want to package this project as a JAR that I can submit to the Flink cluster and specify the entry class when I start the job. Here are my questions: - I am not really familiar with Maven and would appreciate

runtime memory management

2020-08-30 Thread lec ssmi
Hi: Generally speaking, when we submit a Flink program, the number of TaskManagers and the memory of each TaskManager are specified. The smallest real execution unit of Flink should be the operator. Since the calculation logic corresponding to each operator is different, some need to save the

Re: Collecting Flink job logs to Kafka: can each job's id or name be added in the logback configuration file?

2020-08-30 Thread Jim Chen
I am also on Flink 1.10.1. With your approach, would I have to edit `env.java.opts: -Djob.name=xxx` in flink-conf.yaml every time I start a job? If so, isn't that too cumbersome? zilong xiao wrote on Mon, Aug 31, 2020 at 12:08 PM: > May I ask which Flink version you are using? > For Flink 1.10- you can add -yD jobName=xxx in the shell script and then read it in a custom logback PatternLayout via the environment variable `_DYNAMIC_PROPERTIES` > For Flink
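A minimal sketch of the idea discussed in this thread, assuming the job name is passed as a JVM system property (for example via `env.java.opts: -Djob.name=xxx`): instead of subclassing PatternLayout as the posters do, this variant uses a logback ClassicConverter that reads the property, so a conversion word such as %jobName can be added to the pattern that ships logs to Kafka. The class and property names are hypothetical, and the converter would still need to be registered in logback.xml with a conversionRule entry.

import ch.qos.logback.classic.pattern.ClassicConverter;
import ch.qos.logback.classic.spi.ILoggingEvent;

// Hypothetical converter: resolves the job name from a JVM system property that is set
// per job (e.g. via `env.java.opts: -Djob.name=xxx`); referenced from logback.xml as a
// custom conversion word such as %jobName.
public class JobNameConverter extends ClassicConverter {
    @Override
    public String convert(ILoggingEvent event) {
        return System.getProperty("job.name", "unknown-job");
    }
}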

Re: Collecting Flink job logs to Kafka: can each job's id or name be added in the logback configuration file?

2020-08-30 Thread zilong xiao
May I ask which Flink version you are using? For Flink 1.10- you can add -yD jobName=xxx in the shell script and then read it in a custom logback PatternLayout via the environment variable `_DYNAMIC_PROPERTIES`. For Flink 1.10+ the above approach does not work, because in 1.10+ the job start executes launch_container.sh

Re: Flink 1.11: how to set process_time and event_time when reading Hive as a stream?

2020-08-30 Thread sllence
Hi Zou Dan: You could try whether a statement like the following works: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table CREATE TABLE Orders ( user BIGINT, product STRING, order_time TIMESTAMP(3) ) WITH ( 'connector' = 'kafka', 'scan.startup.mode' = 'earliest-offset' );
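A self-contained sketch of what the linked page describes, declaring both time attributes in DDL: a WATERMARK clause for event time and a computed PROCTIME() column for processing time. The table name, columns and connector options are illustrative, not the poster's actual schema.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class EventTimeDdlSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(
                env,
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // WATERMARK declares order_time as the event-time attribute;
        // the computed proc_time column declares processing time.
        tableEnv.executeSql(
                "CREATE TABLE Orders (" +
                "  user_id BIGINT," +
                "  product STRING," +
                "  order_time TIMESTAMP(3)," +
                "  proc_time AS PROCTIME()," +
                "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'orders'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'format' = 'json'," +
                "  'scan.startup.mode' = 'earliest-offset'" +
                ")");
    }
}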

Re: Why consecutive calls of orderBy are forbidden?

2020-08-30 Thread hongfanxo
Hi, Thanks for your reply. I'll look into the Blink planner later. For the workaround you mentioned, in the actual usage the second orderBy is wrapped in a function, so I have no idea what has been done to the input Table.

Re: Collecting Flink job logs to Kafka: can each job's id or name be added in the logback configuration file?

2020-08-30 Thread Jim Chen
I currently submit per-job mode jobs with a shell script; I can only get the YARN applicationId, not the custom job name. zilong xiao wrote on Thu, Aug 27, 2020 at 7:24 PM: > This can be done if the job is submitted via the CLI > > Jim Chen wrote on Thu, Aug 27, 2020 at 6:13 PM: > > > If it is a custom PatternLayout, I have a few questions: > > > > >

Re: Flink 1.11 MySQL connection problem

2020-08-30 Thread Danny Chan
There is already an issue tracking this problem [1] [1] https://issues.apache.org/jira/browse/FLINK-12494 Best, Danny Chan On Aug 28, 2020 at 3:02 PM +0800, user-zh@flink.apache.org wrote: > > CommunicationsException

Re: Flink 1.11 time functions

2020-08-30 Thread Danny Chan
The English term deterministic function may make this easier to understand ~ Best, Danny Chan On Aug 29, 2020 at 6:23 PM +0800, Dream-底限 wrote: > Oh, okay. When I used NOW yesterday it directly threw an error telling me this is a bug and asking me to file an issue, so I assumed functions marked this way were broken > > Benchao Li wrote on Fri, Aug 28, 2020 at 8:01 PM: > > > Non-deterministic means the function's return value is dynamic and may differ on every call. > > The opposite is a deterministic function; for example, concat is deterministic: as long as the input is the same, its return value is always the same. > >
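A minimal, hypothetical UDF sketch to illustrate the terminology in this thread: like NOW(), it returns a different value on every call, so it reports itself as non-deterministic and the planner will not constant-fold it; a function like concat would keep the default isDeterministic() = true.

import org.apache.flink.table.functions.ScalarFunction;

// Hypothetical non-deterministic scalar function: the result changes per invocation,
// so isDeterministic() must return false (deterministic functions keep the default true).
public class CurrentMillis extends ScalarFunction {

    public long eval() {
        return System.currentTimeMillis();
    }

    @Override
    public boolean isDeterministic() {
        return false;
    }
}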

Re: Flink 1.10.1/1.11.1: state keeps growing after SQL group-by and time-window operations

2020-08-30 Thread Danny Chan
Could you provide the complete query to make this easier to track down and troubleshoot? ~ Best, Danny Chan On Aug 31, 2020 at 10:58 AM +0800, zhuyuping <1050316...@qq.com> wrote: > I ran into the same problem using SQL. What could the reason be? With a tumbling window, when a MapView is used the state keeps growing > as if it cannot be cleaned up. Normally a window's state is cleaned after the window ends, but here a 1-second tumbling window grew from 1 MB at the start to 1 GB after an hour > and keeps growing without bound

Re: flink watermark strategy

2020-08-30 Thread Danny Chan
Watermarks mainly serve windows in handling late-arriving data; they actually reduce your performance. Best, Danny Chan On Aug 29, 2020 at 3:09 AM +0800, Vijayendra Yadav wrote: > Hi Team, > > For a regular unbounded streaming application streaming through Kafka, which does not use any event time or window
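A short sketch of the point being made, using the Flink 1.11 WatermarkStrategy API on a made-up stream of (key, timestamp) pairs: watermarks are only needed when event-time windows or timers are used downstream, and a purely processing-time pipeline can use noWatermarks() and skip that overhead.

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WatermarkChoiceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical stream of (key, eventTimestampMillis) records.
        DataStream<Tuple2<String, Long>> events =
                env.fromElements(Tuple2.of("a", 1_000L), Tuple2.of("b", 2_000L));

        // Needed only if event-time windows or timers are used downstream.
        events.assignTimestampsAndWatermarks(
                WatermarkStrategy
                        .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((event, recordTs) -> event.f1))
              .print();

        // A purely processing-time pipeline can skip watermark generation entirely.
        events.assignTimestampsAndWatermarks(WatermarkStrategy.noWatermarks()).print();

        env.execute("watermark-choice-sketch");
    }
}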

Re: Options for limiting state size in TableAPI

2020-08-30 Thread Danny Chan
Thanks for sharing ~ The query you gave is actually an interval join [1]; a windowed join is two windowed streams joined together, see [2]. Theoretically, for an interval join, the state is cleaned periodically based on the watermark and allowed lateness once the range of the RHS has been

Re: Will flink-sql-gateway still be updated?

2020-08-30 Thread shougou
Thank you all for your hard work; I am going to try it today.

Re: Flink 1.10.1/1.11.1: state keeps growing after SQL group-by and time-window operations

2020-08-30 Thread zhuyuping
I ran into the same problem using SQL. What could the reason be? With a tumbling window, when a MapView is used the state keeps growing as if it cannot be cleaned up. Normally a window's state is cleaned after the window ends, but here a 1-second tumbling window grew from 1 MB at the start to 1 GB after an hour and keeps growing without bound

Re: Flink 1.10.1/1.11.1: state keeps growing after SQL group-by and time-window operations

2020-08-30 Thread zhuyuping
I see the same problem here. I switched to the filesystem state backend and the state still grows slowly in the same way, so it should be unrelated to RocksDB

Flink 1.10 SQL state: aggregate function with window statistics, state keeps growing and is not cleaned up automatically

2020-08-30 Thread zhuyuping
This is similar to the problem at http://apache-flink.147419.n8.nabble.com/flink1-10-1-1-11-1-sql-group-td5491.html I have tried both the filesystem and RocksDB state backends, and both rowtime and proctime window statistics. CREATE VIEW cpd_expo_feature_collect_view as select imei,incrmentFeatureCollect(CAST(serverTime AS INT),adId) as feature from

Email from the Mail Help Center (邮件帮助中心)

2020-08-30 Thread 邮件帮助中心

Flink 1.10 SQL state: aggregate function with window statistics, state keeps growing and is not cleaned up automatically

2020-08-30 Thread zhuyuping
Flink is currently running the SQL below. I created an aggregate function that simply does put on a MapView and then returns a string. I run statistics with tumbling windows of 10 s, 1 s, and 1 minute, but roughly every 3 checkpoints the state grows. Eventually the state gets bigger and bigger, checkpoints fail and the job restarts. At first I thought it was backpressure. Finally I used insert into discardSink and the same problem still occurred. sql: CREATE VIEW
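For readers following this thread, a minimal hypothetical sketch of the kind of MapView-backed aggregate function being described (the names and types are invented, not the poster's incrmentFeatureCollect). Whatever is put into the MapView becomes part of the accumulator's keyed state, so it lives until the enclosing window or key is cleaned up.

import java.util.Map;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.api.dataview.MapView;
import org.apache.flink.table.functions.AggregateFunction;

// Hypothetical MapView-backed aggregate: collects (serverTime -> adId) pairs per group
// and returns them as a single string.
public class FeatureCollect extends AggregateFunction<String, FeatureCollect.Acc> {

    public static class Acc {
        // Declaring the MapView with explicit type information lets the planner
        // back it with state instead of serializing it inside the accumulator.
        public MapView<Integer, String> features = new MapView<>(Types.INT, Types.STRING);
    }

    @Override
    public Acc createAccumulator() {
        return new Acc();
    }

    public void accumulate(Acc acc, Integer serverTime, String adId) throws Exception {
        acc.features.put(serverTime, adId); // every distinct key adds state
    }

    @Override
    public String getValue(Acc acc) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<Integer, String> e : acc.features.entries()) {
                sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return sb.toString();
    }
}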

Re: Flink 1.10.1/1.11.1: state keeps growing after SQL group-by and time-window operations

2020-08-30 Thread zhuyuping
I ran into this problem too. I use a window function for group by and found the state is never cleared; it slowly grows from 10 MB to several GB, increasing roughly every 3 checkpoints

Re: Flink 1.11: how to set process_time and event_time when reading Hive as a stream?

2020-08-30 Thread Rui Li
Hi, this scenario is not supported yet. Defining a watermark has to be done in DDL, and a Hive table has no such concept, so it cannot be defined there. In the future it may become possible to specify a watermark through additional parameters. On Sun, Aug 30, 2020 at 10:16 PM me wrote: > What about connecting directly to the Hive catalog, reading an existing Hive table as a stream? > Do you have any ideas for a workaround? > > > Original message > From: Zou Dan > To: user-zh > Sent: Sunday, Aug 30, 2020 21:55 > Subject: Re: Flink 1.11

Re: How to use Flink IDE

2020-08-30 Thread Piper Piper
Thank you, Narasimha and Arvid! On Sun, Aug 30, 2020 at 3:09 PM Arvid Heise wrote: > Hi Piper, > > to step into Flink source code, you don't need to import Flink sources > manually or build Flink at all. It's enough to tell IntelliJ to also > download sources for Maven dependencies. [1] > >

Re: How to use Flink IDE

2020-08-30 Thread Arvid Heise
Hi Piper, to step into Flink source code, you don't need to import Flink sources manually or build Flink at all. It's enough to tell IntelliJ to also download sources for Maven dependencies. [1] Flink automatically uploads the source code for each build. For example, see the 1.11.1 artifacts of

Implementation of setBufferTimeout(timeoutMillis)

2020-08-30 Thread Pankaj Chand
Hello, The documentation gives the following two sample lines for setting the buffer timeout for the streaming environment or transformation. env.setBufferTimeout(timeoutMillis); env.generateSequence(1,10).map(new MyMapper()).setBufferTimeout(timeoutMillis); I have been trying to find where
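For context, a runnable sketch of the two documented calls (the timeout value and the mapper are placeholders): setBufferTimeout on the environment sets the default for the whole job, while the per-transformation call overrides it for that operator's output buffers. The periodic flushing these calls control is performed by the RecordWriter$OutputFlusher thread mentioned in Yun Gao's reply at the top of this digest.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        long timeoutMillis = 100;            // illustrative value: smaller means lower latency
        env.setBufferTimeout(timeoutMillis); // default for every operator in this job

        env.generateSequence(1, 10)
           .map(value -> value * 2)
           .returns(Types.LONG)
           .setBufferTimeout(timeoutMillis)  // override for this transformation only
           .print();

        env.execute("buffer-timeout-sketch");
    }
}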

Re: How to use Flink IDE

2020-08-30 Thread Ardhani Narasimha Swamy
Hi Piper, Welcome to the Flink community. Import the Flink project into the IDE like any other project; the only difference when running is that you have to tick "Include dependencies with 'Provided' scope" in the main class run configuration. This bundles the Flink dependencies in the artifact, making it a fat

How to use Flink IDE

2020-08-30 Thread Piper Piper
Hi, Till now, I have only been using Flink binaries. How do I setup Flink in my IntelliJ IDE so that while running/debugging my Flink application program I can also step into the Flink source code? Do I first need to import Flink's source repository into my IDE and build it? Thanks, Piper

Re: Resource leak in DataSourceNode?

2020-08-30 Thread Robert Metzger
Hi Mark, from the discussion in the JIRA ticket, it seems that we've found somebody in the community who's going to fix this. I don't think calling close() is necessary in the DataSourceNode. The problem is that the connection should not be established in configure() but in open(). Thanks again

Re: Flink OnCheckpointRollingPolicy streamingfilesink

2020-08-30 Thread Vijayendra Yadav
Thank You Andrey. Regards, Vijay > On Aug 29, 2020, at 3:38 AM, Andrey Zagrebin wrote: > >  > Hi Vijay, > > I would apply the same judgement. It is latency vs throughput vs spent > resources vs practical need. > > The more concurrent checkpoints your system is capable of handling, the >
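As a concrete reference for the trade-off being weighed here, a hedged sketch (paths, intervals and limits are illustrative only) that wires a StreamingFileSink using OnCheckpointRollingPolicy to explicit checkpoint settings; because part files roll on every checkpoint, the checkpoint interval and the number of concurrent checkpoints directly shape output latency and resource usage.

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

public class CheckpointRollingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint knobs the thread is weighing; the values here are illustrative only.
        env.enableCheckpointing(60_000);                              // checkpoint every 60 s
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);     // concurrent checkpoints
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);

        // OnCheckpointRollingPolicy rolls part files on every checkpoint, so output becomes
        // visible at the checkpoint interval configured above.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("/tmp/output"), new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(OnCheckpointRollingPolicy.build())
                .build();

        env.fromElements("a", "b", "c").addSink(sink);
        env.execute("on-checkpoint-rolling-sketch");
    }
}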

Re: Resource leak in DataSourceNode?

2020-08-30 Thread Mark Davis
Hi Robert, Thank you for confirming that there is an issue. I do not have a solution for it and would like to hear the committer insights what is wrong there. I think there are actually two issues - the first one is the HBase InputFormat does not close a connection in close(). Another is
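To make the two points concrete, a hypothetical sketch of the lifecycle split being advocated (this is not the actual HBase input format): configure() is called on the client during plan creation, which is where DataSourceNode comes in, so a connection opened there leaks; the connection belongs in open() and must be released in close().

import java.io.IOException;

import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.InputSplit;

// Hypothetical lifecycle sketch; AutoCloseable stands in for an HBase Connection.
public abstract class ConnectionAwareInputFormat<T> extends RichInputFormat<T, InputSplit> {

    private transient AutoCloseable connection;

    @Override
    public void configure(Configuration parameters) {
        // Only read configuration here; do NOT establish connections on the client side.
    }

    @Override
    public void open(InputSplit split) throws IOException {
        connection = createConnection(); // connect lazily on the task manager
    }

    @Override
    public void close() throws IOException {
        if (connection != null) {
            try {
                connection.close();      // the missing piece in the reported leak
            } catch (Exception e) {
                throw new IOException(e);
            } finally {
                connection = null;
            }
        }
    }

    protected abstract AutoCloseable createConnection() throws IOException;
}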

Re: Will flink-sql-gateway still be updated?

2020-08-30 Thread godfrey he
Updated to Flink 1.11.1. godfrey he wrote on Mon, Aug 24, 2020 at 9:45 PM: > We will make flink-sql-gateway support 1.11 this week, please stay tuned. > Also, as far as I know there is currently no plan for sql-client to support a gateway mode. > > shougou <80562...@qq.com> wrote on Mon, Aug 24, 2020 at 9:48 AM: > >> I have the same question, and would also like to ask in which version the sql client is planned to support gateway mode? Thanks

Re: Flink 1.11: how to set process_time and event_time when reading Hive as a stream?

2020-08-30 Thread me
What about connecting directly to the Hive catalog, reading an existing Hive table as a stream? Do you have any ideas for a workaround? Original message From: Zou Dan To: user-zh Sent: Sunday, Aug 30, 2020 21:55 Subject: Re: Flink 1.11: how to set process_time and event_time when reading Hive as a stream? Event time is set via the watermark clause in DDL; see the documentation for details [1] [1]

Re: Flink 1.11: how to set process_time and event_time when reading Hive as a stream?

2020-08-30 Thread Zou Dan
Event time is set via the watermark clause in DDL; see the documentation for details [1] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/create.html#create-table Best, Dan Zou > On Aug 30, 2020 at 9:42 PM, me

Flink 1.11: how to set process_time and event_time when reading Hive as a stream?

2020-08-30 Thread me
In Flink 1.11, a select statement can explicitly specify a streaming read. After reading as a stream, I want to use the window functions from real-time computation and specify the time semantics, event time and processing time, but Flink SQL requires the time field in the data to be explicitly defined before it can be recognized as event_time. How do I set this up?

Re: Flink not outputting windows before all data is seen

2020-08-30 Thread Teodor Spæren
Hey again David! I tried your proposed change of setting the parallelism higher. This worked, but why does this fix the behavior? I don't understand why this would fix it. The only thing that happens to the query plan is that a "remapping" node is added. Thanks for the fix, and for any

Re: Flink not outputting windows before all data is seen

2020-08-30 Thread Teodor Spæren
Hey David! I tried what you said, but it did not solve the problem. The job still has to wait until the very end before outputting anything. I mentioned in my original email that I had set the parallelism to 1 job wide, but when I reran the task, I added your line. Are there any

Re: Counting UV with MapState and BloomFilter, checkpoint issue

2020-08-30 Thread Yichao Yang
Hi, if the user ids are of Long type, you can consider using RoaringBitMap for the UV count [1] [1] https://mp.weixin.qq.com/s/jV0XmFxXFnzbg7kcKiiDbA Best, Yichao Yang
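A tiny self-contained sketch of the suggestion (it assumes the org.roaringbitmap:RoaringBitmap dependency; the ids are made up): a Roaring64NavigableMap stores a set of Long ids in compressed form, so the distinct count typically needs far less state than keeping raw ids in MapState, and unlike a BloomFilter it gives an exact answer.

import org.roaringbitmap.longlong.Roaring64NavigableMap;

public class UvCountSketch {
    public static void main(String[] args) {
        Roaring64NavigableMap visitors = new Roaring64NavigableMap();

        long[] userIds = {10_001L, 10_002L, 10_001L, 99_999L}; // made-up ids with a duplicate
        for (long id : userIds) {
            visitors.addLong(id); // duplicates are absorbed by the bitmap
        }

        System.out.println("UV = " + visitors.getLongCardinality()); // prints: UV = 3
    }
}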

Re: Exception when starting a job, Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1

2020-08-30 Thread RS
Figured it out by testing: the rowtime parameter needs to be the last parameter, $(timeField).rowtime(). But this error message is really obscure. On 2020-08-30 14:54:15, "RS" wrote: >Hi, a question please > > >An exception was thrown when starting the job, but I could not work out the cause from the error message; could anyone take a look? >Here I first create a DataStreamSource, then convert it to a view and configure EventTime, and afterwards process the data with SQL DDL >DataStreamSource source = env.addSource(consumer); >
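For anyone hitting the same cryptic error, a hedged, self-contained reconstruction of the working setup (field names, sample data and the window query are invented): the $(...).rowtime() expression is listed last when the view is created, and the stream has timestamps assigned so the rowtime attribute is populated.

import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RowtimeFieldOrderSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Hypothetical source of (word, eventTimestampMillis) records.
        DataStream<Tuple2<String, Long>> source = env
                .fromElements(Tuple2.of("a", 1_000L), Tuple2.of("b", 12_000L))
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Long>>forMonotonousTimestamps()
                                .withTimestampAssigner((record, ts) -> record.f1));

        // The .rowtime() expression goes LAST in the field list, per the finding above.
        tableEnv.createTemporaryView("events", source, $("word"), $("ts").rowtime());

        Table counts = tableEnv.sqlQuery(
                "SELECT TUMBLE_START(ts, INTERVAL '10' SECOND) AS w_start, COUNT(*) AS cnt " +
                "FROM events GROUP BY TUMBLE(ts, INTERVAL '10' SECOND)");

        tableEnv.toAppendStream(counts, Row.class).print();
        env.execute("rowtime-field-order-sketch");
    }
}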

Exception when starting a job, Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1

2020-08-30 Thread RS
Hi, a question please. An exception was thrown when starting the job, but I could not work out the cause from the error message; could anyone take a look? Here I first create a DataStreamSource, then convert it to a view and configure EventTime, and afterwards process the data with SQL DDL. DataStreamSource source = env.addSource(consumer); env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); tableEnv.createTemporaryView(table_name,