Re: Window functions not closing the interval

2020-04-01 Thread Jark Wu
Hi Fei, could you attach your SQL code? Which version and which planner are you using? And could you describe in more detail the phenomenon of "the current record is not counted and only shows up in the next result"? I don't quite understand it. Best, Jark On Wed, 1 Apr 2020 at 18:00, Fei Han wrote: > Hi all: > When doing window aggregation with count over and sum over, the current record is not counted and only shows up in the next result. > Is a parameter written wrong, or is there some other function to use? Incoming data should be handled like a closed interval, but right now it behaves like an open one. Any suggestions would be appreciated, thanks!

Perform processing only when watermark updates, buffer data otherwise

2020-04-01 Thread Manas Kale
Hi, I want to perform some processing on events only when the watermark is updated. Otherwise, for all other events, I want to keep buffering them till the watermark arrives. The main motivation behind doing this is that I have several operators that emit events/messages to a downstream operator.
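
One way to implement this, sketched under assumptions (the Event type, its fields, and the flush-everything policy are illustrative, not from the thread), is a KeyedProcessFunction that parks elements in ListState and registers an event-time timer, so they are only emitted once the watermark passes:

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class BufferUntilWatermark extends KeyedProcessFunction<String, Event, Event> {
        private transient ListState<Event> buffer;

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getListState(
                    new ListStateDescriptor<>("buffer", Event.class));
        }

        @Override
        public void processElement(Event value, Context ctx, Collector<Event> out) throws Exception {
            // Park the element and ask to be called back once the watermark
            // passes its timestamp.
            buffer.add(value);
            ctx.timerService().registerEventTimeTimer(ctx.timestamp());
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
            // The watermark has advanced past `timestamp`. A real implementation
            // might flush only elements with timestamp <= the current watermark.
            for (Event e : buffer.get()) {
                out.collect(e);
            }
            buffer.clear();
        }
    }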

Re: Flink 1.10.0 HiveModule function issue

2020-04-01 Thread Jingsong Li
Hi, GenericUDTFExplode is a UDTF. In Flink, UDTFs are invoked with standard SQL syntax: "select x from db1.nested, lateral table(explode(a)) as T(x)". Give it a try. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/functions/udfs.html#table-functions Best, Jingsong Lee On Thu, Apr 2, 2020 at 11:22 AM Yaoting Gong
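
For readers following the thread, a minimal sketch of running that lateral-table query through the Table API; the environment setup is assumed boilerplate, and it presumes a HiveCatalog plus the HiveModule are already registered so that db1.nested and explode() resolve:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    public class ExplodeExample {
        public static void main(String[] args) {
            EnvironmentSettings settings = EnvironmentSettings.newInstance()
                    .useBlinkPlanner().inBatchMode().build();
            TableEnvironment tEnv = TableEnvironment.create(settings);
            // Assumes HiveModule + HiveCatalog are set up beforehand.
            Table result = tEnv.sqlQuery(
                    "SELECT x FROM db1.nested, LATERAL TABLE(explode(a)) AS T(x)");
        }
    }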

Flink 1.10.0 HiveModule function issue

2020-04-01 Thread Yaoting Gong
Hi all, our project has integrated the HiveModule and we are running into some problems; asking for your advice. Before integrating the Hive Module, substr and split could not be used at all; after integrating, both verify fine, e.g.: select split('1,2,2,4',','). But when using the explode function, select explode(split('1,2,2,4',',')); fails with the following error: The main method caused an error: SQL validation failed. From line 1, column 8 to line 1, column 36: No

Re: Flink SQL 1.10 using MySQL as a dimension table: too much data causes task manager timeout

2020-04-01 Thread 111
Hi, understood, now I know how to solve it. I'm using the sql-gateway here, so it looks like I'll have to add a table-definition syntax to the sql-gateway. Many thanks. Best, Xinghalo On 2020-04-02 10:52, Benchao Li wrote: What you wrote is not the dimension-table join syntax; dimension-table joins are implemented with temporal tables [1] and need a special join syntax: SELECT o.amount, o.currency, r.rate, o.amount * r.rate FROM Orders AS o JOIN LatestRates FOR SYSTEM_TIME AS OF

Re: End to End Latency Tracking in flink

2020-04-01 Thread zoudan
Hi, I think we may add a latency metric for each operator, which can reflect the consumption ability of each operator. Best, Dan Zou > On Mar 30, 2020, at 18:19, Guanghui Zhang wrote: > > Hi. > At the flink source connector, you can send the $source_current_time - $event_time > metric. > In the meantime, at flink
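
A sketch of the source-side idea Guanghui describes, under assumptions (the Event type and its getEventTime() accessor are made up): register a gauge at an operator that reports the current machine time minus the event time of the last record seen.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Gauge;

    public class EventTimeLagGauge extends RichMapFunction<Event, Event> {
        private transient volatile long lagMs;

        @Override
        public void open(Configuration parameters) {
            // Exposed as an operator-scoped metric named "eventTimeLag".
            getRuntimeContext().getMetricGroup().gauge("eventTimeLag", new Gauge<Long>() {
                @Override
                public Long getValue() {
                    return lagMs;
                }
            });
        }

        @Override
        public Event map(Event value) {
            lagMs = System.currentTimeMillis() - value.getEventTime();
            return value;
        }
    }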

Re: Flink SQL 1.10 using MySQL as a dimension table: too much data causes task manager timeout

2020-04-01 Thread Benchao Li
What you wrote is not the dimension-table join syntax; dimension-table joins are now implemented with temporal tables [1] and require a special join syntax: SELECT o.amount, o.currency, r.rate, o.amount * r.rate FROM Orders AS o JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r ON r.currency = o.currency. Besides, the JDBCTableSource you are looking at is a plain bounded source, or a batch

Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread Jingsong Li
Yes, it is related; that umbrella issue is FLIP-115. Best, Jingsong Lee On Wed, Apr 1, 2020 at 10:57 PM 叶贤勋 wrote: > Hi jingsong, > I see that in this issue [1] you opened two PRs about supporting a hive streaming sink; is that code related to FLIP-115? > > [1] https://issues.apache.org/jira/browse/FLINK-14255 > > 叶贤勋 > yxx_c...@163.com >

Re: Flink SQL 1.10 using MySQL as a dimension table: too much data causes task manager timeout

2020-04-01 Thread 111
Hi, I tried it and it doesn't seem to work. My sql: select s.*, item.product_name from ( select member_id, uuid, page, cast(SPLIT_INDEX(page, '10.pd.item-', 1) as string) as item_id, `time` from tgs_topic_t1 where page like '10.pd.item-%' ) s inner join PROD_ITEM_MALL_ACTIVITY_PRODUCT item on cast(item.id as string) =
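
For reference, a hedged rewrite of the query above using the temporal-join syntax from Benchao's reply; it assumes tgs_topic_t1 declares a processing-time attribute (called proctime here), which the table definition may still need to add:

    // Assumes a TableEnvironment `tEnv`; all table/column names come from the
    // thread except `proctime`, which is an assumed processing-time attribute.
    Table joined = tEnv.sqlQuery(
            "SELECT s.member_id, s.uuid, s.page, s.item_id, s.`time`, item.product_name " +
            "FROM ( " +
            "  SELECT member_id, uuid, page, " +
            "         CAST(SPLIT_INDEX(page, '10.pd.item-', 1) AS STRING) AS item_id, " +
            "         `time`, proctime " +
            "  FROM tgs_topic_t1 " +
            "  WHERE page LIKE '10.pd.item-%' " +
            ") AS s " +
            "JOIN PROD_ITEM_MALL_ACTIVITY_PRODUCT FOR SYSTEM_TIME AS OF s.proctime AS item " +
            "ON CAST(item.id AS STRING) = s.item_id");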

Re: flink 1.10 createTemporaryTable losing proctime

2020-04-01 Thread deadwind4
registerTableSource is marked @Deprecated in flink 1.10; in my situation, should I keep using the deprecated API (registerTableSource)? Original message From: deadwind4 To: user-zh Sent: Thursday, April 2, 2020, 10:30 Subject: Re: flink 1.10 createTemporaryTable losing proctime Before the change: tEnv.connect().withFormat().withSchema( xxx.proctime() ).registerTableSource("foo");

Re: flink 1.10 createTemporaryTable losing proctime

2020-04-01 Thread deadwind4
Before the change: tEnv.connect().withFormat().withSchema( xxx.proctime() ).registerTableSource("foo"); After the change: tEnv.connect().withFormat().withSchema( xxx.proctime() ).createTemporaryTable("foo"); After the change, .proctime() no longer takes effect, so I cannot use proctime windows anymore. Original message From: deadwind4 To: user-zh Sent: Thursday, April 2, 2020, 10:22 Subject:

Re: flink 1.10 createTemporaryTable losing proctime

2020-04-01 Thread deadwind4
tEnv.connect().withFormat().withSchema().registerTableSource("foo"); tEnv.connect().withFormat().withSchema().createTemporaryTable("foo"); Original message From: Jark Wu To: user-zh Sent: Thursday, April 2, 2020, 10:18 Subject: Re: flink 1.10 createTemporaryTable losing proctime Hi, could you describe your code before and after the change? As far as I know, TableEnvironment

Re: Flink SQL 1.10 using MySQL as a dimension table: too much data causes task manager timeout

2020-04-01 Thread 111
Hi benchao, I see. On my side I was only running a plain query and not going through the join. I'll add the join condition and try again. Best, Xinghalo On 2020-04-02 10:11, Benchao Li wrote: Hi, could you post your SQL as well? Normally, a dimension-table join uses the equality conditions of the join as the lookup conditions against mysql; any additional filters are applied on top of the joined result. If you see a full table scan every time, it is very likely that the join condition of your dimension table is written incorrectly. On Thu, Apr 2, 2020 at 9:55 AM, 111 wrote: Hi, I'd like to confirm the MySQL JDBC

Re: flink 1.10 createTemporaryTable losing proctime

2020-04-01 Thread Jark Wu
Hi, could you describe your code before and after the change? As far as I know, there is no createTemporaryTable method on TableEnvironment, only createTemporaryView, and registerTableSource and createTemporaryView take different parameters. Best, Jark > On Apr 1, 2020, at 23:13, deadwind4 wrote: > > What I actually want is a processing time window, but the deprecated API >

Re: Flink SQL 1.10 using MySQL as a dimension table: too much data causes task manager timeout

2020-04-01 Thread Benchao Li
Hi, could you post your SQL as well? Normally, a dimension-table join uses the equality conditions of the join as the lookup conditions against mysql; any additional filters are applied on top of the joined result. If you see a full table scan every time, it is very likely that the join condition of your dimension table is written incorrectly. On Thu, Apr 2, 2020 at 9:55 AM, 111 wrote: > Hi, > I'd like to confirm the implementation details of the MySQL JDBC Connector. My current understanding is: > 1. Extract the fields and table name from the query sql and assemble them into select a, b, c from t > 2

Flink SQL 1.10 using MySQL as a dimension table: too much data causes task manager timeout

2020-04-01 Thread 111
Hi, I'd like to confirm the implementation details of the MySQL JDBC Connector. My current understanding is: 1. Extract the fields and table name from the query sql and assemble them into select a, b, c from t. 2. Create parallel inputs; if a partition column is specified, where conditions are generated by splitting on the partition column and the parallelism, e.g. where id between xxx and xxx. 3. Execute the sql and load the result into memory (I'm not sure about the implementation details of the caching afterwards). The problem we are running into: we use mysql as a dimension table (around 15 million rows, parallelism 10) without specifying a partition column (so only one slot runs the query and the others have no query task). As a result only one partition reads data, and the query sql is select
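
For illustration, a hedged sketch of a Flink 1.10-style JDBC source DDL that sets the partitioned-scan options described in step 2, plus the lookup cache that matters for dimension-table use; the URL, table, bounds, and cache sizes are all made-up values:

    // Assumes a TableEnvironment `tEnv`; property keys follow the Flink 1.10
    // JDBC connector, and all values are placeholders.
    tEnv.sqlUpdate(
            "CREATE TABLE dim_t ( " +
            "  id BIGINT, " +
            "  product_name STRING " +
            ") WITH ( " +
            "  'connector.type' = 'jdbc', " +
            "  'connector.url' = 'jdbc:mysql://host:3306/db', " +
            "  'connector.table' = 't', " +
            "  'connector.read.partition.column' = 'id', " +
            "  'connector.read.partition.num' = '10', " +
            "  'connector.read.partition.lower-bound' = '1', " +
            "  'connector.read.partition.upper-bound' = '15000000', " +
            "  'connector.lookup.cache.max-rows' = '100000', " +
            "  'connector.lookup.cache.ttl' = '10min' " +
            ")");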

Re: Savepoint Location from Flink REST API

2020-04-01 Thread Aaron Langford
All, it looks like the actual return structure from the API is: 1. Success

> {
>   "status": { "id": "completed" },
>   "operation": { "location": "string" }
> }

2. Failure

> {
>   "status": { "id": "completed" },
>   "operation": {
>     "failure-cause": {

Re: Log file environment variable 'log.file' is not set.

2020-04-01 Thread Vitaliy Semochkin
Hello Robert, Thank you for the help. "tried to access method org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvi" was the root of the issue. I managed to fix the issue with the "hadoop version is 2.4.1" message; it was because the previous flink version I was using referred to

Re: Latency tracking together with broadcast state can cause job failure

2020-04-01 Thread Lasse Nedergaard
Hi, I have attached a simple project with a test that reproduces the problem. The usual failure is a mixed-up string, but you can also get an EOF exception. Please let me know if you have any questions about the solution. Med venlig hilsen / Best regards, Lasse Nedergaard

Re: flink-shaded-hadoop2 for flink 1.10

2020-04-01 Thread Vitaliy Semochkin
Thank you very much Chesnay! On Tue, Mar 31, 2020 at 1:32 AM Chesnay Schepler wrote: > flink-shaded-hadoop2 was released as part of Flink until 1.8 (hence why it > followed the Flink version scheme), after which it was renamed to > flink-shaded-hadoop-2 and is now being released separately

Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread 叶贤勋
Hi jingsong, I see that in this issue [1] you opened two PRs about supporting a hive streaming sink; is that code related to FLIP-115? [1] https://issues.apache.org/jira/browse/FLINK-14255 叶贤勋 yxx_c...@163.com On 2020-04-01 16:28, 111 wrote: Hi jingsong, that's impressive; it's effectively a data lake plugin inside Flink. Best, Xinghalo

Re: Flink in EMR configuration problem

2020-04-01 Thread Piotr Nowojski
Hi, Sorry I missed that. But yes, it looks like you are running two JobManagers :) You can always check the yarn logs for more information about what is being executed. Piotrek > On 1 Apr 2020, at 16:44, Antonio Martínez Carratalá > wrote: > > Hi Piotr, > > I don't have 2 task managers, just one

Re: Flink in EMR configuration problem

2020-04-01 Thread Antonio Martínez Carratalá
Hi Piotr, I don't have 2 task managers, just one with 2 slots. That would be ok according to my calculations, but as Craig said I need one more instance for the cluster master. I was guessing the job manager was running in the master and the task manager in the slave, but both job manager and

Re: [Third-party Tool] Flink memory calculator

2020-04-01 Thread Yangze Guo
@Marta Thanks for the tip! I'll do that. Best, Yangze Guo On Wed, Apr 1, 2020 at 8:05 PM Marta Paes Moreira wrote: > > Hey, Yangze. > > I'd like to suggest that you submit this tool to Flink Community Pages [1]. > That way it can get more exposure and it'll be easier for users to find it. > >

Re: Primary key problem when writing GROUP BY results to postgresql with Flink SQL

2020-04-01 Thread Jark Wu
Hi Longfei, I'm very sorry, but this is indeed not supported at the moment. It will be solved by the new TableSink interface from FLIP-95, hopefully in 1.11. Best, Jark On Wed, 1 Apr 2020 at 19:12, Longfei Zhou wrote: > Problem: > The SQL performs a Group > By aggregation on a time window and PRODUCT_ID, so the primary key of the PG table must be WINDOW_START/WINDOW_END plus PRODUCT_ID, otherwise data cannot be written in upsert mode. But this cannot satisfy the business scenario, where the key should be RANK_ID >
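
For context, a sketch (assumed table, sink, and column names) of the kind of windowed aggregation in question; the upsert key derived from such a query is exactly the GROUP BY columns, which is why the PG primary key currently has to be the window start/end plus PRODUCT_ID:

    // Assumes a TableEnvironment `tEnv`, a source `orders` with an event-time
    // attribute `ts`, and a JDBC upsert sink `pg_sink`; names are illustrative.
    tEnv.sqlUpdate(
            "INSERT INTO pg_sink " +
            "SELECT TUMBLE_START(ts, INTERVAL '1' HOUR) AS window_start, " +
            "       product_id, COUNT(*) AS cnt " +
            "FROM orders " +
            "GROUP BY TUMBLE(ts, INTERVAL '1' HOUR), product_id");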

Re: flink 1.10 createTemporaryTable losing proctime

2020-04-01 Thread Jark Wu
Hi, proctime means machine time; it is not equivalent to the now() or current_timestamp() functions. The field is only materialized when it is actually used (i.e. only then does it read System.currentTimeMillis). Could you describe what you want to do with createTemporaryTable? Where does it currently fall short of your needs? Best, Jark On Wed, 1 Apr 2020 at 18:56, deadwind4 wrote: > >

Re: [Third-party Tool] Flink memory calculator

2020-04-01 Thread Marta Paes Moreira
Hey, Yangze. I'd like to suggest that you submit this tool to Flink Community Pages [1]. That way it can get more exposure and it'll be easier for users to find it. Thanks for your contribution! [1] https://flink-packages.org/ On Tue, Mar 31, 2020 at 9:09 AM Yangze Guo wrote: > Hi, there. >

Re: State & Generics

2020-04-01 Thread Aljoscha Krettek
Hi Laurent! On 31.03.20 10:43, Laurent Exsteens wrote: Yesterday I managed to find another solution: create the type information outside of the class and pass it to the constructor. I can retrieve the type information from DataStream.getType(). This works well, and is acceptable in my
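
A minimal sketch of the pattern Laurent describes (class and state names are illustrative): TypeInformation is serializable, so it can be created from DataStream.getType() at graph-building time and handed to the operator's constructor for use in a state descriptor:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class LastSeenFunction<T> extends KeyedProcessFunction<String, T, T> {
        private final TypeInformation<T> typeInfo; // Serializable, shipped with the function

        private transient ValueState<T> lastSeen;

        public LastSeenFunction(TypeInformation<T> typeInfo) {
            this.typeInfo = typeInfo;
        }

        @Override
        public void open(Configuration parameters) {
            lastSeen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("last-seen", typeInfo));
        }

        @Override
        public void processElement(T value, Context ctx, Collector<T> out) throws Exception {
            lastSeen.update(value);
            out.collect(value);
        }
    }

    // Usage: stream.keyBy(...).process(new LastSeenFunction<>(stream.getType()));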

Re: [DISCUSS] Change default planner to blink planner in 1.11

2020-04-01 Thread Aljoscha Krettek
+1 to making Blink the default planner, we definitely don't want to maintain two planners for much longer. Best, Aljoscha

Re: some subtask taking too long

2020-04-01 Thread Piotr Nowojski
Hey, The thread you are referring to is about a DataStream API job and a long checkpointing issue. From your message it seems like you are using the Table API (SQL) to process batch data? Or what exactly do you mean by: > i notice that there are one or two subtasks that take too long to

Re: Flink Kafka Consumer Throughput reduces over time

2020-04-01 Thread Piotr Nowojski
Hi, I haven't heard of a Flink-specific problem like that. Have you checked that the records are not changing over time? That they are not, for example, twice as large or twice as heavy to process? Especially since you are using a rate limiter at 12MB/s. If your records grew to 60KB in size,

Re: Flink in EMR configuration problem

2020-04-01 Thread Piotr Nowojski
Hey, Isn't the explanation of the problem in the logs that you posted? Not enough memory? You have 2 EMR nodes with 8GB memory each, while trying to allocate 2 TaskManagers AND 1 JobManager with a 6GB heap each? Piotrek > On 31 Mar 2020, at 17:01, Antonio Martínez Carratalá > wrote: > > Hi, I'm

Primary key problem when writing GROUP BY results to postgresql with Flink SQL

2020-04-01 Thread Longfei Zhou
Problem: The SQL performs a Group By aggregation on a time window and PRODUCT_ID, so the primary key of the PG table must be WINDOW_START/WINDOW_END plus PRODUCT_ID, otherwise data cannot be written in upsert mode. But this cannot satisfy the business scenario, where the key should be RANK_ID + WINDOW_START/WINDOW_END. ... Flink ... Top3

flink 1.10 createTemporaryTable losing proctime

2020-04-01 Thread deadwind4
Using createTemporaryTable in 1.10 I found that the proctime field is always null, but switching back to the deprecated registerTableSource works. What should I do if I want to use createTemporaryTable? Also, I debugged the source code of createTemporaryTable and found no handling of proctime.
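
One hedged workaround on 1.10, while the descriptor API issue stands, is to declare the table with DDL and a PROCTIME() computed column (blink planner); the Kafka/json connector properties below are placeholders:

    // Assumes a blink-planner TableEnvironment `tEnv`.
    tEnv.sqlUpdate(
            "CREATE TABLE foo ( " +
            "  id BIGINT, " +
            "  ptime AS PROCTIME() " +
            ") WITH ( " +
            "  'connector.type' = 'kafka', " +
            "  'connector.version' = 'universal', " +
            "  'connector.topic' = 'foo', " +
            "  'connector.properties.bootstrap.servers' = 'localhost:9092', " +
            "  'format.type' = 'json' " +
            ")");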

Window functions not closing the interval

2020-04-01 Thread Fei Han
Hi all: When doing window aggregation with count over and sum over, the current record is not counted and only shows up in the next result. Is a parameter written wrong, or is there some other function to use? Incoming data should be handled like a closed interval, but right now it behaves like an open one. Any suggestions would be appreciated, thanks!
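
For concreteness, a sketch of the kind of OVER aggregation under discussion (table and column names are assumed); with a ROWS frame ending at CURRENT ROW, the current record is included in its own aggregate:

    // Assumes a TableEnvironment `tEnv` and an `Orders` table with a
    // processing-time attribute `proctime`; names are illustrative.
    Table totals = tEnv.sqlQuery(
            "SELECT user_id, amount, " +
            "       SUM(amount) OVER ( " +
            "         PARTITION BY user_id ORDER BY proctime " +
            "         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_amount " +
            "FROM Orders");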

Re: Re: With rocksdb as the state backend, restoring the job after a restart fails: Could not restore keyed state backend for KeyedProcessOperator

2020-04-01 Thread chenxyz
Hi, 从贤, I checked HDFS; /data/flink1_10/tmp/flink-io-01229972-48d4-4229-ac8c-33f0edfe5b7c/job_5ec178dc885a8f1a64c1925e182562e3_op_KeyedProcessOperator_da2b90ef97e5c844980791c8fe08b926__1_2__uuid_772b4663-f633-4ed5-a67a-d1904760a160 is empty underneath, and there is no db directory level either. On 2020-04-01 16:50:13, "Congxian Qiu" wrote: >Hi

Re: With rocksdb as the state backend, restoring the job after a restart fails: Could not restore keyed state backend for KeyedProcessOperator

2020-04-01 Thread Congxian Qiu
Hi, restore can roughly be divided into two parts: 1) downloading the files; 2) restoring from the downloaded files. From the TM log it looks like the download went wrong. You can check whether the file /data/flink1_10/tmp/flink-io-01229972-48d4-4229-ac8c-33f0edfe5b7c/job_5ec178dc885a8f1a64c1925e182562e3_op_KeyedProcessOperator_da2b90ef97e5c844980791c8fe08b926__1_2__uuid_772b4663-f633-4ed5-a67a-d1904760a160/db/001888.sst exists double

Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread 111
Hi jingsong, that's impressive; it's effectively a data lake plugin inside Flink. Best, Xinghalo

Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread Jingsong Li
Hi 111, although a data lake can extend some of this, streaming writes to Hive are also an important part of the Hive data warehouse. On the file count problem: - It depends on the checkpoint interval; if a 128MB file can be written within one checkpoint interval, that is already a very suitable file size for HDFS. - For streaming writes, features like file compaction can also be introduced; this is discussed in FLIP-115 as well. Best, Jingsong Lee On Wed, Apr 1, 2020 at 4:06 PM 111 wrote: > > Hi, > Streaming writes to hive actually fall under the data lake concept. >

Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread 111
Hi, streaming writes to hive actually fall under the data lake concept. Because streaming into hive produces many fragmented files, which hurts hdfs performance, one generally does not write directly into hive in streaming scenarios. For details, look into Delta lake or hudi. On 2020-04-01 15:05, sunfulin wrote: Hi, the scenario is actually very simple: use Flink to sync kafka data into hive in real time. A partitioned table was created in hive. I think this scenario is very common. I assumed it was supported; after all, you can create a kafka table through the hivecatalog. But being able to create it without being able to write to it seems a bit unreasonable.

Re: flink 1.10 support LONG as watermark?

2020-04-01 Thread Jingsong Li
Hi jingjing, If seconds precision is OK for you. You can try "to_timestamp(from_unixtime(your_time_seconds_long))". Best, Jingsong Lee On Wed, Apr 1, 2020 at 8:56 AM jingjing bai wrote: > Thanks a lot! > > Jark Wu 于2020年4月1日周三 上午1:13写道: > >> Hi Jing, >> >> I created
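
A hedged DDL sketch of that suggestion: derive a timestamp column from the BIGINT seconds value and declare the watermark on it (Flink 1.10 DDL; the connector properties and the 5-second delay are placeholders):

    // Assumes a blink-planner TableEnvironment `tEnv`.
    tEnv.sqlUpdate(
            "CREATE TABLE events ( " +
            "  ts_seconds BIGINT, " +
            "  ts AS TO_TIMESTAMP(FROM_UNIXTIME(ts_seconds)), " +
            "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND " +
            ") WITH ( " +
            "  'connector.type' = 'kafka', " +
            "  'connector.version' = 'universal', " +
            "  'connector.topic' = 'events', " +
            "  'connector.properties.bootstrap.servers' = 'localhost:9092', " +
            "  'format.type' = 'json' " +
            ")");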

Re: Latency tracking together with broadcast state can cause job failure

2020-04-01 Thread Yun Tang
Hi Lasse, I have never met this problem before, but could you share the exception stack trace so that we can take a look? The simple project to reproduce it is also a good choice. Best, Yun Tang From: Lasse Nedergaard Sent: Tuesday, March 31, 2020 19:10 To: user

Re: Re: Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread Jingsong Li
Unfortunately, Flink SQL has indeed never supported this... Yes, this is one of the important goals of 1.11. Best, Jingsong Lee On Wed, Apr 1, 2020 at 3:05 PM sunfulin wrote: > Hi, > The scenario is actually very simple: use Flink to sync kafka data into hive in real time. A partitioned table was created in hive. > I think this scenario is very common. I assumed it was supported; after all, you can create a kafka table through the hivecatalog. But being able to create it without being able to write to it seems a bit unreasonable. > OK then. Which release is FLIP-115 planned for? 1.11? > 在

Re: Re: Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread sunfulin
Hi, the scenario is actually very simple: use Flink to sync kafka data into hive in real time. A partitioned table was created in hive. I think this scenario is very common. I assumed it was supported; after all, you can create a kafka table through the hivecatalog. But being able to create it without being able to write to it seems a bit unreasonable. OK then. Which release is FLIP-115 planned for? 1.11? On 2020-04-01 15:01:32, "Jingsong Li" wrote: Hi, supporting Kafka -> Hive in Batch mode is not recommended either; only after FLIP-115 can this kind of scenario be supported in streaming mode.

With rocksdb as the state backend, restoring the job after a restart fails: Could not restore keyed state backend for KeyedProcessOperator

2020-04-01 Thread chenxyz
The job uses rocksdb as the state backend; when the job restarts after a failure, restoring often fails with Could not restore keyed state backend for KeyedProcessOperator. How can this problem be solved? Version: 1.10 standalone. Configuration: state.backend: rocksdb state.checkpoints.dir: hdfs://nameservice1/data/flink1_10/checkpoint state.backend.incremental: true

Re: Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread Jingsong Li
Hi, using Batch mode to support Kafka -> Hive is not recommended either; only after FLIP-115 can this kind of scenario be supported in streaming mode. Could you describe the full stack trace, the application scenario, and the SQL? Best, Jingsong Lee On Wed, Apr 1, 2020 at 2:56 PM sunfulin wrote: > > When I use batch mode, the following exception is thrown; it feels like one pitfall after another... sigh > > org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not > enough rules to

Re: Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread sunfulin
When I use batch mode, the following exception is thrown; it feels like one pitfall after another... sigh org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties On 2020-04-01 14:49:41, "Jingsong Li" wrote: >Hi, > >The exception means the hive sink does not support streaming mode yet and can only be used in batch mode. The feature is under development [1] >

Re: [DISCUSS] Change default planner to blink planner in 1.11

2020-04-01 Thread Jingsong Li
+1 In 1.10, we have set default planner for SQL Client to Blink planner[1]. Looks good. [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Set-default-planner-for-SQL-Client-to-Blink-planner-in-1-10-release-td36379.html Best, Jingsong Lee On Wed, Apr 1, 2020 at 11:39 AM

Re: Exception when writing to Hive in real time with Flink

2020-04-01 Thread Jingsong Li
Hi, the exception means the hive sink does not support streaming mode yet and can only be used in batch mode. The feature is under development [1]. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table Best, Jingsong Lee On Wed, Apr 1, 2020 at 2:32 PM sunfulin wrote: > Hi, > I am using Flink to consume Kafka data and write it into hive. The connection configuration is all OK, but when actually executing insert into >

Exception when writing to Hive in real time with Flink

2020-04-01 Thread sunfulin
Hi, I am using Flink to consume Kafka data and write it into hive. The connection configuration is all OK, but when actually executing insert into xxx_table, the following exception is thrown. I can't figure out the cause; any guidance is appreciated. cc @Jingsong Li @Jark Wu org.apache.flink.table.api.TableException: Stream Tables can only be emitted by AppendStreamTableSink, RetractStreamTableSink, or UpsertStreamTableSink. at