The blink planner supports optimizing a multi-sink query so that the duplicated parts of the computation are reused as much as possible. In 1.11, submit the job with a StatementSet; before 1.11, submit it with sqlUpdate/insertInto + execute.
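As a minimal sketch of the 1.11 path (assuming a Flink 1.11 dependency and that the hive.temp_dw catalog and tables are already registered; the elided SQL is abbreviated with `...` as in your mail), adding both INSERTs to one StatementSet lets the planner optimize them as a single job:

```java
// Sketch only: requires Flink 1.11 table-api dependencies on the classpath.
// EnvironmentSettings / TableEnvironment setup is assumed, not taken from the mail.
EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()      // blink planner does the multi-sink reuse
        .inStreamingMode()
        .build();
TableEnvironment tEnv = TableEnvironment.create(settings);

// Register the deduplicating view (your CREATE VIEW statement, abbreviated).
tEnv.executeSql("CREATE VIEW order_source AS ...");

// Both INSERTs go into ONE StatementSet, so they are planned together:
// the order_source scan and its deduplication state are shared,
// instead of producing two separate jobs with two consumers.
StatementSet stmtSet = tEnv.createStatementSet();
stmtSet.addInsertSql(
    "INSERT INTO hive.temp_dw.day_order_index SELECT ... GROUP BY order_date");
stmtSet.addInsertSql(
    "INSERT INTO hive.temp_dw.day_order_index SELECT ... GROUP BY order_hour");

// One submission -> one job graph with a shared source.
stmtSet.execute();
```

Calling tEnv.executeSql("INSERT ...") twice instead would still submit two independent jobs; the sharing only happens when the statements are submitted together through the StatementSet.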
kandy.wang <kandy1...@163.com> wrote on Tue, Aug 4, 2020 at 5:20 PM:
> A question about FLINK SQL views:
>
> create view order_source
> as
> select order_id, order_goods_id, user_id, ...
> from (
>     ...... proctime,
>     row_number() over (partition by order_id, order_goods_id
>                        order by proctime desc) as rownum
>     from hive.temp_dw.dm_trd_order_goods
>     /*+ OPTIONS('properties.group.id'='flink_etl_kafka_hbase',
>                 'scan.startup.mode'='latest-offset') */
> ) where rownum = 1 and price > 0;
>
> insert into hive.temp_dw.day_order_index
> select rowkey, ROW(cast(saleN as BIGINT))
> from (
>     select order_date as rowkey,
>            sum(amount) as saleN
>     from order_source
>     group by order_date
> );
>
> insert into hive.temp_dw.day_order_index
> select rowkey, ROW(cast(saleN as BIGINT))
> from (
>     select order_hour as rowkey,
>            sum(amount) as saleN
>     from order_source
>     group by order_hour
> );
>
> Problem: the same view, with the same consumer group but two different
> sinks, produces 2 jobs, which in effect share one consumer group. The
> generated jobs are:
>     a. order_source -> sink 1
>     b. order_source -> sink 2
>
> The intent was to reuse one full copy of the source data through the
> view order_source (the view has to deduplicate the order data), so that
> the same state could be reused underneath. How can this be done?