It looks like the planner infers from the predicate "dt = cast(CURRENT_DATE as string)" that the dt field is a constant, so it gets optimized away.

Folding CURRENT_DATE into a constant should only happen in batch mode, though. Is this SQL running in batch mode?
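
If you are using the SQL Client, you can check and, if needed, pin the runtime mode with something like the following (a rough sketch; adapt it to however you actually submit the job):

-- List the session configuration; execution.runtime-mode defaults to 'streaming'
-- when it has not been set explicitly.
SET;

-- Pin the job to streaming mode; per the behavior described above, CURRENT_DATE
-- should not be folded into a plan-time constant there.
SET 'execution.runtime-mode' = 'streaming';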

℡小新的蜡笔不见嘞、 <1515827...@qq.com.invalid> wrote on Sun, May 19, 2024 at 01:01:
>
> create view tmp_view as
> SELECT
>     dt, -- 2
>     uid, -- 0
>     uname, -- 1
>     uage -- 3
> from
>     kafkaTable
> where dt = cast(CURRENT_DATE  as string);
>
> insert into printSinkTable
> select
>     dt, uid, uname, sum(uage)
> from tmp_view
> group by
>     dt,
>     uid,
>     uname;
>
>
>
> The SQL is fairly simple: it first filters on the condition dt = current_date, then groups by the three fields dt, uid, and uname and sums uage.
> However, after optimization, the generated physical plan looks like this:
> == Optimized Execution Plan ==
> Sink(table=[default_catalog.default_database.printSinkTable], fields=[dt, uid, uname, EXPR$3])
> +- Calc(select=[CAST(CAST(CURRENT_DATE())) AS dt, uid, uname, EXPR$3])
>    +- GroupAggregate(groupBy=[uid, uname], select=[uid, uname, SUM(uage) AS EXPR$3])
>       +- Exchange(distribution=[hash[uid, uname]])
>          +- Calc(select=[uid, uname, uage], where=[(dt = CAST(CURRENT_DATE()))])
>             +- TableSourceScan(table=[[default_catalog, default_database, kafkaTable]], fields=[uid, uname, dt, uage])
>
>
>
> My question: in this situation, how can I make the aggregation sum by all three fields dt, uid, and uname? Thanks.



-- 

Best,
Benchao Li
