Hi, casel.

一般情况下,维表侧也都会尽量做下推的。

比如对于:

```

SELECT * FROM MyTable AS T

JOIN LookupTable FOR SYSTEM_TIME AS OF T.proctime AS D

ON T.a = D.id AND D.age = 10

WHERE T.c > 1000

```

会优化成

```

Calc(select=[a, b, c, PROCTIME_MATERIALIZE(proctime) AS proctime, rowtime, id, 
name, CAST(10 AS INTEGER) AS age])

+- LookupJoin(table=[default_catalog.default_database.LookupTable], 
joinType=[InnerJoin], lookup=[age=10, id=a], where=[(age = 10)], select=[a, b, 
c, proctime, rowtime, id, name])

   +- Calc(select=[a, b, c, proctime, rowtime], where=[(c > 1000)])

      +- DataStreamScan(table=[[default_catalog, default_database, MyTable]], 
fields=[a, b, c, proctime, rowtime])

```

可以看到age = 10也会作为lookup join的key去维表查询。

可以把你的plan贴出来看看吗?




--

    Best!
    Xuyang





在 2024-11-06 19:47:21,"Hongshun Wang" <loserwang1...@gmail.com> 写道:
> 谓词下推取决于connector实现
>
>On Tue, Nov 5, 2024 at 2:48 PM casel.chen <casel_c...@126.com> wrote:
>
>> 场景是使用flink
>> sql流表lookup关联维表(一对多,部分数据会出现一条关联出上千条结果)后只取其中一条,这会导致维表查询压力非常大,像这种有没有办法在flink
>> sql层面进行谓词下推?
>> 从flink UI上看该lookup join算子的输出数据量是输入数据量的几百倍

回复