Re: Accessing columns from input stream table during Window operations

Sumeet Malhotra Mon, 19 Apr 2021 00:25:30 -0700

Thanks Guowei! Regarding "input.c.avg" you're right, that doesn't cause any
issues. It's only when I want to use "input.b".


My use case is to basically emit "input.b" in the final sink as is, and not
really perform any aggregation on that column - more like pass through from
input to sink. What's the best way to achieve this? I was thinking that
making it part of the select() clause would do it, but as you said there
needs to be some aggregation performed on it.
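
To illustrate the constraint, here is a minimal plain-Python sketch (not PyFlink, and not how Flink implements it) of a 10-second tumbling window grouped by "a". The sample rows, the helper names tumble_start and aggregate, and the choice of max() as the aggregate for "b" are all assumptions made for illustration. It shows the two options discussed in this thread: aggregate "b" within each (window, a) group, or add "b" to the grouping key.

```python
from collections import defaultdict

# Hypothetical sample rows: (timestamp_seconds, a, b, c)
rows = [
    (1, "k1", "x", 10.0),
    (4, "k1", "x", 20.0),
    (6, "k1", "y", 60.0),
    (12, "k2", "z", 5.0),
]

WINDOW_SECONDS = 10


def tumble_start(ts):
    """Start of the tumbling window that contains ts."""
    return ts - ts % WINDOW_SECONDS


def aggregate(rows, group_by_b):
    """Average c per (window, a), or per (window, a, b) if group_by_b is set."""
    groups = defaultdict(list)
    for ts, a, b, c in rows:
        key = (tumble_start(ts), a, b) if group_by_b else (tumble_start(ts), a)
        groups[key].append((b, c))

    result = []
    for key, members in sorted(groups.items()):
        start, a = key[0], key[1]
        # A group can hold several distinct b values, so exactly one must be
        # chosen. max() is an arbitrary choice made for this sketch.
        b_out = max(b for b, _ in members)
        avg_c = sum(c for _, c in members) / len(members)
        result.append((start, start + WINDOW_SECONDS, a, b_out, avg_c))
    return result


# Option 1: aggregate b, one output row per (window, a).
print(aggregate(rows, group_by_b=False))
# Option 2: group by b as well, one output row per (window, a, b).
print(aggregate(rows, group_by_b=True))
```

If "b" is constant within every (window, a) group, both options produce the same rows, so aggregating "b" effectively passes it through. If it is not constant, option 2 splits the groups and changes the averages.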

Thanks,
Sumeet


On Mon, Apr 19, 2021 at 12:26 PM Guowei Ma <guowei....@gmail.com> wrote:

> Hi, Sumeet
> For "input.b", I think you need to aggregate the non-group-key
> column [1].
> But I am not sure why "input.c.avg.alias('avg_value')" fails to
> resolve. Would you mind sharing more detailed error information?
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#group-windows
>
> Best,
> Guowei
>
>
> On Mon, Apr 19, 2021 at 2:24 PM Sumeet Malhotra <sumeet.malho...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a use case where I'm creating a Tumbling window as follows:
>>
>> "input" table has columns [Timestamp, a, b, c]
>>
>> input \
>>     .window(Tumble.over(lit(10).seconds).on(input.Timestamp).alias("w")) \
>>     .group_by(col('w'), input.a) \
>>     .select(
>>         col('w').start.alias('window_start'),
>>         col('w').end.alias('window_end'),
>>         input.b,
>>         input.c.avg.alias('avg_value')) \
>>     .execute_insert('MySink') \
>>     .wait()
>>
>> This throws an exception that it cannot resolve the fields "b" and "c"
>> inside the select statement. If I mention these column names inside the
>> group_by() statement as follows:
>>
>> .group_by(col('w'), input.a, input.b, input.c)
>>
>> then the column names in the subsequent select statement can be resolved.
>>
>> Basically, unless the column name is explicitly made part of the
>> group_by() clause, the subsequent select() clause doesn't resolve it. This
>> is very similar to the example from Flink's documentation here [1]:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#overview--examples,
>> where a similar procedure works.
>>
>> Any idea how I can access columns from the input stream, without having
>> to mention them in the group_by() clause? I really don't want to group the
>> results by those fields, but they need to be written to the sink eventually.
>>
>> Thanks,
>> Sumeet
>>
>
