源表三个字段  name, color, ts
按时间窗口聚合后想根据name group by取colors数组


create table source_table (
   name  STRING,
   color  STRING,
   ts TIMESTAMP,
   WATERMARK ts for ts
) 


create table sink_table (
  name  STRING,
   colors  ARRAY<STRING>
)


1. 请问这个select语句要怎么写?
select name, collect(color) as colors from source_table group by tumble(ts, 
interval '5' seconds)
这里collect(color)返回的是multiset类型,怎样转成Array类型呢?


2. 如果array元素很多,我只想取其中N个,该怎么写flink sql? 

3, 若取出现次数最多的前N个,又该怎么写flink sql?
select name, collect(color) as colors from (
select name, color from (
  select *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY color_cnt desc) AS 
row_num from (
select name, color, count(*) as color_cnt group by name, color, tumble(ts, 
interval '5' seconds)
  ) 
) where row_num < 5
);
是这样写么?

回复