hi,你好
我想基于现有的flinksql的join实现这种情况,当维表更新慢的时候,事实数据会放在状态中等待。


| |
Jason_H
|
|
hyb_he...@163.com
|
---- Replied Message ----
| From | RS<tinyshr...@163.com> |
| Date | 11/15/2022 09:07 |
| To | user-zh@flink.apache.org<user-zh@flink.apache.org> |
| Subject | Re:flinksql join |
Hi,
我的理解是后插入的维表数据,关联不到是正常现象,
如果要实现AAAA=3的话,应该要手动重新跑历史数据,然后更新现有数据,


Thanks






在 2022-11-11 11:10:03,"Jason_H" <hyb_he...@163.com> 写道:


hi,大家好
我正在使用flink的sql实现一个维表join的逻辑,数据源为kafka(交易数据),维表为mysql(账号),现在遇到一个问题:当kafka有数据进来时,没有在维表中找到账号,这时我手动插入该账号,在下一条数据进来时可以匹配上对应的账号信息,但是,输出的累计结果就会缺失没有匹配上的那条数据,举例如下:
kakfa输入:
账号 金额 笔数
1111 100 1  -> 未匹配
1111 100 1  -> 未匹配
1111 100 1  -> 匹配上

维表
账号  企业
2222  BBBB
1111  AAAA   -> 后插入的账号信息
实际输出结果
企业  金额  笔数
AAAA 100   1


我想要的结果:
企业  金额  笔数
AAAA 300   3





sql如下:
String sql2 =  "insert into dws_b2b_trade_year_index\n" +
"WITH temp AS (\n" +
"select \n" +
"  ta.gmtStatistical as gmtStatistical,\n" +
"  ta.paymentMethod as paymentMethod,\n" +
"  tb.CORP_ID as outCorpId,\n" +
"  tc.CORP_ID as inCorpId,\n" +
"  sum(ta.tradeAmt) as tranAmount,\n" +
"  sum(ta.tradeCnt) as tranNum \n" +
"from dws_a2a_trade_year_index ta \n" +
"left join dob_dim_account for system_time as of ta.proc as tb on 
ta.outAcctCode = tb.ACCT_CODE \n" +
"left join dob_dim_account for system_time as of ta.proc as tc on ta.inAcctCode 
= tc.ACCT_CODE \n" +
"group by \n" +
" ta.gmtStatistical, \n" +
" ta.paymentMethod, \n" +
" tb.CORP_ID, \n" +
" tc.CORP_ID \n" +
") \n" +
"SELECT \n" +
"   DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd HH:mm:ss') as gmtUpdate, \n" +
"   gmtStatistical, \n" +
"   paymentMethod, \n" +
"   outCorpId, \n" +
"   inCorpId, \n" +
"   tranAmount, \n" +
"   tranNum \n" +
"FROM temp";

| |
Jason_H
|
|
hyb_he...@163.com
|

回复