Hi Fabian, So maybe I can not join a table that generate from a window, because the table is getting larger and larger as the time goes, maybe the system will crash one day.
I am working on a system that calculate the “score" of article, which is consist of the count of article praise, the count of article response, etc Because I can not use flink to save all the article, I decide to update the score of the article that created in 3 days. I have two choises, 1. join the article table and praise table, response table then window select a.article_id, count(p.praise_id) as pCount, count(r.response_id) as rCount from article a left join praise p on a.article_id = p.article_id left join response r on a.article_id = r.article_id group by hop(updated_time, interval '1' minute,interval '3' day) , article_id 2. window the article table, window the priase table, window the response table ,then join them together select aAggr.article_id, pAggr.pCount, rAggr.rCount (select article_id from article group by hop(updated_time, interval '1' minute,interval '3' day) , article_id) aAggr left join (select article_id,count(praise_id) as pCount from praise group by hop(updated_time, interval '1' minute,interval '3' day) , article_id) pAggr on aAggr.article_id=pAggr.article_id left join (select article_id,count(response_id) as rCount from response group by hop(updated_time, interval '1' minute,interval '3' day) , article_id) rAggr on aAggr.article_id=rAggr.article_id Maybe I should choose 1, join then window, but not window then join. Please correct me if I am wrong. I have some worries when choose 1, I do not know how Flink works internally, it seems that in the sql , table article ,table praise, table response is growing as the time goes by, will it introduce performance issue? Best, Henry > 在 2018年8月21日,下午9:29,Hequn Cheng <chenghe...@gmail.com> 写道: > > Hi Henry, > > praiseAggr is an append table, so it contains "100,101,102,100,101,103,100". > 1. if you change your sql to s"SELECT article_id FROM praise GROUP BY > article_id", the answer is "101,102,103" > 2. if you change your sql to s"SELECT last_value(article_id) FROM praise", > the answer is "100" > > Best, Hequn > > On Tue, Aug 21, 2018 at 8:52 PM, 徐涛 <happydexu...@gmail.com > <mailto:happydexu...@gmail.com>> wrote: > Hi Fabian, > Thanks for your response. This question puzzles me for quite a long > time. > If the praiseAggr has the following value: > window-1 100,101,102 > window-2 100,101,103 > window-3 100 > > the last time the article table joins praiseAggr, which of the > following value does praiseAggr table has? > 1— 100,101,102,100,101,103,100 collect all the element > of all the window > 2— 100 the element > of the latest window > 3— 101,102,103 the distinct value > of all the window > > > Best, > Henry > > >> 在 2018年8月21日,下午8:02,Fabian Hueske <fhue...@gmail.com >> <mailto:fhue...@gmail.com>> 写道: >> >> Hi, >> >> The semantics of a query do not depend on the way that it is used. >> praiseAggr is a table that grows by one row per second and article_id. If >> you use that table in a join, the join will fully materialize the table. >> This is a special case because the same row is added multiple times, so the >> state won't grow that quickly, but the performance will decrease because for >> each row from article will join with multiple (a growing number) of rows >> from praiseAggr. >> >> Best, Fabian >> >> 2018-08-21 12:19 GMT+02:00 徐涛 <happydexu...@gmail.com >> <mailto:happydexu...@gmail.com>>: >> Hi All, >> var praiseAggr = tableEnv.sqlQuery(s"SELECT article_id FROM praise >> GROUP BY HOP(updated_time, INTERVAL '1' SECOND,INTERVAL '3' MINUTE) , >> article_id" ) >> tableEnv.registerTable("praiseAggr", praiseAggr) >> var finalTable = tableEnv.sqlQuery(s”SELECT 1 FROM article a join >> praiseAggr p on a.article_id=p.article_id" ) >> tableEnv.registerTable("finalTable", finalTable) >> I know that praiseAggr, if written to sink, is append mode , so if a >> table joins praiseAggr, what the table “see”, is a table contains the latest >> value, or a table that grows larger and larger? If it is the later, will it >> introduce performance problem? >> Thanks a lot. >> >> >> Best, >> Henry >> > >