Unsubscribe
At 2022-06-13 22:44:24, "cao zou" <zoucao...@gmail.com> wrote:
>Hi godfrey, thanks for your detailed explanation.
>After your explanation and a glance over FLIP-231, I think it is
>really needed. +1 for this, and looking forward to it.
>
>best
>zoucao
>
>godfrey he <godfre...@gmail.com> wrote on Mon, Jun 13, 2022 at 14:43:
>
>> Hi Ingo,
>>
>> The semantics do not distinguish between batch and streaming.
>> It works for both batch and streaming, but the result for
>> unbounded sources is meaningless.
>> Currently, I throw an exception in streaming mode,
>> and we can support streaming mode with bounded sources
>> in the future.
>>
>> Best,
>> Godfrey
>>
>> Ingo Bürk <airbla...@apache.org> wrote on Mon, Jun 13, 2022 at 14:17:
>> >
>> > Hi Godfrey,
>> >
>> > thank you for the explanation. A SELECT is definitely more generic and
>> > will work for all connectors automatically. As such I think it's a good
>> > baseline solution regardless.
>> >
>> > We can also think about allowing connector-specific optimizations in the
>> > future, but I do like your idea of letting the optimizer rules perform a
>> > lot of the work here already by leveraging existing optimizations.
>> > Similarly things like non-null counts of non-nullable columns would (or
>> > at least could) be handled by the optimizer rules already.
>> >
>> > So as far as that point goes, +1 to the generic approach.
>> >
>> > One more point, though: In general we should avoid supporting features
>> > only in specific modes as it breaks the unification promise. Given that
>> > ANALYZE is a manual and completely optional operation I'm OK with doing
>> > that here in principle. However, I wonder what will happen in the
>> > streaming / unbounded case. Do you plan to throw an error? Or do we
>> > complete the command as successful but without doing anything?
>> >
>> >
>> > Best
>> > Ingo
>> >
>> > On 13.06.22 05:50, godfrey he wrote:
>> > > Hi Ingo,
>> > >
>> > > Thanks for the inputs.
>> > >
>> > > I think converting `ANALYZE TABLE` to a `SELECT` statement is
>> > > the more generic approach. Because query plan optimization is more generic,
>> > > we can provide more optimization rules to optimize not only the `SELECT` statement
>> > > converted from `ANALYZE TABLE` but also `SELECT` statements written by users.
>> > >
>> > >> JDBC connector can get a row count estimate without performing a
>> > >> SELECT COUNT(1)
>> > > To optimize such cases, we can implement a rule to push the aggregate into
>> > > the table source.
>> > > Currently, there is a similar rule: SupportsAggregatePushDown, which
>> > > for now supports pushing only the local aggregate into the source.
>> > >
>> > >
>> > > Best,
>> > > Godfrey
>> > >
>> > > Ingo Bürk <airbla...@apache.org> wrote on Fri, Jun 10, 2022 at 17:15:
>> > >>
>> > >> Hi Godfrey,
>> > >>
>> > >> compared to the solution proposed in the FLIP (using a SELECT
>> > >> statement), I wonder if you have considered adding APIs to catalogs /
>> > >> connectors to perform this task as an alternative?
>> > >> I could imagine that for many connectors, statistics could be
>> > >> implemented in a less expensive way by leveraging the underlying system
>> > >> (e.g. a JDBC connector can get a row count estimate without performing a
>> > >> SELECT COUNT(1)).
>> > >>
>> > >>
>> > >> Best
>> > >> Ingo
>> > >>
>> > >>
>> > >> On 10.06.22 09:53, godfrey he wrote:
>> > >>> Hi all,
>> > >>>
>> > >>> I would like to open a discussion on FLIP-240: Introduce "ANALYZE
>> > >>> TABLE" Syntax.
>> > >>>
>> > >>> As FLIP-231 mentioned, statistics are one of the most important inputs
>> > >>> to the optimizer. Accurate and complete statistics allow the
>> > >>> optimizer to be more powerful. "ANALYZE TABLE" syntax is a very common
>> > >>> but effective approach to gather statistics, which is already
>> > >>> introduced by many compute engines and databases.
>> > >>>
>> > >>> The main purpose of this discussion is to introduce "ANALYZE TABLE" syntax
>> > >>> for Flink SQL.
>> > >>>
>> > >>> You can find more details in the FLIP-240 document[1]. Looking forward to
>> > >>> your feedback.
>> > >>>
>> > >>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
>> > >>> [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
>> > >>>
>> > >>>
>> > >>> Best,
>> > >>> Godfrey
>>