Hi Xuefu,

Welcome to the Flink community and thanks for starting this discussion! Better Hive integration would be really great!
Can you go into details of what you are proposing? I can think of a couple ways to improve Flink in that regard:

* Support for Hive UDFs
* Support for Hive metadata catalog
* Support for HiveQL syntax
* ???

Best,
Fabian

On Tue, Oct 9, 2018 at 19:22, Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:

> Hi all,
>
> Along with the community's effort, inside Alibaba we have explored Flink's
> potential as an execution engine not just for stream processing but also
> for batch processing. We are encouraged by our findings and have initiated
> our effort to make Flink's SQL capabilities full-fledged. When comparing
> what's available in Flink to the offerings from competitive data processing
> engines, we identified a major gap in Flink: the lack of good integration
> with the Hive ecosystem. This is crucial to the success of Flink SQL and
> batch due to the well-established data ecosystem around Hive. Therefore, we
> have done some initial work along this direction but there is still a lot
> of effort needed.
>
> We have two strategies in mind. The first one is to make Flink SQL
> full-fledged and well-integrated with the Hive ecosystem. This is a similar
> approach to what Spark SQL adopted. The second strategy is to make Hive
> itself work with Flink, similar to the proposal in [1]. Each approach bears
> its pros and cons, but they don’t need to be mutually exclusive, with each
> targeting different users and use cases. We believe that both will
> promote a much greater adoption of Flink beyond stream processing.
>
> We have been focused on the first approach and would like to showcase
> Flink's batch and SQL capabilities with Flink SQL. However, we have also
> planned to start strategy #2 as the follow-up effort.
>
> I'm completely new to Flink (with a short bio [2] below), though many of
> my colleagues here at Alibaba are long-time contributors. Nevertheless, I'd
> like to share our thoughts and invite your early feedback. At the same
> time, I am working on a detailed proposal on Flink SQL's integration with
> the Hive ecosystem, which will also be shared when ready.
>
> While the ideas are simple, each approach will demand significant effort,
> more than what we can afford. Thus, the input and contributions from the
> communities are greatly welcome and appreciated.
>
> Regards,
>
> Xuefu
>
> References:
>
> [1] https://issues.apache.org/jira/browse/HIVE-10712
> [2] Xuefu Zhang is a long-time open source veteran who has worked or is
> working on many projects under the Apache Foundation, of which he is also
> an honored member. About 10 years ago he worked in the Hadoop team at Yahoo
> where the projects just got started. Later he worked at Cloudera, initiating
> and leading the development of the Hive on Spark project in the communities
> and across many organizations. Prior to joining Alibaba, he worked at Uber
> where he promoted Hive on Spark to all of Uber's SQL on Hadoop workloads and
> significantly improved Uber's cluster efficiency.