Re: [DISCUSS] Hudi is the data lake platform

2021-04-19 Thread Vinoth Chandar
Looks like we have consensus here! Will share the blog PR here once ready. Thanks all!

On Fri, Apr 16, 2021 at 8:43 PM Sivabalan wrote:
> totally +1 on clarifying Hudi's vision.
>
> On Wed, Apr 14, 2021 at 3:43 AM nishith agarwal wrote:
>
> > +1
> >
> > I also believe Hudi is a Data

Re: PR Tracker board

2021-04-19 Thread Vinoth Chandar
Moved all open PRs into the following columns:

*Opened PRs* => New PRs, PRs with open issues, unclear problem statements, or non-ideal solution approaches
*Ready for Review* => PRs in final reviewable shape
*Review in progress* => PRs being actively reviewed

On Mon, Apr 19, 2021 at 9:40 AM

Re: [DISCUSS] Refactor the Hudi configuration framework

2021-04-19 Thread Vinoth Chandar
The biggest differences between PR 1094 and the currently open PR are the addition of fallback support and that configs are not moved around in the same PR. IMO, this makes the effort straightforward.

> HoodieBootstrapConfig.BOOTSTRAP_BASE_PATH_PROP in their client code, they need to either replace it

Re: [DISCUSS] Refactor the Hudi configuration framework

2021-04-19 Thread Vinoth Chandar
+1 from me. Long time coming.

On Mon, Apr 19, 2021 at 12:02 PM Ding, Wenning wrote:
> Hi,
> I planned to refactor the current Hudi configuration framework. lamberken
> (https://github.com/lamberken) did similar things before:
> https://github.com/apache/hudi/pull/1094 and I’d like to continue

[DISCUSS] Refactor the Hudi configuration framework

2021-04-19 Thread Ding, Wenning
Hi, I plan to refactor the current Hudi configuration framework. lamberken did similar work before (https://github.com/apache/hudi/pull/1094), and I’d like to continue this work and add more features to the ConfigOption class. The motivation for this change is, as

PR Tracker board

2021-04-19 Thread Vinoth Chandar
Hi all, I know we have a buildup of great contributions :) (a great problem to have) that has somewhat exceeded our existing triaging processes. So, to bring more transparency to the review process and to show where PRs are in the pipeline, I made a tracker board here

Re: [DISCUSS] Incremental computation pipeline for HUDI

2021-04-19 Thread Danny Chan
Thanks @Sivabalan ~ I agree that parquet and log files should keep their metadata columns in sync, to avoid confusion and special handling in some use cases like compaction. I also agree that adding a metadata column is easier to use for SQL connectors. We can add a metadata column named