Re: [DISCUSS] Simplification of terminologies

2020-01-07 Thread Vinoth Chandar
Howdy all, I have written up a first full version of https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture , based on the style set by Nick on the cWiki. Please contribute to making this more elaborate.

Re: [DISCUSS] Simplification of terminologies

2019-11-13 Thread Vinoth Chandar
Will review the POC in cwiki. +1 Based on this feedback, I will proceed with the changes. Thanks all! On Tue, Nov 12, 2019 at 10:47 PM Semantic Beeng wrote: > @vc, I think of it as elaborating the #ubiquitouslanguage in DDD. > See private email with references to a small POC in wiki and

Re: [DISCUSS] Simplification of terminologies

2019-11-12 Thread Vinoth Chandar
Thanks everyone for the feedback. Looks like we are in general agreement. I am inclined to just do 1 & 2 and leave COPY_ON_WRITE as is, based on the great points Ethan and Shiyan raised. Makes sense. I will still wait 1-2 days before closing this thread. @semanticbeeing That's a great idea. Is it more

Re: [DISCUSS] Simplification of terminologies

2019-11-12 Thread Y. Ethan Guo
+1 on [1] and [2]. For [3], I have similar doubts as Shiyan. For the naming, I can understand the original intent of the analogy for COW which is to make another "copy" of columnar/parquet file upon the modification/update to the records in the file. From the system design point of view, it's

Re: [DISCUSS] Simplification of terminologies

2019-11-12 Thread nishith agarwal
+1 on the first two, don't feel strongly about (3). Thanks, Nishith On Tue, Nov 12, 2019 at 5:03 AM leesf wrote: > [1] +1. `views` indeed confused me a lot. > [2] +1. `snapshot` is more reasonable. > [3] I don't feel strongly about renaming it; the current name `COPY_ON_WRITE` > is reasonable

Re: [DISCUSS] Simplification of terminologies

2019-11-12 Thread leesf
[1] +1. `views` indeed confused me a lot. [2] +1. `snapshot` is more reasonable. [3] I don't feel strongly about renaming it; the current name `COPY_ON_WRITE` is reasonable considering the cost of renaming and the behavior that a new version of the parquet file is created and appears to be copied from the old

Re: [DISCUSS] Simplification of terminologies

2019-11-11 Thread Balaji Varadarajan
Agree with all 3 changes. The naming now looks more consistent than earlier. +1 on them. Depending on whether we are renaming Input formats for (1) and (2), this could require some migration steps for Balaji.V On Mon, Nov 11, 2019 at 7:38 PM vino yang wrote: > Hi Vinoth, > > Thanks for

Re: [DISCUSS] Simplification of terminologies

2019-11-11 Thread vino yang
Hi Vinoth, Thanks for bringing up these proposals. +1 on all three, and an especially big +1 on the third renaming proposal. When I was a newbie, the "COPY_ON_WRITE" term confused me a lot. The word "copy" easily misleads users and makes them compare it with the `CopyOnWriteArrayList` data

Re: [DISCUSS] Simplification of terminologies

2019-11-11 Thread Shiyan Xu
[1] +1; "query" indeed sounds better [2] +1 on the term "snapshot"; so basically we follow the convention that when we say "snapshot", it means "give me the most up-to-date facts (lowest data latency) even if it takes some query time" [3] Though I agree with the renaming, I have a different

Re: [DISCUSS] Simplification of terminologies

2019-11-11 Thread Bhavani Sudha
+1 on all three rename proposals. I think this would make the concepts super easy to follow for new users. If changing [3] seems to be a stretch, we should definitely do [1] & [2] at the least IMO. I will be glad to help out on the renames to whatever extent possible should the Hudi community

[DISCUSS] Simplification of terminologies

2019-11-11 Thread Vinoth Chandar
Hello all, I wanted to raise an important topic with the community around whether we should rename some of our terminologies in code/docs to be more user-friendly and understandable. Let me also provide some context for each, since I am probably guilty of introducing most of them in the first