Re: [DISCUSS] Readiness for graduation to TLP

2020-04-29 Thread hddong
+1 On Thu, Apr 30, 2020 at 2:58 AM Thomas Weise wrote: > +1 > > On Wed, Apr 29, 2020 at 10:39 AM Luciano Resende > wrote: > > > +1 > > > > On Mon, Apr 27, 2020 at 10:06 PM Vinoth Chandar > wrote: > > > > > > Hello all, > > > > > > I would like to start a discussion on our readiness to pursue > graduation >

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-29 Thread Thomas Weise
+1 On Wed, Apr 29, 2020 at 10:39 AM Luciano Resende wrote: > +1 > > On Mon, Apr 27, 2020 at 10:06 PM Vinoth Chandar wrote: > > > > Hello all, > > > > I would like to start a discussion on our readiness to pursue graduation > to > > TLP and potentially follow up with a VOTE with a formal

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-29 Thread Luciano Resende
+1 On Mon, Apr 27, 2020 at 10:06 PM Vinoth Chandar wrote: > > Hello all, > > I would like to start a discussion on our readiness to pursue graduation to > TLP and potentially follow up with a VOTE with a formal resolution. To seed > the discussion, our community's achievements since entering

Re: [DISCUSS] Bug bash?

2020-04-29 Thread Sivabalan
I could lend a hand if we need any help in organizing this. LMK. On Tue, Apr 28, 2020 at 3:19 AM Bhavani Sudha wrote: > On Mon, Apr 27, 2020 at 10:21 PM Vinoth Chandar wrote: > > > Great! I will prep the bugs and do uniform assignment across people in > > this thread :) > > > > Sudha (RM for

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread sblack...@apache.org
+1. Very impressive community and technical work. On Apr 28, 2020, 10:45 AM -0500, vbal...@apache.org , wrote: > +1. I strongly think we are ready for graduation. > On Tuesday, April 28, 2020, 07:38:16 AM PDT, lamberken > wrote: > > +1 > > On 2020/04/28 05:05:44, Vinoth Chandar wrote: > > Hello

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread vbal...@apache.org
+1. I strongly think we are ready for graduation. On Tuesday, April 28, 2020, 07:38:16 AM PDT, lamberken wrote: +1 On 2020/04/28 05:05:44, Vinoth Chandar wrote: > Hello all, > > I would like to start a discussion on our readiness to pursue graduation to > TLP and potentially follow

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread lamberken
+1 On 2020/04/28 05:05:44, Vinoth Chandar wrote: > Hello all, > > I would like to start a discussion on our readiness to pursue graduation to > TLP and potentially follow up with a VOTE with a formal resolution. To seed > the discussion, our community's achievements since entering the

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread Vinoth Chandar
Edit: Entering the incubator in early 2019 On Tue, Apr 28, 2020 at 7:23 AM Nishith wrote: > +1 > > Sent from my iPhone > > > On Apr 28, 2020, at 4:43 AM, Sivabalan wrote: > > > > Def yes from my side > > > >> On Tue, Apr 28, 2020 at 7:41 AM leesf wrote: > >> > >> +1 > >> > >> vino yang

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread Nishith
+1 Sent from my iPhone > On Apr 28, 2020, at 4:43 AM, Sivabalan wrote: > > Def yes from my side > >> On Tue, Apr 28, 2020 at 7:41 AM leesf wrote: >> >> +1 >> >> On Tue, Apr 28, 2020 at 5:39 PM vino yang wrote: >> >>> +1 >>> >>> Best, >>> Vino >>> >>> On Tue, Apr 28, 2020 at 2:48 PM Bhavani Sudha wrote: >>>

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread Sivabalan
Def yes from my side On Tue, Apr 28, 2020 at 7:41 AM leesf wrote: > +1 > > On Tue, Apr 28, 2020 at 5:39 PM vino yang wrote: > > > +1 > > > > Best, > > Vino > > > > On Tue, Apr 28, 2020 at 2:48 PM Bhavani Sudha wrote: > > > > > +1 to pursue graduation. I certainly think we are ready. Will chime in > on > > > voting

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread leesf
+1 On Tue, Apr 28, 2020 at 5:39 PM vino yang wrote: > +1 > > Best, > Vino > > On Tue, Apr 28, 2020 at 2:48 PM Bhavani Sudha wrote: > > > +1 to pursue graduation. I certainly think we are ready. Will chime in on > > voting thread when you start it. > > > > Thanks, > > Sudha > > > > On Mon, Apr 27, 2020 at 10:06 PM Vinoth

Re: [DISCUSS] Bug bash?

2020-04-28 Thread Bhavani Sudha
On Mon, Apr 27, 2020 at 10:21 PM Vinoth Chandar wrote: > Great! I will prep the bugs and do uniform assignment across people in > this thread :) > > Sudha (RM for next release), please co-ordinate the timing of this based on > the release timeline. > >> Sounds good. > > if this works well, we

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread Bhavani Sudha
+1 to pursue graduation. I certainly think we are ready. Will chime in on voting thread when you start it. Thanks, Sudha On Mon, Apr 27, 2020 at 10:06 PM Vinoth Chandar wrote: > Hello all, > > I would like to start a discussion on our readiness to pursue graduation to > TLP and potentially

Re: [DISCUSS] Bug bash?

2020-04-27 Thread Vinoth Chandar
Great! I will prep the bugs and do uniform assignment across people in this thread :) Sudha (RM for next release), please co-ordinate the timing of this based on the release timeline. if this works well, we can make this a Hudi release tradition :) On Thu, Apr 23, 2020 at 6:45 PM Mehrotra,

Re: [DISCUSS] Next Release timeline

2020-04-26 Thread Vinoth Chandar
Given enough time has passed, we can proceed this way, with Sudha as RM. Please respond if anyone has more to add. On Sun, Apr 26, 2020 at 1:12 PM Balaji Varadarajan wrote: > +1 on Sudha being RM and targeting next release for mid may. > > Balaji.V > > On 2020/04/23 14:27:46, Vinoth Chandar

Re: [DISCUSS] Next Release timeline

2020-04-26 Thread Balaji Varadarajan
+1 on Sudha being RM and targeting next release for mid may. Balaji.V On 2020/04/23 14:27:46, Vinoth Chandar wrote: > Thanks all. Encourage everyone to chime in more, so we can make a decision > here! > > On Thu, Apr 23, 2020 at 6:29 AM Sivabalan wrote: > > > sounds good. We could go with a

Re: [DISCUSS] Bug bash?

2020-04-23 Thread Mehrotra, Udit
+1 Happy to participate On 4/23/20, 6:32 PM, "vino yang" wrote: +1 Shiyan Xu, on Fri, Apr 24, 2020

Re: [DISCUSS] Bug bash?

2020-04-23 Thread vino yang
+1 On Fri, Apr 24, 2020 at 9:11 AM Shiyan Xu wrote: > +1 would like to participate > > On Thu, Apr 23, 2020 at 5:51 PM Dongdong Hong > wrote: > > > +1 sounds great! > > > > On Thu, Apr 23, 2020 at 9:30 PM Sivabalan wrote: > > > > > +1 > > > > > > On Wed, Apr 22, 2020 at 7:29 PM lamber-ken wrote: > > > > > > > > > > > > >

Re: [DISCUSS] Support popular metrics reporter

2020-04-23 Thread Shiyan Xu
Thank you all for the approval! Filed https://issues.apache.org/jira/browse/HUDI-836 On Thu, Apr 23, 2020 at 5:40 PM dongdong hong wrote: > +1 > >

Re: [DISCUSS] Bug bash?

2020-04-23 Thread Shiyan Xu
+1 would like to participate On Thu, Apr 23, 2020 at 5:51 PM Dongdong Hong wrote: > +1 sounds great! > > On Thu, Apr 23, 2020 at 9:30 PM Sivabalan wrote: > > > +1 > > > > On Wed, Apr 22, 2020 at 7:29 PM lamber-ken wrote: > > > > > > > > > > > > > > Wow, challenging job, +1 > > > > > > > > > Best, > > >

Re: [DISCUSS] Bug bash?

2020-04-23 Thread Dongdong Hong
+1 sounds great! On Thu, Apr 23, 2020 at 9:30 PM Sivabalan wrote: > +1 > > On Wed, Apr 22, 2020 at 7:29 PM lamber-ken wrote: > > > > > > > > > Wow, challenging job, +1 > > > > > > Best, > > Lamber-Ken > > > > At 2020-04-23 04:51:01, "Vinoth Chandar" wrote: > > >Just floating a very random idea here. :) > >

Re: [DISCUSS] Support popular metrics reporter

2020-04-23 Thread dongdong hong
+1

Re: [DISCUSS] Next Release timeline

2020-04-23 Thread Vinoth Chandar
Thanks all. Encourage everyone to chime in more, so we can make a decision here! On Thu, Apr 23, 2020 at 6:29 AM Sivabalan wrote: > sounds good. We could go with a major by mid may. > > On Wed, Apr 22, 2020 at 12:58 PM Vinoth Chandar wrote: > > > +1 on Sudha being the RM > > > > My preference

Re: [DISCUSS] Bug bash?

2020-04-23 Thread Sivabalan
+1 On Wed, Apr 22, 2020 at 7:29 PM lamber-ken wrote: > > > > Wow, challenging job, +1 > > > Best, > Lamber-Ken > > At 2020-04-23 04:51:01, "Vinoth Chandar" wrote: > >Just floating a very random idea here. :) > > > >Would there be interest in doing a bug bash for a week, where we >

Re: [DISCUSS] Next Release timeline

2020-04-23 Thread Sivabalan
sounds good. We could go with a major by mid may. On Wed, Apr 22, 2020 at 12:58 PM Vinoth Chandar wrote: > +1 on Sudha being the RM > > My preference would be to do a major release as well, targeting mid may > (which means code freeze in 3 weeks?) > This gives us enough time to land some major

Re: [DISCUSS] Support popular metrics reporter

2020-04-22 Thread cooper
+1 On Thu, Apr 23, 2020 at 12:12 AM Balaji Varadarajan wrote: > +1 > On Wednesday, April 22, 2020, 08:35:30 AM PDT, leesf < > leesf0...@gmail.com> wrote: > > +1 > > On Wed, Apr 22, 2020 at 2:24 PM Vinoth Chandar wrote: > > > +1 from me as well > > > > On Mon, Apr 20, 2020 at 9:37 PM vino yang wrote: > > > > > Hi

Re: [DISCUSS] Bug bash?

2020-04-22 Thread lamber-ken
Wow, challenging job, +1 Best, Lamber-Ken At 2020-04-23 04:51:01, "Vinoth Chandar" wrote: >Just floating a very random idea here. :) > >Would there be interest in doing a bug bash for a week, where we >aggressively close out some pesky bugs that have been lingering around.. If >enough

Re: [DISCUSS] Bug bash?

2020-04-22 Thread Balaji Varadarajan
+1. Would also be great if folks sign-up for testing/trying out the master branch in their real environments  On Wednesday, April 22, 2020, 02:48:13 PM PDT, Bhavani Sudha wrote: +1 Sounds like a good idea On Wed, Apr 22, 2020 at 1:51 PM Vinoth Chandar wrote: > Just floating a very

Re: [DISCUSS] Bug bash?

2020-04-22 Thread Bhavani Sudha
+1 Sounds like a good idea On Wed, Apr 22, 2020 at 1:51 PM Vinoth Chandar wrote: > Just floating a very random idea here. :) > > Would there be interest in doing a bug bash for a week, where we > aggressively close out some pesky bugs that have been lingering around.. If > enough committers and

Re: [DISCUSS] moving blog from cwiki to website

2020-04-22 Thread Vinoth Chandar
Great! Just sharing the prior conversation on this. We were hoping to replace the ill-maintained activity page here https://hudi.apache.org/activity.html with a blog section and move stuff there. We should already have all the tools/markups for code highlighting etc.. On Wed, Apr 22, 2020 at

Re: [DISCUSS] moving blog from cwiki to website

2020-04-22 Thread Prashant Wason
I can help drive this. Let me take a look at some other projects and suggest how to go about it. Thanks Prashant On Wed, Apr 22, 2020, 9:31 AM Vinoth Chandar wrote: > Any volunteers to drive this? (also may be a small section in contribution > guide for contributing a blog) :) > > On Wed, Apr

Re: [DISCUSS] Next Release timeline

2020-04-22 Thread Vinoth Chandar
+1 on Sudha being the RM My preference would be to do a major release as well, targeting mid may (which means code freeze in 3 weeks?) This gives us enough time to land some major features as well as stabilize them as much as possible. On Wed, Apr 22, 2020 at 3:21 AM Pratyaksh Sharma wrote: >

Re: [DISCUSS] moving blog from cwiki to website

2020-04-22 Thread Vinoth Chandar
Any volunteers to drive this? (also may be a small section in contribution guide for contributing a blog) :) On Wed, Apr 22, 2020 at 9:11 AM vbal...@apache.org wrote: > +1 on moving blogs to website. > On Wednesday, April 22, 2020, 08:35:02 AM PDT, leesf < > leesf0...@gmail.com> wrote: > >

Re: [DISCUSS] Support popular metrics reporter

2020-04-22 Thread Balaji Varadarajan
+1 On Wednesday, April 22, 2020, 08:35:30 AM PDT, leesf wrote: +1 On Wed, Apr 22, 2020 at 2:24 PM Vinoth Chandar wrote: > +1 from me as well > > On Mon, Apr 20, 2020 at 9:37 PM vino yang wrote: > > > Hi Raymond, > > > > Thanks for opening this discussion. > > > > IMHO, as Hudi's user base grows,

Re: [DISCUSS] moving blog from cwiki to website

2020-04-22 Thread vbal...@apache.org
+1 on moving blogs to website. On Wednesday, April 22, 2020, 08:35:02 AM PDT, leesf wrote: +1 On Wed, Apr 22, 2020 at 1:50 PM vino yang wrote: > +1 from my side. > > On Wed, Apr 22, 2020 at 1:38 PM Pratyaksh Sharma wrote: > > > +1 > > > > I have seen other Apache projects having blogs on their website like >

Re: [DISCUSS] Support popular metrics reporter

2020-04-22 Thread leesf
+1 On Wed, Apr 22, 2020 at 2:24 PM Vinoth Chandar wrote: > +1 from me as well > > On Mon, Apr 20, 2020 at 9:37 PM vino yang wrote: > > > Hi Raymond, > > > > Thanks for opening this discussion. > > > > IMHO, as Hudi's user base grows, we need to enhance our metrics reporter. > > From an ecological point of

Re: [DISCUSS] moving blog from cwiki to website

2020-04-22 Thread leesf
+1 On Wed, Apr 22, 2020 at 1:50 PM vino yang wrote: > +1 from my side. > > On Wed, Apr 22, 2020 at 1:38 PM Pratyaksh Sharma wrote: > > > +1 > > > > I have seen other Apache projects having blogs on their website like > Apache > > Pinot. > > > > On Wed, Apr 22, 2020 at 11:05 AM Bhavani Sudha Saktheeswaran > > wrote: > > >

Re: [DISCUSS] Next Release timeline

2020-04-22 Thread Pratyaksh Sharma
Major release looks good to me. On Wed, Apr 22, 2020 at 2:29 PM Bhavani Sudha wrote: > Hello all, > > I wanted to kick start the discussion on timeline and logistics for the > next release. Here are couple things we need to figure out. > >1. Should the next release be a minor or major

Re: [DISCUSS] Support popular metrics reporter

2020-04-22 Thread Vinoth Chandar
+1 from me as well On Mon, Apr 20, 2020 at 9:37 PM vino yang wrote: > Hi Raymond, > > Thanks for opening this discussion. > > IMHO, as Hudi's user base grows, we need to enhance our metrics reporter. > From an ecological point of view, this is also very important. > > So, +1 from my side. > >

Re: [DISCUSS] moving blog from cwiki to website

2020-04-21 Thread vino yang
+1 from my side. On Wed, Apr 22, 2020 at 1:38 PM Pratyaksh Sharma wrote: > +1 > > I have seen other Apache projects having blogs on their website like Apache > Pinot. > > On Wed, Apr 22, 2020 at 11:05 AM Bhavani Sudha Saktheeswaran > wrote: > > > +1 > > > > On Tue, Apr 21, 2020 at 10:23 PM tison wrote: > >

Re: [DISCUSS] moving blog from cwiki to website

2020-04-21 Thread Pratyaksh Sharma
+1 I have seen other Apache projects having blogs on their website like Apache Pinot. On Wed, Apr 22, 2020 at 11:05 AM Bhavani Sudha Saktheeswaran wrote: > +1 > > On Tue, Apr 21, 2020 at 10:23 PM tison wrote: > > > Hi Vinoth, > > > > +1 for moving blogs. > > > > cwiki looks belong to

Re: [DISCUSS] moving blog from cwiki to website

2020-04-21 Thread Bhavani Sudha Saktheeswaran
+1 On Tue, Apr 21, 2020 at 10:23 PM tison wrote: > Hi Vinoth, > > +1 for moving blogs. > > cwiki seems to belong to the developers' scope, and users' first experience > is more likely to be our website. > > Best, > tison. > > > On Wed, Apr 22, 2020 at 1:09 PM Vinoth Chandar wrote: > > > Hi community, > > > > What

Re: [DISCUSS] moving blog from cwiki to website

2020-04-21 Thread tison
Hi Vinoth, +1 for moving blogs. cwiki seems to belong to the developers' scope, and users' first experience is more likely to be our website. Best, tison. On Wed, Apr 22, 2020 at 1:09 PM Vinoth Chandar wrote: > Hi community, > > What does everyone feel about moving blogs we have on cwiki now over to > site so

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-21 Thread nishith agarwal
+1, thanks for starting this effort Satish! -Nishith On Fri, Apr 17, 2020 at 2:26 PM Vinoth Chandar wrote: > Thanks Satish! > > On Fri, Apr 17, 2020 at 11:32 AM Satish Kotha > > wrote: > > > Thanks for interesting discussion. I will start RFC as suggested and > > discuss points brought up in

Re: [DISCUSS] Support popular metrics reporter

2020-04-20 Thread vino yang
Hi Raymond, Thanks for opening this discussion. IMHO, as Hudi's user base grows, we need to enhance our metrics reporter. From an ecological point of view, this is also very important. So, +1 from my side. Best, Vino On Tue, Apr 21, 2020 at 10:59 AM Shiyan Xu wrote: > Hi all, > > I'd like to raise the topic

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-17 Thread Vinoth Chandar
Thanks Satish! On Fri, Apr 17, 2020 at 11:32 AM Satish Kotha wrote: > Thanks for interesting discussion. I will start RFC as suggested and > discuss points brought up in this thread. > > > On Thu, Apr 16, 2020 at 11:44 AM Balaji Varadarajan > wrote: > > > > > >A new file slice (empty parquet)

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-17 Thread Satish Kotha
Thanks for interesting discussion. I will start RFC as suggested and discuss points brought up in this thread. On Thu, Apr 16, 2020 at 11:44 AM Balaji Varadarajan wrote: > > >A new file slice (empty parquet) is indeed generated for every file group > in a partition. > >> we could just reuse

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-16 Thread Balaji Varadarajan
>A new file slice (empty parquet) is indeed generated for every file group in a partition. >> we could just reuse the existing file groups right? probably is bit hacky... Sorry for the confusion. I meant to say the empty file slice is only for file-groups which does not have any incoming

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-16 Thread Vinoth Chandar
>A new file slice (empty parquet) is indeed generated for every file group in a partition. we could just reuse the existing file groups right? probably is bit hacky... >we can encode some MAGIC in the write-token component for Hudi readers to skip these files so that they can be safely removed.

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-16 Thread vbal...@apache.org
Satish, Thanks for the proposal. I think an RFC would be useful here. Let me know your thoughts. It would be good to nail down other details like whether/how to deal with external index management with this API. Thanks, Balaji.V On Thursday, April 16, 2020, 10:46:19 AM PDT, Balaji Varadarajan

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-16 Thread Balaji Varadarajan
+1 from me. This is a really cool feature.  Yes, A new file slice (empty parquet) is indeed generated for every file group in a partition.  Regarding cleaning these "empty" file slices eventually by cleaner (to avoid cases where there are too many of them lying around) in a safe way, we can

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-16 Thread Vinoth Chandar
Hi Satish, Thanks for starting this.. Your use-cases do sound very valuable to support. So +1 from me. IIUC, you are implementing a partition level overwrite, where existing filegroups will be retained, but instead of merging, you will just reuse the file names and write the incoming records

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-09 Thread Vinoth Chandar
Thanks Raymond.. We can continue engaging on the ticket! On Thu, Apr 9, 2020 at 4:54 PM Shiyan Xu wrote: > Filed! > https://issues.apache.org/jira/browse/HUDI-779 > > On Wed, Apr 8, 2020 at 11:05 PM Vinoth Chandar wrote: > > > +1 on an umbrella task.. > > > > We can do the RFC for overhaul of

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-09 Thread Shiyan Xu
Filed! https://issues.apache.org/jira/browse/HUDI-779 On Wed, Apr 8, 2020 at 11:05 PM Vinoth Chandar wrote: > +1 on an umbrella task.. > > We can do the RFC for overhaul of tests (mocking more tests, cleaning up > test data gen and so on).. > For adding junit5 itself and doing the initial work,

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-09 Thread Vinoth Chandar
+1 on an umbrella task.. We can do the RFC for overhaul of tests (mocking more tests, cleaning up test data gen and so on).. For adding junit5 itself and doing the initial work, we could just begin with JIRA? On Wed, Apr 8, 2020 at 12:56 PM Shiyan Xu wrote: > Thank you all for the feedback. >

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-08 Thread Shiyan Xu
Thank you all for the feedback. > This increases the scope to a overhaul of tests across the project.. Wonder if we can do a RFC for this? Indeed it is overhaul type of change. IMO RFC is needed specifically for the test utility re-design part. Guess it can be created when it's good to start?

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-02 Thread vino yang
Hi Shiyan, +1 from my side. Best, Vino On Mon, Mar 30, 2020 at 11:00 PM Vinoth Chandar wrote: > Hi Raymond, > > Sounds good to me. This increases the scope to a overhaul of tests across > the project.. Wonder if we can do a RFC for this? But overall +1 from me. > > I would like to call upon the community to

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-30 Thread Vinoth Chandar
Hi Raymond, Sounds good to me. This increases the scope to a overhaul of tests across the project.. Wonder if we can do a RFC for this? But overall +1 from me. I would like to call upon the community to chime in more though :) . let's give it a few days.. Thanks Vinoth On Fri, Mar 27, 2020 at

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-27 Thread Shiyan Xu
Understand Vinoth. To me AssertJ is nice-to-have. I agree with the learning overhead. The current CI time is too long and we do need to use more mocking and optimize spark jobs setup. Based on your points, I imagine the path forward can be planned as this 1. An initial PR to add Junit 5 to

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-25 Thread Vinoth Chandar
+1 on Junit5. does seem nicer with support for lambdas. assuming we do a gradual rollout. At any point, we cannot have any of the core tests disabled :) May be we can use the vintage framework for now, do minimal changes migrate and then proceed to redoing the tests On AssertJ type frameworks, I

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-24 Thread Shiyan Xu
Some references https://junit.org/junit5/docs/current/user-guide/ https://joel-costigliola.github.io/assertj/ On Tue, Mar 24, 2020 at 9:27 PM Shiyan Xu wrote: > Hi all, > > I'd like to gather some feedback about > 1. upgrading Junit 4 to 5 > 2. adopt AssertJ as preferred assertion statement

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

2020-03-21 Thread Pratyaksh Sharma
@Balaji @Vinoth Chandar , Here is a small attempt to make this a generic one - https://github.com/apache/incubator-hudi/pull/1433/files. Please have a look, happy to hear from everyone on this. This is just a sample, if we agree on the implementation, I will add more test cases and improve it

Re: [DISCUSS] Consider defaultValue of field when writing to Hudi dataset

2020-03-20 Thread Pratyaksh Sharma
https://issues.apache.org/jira/browse/HUDI-727 tracks this. On Tue, Feb 25, 2020 at 2:23 PM Pratyaksh Sharma wrote: > Hi Vinoth, > > > in avro you define it as an optional field (union of type and null).. > Yes that is correct. But imagine if someone does not want to populate > null, rather he

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-03-12 Thread Vinoth Chandar
>> shall we also copy useful ones to a separate doc for context I think we can look into replicating slack #general to our commits@ list.. I think Sudha brought this up before? This will auto solve the issue and give us full retention.. And also accurately report community engagement On Thu, Mar

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-03-12 Thread Y Ethan Guo
I'll go ahead on this. If anyone else would like to help, feel free to ping me. Thanks, - Ethan On Thu, Mar 12, 2020 at 11:26 AM Y Ethan Guo wrote: > I can help check the history of issues mentioned in Slack/Github and > classify them to the troubleshooting guide Pratyaksh put up. > > In

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-03-12 Thread Y Ethan Guo
I can help check the history of issues mentioned in Slack/Github and classify them to the troubleshooting guide Pratyaksh put up. In terms of the troubleshooting issues in Slack, shall we also copy useful ones to a separate doc for context? It's hard to track older messages and we're unable to

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-03-12 Thread Vinoth Chandar
Anyone who can help or drive with this effort? :) There are hundreds of messages and ~100 threads in the last month alone in slack alone.. So, having this in place would be a timely way for us to scale the community, On Tue, Feb 25, 2020 at 1:04 AM Pratyaksh Sharma wrote: > Sure, will take a

Re: [DISCUSS] Restructure hudi-utilities module

2020-03-09 Thread Balaji Varadarajan
+1 on Vinoth's suggestion to wait until the lower level (write-client) is refactored and reorganized first. We can then look at Data-Source and DeltaStreamer to see how best to organize them. Balaji.V On Sunday, March 8, 2020, 11:06:13 PM PDT, Vinoth Chandar wrote: >> make

Re: [DISCUSS] Restructure hudi-utilities module

2020-03-09 Thread Vinoth Chandar
>> make delta streamer a engine agnostic part so that Spark and Flink can share some common logic. If we make the change at the Write Client level to make it engine agnostic, it should help with most of the cases.. I believe there will be spark specific pieces in the Source abstraction since

Re: [DISCUSS] Restructure hudi-utilities module

2020-03-04 Thread vino yang
Hi guys, My original thought is to make delta streamer an engine-agnostic part so that Spark and Flink can share some common logic. >>I am not sure the ROI is there for renaming to hudi-deltastreamer and pull this out.. Every time we change a module name Actually, here my suggestion is to move

Re: [DISCUSS] Restructure hudi-utilities module

2020-03-04 Thread Vinoth Chandar
I am not sure the ROI is there for renaming to hudi-deltastreamer and pulling this out.. Every time we change a module name, it's a breaking change and I would prefer if we reserved those for really pressing issues.. or take the natural course of development and get there.. Regarding how multi framework

Re: [DISCUSS] Restructure hudi-utilities module

2020-03-04 Thread Gary Li
+1. hudi-delta gives me the feeling that it has something to do with other frameworks... I’d vote for another name hudi-deltastreamer or hudi-streamer or hudi-stream. On Wed, Mar 4, 2020 at 2:29 AM vino yang wrote: > Hi folks, > > Currently, it seems the content of hudi-utilities looks a bit

Re: Re: Re: Re: [DISCUSS] Improve the merge performance for cow

2020-03-03 Thread lamberken
Hi Vinoth, Yes, it's incorrect to draw the conclusion from only one test. It's just a new idea to improve the merge performance; it's not the best. E.g. when reading an old record, there is a series of conversion operations (Row to GenericRecord to HoodieRecord), etc. > Also let's separate the RDD vs

Re: Re: Re: [DISCUSS] Improve the merge performance for cow

2020-03-02 Thread Vinoth Chandar
Hi Lamber-ken, If you agree reduceByKey() will shuffle data, then it would serialize and deserialize anyway correct? I am not denying that this may be a valid approach.. But we need much more rigorous testing and potentially implement both approaches side-by-side to compare.. IMO We cannot

Re: Re: Re: [DISCUSS] Improve the merge performance for cow

2020-02-28 Thread lamberken
Hi Vinoth, Thanks for reviewing the initial design :) I know there are many problems at present (e.g. shuffling, parallelism issues). We can discuss the practicability of the idea first. > ExternalSpillableMap itself was not the issue right, the serialization was Right, the new design will

Re: Re: [DISCUSS] Improve the merge performance for cow

2020-02-28 Thread Vinoth Chandar
Doesn't this simply move the problem to tuning Spark? The ExternalSpillableMap itself was not the issue, right? The serialization was. This map is also used on the query side btw, where we need something like that. I took a pass at the code. I think we are shuffling data again for the

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

2020-02-27 Thread Vinoth Chandar
+1 for adding a new composite KeyGenerator, which can combine both... Workaround : you can use the Transformer api to do a more flexible key generation as you wish as well. for deltastreamer On Tue, Feb 25, 2020 at 9:37 AM Balaji Varadarajan wrote: > > See if you can have a generic

Re: [DISCUSS] Improve the merge performance for cow

2020-02-26 Thread Vinoth Chandar
Hi lamber-ken, Thanks for this. I am not quite following the proposal. What do you mean by Spark built-in operators? Don't we use the RDD-based Spark operations? Are you suggesting that we perform the merging in SQL? Not following. Please clarify. On Wed, Feb 26, 2020 at 10:08 AM lamberken

Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

2020-02-25 Thread Balaji Varadarajan
+1. Lets do it :) Balaji.V On Mon, Feb 24, 2020 at 6:36 PM Shiyan Xu wrote: > +1 great reading and values! > > On Mon, 24 Feb 2020, 15:31 nishith agarwal, wrote: > > > +100 > > - Reduces index lookup time hence improves job runtime > > - Paves the way for streaming style ingestion > > -

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

2020-02-25 Thread Balaji Varadarajan
See if you can have a generic implementation where individual fields in the partition-path can be configured with their own key-generator class. Currently, TimestampBasedKeyGenerator is the only type specific custom generator. If we are anticipating more such classes for specialized types,
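The idea in the snippet above, a generic generator in which each partition-path field can be given its own formatting logic, can be sketched roughly as follows. This is only an illustrative sketch in Python, not Hudi's Java KeyGenerator API: the class `CompositePartitionPathGenerator`, the helpers `timestamp_formatter` and `simple_formatter`, and the record field names are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# Hypothetical type: a per-field formatter turns a raw field value into a path segment.
FieldFormatter = Callable[[Any], str]


def timestamp_formatter(fmt: str) -> FieldFormatter:
    """Build a formatter that renders an epoch-seconds value as a UTC date string."""
    def _format(value: Any) -> str:
        return datetime.fromtimestamp(int(value), tz=timezone.utc).strftime(fmt)
    return _format


def simple_formatter(value: Any) -> str:
    """Default formatter: use the field value as-is."""
    return str(value)


class CompositePartitionPathGenerator:
    """Joins several fields into one partition path, each with its own formatter."""

    def __init__(self, fields: List[str], formatters: Dict[str, FieldFormatter]):
        self.fields = fields
        self.formatters = formatters

    def partition_path(self, record: Dict[str, Any]) -> str:
        parts = []
        for name in self.fields:
            # Fields without a configured formatter fall back to the simple one.
            formatter = self.formatters.get(name, simple_formatter)
            parts.append(formatter(record[name]))
        return "/".join(parts)


generator = CompositePartitionPathGenerator(
    fields=["country", "created_at"],
    formatters={"created_at": timestamp_formatter("%Y/%m/%d")},
)
record = {"country": "US", "created_at": 1582588800}
print(generator.partition_path(record))  # prints US/2020/02/25
```

Keeping the per-field logic in a lookup table means a new specialized type only needs a new formatter, rather than a new generator class for every combination of fields.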

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-25 Thread Pratyaksh Sharma
Sure, will take a look whenever I get time. On Tue, Feb 25, 2020 at 12:24 PM Vinoth Chandar wrote: > +1 > > Thanks Pratyaksh! Will take a look and structure it accordingly. Might be > worth looking at last 30 days of slack, mailing lists and see if we can add > more.. > > > On Mon, Feb 24, 2020

Re: [DISCUSS] Consider defaultValue of field when writing to Hudi dataset

2020-02-25 Thread Pratyaksh Sharma
Hi Vinoth, > in avro you define it as an optional field (union of type and null).. Yes that is correct. But imagine if someone does not want to populate null, rather he wants to populate default values for the field, which is a very common case. > seems like it's being copied over? When creating
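The distinction raised above, between an optional field (a union with null) and a field that should receive its declared default, can be illustrated with a small sketch. It assumes an Avro-style record schema held as a plain Python dict; the helper `apply_schema_defaults` and the `Trip` schema are hypothetical, and this is not how Hudi or the Avro library implements default handling.

```python
import copy
from typing import Any, Dict

# Sentinel to tell "no default declared" apart from "default is null".
_MISSING = object()


def apply_schema_defaults(schema: Dict[str, Any], record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with absent fields filled from the schema's `default`."""
    result = dict(record)
    for field in schema["fields"]:
        name = field["name"]
        if name in result:
            continue
        default = field.get("default", _MISSING)
        if default is _MISSING:
            raise ValueError(f"field {name!r} is missing and has no default")
        # Copy so that mutable defaults are not shared between records.
        result[name] = copy.deepcopy(default)
    return result


schema = {
    "type": "record",
    "name": "Trip",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "fare", "type": "double", "default": 0.0},
        {"name": "note", "type": ["null", "string"], "default": None},
    ],
}

filled = apply_schema_defaults(schema, {"id": "t1"})
print(filled)
```

Here `fare` is filled with a non-null default while `note` is the optional-field case whose default is null, which is the difference the message is pointing at.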

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-24 Thread Vinoth Chandar
+1 Thanks Pratyaksh! Will take a look and structure it accordingly. Might be worth looking at last 30 days of slack, mailing lists and see if we can add more.. On Mon, Feb 24, 2020 at 4:34 AM Pratyaksh Sharma wrote: > Hi Vinoth, > > Have added few more issues which I faced while adopting

Re: [DISCUSS] Consider defaultValue of field when writing to Hudi dataset

2020-02-24 Thread Vinoth Chandar
IIUC the link between backwards compatibility and default values for fields in schema is that, in avro you define it as an optional field (union of type and null).. Not sure if it has anything to do with default values. Nonetheless, we should copy over the default values, if that code is not

Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

2020-02-24 Thread Shiyan Xu
+1 great reading and values! On Mon, 24 Feb 2020, 15:31 nishith agarwal, wrote: > +100 > - Reduces index lookup time hence improves job runtime > - Paves the way for streaming style ingestion > - Eliminates dependency on Hbase (alternate "global index" support at the > moment) > > -Nishith > >

Re: [DISCUSS] Code freeze date for next release(0.5.2)

2020-02-24 Thread vino yang
Hi Vinoth, >> To the original point, @vinoyang do you need more help with getting all the compliance tickets filed and scoped? Yes, I am going to know more about the compliance problems that block the coming release. I will communicate more closely with Suneel, Balaji, Leesf, and I will seek

Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

2020-02-24 Thread nishith agarwal
+100 - Reduces index lookup time hence improves job runtime - Paves the way for streaming style ingestion - Eliminates dependency on Hbase (alternate "global index" support at the moment) -Nishith On Mon, Feb 24, 2020 at 10:56 AM Vinoth Chandar wrote: > +1 from me as well. This will be a

Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

2020-02-24 Thread Vinoth Chandar
+1 from me as well. This will be a product-defining feature, if we can do it. On Sun, Feb 23, 2020 at 6:27 PM vino yang wrote: > Hi Sivabalan, > > Thanks for your proposal. > > Big +1 from my side, indexing for record granularity is really good for > performance. It is also towards the

Re: Re: [DISCUSS] How to correct the license header of entrypoint.sh script

2020-02-24 Thread Vinoth Chandar
Thank you! On Mon, Feb 24, 2020 at 10:47 AM Suneel Marthi wrote: > https://issues.apache.org/jira/browse/HUDI-580 > > On Mon, Feb 24, 2020 at 1:42 PM Vinoth Chandar wrote: > > > Hi, > > > > Have you filed a JIRA for this, tagged with 0.5.2? > > > > Thanks > > Vinoth > > > > On Sat, Feb 22, 2020

Re: Re: [DISCUSS] How to correct the license header of entrypoint.sh script

2020-02-24 Thread Suneel Marthi
https://issues.apache.org/jira/browse/HUDI-580 On Mon, Feb 24, 2020 at 1:42 PM Vinoth Chandar wrote: > Hi, > > Have you filed a JIRA for this, tagged with 0.5.2? > > Thanks > Vinoth > > On Sat, Feb 22, 2020 at 6:58 AM lamberken wrote: > > > > > > > Right, will do. > > > > > > Thanks, > >

Re: [DISCUSS] Code freeze date for next release(0.5.2)

2020-02-24 Thread Vinoth Chandar
Hi Siva, That's a valid point. Would freezing code and then continuing to improve the RC help? We are just talking about a code freeze date and not a release date. To the original point, @vinoyang do you need more help with getting all the compliance tickets filed and scoped? Thanks Vinoth On

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-24 Thread Pratyaksh Sharma
Hi Vinoth, Have added a few more issues which I faced while adopting Hudi. Please have a look. I guess everyone in the community should make it a habit to add the errors/issues they face to this page. It would be really useful for others also. Further we will have a consolidated page for all

Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

2020-02-23 Thread vino yang
Hi Sivabalan, Thanks for your proposal. Big +1 from my side, indexing for record granularity is really good for performance. It is also a step towards streaming processing. Best, Vino Sivabalan wrote on Sun, Feb 23, 2020 at 12:52 AM: > As Apache Hudi is getting widely adopted, performance has become the need

Re: [DISCUSS] How to correct the license header of entrypoint.sh script

2020-02-22 Thread vbal...@apache.org
+1 on ensuring all scripts in the Hudi codebase follow the same convention for licensing. Balaji.V On Saturday, February 22, 2020, 06:16:29 AM PST, Suneel Marthi wrote: Please go ahead and make the change @lamberken I was just looking at scripts from Hive and Kafka projects, see below.

Re: [DISCUSS] How to correct the license header of entrypoint.sh script

2020-02-22 Thread Suneel Marthi
Please go ahead and make the change @lamberken I was just looking at scripts from Hive and Kafka projects, see below. https://github.com/apache/hive/blob/master/bin/init-hive-dfs.sh https://github.com/apache/hive/blob/master/bin/hive-config.sh

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-21 Thread Vinoth Chandar
Thanks Pratyaksh! Do you have any suggestions on priming this page with many more common issues? On Thu, Feb 20, 2020 at 12:54 AM Pratyaksh Sharma wrote: > Here is the link to troubleshooting guide - > https://cwiki.apache.org/confluence/display/HUDI/Troubleshooting+Guide. > > Suggestions are

Re: Re: Re: [DISCUSS] Relocate spark-avro dependency by maven-shade-plugin

2020-02-20 Thread Vinoth Chandar
If there are no more comments/objections, we could rework the PR based on the discussion here.. Points made by Udit are also pretty valid.. Thanks for the constructive conversation. :) On Wed, Feb 19, 2020 at 3:12 PM lamberken wrote: > > > @Vinoth, glad to see your reply. > > > >>

Re: Re: Re: [DISCUSS] Relocate spark-avro dependency by maven-shade-plugin

2020-02-19 Thread lamberken
@Vinoth, glad to see your reply. >> SchemaConverters does import things like types I checked the git history of package "org.apache.spark.sql.types"; it hasn't changed in a year, which means that Spark does not change types often. >> let's have a flag in maven to skip Good suggestion. bundling

Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread Nicholas
+1 on vinoyang as the release manager +1 on making a shorter 0.5.2 release. On 2020/02/19 00:55:34, leesf wrote: > +1 on vino to be RM, and will help him to release as I can. > > nishith agarwal wrote on Wed, Feb 19, 2020 at 7:28 AM: > > > +1 on minor release focussing on Apache compliance. > > +1 on Vino

Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread leesf
+1 on vino to be RM, and will help him with the release as I can. nishith agarwal wrote on Wed, Feb 19, 2020 at 7:28 AM: > +1 on minor release focussing on Apache compliance. > +1 on Vino yang to be Release Manager. > > -Nishith > > On Tue, Feb 18, 2020 at 11:53 AM vbal...@apache.org > wrote: > > > > > +1 on minor
