Re: Spark v2.3.2 : Duplicate entries found for each primary Key

2019-11-19 Thread Pratyaksh Sharma
Thank you for the explanation Kabeer/Sudha. Let me go through the flow and revert back in case of any further queries. On Wed, Nov 20, 2019 at 6:21 AM Kabeer Ahmed wrote: > Pratyaksh, > > +1 to what Sudha has written. Let's zoom a bit closer. > For Hive, as you said, we explicitly set input

Re: Small clarification in Hoodie Cleaner flow

2019-11-19 Thread Pratyaksh Sharma
Thank you for the clarification Balaji. Now I understand it properly. :) On Tue, Nov 19, 2019 at 11:17 PM Balaji Varadarajan wrote: > I updated the FAQ section to set defaults correctly and add more > information related to this : > >

Re: 20191119 Weekly Meeting

2019-11-19 Thread nishith agarwal
Thanks Bhavani! -Nishith On Tue, Nov 19, 2019 at 10:26 PM Bhavani Sudha wrote: > Please find the meeting summary here - > https://cwiki.apache.org/confluence/x/OxYZC > > Thanks, > Sudha > > On Tue, Nov 19, 2019 at 9:06 PM Vinoth Chandar wrote: > > > Hangout link here > >

Re: 20191119 Weekly Meeting

2019-11-19 Thread Bhavani Sudha
Please find the meeting summary here - https://cwiki.apache.org/confluence/x/OxYZC Thanks, Sudha On Tue, Nov 19, 2019 at 9:06 PM Vinoth Chandar wrote: > Hangout link here > >

20191119 Weekly Meeting

2019-11-19 Thread Vinoth Chandar
Hangout link here

Re: [DISCUSS] Introduce stricter comment and code style validation rules

2019-11-19 Thread Y Ethan Guo
+1 on all of the proposed rules. These will also make the javadoc more readable. On Mon, Nov 18, 2019 at 5:55 PM Vinoth Chandar wrote: > +1 on all three. > > Would there be an overhaul of existing code to add comments to all classes? > We are pretty reasonable already, but good to get this in

Re: Spark v2.3.2 : Duplicate entries found for each primary Key

2019-11-19 Thread Kabeer Ahmed
Pratyaksh, +1 to what Sudha has written. Let's zoom a bit closer. For Hive, as you said, we explicitly set input format to HoodieParquetInputFormat. - HoodieParquetInputFormat extends MapredParquetInputFormat, which is nothing but an input format for Hive. Hive and Presto depend on this file to

Re: Small clarification in Hoodie Cleaner flow

2019-11-19 Thread Balaji Varadarajan
I updated the FAQ section to set defaults correctly and add more information related to this : https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-WhatdoestheHudicleanerdo The cleaner retention configuration is based on counts (number of commits to be retained) with the assumption that users

Re: Apache project maturity model

2019-11-19 Thread Vinoth Chandar
Thanks Thomas! Will read it over and file tickets against the "release" component. On Sat, Nov 16, 2019 at 1:57 PM Thomas Weise wrote: > Hi, > > The maturity model is an (optional) framework for evaluating the project. I > would recommend to take a look and check if there are focus areas for

Small clarification in Hoodie Cleaner flow

2019-11-19 Thread Pratyaksh Sharma
Hi, We are assuming the following in the getDeletePaths() method in the cleaner flow in case of the KEEP_LATEST_COMMITS policy - /** * Selects the versions for file for cleaning, such that it * * - Leaves the latest version of the file untouched - For older versions, - It leaves all the commits untouched