Re: Weekly sync notes 20201225

2020-02-25 Thread Shiyan Xu
link https://cwiki.apache.org/confluence/display/HUDI/20200225+Weekly+Sync+Minutes On Tue, Feb 25, 2020 at 9:39 PM vbal...@apache.org wrote: > Please find the weekly sync notes here > 20200225 Weekly Sync Minutes - HUDI - Apache Software Foundation > > Thanks, Balaji.V

Weekly sync notes 20201225

2020-02-25 Thread vbal...@apache.org
Please find the weekly sync notes here: 20200225 Weekly Sync Minutes - HUDI - Apache Software Foundation Thanks, Balaji.V

Re: Updating COW Table

2020-02-25 Thread leesf
You would pass it via option, like option(DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY(), EmptyHoodieRecordPayload.class.getName()) On Wed, Feb 26, 2020 at 2:24 AM selvaraj periyasamy wrote: > OverwriteWithLatestAvroPayload is used for Delta Streamer. Is there a way > for DataSource Writer? > > please correct
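
A minimal sketch of the suggestion above, written without any Hudi or Spark dependency so it runs on its own. It only builds the option map that would be handed to the DataSource writer. Two things here are assumptions, not taken from the thread: the literal key string "hoodie.datasource.write.payload.class" (assumed to be what DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY() resolves to) and the payload class name com.example.MyMergingPayload, which is purely hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PayloadOptionSketch {
    // Assumption: the string behind DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY().
    // In real code, prefer the constant from the Hudi version you are using.
    static final String PAYLOAD_CLASS_KEY = "hoodie.datasource.write.payload.class";

    // Builds the writer options that select a payload class by its fully qualified name.
    static Map<String, String> writerOptions(String payloadClassName) {
        if (payloadClassName == null || payloadClassName.isEmpty()) {
            throw new IllegalArgumentException("payload class name must not be empty");
        }
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put(PAYLOAD_CLASS_KEY, payloadClassName);
        return opts;
    }

    public static void main(String[] args) {
        // Hypothetical payload class that would hold the custom merge logic.
        Map<String, String> opts = writerOptions("com.example.MyMergingPayload");
        opts.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With Spark on the classpath, a map like this would typically be passed through the writer's options(...) call, or the single pair through option(key, value) as in the reply above.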

Re: Updating COW Table

2020-02-25 Thread selvaraj periyasamy
OverwriteWithLatestAvroPayload is used for Delta Streamer. Is there a way for DataSource Writer? Please correct me if I am wrong. Thanks, Selva On Mon, Feb 24, 2020 at 1:15 PM Gary Li wrote: > Hi, in this case you need to design your own logic to handle merging. > Please check

Re: HudiDeltaStreamer on EMR

2020-02-25 Thread Raghvendra Dhar Dubey
Got it Shiyan, Thanks. On 2020/02/24 19:15:52, Shiyan Xu wrote: > It's likely that the source parquet data has a column of Spark Timestamp > type, which is not convertible to avro. > By the way, ParquetDFSSource is not available in 0.5.0. Only added in > 0.5.1. You'll probably need to add a

Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

2020-02-25 Thread Balaji Varadarajan
+1. Let's do it :) Balaji.V On Mon, Feb 24, 2020 at 6:36 PM Shiyan Xu wrote: > +1 great reading and values! > > On Mon, 24 Feb 2020, 15:31 nishith agarwal, wrote: > > > +100 > > - Reduces index lookup time hence improves job runtime > > - Paves the way for streaming style ingestion > > -

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

2020-02-25 Thread Balaji Varadarajan
See if you can have a generic implementation where individual fields in the partition-path can be configured with their own key-generator class. Currently, TimestampBasedKeyGenerator is the only type specific custom generator. If we are anticipating more such classes for specialized types,

Weekly sync on Zoom

2020-02-25 Thread Vinoth Chandar
Hello all, Given the woes we face with Hangouts now and then, we are going to try Zoom for today's meeting. Details here: https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Community+Weekly+Sync Thanks, Vinoth

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-25 Thread Pratyaksh Sharma
Sure, will take a look whenever I get time. On Tue, Feb 25, 2020 at 12:24 PM Vinoth Chandar wrote: > +1 > > Thanks Pratyaksh! Will take a look and structure it accordingly. Might be > worth looking at last 30 days of slack, mailing lists and see if we can add > more.. > > > On Mon, Feb 24, 2020

Re: [DISCUSS] Consider defaultValue of field when writing to Hudi dataset

2020-02-25 Thread Pratyaksh Sharma
Hi Vinoth, > in avro you define it as an optional field (union of type and null).. Yes that is correct. But imagine if someone does not want to populate null, rather he wants to populate default values for the field, which is a very common case. > seems like it's being copied over? When creating