Re: How to use HoodieDeltaStreamer for upsert on JsonDFSSource

2019-04-24 Thread Jack Wang
Got it, thanks very much for the detailed info. On Thu, Apr 25, 2019 at 9:17 AM Vinoth Chandar wrote: > Ah, right. Swapping it out with --source-class com.uber.hoodie.utilities.sources.JsonDFSSource and specifying the following config in the property file should work: >
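For reference, here is a minimal Python sketch of what such a setup might look like. It renders a property file and assembles a spark-submit command for an UPSERT run with JsonDFSSource. The class names, the flags (--storage-type, --source-ordering-field, --props, --schemaprovider-class, --op) and the property keys are recalled from the Hudi 0.4.x docker demo and are assumptions here, so please check them against --help for your release. All paths, the record key _row_key, the partition field driver, and the ordering field ts are hypothetical placeholders.

```python
import shlex

# Assumed class names (com.uber.hoodie era, modeled on the docker demo); verify for your version.
DELTASTREAMER_CLASS = "com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer"
JSON_DFS_SOURCE = "com.uber.hoodie.utilities.sources.JsonDFSSource"
FILE_SCHEMA_PROVIDER = "com.uber.hoodie.utilities.schema.FilebasedSchemaProvider"


def build_props(dfs_root, schema_file, record_key, partition_field):
    """Render a minimal property file for a JSON-on-DFS source (assumed keys, placeholder values)."""
    props = {
        # Field that uniquely identifies a record; upserts match on it.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to derive the partition path.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Avro schema describing the incoming JSON and the target table.
        "hoodie.deltastreamer.schemaprovider.source.schema.file": schema_file,
        "hoodie.deltastreamer.schemaprovider.target.schema.file": schema_file,
        # Directory that JsonDFSSource reads new files from.
        "hoodie.deltastreamer.source.dfs.root": dfs_root,
    }
    return "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"


def build_submit_args(bundle_jar, props_path, base_path, table, ordering_field):
    """Assemble spark-submit arguments for an UPSERT run (assumed flags, placeholder paths)."""
    return [
        "spark-submit",
        "--class", DELTASTREAMER_CLASS,
        bundle_jar,
        "--storage-type", "COPY_ON_WRITE",
        "--source-class", JSON_DFS_SOURCE,
        "--source-ordering-field", ordering_field,
        "--target-base-path", base_path,
        "--target-table", table,
        "--props", props_path,
        "--schemaprovider-class", FILE_SCHEMA_PROVIDER,
        "--op", "UPSERT",
    ]


if __name__ == "__main__":
    print(build_props("file:///tmp/json-input", "file:///tmp/source.avsc", "_row_key", "driver"))
    args = build_submit_args("hoodie-utilities-bundle.jar", "/tmp/dfs-source.properties",
                             "/tmp/hudi/trips", "trips", "ts")
    print(shlex.join(args))
```

The ordering field is what the delta streamer uses to break ties between incoming records that share a key, so it has to be present in every JSON record.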

Re: How to use HoodieDeltaStreamer for upsert on JsonDFSSource

2019-04-24 Thread Vinoth Chandar
The demo at https://hudi.apache.org/docker_demo.html actually exercises this code path. Is that helpful? Balaji, please correct me if I am wrong. Thanks, Vinoth On Wed, Apr 24, 2019 at 4:07 AM Jack Wang wrote: > Hi folks, > > Does anyone know how to use HoodieDeltaStreamer for upsert on >

Re: Hudi CLI doesn't work for dedup

2019-04-24 Thread Vinoth Chandar
Hi Jack, I am trying to understand what your goal is. DeDupeSparkJob is used for repairing datasets that contain duplicate records. Is that your intention? The mailing list does not show images. :( Can you please post the code here or in a gist? I can take a look. Thanks, Vinoth On Wed, Apr 24, 2019 at 3:57 AM Jack Wang
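Before running a repair, it can help to confirm that a partition really contains duplicates. The Python sketch below is a generic check, not DeDupeSparkJob's logic: it assumes you have already read (record key, file name) pairs for one partition, for example from Hudi's _hoodie_record_key and _hoodie_file_name metadata columns, and reports every key stored more than once.

```python
from collections import defaultdict


def find_duplicate_keys(rows):
    """Return {record_key: sorted file names} for keys that occur more than once.

    rows: iterable of (record_key, file_name) pairs from a single partition.
    A key repeated inside one file is reported too (its file name appears twice).
    """
    files_by_key = defaultdict(list)
    for record_key, file_name in rows:
        files_by_key[record_key].append(file_name)
    return {key: sorted(files) for key, files in files_by_key.items() if len(files) > 1}


if __name__ == "__main__":
    sample = [
        ("k1", "f1.parquet"),
        ("k2", "f1.parquet"),
        ("k1", "f2.parquet"),
    ]
    print(find_duplicate_keys(sample))
```

An empty result means no key repeats in that partition, in which case a dedup repair should have nothing to do there.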

Re: Hudi support for records deduplication

2019-04-24 Thread Vinoth Chandar
Hi Li, welcome! Both the delta streamer and the data source support an option to de-duplicate data before inserting. How are you planning to write the Hudi dataset? I can point you in the right direction accordingly. Thanks, Vinoth On Tue, Apr 23, 2019 at 4:12 PM Li Gao wrote: > Hi Hudi
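To illustrate the idea of de-duplicating a batch before it is written, here is a small Python sketch that keeps one record per key, choosing the record with the largest value in an ordering field. This is a conceptual illustration only, not Hudi's implementation; the field names id and ts are hypothetical.

```python
def dedupe_latest(records, key_field, ordering_field):
    """Keep one record per key: the one with the largest ordering value.

    On ties the record seen last wins. Output follows the order in which
    each key was first seen.
    """
    latest = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        if current is None or rec[ordering_field] >= current[ordering_field]:
            latest[key] = rec
    return list(latest.values())


if __name__ == "__main__":
    batch = [
        {"id": "a", "ts": 1, "v": "x"},
        {"id": "b", "ts": 5, "v": "y"},
        {"id": "a", "ts": 3, "v": "z"},
    ]
    for rec in dedupe_latest(batch, "id", "ts"):
        print(rec)
```

Picking by an ordering field rather than by arrival order keeps the result deterministic even when the input batch is shuffled.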

Re: About hive table column

2019-04-24 Thread Vinoth Chandar
Hi Jun, we assign a seq_no to each record upserted in each commit. The use cases we had (and still have) for this are building windowing and incremental consumption at the record level, rather than at the commit level as we do now. Hope that helps. Thanks, Vinoth On Tue, Apr 23, 2019 at 8:24 PM Jun Zhu wrote: >
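As a rough sketch of what record-level incremental consumption could look like, the Python below checkpoints on a sequence number instead of a commit time. It assumes the column is Hudi's _hoodie_commit_seqno metadata field with values of the form commitTime_partitionId_recordIndex (for example 20190424093000_0_12); treat both the column name and the format as assumptions. Note that ordering records from different task partitions of the same commit is a choice made by this sketch, not a global write order.

```python
def parse_seqno(seqno):
    """Split 'commitTime_partitionId_recordIndex' into a sortable tuple (assumed format).

    Commit times are fixed-width timestamps (yyyyMMddHHmmss), so comparing them
    as strings orders them chronologically.
    """
    commit_time, partition_id, record_index = seqno.rsplit("_", 2)
    return commit_time, int(partition_id), int(record_index)


def records_after(rows, checkpoint):
    """Return rows whose seqno sorts strictly after the checkpoint, in seqno order.

    rows: iterable of dicts holding a '_hoodie_commit_seqno' value.
    """
    cp = parse_seqno(checkpoint)
    newer = [r for r in rows if parse_seqno(r["_hoodie_commit_seqno"]) > cp]
    return sorted(newer, key=lambda r: parse_seqno(r["_hoodie_commit_seqno"]))


if __name__ == "__main__":
    table = [
        {"_hoodie_commit_seqno": "20190424093000_0_2"},
        {"_hoodie_commit_seqno": "20190423120000_0_5"},
        {"_hoodie_commit_seqno": "20190424093000_0_1"},
    ]
    for row in records_after(table, "20190424093000_0_1"):
        print(row["_hoodie_commit_seqno"])
```

A consumer would persist the last seqno it processed and pass it as the next checkpoint, which lets it resume partway through a commit rather than only at commit boundaries.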

How to use HoodieDeltaStreamer for upsert on JsonDFSSource

2019-04-24 Thread Jack Wang
Hi folks, Does anyone know how to use HoodieDeltaStreamer for upsert on JsonDFSSource? I would highly appreciate it if you could provide a demo of that. Thanks and regards, Jack -- *Jianbin Wang* Sr. Engineer II, Data