Re: Joining 3 tables incrementally

2019-05-07 Thread Jaimin Shah
Hi, thanks for the quick response. As discussed, we will pull changes incrementally and join them with the MOR (merge-on-read) read-optimized view. For example, orders will be pulled incrementally and joined with the read-optimized views of seller and customer. Likewise, sellers will be pulled incrementally and joined with order and customer
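A minimal in-memory sketch of that plan follows. It makes several assumptions:

* Each table is a list of dicts with a hypothetical commit_time column, whose values sort lexicographically (Hudi commit times are timestamp strings).
* Plain lists stand in for the read-optimized views.
* The customer side follows the same pattern as the seller side, by symmetry.

The names orders, seller_id, customer_id and name are illustrative only. A real pipeline would express these as Spark or Hive joins, not Python loops.

```python
from typing import Dict, List

Row = Dict[str, object]


def incremental(table: List[Row], since: str) -> List[Row]:
    """Rows committed strictly after the `since` checkpoint (hypothetical commit_time column)."""
    return [r for r in table if r["commit_time"] > since]


def index_by(table: List[Row], key: str) -> Dict[object, Row]:
    """Stand-in for a read-optimized view: latest row per key."""
    return {r[key]: r for r in table}


def enrich(order: Row, sellers: Dict[object, Row], customers: Dict[object, Row]) -> Row:
    """Denormalize one order with its seller and customer (missing side -> None)."""
    seller = sellers.get(order["seller_id"], {})
    customer = customers.get(order["customer_id"], {})
    return {
        "order_id": order["order_id"],
        "seller_name": seller.get("name"),
        "customer_name": customer.get("name"),
    }


def joined_changes(orders: List[Row], sellers: List[Row], customers: List[Row],
                   since: str) -> Dict[object, Row]:
    seller_view = index_by(sellers, "seller_id")
    customer_view = index_by(customers, "customer_id")
    out: Dict[object, Row] = {}

    # 1. Incrementally pulled orders, joined with the full seller/customer views.
    for order in incremental(orders, since):
        out[order["order_id"]] = enrich(order, seller_view, customer_view)

    # 2. Incrementally pulled sellers (and, by symmetry, customers): re-join every
    #    order that references a changed row, because those output rows are now stale.
    changed_sellers = {s["seller_id"] for s in incremental(sellers, since)}
    changed_customers = {c["customer_id"] for c in incremental(customers, since)}
    for order in orders:
        if order["seller_id"] in changed_sellers or order["customer_id"] in changed_customers:
            out[order["order_id"]] = enrich(order, seller_view, customer_view)

    # Keying by order_id reconciles overlaps: an order reached via several pulls
    # appears once, built from the same current views.
    return out
```

Keying the output by order_id is one simple way to reconcile the overlapping results of the separate pulls. The reconciled rows would then be upserted into the target table.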

Need Invitation to join Slack channel.

2019-05-07 Thread Prateek Mehta
Hi, I am new to the community and would like to learn and contribute more. The shared link shows this message: "Don't have an @shopify.com, @amazon.com, @uber.com, @apache.org, @brane.com, @vungle.com, @qubole.com, or @csscompany.com email address? Contact your Workspace Administrator for an invitation."

Re: Need Invitation to join Slack channel.

2019-05-07 Thread Vinoth Chandar
I did not realize the Slack invite link is limited to these domains. While I fix that, please post your email on the issue at https://github.com/apache/incubator-hudi/issues/143 and I can add you. Thanks for reporting this. :)

Re: Joining 3 tables incrementally

2019-05-07 Thread Vinoth Chandar
Interesting. You captured the pitfalls I was alluding to nicely. IIUC, you are doing multiple incremental pulls, each joined against the other tables, and reconciling the results. It should work.

Re: Data change events table in Hudi

2019-05-07 Thread Vinoth Chandar
Thanks for starting the thread, Minh! We actually do the same thing at Uber. It's handy to join these two at times, and it's a common pattern, so I'm curious to know what others think. A DeltaStreamer option seems like a good idea. Some implementation considerations on how we configure this second table e
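The thread does not spell out the design. The sketch below is only one possible reading of the pattern:

* a snapshot table holding the latest state per key;
* a second, append-only table of every change event, which can be joined back to the snapshot by key.

The ChangeEvent type, its op codes and the function names are hypothetical. They are not a DeltaStreamer API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    key: str
    op: str                  # hypothetical op codes: "upsert" or "delete"
    value: Optional[dict]
    commit_time: str


def ingest(batch: List[ChangeEvent], snapshot: Dict[str, dict],
           events_log: List[ChangeEvent]) -> None:
    """Write one batch to both tables: append every event, upsert/delete the snapshot."""
    for ev in sorted(batch, key=lambda e: e.commit_time):
        events_log.append(ev)            # change-events table: full history, append-only
        if ev.op == "delete":
            snapshot.pop(ev.key, None)   # snapshot table: latest state only
        else:
            snapshot[ev.key] = ev.value


def history_with_current(key: str, snapshot: Dict[str, dict],
                         events_log: List[ChangeEvent]) -> dict:
    """Join the two tables on key: current state plus every change that led to it."""
    return {
        "current": snapshot.get(key),
        "history": [e for e in events_log if e.key == key],
    }
```

Both tables are written from the same batch, so they stay consistent as of each commit. That is presumably what makes a single DeltaStreamer option for the second table attractive.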

Re: [DISCUSS] Steps to making the first Apache release

2019-05-07 Thread Thomas Weise
Hi, I would recommend renaming packages before the first ASF release. There is a lot more work:
* license headers in all files
* check dependencies: https://www.apache.org/legal/resolved.html
* ensure no binaries are in source releases
* etc.
You could also discuss wh
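Part of that checklist can be automated. Below is a rough sketch that flags two kinds of problems:

* source files without an ASF license header;
* files that look binary.

The marker string, the list of suffixes and the NUL-byte heuristic are all assumptions. ASF projects commonly use Apache RAT (Release Audit Tool) for the header check. The dependency review against the resolved-licenses page remains a manual step.

```python
import os
from pathlib import Path
from typing import List, Tuple

# Heuristic marker: opening words of the standard ASF source header.
ASF_MARKER = "Licensed to the Apache Software Foundation"
# Assumed set of file types that must carry a header.
SOURCE_SUFFIXES = {".java", ".scala", ".py", ".sh", ".xml", ".properties"}
# Build/VCS directories that are not part of a source release.
SKIP_DIRS = {".git", "target"}


def is_binary(path: Path, probe: int = 8192) -> bool:
    """Rough check: treat a file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(probe)


def audit(root: str) -> Tuple[List[Path], List[Path]]:
    """Return (files missing a license header, files that look binary)."""
    missing_header: List[Path] = []
    binaries: List[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]  # prune in place
        for name in filenames:
            path = Path(dirpath) / name
            if is_binary(path):
                binaries.append(path)
            elif path.suffix in SOURCE_SUFFIXES:
                head = path.read_text(encoding="utf-8", errors="replace")[:2048]
                if ASF_MARKER not in head:
                    missing_header.append(path)
    return missing_header, binaries
```

Calling audit(".") from the root of an unpacked source release and treating any non-empty result as a blocker gives a quick pre-vote sanity check.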

Re: Last commit id/ts checkpoint for incremental pull

2019-05-07 Thread Vinoth Chandar
Hi Roshan, thanks for writing. Yes, the user needs to manage the _commit_time watermark on the HiveIncrementalPuller path. You also need to set the table to incremental mode, providing a start commit_time and the max_commits to pull, as documented. The DeltaStreamer tool will manage it for you automatically
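For the HiveIncrementalPuller path, the bookkeeping the user owns looks roughly like this:

1. Read the last processed commit time.
2. Pull at most max_commits newer commits.
3. Process them.
4. Only then advance the watermark.

This is a sketch under stated assumptions. The timeline is modeled as a list of fixed-width commit-time strings that sort lexicographically. The checkpoint is a local JSON file. pull_once, read_watermark and write_watermark are hypothetical helpers, not HiveIncrementalPuller options.

```python
import json
from pathlib import Path
from typing import Callable, List, Optional


def read_watermark(path: Path) -> Optional[str]:
    """Last commit time that was fully processed, or None on the very first run."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["last_commit_time"]


def write_watermark(path: Path, commit_time: str) -> None:
    """Write to a temp file, then rename; on POSIX this avoids a half-written checkpoint."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"last_commit_time": commit_time}))
    tmp.replace(path)


def pull_once(timeline: List[str], checkpoint: Path, max_commits: int,
              process: Callable[[List[str]], None]) -> List[str]:
    """Pull at most max_commits commits newer than the watermark, then advance it."""
    start = read_watermark(checkpoint)
    pending = sorted(c for c in timeline if start is None or c > start)[:max_commits]
    if not pending:
        return []
    process(pending)                            # e.g. run the incremental query for these commits
    write_watermark(checkpoint, pending[-1])    # advance only after processing succeeded
    return pending
```

The watermark advances only after process returns, which gives at-least-once behavior. If the job dies mid-batch, the same commits are pulled again on the next run.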

Re: About github issue 639

2019-05-07 Thread Jun Zhu
Hi, I ran the new code pulled from the master branch and compared it with another streaming job running Hudi 0.4.5 from Maven. Both run every 10 minutes. The rollback worked. Top is 0.4.5, bottom is 0.4.6. [image: Screen Shot 2019-05-08 at 1.06.17 PM.png] [image: Screen Shot 2019-05-08 at 1.06.54 PM.png] And