Hudi support for records deduplication

2019-04-23 Thread Li Gao
Hi Hudi community, I am fairly new to the Hudi community and am trying to evaluate whether Hudi's incremental compaction supports record deduplication for landing data coming off Kafka. If so, I would like to understand how it currently works. Thank you, Li

Re: [IMP] Understanding present state and planning ahead

2019-04-23 Thread Vinoth Chandar
Bumping this thread again. We have received 9 responses so far, with 5 production use-cases. One more humble request: if you are using it in production, could you add a small line here describing the use-case? https://github.com/apache/incubator-hudi/blob/asf-site/docs/powered_by.md It would go a long

Re: Not able to find HoodieJavaApp

2019-04-23 Thread Vinoth Chandar
Hi Umesh, I took a pass. Moving HoodieTestDataGenerator into src/java is not a good idea. However, I have written up a simple demo app using the stock data that we already use in our dockerized demo: https://github.com/vinothchandar/incubator-hudi/tree/quickstart Once you grab the code, build it