Re: [DISCUSS] Write failed records

2020-05-24 Thread Vinoth Chandar
Hi Raymond, Thanks for starting this discussion. Agree on 1 (we may also need some CLI support for inspecting bad records, and code samples to consume them, etc.?). On 2, these places seem appropriate. We can figure it out in more detail when we get to implementation? On 3, +1 on logs. We sh

Re: [jira] [Updated] (HUDI-927) https://hudi.incubator.apache.org should auto redirect to https://hudi.apache.org

2020-05-24 Thread Suneel Marthi
This was fixed by INFRA yesterday - are you sure it's not happening? On Sun, May 24, 2020 at 5:06 AM Vinoth Chandar (Jira) wrote: > > [ > https://issues.apache.org/jira/browse/HUDI-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > ] > > Vinoth Chandar updated HUDI-927: > --

Re: hudi dependency conflicts for test

2020-05-24 Thread Vinoth Chandar
Great teamwork, everyone! Anything worth documenting here? https://cwiki.apache.org/confluence/display/HUDI/Troubleshooting+Guide On Thu, May 21, 2020 at 11:02 PM Lian Jiang wrote: > The root cause is that I need to use Java 8 instead of the default Java 11 > in IntelliJ. Thanks everyone for he

Hudi Global Bloom Index Issue

2020-05-24 Thread Dubey, Raghu
Hi Team, I used DeltaStreamer to ingest data and performed a test where the partition column changes. When the partition column in my dataset was updated, my Hive query on the Hudi dataset returned 2 rows for the same recordKey. This was expected, and I got the explanation in this issue. https://gi

[ANNOUNCE] Hudi Community Weekly Update (2020-05-17 ~ 2020-05-24)

2020-05-24 Thread leesf
Dear community, Happy to share the Hudi community weekly update for 2020-05-17 ~ 2020-05-24, with updates on features, discussions, bug fixes and tests. === Discussion [Writer Core] A discussion about supporting the log append scenario with better write and asynchronous co

Re: Hudi Global Bloom Index Issue

2020-05-24 Thread Sivabalan
Hi Raghu, Hudi has a property named "hoodie.bloom.index.update.partition.path". You might want to try setting this to true if you need the behavior you are expecting. Here are the docs for this config. The default value for this config param is false. /** * Only applies if index type is GLOBAL
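For readers following this thread, the suggestion above can be sketched as a set of write options. This is only a minimal illustration, not the poster's actual setup: the helper name global_bloom_options is hypothetical, and pairing the property with "hoodie.index.type" set to GLOBAL_BLOOM is an assumption based on the note that the setting only applies to the global index type.

```python
def global_bloom_options(update_partition_path: bool = True) -> dict:
    """Build a dict of Hudi write options (hypothetical helper, for illustration only)."""
    return {
        # Assumption: the property below only takes effect with a global bloom index.
        "hoodie.index.type": "GLOBAL_BLOOM",
        # The property named in the reply; the message states it defaults to false.
        "hoodie.bloom.index.update.partition.path": str(update_partition_path).lower(),
    }


if __name__ == "__main__":
    # Print the options in key=value form, e.g. for pasting into a properties file.
    for key, value in sorted(global_bloom_options().items()):
        print(f"{key}={value}")
```

The resulting dict could be passed to a Spark writer's options or written to the properties file used by DeltaStreamer; which of these fits depends on how the ingestion job is configured.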

Re: [Discussion] hudi support log append scenario with better write and asynchronous compaction

2020-05-24 Thread Vinoth Chandar
Thank you for your patience... This was a thought-provoking RFC. I think we can solve an even more generalized problem here: data clustering (which we support in limited form for only bulk_insert today). Please read my comment here.. https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+hudi