Re: Could Hudi Data lake support low latency, high throughput random reads?

2021-06-27 Thread Jialun Liu
n > how much of your target dynamoDB table changes between loads, it can save > you cost and time. > > On Sat, Jun 26, 2021 at 5:43 PM Jialun Liu wrote: > > > Hey Vinoth, > > > > Thanks for your reply! > > > > I am actually looking into a differen

Re: Could Hudi Data lake support low latency, high throughput random reads?

2021-06-26 Thread Jialun Liu
norm here. :) > There is probably some work to do here for scaling it for large amounts of > data. > > Hope that helps. > > Thanks > Vinoth > > On Mon, Jun 7, 2021 at 4:04 PM Jialun Liu wrote: > > > Hey Gary, > > > > Thanks for your reply! > > > > T

Re: Could Hudi Data lake support low latency, high throughput random reads?

2021-06-07 Thread Jialun Liu
st, > Gary > > On Sun, Jun 6, 2021 at 12:29 PM Jialun Liu wrote: > > > Hey Felix, > > > > Thanks for your reply! > > > > I briefly researched Presto; it looks like it is designed to support high-concurrency big data SQL queries.

Re: Could Hudi Data lake support low latency, high throughput random reads?

2021-06-05 Thread Jialun Liu
at 1:33 PM Kizhakkel Jose, Felix wrote: > Hi Bill, > > Did you try using Presto (from EMR) to query HUDI tables on S3, and it > could support real time queries. And you have to partition your data > properly to minimize the amount of data each query has to scan/process. > > Regards

Could Hudi Data lake support low latency, high throughput random reads?

2021-06-05 Thread Jialun Liu
Hey guys, I am not sure if this is the right forum for this question; if you know where it should be directed, I would appreciate your help! The question is: "Could Hudi Data lake support low latency, high throughput random reads?" I am considering building a data lake that produces

Re: Apache Hudi Data Reconciliation

2020-09-13 Thread Jialun Liu
Balaji.V > > > > > > On Thursday, September 10, 2020, 11:05:44 AM PDT, Jialun Liu < > liujialu...@gmail.com> wrote: > > Hey Gary, > > Thanks for replying so quickly! > > Could you please point me to the documentation of this feature? I would > love to

Re: Apache Hudi Data Reconciliation

2020-09-10 Thread Jialun Liu
e your own payload > class to handle precombine(dedup within delta) and > updateHistoryRecord(delta merge with history). The default payload is > updateWithLatestRecord. > > Gary Li > ________ > From: Jialun Liu > Sent: Thursday, September

Apache Hudi Data Reconciliation

2020-09-09 Thread Jialun Liu
Hey guys, I want to confirm whether Apache Hudi can handle data reconciliation for use cases such as late records, out-of-order records, retries, etc. A simple example:
@11:00 RecordA, updatedAt = 11:00 (failed to update)
@11:30 RecordA, updatedAt = 11:30 (success)
@12:00 (Retry the
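
The reconciliation idea raised in this thread (dedup within the incoming delta, then merge the delta with history so that the latest update wins) can be illustrated with a minimal sketch. This is plain Python, not Hudi's payload API: `Record`, `precombine`, and `merge_with_history` are hypothetical names chosen for illustration, `updated_at` is an assumed ordering field, and because the original example is cut off, the late retry of the 11:00 update shown here is an assumption about how that scenario continues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    updated_at: str  # "HH:MM" string; zero-padded, so it compares correctly as text
    value: str


def precombine(batch):
    """Dedup within one incoming batch: keep the newest record per key."""
    latest = {}
    for rec in batch:
        cur = latest.get(rec.key)
        if cur is None or rec.updated_at >= cur.updated_at:
            latest[rec.key] = rec
    return latest


def merge_with_history(history, batch):
    """Merge a deduplicated batch into stored history; older updates are ignored."""
    merged = dict(history)
    for key, rec in precombine(batch).items():
        cur = merged.get(key)
        if cur is None or rec.updated_at >= cur.updated_at:
            merged[key] = rec
    return merged


# The 11:30 update succeeds first.
history = merge_with_history({}, [Record("RecordA", "11:30", "second write")])

# Assumed scenario: the failed 11:00 update is retried later and arrives out of order.
history = merge_with_history(history, [Record("RecordA", "11:00", "retried first write")])

print(history["RecordA"].value)  # prints "second write"
```

The design choice here is to order by the record's own `updated_at` rather than by arrival time, so a late retry cannot overwrite a newer value.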