Hi,

Can someone help me understand this?

Thanks,
Sid

On Mon, 21 Feb 2022, 13:02 Sid Kal, <[email protected]> wrote:

> Hi Danny,
>
> Thank you for your response.
>
> The file is actually used by the user who wrote that blog. In my actual
> dataset, I have a schema like
> customerid,customername,effective_date,customer_mob. In this case, how will
> Hudi manage CDC?
>
> Thanks,
> Sid
>
> On Mon, Feb 21, 2022 at 8:06 AM Danny Chan <[email protected]> wrote:
>
>> Hello, what is the schema of the reading file: S3_INCR_RAW_DATA ?
>>
>> Best,
>> Danny
>>
>> Sid Kal <[email protected]> 于2022年2月21日周一 03:49写道:
>> >
>> > We have a use case for which we were planning to use Hudi tables for
>> CDC purposes. Basically, my intention is to perform upserts along with
>> deletes. So, if a record is deleted in my source system, it should be
>> deleted from my target as well.
>> >
>> > I went through this link where a user is performing CDC using Hudi.
>> >
>> https://towardsdatascience.com/data-lake-change-data-capture-cdc-using-apache-hudi-on-amazon-emr-part-2-process-65e4662d7b4b
>> >
>> > My question is: how does Hudi internally recognize the records in the
>> incremental data load? How should the incremental file be structured so
>> that we can tell which records are meant to be appended, updated, or
>> deleted?
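For context, Hudi matches incoming records to existing ones by a configured record key (`hoodie.datasource.write.recordkey.field`) and resolves duplicate keys with a precombine field; deletes can be signalled with a boolean `_hoodie_is_deleted` column on the incoming records. A minimal pure-Python sketch of that key-based merge idea follows; the field names are taken from the customer schema mentioned earlier in this thread, and this is an illustration of the concept only, not Hudi's actual implementation (which operates on indexed file groups, not an in-memory dict):

```python
# Sketch of key-based upsert/delete merging, simplified for illustration.
# Records are dicts; the "target table" is a dict keyed by the record key.

def apply_changes(target, changes, key_field="customerid",
                  precombine_field="effective_date",
                  delete_flag="_hoodie_is_deleted"):
    """Merge a batch of change records into target, keyed by key_field.

    - A record flagged as deleted removes the matching key from target.
    - Otherwise the record is inserted, or replaces the existing one if its
      precombine value is at least as new (latest-wins, like Hudi's
      precombine semantics).
    """
    for rec in changes:
        k = rec[key_field]
        if rec.get(delete_flag):
            target.pop(k, None)  # hard delete: drop the row if present
        elif (k not in target
              or rec[precombine_field] >= target[k][precombine_field]):
            target[k] = rec      # insert, or update with the newer version
    return target
```

Applying two batches shows the upsert and the delete path: an update with a newer `effective_date` replaces the old row, and a record carrying `_hoodie_is_deleted=True` removes its key from the target.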
>> >
>> > I am actually confused with this part:
>> >
>> > S3_INCR_RAW_DATA = "s3://aws-analytics-course/raw/dms/fossil/coal_prod/20200808-*.csv"
>> > df_coal_prod_incr = spark.read.csv(S3_INCR_RAW_DATA, header=False, schema=coal_prod_schema)
>> > df_coal_prod_incr_u_i = df_coal_prod_incr.filter("Mode IN ('U', 'I')")
>> >
>> > The user is filtering directly on Mode. Is "Mode" a column inside the
>> dataset, or where does it come from?
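If the file was produced by AWS DMS (as the `raw/dms/` part of the S3 path suggests), the leading field of each CDC row is an operation code (I = insert, U = update, D = delete) added by the replication tool itself, not a column of the source table; the blog's `coal_prod_schema` evidently names that first field `Mode`. A small pure-Python sketch of what such a change file might look like and how the `Mode IN ('U', 'I')` filter splits it (the sample rows below are made up for illustration):

```python
import csv
import io

# Hypothetical DMS-style change file: the first column is the operation
# code (I/U/D) stamped by the CDC tool, followed by the source columns.
raw = """I,1,Alice,2022-01-01
U,1,Alicia,2022-02-01
D,2,Bob,2022-01-15
"""

rows = list(csv.reader(io.StringIO(raw)))

# Equivalent of filter("Mode IN ('U', 'I')"): rows to upsert into Hudi.
upserts = [r for r in rows if r[0] in ("U", "I")]

# Rows to issue as deletes against the Hudi table.
deletes = [r for r in rows if r[0] == "D"]
```

So "Mode" is not part of your `customerid,customername,effective_date,customer_mob` source schema; it is prepended to each row by the CDC extraction step, and the schema you pass to `spark.read.csv` has to include it as the first field.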
>> >
>> > I am a newbie to Hudi.
>> >
>> > Thanks,
>> > Sid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
