Dear Danny Chan,
Thanks four your help again.
1. I think i didn’t make myself clear, so i drew a picture to help to express
it.
- Things begin at database and CDC / OGG stream.
- Flink handle CDC / OGG stream and do same
actions(INSERT,UPDATE,DELETE) on hudi sync table(Table A,B in picture)
- There are some other applications( flink streaming, flink batch, or spark)
downstream based on hudi sync table and produce new table(Table C in picture)
of none primary keys.
- New DownStreams based on Table A,B and new Table C;
- but i cannot detect deletion event on hudi table. This is described detailly
in the picture “Problem".
2. I didn’t understand your answer.
- "the flink writer would nullify the payload instant so that the write handle
would recognize these DELETEs and do HARD delete: do not write anything in the
file.”
do you mean that it is not Hudi Storage keep null value
but HudiFlinkWriter changes data to null for DELETION events and hudi write
handler recognize these events nullified payload are DELETIONs and then write
nothing in the file?
Best Regards!
原始邮件
发件人: Danny Chan<[email protected]>
收件人: users<[email protected]>
发送时间: 2021年7月12日(周一) 19:58
主题: Re: hudi hard deletion in flink sql AND detect deletions of upstream
Hi vtygoss ~
By default, when consuming cdc stream DELETEs, the flink writer would nullify
the payload instant so that the write handle would recognize these DELETEs and
do HARD delete: do not write anything in the file.
If you want to detect DELETEs downstream, you may need to wait for the
HUDI-1771, which would keep DELETEs with proper change flags.
Best,
Danny Chan
vtygoss <[email protected]> 于2021年7月12日周一 下午7:38写道:
Hi,
I have two problems:
1. How to specify hard deletion in hudi-flink-bundle-0.9.0?
2. How to detect the deletion events in downstream hudi-flink sql streaming?
The down streams need to detect the deletions of input hudi table and act
accordingly.
I tried to use org.apache.hudi.common.model. EmptyHoodieRecordPayload, but it
seems like that EmptyHoodieRecordPayload is not really deletion but emits null
value of none primary key? i am not sure. BTW, klass EmptyHoodieRecordPayload
is lack of a constructor of parameter klass
“org.apache.hudi.common.util.Option".
please offer some advices, thank you very much!
Best Regards!
```
CREATE TABLE t3(
uuid VARCHAR(20),
name VARCHAR(10),
age INT,
ts TIMESTAMP(3),
`partition` VARCHAR(20),
primary key(uuid) not enforced
)
PARTITIONED BY (`partition`)
WITH (
'connector' = 'hudi',
'path' = 'hdfs://bruneihealth/user/data/db/hudi_flink/t3',
'table.type' = 'MERGE_ON_READ',
'read.tasks' = '1',
'read.streaming.enabled' = 'true',
'read.streaming.check-interval' = '1',
'hoodie.datasource.write.partitionpath.field'='_hoodie_partition_path',
'write.payload.class' =
'org.apache.hudi.common.model.EmptyHoodieRecordPayload',
'compaction.async.enabled'='false'
);
```
997796e6-18ba-41f4-ab8e-132444471fc6.png
Description: Binary data
