[ 
https://issues.apache.org/jira/browse/HUDI-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843110#comment-17843110
 ] 

Vinoth Chandar commented on HUDI-7229:
--------------------------------------

Punting this to 1.1 


 # [1.1] Implement support on top of data blocks.
 ## We need to pass the changed-column information and the operation all the way to 
the write handles, using a field in HoodieRecord.
 ## ... 
 # [1.1] Implement support on top of CDC data blocks.
 ## We can track similar bitmaps for CDC data blocks as well.
 ## We need to extend the new file group reader to also merge base and CDC 
blocks (not just base and data blocks).
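
To make the bitmap idea concrete, here is a minimal sketch of how a per-record bitmap of changed columns could be computed, used to keep only the changed values, and then applied on top of a base record at read time. It is not Hudi code: the class PartialUpdateSketch and its methods changedColumns, encode and merge are hypothetical names, records are modeled as plain Object arrays in schema order, and the operation type (insert/update/delete) is not modeled at all.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Objects;

// Hypothetical illustration only; none of these names are Hudi APIs.
final class PartialUpdateSketch {

    // Sets bit i when column i differs between the before and after images.
    static BitSet changedColumns(Object[] before, Object[] after) {
        if (before.length != after.length) {
            throw new IllegalArgumentException("schema mismatch");
        }
        BitSet changed = new BitSet(after.length);
        for (int i = 0; i < after.length; i++) {
            if (!Objects.equals(before[i], after[i])) {
                changed.set(i);
            }
        }
        return changed;
    }

    // Keeps only the changed values, in column order. Together with the
    // bitmap, this is what a partial log record would need to carry.
    static Object[] encode(Object[] after, BitSet changed) {
        Object[] values = new Object[changed.cardinality()];
        int k = 0;
        for (int i = changed.nextSetBit(0); i >= 0; i = changed.nextSetBit(i + 1)) {
            values[k++] = after[i];
        }
        return values;
    }

    // Overlays the partial values onto a copy of the base record.
    static Object[] merge(Object[] base, BitSet changed, Object[] values) {
        Object[] merged = Arrays.copyOf(base, base.length);
        int k = 0;
        for (int i = changed.nextSetBit(0); i >= 0; i = changed.nextSetBit(i + 1)) {
            merged[i] = values[k++];
        }
        return merged;
    }

    public static void main(String[] args) {
        Object[] base = {1, "alice", "NYC", 100};
        Object[] updated = {1, "alice", "SF", 250};
        BitSet changed = changedColumns(base, updated);
        Object[] partial = encode(updated, changed);
        System.out.println(changed + " -> " + Arrays.toString(partial));
        System.out.println(Arrays.toString(merge(base, changed, partial)));
    }
}
```

A bitmap per record, rather than one column set per write, is what would let each record in the same write touch a different subset of columns. An actual implementation would also have to carry the operation and handle schema evolution, which this sketch leaves out.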

> Enable partial updates for CDC work payload
> -------------------------------------------
>
>                 Key: HUDI-7229
>                 URL: https://issues.apache.org/jira/browse/HUDI-7229
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Lin Liu
>            Assignee: Vinoth Chandar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> OLTP workloads on upstream databases often update, delete, or insert different 
> columns in the table on each operation. Currently, Hudi can only support 
> partial updates in cases where the same columns are being mutated in a given 
> write to Hudi (e.g., Spark SQL ETLs with MIT or Update statements). Here, we 
> explore what it takes to support a smarter storage format that encodes only 
> the changed columns into the log, along with the different implementations.
> h2. Goals
>  # Enable partial update functionality for all existing and potential future 
> CDC workloads without huge modification or duplication.
>  # Performance parity with current full-record updates or partial updates 
> across the same set of columns.
>  # Exhibit a reduction in storage costs by storing only the changed columns.
>  # Should also reduce computation costs by scanning and processing 
> less data.
>  # Should not affect the scalability of the existing ingestion system. 
> The number of files generated for partial updates should not increase 
> dramatically.
>  



