[ https://issues.apache.org/jira/browse/HUDI-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843110#comment-17843110 ]
Vinoth Chandar commented on HUDI-7229:
--------------------------------------

Punting this to 1.1

# [1.1] Implement support on top of data blocks.
## We need to pass the changed-column information and the operation all the way to the write handles, using a field in HoodieRecord.
## ...
# [1.1] Implement support on top of CDC data blocks.
## We can track similar bitmaps for CDC data blocks as well.
## We need to extend the new file group reader to also merge base and CDC blocks (not just base and data blocks).

> Enable partial updates for CDC work payload
> -------------------------------------------
>
>                 Key: HUDI-7229
>                 URL: https://issues.apache.org/jira/browse/HUDI-7229
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Lin Liu
>            Assignee: Vinoth Chandar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> OLTP workloads on upstream databases often update/delete/insert different
> columns in the table on each operation. Currently, Hudi can only support
> partial updates in cases where the same columns are being mutated in a given
> write to Hudi (e.g. Spark SQL ETLs with MIT or Update statements). Here, we
> explore what it takes to support a smarter storage format that encodes only
> the changed columns into the log, along with the different implementations.
> h2. Goals
> # Enable partial update functionality for all existing and potential future
> CDC workloads without huge modification or duplication.
> # Reach performance parity with current full-record updates or partial updates
> across the same set of columns.
> # Exhibit a reduction in storage costs by storing only the changed columns.
> # Should also result in computation cost reductions by scanning/processing
> less data.
> # Should not affect the scalability of the existing ingestion system.
> The number of files generated for partial updates should not increase
> dramatically.
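
For illustration only, the idea of tracking changed columns with a bitmap and overlaying them onto a base record could be sketched roughly as below. This is not Hudi code: `PartialUpdateSketch`, `PartialRecord`, `encode` and `merge` are hypothetical names, rows are modeled as plain `Object[]` arrays in schema order, and the operation type (insert/update/delete) mentioned in the comment is left out to keep the sketch small.

```java
import java.util.BitSet;
import java.util.Objects;

/** Hypothetical sketch of bitmap-based partial updates; not Hudi's actual API. */
final class PartialUpdateSketch {

    /** A log record that stores only the changed columns of a row. */
    static final class PartialRecord {
        final BitSet changed;   // bit i set => column i was mutated
        final Object[] values;  // values of the changed columns only, in schema order

        PartialRecord(BitSet changed, Object[] values) {
            if (changed.cardinality() != values.length) {
                throw new IllegalArgumentException("bitmap and values disagree");
            }
            this.changed = changed;
            this.values = values;
        }
    }

    /** Keep only the columns whose value differs between the old and new row. */
    static PartialRecord encode(Object[] before, Object[] after) {
        BitSet changed = new BitSet(after.length);
        for (int i = 0; i < after.length; i++) {
            if (!Objects.equals(before[i], after[i])) {
                changed.set(i);
            }
        }
        Object[] values = new Object[changed.cardinality()];
        int k = 0;
        for (int i = changed.nextSetBit(0); i >= 0; i = changed.nextSetBit(i + 1)) {
            values[k++] = after[i];
        }
        return new PartialRecord(changed, values);
    }

    /** Overlay the changed columns onto a copy of the base row. */
    static Object[] merge(Object[] base, PartialRecord update) {
        Object[] merged = base.clone();
        int k = 0;
        for (int i = update.changed.nextSetBit(0); i >= 0; i = update.changed.nextSetBit(i + 1)) {
            merged[i] = update.values[k++];
        }
        return merged;
    }
}
```

Under these assumptions, a row update that touches one column out of three stores a single value plus the bitmap, and a reader reconstructs the full row by applying `merge` to the base record.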