Hi March,
Thanks for suggesting improvements on update.
I have gone through the paper for some highlights, and here are a few points
with my understanding; we can work on and discuss them further.
1. Since we are talking about updating the existing file instead of writing a new
carbon data file, which
is the current lo
Hi!
Update is still using the converter step with bad record handling.
In the update-by-dataframe scenario there is no need for bad record handling;
we can keep it only for the update-by-value case.
This can give a significant improvement, as we already observed in the insert flow.
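A minimal sketch of the proposed branching (the function names and the converter step below are hypothetical stand-ins, not CarbonData's actual API):

```python
def convert_with_bad_record_handling(row):
    # Stand-in for the converter step: parse/validate raw string values.
    # A real implementation would also redirect malformed rows to a
    # bad-record file according to the configured bad-record action.
    return [int(v) if v.isdigit() else v for v in row]

def update_rows(rows, update_by_dataframe: bool):
    """Run the update flow, skipping the bad-record converter when the
    source is a dataframe whose values are already typed and validated."""
    if update_by_dataframe:
        return list(rows)  # values already validated upstream: skip the step
    return [convert_with_bad_record_handling(row) for row in rows]
```

The saving comes from removing a full per-row pass over the data in the dataframe case.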
I tried once to send it to new inse
I have several ideas to optimize update performance:
1. Reduce the storage size of tupleId:
The tupleId is too long, leading to heavy shuffle IO overhead when joining
the change table with the target table.
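One way to shrink the tupleId is to pack its components into a single 64-bit integer instead of shuffling a long delimited string. This is only a sketch: the component names and bit widths below are assumptions, not CarbonData's actual tupleId layout.

```python
def pack_tuple_id(segment: int, block: int, page: int, row: int) -> int:
    # Hypothetical layout: 16-bit segment | 16-bit block | 12-bit page | 20-bit row.
    assert segment < (1 << 16) and block < (1 << 16)
    assert page < (1 << 12) and row < (1 << 20)
    return (segment << 48) | (block << 32) | (page << 20) | row

def unpack_tuple_id(packed: int) -> tuple:
    # Reverse the packing: shift and mask each field back out.
    return (packed >> 48, (packed >> 32) & 0xFFFF,
            (packed >> 20) & 0xFFF, packed & 0xFFFFF)
```

An 8-byte key shuffles far less data than a multi-part delimited string, and integer join keys also hash and compare faster.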
2. Avoid converting String to UTF8String during row processing.
Before writing rows into delta
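The direct fix would be to keep values in their encoded form end to end. As a language-neutral sketch of a related mitigation (caching rather than full elimination, and the helpers below are illustrative, not CarbonData's API), each distinct string can be encoded once and reused instead of re-encoded per row:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def to_utf8(value: str) -> bytes:
    # Encode once per distinct value; repeated values reuse the cached bytes.
    return value.encode("utf-8")

def encode_rows(rows):
    # Per-row conversion now hits the cache for repeated column values,
    # which is common for low-cardinality columns.
    return [[to_utf8(v) for v in row] for row in rows]
```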
There is an interesting paper, "L-Store: A Real-time OLTP and OLAP System",
which uses a creative way to improve update performance.
The idea is:
*1. Store the updated column value in the tail page*.
When any column of a record is updated, a new tail page is created and appended
to the page dictionary
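A minimal in-memory sketch of the tail-page idea (plain dicts stand in for base and tail pages; the real L-Store design also keeps an indirection column and lazily merges tails back into the base):

```python
class LStoreTable:
    def __init__(self):
        self.base = {}  # rid -> original record (never rewritten in place)
        self.tail = {}  # rid -> appended partial updates (the "tail pages")

    def insert(self, rid, record):
        self.base[rid] = dict(record)

    def update(self, rid, **changed):
        # Append-only: store just the changed column values in the tail,
        # leaving the base record untouched.
        self.tail.setdefault(rid, []).append(changed)

    def read(self, rid):
        # Overlay tail updates (oldest to newest) on the base record.
        record = dict(self.base[rid])
        for delta in self.tail.get(rid, []):
            record.update(delta)
        return record
```

Updates become cheap appends instead of rewrites of the data file, at the cost of extra merge work on read.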