Re: [Discussion] Optimize the Update Performance

2020-05-13 Thread Ajantha Bhat
Hi! Update still uses the converter step with bad record handling. In the update-by-dataframe scenario there is no need for bad record handling; we can keep it only for the update-by-value case. This can give a significant improvement, as we already observed in the insert flow. I tried once to send it to new inse
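A minimal sketch of the proposal above, assuming a simplified row model: rows coming from a DataFrame are taken to be already typed against the target schema, so they bypass conversion, while literal values from an update-by-value statement are converted and failures are collected as bad records. `UpdateBatch`, `convert_rows` and the per-column `converters` map are hypothetical names for illustration, not CarbonData APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class UpdateBatch:
    # Hypothetical flag: True for update-by-dataframe, False for update-by-value.
    from_dataframe: bool
    rows: List[Row]


def convert_rows(batch: UpdateBatch,
                 converters: Dict[str, Callable[[Any], Any]]) -> Tuple[List[Row], List[Row]]:
    """Return (good_rows, bad_rows) for one update batch."""
    if batch.from_dataframe:
        # Assumption: DataFrame values already match the target types,
        # so the converter step and bad record handling are skipped entirely.
        return list(batch.rows), []
    good: List[Row] = []
    bad: List[Row] = []
    for row in batch.rows:
        try:
            # Convert each column that has a converter; pass the rest through.
            good.append({col: converters[col](val) if col in converters else val
                         for col, val in row.items()})
        except (ValueError, TypeError):
            # A value that cannot be converted is treated as a bad record.
            bad.append(row)
    return good, bad
```

The design point is that the per-row try/convert loop only runs on the update-by-value path, which is the case where user-supplied literals can actually be malformed.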

Re: [Discussion] Optimize the Update Performance

2020-05-13 Thread haomarch
I have several ideas to optimize the update performance:
1. Reduce the storage size of tupleId: the tupleId is too long, which leads to heavy shuffle IO overhead when joining the change table with the target table.
2. Avoid converting String to UTF8String in the row processing. Before writing rows into delta
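One way to shrink the tupleId, sketched under loud assumptions: the textual tupleId is taken to have the hypothetical form `<segment>/<block name>/<blocklet>/<page>/<row>` (not necessarily CarbonData's real layout), the long block file name is dictionary-encoded to a small integer, and the numeric parts are packed into a fixed 16-byte record. `TupleIdCodec` and the field widths are illustrative choices only.

```python
import struct

# Hypothetical fixed-width layout (an assumption, not CarbonData's format):
# segment id (u32), block index (u32), blocklet id (u16), page id (u16), row id (u32)
_LAYOUT = struct.Struct("<IIHHI")


class TupleIdCodec:
    """Packs a textual tupleId into 16 bytes by dictionary-encoding the block name."""

    def __init__(self) -> None:
        self._block_to_index = {}
        self._index_to_block = []

    def encode(self, tuple_id: str) -> bytes:
        # Assumed textual form: "<segment>/<block name>/<blocklet>/<page>/<row>"
        segment, block, blocklet, page, row = tuple_id.split("/")
        index = self._block_to_index.get(block)
        if index is None:
            # First time this block name is seen: assign the next integer id.
            index = len(self._index_to_block)
            self._block_to_index[block] = index
            self._index_to_block.append(block)
        return _LAYOUT.pack(int(segment), index, int(blocklet), int(page), int(row))

    def decode(self, packed: bytes) -> str:
        segment, index, blocklet, page, row = _LAYOUT.unpack(packed)
        return f"{segment}/{self._index_to_block[index]}/{blocklet}/{page}/{row}"
```

The packed key is what would be shuffled during the join; the trade-off is that the block-name dictionary has to be available wherever keys are decoded (for example, shared with the delta writer), which is extra coordination the plain string does not need.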

Re: [Discussion] Support GlobalSort in the CDC Flow

2020-05-13 Thread haomarch
Shall we use NO_SORT to write delta files while using GLOBAL_SORT to write base files? It would improve the update performance a lot. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/