Hi, Does anyone have positive feedback on the Hive MERGE statement? There is some information in [1] and [2].
In my experience, merging a source table of 300M rows and 100 columns into a target of 1.5B rows is 100 times slower than doing an UPDATE followed by an INSERT. It is also slower than a third approach, which consists of building the new table from scratch and renaming it to replace the old one.

Second bad point: right now Spark is not able to read an ACID table without a major compaction, meaning the table needs to be rebuilt from scratch behind the scenes.

So I am wondering whether the MERGE statement is impractical because I am using it badly, or because the feature is just not mature enough.

[1]: https://thisdataguy.com/2018/01/29/why-is-my-hive-merge-statement-slow/
[2]: https://fr.hortonworks.com/blog/apache-hive-moving-beyond-analytics-offload-with-sql-merge/
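
To make the comparison concrete, here is a minimal sketch of the MERGE and of the rebuild-and-rename approach. The table names (`target_tbl`, `source_tbl`) and columns (`id`, `val`) are hypothetical and stand in for the real 100-column tables; it also assumes both tables have the same column layout and that `target_tbl` is a transactional (ACID) table, which MERGE requires. It is an illustration only, not the exact statements I ran.

```sql
-- Approach 1: a single MERGE (target_tbl must be an ACID table).
-- Hypothetical schema: both tables have columns (id, val).
MERGE INTO target_tbl AS t
USING source_tbl AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val);

-- Approach 3: rebuild the table from scratch, then swap it in by renaming.
-- Every source row is kept, plus the target rows that have no match in the source.
CREATE TABLE target_tbl_new AS
SELECT s.id, s.val
FROM source_tbl s
UNION ALL
SELECT t.id, t.val
FROM target_tbl t
LEFT JOIN source_tbl s ON t.id = s.id
WHERE s.id IS NULL;

ALTER TABLE target_tbl RENAME TO target_tbl_old;
ALTER TABLE target_tbl_new RENAME TO target_tbl;
```

The rebuild writes the whole table once and produces no delta files, which may be part of why it behaves better here, but that is a guess on my side.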
