MERGE performances issue

Nicolas Paris Sun, 06 May 2018 14:37:10 -0700

Hi,

Has anyone had a positive experience with the Hive MERGE statement? There
is some information in [1] and [2].


In my experience, merging a source table of 300M rows and 100 columns
into a target of 1.5B rows is 100 times slower than doing an UPDATE and
an INSERT. It is also slower than a third approach: building the
new table from scratch and renaming it to replace the old one.
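
For reference, here is a rough sketch of the MERGE form and of the
rebuild-and-rename alternative being compared. The table names (target,
source, target_new, target_old) and the columns (id, col1) are
placeholders for illustration only, not the real schema, and the sketch
assumes a single join key:

```sql
-- Placeholder names: target, source, target_new, target_old, id, col1.

-- Approach 1: a single MERGE against the ACID target table.
MERGE INTO target AS t
USING source AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET col1 = s.col1
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.col1);

-- Approach 3: rebuild the table from scratch, then swap it in.
-- All source rows win; target rows are kept only when the source
-- has no row with the same key.
CREATE TABLE target_new AS
SELECT s.id, s.col1
FROM source s
UNION ALL
SELECT t.id, t.col1
FROM target t
LEFT JOIN source s ON t.id = s.id
WHERE s.id IS NULL;

ALTER TABLE target RENAME TO target_old;
ALTER TABLE target_new RENAME TO target;
```

The rebuild variant rewrites the whole table, but it does not go through
the ACID delta files, so the result can be a plain (non-transactional)
table.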

Second drawback: right now Spark is not able to read an ACID table
without a major compaction, meaning the table needs to be rebuilt
from scratch behind the scenes.

So I am wondering whether the MERGE statement is impractical because
I am misusing it, or because the feature is just not mature enough.

[1]: https://thisdataguy.com/2018/01/29/why-is-my-hive-merge-statement-slow/
[2]: https://fr.hortonworks.com/blog/apache-hive-moving-beyond-analytics-offload-with-sql-merge/
