Re: MERGE performances issue

Oleksiy S Mon, 07 May 2018 01:52:25 -0700

>> Has anyone any positive feedback on the hive MERGE statement ?

FYI


https://issues.apache.org/jira/browse/HIVE-19286
https://issues.apache.org/jira/browse/HIVE-19295


On Mon, May 7, 2018 at 12:35 AM, Nicolas Paris wrote:

> Hi,
>
> Has anyone any positive feedback on the hive MERGE statement ? There
> is some information in [1] and [2].
>
> In my experience, merging a source table of 300M rows and 100 columns
> into a target of 1.5B rows is 100 times slower than doing an UPDATE and
> an INSERT. It is also slower than a third approach, which consists of
> building the new table from scratch and renaming it to replace the old one.
>
> Second drawback: right now Spark cannot read an ACID table
> without a major compaction, meaning the table needs to be rebuilt
> from scratch behind the scenes.
>
> So I am wondering whether the MERGE statement is impractical because
> I am misusing it or because the feature is just not mature enough.
>
> [1]: https://thisdataguy.com/2018/01/29/why-is-my-hive-merge-statement-slow/
> [2]: https://fr.hortonworks.com/blog/apache-hive-moving-beyond-analytics-offload-with-sql-merge/
>
>
>


-- 
Oleksiy
