Re: Optimal approach for changing file format of a partitioned table

2018-08-06 Thread Furcy Pin
Hi Elliot, From your description of the problem, I'm assuming that you are doing an INSERT OVERWRITE table PARTITION(p1, p2) SELECT * FROM table or something close, like a CREATE TABLE AS ... maybe. If this is the case, I suspect that your shuffle phase comes from dynamic partitioning, and in pa

Subqueries two tables to one in Hive

2018-08-06 Thread Sowjanya Kakarala
Hi Everyone, I am trying to insert data from two tables into one table as separate columns. Example:

Table1 as A:
Id  Data  time_stamp
1   0.1   2018-01-01
2   0.2   2018-01-01
3   0.3   2018-01-02

Table2 as B:
Id  Data  time_stamp
1   1.1   2018-01-01
2   2.2   2018-01-01
3   1.3   2018-01-02

Now I a

Re: Optimal approach for changing file format of a partitioned table

2018-08-06 Thread Gopal Vijayaraghavan
A hive version would help to preface this, because that matters for this (like TEZ-3709 doesn't apply for hive-1.2).

> I’m trying to simply change the format of a very large partitioned table from
> Json to ORC. I’m finding that it is unexpectedly resource intensive,
> primarily due to a shu