Hi Prasanth,
Here is the explain plan for the insert query:
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
Stage-4
Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
Stage-2 depends on stages: Stage-0
Stage-3
Stage-5
Stage-6 depends on stages: Stage-5
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: original_table
Statistics: Num rows: 44962613 Data size: 264560040271 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ...
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
_col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24,
_col25, _col26, _col27
Statistics: Num rows: 44962613 Data size: 264560040271 Basic
stats: COMPLETE Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 44962613 Data size: 264560040271
Basic stats: COMPLETE Column stats: NONE
table:
input format:
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format:
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: pin.dynamic_partitioned_table
Stage: Stage-7
Conditional Operator
Stage: Stage-4
Move Operator
files:
hdfs directory: true
destination:
hdfs://ci-ocean/mnt/tmp/hive-molson/molson/9f6b1ce0-f71a-4c87-9440-77f09e3860eb/hive_2016-05-02_20-14-12_260_7512820923555713567-1/-ext-10000
Stage: Stage-0
Move Operator
tables:
partition:
dt 2016-04-05
title_id
title_id_type
replace: true
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format:
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: pin.dynamic_partitioned_table
Stage: Stage-2
Stats-Aggr Operator
Stage: Stage-3
Merge File Operator
Map Operator Tree:
ORC File Merge Operator
merge level: stripe
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
Stage: Stage-5
Merge File Operator
Map Operator Tree:
ORC File Merge Operator
merge level: stripe
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
Stage: Stage-6
Move Operator
files:
hdfs directory: true
destination:
hdfs://ci-ocean/mnt/tmp/hive-molson/molson/9f6b1ce0-f71a-4c87-9440-77f09e3860eb/hive_2016-05-02_20-14-12_260_7512820923555713567-1/-ext-10000
Thank you,
Matt
On Mon, May 2, 2016 at 12:48 PM, Prasanth Jayachandran <
[email protected]> wrote:
> Hi
>
> Can you please post explain plan for your insert query? I suspect sorted
> dynamic partition optimization is bailing out because of
> the constant value for ‘dt' column. If you are not seeing a reducer then
> its likely not using the sorted dynamic partition optimization.
> You are probably hitting this bug
> https://issues.apache.org/jira/browse/HIVE-12893
> I can confirm if thats the case by looking at the explain plan.
>
> Thanks
> Prasanth
>
> On May 2, 2016, at 2:24 PM, Ryan Harris <[email protected]>
> wrote:
>
> reading this:
> "but when I add 2000 new titles with 300 rows each"
> I'm thinking that you are over-partitioning your data....
> I'm not sure exactly how that relates to the OOM error you are getting (it
> may not)....I'd test things out partitioning by date-only.... maybe date +
> title_type, but adding 2000+ dynamic partitions that each have 300 rows of
> data in them is asking for problems in Hive IMO...
>
>
> *From:* Matt Olson [mailto:[email protected] <[email protected]>]
> *Sent:* Friday, April 29, 2016 7:50 PM
> *To:* [email protected]
> *Subject:* Container out of memory: ORC format with many dynamic
> partitions
>
> Hi all,
>
> I am using Hive 1.0.1 and trying to do a simple insert into an ORC table,
> creating dynamic partitions. I am selecting from a table partitioned by dt
> and category, and inserting into a table partitioned by dt, title, and
> title_type. Other than the partitioning, the tables have the same schemas.
> Both title and title_type are fields in the first table, and when I insert
> into the second table, I am using them to create dynamic partitions. The .q
> file with the CREATE and INSERT statements is copied below.
>
> SET hive.optimize.sort.dynamic.partition=true;
> SET hive.exec.orc.memory.pool=1.0;
> SET hive.exec.max.dynamic.partitions = 5000;
> SET hive.exec.max.dynamic.partitions.pernode = 5000;
> SET hive.merge.mapfiles = true;
> SET mapred.min.split.size=134217728;
> SET mapred.min.split.size.per.node=134217728;
> SET mapred.min.split.size.per.rack=134217728;
> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> SET
> mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> SET mapred.max.split.size=134217728;
> SET hive.map.aggr.hash.percentmemory=0.125;
> SET hive.exec.parallel=true;
> SET hive.exec.compress.intermediate=true;
> SET hive.exec.compress.output=true;
> SET mapred.map.child.java.opts=-Xmx2048M;
> SET mapred.child.java.opts=-Xmx2048M;
> SET mapred.task.profile=false;
>
> CREATE EXTERNAL TABLE IF NOT EXISTS dynamic_partition_table (
>
> field1 string,
>
> field2 string,
>
> ...
>
> field26 string
>
> )
> PARTITIONED BY (dt string, title string, title_type string)
> STORED AS ORC
> LOCATION '/hive/warehouse/partitioned_table'
> TBLPROPERTIES ("orc.compress.size"="16000");
>
> INSERT OVERWRITE TABLE dynamic_partition_table PARTITION (dt="2016-04-05",
> title, title_type)
> SELECT
>
> field1,
>
> field2,
> ...
>
> title,
>
> title_type
>
> FROM original_table
> WHERE dt = "2016-04-05";
>
> The original table has about 250 GB of data for 2016-04-05, and about 260
> different titles (some titles have very little data, some have ~20 GB).
> There is generally only one title_type per title. The INSERT action
> succeeds on that data set, but when I add 2000 new titles with 300 rows
> each to the original table, I get the following error during the INSERT:
>
>
> Container [pid=6278,containerID=container_e26_1460661845156_49295_01_000244]
> is running beyond physical memory limits. Current usage: 2.2 GB of 2 GB
> physical memory used; 2.7 GB of 4.2 GB virtual memory used. Killing container.
>
>
> I've found a couple questions online about this same error message for ORC
> files with lots of dynamic partitions, on an older version of Hive:
> https://qnalist.com/questions/4836037/hive-0-12-orc-heap-issues-on-write
>
> Based on that and the information about configuration properties at
> https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat,
> I have tried setting hive.exec.orc.memory.pool=1.0 in order to give as much
> heap space as possible to the ORC file writers. As you can see from the
> CREATE TABLE statement, I also decreased the orc.compress.size from the
> default 256 kb to 16 kb. After making these changes, the INSERT is still
> failing with the "beyond physical memory limits" error.
>
> I've tried inserting into a table stored as RCFile rather than ORC, and in
> that case the action succeeds even with the additional 2000 titles.
>
> Can anyone explain how exactly the two ORC parameters above affect the
> writing of dynamic partitions in ORC files, and why I'm not getting the OOM
> error when I use the RCFile format instead? I'd also appreciate any
> suggestions for other tuning I could do to fix the memory management when
> using ORC.
>
> Thanks for any help,
> Matt
> ------------------------------
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>
>
>