Hi Prasanth,

Here is the explain plan for the insert query:
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: original_table
            Statistics: Num rows: 44962613 Data size: 264560040271 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: ...
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27
              Statistics: Num rows: 44962613 Data size: 264560040271 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: true
                Statistics: Num rows: 44962613 Data size: 264560040271 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: pin.dynamic_partitioned_table

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://ci-ocean/mnt/tmp/hive-molson/molson/9f6b1ce0-f71a-4c87-9440-77f09e3860eb/hive_2016-05-02_20-14-12_260_7512820923555713567-1/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            dt 2016-04-05
            title_id
            title_id_type
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: pin.dynamic_partitioned_table

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
            merge level: stripe
            input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

  Stage: Stage-5
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
            merge level: stripe
            input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://ci-ocean/mnt/tmp/hive-molson/molson/9f6b1ce0-f71a-4c87-9440-77f09e3860eb/hive_2016-05-02_20-14-12_260_7512820923555713567-1/-ext-10000

Thank you,
Matt

On Mon, May 2, 2016 at 12:48 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> Hi
>
> Can you please post the explain plan for your insert query? I suspect the
> sorted dynamic partition optimization is bailing out because of the constant
> value for the ‘dt’ column. If you are not seeing a reducer, then it is likely
> not using the sorted dynamic partition optimization. You are probably hitting
> this bug:
> https://issues.apache.org/jira/browse/HIVE-12893
> I can confirm whether that is the case by looking at the explain plan.
>
> Thanks
> Prasanth
>
> On May 2, 2016, at 2:24 PM, Ryan Harris <ryan.har...@zionsbancorp.com> wrote:
>
> Reading this:
> "but when I add 2000 new titles with 300 rows each"
> I'm thinking that you are over-partitioning your data. I'm not sure exactly
> how that relates to the OOM error you are getting (it may not). I'd test
> things out partitioning by date only, or maybe date + title_type. Adding
> 2000+ dynamic partitions that each hold only 300 rows of data is asking for
> problems in Hive, IMO.
>
> *From:* Matt Olson [mailto:maolso...@gmail.com]
> *Sent:* Friday, April 29, 2016 7:50 PM
> *To:* user@hive.apache.org
> *Subject:* Container out of memory: ORC format with many dynamic partitions
>
> Hi all,
>
> I am using Hive 1.0.1 and trying to do a simple insert into an ORC table,
> creating dynamic partitions. I am selecting from a table partitioned by dt
> and category, and inserting into a table partitioned by dt, title, and
> title_type. Other than the partitioning, the tables have the same schemas.
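As an aside on the HIVE-12893 point above: the explain plan posted at the top of this thread shows only map-side stages and no reduce stage, which matches Prasanth's description of the sorted dynamic partition optimization bailing out on the constant ‘dt’ value. One commonly suggested workaround is to select dt as a real column and make it a dynamic partition, so no constant defeats the optimization. This is a hypothetical rewrite using the names from this thread, not a fix confirmed by anyone here:

```sql
-- Hypothetical sketch (assumption, not a confirmed fix): make dt dynamic
-- so the sorted dynamic partition optimization is not skipped. The WHERE
-- clause still restricts the insert to a single date.
SET hive.exec.dynamic.partition.mode=nonstrict;  -- required once no partition column is static

INSERT OVERWRITE TABLE dynamic_partition_table PARTITION (dt, title, title_type)
SELECT
  field1,
  field2,
  ...
  dt,          -- dynamic partition columns must come last, in declared order
  title,
  title_type
FROM original_table
WHERE dt = "2016-04-05";
```

If the optimization engages, the plan should then show a reduce stage that sorts rows by partition key, so each task holds only one open ORC writer at a time.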
> Both title and title_type are fields in the first table, and when I insert
> into the second table, I am using them to create dynamic partitions. The .q
> file with the CREATE and INSERT statements is copied below.
>
> SET hive.optimize.sort.dynamic.partition=true;
> SET hive.exec.orc.memory.pool=1.0;
> SET hive.exec.max.dynamic.partitions = 5000;
> SET hive.exec.max.dynamic.partitions.pernode = 5000;
> SET hive.merge.mapfiles = true;
> SET mapred.min.split.size=134217728;
> SET mapred.min.split.size.per.node=134217728;
> SET mapred.min.split.size.per.rack=134217728;
> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> SET mapred.max.split.size=134217728;
> SET hive.map.aggr.hash.percentmemory=0.125;
> SET hive.exec.parallel=true;
> SET hive.exec.compress.intermediate=true;
> SET hive.exec.compress.output=true;
> SET mapred.map.child.java.opts=-Xmx2048M;
> SET mapred.child.java.opts=-Xmx2048M;
> SET mapred.task.profile=false;
>
> CREATE EXTERNAL TABLE IF NOT EXISTS dynamic_partition_table (
>   field1 string,
>   field2 string,
>   ...
>   field26 string
> )
> PARTITIONED BY (dt string, title string, title_type string)
> STORED AS ORC
> LOCATION '/hive/warehouse/partitioned_table'
> TBLPROPERTIES ("orc.compress.size"="16000");
>
> INSERT OVERWRITE TABLE dynamic_partition_table PARTITION (dt="2016-04-05", title, title_type)
> SELECT
>   field1,
>   field2,
>   ...
>   title,
>   title_type
> FROM original_table
> WHERE dt = "2016-04-05";
>
> The original table has about 250 GB of data for 2016-04-05 and about 260
> different titles (some titles have very little data, some have ~20 GB).
> There is generally only one title_type per title.
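Another option sometimes used when a dynamic partition insert opens too many ORC writers in a single map task, independent of the HIVE-12893 discussion above, is to force a reduce stage explicitly. This sketch is my own suggestion using the statements from this thread, not something confirmed by the participants:

```sql
-- Hypothetical variant: DISTRIBUTE BY routes all rows for a given
-- (title, title_type) pair to one reducer, so each reduce task writes
-- only its own partitions instead of every map task opening a writer
-- for every partition it happens to see.
INSERT OVERWRITE TABLE dynamic_partition_table PARTITION (dt="2016-04-05", title, title_type)
SELECT
  field1,
  field2,
  ...
  title,
  title_type
FROM original_table
WHERE dt = "2016-04-05"
DISTRIBUTE BY title, title_type;
```

The trade-off is an extra shuffle, and skewed titles (the ~20 GB ones) concentrate on single reducers, so this helps memory at some cost in balance and runtime.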
> The INSERT action succeeds on that data set, but when I add 2000 new titles
> with 300 rows each to the original table, I get the following error during
> the INSERT:
>
> Container [pid=6278,containerID=container_e26_1460661845156_49295_01_000244]
> is running beyond physical memory limits. Current usage: 2.2 GB of 2 GB
> physical memory used; 2.7 GB of 4.2 GB virtual memory used. Killing container.
>
> I've found a couple of questions online about this same error message for
> ORC files with many dynamic partitions, on an older version of Hive:
> https://qnalist.com/questions/4836037/hive-0-12-orc-heap-issues-on-write
>
> Based on that, and on the information about configuration properties at
> https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat,
> I have tried setting hive.exec.orc.memory.pool=1.0 in order to give as much
> heap space as possible to the ORC file writers. As you can see from the
> CREATE TABLE statement, I also decreased orc.compress.size from the default
> 256 KB to 16 KB. After making these changes, the INSERT still fails with the
> "beyond physical memory limits" error.
>
> I've tried inserting into a table stored as RCFile rather than ORC, and in
> that case the action succeeds even with the additional 2000 titles.
>
> Can anyone explain how exactly the two ORC parameters above affect the
> writing of dynamic partitions in ORC files, and why I'm not getting the OOM
> error when I use the RCFile format instead? I'd also appreciate any
> suggestions for other tuning I could do to fix the memory management when
> using ORC.
>
> Thanks for any help,
> Matt
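To make the memory question above concrete: each open ORC writer buffers an in-progress stripe plus per-column compression buffers, so when the sorted dynamic partition optimization is not in effect, one map task can hold writers for hundreds or thousands of partitions at once, and roughly 2000 partitions times even a few MB of writer state overwhelms a 2 GB container. RCFile writers keep far less state per open file, which is consistent with the RCFile insert succeeding. The following knobs are sometimes suggested for this failure mode; the specific values here are illustrative assumptions to experiment with, not confirmed fixes:

```sql
-- Sketch of tuning knobs (values are assumptions, not confirmed fixes):
SET hive.optimize.sort.dynamic.partition=true;   -- already set in the thread; only
                                                 -- effective once HIVE-12893 is avoided
SET hive.exec.orc.default.buffer.size=65536;     -- shrink per-stream compression buffers
SET hive.exec.orc.default.stripe.size=33554432;  -- smaller stripes mean less buffered
                                                 -- data per open writer
```

Note that hive.exec.orc.memory.pool=1.0 only changes how the available heap is divided among ORC writers; it does not reduce the total memory they demand, which is why it did not prevent the container kill.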