Re: Container out of memory: ORC format with many dynamic partitions

Matt Olson Mon, 02 May 2016 13:31:19 -0700

Hi Prasanth,

Here is the explain plan for the insert query:


OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: original_table
            Statistics: Num rows: 44962613 Data size: 264560040271 Basic
stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: ...
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
_col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24,
_col25, _col26, _col27
              Statistics: Num rows: 44962613 Data size: 264560040271 Basic
stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: true
                Statistics: Num rows: 44962613 Data size: 264560040271
Basic stats: COMPLETE Column stats: NONE
                table:
                    input format:
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format:
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: pin.dynamic_partitioned_table

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination:
hdfs://ci-ocean/mnt/tmp/hive-molson/molson/9f6b1ce0-f71a-4c87-9440-77f09e3860eb/hive_2016-05-02_20-14-12_260_7512820923555713567-1/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            dt 2016-04-05
            title_id
            title_id_type
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format:
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: pin.dynamic_partitioned_table

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
      merge level: stripe
      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

  Stage: Stage-5
    Merge File Operator
      Map Operator Tree:
          ORC File Merge Operator
      merge level: stripe
      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination:
hdfs://ci-ocean/mnt/tmp/hive-molson/molson/9f6b1ce0-f71a-4c87-9440-77f09e3860eb/hive_2016-05-02_20-14-12_260_7512820923555713567-1/-ext-10000

Thank you,
Matt


On Mon, May 2, 2016 at 12:48 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> Hi
>
> Can you please post explain plan for your insert query? I suspect sorted
> dynamic partition optimization is bailing out because of
> the constant value for ‘dt' column. If you are not seeing a reducer then
> its likely not using the sorted dynamic partition optimization.
> You are probably hitting this bug
> https://issues.apache.org/jira/browse/HIVE-12893
> I can confirm if thats the case by looking at the explain plan.
>
> Thanks
> Prasanth
>
> On May 2, 2016, at 2:24 PM, Ryan Harris <ryan.har...@zionsbancorp.com>
> wrote:
>
> reading this:
> "but when I add 2000 new titles with 300 rows each"
> I'm thinking that you are over-partitioning your data....
> I'm not sure exactly how that relates to the OOM error you are getting (it
> may not)....I'd test things out partitioning by date-only.... maybe date +
> title_type, but adding 2000+ dynamic partitions that each have 300 rows of
> data in them is asking for problems in Hive IMO...
>
>
> *From:* Matt Olson [mailto:maolso...@gmail.com <maolso...@gmail.com>]
> *Sent:* Friday, April 29, 2016 7:50 PM
> *To:* user@hive.apache.org
> *Subject:* Container out of memory: ORC format with many dynamic
> partitions
>
> Hi all,
>
> I am using Hive 1.0.1 and trying to do a simple insert into an ORC table,
> creating dynamic partitions. I am selecting from a table partitioned by dt
> and category, and inserting into a table partitioned by dt, title, and
> title_type. Other than the partitioning, the tables have the same schemas.
> Both title and title_type are fields in the first table, and when I insert
> into the second table, I am using them to create dynamic partitions. The .q
> file with the CREATE and INSERT statements is copied below.
>
> SET hive.optimize.sort.dynamic.partition=true;
> SET hive.exec.orc.memory.pool=1.0;
> SET hive.exec.max.dynamic.partitions = 5000;
> SET hive.exec.max.dynamic.partitions.pernode = 5000;
> SET hive.merge.mapfiles = true;
> SET mapred.min.split.size=134217728;
> SET mapred.min.split.size.per.node=134217728;
> SET mapred.min.split.size.per.rack=134217728;
> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> SET
> mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> SET mapred.max.split.size=134217728;
> SET hive.map.aggr.hash.percentmemory=0.125;
> SET hive.exec.parallel=true;
> SET hive.exec.compress.intermediate=true;
> SET hive.exec.compress.output=true;
> SET mapred.map.child.java.opts=-Xmx2048M;
> SET mapred.child.java.opts=-Xmx2048M;
> SET mapred.task.profile=false;
>
> CREATE EXTERNAL TABLE IF NOT EXISTS dynamic_partition_table (
>
> field1 string,
>
> field2 string,
>
> ...
>
> field26 string
>
> )
> PARTITIONED BY (dt string, title string, title_type string)
> STORED AS ORC
> LOCATION '/hive/warehouse/partitioned_table'
> TBLPROPERTIES ("orc.compress.size"="16000");
>
> INSERT OVERWRITE TABLE dynamic_partition_table PARTITION (dt="2016-04-05",
> title, title_type)
> SELECT
>
> field1,
>
> field2,
> ...
>
> title,
>
> title_type
>
> FROM original_table
> WHERE dt = "2016-04-05";
>
> The original table has about 250 GB of data for 2016-04-05, and about 260
> different titles (some titles have very little data, some have ~20 GB).
> There is generally only one title_type per title. The INSERT action
> succeeds on that data set, but when I add 2000 new titles with 300 rows
> each to the original table, I get the following error during the INSERT:
>
>
> Container [pid=6278,containerID=container_e26_1460661845156_49295_01_000244] 
> is running beyond physical memory limits. Current usage: 2.2 GB of 2 GB 
> physical memory used; 2.7 GB of 4.2 GB virtual memory used. Killing container.
>
>
> I've found a couple questions online about this same error message for ORC
> files with lots of dynamic partitions, on an older version of Hive:
> https://qnalist.com/questions/4836037/hive-0-12-orc-heap-issues-on-write
>
> Based on that and the information about configuration properties at
> https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat,
> I have tried setting hive.exec.orc.memory.pool=1.0 in order to give as much
> heap space as possible to the ORC file writers. As you can see from the
> CREATE TABLE statement, I also decreased the orc.compress.size from the
> default 256 kb to 16 kb. After making these changes, the INSERT is still
> failing with the "beyond physical memory limits" error.
>
> I've tried inserting into a table stored as RCFile rather than ORC, and in
> that case the action succeeds even with the additional 2000 titles.
>
> Can anyone explain how exactly the two ORC parameters above affect the
> writing of dynamic partitions in ORC files, and why I'm not getting the OOM
> error when I use the RCFile format instead?  I'd also appreciate any
> suggestions for other tuning I could do to fix the memory management when
> using ORC.
>
> Thanks for any help,
> Matt
> ------------------------------
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>
>
>

Re: Container out of memory: ORC format with many dynamic partitions

Reply via email to