Hello All,

I'm new to Hive development and I'm getting a memory limit error when running a
simple query with a predicate that should return only a few records. Below
are the details of the Hive table, query, and error. Please advise me on how
to query efficiently on predicates over columns that are not partition columns.

Table Properties:

    CREATE EXTERNAL TABLE TEST(
        location_id double,
        longitude double,
        latitude double,
        state string
    )
    COMMENT 'This table is created for testing purposes'
    PARTITIONED BY(country string, date string)
    STORED AS ORC
    LOCATION '<S3 Location>'

Total records: 9 billion

Number of partitions: >4k

EMR Cluster Properties:

    Total Memory: 48 GB
    Number of Nodes: 2
    Total vCores: 8
    mapreduce.map.memory.mb=3072
    mapreduce.map.java.opts=-Xmx2458m


Query Executed:  select * from test where location_id = 1234;

Error:

Status: Failed

Application  failed 2 times due to AM Container for exited with  exitCode:
-104

Failing this attempt.Diagnostics: Container is running beyond physical
memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 2.8 GB
of 5 GB virtual memory used. Killing container.

Dump of the process-tree for  :

        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

        |- 1253 1234 1234 123 (bash) 0 0 11597648 676 /bin/bash -c
/usr/lib/jvm/java-openjdk/bin/java  -Xmx819m
-Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/app30/container_11/tmp
-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
-XX:+UseParallelGC
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
-Dlog4j.configuration=tez-container-log4j.properties
-Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_10/container_11
-Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=''
org.apache.tez.dag.app.DAGAppMaster --session
1>/var/log/hadoop-yarn/containers/application_10/container_11/stdout
2>/var/log/hadoop-yarn/containers/application_10/container_11/stderr

        |- 1253 1234 1234 123  (java) 1253 1234 1234 123
 /usr/lib/jvm/java-openjdk/bin/java -Xmx819m
-Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_10/container_11/tmp
-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
-XX:+UseParallelGC
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
-Dlog4j.configuration=tez-container-log4j.properties
-Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_10/container_11
-Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=
org.apache.tez.dag.app.DAGAppMaster --session

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143
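
A note on the log above: the process tree shows org.apache.tez.dag.app.DAGAppMaster running with -Xmx819m (about 80% of 1 GB), so the container that was killed appears to be the Tez application master at its 1 GB limit, not a map task. If so, the mapreduce.map.* values listed earlier would not govern it. Below is a minimal sketch of session settings that might be relevant, assuming Hive on Tez and the stock Tez/Hive property names. The values are illustrative guesses and have not been tested on this cluster.

```sql
-- Hypothetical values; the property names are standard Tez/Hive settings,
-- but nothing here has been verified on this cluster.

-- Give the Tez application master a larger container than the 1 GB it was killed at.
SET tez.am.resource.memory.mb=2048;

-- Let the ORC reader use its stripe/row-group indexes to skip data
-- that cannot match the predicate on location_id.
SET hive.optimize.index.filter=true;

-- Assumption: split generation (which runs in the AM) contributes to its memory use.
-- The BI strategy avoids reading ORC footers while planning splits.
SET hive.exec.orc.split.strategy=BI;

SELECT * FROM test WHERE location_id = 1234;
```

The AM size may need to be set before the Tez session starts (for example in tez-site.xml) to take effect. Where the use case allows, adding a filter on country or date would also limit how many of the >4k partitions are scanned.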
