Hello All,
I'm new to Hive development and I'm getting a memory limit error when
running a simple query with a predicate that should return only a few
records. Below are the details of the Hive table, query, and error. Please
advise me on how to efficiently query on predicates that are not on
partition columns.
Table Properties:
CREATE EXTERNAL TABLE TEST(location_id double,
longitude double,
latitude double,
state string
)
COMMENT 'This table is created for testing purposes'
PARTITIONED BY(country string, date string)
STORED AS ORC
LOCATION '<S3 Location>'
Total records: 9 billion
Number of partitions: >4k
EMR Cluster Properties:
Total Memory: 48 GB
Number of Nodes: 2
Total vCores: 8
mapreduce.map.memory.mb=3072
mapreduce.map.java.opts=-Xmx2458m
Query Executed: select * from test where location_id = 1234;
Error:
Status: Failed
Application failed 2 times due to AM Container for exited with exitCode:
-104
Failing this attempt.Diagnostics: Container is running beyond physical
memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 2.8 GB
of 5 GB virtual memory used. Killing container.
Dump of the process-tree for :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 1253 1234 1234 123 (bash) 0 0 11597648 676 /bin/bash -c
/usr/lib/jvm/java-openjdk/bin/java -Xmx819m
-Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/app30/container_11/tmp
-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
-XX:+UseParallelGC
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
-Dlog4j.configuration=tez-container-log4j.properties
-Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_10/container_11
-Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=''
org.apache.tez.dag.app.DAGAppMaster --session
1>/var/log/hadoop-yarn/containers/application_10/container_11/stdout
2>/var/log/hadoop-yarn/containers/application_10/container_11/stderr
|- 1253 1234 1234 123 (java) 1253 1234 1234 123
/usr/lib/jvm/java-openjdk/bin/java -Xmx819m
-Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_10/container_11/tmp
-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
-XX:+UseParallelGC
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
-Dlog4j.configuration=tez-container-log4j.properties
-Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_10/container_11
-Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=
org.apache.tez.dag.app.DAGAppMaster --session
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
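One reading of the diagnostics above: the container that was killed is the
Tez application master (org.apache.tez.dag.app.DAGAppMaster, launched with
-Xmx819m in a 1 GB container), not a map task, so the mapreduce.map.*
settings listed earlier may not apply to it. Below is a minimal sketch of
the session-level settings that appear relevant, assuming the query runs on
the Tez engine. The values are illustrative guesses, not tuned for this
cluster, and the AM setting may only take effect when a new Tez session is
started (or when set in tez-site.xml).

```sql
-- Assumption: Hive on Tez. All values below are illustrative, not tested here.

-- Memory requested from YARN for the Tez application master container.
-- The failing AM had a 1 GB limit; Tez derives the JVM heap from this size.
SET tez.am.resource.memory.mb=4096;

-- Size of the Tez task containers and their JVM heap, mirroring the
-- 3072 MB / -Xmx2458m values already used for the MapReduce settings.
SET hive.tez.container.size=3072;
SET hive.tez.java.opts=-Xmx2458m;

-- ORC split generation runs in the AM. The BI strategy skips reading file
-- footers during split generation, which may lower AM memory pressure on a
-- table with several thousand partitions.
SET hive.exec.orc.split.strategy=BI;

-- Allow the location_id predicate to be pushed down to the ORC reader.
SET hive.optimize.index.filter=true;

select * from test where location_id = 1234;
```

Whether any of these helps depends on where the AM memory is actually
going. Since location_id is not a partition column, every partition still
has to be considered for the query.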