Hello,
We are running Hive 0.12 and using the hive.auto.convert.join feature with:
hive.auto.convert.join.noconditionaltask.size = 50000000
hive.mapjoin.followby.gby.localtask.max.memory.usage = 0.7
The query is a map join followed by a group by, like so:

select id, x, max(y)
from (
  select t1.id, t1.x, t2.y
  from tbl1 t1 join tbl2 t2 on (t1.id = t2.id)
) z
group by id, x;
While executing a join against a table that has ~3M rows, we fail with:
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException:
2014-06-10 04:42:21 Processing rows: 2500000 Hashtable size: 2499999 Memory usage: 704765184 percentage: 0.701
    at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)
This is understood, as we exceed the 70% limit.
But the table only takes 35 MB in HDFS, and somehow loading it into the hash table inflates it drastically, until the task fails after reaching ~700 MB.
So this is the first question: why does it take so much space in memory?
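For context, here is a back-of-envelope from the log numbers above (my own arithmetic; the per-entry interpretation is an assumption, since a JVM hash table stores each row as several small objects, each with its own header and references):

```python
# Derive the observed per-entry heap cost from the failure log above.
heap_used = 704_765_184      # "Memory usage:704765184" at the crash
rows_loaded = 2_500_000      # "Processing rows: 2500000"
on_disk = 35 * 1024 * 1024   # the ~35 MB the table occupies in HDFS

print(f"~{heap_used / rows_loaded:.0f} bytes of heap per hashtable entry")
print(f"~{heap_used / on_disk:.0f}x blowup over the on-disk size")
```

That works out to ~282 bytes per entry and a ~19x blowup, which seems plausible if each row becomes a handful of boxed key/value objects in a Java hash table rather than packed bytes.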
Later, I tried increasing hive.mapjoin.followby.gby.localtask.max.memory.usage to allow the map join to finish. By doing so I ran into another problem.
The table is in fact fully loaded into memory, as seen here:
INFO exec.HashTableSinkOperator: 2014-05-28 12:16:42 Processing rows: 2900000 Hashtable size: 2899999 Memory usage: 818590784 percentage: 0.815
INFO exec.TableScanOperator: 0 finished. closing...
INFO exec.TableScanOperator: 0 forwarded 2946773 rows
INFO exec.HashTableSinkOperator: 1 finished. closing...
INFO exec.HashTableSinkOperator: Temp URI for side table:
file:/tmp/hadoop/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-local-10004/HashTable-Stage-2
INFO exec.HashTableSinkOperator: 2014-05-28 12:16:42 Dump the side-table into file:
file:/tmp/hadoop/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-local-10004/HashTable-Stage-2/MapJoin-mapfile691--.hashtable
INFO exec.HashTableSinkOperator: 2014-05-28 12:16:45 Upload 1 File to:
file:/tmp/hadoop/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-local-10004/HashTable-Stage-2/MapJoin-mapfile691--.hashtable
INFO exec.HashTableSinkOperator: 1 forwarded 0 rows
INFO exec.TableScanOperator: 0 Close done
End of local task; Time Taken: 10.745 sec.
But then the join stage hangs for a long time and fails with an OOM.
From the logs, I can see that it hangs on this line:
2014-05-28 12:16:58,229 INFO org.apache.hadoop.hive.ql.exec.MapJoinOperator:
******* Load from HashTable File: input :
maprfs:/user/hadoop/tmp/hive/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-mr-10003/000000_0
2014-05-28 12:16:58,230 INFO org.apache.hadoop.hive.ql.exec.MapJoinOperator:
Load back 1 hashtable file from tmp file
uri:/tmp/mapr-hadoop/mapred/local/taskTracker/hadoop/distcache/-479500712399318067_367753608_1109273133/maprfs/user/hadoop/tmp/hive/hive_2014-05-28_12-16-21_239_3089817264132856114-94/-mr-10005/HashTable-Stage-2/Stage-2.tar.gz/MapJoin-mapfile691--.hashtable
2014-05-28 12:18:31,302 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 finished. closing...
It hangs on "Load back 1 hashtable file from tmp" for 1:33 minutes, and then we get this exception:
2014-05-28 12:18:31,304 WARN org.apache.hadoop.ipc.Client: Unexpected error
reading responses on connection Thread[IPC Client (47) connection to
/127.0.0.1:48520 from job_201405191528_9910,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringBuffer.toString(StringBuffer.java:585)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:209)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:179)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:829)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:725)
2014-05-28 12:18:31,306 INFO org.apache.hadoop.mapred.Task: Communication
exception: java.io.IOException: Call to /127.0.0.1:48520 failed on local
exception: java.io.IOException: Error reading responses
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1136)
at org.apache.hadoop.ipc.Client.call(Client.java:1098)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:275)
at $Proxy0.ping(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:680)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Error reading responses
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:732)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringBuffer.toString(StringBuffer.java:585)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:209)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:179)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:829)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:725)
127.0.0.1:48520 is the TaskTracker.
The file the local stage uploaded, "MapJoin-mapfile691--.hashtable", is only 87 MB.
The archive it sits in, "Stage-2.tar.gz", is only 23 MB.
What's going on here? Why can't the join complete successfully?
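For context, the same arithmetic applied to this run's log numbers (my own calculation, nothing Hive-specific):

```python
# Expansion factors implied by the log lines above.
entries = 2_899_999              # "Hashtable size: 2899999"
in_memory = 818_590_784          # heap bytes at "percentage: 0.815"
dump_file = 87 * 1024 * 1024     # size of MapJoin-mapfile691--.hashtable

print(f"in-memory:  ~{in_memory / entries:.0f} bytes/entry")
print(f"serialized: ~{dump_file / entries:.0f} bytes/entry")
print(f"reload must re-expand the dump ~{in_memory / dump_file:.1f}x in the task heap")
```

So although the dump is only 87 MB on disk, rebuilding the hashtable presumably needs the same ~800 MB of heap inside the map task, which might be the problem if the task JVM's heap is smaller than the local task's.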
Lastly, I tried removing the group by from the query. After doing so, the query finishes with no problem (with hive.mapjoin.followby.gby.localtask.max.memory.usage set above 0.82). No hangs or anything.
How can the group by affect the "Load back 1 hashtable file from tmp" step in any way?
Thanks in advance for any answers/comments.
-----------------------------------------------
Dima Machlin, Big Data Architect
15 Abba Eban Blvd. PO Box 4125, Herzliya 46140 IL
P: +972-9-9518147 |M: +972-54-5671337|F: +972-9-9584736
Pursway.com<http://www.pursway.com/>