Here is an error message:

java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2245)
at java.util.Arrays.copyOf(Arrays.java:2219)
at java.util.ArrayList.grow(ArrayList.java:242)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
at java.util.ArrayList.add(ArrayList.java:440)
at
org.apache.hadoop.hive.ql.exec.spark.SortByShuffler$ShuffleFunction$1.next(SortByShuffler.java:95)
at
org.apache.hadoop.hive.ql.exec.spark.SortByShuffler$ShuffleFunction$1.next(SortByShuffler.java:70)
at
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
at
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:216)
at
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


And the note from the SortByShuffler.java
              // TODO: implement this by accumulating rows with the same
key into a list.
              // Note that this list needs to improved to prevent excessive
memory usage, but this
              // can be done in later phase.


The join sql run success when i use hive on mapreduce.
So how do mapreduce deal with it?
And Is there plan to improved to prevent excessive memory usage?

Best wishes!
Thanks!

Reply via email to