Hi all,

I am running a Pig script that works fine on small data, but when I scale
up the data I get the following error in my map stage. Please see the map
logs below.

My Pig script does a GROUP BY first, followed by a JOIN on the grouped
data.
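
For context, the script follows roughly this shape (the relation and field
names here are placeholders, not my actual script):

```pig
-- Hypothetical sketch of the pattern: a GROUP BY whose output
-- (with its nested bags) is then fed into a JOIN.
raw     = LOAD 'input' AS (key:chararray, value:long);
other   = LOAD 'dim'   AS (key:chararray, label:chararray);
grouped = GROUP raw BY key;      -- yields (group, {bag of raw tuples})
joined  = JOIN grouped BY group, other BY key;
STORE joined INTO 'output';
```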


Any clues on where I should look, or how I should deal with this
situation? I don't want to just keep increasing the heap space.
My map JVM heap is already 3 GB, with io.sort.mb = 768 MB.
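
For reference, the corresponding task settings look like this (assuming
Hadoop 1.x property names; io.sort.mb appears in the log below, and
mapred.child.java.opts is the usual property for the 3 GB child heap):

```xml
<!-- mapred-site.xml: current map task settings described above -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx3072m</value>  <!-- 3 GB map JVM heap -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>768</value>        <!-- map-side sort buffer, in MB -->
</property>
```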

2014-02-06 19:15:12,243 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-02-06 19:15:15,025 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2014-02-06 19:15:15,123 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2bd9e282
2014-02-06 19:15:15,546 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 768
2014-02-06 19:15:19,846 INFO org.apache.hadoop.mapred.MapTask: data buffer = 612032832/644245088
2014-02-06 19:15:19,846 INFO org.apache.hadoop.mapred.MapTask: record buffer = 9563013/10066330
2014-02-06 19:15:20,037 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2014-02-06 19:15:21,083 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Created input record counter: Input records from _1_tmp1327641329
2014-02-06 19:15:52,894 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2014-02-06 19:15:52,895 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 611949600; bufvoid = 644245088
2014-02-06 19:15:52,895 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 576; length = 10066330
2014-02-06 19:16:06,182 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2014-02-06 19:16:16,169 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 328728576(321024K) used = 1175055104(1147514K) committed = 1770848256(1729344K) max = 2097152000(2048000K)
2014-02-06 19:16:20,446 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 308540402 bytes from 1 objects. init = 328728576(321024K) used = 1175055104(1147514K) committed = 1770848256(1729344K) max = 2097152000(2048000K)
2014-02-06 19:17:22,246 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Usage threshold init = 328728576(321024K) used = 1768466512(1727018K) committed = 1770848256(1729344K) max = 2097152000(2048000K)
2014-02-06 19:17:35,597 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 1073462600 bytes from 1 objects. init = 328728576(321024K) used = 1768466512(1727018K) committed = 1770848256(1729344K) max = 2097152000(2048000K)
2014-02-06 19:18:01,276 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2014-02-06 19:18:01,288 INFO org.apache.hadoop.mapred.MapTask: bufstart = 611949600; bufend = 52332788; bufvoid = 644245088
2014-02-06 19:18:01,288 INFO org.apache.hadoop.mapred.MapTask: kvstart = 576; kvend = 777; length = 10066330
2014-02-06 19:18:03,377 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2014-02-06 19:18:05,494 INFO org.apache.hadoop.mapred.MapTask: Record too large for in-memory buffer: 644246693 bytes
2014-02-06 19:18:36,008 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 306271368 bytes from 1 objects. init = 328728576(321024K) used = 1449267128(1415299K) committed = 2097152000(2048000K) max = 2097152000(2048000K)
2014-02-06 19:18:44,448 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-02-06 19:18:44,780 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:454)
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
    at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:523)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
    at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:57)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:179)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:1501)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1091)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:128)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
