Hi,

I got the following error while running the full Wikipedia links example (via
RecommenderJob), on the third day of execution:

10/12/19 02:24:08 INFO mapred.JobClient:  map 100% reduce 29%
10/12/19 02:32:29 INFO mapred.JobClient: Task Id : attempt_201012151738_0012_r_000002_0, Status : FAILED
java.io.IOException: Task: attempt_201012151738_0012_r_000002_0 - The reduce copier failed
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Intermediate merge failed
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2576)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
Caused by: java.lang.RuntimeException: java.io.EOFException
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:123)
at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:50)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:447)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:107)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:93)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2551)
... 1 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
at org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
... 9 more

I was running this on a local Hadoop 0.20.2 installation, with a 1 GB heap
allocated for each of the 8 MapReduce mappers and reducers, on an 8-core
server with 20 GB of RAM.

I reckon the workers may have run out of memory, since the failure occurred
during the in-memory merge: the root cause is an EOFException thrown while
WritableComparator.compare was deserializing a SimilarityMatrixEntryKey,
which suggests the comparator hit the end of a record's bytes partway
through a varint.
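For anyone digging into this, here's a minimal standalone sketch of how that kind of EOFException surfaces. It uses the same 7-bit continuation-byte varint scheme that Mahout's Varint class is based on, but it's a simplified stand-in (no zig-zag decoding, no malformed-input guard), not the actual Mahout code:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class VarintTruncation {

    // Simplified varint reader: 7 payload bits per byte; the high bit set
    // means "another byte follows". Mahout's Varint.readUnsignedVarInt
    // follows the same scheme.
    static int readUnsignedVarInt(DataInputStream in) throws IOException {
        int value = 0;
        int shift = 0;
        while (true) {
            // DataInputStream.readByte throws EOFException if the stream
            // ends here -- exactly the failure in the stack trace above.
            byte b = in.readByte();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) throws IOException {
        // 300 encodes as two bytes: 0xAC (low 7 bits + continuation), 0x02.
        byte[] full = {(byte) 0xAC, 0x02};
        System.out.println(
            readUnsignedVarInt(new DataInputStream(new ByteArrayInputStream(full))));

        // Truncated record: the continuation bit promises a second byte
        // that never arrives, so readByte throws EOFException.
        byte[] truncated = {(byte) 0xAC};
        try {
            readUnsignedVarInt(new DataInputStream(new ByteArrayInputStream(truncated)));
        } catch (EOFException e) {
            System.out.println("EOFException on truncated varint");
        }
    }
}
```

So a map output segment getting cut short (e.g. a spill corrupted under memory pressure) would produce exactly this trace when the merge later tries to compare keys.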

If it's of any use to anybody, I can upload the log files to S3 for
diagnostics.

Cheers
-- 
Niall Riddell
