Hi,

I got the following error when running the full Wikipedia links example (using RecommenderJob), three days into execution:
10/12/19 02:24:08 INFO mapred.JobClient:  map 100% reduce 29%
10/12/19 02:32:29 INFO mapred.JobClient: Task Id : attempt_201012151738_0012_r_000002_0, Status : FAILED
java.io.IOException: Task: attempt_201012151738_0012_r_000002_0 - The reduce copier failed
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Intermediate merge failed
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2576)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
Caused by: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:123)
        at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:50)
        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:447)
        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
        at org.apache.hadoop.mapred.Merger.merge(Merger.java:107)
        at org.apache.hadoop.mapred.Merger.merge(Merger.java:93)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2551)
        ... 1 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
        at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
        at org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
        ... 9 more

I was running this on a local Hadoop 0.20.2 installation, on an 8-core server with 20GB of RAM, with 1GB of heap allocated to each of the 8 map and reduce slots. I reckon the workers may have run out of memory, since the failure happened during the reduce-side in-memory merge. The first thing I plan to try is giving the reducers more heap and a smaller shuffle buffer (see the snippet below). If it's of any use to anybody, I can upload the log files to S3 for diagnostics.
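In case it helps anyone suggest something better, this is roughly what I had in mind for mapred-site.xml. The property names are the standard 0.20.2 heap/shuffle knobs, but the values are just guesses on my part:

  <property>
    <name>mapred.child.java.opts</name>
    <!-- raise the per-task heap from 1GB; guessed value -->
    <value>-Xmx2048m</value>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <!-- fraction of reducer heap used to buffer map outputs during
         the shuffle; default is 0.70, lowering it should push more
         of the merge to disk instead of memory -->
    <value>0.50</value>
  </property>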
Cheers
--
Niall Riddell