Are there any rules against writing results to Reducer.Context while in the cleanup() method?
I've got a reducer that downloads a few tens of millions of images from a set of URLs fed to it. To be efficient I run many connections in parallel, but limit the number of connections per domain and the frequency of connections. To do that efficiently, I read in many URLs from the reduce() method and put them on a processing queue, so at some point all the input has been read and Hadoop calls the cleanup() method, where I block until all threads have finished processing. We may continue processing and writing results (in a synchronized manner) for 20 or 30 minutes after Hadoop reports 100% of input records delivered. At the end my code appears to exit normally, and I get this exception immediately after:

```
2013-02-11 05:15:23,606 INFO com.frugg.mapreduce.UrlProcessor (URL Processor Main Loop): Processing complete, shut down normally
2013-02-11 05:15:23,653 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
2013-02-11 05:15:23,687 ERROR org.apache.hadoop.security.UserGroupInformation (main): PriviledgedActionException as:hadoop cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /frugg/image-cache-stage1/_temporary/_attempt_201302110210_0019_r_000002_0/part-r-00002 File does not exist. Holder DFSClient_attempt_201302110210_0019_r_000002_0 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1642)
```

I have a suspicion that there are some subtle rules of Hadoop's I'm violating here.
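For clarity, here is a minimal sketch of the pattern I'm describing. The Hadoop types are omitted so it runs standalone; only the threading shape is the point. The class and method names (`reduce`, `cleanup`) mirror the reducer, the fetch itself is stubbed out, and in the real job the results returned from `cleanup()` would go to `context.write(...)`:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CleanupPattern {
    // Bounded worker pool standing in for the per-domain-throttled downloaders.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    // Thread-safe result buffer; the real job writes to Reducer.Context instead.
    private final List<String> results = new CopyOnWriteArrayList<>();

    // Stand-in for reduce(): enqueue each URL rather than fetching it inline.
    void reduce(String url) {
        pool.submit(() -> results.add("fetched:" + url)); // hypothetical download
    }

    // Stand-in for cleanup(): stop accepting work, then block until every
    // queued download has finished before letting the task exit.
    List<String> cleanup() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return results;
    }

    public static void main(String[] args) throws Exception {
        CleanupPattern r = new CleanupPattern();
        for (String u : new String[]{"a", "b", "c"}) r.reduce(u);
        System.out.println(r.cleanup().size()); // prints 3
    }
}
```

So the question is whether deferring all the `context.write(...)` calls until after `reduce()` has seen its last record, as this shape implies, is itself the problem.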