RE: Nested map reduce job

2012-05-05 Thread Mingxi Wu
You may not need a nested map-reduce job. All you need to do is use keys to partition the permutation, and duplicate the data from the map: output.collect(1, value); output.collect(2, value); . . . output.collect(n, value); Then set your reducer number to n. When you emit data in the mapper, th
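The fan-out the reply describes can be sketched without a Hadoop cluster. Below is a minimal, self-contained simulation of the idea: each record is emitted n times under keys 1..n, so every one of the n reducers receives a full copy. The class and method names (`FanOutSketch`, `fanOut`) are hypothetical; in real Hadoop 0.20 code you would call `output.collect(new IntWritable(i), value)` inside a mapper and set the job's reduce-task count to n.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the mapper-side fan-out described in the thread.
// In a real old-API (org.apache.hadoop.mapred) job this loop would live in
// Mapper.map() and emit via OutputCollector; here plain lists simulate it.
public class FanOutSketch {

    // Emit (key, value) pairs so that each of the n reducers
    // (keys "1".."n") receives its own copy of the record.
    public static List<String[]> fanOut(String value, int n) {
        List<String[]> emitted = new ArrayList<String[]>();
        for (int i = 1; i <= n; i++) {
            emitted.add(new String[] { Integer.toString(i), value });
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (String[] kv : fanOut("record-A", 3)) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

With a partitioner that routes key i to reducer i, each reducer then holds the whole dataset and can work on its own slice of the permutation space.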

upload hang at DFSClient$DFSOutputStream.close(3488)

2012-04-15 Thread Mingxi Wu
Hi, I use Hadoop Cloudera 0.20.2-cdh3u0. I have a program which uploads local files to HDFS every hour. Basically, I open a gzip input stream with in = new GZIPInputStream(fin); and write to an HDFS file. After less than two days it hangs at FSDataOutputStream.close(86). Here is the
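One common contributor to long-running uploaders eventually wedging is streams that are not closed on every code path, so each iteration leaks a handle. The sketch below shows the decompress-and-write loop with both streams guaranteed to close even when an IOException is thrown mid-copy. It is only an illustration of the stream-hygiene pattern, not a fix for the DFSClient internals in the stack trace: the HDFS `FSDataOutputStream` from the thread is replaced by a plain `OutputStream`, and the class/method names are made up. (try-with-resources needs Java 7+; on the Java 6 of the CDH3 era the same effect requires a try/finally.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCopy {

    // Decompress a gzip stream into an output stream, closing both streams
    // even if an IOException occurs mid-copy. Leaked streams in an hourly
    // uploader are exactly the kind of slow resource drain that surfaces
    // only after days of running.
    public static void decompressTo(InputStream gzipped, OutputStream out)
            throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(gzipped);
             OutputStream o = out) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                o.write(buf, 0, n);
            }
        } // both streams closed here, success or failure
    }

    // Demo helper: gzip a byte array in memory.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("hello hdfs".getBytes("UTF-8"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        decompressTo(new ByteArrayInputStream(compressed), out);
        System.out.println(out.toString("UTF-8")); // prints "hello hdfs"
    }
}
```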

rename after copying on HDFS does not succeed

2011-12-03 Thread Mingxi Wu
Hi, I create a copy of a file in HDFS using org.apache.hadoop.fs.FileUtil.copy(dfs, file1, dfs, file2, false, false, dfs.getConf()); then boolean succ = dfs.rename(new Path(file2), new Path(file3)); The rename always returns false; any suggestions? Is that because FileUtil.copy() is not closing
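HDFS's rename, like the question shows, signals failure only through its boolean return, and a pre-existing destination is one typical reason for false. As a hedged illustration only, the sketch below mimics the copy-then-rename sequence on the local filesystem with java.nio.file, checking the preconditions explicitly before moving. The names (`CopyThenRename`, `copyThenRename`) are invented for this example; it is a stand-in for the HDFS API, not a diagnosis of the poster's actual failure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyThenRename {

    // Copy src to tmp, then rename tmp to dst, surfacing failure instead of
    // silently ignoring it -- the same discipline dfs.rename()'s boolean
    // return demands. Local java.nio.file stands in for the HDFS FileSystem.
    public static boolean copyThenRename(Path src, Path tmp, Path dst)
            throws IOException {
        Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        // A rename commonly fails when dst already exists; clear it first
        // (or check and report) rather than discarding the outcome.
        Files.deleteIfExists(dst);
        try {
            Files.move(tmp, dst);
            return true;
        } catch (IOException e) {
            return false; // mirror the boolean-result style of dfs.rename()
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("demo");
        Path src = dir.resolve("file1");
        Files.write(src, "data".getBytes("UTF-8"));
        boolean ok = copyThenRename(src, dir.resolve("file2"), dir.resolve("file3"));
        System.out.println(ok);
    }
}
```

When debugging the real HDFS case, checking `dfs.exists()` on both the source and destination paths right before the rename narrows the cause quickly.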