You can ignore this for now. I was able to get the file merge working under Hadoop Streaming by using the following two options:
-mapper "cut -f2-" -Dmapred.reduce.tasks=0

On Fri, May 24, 2013 at 12:55 AM, Something Something <[email protected]> wrote:

> Hello,
>
> Trying to use Hadoop Streaming to create output that contains no key -
> just value.
>
> Here's what I am trying:
>
> 1) Created IdentifierResolver as follows:
>
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.streaming.io.IdentifierResolver;
>
> public class MyIdentifierResolver extends IdentifierResolver {
>
>     @Override
>     public void resolve(String identifier) {
>         System.out.println("Entered resolve with identifier: " + identifier);
>         // Let the default resolver handle the identifiers it knows about.
>         super.resolve(identifier);
>         if (identifier.equals("NullWritable")) {
>             System.out.println("Setting output key class to NullWritable");
>             setOutputKeyClass(NullWritable.class);
>         }
>     }
> }
>
> 2) Set the properties as follows:
>
> -Dstream.io.identifier.resolver.class=com.my.package.MyIdentifierResolver \
> -Dstream.map.output=NullWritable \
> -Dstream.reduce.output=NullWritable
>
> This should work, right? But it's still writing the 'key' in the output.
> Is there a better way to do this in Hadoop?
>
> Note: Basically, we are trying to merge files (over 2000) into a smaller
> number of files (e.g. 500). The files are too big, so 'getmerge' does not
> work because we run into space issues.
>
> Please help. Thanks.
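
For anyone finding this thread later, here is a rough sketch of what the two options from the reply at the top do. The `strip_key` function name, the sample records, the jar location and the HDFS paths are all placeholders for illustration and are not taken from the thread. The runnable part only shows the effect of `cut -f2-` on tab-separated records; the full job invocation is left as a comment because it depends on the local Hadoop install.

```shell
#!/bin/sh
# Drop the first tab-separated field (the key) and keep the rest of each line.
# This is the same filter the reply passes to -mapper.
strip_key() {
    cut -f2-
}

# Local check on two made-up sample records.
printf 'k1\tfirst value\nk2\tsecond\tvalue\n' | strip_key

# Hypothetical full invocation (jar location and paths are assumptions).
# Generic -D options are placed before the streaming options, and
# mapred.reduce.tasks=0 makes it a map-only job.
#
# hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming-*.jar \
#     -Dmapred.reduce.tasks=0 \
#     -input /path/to/input \
#     -output /path/to/output \
#     -mapper "cut -f2-"
```

Note that `cut` passes a line through unchanged when it contains no tab, so records without a key/value separator are not truncated by this filter.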
