On the dev mailing list, Harsh pointed out that there is another join
related package:
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/join/

This seems to be available in 2.x and trunk. Could you check whether it
provides the functionality you require, so we at least know there is new API
support in later versions?
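
In case it helps, here's a minimal, untested sketch of how that package is
typically used for a map-side inner join on 2.x (the paths and the choice of
KeyValueTextInputFormat are my own assumptions, not something from your job):

    // CompositeInputFormat and TupleWritable come from
    // org.apache.hadoop.mapreduce.lib.join; KeyValueTextInputFormat from
    // org.apache.hadoop.mapreduce.lib.input.
    Configuration conf = new Configuration();
    // "inner" joins records from the two inputs that share a key; both
    // inputs must be sorted and identically partitioned on that key.
    conf.set(CompositeInputFormat.JOIN_EXPR,
        CompositeInputFormat.compose("inner", KeyValueTextInputFormat.class,
            new Path("/data/left"), new Path("/data/right")));
    Job job = Job.getInstance(conf, "map-side join");
    job.setInputFormatClass(CompositeInputFormat.class);
    // The mapper then receives (Text key, TupleWritable value), with one
    // entry in the tuple per joined input.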

Thanks
Hemanth


On Mon, Jan 14, 2013 at 7:45 PM, Hemanth Yamijala <yhema...@thoughtworks.com
> wrote:

> Hi,
>
> No. I didn't find any reference to a working sample. I also didn't find
> any JIRA that asks for a migration of this package to the new API. Not sure
> why. I have asked on the dev list.
>
> Thanks
> hemanth
>
>
> On Mon, Jan 14, 2013 at 6:25 PM, Michael Forage <
> michael.for...@livenation.co.uk> wrote:
>
>> Thanks Hemanth
>>
>> I appreciate your response.
>>
>> Did you find any working example of it in use? It looks to me like I’d
>> still be tied to the old API.
>>
>> Thanks
>>
>> Mike
>>
>> From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
>> Sent: 14 January 2013 05:08
>> To: user@hadoop.apache.org
>> Subject: Re: Compile error using contrib.utils.join package with new
>> mapreduce API
>>
>> Hi,
>>
>> The datajoin package has a class called DataJoinJob (
>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/contrib/utils/join/DataJoinJob.html
>> )
>>
>> I think using this will help you get around the issue you are facing.
>>
>> From the source, this is the command-line usage of the class:
>>
>> usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
>> mapper_class reducer_class map_output_value_class output_value_class
>> [maxNumOfValuesPerGroup [descriptionOfJob]]]
>>
>> Internally, the class uses the old API to set the mapper and reducer
>> passed as arguments above.
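>>
>> For illustration, a hypothetical invocation against your classes might
>> look something like this (the jar name and paths are made up, and I'd
>> double-check in the DataJoinJob source exactly what strings it expects
>> for the input format and output value class arguments):
>>
>>     hadoop jar datajoin-example.jar \
>>         org.apache.hadoop.contrib.utils.join.DataJoinJob \
>>         /user/mike/input /user/mike/output text 1 \
>>         'JoinTest.DataJoin$MapClass' 'JoinTest.DataJoin$Reduce' \
>>         'JoinTest.DataJoin$TaggedWritable' text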
>>
>> Thanks
>> hemanth
>>
>> On Fri, Jan 11, 2013 at 9:00 PM, Michael Forage <
>> michael.for...@livenation.co.uk> wrote:
>>
>> Hi
>>
>> I’m using Hadoop 1.0.4 with the hadoop.mapreduce API and am having
>> problems compiling a simple class to implement a reduce-side data join
>> of 2 files.
>>
>> I’m trying to do this using contrib.utils.join, and in Eclipse it all
>> compiles fine other than:
>>
>>       job.setMapperClass(MapClass.class);
>>       job.setReducerClass(Reduce.class);
>>
>> …which both complain that the referenced class no longer extends either
>> Mapper<> or Reducer<>.
>>
>> It’s my understanding that they should instead extend DataJoinMapperBase
>> and DataJoinReducerBase in order to implement the join.
>>
>> I have searched for a solution everywhere, but unfortunately all the
>> examples I can find are based on the deprecated mapred API.
>>
>> Assuming this package actually works with the new API, can anyone offer
>> any advice?
>>
>> Complete compile errors:
>>
>> The method setMapperClass(Class<? extends Mapper>) in the type Job is
>> not applicable for the arguments (Class<DataJoin.MapClass>)
>>
>> The method setReducerClass(Class<? extends Reducer>) in the type Job is
>> not applicable for the arguments (Class<DataJoin.Reduce>)
>>
>> …and the code:
>>
>> package JoinTest;
>>
>> import java.io.DataInput;
>> import java.io.DataOutput;
>> import java.io.IOException;
>> import java.util.Iterator;
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.conf.Configured;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.io.Writable;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.Reducer;
>> import org.apache.hadoop.mapreduce.Mapper.Context;
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
>> import org.apache.hadoop.util.Tool;
>> import org.apache.hadoop.util.ToolRunner;
>>
>> import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
>> import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
>> import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
>>
>> public class DataJoin extends Configured implements Tool {
>>
>>     public static class MapClass extends DataJoinMapperBase {
>>
>>         protected Text generateInputTag(String inputFile) {
>>             String datasource = inputFile.split("-")[0];
>>             return new Text(datasource);
>>         }
>>
>>         protected Text generateGroupKey(TaggedMapOutput aRecord) {
>>             String line = ((Text) aRecord.getData()).toString();
>>             String[] tokens = line.split(",");
>>             String groupKey = tokens[0];
>>             return new Text(groupKey);
>>         }
>>
>>         protected TaggedMapOutput generateTaggedMapOutput(Object value) {
>>             TaggedWritable retv = new TaggedWritable((Text) value);
>>             retv.setTag(this.inputTag);
>>             return retv;
>>         }
>>     }
>>
>>     public static class Reduce extends DataJoinReducerBase {
>>
>>         protected TaggedMapOutput combine(Object[] tags, Object[] values) {
>>             if (tags.length < 2) return null;
>>             String joinedStr = "";
>>             for (int i = 0; i < values.length; i++) {
>>                 if (i > 0) joinedStr += ",";
>>                 TaggedWritable tw = (TaggedWritable) values[i];
>>                 String line = ((Text) tw.getData()).toString();
>>                 String[] tokens = line.split(",", 2);
>>                 joinedStr += tokens[1];
>>             }
>>             TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
>>             retv.setTag((Text) tags[0]);
>>             return retv;
>>         }
>>     }
>>
>>     public static class TaggedWritable extends TaggedMapOutput {
>>
>>         private Writable data;
>>
>>         public TaggedWritable(Writable data) {
>>             this.tag = new Text("");
>>             this.data = data;
>>         }
>>
>>         public Writable getData() {
>>             return data;
>>         }
>>
>>         public void write(DataOutput out) throws IOException {
>>             this.tag.write(out);
>>             this.data.write(out);
>>         }
>>
>>         public void readFields(DataInput in) throws IOException {
>>             this.tag.readFields(in);
>>             this.data.readFields(in);
>>         }
>>     }
>>
>>     public int run(String[] args) throws Exception {
>>         Configuration conf = getConf();
>>
>>         Job job = new Job(conf, "DataJoin");
>>         job.setJarByClass(DataJoin.class);
>>
>>         Path in = new Path(args[0]);
>>         Path out = new Path(args[1]);
>>         FileInputFormat.setInputPaths(job, in);
>>         FileOutputFormat.setOutputPath(job, out);
>>
>>         job.setJobName("DataJoin");
>>         job.setMapperClass(MapClass.class);
>>         job.setReducerClass(Reduce.class);
>>
>>         job.setInputFormatClass(TextInputFormat.class);
>>
>>         //V3 set to Text
>>         job.setOutputFormatClass(TextOutputFormat.class);
>>
>>         //Applies to mapper output
>>         job.setOutputKeyClass(Text.class);
>>         job.setOutputValueClass(Text.class);
>>
>>         //job.set("mapred.textoutputformat.separator", ",");
>>
>>         System.exit(job.waitForCompletion(true) ? 0 : 1);
>>
>>         return 0;
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>         int res = ToolRunner.run(new Configuration(),
>>                                  new DataJoin(),
>>                                  args);
>>
>>         System.exit(res);
>>     }
>> }
>>
>> Thanks
>>
>> Mike
>>
>
>
