Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

sam liu Sun, 20 Oct 2013 06:28:02 -0700

Furthermore, I ran another test: I renamed TeraSort#TotalOrderPartitioner to
TeraSort#MyOwnTotalOrderPartitioner to avoid conflicts with other classes of
the same name on the Hadoop classpath. In TeraSort.java, I also changed
'job.setPartitionerClass(TotalOrderPartitioner.class);' to
'job.setPartitionerClass(MyOwnTotalOrderPartitioner.class);'. However,
MyOwnTotalOrderPartitioner still does not seem to be invoked while the
terasort job runs.
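
One thing that may be worth checking here is the number of reduce tasks. As far as I recall, the map-side output collector in Hadoop 2 only instantiates the configured partitioner class when the job has more than one reduce task; with a single reducer it uses a trivial built-in partitioner instead, so none of the custom class's methods would run. The following is a minimal plain-Java model of that selection logic, not the Hadoop source. SimplePartitioner, LoggingPartitioner and choose() are hypothetical names made up for illustration.

```java
import java.util.function.Supplier;

public class PartitionerSelectionSketch {

    /** Hypothetical stand-in for a partitioner; this is NOT the Hadoop interface. */
    interface SimplePartitioner {
        int getPartition(String key, int numPartitions);
    }

    /** Stand-in for a user class such as MyOwnTotalOrderPartitioner. */
    static class LoggingPartitioner implements SimplePartitioner {
        // Counts how many times the class was instantiated.
        static int instancesCreated = 0;

        LoggingPartitioner() {
            instancesCreated++;
        }

        @Override
        public int getPartition(String key, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    /**
     * Assumed selection rule: build the configured partitioner only when
     * there is more than one reduce task; otherwise send everything to
     * partition 0 without touching the configured class.
     */
    static SimplePartitioner choose(int numReduceTasks,
                                    Supplier<SimplePartitioner> configured) {
        if (numReduceTasks > 1) {
            return configured.get();
        }
        return (key, numPartitions) -> 0;
    }

    public static void main(String[] args) {
        choose(1, LoggingPartitioner::new);
        System.out.println("instances after a 1-reducer job: "
                + LoggingPartitioner.instancesCreated);

        choose(4, LoggingPartitioner::new);
        System.out.println("instances after a 4-reducer job: "
                + LoggingPartitioner.instancesCreated);
    }
}
```

If this is what is happening, rerunning the job with more than one reduce task should make the print statements in the partitioner appear.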


BTW, TeraSort#TotalOrderPartitioner#readPartitions() contains the
statement 'DataInputStream reader = fs.open(p);', and I know 'p' is the
path of '_partition.lst'. But two details are unclear to me:
- Where is 'p' located? Is it on HDFS or on the Linux file system? What is
its absolute path?
- Which part or phase of Hadoop MapReduce copies the _partition.lst file to
the path 'p'? I find this part very confusing.
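
For context on what the file is used for, here is a minimal plain-Java sketch of reading split points from a stream and using them to pick a partition. It makes several assumptions: keys are plain strings written with writeUTF (TeraSort serializes Hadoop Text keys), the file holds one split point fewer than the number of reduce tasks, and the lookup is a binary search swapped in for whatever structure the real class builds. SplitPointSketch and its methods are hypothetical names, and an in-memory byte array stands in for '_partition.lst'.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class SplitPointSketch {

    /** Serializes sorted split points; stands in for writing the partition file. */
    static byte[] writeSplitPoints(String[] sortedKeys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String key : sortedKeys) {
                out.writeUTF(key);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads numReduces - 1 split points, in the order they were written. */
    static String[] readSplitPoints(byte[] data, int numReduces) {
        String[] result = new String[numReduces - 1];
        try (DataInputStream reader =
                     new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < result.length; i++) {
                result[i] = reader.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /**
     * Returns the number of split points that are <= key, so partition i
     * receives keys in the range [splitPoints[i-1], splitPoints[i]).
     */
    static int getPartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        byte[] file = writeSplitPoints(new String[] {"g", "p"});
        String[] splits = readSplitPoints(file, 3);
        for (String key : new String[] {"apple", "g", "kiwi", "zebra"}) {
            System.out.println(key + " -> partition " + getPartition(key, splits));
        }
    }
}
```

With the split points "g" and "p" and three reduce tasks, "apple" lands in partition 0, "g" and "kiwi" in partition 1, and "zebra" in partition 2, which is what keeps the concatenated reducer outputs globally sorted.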

Thanks very much!



2013/10/20 sam liu <samliuhad...@gmail.com>

> After I took the following actions, the job still passed, and it seems that
> none of the TotalOrderPartitioner classes were invoked at all:
> - Modified libexec/hadoop-config.sh to put
> hadoop-mapreduce-examples-2.0.4-alpha.jar at the front of the Hadoop classpath,
> which should ensure that TeraSort#TotalOrderPartitioner is picked up first
> - Fiddled with org.apache.hadoop.mapreduce.TotalOrderPartitioner, and then
> replaced the existing jar with the newly generated
> share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.4-alpha.jar
>
>
> 2013/10/19 Arun C Murthy <a...@hortonworks.com>
>
>> Apologies for the late response.
>>
>> In hadoop-2 TeraSort uses the new org.apache.hadoop.mapreduce apis (not
>> org.apache.hadoop.mapred).
>>
>> Did you fiddle with the right TotalOrderPartitioner
>> i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?
>>
>> Arun
>>
>> On Oct 17, 2013, at 8:12 PM, sam liu <samliuhad...@gmail.com> wrote:
>>
>> This is really weird and confusing to me. Can anyone help with this question?
>>
>> Thanks!
>>
>>
>> 2013/10/16 sam liu <samliuhad...@gmail.com>
>>
>>> Hi Experts,
>>>
>>> In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as
>>> its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
>>> However, it seems that Yarn does not execute the methods of
>>> TeraSort#TotalOrderPartitioner at all. I ran some tests to verify this, as
>>> described below:
>>>
>>> Test 1: Add code to the methods readPartitions() and setConf() in
>>> TeraSort#TotalOrderPartitioner to print some words and write some words to a
>>> file.
>>> Expected Result: Some words are printed and written to a file
>>> Actual Result: Nothing was printed or written to a file
>>>
>>> Test 2: Remove all existing methods from TeraSort#TotalOrderPartitioner,
>>> leaving only the necessary methods with empty bodies
>>> Expected Result: The TeraSort job fails with an exception, because the
>>> specified Partitioner is not implemented at all
>>> Actual Result: The TeraSort job completed successfully without any exception
>>>
>>> These tests confused me a lot, because it seems that Yarn never uses the
>>> specified partitioner, TeraSort#TotalOrderPartitioner, during job execution.
>>>
>>> Can anyone help explain why?
>>>
>>> Thanks very much!
>>>
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
>
