Re: Partitioned Datasets Map/Reduce

2010-07-05 Thread abc xyz

> Any help would be highly appreciated.
>
> From: Aaron Kimball
> To: common-user@hadoop.apache.org
> Sent: Mon, July 5, 2010 8:51:44 AM
> Subject: Re: Partitioned Datasets Map/Reduce

Re: Partitioned Datasets Map/Reduce

2010-07-05 Thread Aaron Kimball

One possibility: write out all the partition numbers (one per line) to a single file, then use NLineInputFormat to make each line its own map task. Your mapper will then receive "0" or "1" or "2" etc. as its input value (with NLineInputFormat the key is the line's byte offset in the file, and the value is the line itself). Then explicitly open /dataset1/part-(n) and /dataset2/part-(n) in your
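
For illustration only, here is a minimal sketch of the bookkeeping that suggestion implies, written in plain Java with no Hadoop dependency so it can be run on its own. The class name PartitionIndex and its methods are hypothetical, and the literal "part-" + n naming simply mirrors the /datasetN/part-(n) placeholders above; real output file names depend on the job that wrote them. In an actual job, the index file would be the input read through NLineInputFormat, and the path derivation would happen inside the mapper.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PartitionIndex {

    // One line per partition: "0", "1", ..., up to numPartitions - 1.
    static List<String> indexLines(int numPartitions) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            lines.add(Integer.toString(i));
        }
        return lines;
    }

    // Writes the index file (locally here; a real job would put it on HDFS).
    static void writeIndex(Path file, int numPartitions) throws IOException {
        Files.write(file, indexLines(numPartitions));
    }

    // Given one line of the index file, returns the two files a single
    // map task would open. The naming scheme is an assumption.
    static String[] pathsFor(String line) {
        int n = Integer.parseInt(line.trim());
        return new String[] { "/dataset1/part-" + n, "/dataset2/part-" + n };
    }

    public static void main(String[] args) throws IOException {
        Path index = Files.createTempFile("partitions", ".txt");
        writeIndex(index, 3);
        // Each line stands in for the input of one map task.
        for (String line : Files.readAllLines(index)) {
            String[] paths = pathsFor(line);
            System.out.println(line + " -> " + paths[0] + " , " + paths[1]);
        }
        Files.delete(index);
    }
}
```

This works only if both datasets were written with the same partitioner and the same number of partitions, so that partition n of one lines up with partition n of the other.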

Partitioned Datasets Map/Reduce

2010-07-03 Thread abc xyz

Hello everyone, I have written my custom partitioner for partitioning datasets. I want to partition two datasets using the same partitioner and then in the next mapreduce job, I want each mapper to handle the same partition from the two sources and perform some function such as joining et