Re: CompositeInputFormat

2014-08-09 Thread Pedro Magalhaes
2014 at 1:36 PM, Pedro Magalhaes pedror...@gmail.com wrote: I saw that one of the requirements to use CompositeInputFormat is: "A map-side join can be used to join the outputs of several jobs that had the same number of reducers, the same keys, and output files that are not splittable."

Re: CompositeInputFormat

2013-07-11 Thread Jay Vyas
Map-side joins will use CompositeInputFormat. They will only really be worth doing if one data set is small and the other is large. This is a good example: http://www.congiu.com/joins-in-hadoop-using-compositeinputformat/ The trick is to google for CompositeInputFormat.compose
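For context on the compose trick mentioned above: CompositeInputFormat.compose builds the join expression string that the job configuration consumes. Below is a minimal plain-Java sketch of that expression syntax, runnable without Hadoop on the classpath. The format shown (op(tbl(class,"path"),...)) is assumed from the mapred-era API; the class name and paths are illustrative, and this is not the Hadoop implementation itself.

```java
public class ComposeSketch {
    // Mimics the string that CompositeInputFormat.compose(op, inf, paths)
    // is believed to produce: op(tbl(<input format class>,"<path>"),...).
    // Illustration of the expression syntax only.
    static String compose(String op, String inputFormatClass, String... paths) {
        StringBuilder sb = new StringBuilder(op).append('(');
        for (int i = 0; i < paths.length; i++) {
            if (i > 0) sb.append(',');
            sb.append("tbl(").append(inputFormatClass)
              .append(",\"").append(paths[i]).append("\")");
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // e.g. an inner join over two pre-partitioned input directories
        System.out.println(compose("inner",
                "org.apache.hadoop.mapred.KeyValueTextInputFormat",
                "/data/left", "/data/right"));
    }
}
```

In the real API the resulting string is set as the job's join expression, and each input directory must already be partitioned identically (same number of sorted, non-splittable part files), per the requirement quoted earlier in this thread.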

RE: CompositeInputFormat

2013-07-11 Thread Botelho, Andrew
Sorry, I should've specified that I need an example of CompositeInputFormat that uses the new API. The example linked below uses old API objects like JobConf. Any known examples of CompositeInputFormat using the new API? Thanks in advance, Andrew

RE: CompositeInputFormat

2013-07-11 Thread Devaraj k
Sent: 12 July 2013 03:33 To: user@hadoop.apache.org Subject: RE: CompositeInputFormat. Sorry I should've specified that I need an example of CompositeInputFormat that uses the new API. The example linked below uses old API objects like JobConf. Any known examples of CompositeInputFormat using

Exception comes out with the override option in CompositeInputFormat

2013-05-19 Thread YouPeng Yang
Hi All, I'm trying the CompositeInputFormat to perform the map-side join. The outer and inner joins go well. However, when I try the override option, it comes out with an exception [1]. My JobDriver class is as follows: public static void main(String[] args) throws IOException, InterruptedException
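For readers unfamiliar with the override option: as I understand the mapred join package, the override composite keeps, for each key, the value from the rightmost source that contains that key (later inputs override earlier ones), rather than emitting a tuple. A plain-Java sketch of that selection rule, independent of Hadoop and with illustrative names:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OverrideSketch {
    // Given sources ordered left to right, keep for each key the value from
    // the rightmost source containing that key: the "override" join rule.
    static Map<String, String> override(List<Map<String, String>> sources) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map<String, String> src : sources) {
            out.putAll(src); // later (rightmost) sources win on key collisions
        }
        return out;
    }
}
```

So with sources A = {k1:a1, k2:a2} and B = {k2:b2}, an override join yields {k1:a1, k2:b2}.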

TupleWritable value in mapper Not getting cleaned up ( using CompositeInputFormat )

2013-03-22 Thread devansh kumar
Hi, I am trying to do an outer join on two input files. Can anyone help me find the problem here? While joining, the TupleWritable value in the mapper is not getting cleaned up and so is using the previous values of a different key. The code I used is: ('plist' is containing

TupleWritable value in mapper Not getting cleaned up ( using CompositeInputFormat )

2013-03-20 Thread Rusia, Devansh
Hi, I am trying to do an outer join on two input files. But while joining, the TupleWritable value in the mapper is not getting cleaned up and so is using the previous values of a different key. The code I used is: ('plist' contains the set of paths to be taken as input)
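On the symptom described above: Hadoop record readers commonly reuse the same Writable instances between calls to map(), so holding a reference to a value pulled out of a TupleWritable keeps a mutable object that gets overwritten by the next record. The usual fix is to deep-copy anything you retain past the current call. A plain-Java sketch of the failure mode and the copy fix; MutableText here is a hypothetical stand-in for a reused Writable such as Text:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    // Stand-in for a reused, mutable Writable.
    static final class MutableText {
        String value;
        void set(String v) { this.value = v; }
    }

    // BUG: keep references to the reused object; every entry ends up
    // aliasing the same instance, so all show the last record's value.
    static List<String> keepReference(String[] records) {
        MutableText reused = new MutableText(); // one instance, reused per record
        List<MutableText> kept = new ArrayList<>();
        for (String r : records) { reused.set(r); kept.add(reused); }
        List<String> out = new ArrayList<>();
        for (MutableText t : kept) out.add(t.value);
        return out;
    }

    // FIX: copy the contents before keeping them; values survive reuse.
    static List<String> copyValue(String[] records) {
        MutableText reused = new MutableText();
        List<String> kept = new ArrayList<>();
        for (String r : records) { reused.set(r); kept.add(reused.value); }
        return kept;
    }
}
```

In a real mapper the equivalent fix is something like new Text(value) (or WritableUtils.clone) for each tuple field you store, rather than storing the field itself.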

Re: CompositeInputFormat - why in mapred but not mapreduce?

2012-01-15 Thread Harsh J
Mike Spreitzer mspre...@us.ibm.com wrote: Having looked at a few releases of Hadoop, I am surprised to find that in most of them the CompositeInputFormat class is in mapred but not mapreduce. While there is a CompositeInputFormat under mapreduce in release 0.21.0, there is no CompositeInputFormat

CompositeInputFormat - why in mapred but not mapreduce?

2012-01-14 Thread Mike Spreitzer
Having looked at a few releases of Hadoop, I am surprised to find that in most of them the CompositeInputFormat class is in mapred but not mapreduce. While there is a CompositeInputFormat under mapreduce in release 0.21.0, there is no CompositeInputFormat under mapreduce in release 1.0.0

CompositeInputFormat scalability

2009-06-24 Thread pmg
with tuples and cross product of FileA and FileB: 123[724101722493,5026328101569] 123[724101722493,5026328001562] 123[781676672721,5026328101569] 123[781676672721,5026328001562] How does CompositeInputFormat scale when we want to join 600K with 2 million records? Does it run

Re: CompositeInputFormat scalability

2009-06-24 Thread jason hadoop
package I can create output file with tuples and cross product of FileA and FileB. 123[724101722493,5026328101569] 123[724101722493,5026328001562] 123[781676672721,5026328101569] 123[781676672721,5026328001562] How does CompositeInputFormat scale when we want to join 600K

Re: CompositeInputFormat scalability

2009-06-24 Thread pmg
[724101722493,5026328101569] 123[724101722493,5026328001562] 123[781676672721,5026328101569] 123[781676672721,5026328001562] How does CompositeInputFormat scale when we want to join 600K with 2 million records? Does it run on the node with a single map/reduce? Also how can I

Re: CompositeInputFormat scalability

2009-06-24 Thread jason hadoop
[781676672721,5026328101569] 123[781676672721,5026328001562] How does CompositeInputFormat scale when we want to join 600K with 2 million records? Does it run on the node with a single map/reduce? Also, how can I not write the result into a file but instead input-split the result into different
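On the scaling question in this thread: a map-side join runs one map task per partition pairing, and within each partition it performs a sorted merge of the inputs, emitting the cross product of values for each matching key, which is exactly the shape of the 123[...] tuples quoted above. A plain-Java sketch of that per-partition merge (illustrative, not the Hadoop implementation; inputs are (key, value) pairs already sorted by key, as the part files must be):

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoinSketch {
    // Inner-join two arrays of (key, value) pairs that are sorted by key,
    // emitting key[valueA,valueB] for the cross product of each key's run.
    static List<String> join(String[][] a, String[][] b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            int cmp = a[i][0].compareTo(b[j][0]);
            if (cmp < 0) i++;           // key only in A: skip (inner join)
            else if (cmp > 0) j++;      // key only in B: skip
            else {
                String key = a[i][0];
                int jStart = j;         // remember start of B's run for this key
                for (; i < a.length && a[i][0].equals(key); i++)
                    for (j = jStart; j < b.length && b[j][0].equals(key); j++)
                        out.add(key + "[" + a[i][1] + "," + b[j][1] + "]");
            }
        }
        return out;
    }
}
```

Since each of the N partitions is merged independently by its own map task, joining 600K rows against 2 million is bounded by the largest partition pair, not the total size, provided both sides were partitioned with the same number of reducers and the same key ordering.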

Data locality with CompositeInputFormat

2008-07-17 Thread Christian Kunz
When specifying multiple input directories for CompositeInputFormat, is there any deterministic selection of where the tasks are put (data locality)? Any preference for running rack-local or node-local to the splits of the first/last input directory? Thanks, -Christian