, 2014 at 1:36 PM, Pedro Magalhaes pedror...@gmail.com wrote:
I saw that one of the requirements to use CompositeInputFormat is:
"A map-side join can be used to join the outputs of several jobs that had the same number of reducers, the same keys, and output files that are not splittable."
Map-side joins will use CompositeInputFormat. They will only really be worth doing if one data set is small and the other is large.
This is a good example:
http://www.congiu.com/joins-in-hadoop-using-compositeinputformat/
The trick is to google for CompositeInputFormat.compose.
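A minimal sketch of such a join with the old (mapred) API that the linked example uses; the input paths /data/a and /data/b are hypothetical:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class OldApiJoinDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(OldApiJoinDriver.class);
    // Both inputs must be sorted by key, identically partitioned,
    // and non-splittable, per the requirement quoted above.
    conf.setInputFormat(CompositeInputFormat.class);
    conf.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", KeyValueTextInputFormat.class,
        new Path("/data/a"), new Path("/data/b")));
    // ... set the mapper and output format, then JobClient.runJob(conf).
  }
}

The mapper then receives (Text key, TupleWritable value) pairs, with one tuple position per joined source.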
Sorry, I should have specified that I need an example of CompositeInputFormat that uses the new API. The example linked below uses old-API objects like JobConf. Any known examples of CompositeInputFormat using the new API?
Thanks in advance,
Andrew
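A minimal sketch against the new (mapreduce) API, assuming Hadoop 2.x, where the class lives under org.apache.hadoop.mapreduce.lib.join; the paths are again hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat;

public class NewApiJoinDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // JOIN_EXPR ("mapreduce.join.expr") replaces the old mapred.join.expr key.
    conf.set(CompositeInputFormat.JOIN_EXPR, CompositeInputFormat.compose(
        "inner", KeyValueTextInputFormat.class,
        new Path("/data/a"), new Path("/data/b")));
    Job job = Job.getInstance(conf, "map-side join");
    job.setInputFormatClass(CompositeInputFormat.class);
    // ... set the mapper, output path, and call job.waitForCompletion(true).
  }
}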
Hi All,
I'm trying CompositeInputFormat to perform a map-side join. The outer and inner joins go well. However, when I try the override option, it comes back with an exception[1].
My JobDriver class is as follows:
public static void main(String[] args) throws IOException, InterruptedException
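For reference, an override join is composed the same way as inner and outer joins; a sketch of just the expression, reusing the old-API setup from the earlier sketch, with hypothetical paths. With "override", duplicate keys take the value from the rightmost source:

conf.set("mapred.join.expr", CompositeInputFormat.compose(
    "override", KeyValueTextInputFormat.class,
    new Path("/data/base"), new Path("/data/patch")));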
Hi,
I am trying to do an outer join on two input files. But while joining, the TupleWritable value in the mapper is not getting cleaned up, and so it is using the previous values of a different key. Can anyone help me find the problem here?
The code I used is: ('plist' contains the set of paths to be taken as input)
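One plausible cause, offered as an assumption rather than a diagnosis: Hadoop reuses the TupleWritable object between map calls, so in an outer join a tuple position can still hold a stale value from the previous key. Checking has(i) before reading a position, and copying values out instead of keeping references, is the usual guard. A sketch with the new-API classes and assumed Text types:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.join.TupleWritable;

public class JoinMapper extends Mapper<Text, TupleWritable, Text, Text> {
  @Override
  protected void map(Text key, TupleWritable value, Context context)
      throws IOException, InterruptedException {
    for (int i = 0; i < value.size(); i++) {
      // has(i) is true only for positions actually written for this key;
      // copy the value (new Text) rather than storing a reference.
      if (value.has(i)) {
        context.write(key, new Text(value.get(i).toString()));
      }
    }
  }
}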
Mike Spreitzer mspre...@us.ibm.com wrote:
Having looked at a few releases of Hadoop, I am surprised to find that in most of them the CompositeInputFormat class is in mapred but not mapreduce. While there is a CompositeInputFormat under mapreduce in release 0.21.0, there is no CompositeInputFormat under mapreduce in release 1.0.0.
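The two homes of the class, for anyone hitting the same import confusion (availability per release is as observed above):

// Old API, present in 1.0.0:
import org.apache.hadoop.mapred.join.CompositeInputFormat;
// New API, present in 0.21.0 and again in the 2.x line, but not in 1.0.0:
import org.apache.hadoop.mapreduce.lib.join.CompositeInputFormat;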
...package I can create an output file with tuples and the cross product of FileA and FileB:
123 [724101722493,5026328101569]
123 [724101722493,5026328001562]
123 [781676672721,5026328101569]
123 [781676672721,5026328001562]
How does CompositeInputFormat scale when we want to join 600K with 2 million records? Does it run on one node with a single map/reduce? Also, how can I not write the result into a file, but instead input-split the result into different
When specifying multiple input directories for CompositeInputFormat, is there any deterministic selection of where the tasks are placed (data locality)? Any preference for running rack-local or node-local to the splits of the first/last input directory?
Thanks,
-Christian