@Thomas
Thanks. My input files are sorted.
@Jingkei
Thanks. I will have a look at the instructions for join.
On Tue, Oct 27, 2009 at 12:39 AM, Thomas Thevis wrote:
> Hey Anty,
>
> there exists a config key 'map.input.file' which should return the name of
> the input file the mapper gets its input
Hey Anty,
there exists a config key 'map.input.file' which should return the name
of the input file the mapper gets its input values from.
In the pre-hadoop-0.20.0 era, one would have to implement the
configure() method to have access to the configuration. Since then, it
could be possible to u
Thanks very much for your reply, Thomas.
I looked through the Mapper.map() method, but I still can't find a way to
retrieve the source file name of the input data. Can you describe it in more
detail?
As for your proposed suggestion, I have some doubts:
the names of the three files are random, so we couldn't so
Assuming your input files are sorted, you should be able to use the map-side
join framework to do the job you describe (effectively an outer join) while
avoiding going through the Reduce phase.
There are instructions on how to use it here:
http://hadoop.apache.org/common/docs/current/api/org/apach
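
To make the idea concrete, here is a minimal plain-Java sketch of what an outer join over sorted inputs computes. It is not the Hadoop map-side join API, only an illustration of the merge logic. The class name SortedOuterJoin, the "null" placeholder, and the sample data are assumptions made for the example, and it assumes each input is sorted by key with no duplicate keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of Hadoop.
class SortedOuterJoin {

    // Merges inputs that are each sorted by key (unique keys per input).
    // Emits one tab-separated row per distinct key; an input that lacks
    // the key contributes the placeholder "null" (an assumed convention).
    static List<String> join(List<List<Map.Entry<String, String>>> inputs) {
        int n = inputs.size();
        int[] pos = new int[n];              // current read position per input
        List<String> out = new ArrayList<>();
        while (true) {
            // Find the smallest key among the current heads of all inputs.
            String min = null;
            for (int i = 0; i < n; i++) {
                if (pos[i] < inputs.get(i).size()) {
                    String k = inputs.get(i).get(pos[i]).getKey();
                    if (min == null || k.compareTo(min) < 0) {
                        min = k;
                    }
                }
            }
            if (min == null) {
                break;                       // every input is exhausted
            }
            // Build the output row and advance the inputs that held this key.
            StringBuilder row = new StringBuilder(min);
            for (int i = 0; i < n; i++) {
                List<Map.Entry<String, String>> in = inputs.get(i);
                if (pos[i] < in.size() && in.get(pos[i]).getKey().equals(min)) {
                    row.append('\t').append(in.get(pos[i]).getValue());
                    pos[i]++;
                } else {
                    row.append('\t').append("null");
                }
            }
            out.add(row.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data in the spirit of the question.
        List<Map.Entry<String, String>> file1 = List.of(
                Map.entry("key1", "value1A"), Map.entry("key2", "value2A"));
        List<Map.Entry<String, String>> file2 = List.of(
                Map.entry("key1", "value1B"));
        List<Map.Entry<String, String>> file3 = List.of(
                Map.entry("key2", "value2C"));
        for (String row : join(List.of(file1, file2, file3))) {
            System.out.println(row);
        }
    }
}
```

Because every input is read strictly front to back, no shuffle or Reduce phase is needed, which is the point of doing the join on the map side.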
Hi Anty,
as far as I know, it is possible to retrieve the source file name of the
input data within the Mapper's map() method.
If so, you could use secondary sort on values (have a look at the Hadoop
wiki pages) to propagate the values sorted first by key and second by
filename to the Reducer
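
As a rough illustration of the ordering that secondary sort aims for, here is a plain-Java sketch that tags each value with its source file name, sorts by key first and file name second, and then groups by key. It does not use the Hadoop API; in a real job the sort and grouping would be done by the framework's comparators. The names SecondarySortSketch, Tagged, and groupSorted are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of Hadoop.
class SecondarySortSketch {

    // A value tagged with its key and the name of the file it came from.
    static final class Tagged {
        final String key;
        final String file;
        final String value;

        Tagged(String key, String file, String value) {
            this.key = key;
            this.file = file;
            this.value = value;
        }
    }

    // Sort first by key, then by source file name.
    static final Comparator<Tagged> SORT_ORDER =
            Comparator.comparing((Tagged t) -> t.key).thenComparing(t -> t.file);

    // Returns, for each key in sorted order, its values ordered by file name.
    static Map<String, List<String>> groupSorted(List<Tagged> records) {
        List<Tagged> sorted = new ArrayList<>(records);
        sorted.sort(SORT_ORDER);
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Tagged t : sorted) {
            grouped.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t.value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up records arriving in arbitrary order.
        List<Tagged> records = List.of(
                new Tagged("key2", "file3", "value2C"),
                new Tagged("key1", "file2", "value1B"),
                new Tagged("key1", "file1", "value1A"),
                new Tagged("key2", "file1", "value2A"));
        groupSorted(records).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

With this ordering, the code that consumes each key sees the values in a fixed per-file order, so it can tell which value came from which file by position, provided every file contributes a value for that key.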
Would MultipleInputs work for this situation?
Does anyone have any idea about this?
On Mon, Oct 26, 2009 at 7:44 PM, Anty wrote:
> Hi all,
> I have the following use case: I have three files, each containing
> key-value pairs:
> file1:         file2:         file3:
> key1-value1A
Hi all,
I have the following use case: I have three files, each containing
key-value pairs:
file1:         file2:         file3:
key1-value1A   key1-value1B   key1-value1C
key2-value2A   key2-value2B   key2-value2C
key3-value3A   key3-valu