Hi,
Is it possible to get the 'id' of the currently executing split or block
from within the mapper? Using this block Id / split id, I want to be able
to query the namenode to get the names of hosts having that block / split,
and the actual path to the data.
I need this for some analytics that
I think if you called getInputFormat() on the JobConf and then called
getSplits() on the result, you would at least get the locations.
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/InputSplit.html
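Roughly what I have in mind, as an untested sketch against the old org.apache.hadoop.mapred API. The class name ListSplitLocations and the use of args[0] as the input path are my own placeholders, and it assumes the default TextInputFormat (or another file-based format) so that the splits are FileSplit instances:

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper: lists each input split with the hosts that hold its data.
public class ListSplitLocations {
    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf();
        // args[0] is a placeholder for the job's input directory or file.
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        // The second argument is only a hint for the desired number of splits.
        InputSplit[] splits = job.getInputFormat().getSplits(job, 1);

        for (InputSplit split : splits) {
            // getLocations() returns the hostnames where the split's data is local.
            String hosts = Arrays.toString(split.getLocations());
            if (split instanceof FileSplit) {
                FileSplit fileSplit = (FileSplit) split;
                System.out.println(fileSplit.getPath()
                        + " start=" + fileSplit.getStart()
                        + " length=" + fileSplit.getLength()
                        + " hosts=" + hosts);
            } else {
                // Non-file splits carry no path, so only the hosts are printed.
                System.out.println(split + " hosts=" + hosts);
            }
        }
    }
}
```

This runs on the client side and lists every split, not just the one a given mapper is processing, so you would still have to match the mapper's own split against this list (for file inputs, by path and start offset).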
On Sun, Apr 8, 2012 at 9:16 AM, Deepak Nettem deepaknet...@gmail.com wrote:
Hi,
Is it
Deepak
On Sun, Apr 8, 2012 at 9:46 PM, Deepak Nettem deepaknet...@gmail.com wrote:
Hi,
Is it possible to get the 'id' of the currently executing split or block
from within the mapper? Using this block Id / split id, I want to be able
to query the namenode to get the names of hosts having
I have a related question about blocks related to this. Normally, a reduce
job outputs several files, all in the same directory.
But why? Since we know that Hadoop is abstracting our file for us, shouldn't
the part-r- outputs ultimately be thought of as a single file?
What is the
Hi,
The "part" in the default filename stands for partition. In some
cases I agree you would not mind viewing them as a single file
instead of having to read directories - but there are also use cases
where you would want each partition file to be unique because you
partitioned and processed them
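
To make the key-to-partition-to-file relationship concrete, here is a small standalone sketch in plain Java with no Hadoop dependency. It copies the arithmetic of the default HashPartitioner (clear the sign bit of the key's hash, then take the remainder by the number of reducers) and the zero-padded part-r-NNNNN naming. The class and method names are made up for illustration, and it uses String keys, so the assignments will not match what a real job does with Text keys, which hash their bytes differently:

```java
import java.util.Locale;

// Illustration only: mimics how a hash partitioner maps keys to reducers,
// and how each reducer's partition number shows up in its output file name.
public class PartitionSketch {

    // Same arithmetic as a hash partitioner: mask off the sign bit so the
    // result is non-negative, then take the remainder by the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Each reducer writes one file named after its partition number,
    // zero-padded to five digits.
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-r-%05d", partition);
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // assumed reducer count for the example
        String[] keys = {"a", "b", "c", "d", "e"};
        for (String key : keys) {
            int partition = partitionFor(key, numReduceTasks);
            System.out.println(key + " -> " + partFileName(partition));
        }
    }
}
```

Every record with the same key lands in the same part file, which is why the files are kept separate: each one is the complete output of one partition.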