Hi,
Is it possible to get the 'id' of the currently executing split or block
from within the mapper? Using this block Id / split id, I want to be able
to query the namenode to get the names of hosts having that block / split,
and the actual path to the data.
I need this for some analytics that
I think if you called getInputFormat on JobConf and then called getSplits,
you would at least get the locations.
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/InputSplit.html
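To make that suggestion concrete, here is a rough sketch using the old (mapred) API — the class name and the input path passed on the command line are placeholders, and I haven't run this against a real cluster, only in local mode:

```java
import java.util.Arrays;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

public class SplitLocations {
    // Compute the splits for whatever input paths are set on the JobConf.
    // Each InputSplit knows the hostnames that hold its data.
    public static InputSplit[] splitsFor(JobConf conf) throws Exception {
        // getInputFormat() defaults to TextInputFormat; the second argument
        // is only a hint for the number of splits.
        return conf.getInputFormat().getSplits(conf, 1);
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        for (InputSplit split : splitsFor(conf)) {
            // getLocations() is the same host list the scheduler uses
            // for data locality.
            System.out.println(split + " -> "
                    + Arrays.toString(split.getLocations()));
        }
    }
}
```

Note this gives you split locations up front, not the id of the split a running mapper was handed; relating the two is a separate problem.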
On Sun, Apr 8, 2012 at 9:16 AM, Deepak Nettem deepaknet...@gmail.comwrote:
Thanks for your advice, File.createTempFile() works great, at least in
pseudo-distributed mode; I hope it will work the same way on a cluster. You
saved me hours of trying...
On 04/07/2012 11:29 PM, Harsh J wrote:
MapReduce sets mapred.child.tmp for all tasks to be the Task
Attempt's
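For anyone finding this thread later, the pattern being discussed is plain JDK code — inside a task, mapred.child.tmp redirects java.io.tmpdir into the attempt's own working area, so the same call just lands somewhere task-local. A minimal stdlib sketch (the "scratch-" prefix is arbitrary):

```java
import java.io.File;
import java.io.IOException;

public class TempFileDemo {
    // Create a scratch file in whatever directory java.io.tmpdir points at.
    // Inside a MapReduce task that is the attempt's tmp dir, so the file
    // disappears with the attempt; standalone, it is the system tmp dir.
    public static File createScratchFile() throws IOException {
        File tmp = File.createTempFile("scratch-", ".tmp");
        tmp.deleteOnExit(); // belt-and-braces cleanup outside a task
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(createScratchFile().getAbsolutePath());
    }
}
```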
Hi guys. Just a theoretical question here: I notice in chapter 1 of the
Hadoop O'Reilly book that the new API example has *no* Configuration object.
Why is that?
I thought the new API still uses / needs a Configuration class when running
jobs.
Jay Vyas
MMSB
UCHC
On Apr 7, 2012, at 4:29
Deepak
On Sun, Apr 8, 2012 at 9:46 PM, Deepak Nettem deepaknet...@gmail.com wrote:
The Job class encapsulates the Configuration object and manages it for
you. You can also get its reference out via Job.getConfiguration() -
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/JobContext.html#getConfiguration()
Hence, when you do a plain Job job = new Job();, the Job builds its own Configuration internally.
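A small sketch of Harsh's point, with a made-up property key just for illustration — the Configuration you get back from Job.getConfiguration() is the live one the job will submit with:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobConfDemo {
    public static Configuration configure() throws Exception {
        Job job = new Job();                         // builds a Configuration internally
        Configuration conf = job.getConfiguration(); // the same object the job will use
        conf.set("my.custom.key", "value");          // hypothetical key, for illustration
        return conf;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(configure().get("my.custom.key"));
    }
}
```

So the book's example isn't skipping configuration; it is just letting Job own the object instead of constructing one explicitly.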
It will work. Pseudo-distributed mode shouldn't be all that different
from a fully distributed mode. Do let us know if it does not work as
intended.
On Sun, Apr 8, 2012 at 11:40 PM, Ondřej Klimpera klimp...@fit.cvut.cz wrote:
Thanks for your advice, File.createTempFile() works great, at least in
I will, but deploying the application on a cluster is still a way off; I'm
just finishing the raw implementation. Cluster tuning is planned for the
end of this month.
Thanks.
On 04/08/2012 09:06 PM, Harsh J wrote:
It will work. Pseudo-distributed mode shouldn't be all that different
from a fully distributed
I have a related question about blocks. Normally, a reduce
job outputs several files, all in the same directory.
But why? Since Hadoop abstracts the file system for us, shouldn't
the part-r-* outputs ultimately be thought of as a single file?
What is the
Hi,
The "part" in the default filename stands for partition. In some
cases I agree you would not mind viewing them as a singular file
instead of having to read directories - but there are also use cases
where you would want each partition file to be unique, because you
partitioned and processed them
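If you do want the partitions glued back together, FileUtil.copyMerge (present in the Hadoop versions of this era) concatenates every file under a directory into one output file. A rough sketch — the paths in main() are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
    // Concatenate every file under srcDir into the single file dstFile.
    public static void merge(Configuration conf, Path srcDir, Path dstFile)
            throws Exception {
        FileSystem fs = FileSystem.get(srcDir.toUri(), conf);
        // false = keep the original part files; null = no separator string
        FileUtil.copyMerge(fs, srcDir, fs, dstFile, false, conf, null);
    }

    public static void main(String[] args) throws Exception {
        // /user/me/job-output and merged.txt are placeholder paths
        merge(new Configuration(),
              new Path("/user/me/job-output"),
              new Path("/user/me/merged.txt"));
    }
}
```

(The `hadoop fs -getmerge` shell command does the same thing from the command line.)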
Hi.
I am new to Hadoop and I am working on project on AWS Elastic MapReduce.
The problem I am facing is:
* org.apache.commons.lang.time.DateUtils: parseDate() works OK but
parseDateStrictly() fails.
I think parseDateStrictly() might be new in Commons Lang 2.5. I thought I had
included all the dependencies.