HDP 2.0.6 is the GA version that matches Apache Hadoop 2.2.
From: John Lilley john.lil...@redpoint.net
Sent: Tuesday, November 05, 2013 12:34 PM
To: user@hadoop.apache.org
Subject: HDP 2.0 GA?
I noticed that HDP 2.0 is available for download here:
I've done this before by placing the name of each file to process into a single
file (newline separated) and using the NLineInputFormat class as the input
format. Run your job with that single file of file names as the input. Each
mapper will then be handed one line (this
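In case it helps to see the idea concretely, here is a stand-alone sketch of what that input format does with a one-line-per-split setting; the real class is org.apache.hadoop.mapreduce.lib.input.NLineInputFormat (wired up in the driver via job.setInputFormatClass and NLineInputFormat.setNumLinesPerSplit), and the class and file names below are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a newline-separated control file into one
// "split" per line, mimicking NLineInputFormat with numLinesPerSplit = 1.
public class ControlFileSplits {
    static List<String> oneLinePerMapper(String controlFileContents) {
        List<String> splits = new ArrayList<>();
        for (String line : controlFileContents.split("\n")) {
            if (!line.isEmpty()) {
                splits.add(line); // each mapper receives exactly one file name
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        String controlFile = "/data/a.bin\n/data/b.bin\n/data/c.bin\n";
        for (String s : oneLinePerMapper(controlFile)) {
            System.out.println(s); // one mapper launched per listed file
        }
    }
}
```

Each mapper then opens and processes the one file its line names, so the number of map tasks equals the number of lines in the control file.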
I'm not sure that is possible. You can use NLineInputFormat with a control
file that has a line per node in the cluster. I've used that technique for a
data-generation program and it works well. This will run a pre-determined
number of mappers. However, it's up to the scheduler to decide when
I'm looking at implementing an advanced Reducer by overriding the Reducer.run()
method. It seems straightforward enough, but the default implementation
invokes:
((ReduceContext.ValueIterator)
(context.getValues().iterator())).resetBackupStore();
The problem is that
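For orientation, the overall shape of the default run() loop (setup, iterate over keys, cleanup) can be sketched without Hadoop at all; the Context stand-in below is hypothetical and only mirrors the few methods the loop touches, not the real Hadoop API:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RunLoopSketch {
    // Minimal stand-in for the Hadoop Context (hypothetical, not the real API).
    static class Context {
        private final Iterator<Map.Entry<String, List<Integer>>> it;
        private Map.Entry<String, List<Integer>> current;
        final Map<String, Integer> output = new LinkedHashMap<>();

        Context(Map<String, List<Integer>> grouped) {
            this.it = grouped.entrySet().iterator();
        }
        boolean nextKey() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
        String getCurrentKey() { return current.getKey(); }
        List<Integer> getValues() { return current.getValue(); }
        void write(String key, int value) { output.put(key, value); }
    }

    static class SumReducer {
        void setup(Context c) {}
        void reduce(String key, List<Integer> values, Context c) {
            int sum = 0;
            for (int v : values) sum += v;
            c.write(key, sum);
        }
        void cleanup(Context c) {}
        // Mirrors the shape of the default Reducer.run(); overriding run()
        // means taking over this entire loop yourself.
        void run(Context c) {
            setup(c);
            while (c.nextKey()) {
                reduce(c.getCurrentKey(), c.getValues(), c);
            }
            cleanup(c);
        }
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("a", Arrays.asList(1, 2, 3));
        grouped.put("b", Arrays.asList(10, 20));
        Context ctx = new Context(grouped);
        new SumReducer().run(ctx);
        System.out.println(ctx.output); // {a=6, b=30}
    }
}
```

The real default implementation additionally manages the value iterator's backup store, which is the internal detail the quoted cast touches.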
I've run into that before. Try setting mapreduce.task.timeout. I seem to
remember that setting it to zero may turn off the timeout, but that can of
course be dangerous if you have a runaway task. The default is 600 seconds ;-)
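For reference, a minimal mapred-site.xml fragment for this (assuming a cluster-wide setting; the value is in milliseconds, so the 600-second default corresponds to 600000, and 0 disables the timeout):

```xml
<property>
  <name>mapreduce.task.timeout</name>
  <!-- 0 disables the timeout entirely; use with care -->
  <value>0</value>
</property>
```

The same property can also be set per job on the Configuration before submission.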
Check out
That's right. The TextInputFormat handles situations where records cross split
boundaries. What your mapper will see is whole records.
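The rule the record reader follows can be sketched in plain Java (the real logic lives in Hadoop's LineRecordReader; this is a simplified, hypothetical model of it): a reader whose split does not start at byte 0 skips the partial first line, because the previous split's reader will finish it, and every reader may read past its split's end to complete its last line.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of how a line-oriented reader handles split boundaries.
public class SplitBoundaryDemo {
    static List<String> recordsForSplit(String data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip the partial line; the previous split's reader finishes it.
            while (pos < data.length() && data.charAt(pos - 1) != '\n') pos++;
        }
        List<String> records = new ArrayList<>();
        // Emit every line that *starts* inside this split, even if it
        // extends past 'end'.
        while (pos < data.length() && pos <= end) {
            int nl = data.indexOf('\n', pos);
            if (nl == -1) nl = data.length();
            records.add(data.substring(pos, nl));
            pos = nl + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "alpha\nbravo\ncharlie\n";
        // Cut the "file" mid-"bravo": ranges [0,7] and [8,19].
        System.out.println(recordsForSplit(data, 0, 7));  // [alpha, bravo]
        System.out.println(recordsForSplit(data, 8, 19)); // [charlie]
    }
}
```

Note that "bravo" straddles the cut but is emitted exactly once, by the reader whose split contains its first byte.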
-Original Message-
From: maha [mailto:m...@umail.ucsb.edu]
Sent: Friday, February 18, 2011 1:14 PM
To: common-user
Subject: Quick question
Hi all,
Generally, if you have large files, setting the block size to 128M or larger is
helpful. You can do that on a per-file basis or set the block size for the
whole filesystem. The larger block size cuts down on the number of map tasks
required to handle the overall data size. I've experimented
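The arithmetic behind that is just one map task per block; a quick sketch with a hypothetical 10 GiB input shows the effect of doubling the block size:

```java
// Back-of-the-envelope sketch: one map task per HDFS block, so a larger
// block size means fewer mappers for the same data. Sizes are hypothetical.
public class MapTaskCount {
    static long mapTasks(long fileSizeBytes, long blockSizeBytes) {
        // ceil(fileSize / blockSize)
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;
        System.out.println(mapTasks(tenGiB, 64L * 1024 * 1024));  // 160 tasks at 64M
        System.out.println(mapTasks(tenGiB, 128L * 1024 * 1024)); // 80 tasks at 128M
    }
}
```

For the per-file case, the block size is fixed at write time, e.g. by setting the client-side property (dfs.blocksize in Hadoop 2, dfs.block.size in older releases) when copying the file in.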