Hi,
I'm migrating my Hadoop jobs to the new MapReduce API. I ran into quite a few issues, but there's one I can't seem to figure out:

SequenceFile.Reader[] readers = SequenceFileOutputFormat.getReaders(tmpFolder, conf);

I have looked through the API docs many times now, but I cannot find
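For context: getReaders in the old org.apache.hadoop.mapred API essentially just lists the non-hidden part files in the job output directory and opens a SequenceFile.Reader for each, so under the new API you can reproduce that step by hand. Below is a minimal pure-Java sketch of the listing step only; the class and method names are illustrative, and on HDFS you would use FileSystem.listStatus with a PathFilter instead of java.nio:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PartFileLister {
    // List the data files in a job output directory, skipping hidden
    // entries (names starting with '_' or '.'), which is the same filter
    // the old getReaders helper applied before opening a reader per file.
    static List<Path> listPartFiles(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(outputDir)) {
            for (Path p : ds) {
                String name = p.getFileName().toString();
                if (!name.startsWith("_") && !name.startsWith(".")) {
                    parts.add(p);
                }
            }
        }
        parts.sort(null); // stable order: part-r-00000, part-r-00001, ...
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        Files.createFile(dir.resolve("part-r-00000"));
        Files.createFile(dir.resolve("part-r-00001"));
        Files.createFile(dir.resolve("_SUCCESS")); // skipped as hidden
        for (Path p : listPartFiles(dir)) {
            System.out.println(p.getFileName());
        }
    }
}
```

With the list in hand, you would open a SequenceFile.Reader for each path yourself instead of relying on the removed helper.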
Hi,
I had some questions specifically on the Map-Reduce phase:
[1] For the reduce phase, the TaskTrackers corresponding to the reduce nodes poll the JobTracker to learn which maps have completed, and if the JobTracker informs them about maps that are complete, they then pull the data from the
[1] I think the reducers are allocated a space before the execution begins, and it depends on the number of reducers. If I am not mistaken, a hash logic is used to implement this.
[2] I do not think we can determine the 'number' of reduce nodes. It is determined by the load conditions, I assume, and
On Fri, Dec 16, 2011 at 7:03 PM, Ann Pal ann_r_...@yahoo.com wrote:
Hi,
I had some questions specifically on the Map-Reduce phase:
[1] For the reduce phase, the TaskTrackers corresponding to the reduce node,
poll the Job Tracker to know about maps that have completed and if the
Jobtracker
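The "hash logic" mentioned in the reply above is Hadoop's default HashPartitioner, which assigns each map output key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. A minimal pure-Java sketch of that rule (the class name here is illustrative):

```java
public class HashPartitionSketch {
    // Mirrors Hadoop's default HashPartitioner.getPartition: mask off the
    // sign bit so negative hash codes still yield a valid reducer index,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
    }
}
```

Every map task applies the same function, so all values for a given key land on the same reducer regardless of which map produced them.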
Ravi,
Thanks for the info.
Arun
On Fri, Dec 16, 2011 at 12:27 PM, Ravi Gummadi gr...@yahoo-inc.com wrote:
Amar is working on this issue (MAPREDUCE-3349). The patch is not committed to trunk yet. Feel free to try it out while it gets reviewed and committed.
-Ravi
Hi,
I want to read a file that is 100 MB in size and stored in HDFS. How should I do it? Is it with IOUtils.readFully?
Can anyone give me an example?
--
Thanks,
Yes, you can use the utility methods from IOUtils.
Ex:
FileOutputStream fo = new FileOutputStream(file);
IOUtils.copyBytes(fs.open(fileName), fo, 1024, true);
Here fs is the HDFS FileSystem instance.
The other option is to make use of the FileSystem APIs.
Ex:
FileSystem fs = FileSystem.get(conf);
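For reference, IOUtils.copyBytes is essentially a buffered copy loop between two streams. A pure-Java sketch of the same pattern (using in-memory streams in place of fs.open(...); the class name is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytesSketch {
    // Buffered stream copy: the pattern behind Hadoop's
    // IOUtils.copyBytes(in, out, bufferSize, close). Reads up to bufSize
    // bytes at a time until EOF and returns the total byte count.
    static long copyBytes(InputStream in, OutputStream out, int bufSize)
            throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes("UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyBytes(new ByteArrayInputStream(data), out, 1024);
        System.out.println(copied + " bytes copied"); // 10 bytes copied
    }
}
```

A 100 MB file is no problem for this pattern, since only one buffer's worth of data is in memory at a time; readFully, by contrast, would pull the whole file into a single byte array.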
Thanks a lot for your answers!
For [1]: With the pull model, the chances of seeing a TCP-incast problem, where multiple map nodes send data to the same reduce node at the same time, are minimal (since the reducer is responsible for retrieving only as much data as it can handle). Is this a valid assumption?
For [3]: the pid files are there; I checked for running processes with the same IDs and they all checked out.
--Joey
On Fri, Dec 16, 2011 at 5:40 PM, Rahul Jain rja...@gmail.com wrote:
You might be suffering from HADOOP-7822; I'd suggest you verify your pid files and fix the problem by hand if it is the same issue.
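For anyone hitting the same thing, here is a minimal shell sketch of the by-hand check suggested above: compare each daemon pid file against a live process. HADOOP_PID_DIR and the hadoop-*.pid naming are assumptions; adjust them for your install.

```shell
# For each pid file, test whether a process with that pid is still
# alive (kill -0 sends no signal, it only checks existence/permission).
check_hadoop_pids() {
  local dir="${1:-${HADOOP_PID_DIR:-/tmp}}"
  local pidfile pid
  for pidfile in "$dir"/hadoop-*.pid; do
    [ -e "$pidfile" ] || continue
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
      echo "$pidfile: process $pid is running"
    else
      echo "$pidfile: stale (no process $pid) - candidate for manual cleanup"
    fi
  done
}
```

A pid file reported as stale is the kind you would remove by hand before restarting the daemon.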