Thanks for the reply, Nitin, but I don't see what the bottleneck is in having
it distributed with multi-threaded maps?
I see your point that each map is processing different splits, but my
question is if each map task had 2 threads multiplexing or running in
parallel if there are enough cores
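To make the question concrete, here is a rough sketch in plain Java of one map task handing the records of its own split to 2 worker threads. It is only an illustration under assumptions: the class name, mapRecord and mapSplit are made up for this example, and it does not use the Hadoop API at all. Hadoop does ship a MultithreadedMapper class (org.apache.hadoop.mapreduce.lib.map) aimed at this case, and it is probably the place to start rather than rolling your own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoThreadMapSketch {

    // Hypothetical per-record "map" function; here it only upper-cases the record.
    public static String mapRecord(String record) {
        return record.toUpperCase();
    }

    // Processes the records of ONE split using a fixed pool of worker threads.
    // Results are collected in input order, whichever thread ran each record.
    public static List<String> mapSplit(List<String> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String record : records) {
                Callable<String> task = () -> mapRecord(record);
                pending.add(pool.submit(task));
            }
            List<String> output = new ArrayList<>();
            for (Future<String> future : pending) {
                output.add(future.get()); // blocks until that record is done
            }
            return output;
        } catch (Exception e) {
            throw new RuntimeException("map thread failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> split = new ArrayList<>();
        split.add("alpha");
        split.add("beta");
        split.add("gamma");
        System.out.println(mapSplit(split, 2));
    }
}
```

Extra threads like this only help when the per-record work is slow (for example, waiting on an external service) and cores are free. If the map function is CPU-bound and the node's map slots already match its cores, the threads just compete with the other map tasks.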
I think there are quite a few people like me here asking basic questions on the
user@ group.
From: Monkey2Code [mailto:monkey2c...@gmail.com]
Sent: Sunday, January 13, 2013 2:23 PM
To: gene...@hadoop.apache.org; user@hadoop.apache.org
Subject: request on behalf of newbies
Hi all,
I am a newbie in
There are a lot on Facebook, there are a lot of blogs, and last but not least,
Google is your friend. Try something first and then ask ;)
Sent from Samsung Galaxy Note
Monkey2Code monkey2c...@gmail.com wrote:
Hi all,
I am a newbie in the Hadoop ecosystem, just wondering if there is any thread or
group where newbies
Thanks for your response, Andy.
I think node2 has more mappers than the others because the mappers on node2 run
faster and finish earlier than the mappers on node3 and node4. When the first
wave of mappers on node2 (00018, 00019, ...) finished and there were more
mappers to run, then the
How do I judge which counter would work?
2013/1/11 bejoy.had...@gmail.com
Hi
To add on to Harsh's comments.
You do not need to change the task timeout.
In your map/reduce code, you can increment a counter or report status
at intermediate intervals so that there is communication
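As a rough illustration of that pattern, the sketch below reports every N records from inside a long-running loop. It is plain Java under assumptions: the Reporter interface, CountingReporter and processSlowly are hypothetical stand-ins invented for this example, not Hadoop classes. In a real mapper using the new API, the equivalent calls would be context.getCounter(group, name).increment(n), context.setStatus(...) and context.progress().

```java
public class HeartbeatSketch {

    // Hypothetical stand-in for the task context a real mapper or reducer receives.
    public interface Reporter {
        void incrementCounter(String name, long delta);
        void setStatus(String status);
    }

    // Simple in-memory reporter so the sketch can run without a cluster.
    public static class CountingReporter implements Reporter {
        public long counterTotal = 0;
        public int statusUpdates = 0;
        public String lastStatus = "";

        @Override
        public void incrementCounter(String name, long delta) {
            counterTotal += delta;
        }

        @Override
        public void setStatus(String status) {
            statusUpdates++;
            lastStatus = status;
        }
    }

    // Processes `records` items and reports after every `reportEvery` of them,
    // so the framework keeps hearing from the task while the slow work runs.
    public static long processSlowly(long records, long reportEvery, Reporter reporter) {
        long processed = 0;
        for (long i = 0; i < records; i++) {
            // ... the expensive per-record work would go here ...
            processed++;
            if (processed % reportEvery == 0) {
                reporter.incrementCounter("RECORDS_PROCESSED", reportEvery);
                reporter.setStatus("processed " + processed + " records");
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        CountingReporter reporter = new CountingReporter();
        processSlowly(10, 3, reporter);
        System.out.println(reporter.statusUpdates + " updates, last: " + reporter.lastStatus);
    }
}
```

The reporting interval should be chosen so that the gap between two reports stays well below the configured task timeout.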
To add to that, log aggregation is a feature available with Hadoop 2.0
(where MapReduce is rewritten to run on YARN). The functionality is available via
the History Server:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html
Thanks
hemanth
On Sat, Jan 12, 2013