On Aug 15, 2008, at 9:15 AM, charles du wrote:
Thanks a lot for the information.
I used the '-file' option provided by hadoop-streaming to upload read-only
files for my map/reduce job, and read them as local files in my Perl
script. I am wondering whether this is similar, performance-wise, to what
the distributed cache does? Thanks.
tp.
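For context, a minimal sketch of the two approaches being compared; the paths, file names, and namenode address are hypothetical, and the exact jar name varies by release:

```shell
# Ship a local side file with the job: hadoop-streaming copies it into each
# task's working directory, so the Perl script can open it by its bare name.
hadoop jar hadoop-streaming.jar \
    -input  /user/me/input \
    -output /user/me/output \
    -mapper my_mapper.pl \
    -file   my_mapper.pl \
    -file   lookup.txt           # readable as ./lookup.txt inside the script

# Alternative via the distributed cache: the file already lives on HDFS and
# is symlinked into the task directory under the name after the '#'.
hadoop jar hadoop-streaming.jar \
    -input  /user/me/input \
    -output /user/me/output \
    -mapper my_mapper.pl \
    -file   my_mapper.pl \
    -cacheFile hdfs://namenode:9000/user/me/lookup.txt#lookup.txt
```

The main difference is where the file comes from: '-file' uploads it from the client machine on every job submission, while '-cacheFile' points at a copy already on HDFS. Once localized on a node, both are read as ordinary local files, so read performance inside the task should be comparable.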
On Aug 12, 2008, at 11:21 AM, charles du wrote:
Hi:
Does hadoop always start a new process for each map task?
Yes. http://issues.apache.org/jira/browse/HADOOP-249 is open to
optimize that.
Till HADOOP-249 is fixed, you could try to launch fewer, fatter maps
by doing more work in each.
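One way to get fewer, fatter maps with streaming is to raise the minimum split size so each map processes more input. A sketch, assuming the 0.x mapred parameter names; the value is illustrative, and older streaming releases pass job settings with '-jobconf' rather than '-D':

```shell
# A larger minimum split size yields fewer input splits, so fewer map
# tasks (and fewer forked processes) are launched for the same input.
hadoop jar hadoop-streaming.jar \
    -jobconf mapred.min.split.size=268435456 \
    -input  /user/me/input \
    -output /user/me/output \
    -mapper my_mapper.pl \
    -file   my_mapper.pl
```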
Hi:
Does hadoop always start a new process for each map task?
I have a 20-machine cluster and configured each task tracker to run at most
2 concurrent tasks, so the cluster can run 40 tasks in parallel. If I
start a hadoop job with 1000 tasks, will hadoop create 1000 map processes
during the execution?