Re: map tasks and processes

2008-08-15 Thread Arun C Murthy
On Aug 15, 2008, at 9:15 AM, charles du wrote: Thanks a lot for the information. I used the '-file' option provided by hadoop-streaming to upload read-only files for my map/reduce job, and read them as a local file in my Perl script. I am wondering if it is similar to what distributed cache do

Re: map tasks and processes

2008-08-15 Thread charles du
Thanks a lot for the information. I used the '-file' option provided by hadoop-streaming to upload read-only files for my map/reduce job, and read them as a local file in my Perl script. I am wondering if it is similar to what distributed cache does performance-wise? Thanks. tp. On Tue, Aug 12, 200

Re: map tasks and processes

2008-08-12 Thread Arun C Murthy
On Aug 12, 2008, at 11:21 AM, charles du wrote: Hi: Does hadoop always start a new process for each map task? Yes. http://issues.apache.org/jira/browse/HADOOP-249 is open to optimize that. Till HADOOP-249 is fixed, you could try and launch fewer, fatter maps by doing more work on each

map tasks and processes

2008-08-12 Thread charles du
Hi: Does hadoop always start a new process for each map task? I have a 20-machine cluster and configured each task tracker to run at most 2 concurrent tasks. So the cluster can run 40 tasks in parallel. If I start a hadoop job with 1000 tasks, will hadoop create 1000 map processes during the exe
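
The arithmetic in the question above can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: the function names are hypothetical, and it assumes all 1000 tasks are map tasks of roughly equal duration and that no task is retried or run speculatively. Given the reply that a new process is started for each map task, the job would launch about 1000 processes in total, with at most 40 alive at any one time.

```python
import math


def parallel_slots(machines, tasks_per_tracker):
    """Tasks the cluster can run at the same time (hypothetical helper)."""
    return machines * tasks_per_tracker


def task_waves(total_tasks, slots):
    """Rounds of tasks needed if every slot stays busy (hypothetical helper)."""
    return math.ceil(total_tasks / slots)


# Figures taken from the question: 20 machines, 2 concurrent tasks each, 1000 tasks.
slots = parallel_slots(20, 2)
waves = task_waves(1000, slots)

print(slots)  # 40 tasks can run concurrently
print(waves)  # 25 waves of tasks under the equal-duration assumption
```

Launching fewer, fatter maps, as suggested in the reply, lowers the total task count and with it the number of process start-ups.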