I do have this on my command line, and it did not work: -Dmapred.tasktracker.map.tasks.maximum=2
I also tried changing mapred-site.xml and restarting the tasktracker; that did not work either. I am sure it will work if I restart everything, but I really do not want to lose my data on HDFS, so I have not tried restarting everything.

Best regards,
-Shaojun

On Fri, Jan 18, 2013 at 12:23 PM, Jeffrey Buell <jbu...@vmware.com> wrote:
> Try:
>
> -Dmapred.tasktracker.map.tasks.maximum=1
>
> Although I usually put this parameter in mapred-site.xml.
>
> Jeff
>
>
> Dear all,
>
> I know it is best to use only a small amount of memory in the mapper and
> reducer. However, sometimes that is hard to do. For example, in machine
> learning algorithms, it is common to load the model into memory in the
> mapper step. When the model is big, I have to allocate a lot of memory
> for the mapper.
>
> Here is my question: how can I configure Hadoop so that it does not fork
> too many mappers and run out of physical memory?
>
> My machines have 24 GB, and I have 100 of them. Each time, Hadoop forks
> 6 mappers on each machine, no matter what config I use. I really want to
> reduce that to whatever number I want, for example, just 1 mapper per
> machine.
>
> Here are the configs I tried (I use streaming, and I pass the config on
> the command line):
>
> -Dmapred.child.java.opts=-Xmx8000m  <-- did not bring down the number of
> mappers
>
> -Dmapred.cluster.map.memory.mb=32000  <-- did not bring down the number
> of mappers
>
> Am I missing something here?
> I use Hadoop 0.20.205.
>
> Thanks a lot in advance!
> -Shaojun
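A note for readers hitting the same problem: in Hadoop 0.20.x, mapred.tasktracker.map.tasks.maximum is read by each TaskTracker daemon when it starts, so passing it with -D at job-submission time has no effect. Below is a minimal sketch of the change, assuming a standard 0.20.205 tarball layout; the slot count of 1 is illustrative. Only the TaskTracker daemons need a restart, while the HDFS daemons (NameNode/DataNodes) keep running, so the data on HDFS is not touched.

    In mapred-site.xml on each worker node (inside the <configuration> element):

        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>1</value>  <!-- at most one concurrent map task per TaskTracker -->
        </property>

    Then, on each worker node, restart only the TaskTracker daemon:

        # MapReduce daemon only; DataNode/NameNode are left running
        $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
        $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker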
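On the two -D flags in the original question: mapred.child.java.opts only sets the child JVM heap and does not change how many mappers are scheduled, and mapred.cluster.map.memory.mb is a cluster-side slot-size setting rather than something a job can override. Hadoop 0.20.20x does have a per-job counterpart, mapred.job.map.memory.mb, which asks the scheduler to reserve multiple slots for each memory-hungry task, but as far as I know it only takes effect when the cluster runs a scheduler that honors it (the CapacityScheduler) and the cluster-side memory settings are already configured. A sketch of a streaming invocation with that flag, where the jar path, input/output paths, and mapper/reducer scripts are placeholders:

        # generic -D options must come before the streaming-specific options
        hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
            -Dmapred.job.map.memory.mb=16000 \
            -Dmapred.child.java.opts=-Xmx8000m \
            -input /user/shaojun/input \
            -output /user/shaojun/output \
            -mapper my_mapper.py \
            -reducer my_reducer.py \
            -file my_mapper.py -file my_reducer.py

If the CapacityScheduler is not in use, the mapred-site.xml change above is the reliable way to cap mappers per node.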