Re: optimization help needed

2010-03-17 Thread Reik Schatz
pts","-Xmx***m"). > > -Gang > > > > > - Original Message > From: Reik Schatz > To: "common-user@hadoop.apache.org" > Sent: 2010/3/17 (Wed) 10:13:45 AM > Subject: Re: optimization help needed > > Very good input not to send the "original xml" o

Re: optimization help needed

2010-03-17 Thread Gang Luo
after several attempts. You may need to increase the heap size for each task by calling JobConf.set("mapred.child.java.opts","-Xmx***m"). -Gang - Original Message From: Reik Schatz To: "common-user@hadoop.apache.org" Sent: 2010/3/17 (Wed) 10:13:45 AM Subject: Re: optimization help ne

Re: optimization help needed

2010-03-17 Thread Reik Schatz
uce the amount of data sent from mappers to reducers. Using a > combiner to pre-aggregate the data may also help. > > -Gang > > > > > - Original Message > From: Reik Schatz > To: "common-user@hadoop.apache.org" > Sent: 2010/3/17 (Wed) 5:04:33 AM > Sub

Re: optimization help needed

2010-03-17 Thread Gang Luo
eik Schatz To: "common-user@hadoop.apache.org" Sent: 2010/3/17 (Wed) 5:04:33 AM Subject: optimization help needed Preparing a Hadoop presentation here. For demonstration I start up a 5-machine m1.large cluster in EC2 via the Cloudera scripts ($ hadoop-ec2 launch-cluster my-hadoop-cluster 5).

optimization help needed

2010-03-17 Thread Reik Schatz
Preparing a Hadoop presentation here. For demonstration I start up a 5-machine m1.large cluster in EC2 via the Cloudera scripts ($ hadoop-ec2 launch-cluster my-hadoop-cluster 5). Then I send a 500 MB XML file over to HDFS. The Mapper will receive an XML block as the key, select an email address from