How does AWS know how many map/reduce slots should be configured for each EC2 instance?

2013-07-19 Thread WangRamon
Hi All, we have a plan to move to the Amazon AWS cloud. By doing some research I find that I can start a map/reduce cluster in AWS with the following command: % bin/hadoop-ec2 launch-cluster test-cluster 2 The command allows me to start a cluster with the required number of nodes (no more than 20, correct me if I

RE: How does AWS know how many map/reduce slots should be configured for each EC2 instance?

2013-07-19 Thread WangRamon
expensive than an EC2 node). 3. Yes, you can find them in the admin console. On 19 July 2013 16:23, WangRamon ramon_w...@hotmail.com wrote: Hi All, we have a plan to move to the Amazon AWS cloud. By doing some research I find that I can start a map/reduce cluster in AWS with the following command: % bin
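The launch scripts choose slot counts from the instance type when the cluster boots. As a rough sketch only — the core counts and the one-map-slot-per-core heuristic below are illustrative assumptions, not the actual tables the Hadoop EC2 scripts ship with:

```python
# Illustrative sketch: deriving task-slot counts from an instance's
# core count. The CORES table and both heuristics are assumptions for
# illustration, not Hadoop's real per-instance-type configuration.
CORES = {"m1.small": 1, "m1.large": 2, "m1.xlarge": 4, "c1.xlarge": 8}

def slots_for(instance_type):
    """Return (map_slots, reduce_slots) for an EC2 instance type."""
    cores = CORES[instance_type]
    map_slots = cores                   # heuristic: one map slot per core
    reduce_slots = max(1, cores // 2)   # heuristic: half as many reduce slots
    return map_slots, reduce_slots

print(slots_for("m1.xlarge"))  # -> (4, 2) under these assumed heuristics
```

The real values end up in mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum on each TaskTracker.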

Is there an additional overhead when storing data in HDFS?

2012-11-20 Thread WangRamon
Hi All, I'm wondering if there is additional overhead when storing data in HDFS. For example, I have a 2GB file and the replication factor of HDFS is 2; when the file is uploaded to HDFS, should HDFS use 4GB to store it, or more than 4GB? If it takes more than 4GB of space, why?

RE: Is there an additional overhead when storing data in HDFS?

2012-11-20 Thread WangRamon
The overhead is that for every 512 bytes of data, 4 bytes of checksum are stored; in this case that is an additional 32MB. On Tue, Nov 20, 2012 at 11:00 PM, WangRamon ramon_w...@hotmail.com wrote: Hi All, I'm wondering if there is additional overhead when storing data in HDFS. For example, I have a 2GB
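The arithmetic in the reply can be checked directly: with 4 checksum bytes per 512 data bytes, a 2 GB file carries 16 MB of checksums per copy, so with replication factor 2 the cluster stores roughly 4 GB of data plus 32 MB of checksums (NameNode block metadata adds a little more on top):

```python
# Sketch of the checksum-overhead arithmetic from the reply:
# HDFS stores 4 bytes of CRC checksum for every 512 bytes of data,
# and the checksums are replicated along with the data blocks.
def hdfs_footprint_bytes(file_bytes, replication,
                         bytes_per_checksum=512, checksum_bytes=4):
    """Approximate on-disk footprint: data replicas plus checksum files."""
    checksum_overhead = file_bytes // bytes_per_checksum * checksum_bytes
    return (file_bytes + checksum_overhead) * replication

GB, MB = 1024 ** 3, 1024 ** 2
total = hdfs_footprint_bytes(2 * GB, replication=2)
print((total - 4 * GB) // MB)  # extra checksum space in MB -> 32
```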

Is there a way to get a notification when a job fails?

2012-09-06 Thread WangRamon
Hi Guys, is there some 3rd-party monitoring tool I can use to monitor the Hadoop cluster, in particular to get a notification/email when a job fails? Thanks for any suggestion. Cheers, Ramon

RE: Is there a way to get a notification when a job fails?

2012-09-06 Thread WangRamon
class-name -Djob.end.notification.url=http-url ... The URL can contain two special strings, $jobId and $jobStatus, which will be replaced with the actual values for the job. Then your web application can implement any notification as required. Thanks, Hemanth On Thu, Sep 6, 2012 at 2:02 PM, WangRamon
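The substitution Hemanth describes can be sketched as follows; the endpoint URL is hypothetical, but $jobId and $jobStatus are the two placeholders Hadoop actually replaces before issuing the HTTP request:

```python
# Sketch of how job.end.notification.url placeholders are expanded.
# The monitor.example.com URL is hypothetical; $jobId and $jobStatus
# are the two tokens Hadoop substitutes with the job's real values.
def expand_notification_url(url_template, job_id, job_status):
    return (url_template
            .replace("$jobId", job_id)
            .replace("$jobStatus", job_status))

template = "http://monitor.example.com/notify?id=$jobId&status=$jobStatus"
print(expand_notification_url(template, "job_201209060001_0001", "FAILED"))
```

The receiving web app can then parse the query string and send an email whenever the status is FAILED.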

Fair Share Scheduler not working as expected

2012-03-20 Thread WangRamon
Hi All, I noticed something strange in my Fair Share Scheduler monitor GUI: the SUM of the Fair Share values is always about 30 even when only one M/R job is running, so I don't know whether the value is a usage percentage; if it were, that would explain why all

Can Hadoop 1.0.1 work with HBase 0.90.5?

2012-03-17 Thread WangRamon
Hi all, I'm looking for a Hadoop version compatible with HBase 0.90.5. As I know, Hadoop 1.0.0 can work with HBase — it has HBase (append/hsynch/hflush, and security) in its release notes — but there is no such note for the Hadoop 1.0.1 release, so can anybody give me a correct answer? Thanks. BTW, I see

RE: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-12 Thread WangRamon
this for the number of map tasks spawned: it depends on the number of input files, and the number of tasks run concurrently depends on the number of cores/CPUs. Thanks From: WangRamon [ramon_w...@hotmail.com] Sent: Saturday, March 10, 2012 5:35 PM To: mapreduce-user@hadoop.apache.org Subject: RE

What is the NEW API?

2012-03-11 Thread WangRamon
Hi all, I've been with Hadoop-0.20-append for some time and I plan to upgrade to the 1.0.0 release, but I find there are many people talking about the NEW API, so I'm lost. Can anyone please tell me what the new API is? Is the OLD one still available in the 1.0.0 release? Thanks Cheers, Ramon

Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread WangRamon
Hi All, I'm using Hadoop-0.20-append; the cluster contains 3 nodes, and for each node I have 14 map and 14 reduce slots. Here is the configuration: <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>14</value> </property> <property>

RE: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread WangRamon
What is the total reduce capacity? -Joey On Mar 10, 2012, at 5:39, WangRamon ramon_w...@hotmail.com wrote: Hi All, I'm using Hadoop-0.20-append; the cluster contains 3 nodes, and for each node I have 14 map and 14 reduce slots. Here is the configuration: <property>
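Joey's question about total reduce capacity is simple arithmetic: with mapred.tasktracker.reduce.tasks.maximum set to 14 on each of 3 nodes, the cluster can run 42 reducers at once — but a job only occupies that many slots if mapred.reduce.tasks (or setNumReduceTasks) asks for them, and the default is 1, which leaves most slots idle:

```python
# Cluster reduce capacity vs. the reducers a job actually requests.
def reduce_slots_used(nodes, slots_per_node, requested_reducers):
    """Slots a single job occupies: capped by cluster capacity."""
    cluster_capacity = nodes * slots_per_node
    return min(cluster_capacity, requested_reducers)

# 3 nodes * 14 slots = 42 available, but the default of 1 reducer
# (mapred.reduce.tasks) uses only a single slot:
print(reduce_slots_used(3, 14, 1))
```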

RE: Mapper Record Spillage

2012-03-10 Thread WangRamon
How many map/reduce task slots do you have on each node? If the total number is 10, then you will use 10 * 4096MB of memory when all tasks are running, which is more than the 32GB of total memory you have on each node. Date: Sat, 10 Mar 2012 20:00:13 -0800 Subject: Mapper Record Spillage From:
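The memory point in this reply is the same kind of arithmetic; the slot count and per-task heap are from the thread, and 32 GB is the node RAM figure quoted there:

```python
# Will concurrent task heaps overcommit a node's RAM?
def tasks_overcommit(total_slots, heap_mb_per_task, node_ram_mb):
    """True if all slots running at max heap exceed the node's memory."""
    return total_slots * heap_mb_per_task > node_ram_mb

# 10 slots * 4096 MB heap = 40960 MB, more than 32 GB of RAM:
print(tasks_overcommit(10, 4096, 32 * 1024))  # -> True
```

In practice one also needs headroom for the DataNode, TaskTracker, and OS, so the safe heap budget per node is even smaller than the raw RAM.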

RE: does hadoop always respect setNumReduceTasks?

2012-03-10 Thread WangRamon
Jane, I think you have mapred.tasktracker.reduce.tasks.maximum or mapred.reduce.tasks set to 1 locally, and set to some other values on EMR; that's why you always get one reducer locally and not on EMR. Cheers, Ramon Date: Thu, 8 Mar 2012 21:30:26 -0500 Subject:

Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.

2012-03-10 Thread WangRamon
Sorry for the duplicate; I sent this mail to the map/reduce mailing list but haven't got any useful response yet, so I think maybe I can get some suggestions here, thanks. Hi All, I'm using Hadoop-0.20-append; the cluster contains 3 nodes, and for each node I have 14 map and 14 reduce slots, here

RE: How to manage hadoop job submit?

2011-11-21 Thread WangRamon
(org.apache.hadoop.mapreduce.JobPriority) [1 - Stable API] - http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapred/JobConf.html#setJobPriority(org.apache.hadoop.mapred.JobPriority) HTH. On 20-Nov-2011, at 3:14 PM, WangRamon wrote: Hi All I'm new to hadoop, I know I

How to manage hadoop job submit?

2011-11-20 Thread WangRamon
Hi All, I'm new to Hadoop. I know I can use hadoop jar to submit my M/R job, but we need to submit a lot of jobs in my real environment, and there is a priority requirement for each job, so is there any way to manage how jobs are submitted? Any Java API? Or can we only use the hadoop command line with
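One way to manage many submissions with per-job priorities — the replies in this thread point at JobConf.setJobPriority and the JobClient API — is to order jobs before handing them to the cluster. This sketch uses a priority queue with hypothetical job names; the priority level names mirror Hadoop's JobPriority enum, but the queueing logic here is illustrative, not the Hadoop API itself:

```python
import heapq

# Sketch: submit jobs highest-priority-first. The level names mirror
# Hadoop's JobPriority enum; the job names are hypothetical examples.
PRIORITY = {"VERY_HIGH": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "VERY_LOW": 4}

def submission_order(jobs):
    """jobs: list of (name, priority) tuples -> names in submit order."""
    heap = [(PRIORITY[prio], i, name) for i, (name, prio) in enumerate(jobs)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

jobs = [("nightly-etl", "LOW"), ("billing", "VERY_HIGH"), ("report", "NORMAL")]
print(submission_order(jobs))  # billing first, nightly-etl last
```

The enumerate index keeps the sort stable, so jobs at the same priority run in submission order.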

RE: How to manage hadoop job submit?

2011-11-20 Thread WangRamon
- Original Message - From: WangRamon ramon_w...@hotmail.com To: common-user@hadoop.apache.org Cc: Sent: Sunday, November 20, 2011 4:44 AM Subject: How to manage hadoop job submit? Hi All, I'm new to Hadoop. I know I can use hadoop jar to submit my M/R job, but we need to submit a lot