Re: dynamically resizing Hadoop cluster on AWS?

2013-10-23 Thread Jun Ping Du
Moving this to the user@ alias. - Original Message - From: Jun Ping Du j...@vmware.com To: gene...@hadoop.apache.org Sent: Wednesday, October 23, 2013 10:03:27 PM Subject: Re: dynamically resizing Hadoop cluster on AWS? If only compute node (TaskTracker or NodeManager) in your instance

Re: rack awareness unexpected behaviour

2013-10-03 Thread Jun Ping Du
HDFS's current default replica placement policy does not handle the case of two unbalanced racks very well. Assume the local rack has more nodes, which means more reducer slots and more disk capacity; then more reducer tasks will run within the local rack. According to the replica placement policy, it will put

Re: Multidata center support

2013-08-30 Thread Jun Ping Du
Hi, Although you can set a datacenter layer in your network topology, it has never been enabled in Hadoop because replica placement and task scheduling do not support it. There is some work under HADOOP-8848 to add layers other than rack and node, but it may not suit your case. I agree with Adam that a
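For reference, a datacenter level is just an extra component of each node's network path, which Hadoop reads from a rack-topology script (configured by net.topology.script.file.name in Hadoop 2.x, or topology.script.file.name in 1.x). Hadoop runs the script with one or more IP addresses or hostnames as arguments and reads one path per argument from stdout. The sketch below is illustrative only: the subnet prefixes and /dcN/rackN paths are hypothetical, and, as the reply says, placement and scheduling will not treat the /dcN level as a separate layer.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script that prefixes rack paths with a datacenter level."""
import sys

# Hypothetical mapping from IP prefix to /datacenter/rack path.
# The trailing dot keeps "10.1.1." from matching "10.1.10.x".
SUBNET_TO_PATH = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
    "10.2.1.": "/dc2/rack1",
}
# Keep every path at the same depth; mixing depths may be rejected by Hadoop.
DEFAULT_PATH = "/dc-default/rack-default"


def resolve(host: str) -> str:
    """Return the network path for one host argument, or the default path."""
    for prefix, path in SUBNET_TO_PATH.items():
        if host.startswith(prefix):
            return path
    return DEFAULT_PATH


if __name__ == "__main__":
    # One path per argument, in the order Hadoop passed them.
    print("\n".join(resolve(arg) for arg in sys.argv[1:]))
```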

Re: rack awareness unexpected behaviour

2013-08-22 Thread Jun Ping Du
For 3 replicas, the replication sequence is: the 1st replica on the writer's local node, the 2nd on a node in a rack remote from the 1st replica, and the 3rd on a different node in the same rack as the 2nd replica. There are some special cases, such as the disk on the 1st node being full or no node being available in the 2nd replica's rack, and Hadoop already handles them well.
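The three-replica order above can be sketched as a small Python function. This is a simplified model, not HDFS's actual placement code: it assumes the writer itself runs on a DataNode, takes a hypothetical node-to-rack map, picks candidates at random, and skips the special cases (full disks, no available node) that Hadoop handles; where HDFS would fall back to another choice, this sketch raises ValueError.

```python
import random
from typing import Dict, List, Optional


def _pick(candidates: List[str], rng: random.Random, which: str) -> str:
    """Choose one candidate at random; fail loudly instead of falling back like HDFS."""
    if not candidates:
        raise ValueError(f"no candidate node for the {which} replica")
    return rng.choice(sorted(candidates))


def place_replicas(writer: str, node_rack: Dict[str, str],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Return target nodes for 3 replicas: local node, remote rack, same rack as the 2nd."""
    rng = rng or random.Random()
    # 1st replica: the writer's local node (assumed to be a DataNode).
    first = writer
    # 2nd replica: a node on a rack other than the 1st replica's rack.
    second = _pick([n for n, r in node_rack.items() if r != node_rack[first]],
                   rng, "2nd")
    # 3rd replica: a different node on the same rack as the 2nd replica.
    third = _pick([n for n, r in node_rack.items()
                   if r == node_rack[second] and n != second], rng, "3rd")
    return [first, second, third]


# Hypothetical two-rack cluster.
topology = {"a1": "/rack1", "a2": "/rack1", "b1": "/rack2", "b2": "/rack2"}
print(place_replicas("a1", topology, random.Random(0)))
```

With this layout, a writer on a1 always gets a1 plus both /rack2 nodes, so two of the three replicas land on the remote rack.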

Re: find a doc bug in description of fair-scheduler in yarn

2013-08-22 Thread Jun Ping Du
Hi, It should be fixed in the new version (2.1.0-beta); please refer to https://issues.apache.org/jira/browse/YARN-646 Thanks, Junping - Original Message - From: ch huang justlo...@gmail.com To: user@hadoop.apache.org Sent: Friday, August 23, 2013 10:30:42 AM Subject: find a doc bug

Re: MutableCounterLong and MutableCounterLong class difference in metrics v2

2013-08-09 Thread Jun Ping Du
Hi Lei, MutableCounterLong is a counter that can only be incremented (it is used where the count is often large compared with MutableCounterInt). It is used widely in the Hadoop metrics system, e.g. DatanodeMetrics. You can find more details on metrics v2 in the Hadoop wiki (
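To make the Int/Long difference concrete, here is a toy Python model (not the Hadoop classes) of an increment-only counter stored with Java's fixed-width signed arithmetic. It assumes, as their value types suggest, that MutableCounterInt holds a 32-bit Java int and MutableCounterLong a 64-bit long; the class and function names are hypothetical. A large count overflows and wraps negative at 32 bits long before it comes near the 64-bit limit.

```python
INT_BITS = 32   # Java int, as assumed for MutableCounterInt
LONG_BITS = 64  # Java long, as assumed for MutableCounterLong


def wrap_signed(value: int, bits: int) -> int:
    """Reduce value to a two's-complement signed integer of the given width."""
    value &= (1 << bits) - 1
    # If the sign bit is set, the stored pattern represents a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value


class IncrementOnlyCounter:
    """Toy counter exposing only incr(), with no decrement method."""

    def __init__(self, bits: int) -> None:
        self.bits = bits
        self.value = 0

    def incr(self, delta: int = 1) -> None:
        # Add, then wrap exactly as Java integer overflow would.
        self.value = wrap_signed(self.value + delta, self.bits)


c32 = IncrementOnlyCounter(INT_BITS)
c32.incr(2**31 - 1)   # reaches Java's Integer.MAX_VALUE
c32.incr()            # one more increment overflows
print(c32.value)      # -2147483648
```

The same two increments on an IncrementOnlyCounter(LONG_BITS) simply yield 2147483648.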

Re: Copy data from Mainframe to HDFS

2013-07-23 Thread Jun Ping Du
Hi Sandeep, I think Apache Oozie is what you are looking for. It provides workflow management for Hadoop (and Pig, Hive, etc.) jobs and supports running jobs repeatedly within a specific time period. Please refer to http://oozie.apache.org/docs/3.3.2/ for details. Thanks, Junping - Original

RE: TaskTracker Error

2012-02-23 Thread Jun Ping Du
I see from the log that the TaskTracker tries to connect to the IPC server at ubuntu.local/192.168.164.138:9100. Did you also set the correct mapred.job.tracker on the slave node? Thanks, Junping -Original Message- From: tgh [mailto:guanhua.t...@ia.ac.cn] Sent: Thursday, February 23, 2012 5:03 PM To:
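One quick way to answer the question above is to read mapred.job.tracker from each slave's configuration and compare it with the master's. In Hadoop 1.x this property normally lives in conf/mapred-site.xml as a host:port value. The helper below is hypothetical, a minimal sketch using Python's standard XML parser on the usual configuration/property/name/value layout; it does not expand ${...} variables or follow XInclude the way Hadoop's Configuration class can.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the trimmed <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical slave-node mapred-site.xml contents.
sample = """<configuration>
  <property><name>mapred.job.tracker</name><value>master:9001</value></property>
</configuration>"""
print(get_property(sample, "mapred.job.tracker"))
```

Run it against the mapred-site.xml on every slave; each should report the host:port the JobTracker on the master actually listens on.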

RE: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

2012-02-22 Thread Jun Ping Du
Hi Guruprasad, Do you have valid IP-to-hostname mappings in /etc/hosts so that each node can be reached by hostname? My guess is that the configuration works over the public network because hostnames can be resolved by DNS there. Thanks, Junping -Original Message- From: Guruprasad B
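The /etc/hosts check suggested above can be scripted. This minimal sketch parses /etc/hosts-style text and lists cluster hostnames that have no entry; the hostnames and IPs are hypothetical, and it ignores DNS, hostname case-insensitivity and the lookup order configured in /etc/nsswitch.conf.

```python
from typing import Dict, Iterable, List


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname or alias in /etc/hosts-style text to its IP (first entry wins)."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace; skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def missing_hosts(text: str, cluster_hosts: Iterable[str]) -> List[str]:
    """Return the cluster hostnames that have no mapping in the hosts text."""
    mapping = parse_hosts(text)
    return [h for h in cluster_hosts if h not in mapping]


hosts_text = """127.0.0.1     localhost
192.168.1.10  master   # NameNode / JobTracker
192.168.1.11  slave1
"""
print(missing_hosts(hosts_text, ["master", "slave1", "slave2"]))  # ['slave2']
```

Run it on every node against that node's own /etc/hosts with the full list of cluster hostnames; an empty list everywhere means each node has an entry for every other node.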