Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-10-29 Thread sam liu
Yes, you are correct: using the fsck tool I found that some files in my cluster expected more replicas than the value defined in dfs.replication. If I set the expected replication of these files to a proper number, the decommissioning process goes smoothly and the datanode can finally be decommissioned.
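
For reference, this kind of check and fix can be done with the stock Hadoop 1.x command line roughly as below; the /user/sam paths are only placeholders:

    # Blocks whose expected replication is higher than the number of live
    # datanodes show up with repl=3 in the fsck block listing.
    hadoop fsck / -files -blocks | grep "repl=3"

    # Lower the replication factor of an affected file, or of a whole tree
    # with -R; -w waits until the change has been applied.
    hadoop fs -setrep -w 2 /user/sam/some-file
    hadoop fs -setrep -R -w 2 /user/sam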

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-31 Thread Harsh J
As I said before, it is a per-file property, and the config can be bypassed by clients that do not read the configs, place a manual API override, etc. If you want to really define a hard maximum and catch such clients, try setting dfs.replication.max to 2 at your NameNode.
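
A minimal hdfs-site.xml fragment along those lines (the value 2 matches the 2-datanode cluster discussed here; a NameNode restart is assumed for the setting to take effect):

    <property>
      <name>dfs.replication.max</name>
      <value>2</value>
      <description>Hard upper limit on per-file replication; create
      requests asking for a higher factor are rejected.</description>
    </property>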

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-31 Thread sam liu
But please note that the value of 'dfs.replication' on the cluster has always been 2, even when the datanode number was 3. And I am pretty sure I did not manually create any files with repl=3. So why were some HDFS files created with repl=3 rather than repl=2?
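
Harsh's earlier remark about API overrides can be illustrated with a small hypothetical client (the class name, NameNode URI, and file path are made up for the example); run against a cluster whose hdfs-site.xml sets dfs.replication=2, it still produces a repl=3 file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplOverrideDemo {
      public static void main(String[] args) throws Exception {
        // A client-side Configuration that does not have the cluster's
        // hdfs-site.xml on its classpath falls back to the built-in
        // default of dfs.replication=3.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Explicit per-file override: this file is created with repl=3
        // no matter what the cluster-wide hdfs-site.xml says.
        FSDataOutputStream out = fs.create(new Path("/tmp/repl-demo.txt"), (short) 3);
        out.writeBytes("hello\n");
        out.close();
        fs.close();
      }
    }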

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-31 Thread Harsh J
Step (a) points to both your problem and your solution. You have files being created with repl=3 on a 2-DN cluster, which will prevent decommission. This is not a bug.

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-30 Thread sam liu
I opened a JIRA for tracking this issue: https://issues.apache.org/jira/browse/HDFS-5046

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-01 Thread sam liu
Yes, the default replication factor is 3. However, in my case it's strange: while the decommission hangs, I found that some blocks' expected replica count is 3, but the 'dfs.replication' value in hdfs-site.xml on every cluster node has been 2 from the beginning of the cluster setup. Below are my steps: 1. Install

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-06-21 Thread Harsh J
The dfs.replication property is a per-file parameter. If you have a client that does not use the supplied configs, then its default replication is 3, and all files it creates (as part of the app or via a job config) will have replication factor 3. You can do an -lsr to find all files and filter which ones have replication factor 3.
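
One possible form of that filtering (assuming the standard 'hadoop fs -lsr' listing, where the second column is the replication factor and directories show '-' there):

    # Recursively list everything and keep only files whose replication
    # factor is 3; the last field is the path.
    hadoop fs -lsr / | awk '$2 == "3" {print $NF}'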

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-06-21 Thread sam liu
Hi George, actually, in my hdfs-site.xml I always set 'dfs.replication' to 2, but I still encounter this issue. Thanks!
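
For completeness, the fragment sam describes in hdfs-site.xml would look roughly like this; as Harsh notes elsewhere in the thread, it only sets a default for clients that actually read this file and is not enforced by the NameNode:

    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>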

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-06-21 Thread George Kousiouris
Hi, I think I have faced this before. The problem is that you have the replication factor=3, so it seems to hang because it needs 3 nodes to achieve the factor (replicas are not created on the same node). If you set the replication factor=2, I think you will not have this issue. So in general you must
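
As an illustration of this point, a per-file fsck shows the target replication of each block and the datanodes holding its replicas (the path is only an example):

    # A block that wants 3 replicas on a 2-datanode cluster is reported as
    # under-replicated, since two replicas never share one node.
    hadoop fsck /user/sam/some-file -files -blocks -locations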

Hang when add/remove a datanode into/from a 2 datanode cluster

2013-06-21 Thread sam liu
Hi, I encountered an issue which hangs the decommission operation. The steps: 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and in hdfs-site.xml set 'dfs.replication' to 2. 2. Add node dn3 into the cluster as a new datanode, and do not change the 'dfs.replication' value
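
For context, the usual Hadoop 1.x decommission flow looks roughly like the sketch below; the exclude-file path and the choice of dn1 are only examples:

    <!-- hdfs-site.xml on the NameNode -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/dfs.exclude</value>
    </property>

    # Add the hostname of the datanode to be decommissioned, then ask the
    # NameNode to re-read its include/exclude lists.
    echo "dn1" >> /etc/hadoop/dfs.exclude
    hadoop dfsadmin -refreshNodes

    # The node stays in "Decommission in progress" until all of its blocks
    # are re-replicated elsewhere, which is exactly what cannot finish when
    # a file expects 3 replicas but only 2 datanodes would remain.
    hadoop dfsadmin -report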