Hi Hilmi!

The topology script / DNSToSwitchMapping tells the NameNode about the topology of the cluster: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html
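The Rack Awareness page linked above describes the script contract. Hadoop runs the executable named by net.topology.script.file.name with one or more DataNode IP addresses or host names as arguments, and reads back one rack path per argument. The following is a minimal Python sketch of such a script. The RACKS table and its addresses are hypothetical placeholders for your own inventory. The fallback value reuses Hadoop's default rack name, /default-rack.

```python
#!/usr/bin/env python3
"""Hypothetical topology script for net.topology.script.file.name.

Hadoop passes DataNode IPs/host names as arguments and expects one
rack path per argument on stdout, in the same order.
"""
import sys

# Hypothetical host-to-rack table; replace with your real inventory.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "10.0.2.22": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, falling back to the default rack."""
    return [RACKS.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # Print one rack path per line, in argument order.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Running it by hand, e.g. `./topology.py 10.0.1.11 10.0.2.22`, is a quick way to check what the NameNode will see before pointing the property at it.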
You can trace through https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java#L805 to find out how re-replications are ordered. If you start the NameNode with the environment variable

export HADOOP_NAMENODE_OPTS='-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1049'

set, you can connect a debugger to it on port 1049. You might want to set breakpoints in BlockManager.updateNeededReconstructions() (https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4148) and BlockManager.computeDatanodeWork() (https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4508).

I suspect most of what you are looking for is in BlockPlacementPolicyDefault.chooseTarget() (https://github.com/apache/hadoop/blob/48899134d2a77935a821072b5388ab1b1b7b399c/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L134).

Also, please be aware that the code has changed a lot across versions, thanks to incredible contributions from the community. If you're trying to debug something, please make sure you follow the links for the right branch.

HTH
Ravi

On Wed, Aug 2, 2017 at 4:31 AM, Hilmi Egemen Ciritoğlu <
hilmi.egemen.cirito...@gmail.com> wrote:

> Hi guys,
>
> I have spent a lot of time reading about setting the replication factor as
> well as block placement, but I still wonder how the setrep command works
> behind the scenes in the code.
>
> I am looking for answers to the following questions:
>
> If you have one rack and you increase or decrease the replication factor,
> will the block distribution be randomised or based on disk usage etc.
> (apart from, or after, rack awareness)?
>
> And what if I have 5 racks and replication factor 4? I am looking for
> corner cases to understand it completely.
>
> I would really appreciate it if you could answer my questions and explain
> the code side a bit more too.
>
> Regards,
> Egemen
>
>
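For the "5 racks, replication factor 4" corner case in the quoted question, the per-rack cap that BlockPlacementPolicyDefault applies can be worked out by hand. Below is a Python transcription of the arithmetic in its getMaxNodesPerRack() helper as I read it around the commit linked above. Treat it as a sketch, not the real implementation, and check it against your branch, since this logic has moved between versions. The function and parameter names are my own. The setrep example also assumes that the existing live replicas are passed in as "already chosen" during re-replication.

```python
def max_nodes_per_rack(num_chosen, num_replicas, cluster_size, num_racks):
    """Return (replicas_to_place, max_replicas_per_rack).

    Sketch of the arithmetic in BlockPlacementPolicyDefault.getMaxNodesPerRack();
    names are illustrative, verify the logic against your Hadoop branch.
    """
    total = num_chosen + num_replicas
    # Never plan for more replicas than there are DataNodes in the cluster.
    if total > cluster_size:
        num_replicas -= total - cluster_size
        total = cluster_size
    # A single rack (or a single replica) gets no per-rack cap.
    if num_racks == 1 or total <= 1:
        return num_replicas, total
    max_per_rack = (total - 1) // num_racks + 2
    # Avoid a cap that would still let every replica land on one rack.
    if max_per_rack == total:
        max_per_rack -= 1
    return num_replicas, max_per_rack


if __name__ == "__main__":
    # Fresh block, replication factor 4, 5 racks, 50 DataNodes (hypothetical sizes).
    print(max_nodes_per_rack(0, 4, cluster_size=50, num_racks=5))  # (4, 2)
    # setrep from 3 to 5 on the same cluster: 3 live replicas count as chosen.
    print(max_nodes_per_rack(3, 2, cluster_size=50, num_racks=5))  # (2, 2)
    # One rack, replication factor 3: no rack constraint at all.
    print(max_nodes_per_rack(0, 3, cluster_size=10, num_racks=1))  # (3, 3)
```

With 5 racks and replication factor 4 the cap is (4 - 1) // 5 + 2 = 2. No rack may hold more than two of the four replicas, so the block must span at least two racks. With a single rack the cap is simply the replica count. Rack awareness then imposes nothing, and placement comes down to which individual DataNodes the policy accepts as targets.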