The rack awareness feature was introduced to distribute data blocks across multiple racks, so that data is not lost if a whole rack fails.
When reading or writing data blocks, data locality with respect to the client is also considered in order to find the closest datanode. To determine the nearest datanode in terms of rack mapping, the client's rack details are required. That is why the namenode resolves the client's rack mapping even when the client is not a datanode. If the script returns the client's real rack, a datanode on the local rack will be chosen for reads, which improves performance. If the default rack is returned for a non-datanode IP, a random datanode will be chosen to read the data.

Hope this helps,
Cheers,
-Vinay

On 3 Jun 2016 03:37, "Colin Kincaid Williams" <disc...@uw.edu> wrote:

Recently we had a namenode that had a failed edits directory, and there was a failover. Things appeared to be functioning properly at first, but later we had HDFS issues. Looking at the namenode logs, we saw:

2016-06-01 20:38:18,771 ERROR org.apache.hadoop.net.ScriptBasedMapping: Script /etc/hadoop/conf/getRackID.sh returned 0 values when 1 were expected.
2016-06-01 20:38:18,771 WARN org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.51.28.100:42826 Call#484441029 Retry#0
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:359)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1774)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

So we could see that our rack awareness script was not returning a value. We then changed the script to report the arguments it was being called with. We found a list of IPs, some of which run services like Oozie and some of which belong to our gateway server. However, none of these IPs are the datanodes themselves. The symptoms of this issue were that the namenode itself couldn't cat files on the system, or make requests to move files on the history server, etc.

From my understanding of rack awareness, we just need to provide a rack ID for hosts that are datanodes. However, all our datanodes were listed, and the requested IPs were from non-datanodes. The solution was to provide a default rack for missing IPs in the rack awareness script. This is not made clear in the rack awareness docs, and it caused a DoS on our Hadoop services.

But I want to know why the rack awareness script is getting called with IPs of non-datanodes from our Hadoop namenode. Is this a design feature of the YARN libraries? Why do non-datanode IPs need a rack ID?
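
For readers hitting the same error, the fix described in this thread (returning a default rack for any address the script does not know) can be sketched roughly as follows. This is only an illustration in Python, not the poster's actual getRackID.sh. The host-to-rack table, the rack names, and the `resolve_racks` helper are hypothetical, and `/default-rack` is used on the assumption that it matches the default rack name your cluster expects. The sketch assumes the namenode invokes the script with one or more IPs or hostnames as arguments and expects exactly one rack per argument on standard output, which is what the "returned 0 values when 1 were expected" message suggests.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script that never returns fewer racks than inputs."""
import sys

# Assumed fallback rack for any host that is not in the table below
# (clients, gateways, Oozie servers, and so on).
DEFAULT_RACK = "/default-rack"

# Hypothetical mapping of datanode IPs to racks; replace with real data.
RACK_BY_HOST = {
    "10.51.28.11": "/dc1/rack1",
    "10.51.28.12": "/dc1/rack1",
    "10.51.28.21": "/dc1/rack2",
}


def resolve_racks(hosts, rack_by_host=RACK_BY_HOST, default_rack=DEFAULT_RACK):
    """Return one rack per input host, falling back to default_rack."""
    return [rack_by_host.get(host, default_rack) for host in hosts]


if __name__ == "__main__":
    # One rack per argument, separated by spaces, in the same order.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

The key design point is that the output always has the same number of entries as the input, so a lookup from a non-datanode client such as 10.51.28.100 resolves to the default rack instead of producing an empty result.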