Hi Vinay, thanks for your insight.
The behavior makes sense to me now, and I appreciate the ability to choose
between rack-local reads and a random (round-robin) datanode. However, I
think the client API behavior may have changed, because our first rack
awareness script from early Hadoop 2.0.0 didn't provide a default rack
either, yet I don't recall seeing these issues back then.

With respect to my current issue: we had noticed that we could not
hdfs dfs -cat any files from the namenode itself. We had modified our rack
awareness script to log its caller arguments with echo $1 >> /tmp/foo, but
I never found the namenode's IP, or the loopback address, in that file, so
the namenode did not appear to be requesting rack information for itself.
Yet after adding the default rack to the script, the issue went away. If
rack awareness never looked up the namenode's IP, why did we see the
following behavior on the namenode itself?

sudo -u hdfs hdfs dfs -cat /user/history/done_intermediate/hdfs/job_1464883770011_0005.summary
cat: java.lang.NullPointerException
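For the archives, the fallback we added amounts to something like the
sketch below. The topology file path and its "<ip> <rack>" line format are
just our own convention, not anything Hadoop mandates, so treat this as
illustrative rather than a drop-in script:

    #!/bin/bash
    # Sketch of getRackID.sh with a default-rack fallback.
    # The NameNode may pass several addresses in one invocation and
    # expects exactly one rack on stdout for each address it passed.
    TOPOLOGY_FILE=/etc/hadoop/conf/topology.data  # assumed format: "<ip> <rack>" per line
    DEFAULT_RACK=/default-rack

    for addr in "$@"; do
      rack=$(awk -v ip="$addr" '$1 == ip { print $2; exit }' "$TOPOLOGY_FILE")
      # Printing nothing here is what produced "returned 0 values when 1
      # were expected" in our logs, so unknown addresses get the default rack.
      echo "${rack:-$DEFAULT_RACK}"
    done

With this in place, running the script by hand against a non-datanode IP
prints /default-rack instead of nothing.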
On Fri, Jun 3, 2016 at 1:14 AM, Vinayakumar B <vinayakum...@apache.org> wrote:
> The rack awareness feature was introduced to distribute data blocks
> among multiple racks, to avoid data loss in case a whole rack fails.
>
> When reading/writing data blocks, data locality with respect to the
> client is considered in order to find the closest replica. To know the
> nearest datanode in terms of rack mapping for the client, the client's
> rack details are required. That is why the namenode resolves the
> client's rack mapping even when the client is not a datanode. By giving
> the correct rack details, a local-rack datanode will be chosen for
> reads, improving performance.
> In case the default rack is given for a non-datanode IP, a random
> datanode will be chosen to read the data.
>
> Hope this helps,
>
> Cheers,
> -Vinay
>
> On 3 Jun 2016 03:37, "Colin Kincaid Williams" <disc...@uw.edu> wrote:
>
> Recently we had a namenode with a failed edits directory, and there was
> a failover. Things appeared to be functioning properly at first, but
> later we had HDFS issues.
>
> Looking at the namenode logs, we saw:
>
> 2016-06-01 20:38:18,771 ERROR org.apache.hadoop.net.ScriptBasedMapping:
> Script /etc/hadoop/conf/getRackID.sh returned 0 values when 1 were expected.
> 2016-06-01 20:38:18,771 WARN org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8020, call
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from
> 10.51.28.100:42826 Call#484441029 Retry#0
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:359)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1774)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
> So we could see that our rack awareness script was not returning a
> value. We then changed the script to log the arguments it was called
> with, and found a list of IPs: some run services like Oozie, and some
> belong to our gateway servers. However, none of these IPs are the
> datanodes themselves.
>
> The symptoms of this issue were that the namenode itself couldn't cat
> files on the filesystem, make requests to move files on the history
> server, etc.
>
> From my understanding of rack awareness, we only need to provide a rack
> id for hosts that are datanodes. However, all our datanodes were
> already listed, and the IPs being requested were from non-datanodes.
>
> The solution was to provide a default rack for IPs missing from the
> rack awareness script. This is not well explained in the rack awareness
> docs, and the gap caused a DoS of our Hadoop services.
>
> But I want to know why the rack awareness script is being called with
> non-datanode IPs on our Hadoop namenode. Is this a design feature of
> the YARN libraries? Why do non-datanode IPs need a rack id?
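One more note for anyone hitting the same thing: since the script was
already logging its caller arguments to /tmp/foo, replaying those
addresses is a quick sanity check. Assuming the script path from the log
above, the two counts below should match; a shortfall in the first
reproduces the "returned 0 values when 1 were expected" error from
ScriptBasedMapping:

    sort -u /tmp/foo | xargs /etc/hadoop/conf/getRackID.sh | wc -l
    sort -u /tmp/foo | wc -l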