Hello Colin,

Judging from the stack trace, I think you've hit a known HDFS bug: HDFS-8055. A fix for this bug has been committed for the upcoming Apache Hadoop 2.8.0 release.
https://issues.apache.org/jira/browse/HDFS-8055

--Chris Nauroth

On 6/3/16, 1:21 PM, "Colin Kincaid Williams" <disc...@uw.edu> wrote:

>Hi,
>
>Thanks for your insight, Vinay.
>
>It makes sense using it now, and I appreciate the ability to select
>which rack or round-robin. However, I think the client API behavior
>might have changed, because our first rack awareness script from early
>Hadoop 2.0.0 didn't provide a default rack, and I don't recall these
>issues.
>
>With respect to my current issue: we had noticed that we could not
>hdfs dfs -cat any files from our namenode itself. We had made our rack
>awareness script log its caller arguments with echo $1 >> /tmp/foo.
>I didn't find the IP of the namenode, or of the loopback interface, in
>that file, so the script did not appear to be asked for the namenode's
>rack. However, after adding the default rack to the script, the issue
>went away. If the rack awareness script never wrote the namenode IP
>into the file, why did we see the following behavior from the namenode
>itself?
>
>sudo -u hdfs hdfs dfs -cat /user/history/done_intermediate/hdfs/job_1464883770011_0005.summary
>
>cat: java.lang.NullPointerException
>
>On Fri, Jun 3, 2016 at 1:14 AM, Vinayakumar B <vinayakum...@apache.org> wrote:
>> The rack awareness feature was introduced to distribute data blocks
>> across multiple racks, to avoid data loss if a whole rack fails.
>>
>> Now, while reading or writing data blocks, data locality with respect
>> to the client is considered in order to find the closest datanode. To
>> know the nearest datanode in terms of rack mapping for the client, the
>> client's rack details are required. That's why the client's rack
>> mapping is resolved by the namenode even for clients that are not
>> datanodes. If the correct rack details are given, a datanode on the
>> local rack is chosen for the read, which improves performance.
>> If the default rack is given for a non-datanode IP, a random datanode
>> is chosen to read the data.
>>
>> Hope this helps,
>>
>> Cheers,
>> -Vinay
>>
>> On 3 Jun 2016 03:37, "Colin Kincaid Williams" <disc...@uw.edu> wrote:
>>
>> Recently we had a namenode that had a failed edits directory, and
>> there was a failover. Things appeared to be functioning properly at
>> first, but later we had HDFS issues.
>>
>> Looking at the namenode logs, we saw
>>
>> 2016-06-01 20:38:18,771 ERROR org.apache.hadoop.net.ScriptBasedMapping: Script /etc/hadoop/conf/getRackID.sh returned 0 values when 1 were expected.
>> 2016-06-01 20:38:18,771 WARN org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.51.28.100:42826 Call#484441029 Retry#0
>> java.lang.NullPointerException
>>   at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:359)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1774)
>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
>>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>
>> So we could see that our rack awareness script was not returning a
>> value. We then changed the script to log the arguments it is called
>> with. We found a list of IPs: some run services like Oozie, and some
>> belong to our gateway server. However, none of these IPs are the
>> datanodes themselves.
>>
>> The symptoms of this issue were that the namenode itself couldn't cat
>> files on the system, or make requests to move files on the history
>> server, etc.
>>
>> From my understanding of rack awareness, we just need to provide a
>> rack ID for hosts that are datanodes. However, all our datanodes were
>> listed, and the requested IPs were from non-datanodes.
>>
>> The solution was to provide a default rack for missing IPs in the rack
>> awareness script. This is not clear from the rack awareness docs, and
>> it caused a denial of service on our Hadoop services.
>>
>> But I want to know why the rack awareness script is getting called
>> with IPs of non-datanodes from our Hadoop namenode. Is this a design
>> feature of the YARN libraries? Why do non-datanode IPs need a rack
>> ID?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: user-h...@hadoop.apache.org
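
For anyone finding this thread later: the workaround discussed above (have the rack awareness script answer with a default rack for any address it does not know) could look roughly like the sketch below. It is not the poster's getRackID.sh. It assumes the script is the one configured through net.topology.script.file.name, that Hadoop may pass several addresses in one call and expects one rack per argument on stdout, and that the datanode IPs and rack names in RACK_BY_HOST are made-up placeholders. Only 10.51.28.100 (the non-datanode client from the log) and /default-rack (Hadoop's usual default rack name) come from real context.

```python
#!/usr/bin/env python3
"""Sketch of a rack awareness (topology) script with a default rack."""
import sys

# Rack returned for any address that is not in the table below, so the
# script never prints fewer values than the number of arguments it got.
DEFAULT_RACK = "/default-rack"

# Hypothetical datanode-to-rack table; replace with your own hosts.
RACK_BY_HOST = {
    "10.51.28.11": "/rack1",
    "10.51.28.12": "/rack2",
}


def resolve(hosts):
    """Return one rack per host, in order, using DEFAULT_RACK for unknowns."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the addresses as command-line arguments and reads
    # the racks back as whitespace-separated values on stdout.
    print(" ".join(resolve(sys.argv[1:])))
```

With this in place, a call for a gateway or service host such as 10.51.28.100 yields the default rack instead of empty output, which is the case that produced the "returned 0 values when 1 were expected" error in the log.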