Hi Vinay, thanks for your insight.
The behavior makes sense to me now, and I appreciate the ability to choose
between rack-local reads and a random (round-robin) datanode. However, I
think the client API behavior may have changed, because our first rack
awareness script from early Hadoop 2.0.0 didn't provide a default rack
either, yet I don't recall seeing these issues back then.

With respect to my current issue: we had noticed that we could not
hdfs dfs -cat any files from the namenode itself. We had modified our rack
awareness script to log its caller arguments with echo $1 >> /tmp/foo, but
I never found the namenode's IP, or the loopback address, in that file, so
the namenode did not appear to be requesting rack information for itself.
Yet after adding the default rack to the script, the issue went away. If
rack awareness never looked up the namenode's IP, why did we see the
following behavior on the namenode itself?

sudo -u hdfs hdfs dfs -cat /user/history/done_intermediate/hdfs/job_1464883770011_0005.summary
cat: java.lang.NullPointerException
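For the archives, the fallback we added amounts to something like the
sketch below. The topology file path and its "<ip> <rack>" line format are
just our own convention, not anything Hadoop mandates, so treat this as
illustrative rather than a drop-in script:

    #!/bin/bash
    # Sketch of getRackID.sh with a default-rack fallback.
    # The NameNode may pass several addresses in one invocation and
    # expects exactly one rack on stdout for each address it passed.
    TOPOLOGY_FILE=/etc/hadoop/conf/topology.data  # assumed format: "<ip> <rack>" per line
    DEFAULT_RACK=/default-rack

    for addr in "$@"; do
      rack=$(awk -v ip="$addr" '$1 == ip { print $2; exit }' "$TOPOLOGY_FILE")
      # Printing nothing here is what produced "returned 0 values when 1
      # were expected" in our logs, so unknown addresses get the default rack.
      echo "${rack:-$DEFAULT_RACK}"
    done

With this in place, running the script by hand against a non-datanode IP
prints /default-rack instead of nothing.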
On Fri, Jun 3, 2016 at 1:14 AM, Vinayakumar B <vinayakum...@apache.org> wrote:
> The rack awareness feature was introduced to distribute data blocks
> among multiple racks, to avoid data loss in case a whole rack fails.
>
> When reading/writing data blocks, data locality with respect to the
> client is considered in order to find the closest replica. To know the
> nearest datanode in terms of rack mapping for the client, the client's
> rack details are required. That is why the namenode resolves the
> client's rack mapping even when the client is not a datanode. By giving
> the correct rack details, a local-rack datanode will be chosen for
> reads, improving performance.
> In case the default rack is given for a non-datanode IP, a random
> datanode will be chosen to read the data.
>
> Hope this helps,
>
> Cheers,
> -Vinay
>
> On 3 Jun 2016 03:37, "Colin Kincaid Williams" <disc...@uw.edu> wrote:
>
> Recently we had a namenode with a failed edits directory, and there was
> a failover. Things appeared to be functioning properly at first, but
> later we had HDFS issues.
>
> Looking at the namenode logs, we saw:
>
> 2016-06-01 20:38:18,771 ERROR org.apache.hadoop.net.ScriptBasedMapping:
> Script /etc/hadoop/conf/getRackID.sh returned 0 values when 1 were expected.
> 2016-06-01 20:38:18,771 WARN org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8020, call
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from
> 10.51.28.100:42826 Call#484441029 Retry#0
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:359)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1774)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
> So we could see that our rack awareness script was not returning a
> value. We then changed the script to log the arguments it was called
> with, and found a list of IPs: some run services like Oozie, and some
> belong to our gateway servers. However, none of these IPs are the
> datanodes themselves.
>
> The symptoms of this issue were that the namenode itself couldn't cat
> files on the filesystem, make requests to move files on the history
> server, etc.
>
> From my understanding of rack awareness, we only need to provide a rack
> id for hosts that are datanodes. However, all our datanodes were
> already listed, and the IPs being requested were from non-datanodes.
>
> The solution was to provide a default rack for IPs missing from the
> rack awareness script. This is not well explained in the rack awareness
> docs, and the gap caused a DoS of our Hadoop services.
>
> But I want to know why the rack awareness script is being called with
> non-datanode IPs on our Hadoop namenode. Is this a design feature of
> the YARN libraries? Why do non-datanode IPs need a rack id?
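One more note for anyone hitting the same thing: since the script was
already logging its caller arguments to /tmp/foo, replaying those
addresses is a quick sanity check. Assuming the script path from the log
above, the two counts below should match; a shortfall in the first
reproduces the "returned 0 values when 1 were expected" error from
ScriptBasedMapping:

    sort -u /tmp/foo | xargs /etc/hadoop/conf/getRackID.sh | wc -l
    sort -u /tmp/foo | wc -l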