Re: Why do non data nodes need rack awareness?

2016-06-03 Thread Chris Nauroth
Hello Colin, Judging from the stack trace, I think you've hit a known HDFS bug: HDFS-8055. A fix for this bug has been committed for the upcoming Apache Hadoop 2.8.0 release. https://issues.apache.org/jira/browse/HDFS-8055 --Chris Nauroth On 6/3/16, 1:21 PM, "Colin Kincaid Williams"

Re: Why do non data nodes need rack awareness?

2016-06-03 Thread Colin Kincaid Williams
Hi, Thanks for your insight, Vinay. Using it makes sense now, and I appreciate the ability to select a specific rack or round-robin. However, I think the client API behavior might have changed, because our first rack awareness script from early Hadoop 2.0.0 didn't provide a default IP, but I don't

Re: Why do non data nodes need rack awareness?

2016-06-02 Thread Vinayakumar B
The rack awareness feature was introduced to distribute data blocks across multiple racks, to avoid data loss if a whole rack fails. When reading or writing data blocks, data locality with respect to the client is considered in order to find the closest datanode. To know the nearest datanode in terms of
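
As an illustration of the mechanism discussed in this thread, here is a minimal sketch of a topology script of the kind Hadoop invokes through `net.topology.script.file.name`. It assumes the usual contract: Hadoop passes one or more IP addresses or hostnames as arguments and expects one rack path per argument on standard output. The IP-to-rack table and the `resolve` helper are hypothetical, not taken from anyone's cluster. The fallback to `/default-rack` is the part relevant here, since clients and other non-data nodes may be passed to the script as well.

```python
#!/usr/bin/env python3
"""Hypothetical rack topology script sketch (not from the original thread)."""
import sys

# Hypothetical mapping from datanode IP to rack path; replace with real data.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}

# Rack returned for any host that is not in the table (for example a client
# or another non-data node), so every argument still gets an answer.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per input host, in the same order."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print exactly one rack per argument, separated by spaces.
    print(" ".join(resolve(sys.argv[1:])))
```

Returning a default for unknown hosts keeps the number of output values equal to the number of arguments, whichever node happens to be looked up.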

Why do non data nodes need rack awareness?

2016-06-02 Thread Colin Kincaid Williams
Recently we had a namenode with a failed edits directory, and there was a failover. Things appeared to be functioning properly at first, but later we had HDFS issues. Looking at the namenode logs, we saw: 2016-06-01 20:38:18,771 ERROR org.apache.hadoop.net.ScriptBasedMapping: Script