Thanks, that’s interesting information.  Use of an Edge Node sounds like a 
useful convention.  We are software vendors, and we want to connect to any 
Hadoop cluster regardless of configuration.  How does the Edge Node support 
connections to HDFS from the client?  Doesn’t the HDFS FileSystem require 
direct connections to each DataNode?  Does such an Edge Node proxy all of those 
connections automatically, or does our software need to be made aware of this 
convention somehow?
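
For context, here is roughly what our client code does today (a minimal sketch assuming direct NameNode access; the hostname, port, and path are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;

    public class RemoteHdfsRead {
        public static void main(String[] args) throws Exception {
            // Only the NameNode RPC endpoint is given explicitly;
            // hostname and port below are placeholders.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://namenode.example.com:8020"), conf);
            // open() asks the NameNode for block locations; the client
            // then streams block data directly from each DataNode,
            // which is why we ask whether an edge node can proxy this.
            try (FSDataInputStream in =
                     fs.open(new Path("/user/demo/sample.txt"))) {
                byte[] buf = new byte[4096];
                int n = in.read(buf);
                System.out.println("read " + n + " bytes");
            }
        }
    }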

Thanks,
John


From: Rishi Yadav [mailto:ri...@infoobjects.com]
Sent: Saturday, June 07, 2014 8:20 AM
To: user@hadoop.apache.org
Subject: Re: Gathering connection information

Typically users ssh to the edge node, which is co-located with the cluster. 
This also minimizes latency between the client and the cluster.




On Sat, Jun 7, 2014 at 7:12 AM, Peyman Mohajerian 
<mohaj...@gmail.com<mailto:mohaj...@gmail.com>> wrote:
In my experience you build a node called an Edge Node, which has all the 
libraries and the XML configuration settings needed to connect to the cluster; 
it just doesn't have any of the Hadoop daemons running.
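
As a rough sketch of what a client program on the edge node looks like (the 
/etc/hadoop/conf path is typical but varies by distro):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class EdgeNodeClient {
        public static void main(String[] args) throws Exception {
            // On an edge node the cluster's XML configs are already on
            // disk, so nothing is hard-coded here; fs.defaultFS and the
            // rest are picked up from the files.
            Configuration conf = new Configuration();
            conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Connected to: " + fs.getUri());
        }
    }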

On Wed, Jun 4, 2014 at 2:46 PM, John Lilley 
<john.lil...@redpoint.net<mailto:john.lil...@redpoint.net>> wrote:
We’ve found that many of the Hadoop samples assume they are being run from a 
cluster node, and that the connection information can be gleaned directly from 
a configuration object.  However, we always run our client from a
remote computer, and our users must manually specify the NN/RM addresses and 
ports.  We’ve found this varies maddeningly between distros and especially on 
hosted virtual implementations.  Getting the wrong port results in various 
inscrutable errors with red-herring messages about security.  Is there a 
prescribed way to get the correct connection information more easily, like from 
a web API (where at least we’d only need one address and port)?
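
For reference, here is roughly how we wire in the user-supplied addresses 
today (a sketch; hostnames are placeholders, and the property names are the 
standard Hadoop/YARN ones):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ManualConnect {
        public static void main(String[] args) throws Exception {
            // Everything below is typed in by the user; hostnames and
            // ports are placeholders, and the "right" ports differ
            // between distros, which is the pain point.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
            conf.set("yarn.resourcemanager.address",
                     "rm.example.com:8032");
            conf.set("yarn.resourcemanager.scheduler.address",
                     "rm.example.com:8030");
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Connected to: " + fs.getUri());
        }
    }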

John


