accessing hdfs cluster through ssh tunnel

2019-09-12 Thread saurabh pratap singh
Hadoop version: 2.8.5. I have an HDFS cluster set up in a private data center (which is not exposed to the internet). In the same data center I have another node (the gateway node). The purpose of this gateway node is to provide access to HDFS from an edge machine (which is outside the data center) through public int

Re: accessing hdfs cluster through ssh tunnel

2019-09-12 Thread Tony S. Wu
You need connectivity from the edge node to the entire cluster, not just the NameNode. Your topology, unfortunately, probably won’t work too well. A proper VPN / IPsec tunnel might be a better idea. On Thu, Sep 12, 2019 at 12:04 AM saurabh pratap singh < saurabh.cs...@gmail.com> wrote: > Hadoop version :
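
The point about needing connectivity to the whole cluster can be checked from the edge node before trying any HDFS command. The following is a minimal sketch, not part of the original thread: the host names in the usage note are hypothetical, and the ports (8020 for NameNode RPC, 50010 for DataNode data transfer) are the usual Hadoop 2.x defaults, which a given cluster may override.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable(endpoints, timeout=3.0):
    """Return the (host, port) pairs from endpoints that cannot be reached."""
    return [(h, p) for h, p in endpoints if not can_connect(h, p, timeout)]
```

For example, `unreachable([("namenode.internal", 8020), ("datanode1.internal", 50010)])` would list the endpoints the edge node cannot open a socket to. If only the NameNode is reachable, listing a directory may work while reads and writes time out, because file data is exchanged directly with the DataNodes.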

Re: accessing hdfs cluster through ssh tunnel

2019-09-12 Thread saurabh pratap singh
Thank you so much for your reply. I have a further question: there are some blogs that describe a similar setup, like this one: https://github.com/vkovalchuk/hadoop-2.6.0-windows/wiki/How-to-access-HDFS-behind-firewall-using-SOCKS-proxy I am just curious how that works. On Thu, Sep 12,

Re: accessing hdfs cluster through ssh tunnel

2019-09-13 Thread Julien Laurenceau
Hi, Hadoop is designed to avoid proxies, since a proxy would act as a bottleneck. The NameNode is only used to set up a direct socket connection between the client and the DataNodes, which is specific to each job. On Fri, Sep 13, 2019 at 14:21, Tony S. Wu wrote: > You need connectivity from edge node to the entire cluster, not just > namenode

Re: accessing hdfs cluster through ssh tunnel

2019-09-13 Thread Hariharan Iyer
You will have to use a SOCKS proxy (the -D option of the ssh tunnel). In addition, when invoking the hadoop fs command, you will have to add -Dsocks.proxyHost and -Dsocks.proxyPort. Thanks, Hariharan On Thu, 12 Sep 2019, 23:26 saurabh pratap singh, wrote: > Thank you so much for your reply . > I have furt
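
As a rough illustration of the ssh side of this suggestion, the sketch below only builds the command line for a dynamic (SOCKS) forward and does not run it. The gateway host, user name, and local port 1080 are hypothetical placeholders, not values from the thread.

```python
import shlex


def ssh_socks_command(gateway, local_port=1080, user=None):
    """Build an ssh command that opens a SOCKS proxy on a local port.

    -N: do not run a remote command, only forward traffic.
    -D: dynamic application-level (SOCKS) port forwarding.
    """
    target = f"{user}@{gateway}" if user else gateway
    return ["ssh", "-N", "-D", str(local_port), target]


# Hypothetical gateway and user; print the command instead of executing it.
print(shlex.join(ssh_socks_command("gateway.example.com", 1080, "hadoop")))
```

Connections made through the resulting proxy are opened from the gateway node, so cluster-internal addresses only need to be reachable from the gateway, not from the edge machine.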

Re: accessing hdfs cluster through ssh tunnel

2019-09-13 Thread saurabh pratap singh
Thank you all for your help. The solution that worked for me is as follows: I opened an ssh tunnel for the NameNode, which ensures that hadoop fs -ls works. In order for hadoop fs -put to work (as it was timing out because the NameNode was returning private IP addresses of the DataNodes, which can't be resolved by the edge ma

Re: accessing hdfs cluster through ssh tunnel

2019-09-16 Thread saurabh pratap singh
Hi all, I was not satisfied with the above-mentioned approach, so I tried the Hadoop SOCKS server configuration on the client side and used ssh with the -D option, as suggested by Hariharan Iyer (thank you for that). It worked as expected, without the need to open separate ssh tunnels for the DataNodes. Thanks. On
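
The message does not say which client settings were used. One plausible reading, shown here as a sketch under that assumption, is the pair of standard Hadoop properties `hadoop.rpc.socket.factory.class.default` and `hadoop.socks.server`, passed as generic `-D` options. The proxy address `localhost:1080` and the example path are hypothetical; the function only builds the command line.

```python
import shlex

SOCKS_FACTORY = "org.apache.hadoop.net.SocksSocketFactory"


def hadoop_fs_via_socks(fs_args, socks_server="localhost:1080"):
    """Build a `hadoop fs` command whose sockets go through a SOCKS proxy.

    fs_args: the usual `hadoop fs` arguments, e.g. ["-ls", "/"].
    socks_server: host:port of the proxy opened with `ssh -D` (assumed value).
    """
    return [
        "hadoop", "fs",
        # Generic options must come before the file system sub-command.
        "-D", f"hadoop.rpc.socket.factory.class.default={SOCKS_FACTORY}",
        "-D", f"hadoop.socks.server={socks_server}",
        *fs_args,
    ]


# Hypothetical usage: list the HDFS root through the local SOCKS proxy.
print(shlex.join(hadoop_fs_via_socks(["-ls", "/"])))
```

The same two properties could presumably be set in the client's core-site.xml instead of on every command line, which keeps the edge machine's configuration separate from the cluster's own.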