You will have to use a SOCKS proxy (the -D option of ssh). In addition,
when invoking the hadoop fs command, you will have to add -Dsocks.proxyHost
and -Dsocks.proxyPort.
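
A rough sketch of what this could look like, not tested against your
cluster. The port 1080, the file name and the placeholder hosts are
assumptions. Note that the sketch uses Hadoop's own client properties,
hadoop.socks.server and hadoop.rpc.socket.factory.class.default (set to
org.apache.hadoop.net.SocksSocketFactory), passed through the generic -D
option, rather than the flag names mentioned above. Treat that as a
substitution. The script only prints the two commands so they can be
reviewed before being run:

```shell
#!/bin/sh
# All values below are placeholders (assumptions); replace with your own.
SOCKS_PORT=1080                       # local port for the SOCKS proxy
GATEWAY="gateway-user@gatewayhost"    # ssh login on the gateway node
NAMENODE="namenode-private-ip:9000"   # private address, reached via the proxy

# 1. Dynamic forward: ssh acts as a SOCKS proxy on localhost:$SOCKS_PORT.
TUNNEL_CMD="ssh -N -D ${SOCKS_PORT} ${GATEWAY}"

# 2. Route the HDFS client's sockets through that proxy for this one command.
PUT_CMD="hadoop fs \
-D hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory \
-D hadoop.socks.server=localhost:${SOCKS_PORT} \
-put some-file hdfs://${NAMENODE}/"

# Dry run: print the commands instead of executing them.
echo "$TUNNEL_CMD"
echo "$PUT_CMD"
```

As far as I understand, with a SOCKS proxy the client addresses the
namenode and the datanodes by their private addresses (hence
hdfs://<namenode-private-ip> rather than hdfs://localhost), and the gateway
makes the actual connections on its behalf.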

Thanks,
Hariharan

On Thu, 12 Sep 2019, 23:26 saurabh pratap singh, <saurabh.cs...@gmail.com>
wrote:

> Thank you so much for your reply.
> I have a further question. There are some blogs that describe a similar
> setup, like this one:
>
> https://github.com/vkovalchuk/hadoop-2.6.0-windows/wiki/How-to-access-HDFS-behind-firewall-using-SOCKS-proxy
>
>
> I am just curious how that works.
>
> On Thu, Sep 12, 2019 at 11:05 PM Tony S. Wu <tonyswu....@gmail.com> wrote:
>
>> You need connectivity from edge node to the entire cluster, not just
>> namenode. Your topology, unfortunately, probably won’t work too well. A
>> proper VPN / IPSec tunnel might be a better idea.
>>
>> On Thu, Sep 12, 2019 at 12:04 AM saurabh pratap singh <
>> saurabh.cs...@gmail.com> wrote:
>>
>>> Hadoop version: 2.8.5
>>> I have an HDFS cluster set up in a private data center (which is not
>>> exposed to the internet). In the same data center I have another node (a
>>> gateway node). The purpose of this gateway node is to provide access to
>>> HDFS from an edge machine (which sits outside the data center) over the
>>> public internet.
>>> To enable this kind of setup I have set up an ssh tunnel from the edge
>>> machine to the namenode host and port (9000) through the gateway node,
>>> something like
>>>
>>> ssh -N -L <local-port>:<namenode-private-ip>:<namenodeport>
>>> <gateway-user>@<gatewayhost> -i <ssh-keys> -vvvv
>>>
>>> When I ran hadoop fs -ls hdfs://localhost:<local-port> from the edge
>>> machine it worked fine, but when I ran
>>> hadoop fs -put <some-file> hdfs://localhost:<local-port>
>>> it failed with the following error message.
>>>
>>> org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/<private-ip-of-datanode>:50010]
>>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
>>>     at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:253)
>>>     at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1725)
>>>     at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
>>>     at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
>>>
>>>
>>> It looks like the client is trying to write directly to the private IP
>>> address of the datanode. How do I resolve this?
>>>
>>> Do let me know if any other information is needed.
>>>
>>> Thanks
>>>
>>
