[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-31 Thread Milan Potocnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081341#comment-14081341
 ] 

Milan Potocnik commented on YARN-1994:
--

[~cwelch] looks good, thanks for the effort!

+1 from me

> Expose YARN/MR endpoints on multiple interfaces
> ---
>
> Key: YARN-1994
> URL: https://issues.apache.org/jira/browse/YARN-1994
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Craig Welch
> Attachments: YARN-1994.0.patch, YARN-1994.1.patch, 
> YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, 
> YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, 
> YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, 
> YARN-1994.6.patch, YARN-1994.7.patch
>
>
> YARN and MapReduce daemons currently do not support specifying a wildcard 
> address for the server endpoints. This prevents the endpoints from being 
> accessible from all interfaces on a multihomed machine.
> Note that if we do specify INADDR_ANY for any of the options, it will break 
> clients as they will attempt to connect to 0.0.0.0. We need a solution that 
> allows specifying a hostname or IP-address for clients while requesting 
> wildcard bind for the servers.
> (List of endpoints is in a comment below)
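The core problem in the description can be shown with plain java.net. A wildcard bind address is valid for a server socket, but it is not a usable destination for a client, so the two need separate settings. This is a minimal sketch; the port 8032 and the hostname "rm-host.example.com" are hypothetical.

```java
import java.net.InetSocketAddress;

/** Shows why a wildcard bind address cannot double as a client connect address. */
class WildcardBindDemo {
    /** Returns true if the address is the wildcard (INADDR_ANY / 0.0.0.0). */
    static boolean isWildcard(InetSocketAddress addr) {
        // Unresolved addresses have no InetAddress yet, so they are never the wildcard.
        return addr.getAddress() != null && addr.getAddress().isAnyLocalAddress();
    }

    public static void main(String[] args) {
        // Server side: bind on every interface. A literal IP is parsed without DNS.
        InetSocketAddress bind = new InetSocketAddress("0.0.0.0", 8032);
        // Client side: must be handed a routable hostname instead (hypothetical name).
        InetSocketAddress connect =
            InetSocketAddress.createUnresolved("rm-host.example.com", 8032);
        System.out.println("bind is wildcard: " + isWildcard(bind));       // true
        System.out.println("connect is wildcard: " + isWildcard(connect)); // false
    }
}
```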



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-30 Thread Milan Potocnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079104#comment-14079104
 ] 

Milan Potocnik commented on YARN-1994:
--

I agree it is tricky to hunt down all service endpoints and make sure each one
supports the proper hostname. Also, when new endpoints are added, their authors
have to be aware of this convention. I guess some logic could be added to
RPC.Server in the long run.

A few notes, though.

We have experienced issues on Windows when the client and the service are on
the same machine. It turns out that in certain situations, when the client
resolves the connect address (which contains the actual, 'main' hostname), it
does not query the DNS server but resolves the name locally, and in some cases
this returns an unwanted IP address (since the machine itself knows about all
of its network interfaces). If a special hostname is used ('hostname-IB' in my
earlier example), resolution goes to the DNS server and everything works.

The proposed approach is not unprecedented: HDFS has similar functionality,
where you can specify custom hostnames for components (so that
InetAddress.getLocalHost().getHostName() is never called). Please have a look
at:
 - fs.defaultFS - you can specify a custom hostname that the NameNode will use
 - dfs.namenode.rpc-bind-host - can then be set to 0.0.0.0
 - dfs.datanode.hostname - can be used to specify a custom DataNode hostname
 - dfs.datanode.address, dfs.datanode.ipc.address, etc. - can then be set to
0.0.0.0
 - dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname also
need to be set
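The HDFS combination listed above can be summarized as a set of key/value pairs. This is a sketch only: the hostnames "namenode-IB" and "datanode1-IB" are hypothetical, the ports are the usual Hadoop 2.x defaults, and the property that holds the NameNode URI is spelled `fs.defaultFS`. The helper checks the invariant the comment describes: a DataNode bound to the wildcard must still advertise a concrete hostname that clients are told to use.

```java
import java.util.Map;

/** HDFS multihoming settings as plain key/value pairs (illustrative values). */
class HdfsMultihomingSettings {
    static final Map<String, String> SETTINGS = Map.of(
        "fs.defaultFS", "hdfs://namenode-IB:8020",       // what clients connect to
        "dfs.namenode.rpc-bind-host", "0.0.0.0",          // what the NameNode binds to
        "dfs.datanode.hostname", "datanode1-IB",          // name the DataNode advertises
        "dfs.datanode.address", "0.0.0.0:50010",          // DataNode binds on all interfaces
        "dfs.datanode.ipc.address", "0.0.0.0:50020",
        "dfs.client.use.datanode.hostname", "true",
        "dfs.datanode.use.datanode.hostname", "true");

    /** True if a wildcard-bound DataNode still advertises a usable hostname. */
    static boolean dataNodeAdvertisesHostname(Map<String, String> conf) {
        boolean wildcardBind =
            conf.getOrDefault("dfs.datanode.address", "").startsWith("0.0.0.0:");
        if (!wildcardBind) {
            return true; // bound to a specific interface, nothing to check
        }
        String hostname = conf.get("dfs.datanode.hostname");
        boolean clientsUseHostname =
            Boolean.parseBoolean(conf.get("dfs.client.use.datanode.hostname"));
        return hostname != null && !hostname.equals("0.0.0.0") && clientsUseHostname;
    }
}
```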

So I think it would make sense to have similar functionality available in 
YARN/MR as well.

Thanks,
Milan





[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-29 Thread Milan Potocnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078412#comment-14078412
 ] 

Milan Potocnik commented on YARN-1994:
--

[~cwelch]
I'll try to explain one of the use cases.

Let's say we have the following interfaces in our network:
 - 1 Ethernet interface, public network
 - 2 InfiniBand (IB) interfaces, private network. Please note that on Windows,
IB does not support teaming.

On the DNS server, the entry for the machine's hostname can resolve to any of
the three interfaces (three IP addresses for each 'hostname' entry). We also
add a special DNS entry for each machine that resolves only to the two IB
interfaces, say in the form 'hostname-IB'.

Use case 1: We want internal communication in the cluster to always use IB. We
also want to be fault tolerant if one of the IB interfaces fails (remember, no
teaming on Windows). To bind to both IB interfaces, we must set the bind
address to 0.0.0.0. When this is set, connecting clients currently get the
hostname, which in some cases (DNS servers usually return IPs round-robin)
resolves to the Ethernet IP address. That address could be blocked by a
firewall, or using it might degrade the performance of internal communication.
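The DNS setup in this use case can be modeled with a toy round-robin resolver. It is a sketch, not real DNS behavior: the addresses are hypothetical (10.0.0.5 stands for Ethernet, the 192.168.x.5 addresses for the two IB interfaces), and a real server's ordering policy may differ. It shows why 'hostname' eventually hands a client the Ethernet address while 'hostname-IB' never can.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of round-robin DNS for the 'hostname' vs 'hostname-IB' entries. */
class RoundRobinDnsModel {
    private final Map<String, List<String>> records;
    private final Map<String, Integer> lookups = new HashMap<>();

    RoundRobinDnsModel(Map<String, List<String>> records) {
        this.records = records;
    }

    /** Returns the next record for the name, cycling through all of its records. */
    String resolve(String name) {
        List<String> ips = records.get(name);
        if (ips == null || ips.isEmpty()) {
            throw new IllegalArgumentException("no records for " + name);
        }
        int n = lookups.merge(name, 1, Integer::sum) - 1; // 0, 1, 2, ...
        return ips.get(n % ips.size());
    }

    public static void main(String[] args) {
        RoundRobinDnsModel dns = new RoundRobinDnsModel(Map.of(
            "hostname", List.of("10.0.0.5", "192.168.1.5", "192.168.2.5"), // Ethernet, IB1, IB2
            "hostname-IB", List.of("192.168.1.5", "192.168.2.5")));        // IB1, IB2 only
        for (int i = 0; i < 3; i++) {
            System.out.println(dns.resolve("hostname") + "  " + dns.resolve("hostname-IB"));
        }
    }
}
```

The fault tolerance requirement is still met: 'hostname-IB' keeps both IB addresses, so a client can fall back to the second one if the first interface is down.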

By setting _BIND_HOST to 0.0.0.0 and _ADDRESS to 'hostname-IB', we avoid the
non-determinism of InetSocketAddress.getHostName().

For outside clients we can also control connectivity by making sure they
connect via the public network, but this is a simpler problem, since they would
use a different DNS server.





[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-29 Thread Milan Potocnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078240#comment-14078240
 ] 

Milan Potocnik commented on YARN-1994:
--

Hi [~cwelch], [~arpitagarwal],

I think some clarification is needed here. The initial reason we wanted to
introduce the _BIND_HOST options was to provide deterministic behavior when
clients try to connect to a service endpoint that is listening on all
interfaces (0.0.0.0). In short, _BIND_HOST is what services use to bind, and
_ADDRESS is what clients should use to connect. This way, everything is
deterministic.
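The _BIND_HOST / _ADDRESS split can be sketched as two small functions: the server takes its host from the bind-host setting (when present) and its port from _ADDRESS, while clients use _ADDRESS unchanged. The class and method names are hypothetical and do not reflect the patch's actual code; addresses are kept unresolved so no DNS lookup happens, and a real server would resolve the bind address before binding.

```java
import java.net.InetSocketAddress;

/** Hypothetical sketch of deriving bind and connect addresses from two settings. */
class BindHostSplit {
    /** Parses "host:port" into an unresolved address (no DNS lookup). */
    static InetSocketAddress parse(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        return InetSocketAddress.createUnresolved(
            hostPort.substring(0, colon), Integer.parseInt(hostPort.substring(colon + 1)));
    }

    /** Address the server binds to: bind host (e.g. 0.0.0.0) plus the _ADDRESS port. */
    static InetSocketAddress bindAddress(String bindHost, String address) {
        InetSocketAddress addr = parse(address);
        if (bindHost == null || bindHost.isEmpty()) {
            return addr; // no bind host configured: bind to _ADDRESS as before
        }
        return InetSocketAddress.createUnresolved(bindHost, addr.getPort());
    }

    /** Address clients connect to: always the configured _ADDRESS. */
    static InetSocketAddress connectAddress(String address) {
        return parse(address);
    }
}
```

For example, with a bind host of "0.0.0.0" and an _ADDRESS of "hostname-IB:8032", the server binds to 0.0.0.0:8032 while clients are given hostname-IB:8032.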

In multi-NIC environments, with the default implementation, calls to
conf.updateConnectAddress for a 0.0.0.0 address eventually call
InetSocketAddress.getHostName(), which can introduce non-deterministic
behavior. Imagine you have DNS entries for each of the network interfaces, and
although you bind your service endpoint to all of them, you want users to use a
specific one (for instance, InfiniBand for better performance).
InetSocketAddress.getHostName() will return just the machine's hostname, which,
when the client resolves it, will usually map to an arbitrary network interface
of the service. Although the service binds to 0.0.0.0, some interfaces might be
blocked by a firewall.

This is why, besides RPCUtil.getSocketAddress, we also need
RPCUtil.updateConnectAddr to explicitly specify the connect address that
clients should use, i.e. a DNS entry pointing to a specific interface.
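The connect-address update described here can be sketched as follows: after the server has bound (possibly to 0.0.0.0 with an ephemeral port), keep the real bound port but replace the wildcard with a host clients can reach. This is a hypothetical illustration, not RPCUtil's actual implementation; the fallback branch mimics the non-deterministic behavior the comment objects to.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch of computing the address to publish after binding. */
class ConnectAddressUpdater {
    static InetSocketAddress connectAddress(String configuredHost, InetSocketAddress bound) {
        InetAddress ip = bound.getAddress();
        if (ip == null || !ip.isAnyLocalAddress()) {
            return bound; // bound to one specific interface: publish it as-is
        }
        if (configuredHost != null && !configuredHost.isEmpty()) {
            // Deterministic: e.g. "hostname-IB", which the client resolves via DNS.
            return InetSocketAddress.createUnresolved(configuredHost, bound.getPort());
        }
        try {
            // Fallback with the problem described above: the machine's own
            // hostname may resolve to any of its interfaces.
            String local = InetAddress.getLocalHost().getCanonicalHostName();
            return InetSocketAddress.createUnresolved(local, bound.getPort());
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```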

There are also two places in the code where the current implementation does not
work in multi-NIC environments, which we fixed:
- MRClientService and TaskAttemptListenerImpl, where we had to propagate the NM
hostname through the context,
- ContainerManagerImpl, which sets that hostname via the NodeId from
YarnConfiguration.NM_ADDRESS (the logic Arpit mentioned in his comment).

Please have a look at patch version 5 for easier understanding.

Hope this clarifies the initial idea.

Thanks,
Milan




[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-16 Thread Milan Potocnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milan Potocnik updated YARN-1994:
-

Attachment: YARN-1994.5.patch

[~arpitagarwal], I have attached a diff against the latest trunk; I hope this works.

As for the logic in AdminService.java, I have simplified it to use
RPCUtil.updateConnectAddr, since that does essentially the same thing as the
extra code I removed.




[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-15 Thread Milan Potocnik (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062281#comment-14062281
 ] 

Milan Potocnik commented on YARN-1994:
--

Both TestFSDownload and TestMemoryApplicationHistoryStore pass on my box and do 
not seem to be related to the change.



[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-10 Thread Milan Potocnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milan Potocnik updated YARN-1994:
-

Attachment: YARN-1994.4.patch

Hi guys,

I have attached a slightly updated version of the patch that incorporates your
changes to the test and configuration. The only differences should be:
 - Added logic for the Timeline service
 - Put all JHS-related bind options under MR_HISTORY_BIND_HOST, instead of
having 4 options
 - Did some minor code cleanup

Thanks for reviewing and pushing this!
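Using a single history-server bind host for all JHS endpoints amounts to applying one host to several addresses while keeping each endpoint's own port. A minimal sketch, assuming endpoints are given as "host:port" strings; the endpoint labels, the "jhs-IB" hostname, and the ports (the usual JHS defaults 10020, 10033, 19888, 19890) are illustrative and not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: one bind host applied to every JHS endpoint. */
class HistoryServerBindHost {
    static Map<String, String> bindAddresses(String bindHost, Map<String, String> addresses) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : addresses.entrySet()) {
            String hostPort = e.getValue();
            String port = hostPort.substring(hostPort.lastIndexOf(':') + 1);
            // Without a bind host, each endpoint binds to its configured address.
            result.put(e.getKey(), bindHost == null ? hostPort : bindHost + ":" + port);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> jhs = new LinkedHashMap<>();
        jhs.put("rpc", "jhs-IB:10020");
        jhs.put("admin", "jhs-IB:10033");
        jhs.put("webapp", "jhs-IB:19888");
        jhs.put("webapp.https", "jhs-IB:19890");
        System.out.println(bindAddresses("0.0.0.0", jhs));
    }
}
```

One setting instead of four keeps the endpoints consistent: it is hard to think of a deployment that wants the JHS RPC port on all interfaces but its web UI on only one.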

