[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081341#comment-14081341 ]

Milan Potocnik commented on YARN-1994:
--------------------------------------

[~cwelch] looks good, thanks for the effort! +1 from me

> Expose YARN/MR endpoints on multiple interfaces
> -----------------------------------------------
>
>                 Key: YARN-1994
>                 URL: https://issues.apache.org/jira/browse/YARN-1994
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager, webapp
>    Affects Versions: 2.4.0
>            Reporter: Arpit Agarwal
>            Assignee: Craig Welch
>         Attachments: YARN-1994.0.patch, YARN-1994.1.patch,
> YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch,
> YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch,
> YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch,
> YARN-1994.6.patch, YARN-1994.7.patch
>
>
> YARN and MapReduce daemons currently do not support specifying a wildcard
> address for the server endpoints. This prevents the endpoints from being
> accessible from all interfaces on a multihomed machine.
> Note that if we do specify INADDR_ANY for any of the options, it will break
> clients as they will attempt to connect to 0.0.0.0. We need a solution that
> allows specifying a hostname or IP-address for clients while requesting
> wildcard bind for the servers.
> (List of endpoints is in a comment below)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079104#comment-14079104 ]

Milan Potocnik commented on YARN-1994:
--------------------------------------

I agree it is tricky to hunt down all service endpoints and make sure they support a proper hostname. Also, when new endpoints are added, they have to follow this convention. I guess some logic could be added to RPC.Server in the long run.

A few notes, though. We have experienced issues on Windows when the client and the service are on the same machine. It turns out that in certain situations, when the client resolves the connect address (which contains the actual ('main') hostname), it does not go to the DNS server but performs the resolution locally, and in some cases it might return an unwanted IP address (since the machine itself is aware of all of its network interfaces). If a special hostname is used ('hostname-IB' in my earlier example), the resolution goes to the DNS server and everything works.

The proposed approach is not unprecedented. HDFS has similar functionality: you can specify custom hostnames for components (so that InetAddress.getLocalHost().getHostName() is never called). Please have a look at:
- fs.defaultFS - you can specify a custom hostname that the namenode will use
- dfs.namenode.rpc-bind-host - can be set to 0.0.0.0 in that case
- dfs.datanode.hostname - can be used to specify a custom datanode hostname
- dfs.datanode.address, dfs.datanode.ipc.address, etc. - can be set to 0.0.0.0 in that case
- dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname also need to be set

So I think it would make sense to have similar functionality available in YARN/MR as well.

Thanks,
Milan
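
For illustration, the HDFS settings listed in the comment above could be combined roughly as follows. This is only a sketch: it collects the values in a plain java.util.Map rather than Hadoop's own Configuration class, the class name and the hostnames 'nn-IB' and 'dn1-IB' are hypothetical, and the port numbers are illustrative and would need to match the actual cluster. Note that the default-filesystem property is spelled fs.defaultFS in current Hadoop releases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only assembles property names and values;
// it does not talk to HDFS and is not part of any Hadoop API.
class HdfsMultiHomedSettingsSketch {

    // nnHost and dnHost are the DNS names clients should use to connect
    // (for example names that resolve only to the private interfaces).
    static Map<String, String> settings(String nnHost, String dnHost) {
        Map<String, String> p = new LinkedHashMap<>();
        // What clients connect to: an explicit hostname (port is illustrative).
        p.put("fs.defaultFS", "hdfs://" + nnHost + ":8020");
        // What the namenode RPC server binds to: all interfaces.
        p.put("dfs.namenode.rpc-bind-host", "0.0.0.0");
        // Hostname the datanode reports instead of the local hostname.
        p.put("dfs.datanode.hostname", dnHost);
        // Datanode servers bind to all interfaces (ports are illustrative).
        p.put("dfs.datanode.address", "0.0.0.0:50010");
        p.put("dfs.datanode.ipc.address", "0.0.0.0:50020");
        // Make clients and datanodes connect by hostname rather than by IP.
        p.put("dfs.client.use.datanode.hostname", "true");
        p.put("dfs.datanode.use.datanode.hostname", "true");
        return p;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : settings("nn-IB", "dn1-IB").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

The point of the pairing is the same as in the YARN proposal: the bind-side properties carry 0.0.0.0, while the properties that clients see carry a name chosen by the administrator.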
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078412#comment-14078412 ]

Milan Potocnik commented on YARN-1994:
--------------------------------------

[~cwelch] I'll try to explain one of the use cases.

Let's say we have the following interfaces in our network:
- 1 Ethernet, public network
- 2 IB, private network. Please note that on Windows, IB does not support teaming

On the DNS server, the DNS entry for a machine's hostname can resolve to any of the three interfaces (for each 'hostname' entry there are three IP addresses). We also add a special DNS entry for each machine that resolves only to the two IB interfaces, let's say in the form 'hostname-IB'.

Use case 1: We want internal communication in the cluster to always use IB. We also want to be fault tolerant if one of the IB interfaces fails (remember, no teaming on Windows).

In order to bind to both IB interfaces, we must set the bind address to 0.0.0.0. When this is set, connecting clients currently get the plain hostname, which in some cases (the DNS server usually returns IPs round-robin) resolves to the Ethernet IP address. That address could be blocked by a firewall, or using it might degrade the performance of internal communication. By setting _BIND_HOST to 0.0.0.0 and _ADDRESS to 'hostname-IB' we avoid the non-determinism of InetSocketAddress.getHostName().

For outside clients we can also control connectivity by making sure they connect via the public network, but this is a simpler problem, since they would use a different DNS server.
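
The use case above can be mimicked with a small, self-contained sketch. Everything in it is made up for illustration: the DNS table, the IP addresses, the name 'node1' and the firewall rule are assumptions, and no real DNS lookup is performed. It only shows why a name that can resolve to the Ethernet address is unsafe for internal traffic, while the IB-only name is not.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation only: a hypothetical DNS view of one multihomed machine.
class MultiNicResolutionSketch {

    static final Map<String, List<String>> DNS = new HashMap<>();
    static {
        // The plain hostname may resolve to any of the three interfaces.
        DNS.put("node1", Arrays.asList("192.0.2.10", "10.1.0.10", "10.2.0.10"));
        // The special entry resolves only to the two IB interfaces.
        DNS.put("node1-IB", Arrays.asList("10.1.0.10", "10.2.0.10"));
    }

    // Hypothetical firewall rule: internal traffic is allowed on the
    // two IB subnets only.
    static boolean allowedInternally(String ip) {
        return ip.startsWith("10.1.") || ip.startsWith("10.2.");
    }

    // A name is deterministic for internal clients only if every address
    // the DNS server might hand out for it is reachable.
    static boolean alwaysReachable(String name) {
        return DNS.get(name).stream()
                .allMatch(MultiNicResolutionSketch::allowedInternally);
    }

    public static void main(String[] args) {
        System.out.println("node1    always reachable: " + alwaysReachable("node1"));
        System.out.println("node1-IB always reachable: " + alwaysReachable("node1-IB"));
    }
}
```

With round-robin answers, a client using 'node1' would sometimes be handed the Ethernet address, which is exactly the non-determinism that advertising 'node1-IB' avoids.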
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078240#comment-14078240 ]

Milan Potocnik commented on YARN-1994:
--------------------------------------

Hi [~cwelch], [~arpitagarwal],

I think some clarification is needed here. The initial reason we wanted to introduce the _BIND_HOST options was to provide deterministic behavior when clients try to connect to a service endpoint which is listening on all interfaces (0.0.0.0). In short, _BIND_HOST is what services use to bind, and _ADDRESS is what clients should use to connect. This way, everything is deterministic.

In multi-NIC environments, with the default implementation, a call to conf.updateConnectAddress for a 0.0.0.0 address eventually calls InetSocketAddress.getHostName(), and this can introduce non-deterministic behavior. Imagine you have DNS entries for each of the network interfaces and, although you bind your service endpoint to all of them, you want users to use a specific one (for instance, InfiniBand for better performance). InetSocketAddress.getHostName() returns just the machine's hostname, which will usually resolve to some arbitrary network interface of the service when the client resolves it. Although the service binds to 0.0.0.0, some interfaces might be blocked by a firewall.

This is why, besides RPCUtil.getSocketAddress, we also need RPCUtil.updateConnectAddr to explicitly specify the connect address which clients should use, i.e. a DNS entry pointing to a specific interface.

There are also two cases in the code where the current implementation does not work in multi-NIC environments, which we fixed:
- MRClientService & TaskAttemptListenerImpl, where we had to propagate the NM hostname through the context,
- which is set in ContainerManagerImpl via NodeId from YarnConfiguration.NM_ADDRESS (the logic Arpit mentioned in the comment)

Please have a look at patch version 5 for easier understanding. I hope this clarifies the initial idea.

Thanks,
Milan
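
The split between the bind address and the connect address described in the comment above can be sketched in plain Java. This is not the RPCUtil code from the patch: the class and method names are hypothetical, and 'rm-IB:8032' is a made-up _ADDRESS value. The sketch only shows the idea that, when the server is bound to the wildcard address, the address advertised to clients is taken from configuration instead of from the local hostname.

```java
import java.net.InetSocketAddress;

// Hypothetical helper illustrating the idea; not the patch's RPCUtil.
class ConnectAddressSketch {

    /**
     * Returns the address clients should be told to connect to.
     *
     * @param bound             address the server socket is listening on
     *                          (may be the wildcard 0.0.0.0)
     * @param configuredAddress "host:port" value of the service's _ADDRESS option
     */
    static InetSocketAddress connectAddress(InetSocketAddress bound,
                                            String configuredAddress) {
        boolean wildcard = bound.getAddress() != null
                && bound.getAddress().isAnyLocalAddress();
        if (!wildcard) {
            // Bound to one specific interface: that address is already usable.
            return bound;
        }
        // Take the host from configuration, but keep the port that was
        // actually bound (it may differ if an ephemeral port was requested).
        int colon = configuredAddress.lastIndexOf(':');
        String host = colon < 0
                ? configuredAddress
                : configuredAddress.substring(0, colon);
        // Left unresolved on purpose: each client resolves the name itself.
        return InetSocketAddress.createUnresolved(host, bound.getPort());
    }

    public static void main(String[] args) {
        InetSocketAddress bound = new InetSocketAddress("0.0.0.0", 8032);
        InetSocketAddress advertised = connectAddress(bound, "rm-IB:8032");
        System.out.println(advertised.getHostString() + ":" + advertised.getPort());
    }
}
```

Because the returned address is never derived from the machine's own hostname, what clients see depends only on the configured value and on the DNS entry behind it.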
[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milan Potocnik updated YARN-1994:
---------------------------------
    Attachment: YARN-1994.5.patch

[~arpitagarwal], I have attached a diff against the latest trunk; I hope this works.

As for the logic in AdminService.java, I have simplified it to use RPCUtil.updateConnectAddr, since it does basically the same thing as the extra code which I removed.
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062281#comment-14062281 ]

Milan Potocnik commented on YARN-1994:
--------------------------------------

Both TestFSDownload and TestMemoryApplicationHistoryStore pass on my box and do not seem to be related to the change.
[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milan Potocnik updated YARN-1994:
---------------------------------
    Attachment: YARN-1994.4.patch

Hi guys,

I have attached a slightly updated version of the patch which incorporates your changes to the tests and configuration. The only differences should be:
- Added logic for the Timeline service
- Put all JHS-related bind options under MR_HISTORY_BIND_HOST, instead of having 4 options
- Did some minor code cleanup

Thanks for reviewing and pushing this!