[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984919#comment-15984919 ]
Jason Lowe commented on YARN-6523:
----------------------------------

I don't know the full story behind the SystemCredentialsForApps thing. It looks like something that was put in for Slider and other long-running services where the initial tokens can expire. It would be good to get input from [~vinodkv] and [~jianhe] since they were more involved in this.

I agree it seems silly for every node in the cluster to get _all_ apps' HDFS credentials on _every heartbeat_. I suspect this was the simplest thing to implement, but it's far from efficient.

Going to the other extreme of sending the app credentials only once, and only to the nodes where those apps could be active, is a lot more complicated. It's true that RMNodeImpl tracks which applications are on the node, but that tracking is _reactive_ to what the node is already doing. There are scenarios where the updated tokens need to be on the node _before_ the container launch request arrives at the node, i.e. before the app becomes active in the node's RMNodeImpl. For example, a Slider app runs for months. The initial tokens from app submit time have long since expired, so the RM has had to re-fetch them. Then suddenly the Slider app wants to launch a container on a node it has never touched before. The node's RMNodeImpl doesn't know the app is active until a container starts running on it, but the container can't localize without the updated tokens the node has never received. So we'd need to send the credentials when the scheduler allocates an app's container on a node for the first time, and again whenever any of the app's credentials are updated (e.g. when a token is replaced with a refreshed version). Then there's handling lost heartbeats, node reconnects, etc. In short, an efficient delta protocol is a lot more complicated.

Rather than going straight to the complicated, fully optimal implementation, we could do something in between.
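As a minimal sketch of what that first-allocation-plus-refresh tracking could look like (all class and method names here are invented for illustration, not actual YARN code):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-app credential delta tracking: credentials are
// pushed to a node the first time the scheduler places one of the app's
// containers there, and re-pushed to every such node after a token refresh.
public class AppCredentialTracker {
    // appId -> node ids that already hold this app's current credentials
    private final Map<String, Set<String>> nodesWithCreds = new HashMap<>();

    /** Called when the scheduler allocates a container for appId on nodeId.
     *  Returns true if the node still needs the app's credentials sent. */
    public boolean onContainerAllocated(String appId, String nodeId) {
        Set<String> nodes =
            nodesWithCreds.computeIfAbsent(appId, k -> new HashSet<>());
        return nodes.add(nodeId); // true only the first time for this node
    }

    /** Called when the app's tokens are refreshed. Returns the nodes that
     *  held the old credentials and must receive the new ones; the tracker
     *  forgets them so the next allocation triggers a resend as well. */
    public Set<String> onCredentialsRefreshed(String appId) {
        Set<String> stale = nodesWithCreds.remove(appId);
        nodesWithCreds.put(appId, new HashSet<>());
        return stale == null ? Collections.emptySet() : stale;
    }
}
```

Even this toy version hints at the complexity Jason describes: the RM would still have to reconcile this state across lost heartbeats and node reconnects.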
For example, we could associate a sequence number with the system credentials. Nodes would report the last sequence number they received, and if it matches the current sequence number the RM does _not_ send the credentials in the heartbeat response. If the sequence numbers don't match, the RM sends the current sequence number along with the system credentials. It still sends all the credentials rather than an optimal delta, but at least they are only sent when a node actually needs the updated version.

And yes, we should precompute the SystemCredentialsForAppsProto once when the credentials change and re-send the same object to any node that needs the update, rather than recreating the same object over and over and over. That should drastically cut down on the number of objects related to system credentials in heartbeats and on how often we send them.

> RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6523
>                 URL: https://issues.apache.org/jira/browse/YARN-6523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sends every application's tokens even though not all applications may be active on the node. On top of that, NodeHeartbeatResponsePBImpl converts each app's tokens into a SystemCredentialsForAppsProto, so for each node and each heartbeat far too many SystemCredentialsForAppsProto objects get created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 8GB RAM configured for the RM.
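The sequence-number scheme proposed in the comment above could be sketched roughly as follows (a toy illustration with invented names, not the actual RM code; a byte array stands in for the precomputed SystemCredentialsForAppsProto):

```java
// Hypothetical sketch: the RM keeps one precomputed credentials payload plus
// a sequence number. A node reports the last sequence number it saw, and the
// payload is included in the heartbeat response only when the numbers differ.
public class SystemCredentialsCache {
    private volatile long sequence = 0;
    private volatile byte[] precomputedPayload = new byte[0];

    /** Rebuild the payload once per credential change, not once per heartbeat. */
    public synchronized void onCredentialsChanged(byte[] newPayload) {
        precomputedPayload = newPayload;
        sequence++;
    }

    /** Heartbeat handling: null means the node is already up to date, so the
     *  response carries no credentials at all. */
    public byte[] payloadFor(long nodeLastSeenSequence) {
        return nodeLastSeenSequence == sequence ? null : precomputedPayload;
    }

    public long currentSequence() {
        return sequence;
    }
}
```

Every node that is up to date shares the same cached object and receives nothing, which addresses both halves of the reported problem: the per-heartbeat object churn and the repeated sending of unchanged credentials.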