Varun Saxena created YARN-2983:
----------------------------------
Summary: NPE possible in ClientRMService#getQueueInfo
Key: YARN-2983
URL: https://issues.apache.org/jira/browse/YARN-2983
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Varun Saxena
Assignee: Varun Saxena
While going through code for checking YARN-2978 , found one issue.
During construction GetQueueInfoResponse in {{ClientRMService#getQueueInfo}},
we first collect application attempts from scheduler and then get apps from a
ConcurrentHashMap in {{RMContext}}. Although the operation(get/put/remove,etc)
itself on a ConcurrentHashMap is thread-safe, but a series of multiple
ConcurrentHashMap#get (say, in a for loop) is not.
For instance, in code below, we are calling rmContext.getRMApps()#get in a
loop. Now a ConcurrentHashMap#get can return null if the key doesnt exist. But
there is no null check inside this for loop before dereferencing the value
returned i.e. rmApp. Although all the applicationattempts have been fetched
for the queue just above the for loop, but as this block of code is not
synchronized, there is a possibility that another thread may delete RMApp from
the ConcurrentHashMap at the same time. This can happen when an app
finishes/completes and number of completed apps exceed the config
{{yarn.resourcemanager.max-completed-applications}}.
I think there should be a null check inside this for loop, otherwise a NPE can
occur.
{code:title=ClientRMService#getQueueInfo}
public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
throws YarnException {
.....
if (request.getIncludeApplications()) {
List<ApplicationAttemptId> apps =
scheduler.getAppsInQueue(request.getQueueName());
appReports = new ArrayList<ApplicationReport>(apps.size());
for (ApplicationAttemptId app : apps) {
RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
appReports.add(rmApp.createAndGetApplicationReport(null, true));
}
}
......
}
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)