[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156127#comment-14156127 ]
Zhijie Shen commented on YARN-2468:
-----------------------------------

bq. I would like to check how many log files we can upload this time. If the number is 0, we can skip this time. And this check should also happen before LogKey.write(); otherwise, we will write the key, but without the value.

1. I think Vinod meant that pendingUploadFiles is needed, but doesn't need to be a member variable. getPendingLogFilesToUploadForThisContainer can return this collection, and we can pass it into LogValue.write by adding a parameter for it.

2. IMHO, the following code can be improved. If we use an iterator, we can delete the unnecessary elements on the fly.
{code}
for (File file : candidates) {
  Matcher fileMatcher = filterPattern.matcher(file.getName());
  if (fileMatcher.find()) {
    filteredFiles.add(file);
  }
}
if (!exclusion) {
  return filteredFiles;
} else {
  candidates.removeAll(filteredFiles);
  return candidates;
}
{code}
This block could be:
{code}
...
while (candidatesItr.hasNext()) {
  candidate = candidatesItr.next();
  ...
  if ((!match && inclusive) || (match && exclusive)) {
    candidatesItr.remove();
  }
}
{code}

3. [~jianhe] mentioned to me before that the following condition is not always true for determining an AM container. Any idea? It also seems that we don't need shouldUploadLogsForRunningContainer; we can reuse shouldUploadLogs and set wasContainerSuccessful to true. Personally, if it's not trivial to identify the AM container, I prefer to write a TODO comment and leave it until we implement the log retention API.
{code}
if (containerId.getId() == 1) {
  return true;
}
{code}

4.
bq. It seems to be, let's validate this via a test-case.
Is it addressed by
{code}
this.conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
{code}
Is it better to add a line of comment explaining the rationale behind the config?

5.
Can the following code
{code}
Set<ContainerId> finishedContainers = new HashSet<ContainerId>();
for (ContainerId id : pendingContainerInThisCycle) {
  finishedContainers.add(id);
}
{code}
be simplified as
{code}
Set<ContainerId> finishedContainers =
    new HashSet<ContainerId>(pendingContainerInThisCycle);
{code}

> Log handling for LRS
> --------------------
>
>          Key: YARN-2468
>          URL: https://issues.apache.org/jira/browse/YARN-2468
>      Project: Hadoop YARN
>   Issue Type: Sub-task
>   Components: log-aggregation, nodemanager, resourcemanager
>     Reporter: Xuan Gong
>     Assignee: Xuan Gong
>  Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch
>
> Currently, when an application is finished, the NM will start to do the log aggregation. But for long-running service (LRS) applications, this is not ideal. The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
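Coming back to point 2 above: a minimal, self-contained sketch of the iterator-based filtering. The class name LogFileFilter, the method signature, and the boolean exclusion flag are illustrative assumptions following the snippet in the comment, not the actual patch code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

public class LogFileFilter {

  // Removes unwanted files in place via the iterator: in inclusive mode
  // (exclusion == false) keep files whose names match the pattern; in
  // exclusive mode keep files whose names do NOT match. No second
  // "filteredFiles" list is needed.
  public static List<File> filterFiles(List<File> candidates,
      Pattern filterPattern, boolean exclusion) {
    Iterator<File> candidatesItr = candidates.iterator();
    while (candidatesItr.hasNext()) {
      File candidate = candidatesItr.next();
      boolean match = filterPattern.matcher(candidate.getName()).find();
      if ((!match && !exclusion) || (match && exclusion)) {
        candidatesItr.remove();
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    List<File> logs = new ArrayList<>(Arrays.asList(
        new File("stdout"), new File("stderr"), new File("syslog")));
    Pattern std = Pattern.compile("^std");

    // Inclusive: keeps stdout and stderr.
    System.out.println(filterFiles(new ArrayList<>(logs), std, false));
    // Exclusive: keeps syslog.
    System.out.println(filterFiles(new ArrayList<>(logs), std, true));
  }
}
```

Using Iterator.remove() is also the only safe way to delete elements while traversing; removing through the list itself mid-loop would throw ConcurrentModificationException.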
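And on point 5: yes, the HashSet copy constructor produces exactly the same set as the add-one-by-one loop, since it effectively performs the same addAll. A quick standalone check, with String standing in for ContainerId purely to keep the example dependency-free:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CopyConstructorCheck {
  public static void main(String[] args) {
    // Stand-in for pendingContainerInThisCycle (String instead of
    // ContainerId, for illustration only).
    List<String> pendingContainerInThisCycle =
        Arrays.asList("container_01_000001", "container_01_000002");

    // Original pattern: create an empty set and add each id.
    Set<String> finishedContainersLoop = new HashSet<String>();
    for (String id : pendingContainerInThisCycle) {
      finishedContainersLoop.add(id);
    }

    // Suggested simplification: one-line copy constructor.
    Set<String> finishedContainersCtor =
        new HashSet<String>(pendingContainerInThisCycle);

    // The two sets are equal.
    System.out.println(finishedContainersLoop.equals(finishedContainersCtor));
  }
}
```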