[ https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604503#comment-14604503 ]
zhihai xu commented on YARN-3017:
---------------------------------

I just found that this change may cause a problem in log aggregation during a rolling upgrade with NM-Recovery-supervised enabled.
The following code in {{AggregatedLogFormat#getPendingLogFilesToUploadForThisContainer}} locates the container log directory by the containerId string, so we may miss uploading the old log files after the upgrade: a directory named with the old format would no longer match the new string.
{code}
File containerLogDir =
    new File(appLogDir, ConverterUtils.toString(this.containerId));
if (!containerLogDir.isDirectory()) {
  continue; // ContainerDir may have been deleted by the user.
}
pendingUploadFiles
    .addAll(getPendingLogFilesToUpload(containerLogDir));
{code}
To support this change, we would also need to change {{getPendingLogFilesToUploadForThisContainer}} to compare container IDs using {{ContainerId#fromString}} instead of matching the raw string.
It looks like it makes sense to keep the old format for compatibility.

> ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-3017
>                 URL: https://issues.apache.org/jira/browse/YARN-3017
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.8.0
>            Reporter: MUFEED USMAN
>            Assignee: Mohammad Shahid Khan
>            Priority: Minor
>              Labels: PatchAvailable
>         Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch, YARN-3017_3.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1412150883650_0001_000002
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_000002".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1412150883650_0001_02_000001, NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: <memory:2048, vCores:1, disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_000002
> ...
> ...
> Curious to know if this is kept like that for a reason. If not while using
> filtering tools to, say, grep events surrounding a specific attempt by the
> numeric ID part information may slip out during troubleshooting.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
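
For illustration, the comparison suggested in the comment (match container log directories by parsed ID rather than by raw string) might look roughly like the sketch below. Everything in it is an assumption made for the example: {{ContainerIdSketch}} is a hypothetical stand-in and not Hadoop's {{ContainerId}}, the parser only handles the five-field form seen in the log excerpt, and the six-digit attempt field used as the "new" format is a guess at what the patch produces, which this thread does not spell out.

```java
import java.util.Objects;

/** Hypothetical stand-in for illustration only; this is not Hadoop's ContainerId. */
final class ContainerIdSketch {
    final long clusterTimestamp;
    final int appId;
    final int attemptId;
    final long containerNum;

    private ContainerIdSketch(long clusterTimestamp, int appId, int attemptId, long containerNum) {
        this.clusterTimestamp = clusterTimestamp;
        this.appId = appId;
        this.attemptId = attemptId;
        this.containerNum = containerNum;
    }

    /** Parses container_<clusterTimestamp>_<appId>_<attemptId>_<containerNum>, ignoring zero padding. */
    static ContainerIdSketch parse(String s) {
        String[] parts = s.split("_");
        if (parts.length != 5 || !parts[0].equals("container")) {
            throw new IllegalArgumentException("Unrecognized container ID: " + s);
        }
        return new ContainerIdSketch(
                Long.parseLong(parts[1]),
                Integer.parseInt(parts[2]),
                Integer.parseInt(parts[3]),
                Long.parseLong(parts[4]));
    }

    /** Returns true when both names denote the same container, whatever the padding width. */
    static boolean sameContainer(String dirName, String expected) {
        try {
            return parse(dirName).equals(parse(expected));
        } catch (IllegalArgumentException e) {
            // Also covers NumberFormatException: the name is not a container ID.
            return false;
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ContainerIdSketch)) {
            return false;
        }
        ContainerIdSketch other = (ContainerIdSketch) o;
        return clusterTimestamp == other.clusterTimestamp
                && appId == other.appId
                && attemptId == other.attemptId
                && containerNum == other.containerNum;
    }

    @Override
    public int hashCode() {
        return Objects.hash(clusterTimestamp, appId, attemptId, containerNum);
    }

    public static void main(String[] args) {
        String oldDir = "container_1412150883650_0001_02_000001";      // format from the log above
        String newId = "container_1412150883650_0001_000002_000001";   // assumed new format
        System.out.println(oldDir.equals(newId));         // raw string match
        System.out.println(sameContainer(oldDir, newId)); // parsed match
    }
}
```

With a check like this, the aggregator would have to list the entries under the application log directory and test each name, rather than build one expected path from the ID string. That extra scan is the cost of changing the format, and it is one more argument for keeping the old format.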