[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604503#comment-14604503
 ] 

zhihai xu commented on YARN-3017:
---------------------------------

I just found that this change may cause a problem in log aggregation during a 
rolling upgrade with NM-Recovery-supervised enabled.
The following code in 
{{AggregatedLogFormat#getPendingLogFilesToUploadForThisContainer}} locates the 
container log directory by the containerId string, so after the upgrade we may 
miss uploading the old log files, whose directories were named using the old 
format.
{code}
        File containerLogDir =
            new File(appLogDir, ConverterUtils.toString(this.containerId));
        if (!containerLogDir.isDirectory()) {
          continue; // ContainerDir may have been deleted by the user.
        }
        pendingUploadFiles
          .addAll(getPendingLogFilesToUpload(containerLogDir));
{code}
To support this change, we would also need to change 
{{getPendingLogFilesToUploadForThisContainer}} to compare container IDs using 
{{ContainerId#fromString}} instead of the raw string.
It looks like it makes sense to keep the old format for compatibility.
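
As a rough illustration of what comparing by parsed ID rather than by string 
could look like, here is a minimal, self-contained sketch. It does not use the 
Hadoop classes: {{ContainerDirMatcher}}, {{parseFields}}, {{sameContainer}} and 
{{findContainerLogDirs}} are hypothetical names, and the widened form 
container_1412150883650_0001_000002_000001 is only an assumption about what the 
new format would look like. A real change would presumably go through 
{{ContainerId#fromString}} instead of this hand-written parser.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: matches container log directories
// by the numeric fields of the container ID instead of by exact string.
final class ContainerDirMatcher {

  private ContainerDirMatcher() {
  }

  // Returns {epoch, clusterTimestamp, appId, attemptId, containerId},
  // or null if the name does not look like a container ID.
  static long[] parseFields(String name) {
    String[] parts = name.split("_");
    if (parts.length != 5 && parts.length != 6) {
      return null;
    }
    if (!parts[0].equals("container")) {
      return null;
    }
    try {
      long[] fields = new long[5];
      int next = 1;
      if (parts.length == 6) {
        // Optional epoch token such as "e17"; the epoch stays 0 when absent.
        if (!parts[1].startsWith("e")) {
          return null;
        }
        fields[0] = Long.parseLong(parts[1].substring(1));
        next = 2;
      }
      for (int i = 1; i < 5; i++) {
        // Parsing as numbers makes zero padding irrelevant ("02" == "000002").
        fields[i] = Long.parseLong(parts[next++]);
      }
      return fields;
    } catch (NumberFormatException e) {
      return null;
    }
  }

  // True when both strings parse as container IDs with the same fields.
  static boolean sameContainer(String a, String b) {
    long[] fa = parseFields(a);
    long[] fb = parseFields(b);
    return fa != null && fb != null && Arrays.equals(fa, fb);
  }

  // Lists every directory under appLogDir whose name identifies the given
  // container, regardless of which string format was used to create it.
  static List<File> findContainerLogDirs(File appLogDir, String containerId) {
    List<File> matches = new ArrayList<>();
    File[] children = appLogDir.listFiles(File::isDirectory);
    if (children == null) {
      return matches; // appLogDir is missing or unreadable
    }
    for (File child : children) {
      if (sameContainer(child.getName(), containerId)) {
        matches.add(child);
      }
    }
    return matches;
  }
}
```

With something like this, a directory created before the upgrade as 
container_1412150883650_0001_02_000001 would still be found when looking up the 
same container under the assumed new format, at the cost of listing appLogDir 
instead of building a single path.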

> ContainerID in ResourceManager Log Has Slightly Different Format From 
> AppAttemptID
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-3017
>                 URL: https://issues.apache.org/jira/browse/YARN-3017
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.8.0
>            Reporter: MUFEED USMAN
>            Assignee: Mohammad Shahid Khan
>            Priority: Minor
>              Labels: PatchAvailable
>         Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch, 
> YARN-3017_3.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
> masterappattempt_1412150883650_0001_000002
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_000002".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up
> container Container: [ContainerId: container_1412150883650_0001_02_000001,
> NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: <memory:2048, vCores:1,
> disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_000002
> ...
> ...
> Curious to know if this is kept like that for a reason. If not, then when using
> filtering tools to, say, grep events surrounding a specific attempt by the
> numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
