[ 
https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180554#comment-14180554
 ] 

Xuan Gong commented on YARN-2724:
---------------------------------

As [~mitdesai] mentioned, "the problem here is due to calculation of file 
length before even trying to open the file. Log aggregator reads the file 
length of the log file that is to be aggregated and records it. Then it tries 
to go and read the file contents."

The issue reported by [~sumitmohanty] is caused by file permissions: the log 
file cannot be opened for reading, so we cannot aggregate it.

Looking at the code:
{code}
        final long fileLength = logFile.length();
        // Write the logFile Type
        out.writeUTF(logFile.getName());

        // Write the log length as UTF so that it is printable
        out.writeUTF(String.valueOf(fileLength));

        // Write the log itself
        FileInputStream in = null;
        try {
          in = SecureIOUtils.openForRead(logFile, getUser(), null);
          byte[] buf = new byte[65535];
          int len = 0;
          long bytesLeft = fileLength;
          while ((len = in.read(buf)) != -1) {
            //If buffer contents within fileLength, write
            if (len < bytesLeft) {
              out.write(buf, 0, len);
              bytesLeft-=len;
            }
            //else only write contents within fileLength, then exit early
            else {
              out.write(buf, 0, (int)bytesLeft);
              break;
            }
          }
          long newLength = logFile.length();
          if(fileLength < newLength) {
            LOG.warn("Aggregated logs truncated by approximately "+
                (newLength-fileLength) +" bytes.");
          }
          this.uploadedFiles.add(logFile);
        } catch (IOException e) {
          String message = "Error aggregating log file. Log file : "
              + logFile.getAbsolutePath() + e.getMessage();
          LOG.error(message, e);
          out.write(message.getBytes());
        } finally {
          if (in != null) {
            in.close();
          }
        }
{code}
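Apart from the permission issue, other failures can cause the same problem: 
any IOException thrown after the length has been written leaves the recorded 
length out of step with the bytes that follow.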
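
One way to avoid the misaligned entry is to open the file before recording its length and, if the open fails, to record the length of the error text instead. The following is a minimal sketch under those assumptions, not the actual patch: `LogEntrySketch`, `writeEntry`, `readEntries` and `demo` are hypothetical names, a plain `FileInputStream` stands in for `SecureIOUtils.openForRead`, and failures that occur after the file has been opened (for example the file shrinking mid-read) are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the class and method names are illustrative, not Hadoop APIs.
final class LogEntrySketch {

  // Writes one entry: name (UTF), length as a UTF string, then the payload bytes.
  static void writeEntry(DataOutputStream out, File logFile) throws IOException {
    out.writeUTF(logFile.getName());

    InputStream in;
    try {
      // Open first. FileInputStream is an assumed stand-in for SecureIOUtils.openForRead.
      in = new FileInputStream(logFile);
    } catch (IOException e) {
      // The file could not be opened: record the length of the error text instead.
      byte[] message = ("Error aggregating log file. Log file : "
          + logFile.getAbsolutePath() + ". " + e.getMessage())
          .getBytes(StandardCharsets.UTF_8);
      out.writeUTF(String.valueOf(message.length));
      out.write(message);
      return;
    }

    try (InputStream stream = in) {
      // The length is recorded only after the open succeeded.
      final long fileLength = logFile.length();
      out.writeUTF(String.valueOf(fileLength));
      byte[] buf = new byte[65535];
      long bytesLeft = fileLength;
      int len;
      // Never write more than the recorded length, even if the file keeps growing.
      while (bytesLeft > 0
          && (len = stream.read(buf, 0, (int) Math.min(buf.length, bytesLeft))) != -1) {
        out.write(buf, 0, len);
        bytesLeft -= len;
      }
    }
  }

  // Reads entries back and returns "name:length" for each entry found.
  static List<String> readEntries(byte[] data) throws IOException {
    List<String> entries = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    while (in.available() > 0) {
      String name = in.readUTF();
      int length = Integer.parseInt(in.readUTF());
      byte[] payload = new byte[length];
      in.readFully(payload); // skip exactly the recorded number of bytes
      entries.add(name + ":" + length);
    }
    return entries;
  }

  // Aggregates one unopenable file followed by one readable file, then reads both back.
  static List<String> demo() {
    try {
      File missing = new File("no-such-dir-for-sketch", "command-13.json");
      File readable = File.createTempFile("gclog", ".log");
      readable.deleteOnExit();
      Files.write(readable.toPath(), "GC pause".getBytes(StandardCharsets.UTF_8));

      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        writeEntry(out, missing);
        writeEntry(out, readable);
      }
      return readEntries(bytes.toByteArray());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

In this sketch the reader should find both entries, because the length written for the unopenable file matches the error text that follows it, so the next entry starts where the reader expects it.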


> If an unreadable file is encountered during log aggregation then aggregated 
> file in HDFS badly formed
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2724
>                 URL: https://issues.apache.org/jira/browse/YARN-2724
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: log-aggregation
>    Affects Versions: 2.5.1
>            Reporter: Sumit Mohanty
>            Assignee: Xuan Gong
>
> Look at the log output snippet below. It looks like there is an issue during 
> aggregation when an unreadable file is encountered. Likely, this results in 
> bad encoding.
> {noformat}
> LogType: command-13.json
> LogLength: 13934
> Log Contents:
> Error aggregating log file. Log file : 
> /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-13.json
>  (Permission denied)command-3.json13983Error aggregating log file. Log file : 
> /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_000004/command-3.json
>  (Permission denied)
>               
> errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+0000: 5.134: 
> [GC2014-10-21T04:45:12.046+0000: 5.134: [ParNew: 163840K->15575K(184320K), 
> 0.0488700 secs] 163840K->15575K(1028096K), 0.0492510 secs] [Times: user=0.06 
> sys=0.01, real=0.05 secs]
> 2014-10-21T04:45:14.939+0000: 8.027: [GC2014-10-21T04:45:14.939+0000: 8.027: 
> [ParNew: 179415K->11865K(184320K), 0.0941310 secs] 179415K->17228K(1028096K), 
> 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs]
> 2014-10-21T04:46:42.099+0000: 95.187: [GC2014-10-21T04:46:42.099+0000: 
> 95.187: [ParNew: 175705K->12802K(184320K), 0.0466420 secs] 
> 181068K->18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, 
> real=0.04 secs]
> {noformat}
> Specifically, look at the text after the exception text. There should be two 
> more entries for log files but none exist. This is likely because 
> command-13.json is expected to be of length 13934, but it is not, as the 
> file was never read.
> I think it should have been
> {noformat}
> LogType: command-13.json
> LogLength: <Length of the exception text>
> Log Contents:
> Error aggregating log file. Log file : 
> /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-13.json
>  (Permission denied)command-3.json13983Error aggregating log file. Log file : 
> /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_000004/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_000004/command-3.json
>  (Permission denied)
> {noformat}
> {noformat}
> LogType: errors-3.txt
> LogLength:0
> Log Contents:
> {noformat}
> {noformat}
> LogType:gc.log
> LogLength:???
> Log Contents:
> ......-20141021044514484052014-10-21T04:45:12.046+0000: 5.134: 
> [GC2014-10-21T04:45:12.046+0000: 5.134: [ParNew: 163840K- .......
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
