[ 
https://issues.apache.org/jira/browse/YARN-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831199#comment-13831199
 ] 

ledion bitincka commented on YARN-1440:
---------------------------------------

{quote}
Storing the logs in HDFS 1-to-1 as they appear in the container log directories 
on the nodes would be a lot of files.
{quote}

[~jlowe] - as I understand it, the NodeManager creates one TFile for *each* 
container executed, within which it then encodes and stores all the log files 
that the container created. For example, for an MR application the TFile would 
contain stdout, stderr and syslog. The first two are usually empty, while 
syslog contains the app's logs. So in practice there's no real reduction in the 
number of files created. How common is it for other YARN apps to have more than 
one log file?

{quote}
Would it be helpful for YARN to supply a public API that reads the files for 
you?
{quote}

[~sandyr] - that would be helpful. However, simple flat files would be the best 
API, because all the tools available for HDFS files would then work on log 
files too.
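
To make the flat-file idea concrete, here is a minimal sketch of the layout 
proposed in the issue description. The helper names (build_log_path, 
parse_log_path) and the IDs in the usage note are made up for illustration; 
none of this is YARN code or a YARN API.

```python
import posixpath
from typing import NamedTuple


class LogPath(NamedTuple):
    """Components of one raw log file under the proposed layout."""
    app_id: str
    container_id: str
    log_file_name: str


def build_log_path(log_collection_dir: str, app_id: str,
                   container_id: str, log_file_name: str) -> str:
    # Mirrors /{log-collection-dir}/{app-id}/{container-id}/{log-file-name}
    return posixpath.join(log_collection_dir, app_id, container_id, log_file_name)


def parse_log_path(log_collection_dir: str, path: str) -> LogPath:
    # Recover the three components from a path under the collection dir.
    rel = posixpath.relpath(path, log_collection_dir)
    parts = rel.split("/")
    if len(parts) != 3 or parts[0] == "..":
        raise ValueError(f"not a log file under {log_collection_dir}: {path}")
    return LogPath(*parts)
```

With this layout, an external tool only needs the path to know which app and 
container a log belongs to, e.g. 
parse_log_path("/logs", "/logs/app_1/container_1/syslog").container_id.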

> Yarn aggregated logs are difficult for external tools to understand
> -------------------------------------------------------------------
>
>                 Key: YARN-1440
>                 URL: https://issues.apache.org/jira/browse/YARN-1440
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: ledion bitincka
>              Labels: log-aggregation, logs, tfile, yarn
>
> The log aggregation feature in Yarn is awesome! However, the file format into 
> which the log files are aggregated (TFile) should either be much simpler or 
> be made pluggable. The current TFile format forces anyone who wants to see 
> the files to either 
> a) use the web UI, 
> b) use the CLI tools (yarn logs), or 
> c) write custom code to read the files. 
> My suggestion would be to simplify the log collection by collecting and 
> writing the raw log files into a directory structure as follows: 
> {noformat}
> /{log-collection-dir}/{app-id}/{container-id}/{log-file-name} 
> {noformat}
> This way the application developers can (re)use a much wider array of tools 
> to process the logs. 
> For the readers who are not familiar with logs and their format, you can find 
> more info in the following two blog posts:
> http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
> http://blogs.splunk.com/2013/11/18/hadoop-2-0-rant/



--
This message was sent by Atlassian JIRA
(v6.1#6144)
