[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600358#comment-14600358 ]

Robert Kanter commented on YARN-2942:

After some offline discussion, we've decided to put this on hold for now. There are concerns that HDFS-3689 hasn't had enough time to bake, so building the aggregated log changes on top of it isn't a good idea yet.

In the meantime, to help alleviate the log problem, I've created MAPREDUCE-6415, which is based on the procedure Jason described earlier with HAR files.

> Aggregated Log Files should be combined
> ---------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf,
> CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf,
> CompactedAggregatedLogsProposal_v1.pdf,
> CompactedAggregatedLogsProposal_v2.pdf,
> ConcatableAggregatedLogsProposal_v4.pdf,
> ConcatableAggregatedLogsProposal_v5.pdf,
> ConcatableAggregatedLogsProposal_v8.pdf, YARN-2942-preliminary.001.patch,
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch,
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in
> HDFS and subsequently view them in the YARN web UIs from a central place.
> Currently, there is a separate log file for each Node Manager. This can be a
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start
> accumulating many (possibly small) files per YARN application. The current
> “solution” for this problem is to configure YARN (actually the JHS) to
> automatically delete these files after some amount of time.
> We should improve this by compacting the per-node aggregated log files into
> one log file per application.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556926#comment-14556926 ]

Karthik Kambatla commented on YARN-2942:

Thanks for your persistence through the multiple versions of this design, Robert. I think we have an actionable plan now, thanks Jason and Vinod for your inputs on the JIRA and offline.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555382#comment-14555382 ]

Robert Kanter commented on YARN-2942:

[~kasha], [~vinodkv], [~jlowe], and I had a discussion earlier today about the best way to move forward on this. We came up with a design that mostly picks and chooses from the previous designs:
- The log files get aggregated to HDFS by each NM as they do now, except we use a concat-friendly format (like the one I've used in patches on this JIRA).
- We'd then concat the aggregated log files for an application into a single file. Designs v4 and v5 had this, but they were using ZooKeeper to coordinate the NMs concatenating their own file. Now that the RM knows when all NMs are done aggregating, it can take care of the concatenation via a new Service which concats the aggregated log files for a particular job into a single file at some interval. (So ZooKeeper isn't required and no coordination is really needed.)
- In the discussion, we talked about having another new RM service that would periodically compact the concatenated files (i.e. copy and replace them) to clean up the blocks. Ideally, this would be something that HDFS could add itself, and we wouldn't need this step. However, [~kasha] and I talked with some HDFS folks and they're not sure this is something they want to put in HDFS. In order to ensure that the compaction doesn't run while the NN is busy, they suggested having it triggered by a command that the admin runs (like what's done with HDFS balancing). In the meantime, I think that's a better idea than having the RM automatically do it at arbitrary times. If HDFS ever adds this in the future, this last step is something that can be easily deprecated.

I'll write a v8 document with the formal details and upload it sometime tomorrow.
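The concatenation step in the second bullet can be illustrated with a small local-filesystem sketch. This is not the code from the patches on this JIRA: the function name `concat_app_logs`, the `node_done` map, and the `combined.log` file name are made up for illustration, and plain byte copying stands in for an HDFS concat, which would join the existing blocks instead of rewriting data.

```python
import os
import tempfile


def concat_app_logs(app_dir, node_done, combined_name="combined.log"):
    """Join per-node aggregated logs into one file for the application.

    node_done maps a per-node file name to True once that NM has finished
    aggregating (hypothetical stand-in for what the RM would track).
    Returns the combined file path, or None if any node is still aggregating.
    """
    if not node_done or not all(node_done.values()):
        return None  # wait for the next interval
    combined_path = os.path.join(app_dir, combined_name)
    with open(combined_path, "ab") as out:
        for node in sorted(node_done):
            node_path = os.path.join(app_dir, node)
            with open(node_path, "rb") as src:
                # A concat-friendly format means the bytes can be joined as-is.
                out.write(src.read())
            os.remove(node_path)  # the per-node file is no longer needed
    return combined_path


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    for name in ("nm1", "nm2"):
        with open(os.path.join(demo_dir, name), "wb") as f:
            f.write(name.encode() + b" log line\n")
    print(concat_app_logs(demo_dir, {"nm1": True, "nm2": True}))
```

Because a single service does the work only after every node reports that it is done, no lock is needed between the NMs, which is the point of dropping ZooKeeper.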
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553384#comment-14553384 ]

Karthik Kambatla commented on YARN-2942:

Thanks everyone for the discussion.

Clearly, there are trade-offs to make between (1) a single aggregation across nodes for an application with a slightly higher chance of losing a container's logs if a node were to go down vs (2) a two-step aggregation that places more load on HDFS. While looking at this trade-off, we should consider HDFS state today and possible improvements in the future. If HDFS were to support concurrent-append, option 1 seems like a better approach.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544109#comment-14544109 ]

Robert Kanter commented on YARN-2942:

The long delays in aggregation should be fine if the logs are available directly from the NMs in the meantime.

{quote}Like I was originally saying, we really need all of this functionality in the file-system.{quote}
I agree with you that this would be great and make things super easy for us. However, I don't see them adding the functionality that we need any time soon. Given that, I think we need to come up with our own solution with what we have currently available.

{quote}Overall, today's log-aggregation is fairly on the edge...we need to think twice before hard-wiring the notion of concurrent log-append right into the platform. The ZK solution was less intrusive as it was still on the edge with the downside of adding external dependencies.{quote}
I think that the v7 design could be easily replaced as well. Most of it would live in an RM service, which could be turned off or replaced. However, you are correct that the design based on what Jason said would be more invasive and not really replaceable. That said, I don't think anyone's wanted/tried to do that.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540970#comment-14540970 ]

Vinod Kumar Vavilapalli commented on YARN-2942:

bq. Having the RM coordinate the aggregation is similar to my design with ZK, but instead of a ZK lock, the RM orchestrates things. I like the idea of getting rid of the original aggregation and having the NMs all write to HDFS once, in the combined file directly.

Though this is great to have in theory, I'd like to point out that the implementation is going to be fraught with (1) many fault-tolerance conditions and (2) potentially very long delays in aggregation due to costs of coordination and fault-recovery.

Like I was [originally saying|https://issues.apache.org/jira/browse/YARN-2942?focusedCommentId=14326912&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14326912], we really need all of this functionality in the file-system.

Overall, today's log-aggregation is fairly on the edge (you can imagine putting in a different aggregation mechanism by replacing the module present in the NM); we need to think twice before hard-wiring the notion of concurrent log-append right into the platform. The ZK solution was less intrusive as it was still on the edge with the downside of adding external dependencies.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540407#comment-14540407 ]

Jason Lowe commented on YARN-2942:

bq. Can you give some more details on this? Is it something you can share?

It's a hack to help mitigate the log aggregation namespace scaling issues on our large clusters. Essentially it's a periodic process to run an Oozie workflow that does the following:
# determines which applications are good candidates for log archiving (i.e.: lots of files and total size is not that big)
# runs a streaming job with a shell script that uses the list of applications to aggregate as input
# for each application it runs a local-mode archive job to archive the log contents
# when the archive has been created it swaps out the application directory with a symlink into the har archive

The symlink makes the archive transparent to the readers. Both the JHS and the "yarn logs" command use FileContext and "just worked" with the symlink into the har without modifications.

So yes, we are running a MapReduce job to archive the logs which itself will create more logs. However it processes many application logs for each archiving job.

If there is sufficient interest we can pursue how to share it, but the script is specific to how we configure our nodes and clusters and relies on unsupported symlinks. I'm hoping the outcome of this JIRA allows us to move away from the need for it.

bq. We'd have to implement your last bullet point to have the NMs serve the logs in the meantime, as I don't think that's there today.

That feature is indeed there today. Links to the app logs on the NM will try to serve the local app logs first, then redirect to the log server if the local logs are unavailable. See NMController and ContainerLogsPage. It only becomes an issue when things link to the aggregated log server directly before the NM has finished aggregating them.
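The first step of the workflow above (picking applications with many files but a small total size) can be sketched as a simple filter. This is only an illustration, not the script described in the comment: the `AppLogs` type, the function name, and both thresholds are hypothetical, since the comment gives no concrete numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppLogs:
    app_id: str
    file_sizes: List[int]  # size in bytes of each per-node aggregated log


def archive_candidates(apps, min_files=50, max_total_bytes=64 * 1024 * 1024):
    """Return the IDs of applications worth archiving.

    An application qualifies when it has many log files whose combined size
    is still small. Both thresholds are made-up defaults for illustration.
    """
    return [
        app.app_id
        for app in apps
        if len(app.file_sizes) >= min_files
        and sum(app.file_sizes) <= max_total_bytes
    ]


if __name__ == "__main__":
    apps = [
        AppLogs("app_1", [1024] * 100),    # many tiny files: candidate
        AppLogs("app_2", [1024] * 3),      # too few files to matter
        AppLogs("app_3", [10 ** 9] * 60),  # many files, but already large
    ]
    print(archive_candidates(apps))
```

The resulting list would then feed the streaming job in step 2; the archive and symlink swap are outside this sketch.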
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540288#comment-14540288 ]

Robert Kanter commented on YARN-2942:

Thanks [~jlowe] for your feedback. It's good to get more views on this.

{quote}If I understand them correctly they both propose that the NMs upload the original per-node aggregated log to HDFS and then something (either the NMs or the RM) later comes along and creates the aggregate-of-aggregates log{quote}
Yes. That's correct.

{quote}However I didn't see details on solving the race condition where a log reader comes along, sees from the index file that the desired log isn't in the aggregate-of-aggregates, then opens the log and reads from it just as the log is deleted by the entity appending to the aggregate-of-aggregates.{quote}
That's a good point. I hadn't thought of that issue. Thinking about it now, I think there are a few options here:
- We could simply have the reader try again if it runs into a problem
- We could have the last NM delete the aggregated log files, so that it's less likely that this situation can occur
- Each NM could wait some amount of time (e.g. a few mins) after appending its log file before deleting the original file, so that it's less likely that this situation can occur

{quote}We have an internal solution where we create per-application har files of the logs{quote}
Can you give some more details on this? Is it something you can share? If you've already solved this issue, then perhaps we can just use that. Though doesn't creating har files require running an MR job?

{quote}Another issue from log aggregation we've seen in practice is that the proposals don't address the significant write load the per-node aggregate files place on the namenode.{quote}
That's a good point. Shortly after a job finishes, all of the involved NMs would upload their log files around the same time, which puts stress on the NN. The NM giving the RM reports of the current aggregation progress was recently added by YARN-1376 and related. Having the RM coordinate the aggregation is similar to my design with ZK, but instead of a ZK lock, the RM orchestrates things. I like the idea of getting rid of the original aggregation and having the NMs all write to HDFS once, in the combined file directly. We'd have to implement your last bullet point to have the NMs serve the logs in the meantime, as I don't think that's there today.

I'll try to flesh this design out a bit more and see where it goes. Unless we should use har files; though that adds an MR dependency.
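The first option discussed in this message for the reader race (have the reader try again) can be sketched with an in-memory stand-in for the application's log directory. Everything here is hypothetical: `LogStore`, `read_log`, and the `before_open` hook exist only to make the race reproducible, and the sketch models deletion between the index check and the open rather than deletion in the middle of a read.

```python
class LogStore:
    """In-memory stand-in for one application's log directory (hypothetical)."""

    def __init__(self):
        self.per_node = {}  # node -> log text still in its per-node file
        self.combined = {}  # node -> log text already in the combined file

    def combine(self, node):
        # Append to the combined file first, then delete the original.
        self.combined[node] = self.per_node[node]
        del self.per_node[node]


def read_log(store, node, attempts=3, before_open=None):
    """Read a node's log, retrying if the per-node file vanishes under us."""
    for _ in range(attempts):
        if node in store.combined:  # index says the log was already combined
            return store.combined[node]
        if before_open is not None:
            before_open()  # test hook: lets a combiner run inside the race window
        try:
            return store.per_node[node]  # KeyError models "file was deleted"
        except KeyError:
            continue  # re-check the index and try again
    raise FileNotFoundError(node)


if __name__ == "__main__":
    store = LogStore()
    store.per_node["nm1"] = "container log"
    # The combiner runs right after the reader has consulted the index.
    print(read_log(store, "nm1", before_open=lambda: store.combine("nm1")))
```

Since the combiner appends before it deletes, a reader that loses the race finds the log in the combined file on its next attempt.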
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538087#comment-14538087 ]

Jason Lowe commented on YARN-2942:

My apologies for taking so long to respond.

I took a look at the v6 and v7 proposals. If I understand them correctly they both propose that the NMs upload the original per-node aggregated log to HDFS and then something (either the NMs or the RM) later comes along and creates the aggregate-of-aggregates log with a side-index for faster searching and ability to correct for failed appends. These are reasonable ideas, and I prefer the simpler approach. However I didn't see details on solving the race condition where a log reader comes along, sees from the index file that the desired log isn't in the aggregate-of-aggregates, then opens the log and reads from it just as the log is deleted by the entity appending to the aggregate-of-aggregates. Since we don't have UNIX-style refcounting of open files in HDFS, deleting the log while the reader is trying to read from it is going to be disruptive.

One thing to consider in the proposals -- do we want a threshold for a per-node log file where we do not try to append it to the aggregate-of-aggregates file? We have an internal solution where we create per-application har files of the logs, and that process intentionally skips files that are already "big enough" on their own. Saves significant time and network traffic aggregating files that are already beefy enough on their own to justify their existence, as we're primarily concerned with cleaning up the tiny logs per node, per app.

Another issue from log aggregation we've seen in practice is that the proposals don't address the significant write load the per-node aggregate files place on the namenode. This isn't an absolute requirement for the design, but we've noticed it's not just about the number of files and blocks being created but also the overall write load associated with those files. It would be really nice to reduce that load significantly.

Thinking off the top of my head, one possibility is to have the RM coordinate log aggregation across the nodes. It would work something like this:
- NMs do not upload logs for an application to the aggregate file until told to do so by the RM (probably in NM heartbeat response)
- NMs provide periodic progress reports in their heartbeat on how aggregation is proceeding and when it succeeds/fails.
- RM coordinates and tracks aggregation process (which NM is "active", revoking NMs that have taken too long without progress, etc.)
- Logs would remain on NM local disk and served from there until they are uploaded into the app aggregate file, similar to how they work today with the per-node aggregate file

This has the advantages of only uploading the logs to HDFS once, only as a single aggregate file (plus index), and doesn't require ZooKeeper. A significant downside is that it prolongs the average time the logs will be available on HDFS for an application due to the serialized upload process.
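The RM-coordinated flow in the bullets above could look roughly like the following sketch, in which one NM at a time is allowed to upload and a stalled NM loses its turn. The class, its method names, and the "stalled heartbeats" counter are assumptions made for illustration; the comment does not say how the timeout would be measured or what happens to a revoked NM's logs, so the sketch only records the revocation.

```python
from collections import deque


class AggregationCoordinator:
    """Toy model of an RM serializing log uploads for one application."""

    def __init__(self, nodes, max_stalled_heartbeats=3):
        self.waiting = deque(nodes)
        self.active = None      # the single NM currently allowed to upload
        self.notified = False   # has the active NM been told to start?
        self.stalled = 0        # consecutive heartbeats without progress
        self.max_stalled = max_stalled_heartbeats
        self.finished = []
        self.revoked = []

    def _activate_next(self):
        self.active = self.waiting.popleft() if self.waiting else None
        self.notified = False
        self.stalled = 0

    def heartbeat(self, node, progressed=False, done=False):
        """Handle one NM heartbeat; return True if the node may upload now."""
        if self.active is None:
            self._activate_next()
        if node != self.active:
            return False  # not this node's turn; logs stay on local disk
        if not self.notified:
            self.notified = True  # this response tells the NM to start
            return True
        if done:
            self.finished.append(node)
            self._activate_next()
            return False
        self.stalled = 0 if progressed else self.stalled + 1
        if self.stalled >= self.max_stalled:
            self.revoked.append(node)  # took too long without progress
            self._activate_next()
            return False
        return True


if __name__ == "__main__":
    rm = AggregationCoordinator(["nm1", "nm2"], max_stalled_heartbeats=2)
    print(rm.heartbeat("nm1"), rm.heartbeat("nm2"))
```

The serialization is visible in the sketch: `nm2` cannot start until `nm1` finishes or is revoked, which is the source of the longer average time to availability noted above.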
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524338#comment-14524338 ]

Robert Kanter commented on YARN-2942:

I've been playing around with the LogAggregationStatus stuff and I think we should be able to build on top of it. I'm working on a new design document that I'll hopefully post sometime early next week.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522779#comment-14522779 ]

Vinod Kumar Vavilapalli commented on YARN-2942:

That will definitely simplify things a lot more IMO; we will no longer need a ZK dependency in the core of YARN (outside of HA).
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522642#comment-14522642 ]
Robert Kanter commented on YARN-2942:
Thanks for pointing me to YARN-1376 and related. I'll have to look into the code to get a better idea, but perhaps we can take advantage of this to take a completely different approach to combining the logs. Now that we have a way of checking the status of log aggregation across all nodes in the cluster, instead of having to use ZK locks to coordinate all the NMs to append the logs, we can have a single server append the logs (maybe a small thread pool in the RM that handles this?). We'd still use append and the new format, but we wouldn't need ZooKeeper, and using a single server to do the combining should simplify things. We'd probably need to add new {{LogAggregationStatus}} enum values for "COMBINING" and "COMBINED" or something.
I'll look into this some more, though what do you think [~vinodkv], [~jlowe], [~knoguchi]?
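A rough sketch of the single-combiner idea from the comment above, written in Python for brevity rather than against the real YARN classes. The enum members other than COMBINING and COMBINED are stand-ins for whatever states already exist, and `ready_to_combine` and `combine_app_logs` are hypothetical names used only for illustration.

```python
from enum import Enum


class LogAggregationStatus(Enum):
    # RUNNING/SUCCEEDED/FAILED are stand-ins for existing states;
    # COMBINING and COMBINED are the hypothetical additions.
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    COMBINING = "COMBINING"
    COMBINED = "COMBINED"


def ready_to_combine(node_status):
    """True once every node reports that its own aggregation succeeded."""
    return bool(node_status) and all(
        s is LogAggregationStatus.SUCCEEDED for s in node_status.values()
    )


def combine_app_logs(node_status, per_node_logs):
    """Single-writer combine: only one server runs this, so no lock is needed.

    Returns the combined records, or None if some node is not done yet.
    """
    if not ready_to_combine(node_status):
        return None
    for node in node_status:
        node_status[node] = LogAggregationStatus.COMBINING
    combined = []
    for node in sorted(per_node_logs):  # deterministic order, one writer
        combined.extend(per_node_logs[node])
    for node in node_status:
        node_status[node] = LogAggregationStatus.COMBINED
    return combined
```

Because only one process ever appends, the coordination problem reduces to waiting for the per-node statuses, which is what removes the need for ZooKeeper locks in this variant.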
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518576#comment-14518576 ]
Vinod Kumar Vavilapalli commented on YARN-2942:
Tx for the updated docs, [~rkanter]!
The proposal really is a poor man's replacement for the absence of concurrency control in HDFS. The good thing about the proposal is that it is not shipping logs across the wire multiple times.
The challenge is going to be fault handling. We need to make sure that there is someone centrally listening to node membership changes too (e.g. to handle lost nodes).
It's sort of spelled out in the doc, but repeating for clarity: I am assuming that we still continue to write the per-node file and have an aggregated file by the side. In any case, we should have a way for folks to switch over to this, with the existing implementation as a backup.
Regarding log-aggregation status, YARN-1376 and friends added some support (I am reviewing them after the fact).
I am still interested in pursuing variable-length files as an orthogonal feature.
/cc [~jlowe], [~knoguchi] who have experience with log aggregation at large scale.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482030#comment-14482030 ]
Robert Kanter commented on YARN-2942:
[~vinodkv], I was discussing this with some of our HDFS people, and they think using concat would do less (potentially much less) to actually produce NN metadata savings than the original design of using append and rereading the files. I agree that it would be best if HDFS supported atomic append (with concurrent writers), and rereading the files isn't ideal, but it seems like the original design is the best solution for the issue at hand for now. Thoughts?
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393717#comment-14393717 ]
Robert Kanter commented on YARN-2942:
Yes, it does a blocking wait. I think this will end up being in a separate thread anyway because it's being done after uploading the logs to HDFS. However, I think making it a separate service is a good idea anyway. As you said, this handles NM restart, and allows us to later add more flexibility.
If you upgrade the NM before the JHS, it's not the end of the world. New logs wouldn't be found by the JHS, but that only hurts users trying to view those logs through the JHS. Once the JHS is updated, they would be viewable. In any case, having the two configs is probably more confusing than it needs to be for the user, and we'd have to take care of the case where the new format is disabled but concatenation is enabled (which is invalid). I think we should just make this one config: either both the new format and concatenation are enabled, or neither is.
I'll post an updated doc shortly.
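To make the config argument above concrete, here is a small illustrative sketch. The two flag names are hypothetical and do not correspond to real YARN properties; the point is only that two independent switches admit an invalid state that a single switch cannot express.

```python
def validate_two_flags(new_format_enabled, concatenation_enabled):
    """Two independent (hypothetical) switches: one combination is invalid."""
    if concatenation_enabled and not new_format_enabled:
        raise ValueError("concatenation requires the new log format")
    return new_format_enabled, concatenation_enabled


def from_single_flag(enabled):
    """One switch: the new format and concatenation are on or off together."""
    return enabled, enabled
```

With the single flag there is nothing to validate, which is the simplification being proposed.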
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393700#comment-14393700 ]
Karthik Kambatla commented on YARN-2942:
(Canceled the patch to stop Jenkins from evaluating the design doc :) )
[~rkanter] - thanks for updating the design doc. A couple of comments:
# If there is an NM X actively concatenating its logs and NM Y can't acquire the lock, what happens?
## Does it do a blocking wait? If yes, this should likely be in a separate thread.
## I would like for it to be non-blocking. How about a LogConcatenationService in the NM? This service is brought up if you enable log concatenation. This service would periodically go through all of its past aggregated logs and concatenate those that it can acquire a lock for. Delayed concatenation should be okay because we are doing this primarily to handle the problem HDFS has with small files. Also, this way, we don't have to do anything different for NM restart. Looking forward, this concat service could potentially take input on how busy HDFS is.
# I didn't completely understand the point about a config to specify the format. Are you suggesting we have two different on/off configs, one to turn on concatenation and one to specify the format the JHS should be reading? I think we should have just one config that clearly states that turning this on for an NM (writer) requires that the JHS (reader) already has it enabled. In case of rolling upgrades, this translates to requiring a JHS upgrade prior to the NM upgrade.
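The non-blocking variant suggested in the comment above might look roughly like the following. This is only a sketch in Python: `threading.Lock` stands in for whatever shared lock the NMs would actually use (the thread discusses ZooKeeper locks), and the class and method names are hypothetical rather than taken from any patch.

```python
import threading


class LogConcatenationService:
    """Periodically concatenates pending logs, skipping apps whose lock is busy."""

    def __init__(self, app_locks):
        # app_id -> lock; a local stand-in for a cluster-wide per-app lock.
        self.app_locks = app_locks
        # app_id -> records aggregated by this NM but not yet concatenated.
        self.pending = {}

    def add_aggregated_log(self, app_id, records):
        self.pending.setdefault(app_id, []).extend(records)

    def run_once(self, combined):
        """One periodic pass; returns the app ids concatenated in this pass."""
        done = []
        for app_id in list(self.pending):
            lock = self.app_locks[app_id]
            if not lock.acquire(blocking=False):
                continue  # another NM holds the lock; retry on the next pass
            try:
                combined.setdefault(app_id, []).extend(self.pending.pop(app_id))
                done.append(app_id)
            finally:
                lock.release()
        return done
```

Anything skipped in one pass simply stays in `pending`, so delayed concatenation and NM restart are handled by the same loop, provided `pending` can be rebuilt from the logs already on disk.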
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393450#comment-14393450 ]
Hadoop QA commented on YARN-2942:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709065/ConcatableAggregatedLogsProposal_v4.pdf against trunk revision 6a6a59d.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7204//console
This message is automatically generated.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326958#comment-14326958 ]
Vinod Kumar Vavilapalli commented on YARN-2942:
bq. The problem here is that the aggregated log files are not in an append-friendly format (TFile). We'd have to change the file format that they're in (perhaps reusing the similar format I created in this patch), but this wouldn't be backwards compatible.
Precisely the point: I think we should have an append-friendly format, an extension of today's TFile. YARN-2548 also needs the same extension. We can try making this a compatible evolution. Even if we cannot, we can simply support both formats for compat.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326953#comment-14326953 ]
Robert Kanter commented on YARN-2942:
{quote}We should try to avoid rereading the entire log file and rewriting again. How about we try the concat approach (with variable length blocks) first before we try the reread+rewrite?{quote}
The problem here is that the aggregated log files are not in an append-friendly format (TFile). We'd have to change the file format that they're in (perhaps reusing the similar format I created in this patch), but this wouldn't be backwards compatible.
{quote}The long term solution for the latter really is HDFS supporting atomic append (with concurrent writers){quote}
This would be very useful. Even with the design implemented by this patch, it sounds like it would eventually allow us to get rid of the ZooKeeper locks.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326912#comment-14326912 ]
Vinod Kumar Vavilapalli commented on YARN-2942:
Apologies for coming in real late. I've been thinking about this problem for a long time, since before YARN came to Apache :)
I think HDFS-3689 will help a lot in this area. Offline I was requesting HDFS folks to help make progress there. Now that that got in, I think we should consider using that as the first step. It should help reduce the file-count completely, even though the block count problem is still unresolved. The long term solution for the latter really is HDFS supporting atomic append (with concurrent writers) - it's better to get the problem fixed at the storage layer.
We should try to avoid rereading the entire log file and rewriting again. How about we try the concat approach (with variable length blocks) first before we try the reread+rewrite?
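A back-of-the-envelope model of the file-count versus block-count point made above. It assumes that concat keeps the source files' blocks unchanged (so only the file count drops), while a reread+rewrite packs the data into full-size blocks. The 128 MB block size and the function name are assumptions made for illustration, not values from the thread.

```python
import math


def namenode_objects(per_node_sizes_mb, block_size_mb=128):
    """Estimate files and blocks for one application's aggregated logs."""
    # Each per-node file occupies at least one block, however small it is.
    blocks_per_file = [
        max(1, math.ceil(size / block_size_mb)) for size in per_node_sizes_mb
    ]
    total_mb = sum(per_node_sizes_mb)
    return {
        # Today: one file per NM.
        "separate": {"files": len(per_node_sizes_mb), "blocks": sum(blocks_per_file)},
        # Concat (assumed to reuse existing blocks): one file, same block count.
        "concat": {"files": 1, "blocks": sum(blocks_per_file)},
        # Reread + rewrite: one file, data repacked into full blocks.
        "rewrite": {"files": 1, "blocks": max(1, math.ceil(total_mb / block_size_mb))},
    }
```

For example, under these assumptions 100 nodes with 2 MB of logs each give 100 files and 100 blocks today, 1 file and 100 blocks after concat, and 1 file and 2 blocks after a rewrite.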
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326821#comment-14326821 ]
Robert Kanter commented on YARN-2942:
I've created 4 subtasks (one is in HADOOP):
# HADOOP-11612: Workaround for Curator's ChildReaper requiring Guava 15+
# YARN-3218: Implement CombinedAggregatedLogFormat Reader and Writer
# YARN-3219: Use CombinedAggregatedLogFormat Writer to combine aggregated log files
# YARN-3220: JHS should display Combined Aggregated Logs when available
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326744#comment-14326744 ]
Hadoop QA commented on YARN-2942:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699570/CombinedAggregatedLogsProposal_v3.pdf against trunk revision 9a3e292.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6662//console
This message is automatically generated.