[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-06-24 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600358#comment-14600358
 ] 

Robert Kanter commented on YARN-2942:
-

After some offline discussion, we've decided to put this on hold for now.  
There are concerns that HDFS-3689 hasn't had enough time to bake, so building the 
aggregated log changes on top of it isn't a good idea yet.  In the meantime, to 
help alleviate the log problem, I've created MAPREDUCE-6415, which is based on 
the procedure Jason described earlier with HAR files.  

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, 
> ConcatableAggregatedLogsProposal_v8.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556926#comment-14556926
 ] 

Karthik Kambatla commented on YARN-2942:


Thanks for your persistence through the multiple versions of this design, 
Robert. I think we have an actionable plan now. Thanks, Jason and Vinod, for 
your input on the JIRA and offline. 



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-21 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555382#comment-14555382
 ] 

Robert Kanter commented on YARN-2942:
-

[~kasha], [~vinodkv], [~jlowe], and I had a discussion earlier today about the 
best way to move forward on this.  We came up with a design that mostly 
picks and chooses from the previous designs:
- The log files get aggregated to HDFS by each NM as they do now, except we use 
a concat-friendly format (like the one I've used in patches on this JIRA)
- We'd then concat the aggregated log files for an application into a single 
file.  Designs v4 and v5 had this, but they were using ZooKeeper to coordinate 
the NMs concatenating their own file.  Now that the RM knows when all NMs are 
done aggregating, it can take care of the concatenation via a new Service which 
concats the aggregated log files for a particular job into a single file at 
some interval.  (So ZooKeeper isn't required and no coordination is really 
needed)
- In the discussion, we talked about having another new RM service that would 
periodically compact the concatenated files (i.e. copy and replace them) to 
clean up the blocks.  Ideally, this would be something that HDFS could add 
itself, and we wouldn't need this step.  However, [~kasha] and I talked with 
some HDFS folks and they're not sure this is something they want to put in 
HDFS.  In order to ensure that the compaction doesn't run while the NN is busy, 
they suggested having it triggered by a command that the admin runs (like 
what's done with HDFS balancing).  I think that's a better idea than having the 
RM automatically do it arbitrarily, in the meantime.  If HDFS ever adds this in 
the future, this last step is something that can be easily deprecated.
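
A minimal sketch of the concatenation step described above, assuming plain local files stand in for the per-NM aggregated logs in HDFS. The function name {{concat_app_logs}}, the status strings, and the directory layout are all hypothetical; a real service would presumably rely on an HDFS-side concat rather than copying bytes.

```python
import os
import tempfile

def concat_app_logs(app_dir, nm_status, combined_name="combined.log"):
    """Concatenate per-NM log files once every NM reports it is done.

    nm_status maps a per-NM file name to a status string; "SUCCEEDED" is a
    hypothetical stand-in for "this NM has finished aggregating".
    Returns the combined path, or None if some NM is still aggregating.
    """
    if not nm_status or any(s != "SUCCEEDED" for s in nm_status.values()):
        return None  # not ready; the service would retry at the next interval
    combined = os.path.join(app_dir, combined_name)
    with open(combined, "wb") as out:
        for name in sorted(nm_status):  # deterministic order
            with open(os.path.join(app_dir, name), "rb") as src:
                out.write(src.read())
    # Remove the per-NM files only after the combined file is complete.
    for name in nm_status:
        os.remove(os.path.join(app_dir, name))
    return combined

def demo():
    with tempfile.TemporaryDirectory() as app_dir:
        for name, text in (("nm1", b"a\n"), ("nm2", b"b\n")):
            with open(os.path.join(app_dir, name), "wb") as f:
                f.write(text)
        pending = concat_app_logs(app_dir, {"nm1": "SUCCEEDED", "nm2": "RUNNING"})
        done = concat_app_logs(app_dir, {"nm1": "SUCCEEDED", "nm2": "SUCCEEDED"})
        with open(done, "rb") as f:
            return pending, f.read(), sorted(os.listdir(app_dir))
```

Because only one process performs the concatenation, no lock is needed in this sketch; the "all NMs done" check is the only coordination.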

I'll write a v8 document with the formal details and upload it sometime 
tomorrow.



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553384#comment-14553384
 ] 

Karthik Kambatla commented on YARN-2942:


Thanks everyone for the discussion. Clearly, there are trade-offs to make 
between (1) a single aggregation across nodes for an application with a 
slightly higher chance of losing a container's logs if a node were to go down 
vs (2) a two-step aggregation that places more load on HDFS. While looking at 
this trade-off, we should consider HDFS state today and possible improvements 
in the future. If HDFS were to support concurrent-append, option 1 seems like a 
better approach. 





[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-14 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544109#comment-14544109
 ] 

Robert Kanter commented on YARN-2942:
-

The delays in log aggregation should be fine if the logs are available directly 
from the NMs in the meantime.

{quote}Like I was originally saying, we really need all of this functionality 
in the file-system.{quote}
I agree with you that this would be great and make things super easy for us.  
However, I don't see them adding the functionality that we need any time soon.  
Given that, I think we need to come up with our own solution with what we have 
currently available.

{quote}Overall, today's log-aggregation is fairly on the edge...we need to 
think twice before hard-wiring the notion of concurrent log-append right into 
the platform. The ZK solution was less intrusive as it was still on the edge 
with the downside of adding external dependencies.{quote}
I think that the v7 design could be easily replaced as well.  Most of it 
would live in an RM service, which could be turned off or replaced.  However, 
you are correct that the design based on what Jason said would be more invasive 
and not really replaceable.  That said, I don't think anyone's wanted/tried to 
do that.



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540970#comment-14540970
 ] 

Vinod Kumar Vavilapalli commented on YARN-2942:
---

bq. Having the RM coordinate the aggregation is similar to my design with ZK, 
but instead of a ZK lock, the RM orchestrates things. I like the idea of 
getting rid of the original aggregation and having the NMs all write to HDFS 
once, in the combined file directly.
Though this is great to have in theory, I'd like to point out that the 
implementation is going to be fraught with (1) many fault-tolerance conditions 
and (2) potentially very long delays in aggregation due to costs of 
coordination and fault-recovery. Like I was [originally 
saying|https://issues.apache.org/jira/browse/YARN-2942?focusedCommentId=14326912&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14326912],
 we really need all of this functionality in the file-system.

Overall, today's log-aggregation is fairly on the edge (you can imagine putting 
in a different aggregation mechanism by replacing the module present in the 
NM); we need to think twice before hard-wiring the notion of concurrent 
log-append right into the platform. The ZK solution was less intrusive as it 
was still on the edge with the downside of adding external dependencies.



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540407#comment-14540407
 ] 

Jason Lowe commented on YARN-2942:
--

bq. Can you give some more details on this? Is it something you can share?

It's a hack to help mitigate the log aggregation namespace scaling issues on 
our large clusters.  Essentially it's a periodic process to run an Oozie 
workflow that does the following:

# determines which applications are good candidates for log archiving (i.e., 
lots of files and total size is not that big)
# runs a streaming job with a shell script that uses the list of applications 
to aggregate as input
# for each application it runs a local-mode archive job to archive the log 
contents
# when the archive has been created it swaps out the application directory with 
a symlink into the har archive

The symlink makes the archive transparent to the readers.  Both the JHS and the 
"yarn logs" command use FileContext and "just worked" with the symlink into the 
har without modifications.
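
The first step of the workflow above (picking applications with lots of files but a modest total size) could look roughly like the following. This is only an illustration: the function name and both thresholds are made up, and the actual script is not shown in this thread.

```python
def archive_candidates(app_files, min_files=50, max_total_bytes=1 << 30):
    """Return app ids with many files whose total size is still modest.

    app_files maps an application id to a list of per-file sizes in bytes.
    Both thresholds are made-up defaults, not values from the thread.
    """
    return sorted(
        app for app, sizes in app_files.items()
        if len(sizes) >= min_files and sum(sizes) <= max_total_bytes
    )
```

The resulting list would then be the input to the streaming job in step 2.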

So yes, we are running a MapReduce job to archive the logs which itself will 
create more logs.  However it processes many application logs for each 
archiving job.  If there is sufficient interest we can pursue how to share it, 
but the script is specific to how we configure our nodes and clusters and 
relies on unsupported symlinks.  I'm hoping the outcome of this JIRA allows us 
to move away from the need for it.

bq. We'd have to implement your last bullet point to have the NMs serve the 
logs in the meantime, as I don't think that's there today. 

That feature is indeed there today.  Links to the app logs on the NM will try 
to serve the local app logs first, then redirect to the log server if the local 
logs are unavailable.  See NMController and ContainerLogsPage.  It only becomes 
an issue when things link to the aggregated log server directly before the NM 
has finished aggregating them.
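
As a rough illustration of the "local first, then redirect" behavior described above (this is not the NMController code; the function name and URL layout are assumptions):

```python
def resolve_log_request(container_id, local_logs, log_server_url):
    """Serve from the NM's local logs if present, else redirect.

    local_logs maps container id -> log text still on the NM's disk.
    The redirect URL layout is hypothetical.
    """
    if container_id in local_logs:
        return ("local", local_logs[container_id])
    return ("redirect", "%s/%s" % (log_server_url.rstrip("/"), container_id))
```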



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-12 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540288#comment-14540288
 ] 

Robert Kanter commented on YARN-2942:
-

Thanks [~jlowe] for your feedback.  It's good to get more views on this.

{quote} If I understand them correctly they both propose that the NMs upload 
the original per-node aggregated log to HDFS and then something (either the NMs 
or the RM) later comes along and creates the aggregate-of-aggregates log{quote}
Yes.  That's correct.  

{quote}However I didn't see details on solving the race condition where a log 
reader comes along, sees from the index file that the desired log isn't in the 
aggregate-of-aggregates, then opens the log and reads from it just as the log 
is deleted by the entity appending to the aggregate-of-aggregates.{quote}
That's a good point.  I hadn't thought of that issue.  Thinking about it now, I 
think there are a few options here:
- We could simply have the reader try again if it runs into a problem
- We could have the last NM delete the aggregated log files, so that it's less 
likely that this situation can occur
- Each NM could wait some amount of time (e.g. a few mins) after appending its 
log file before deleting the original file, so that it's less likely that this 
situation can occur
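
The first option (the reader simply tries again) might be sketched as below. Local files stand in for HDFS, and {{read_log}}, the {{index}} callable, and the {{combined}} mapping are hypothetical stand-ins for the index file and the aggregate-of-aggregates. The sketch only covers the file vanishing before it is opened; a delete in the middle of a read would need similar handling.

```python
import os
import tempfile

def read_log(node, index, per_node_dir, combined, retries=3):
    """Read one node's log, retrying if the per-node file vanishes.

    index is a callable returning the set of nodes already present in the
    combined store; combined maps node -> bytes.
    """
    for _ in range(retries):
        if node in index():
            return combined[node]
        try:
            with open(os.path.join(per_node_dir, node), "rb") as f:
                return f.read()
        except FileNotFoundError:
            continue  # deleted under us; re-check the index
    raise IOError("log for %s unavailable after %d tries" % (node, retries))

def demo():
    # First index lookup: not combined yet. Second lookup: combined.
    answers = iter([set(), {"nm1"}])
    with tempfile.TemporaryDirectory() as d:
        return read_log("nm1", lambda: next(answers), d, {"nm1": b"moved\n"})
```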

{quote}We have an internal solution where we create per-application har files 
of the logs{quote}
Can you give some more details on this?  Is it something you can share?  If 
you've already solved this issue, then perhaps we can just use that.  Though 
doesn't creating har files require running an MR job?  

{quote}Another issue from log aggregation we've seen in practice is that the 
proposals don't address the significant write load the per-node aggregate files 
place on the namenode.{quote}
That's a good point.  Shortly after a job finishes, all of the involved NMs 
would upload their log files around the same time, which puts stress on the NN. 
 The NM giving the RM reports of the current aggregation progress was recently 
added by YARN-1376 and related.  Having the RM coordinate the aggregation is 
similar to my design with ZK, but instead of a ZK lock, the RM orchestrates 
things.  I like the idea of getting rid of the original aggregation and having 
the NMs all write to HDFS once, in the combined file directly.  We'd have to 
implement your last bullet point to have the NMs serve the logs in the 
meantime, as I don't think that's there today.  

I'll try to flesh this design out a bit more and see where it goes, unless we 
should use har files instead, though that adds an MR dependency.



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538087#comment-14538087
 ] 

Jason Lowe commented on YARN-2942:
--

My apologies for taking so long to respond.  I took a look at the v6 and v7 
proposals.  If I understand them correctly they both propose that the NMs 
upload the original per-node aggregated log to HDFS and then something (either 
the NMs or the RM) later comes along and creates the aggregate-of-aggregates 
log with a side-index for faster searching and ability to correct for failed 
appends.  These are reasonable ideas, and I prefer the simpler approach.  
However I didn't see details on solving the race condition where a log reader 
comes along, sees from the index file that the desired log isn't in the 
aggregate-of-aggregates, then opens the log and reads from it just as the log 
is deleted by the entity appending to the aggregate-of-aggregates.  Since we 
don't have UNIX-style refcounting of open files in HDFS, deleting the log while 
the reader is trying to read from it is going to be disruptive.

One thing to consider in the proposals -- do we want a threshold for a per-node 
log file where we do not try to append it to the aggregate-of-aggregates file?  
We have an internal solution where we create per-application har files of the 
logs, and that process intentionally skips files that are already "big enough" 
on their own.  That saves significant time and network traffic aggregating files 
that are already beefy enough on their own to justify their existence, as we're 
primarily concerned with cleaning up the tiny logs per node, per app.
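
To make the threshold idea concrete, here is a small sketch; the function name and the cutoff parameter are assumptions for illustration only.

```python
def split_by_threshold(files, big_enough_bytes):
    """Partition per-node log files into (to_combine, left_alone).

    files maps a per-node file name to its size in bytes.
    """
    to_combine = {n: s for n, s in files.items() if s < big_enough_bytes}
    left_alone = {n: s for n, s in files.items() if s >= big_enough_bytes}
    return to_combine, left_alone
```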

Another issue from log aggregation we've seen in practice is that the proposals 
don't address the significant write load the per-node aggregate files place on 
the namenode.  This isn't an absolute requirement for the design, but we've 
noticed it's not just about the number of files and blocks being created but 
also the overall write load associated with those files.  It would be really 
nice to reduce that load significantly.  Thinking off the top of my head, one 
possibility is to have the RM coordinate log aggregation across the nodes.  It 
would work something like this:
- NMs do not upload logs for an application to the aggregate file until told to 
do so by the RM (probably in NM heartbeat response)
- NMs provide periodic progress reports in their heartbeat on how aggregation 
is proceeding and when it succeeds/fails.
- RM coordinates and tracks aggregation process (which NM is "active", revoking 
NMs that have taken too long without progress, etc.)
- Logs would remain on NM local disk and served from there until they are 
uploaded into the app aggregate file, similar to how they work today with the 
per-node aggregate file

This has the advantages of only uploading the logs to HDFS once, only as a 
single aggregate file (plus index), and doesn't require ZooKeeper.  A 
significant downside is that it prolongs the average time before the logs 
become available on HDFS for an application due to the serialized upload process.
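
A toy model of the coordination scheme above, assuming the RM grants the upload turn to one NM at a time and revokes it after a period without progress. The class, its fields, and the heartbeat signature are all hypothetical; none of this reflects an existing YARN API.

```python
from collections import deque

class AggregationCoordinator:
    """Lets one NM at a time upload; all names here are hypothetical."""

    def __init__(self, nodes, timeout):
        self.waiting = deque(nodes)
        self.timeout = timeout
        self.active = None
        self.last_progress = None
        self.done, self.failed = [], []

    def heartbeat(self, node, now, progressed=False, finished=False):
        """Process one NM heartbeat; return True if node may upload."""
        if self.active is not None and now - self.last_progress > self.timeout:
            self.failed.append(self.active)  # revoke a stalled NM
            self.active = None
        if node == self.active:
            if finished:
                self.done.append(node)
                self.active = None
            elif progressed:
                self.last_progress = now
        if self.active is None and self.waiting:
            # Hand the turn to the next NM; it learns this on its own heartbeat.
            self.active, self.last_progress = self.waiting.popleft(), now
        return node == self.active
```

The serialized turns are what keep the NN write load down, and they are also the source of the extra delay mentioned above.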



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-05-01 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524338#comment-14524338
 ] 

Robert Kanter commented on YARN-2942:
-

I've been playing around with the LogAggregationStatus stuff and I think we 
should be able to build on top of it.  I'm working on a new design document 
that I'll hopefully post sometime early next week.



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522779#comment-14522779
 ] 

Vinod Kumar Vavilapalli commented on YARN-2942:
---

That will definitely simplify things a lot more IMO; we will no longer need a 
ZK dependency in the core of YARN (outside of HA).



[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-30 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522642#comment-14522642
 ] 

Robert Kanter commented on YARN-2942:
-

Thanks for pointing me to YARN-1376 and related.  I'll have to look into the 
code to get a better idea, but perhaps we can take advantage of this to try a 
completely different approach for combining the logs.  Now that we have a way 
of checking the status of log aggregation across all nodes in the cluster, 
instead of having to use ZK locks to coordinate all the NMs to append the logs, 
we can have a single server append the logs (maybe a small thread pool in the 
RM that handles this?).  We'd still use append, and the new format, but we 
wouldn't need to use ZooKeeper, and using a single server to do the combining 
should simplify things.  We'd probably need to add new 
{{LogAggregationStatus}} enum values for "COMBINING" and "COMBINED" or something.  
I'll look into this some more, though what do you think [~vinodkv], [~jlowe], 
[~knoguchi]?
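
To make the single-server idea concrete, here is a minimal sketch, assuming an 
in-memory stand-in for HDFS.  The names {{CombineStatus}} and 
{{SingleServerCombiner}} are hypothetical and are not part of YARN; the real 
change would add values to the existing status enum and append to a file in 
HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical status values; the last two follow the names suggested above.
enum CombineStatus { PENDING, COMBINING, COMBINED }

// Hypothetical single-server combiner backed by a small thread pool.
class SingleServerCombiner {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, CombineStatus> status = new ConcurrentHashMap<>();
    // Stand-in for the combined per-application file in HDFS.
    private final Map<String, List<String>> combined = new ConcurrentHashMap<>();

    // Assumed to be called once every node reports that aggregation finished.
    void submit(String appId, List<String> perNodeLogs) {
        status.put(appId, CombineStatus.COMBINING);
        pool.execute(() -> {
            // Only this server writes, so no cross-node locking is needed.
            combined.put(appId, new ArrayList<>(perNodeLogs));
            status.put(appId, CombineStatus.COMBINED);
        });
    }

    CombineStatus statusOf(String appId) {
        return status.getOrDefault(appId, CombineStatus.PENDING);
    }

    List<String> combinedLogs(String appId) {
        return combined.get(appId);
    }

    // Stops accepting work and waits briefly for queued combines to finish.
    void shutdownAndWait() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of the sketch is only that a single writer removes the need for 
coordination between NMs; fault handling for the server itself is left out.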


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-28 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518576#comment-14518576
 ] 

Vinod Kumar Vavilapalli commented on YARN-2942:
---

Tx for the updated docs, [~rkanter]!

The proposal really is a poor man's substitute for the concurrency control 
that HDFS lacks. The good thing about the proposal is that it is not shipping 
logs across the wire multiple times. The challenge is going to be fault 
handling. We need to make sure that there is someone centrally listening to 
node membership changes too (e.g., to handle lost nodes).

It's sort of spelled out in the doc, but repeating for clarity: I am assuming 
that we still continue to write the per-node file and have an aggregated file 
by the side. In any case, we should have a way for folks to switch to this, 
with the existing implementation as a fallback.

Regarding log-aggregation status, YARN-1376 and friends added some support (I 
am reviewing them after the fact).

I am still interested in pursuing variable-length files as an orthogonal 
feature. 

/cc [~jlowe], [~knoguchi] who have experience with log aggregation at large 
scale.


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-06 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482030#comment-14482030
 ] 

Robert Kanter commented on YARN-2942:
-

[~vinodkv], I was discussing this with some of our HDFS people, and they think 
using concat would result in less (potentially much less) NN metadata savings 
than the original design of using append and rereading the files.  I agree that 
it would be best if HDFS supported atomic append (with concurrent writers), and 
rereading the files isn't ideal, but it seems like the original design is the 
best solution for the issue at hand for now.  Thoughts?
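
A rough way to see the difference is a back-of-the-envelope count of NN 
objects.  The sketch below assumes a simplified model that is not taken from 
the design doc: the NN tracks one object per file plus one per block, each 
per-node log fits in a single block, and concat keeps every source block as 
it is.  The class and method names are hypothetical.

```java
// Simplified NameNode object count: one object per file plus one per block.
class NameNodeObjectEstimate {
    // Today: one small file (assumed to be one block) per NodeManager.
    static long perNodeFiles(int nodes) {
        return 2L * nodes;
    }

    // Concat: a single file, but every source block is kept unchanged.
    static long afterConcat(int nodes) {
        return 1L + nodes;
    }

    // Append and rewrite: a single file whose data is packed into full blocks.
    static long afterAppendRewrite(int nodes, long bytesPerNode, long blockSize) {
        long total = (long) nodes * bytesPerNode;
        long blocks = Math.max(1L, (total + blockSize - 1) / blockSize);
        return 1L + blocks;
    }
}
```

Under these assumptions, 1000 nodes with 1 MiB of logs each and a 128 MiB 
block size give 2000 objects for per-node files, 1001 after concat, and 9 
after append and rewrite.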

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393717#comment-14393717
 ] 

Robert Kanter commented on YARN-2942:
-

Yes, it does a blocking wait.  I think this will end up being in a separate 
thread anyway because it's being done after uploading the logs to HDFS.  
However, I think making it a separate service is a good idea anyway.  As you 
said, this handles NM restart, and allows us to later add more flexibility.

If you upgrade the JHS before the NM, it's not the end of the world.  New logs 
wouldn't be found by the JHS, but that only hurts users trying to view those 
logs through the JHS.  Once the JHS is updated, they would be viewable.  In any 
case, having the two configs is probably more confusing than it needs to be for 
the user, and we'd have to take care of the case where the new format is 
disabled but concatenation is enabled (which is invalid).  I think we should 
just make this one config: either the new format and concatenation are both 
enabled or neither is.

I'll post an updated doc shortly.

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393700#comment-14393700
 ] 

Karthik Kambatla commented on YARN-2942:


(Canceled the patch to stop Jenkins from evaluating the design doc :) ) 

[~rkanter] - thanks for updating the design doc. A couple of comments:
# If there is an NM X actively concatenating its logs and NM Y can't acquire 
the lock, what happens? 
## Does it do a blocking-wait? If yes, this should likely be in a separate 
thread.
## I would like for it to be non-blocking. How about a LogConcatenationService 
in the NM? This service is brought up if you enable log concatenation. This 
service would periodically go through all of its past aggregated logs and 
concatenate those that it can acquire a lock for. Delayed concatenation should 
be okay because we are doing this primarily to handle the problem HDFS has with 
small files. Also, this way, we don't have to do anything different for NM 
restart. Looking forward, this concat service could potentially take input on 
how busy HDFS is. 
# I didn't completely understand the point about a config to specify the 
format. Are you suggesting we have two different on/off configs - one to turn 
on concatenation and one to specify the format the JHS should be reading? I 
think we need just one config that clearly states that turning this on for an 
NM (writer) requires that the JHS (reader) already has it enabled. In the case 
of rolling upgrades, this translates to requiring a JHS upgrade prior to the 
NM upgrade.  
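
The non-blocking service described in the first point could look roughly like 
the sketch below.  Only the name {{LogConcatenationService}} comes from the 
comment; everything else is an assumption.  In particular, a per-application 
{{Semaphore}} stands in for the cluster-wide lock, and "concatenating" just 
records the application ID.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a periodic, non-blocking concatenation pass.
class LogConcatenationService {
    // Stand-in for cluster-wide per-application locks shared by all NMs.
    private final Map<String, Semaphore> locks;
    private final List<String> pending = new ArrayList<>();
    private final List<String> concatenated = new ArrayList<>();

    LogConcatenationService(Map<String, Semaphore> locks) {
        this.locks = locks;
    }

    // Records an application whose aggregated log still needs concatenating.
    void addAggregatedLog(String appId) {
        pending.add(appId);
    }

    // One pass. It never blocks: applications whose lock is held elsewhere
    // stay in the pending list and are retried on the next pass.
    int runOnce() {
        int done = 0;
        for (Iterator<String> it = pending.iterator(); it.hasNext();) {
            String appId = it.next();
            Semaphore lock = locks.computeIfAbsent(appId, k -> new Semaphore(1));
            if (!lock.tryAcquire()) {
                continue;
            }
            try {
                concatenated.add(appId); // real code would append this NM's log
                it.remove();
                done++;
            } finally {
                lock.release();
            }
        }
        return done;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A scheduler would call {{runOnce()}} at a fixed interval.  Because the pending 
list is rebuilt from past aggregated logs, an NM restart needs no special 
handling in this model.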


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393450#comment-14393450
 ] 

Hadoop QA commented on YARN-2942:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12709065/ConcatableAggregatedLogsProposal_v4.pdf
  against trunk revision 6a6a59d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7204//console

This message is automatically generated.


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-02-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326958#comment-14326958
 ] 

Vinod Kumar Vavilapalli commented on YARN-2942:
---

bq. The problem here is that the aggregated log files are not in an 
append-friendly format (TFile). We'd have to change the file format that 
they're in (perhaps reusing the similar format I created in this patch), but 
this wouldn't be backwards compatible.
Precisely the point: I think we should have an append-friendly format - an 
extension of today's TFile. YARN-2548 also needs the same extension. We can try 
making this a compatible evolution. Even if we cannot, we can simply support 
both formats for compat.
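
As an illustration of what "append-friendly" can mean, the sketch below uses 
self-delimiting records (container ID, payload length, payload) with no 
trailing index, so new records can be added without rewriting earlier bytes.  
This is an assumed layout for discussion only; it is neither TFile nor the 
format from the patch, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical length-prefixed record layout; the byte array stands in for a file.
class AppendableLogSketch {
    // Appends one self-delimiting record to the end of the "file".
    static void append(ByteArrayOutputStream file, String containerId, String log) {
        try {
            DataOutputStream out = new DataOutputStream(file);
            byte[] payload = log.getBytes(StandardCharsets.UTF_8);
            out.writeUTF(containerId);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Scans records from the start until the end of the data.
    static Map<String, String> readAll(byte[] file) {
        Map<String, String> logs = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            while (in.available() > 0) {
                String containerId = in.readUTF();
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                logs.put(containerId, new String(payload, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return logs;
    }
}
```

The trade-off in this sketch is that lookups require a scan, since there is 
no index to jump to a given container.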

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-02-18 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326953#comment-14326953
 ] 

Robert Kanter commented on YARN-2942:
-

{quote}We should try to avoid rereading the entire log file and rewriting 
again. How about we try the concat approach (with variable length blocks) first 
before we try the reread+rewrite?{quote}
The problem here is that the aggregated log files are not in an append-friendly 
format (TFile).  We'd have to change the file format that they're in (perhaps 
reusing the similar format I created in this patch), but this wouldn't be 
backwards compatible.

{quote}The long term solution for the latter really is HDFS supporting atomic 
append (with concurrent writers){quote}
This would be very useful.  Even with the design implemented by this patch, it 
sounds like it would eventually allow us to get rid of the ZooKeeper locks.


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-02-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326912#comment-14326912
 ] 

Vinod Kumar Vavilapalli commented on YARN-2942:
---

Apologies for coming in real late. I've been thinking about this problem for a 
long time, since before YARN came to Apache :)

I think HDFS-3689 will help a lot in this area. Offline I was requesting HDFS 
folks to help make progress there. Now that it has gone in, I think we should 
consider using that as the first step. It should help reduce the file-count 
completely, even though the block count problem is still unresolved. The long 
term solution for the latter really is HDFS supporting atomic append (with 
concurrent writers) - it's better to get the problem fixed at the storage layer.

We should try to avoid rereading the entire log file and rewriting again. How 
about we try the concat approach (with variable length blocks) first before we 
try the reread+rewrite?


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-02-18 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326821#comment-14326821
 ] 

Robert Kanter commented on YARN-2942:
-

I've created 4 subtasks (one is in HADOOP):
# HADOOP-11612: Workaround for Curator's ChildReaper requiring Guava 15+
# YARN-3218: Implement CombinedAggregatedLogFormat Reader and Writer
# YARN-3219: Use CombinedAggregatedLogFormat Writer to combine aggregated log 
files
# YARN-3220: JHS should display Combined Aggregated Logs when available


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-02-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326744#comment-14326744
 ] 

Hadoop QA commented on YARN-2942:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12699570/CombinedAggregatedLogsProposal_v3.pdf
  against trunk revision 9a3e292.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6662//console

This message is automatically generated.