[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872300#comment-13872300
 ] 

Robert Joseph Evans commented on YARN-1530:
-------------------------------------------

I agree that we need to think about load and plan for something that can handle 
at least 20x the current load but preferably 100x.  However, I am not that sure 
that the load will be a huge problem, at least for current MR clusters.  We have 
seen very large jobs as well, but a job with a 700 MB history file does not 
finish instantly.  I took a look at a 3500 node cluster we have that is under 
fairly heavy load, and looking at the done directory for yesterday, I saw what 
amounted to about 1.7 MB/sec of data on average.  Gigabit Ethernet should be 
able to handle 15 to 20 times this (assuming that we read as much data as we 
write, and that the storage may require some replication).
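
As a rough check of that estimate, here is a back-of-the-envelope sketch.  The 
1.7 MB/sec figure and the "read as much as we write" assumption come from the 
comment above; the replication factor of 3 and the use of the raw 125 MB/sec 
line rate (no protocol overhead) are assumptions made only for illustration.

```python
# Raw Gigabit Ethernet line rate: 1000 Mbit/s / 8 = 125 MB/s.
# Assumption: protocol overhead is ignored.
GIGABIT_MB_PER_SEC = 1000 / 8


def headroom(write_mb_per_sec, replication=3, read_ratio=1.0,
             link_mb_per_sec=GIGABIT_MB_PER_SEC):
    """Return how many times the current load fits on a single link.

    replication: copies written per byte of history data (assumed, not measured).
    read_ratio:  bytes read per byte written (1.0 = read as much as we write).
    """
    traffic = write_mb_per_sec * (replication + read_ratio)
    return link_mb_per_sec / traffic


# 1.7 MB/s written, 3 replicas, equal read volume -> 6.8 MB/s on the wire.
print(round(headroom(1.7), 1))  # 18.4
```

With those assumptions the link has roughly 18x headroom, which falls inside 
the 15 to 20 times range; a lower replication factor or less read traffic would 
raise the number, and real-world overhead would lower it.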

I am fine with the proposed solution by [~lohit] so long as the history service 
always provides a RESTful interface and the AM can decide if it wants to use 
it, or go through a different higher-load channel.  Otherwise non-Java-based 
AMs would not necessarily be able to write to the history service.

I am also a bit nervous about using the history service for recovery or as a 
backend for the current MR APIs if we have a pub/sub system as a link between 
the applications and the history service.  I don't think it is a show stopper; 
it just opens the door for a number of corner cases that will have to be dealt 
with.  For example, if an MR AM crashes badly and the client goes to the 
history service to get the counters, etc., when does the history service know 
that all of the events for the MR AM have been processed so that it can return 
those counters, or perhaps other data?  I am not totally sure what data may be 
a show stopper for this, but the lag means all applications have to be sure 
that they don't rely on the history service to resolve split-brain problems or 
things like that.
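
To make the corner case concrete, here is a minimal sketch of one possible way 
a history service could signal completeness.  Everything in it is hypothetical: 
the HistoryStore class, the event shapes and the APP_FINISHED marker are 
invented for illustration and are not part of any proposed design.

```python
class HistoryStore:
    """Toy in-memory history service fed by an asynchronous event stream."""

    def __init__(self):
        self._counters = {}     # app_id -> {counter name -> running total}
        self._finished = set()  # app_ids whose terminal marker has arrived

    def on_event(self, app_id, event):
        """Apply one event delivered by the (hypothetical) pub/sub link."""
        if event["type"] == "COUNTER":
            totals = self._counters.setdefault(app_id, {})
            totals[event["name"]] = totals.get(event["name"], 0) + event["value"]
        elif event["type"] == "APP_FINISHED":
            # Assumption: this marker is published after all other events.
            self._finished.add(app_id)

    def get_counters(self, app_id):
        """Return (counters, complete); complete is False while events may lag."""
        counters = dict(self._counters.get(app_id, {}))
        return counters, app_id in self._finished


store = HistoryStore()
store.on_event("app_1", {"type": "COUNTER", "name": "MAP_INPUT_RECORDS", "value": 10})
print(store.get_counters("app_1"))  # ({'MAP_INPUT_RECORDS': 10}, False)
store.on_event("app_1", {"type": "APP_FINISHED"})
print(store.get_counters("app_1"))  # ({'MAP_INPUT_RECORDS': 10}, True)
```

Note that this does not solve the crash case by itself: an AM that crashes 
badly never publishes the marker, so something else (another component or a 
timeout) would have to decide when the stream is complete.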

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: application timeline design-20140108.pdf
>
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework 
> data all by itself as YARN doesn't have a common solution. This JIRA attempts 
> to solve the storage, management and serving of per-framework data from 
> various applications, both running and finished. The aim is to change YARN to 
> collect and store data in a generic manner with plugin points for frameworks 
> to do their own thing w.r.t interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
