[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143401#comment-14143401 ]

bc Wong commented on YARN-1530:
-------------------------------

Hi [~zjshen]. First, glad to see that we're discussing approaches. You seem to 
agree with the premise that *ATS write path should not slow down apps*.

bq. Therefore, is making the timeline server reliable (or always-up) the 
essential solution? If the timeline server is reliable, ...

In theory, you can make the ATS *always-up*. In practice, we both know how 
real-life distributed systems behave. And "always-up" isn't the only 
requirement: the write path needs to have good uptime and low latency 
regardless of what's happening to the read path or the backing store.

HDFS is a good default for the write channel because:
* We don't have to design an ATS that is always-up. If you really want to, I'm 
sure you can eventually build something with good uptime. But it took other 
projects (HDFS, ZK) lots of hard work to get to that point.
* If we reuse HDFS, we build on the fact that cluster admins already know how 
to operate HDFS and get good uptime from it. By contrast, it'll take training 
and hard-learned lessons for operators to figure out how to get good uptime 
from ATS, even after you build an always-up ATS.
* All the popular YARN app frameworks (MR, Spark, etc.) already rely on HDFS by 
default. So do most of the third-party applications that I know of. 
Architecturally, it seems easier for admins to accept that the ATS write path 
depends on HDFS for reliability, rather than on a new component that (we 
claim) will be as reliable as HDFS/ZK.
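A minimal Python sketch of the decoupling argued for above, under loud assumptions: a local directory stands in for HDFS, events are stored as JSON lines, and TimelineEventWriter / TimelineIngester are hypothetical names, not actual YARN or ATS APIs. The app only appends to a log file it owns; a separate ingester tails those files and loads them into the store, so an ATS or store outage delays ingestion instead of blocking the app.

```python
import json
import os
import tempfile
import time


class TimelineEventWriter:
    """Hypothetical app-side writer: appends events to a per-app log file.

    Assumption: a local directory stands in for HDFS. The app depends only on
    the file system being writable, never on the ATS service being up.
    """

    def __init__(self, log_dir: str, app_id: str) -> None:
        self.path = os.path.join(log_dir, f"{app_id}.jsonl")

    def put(self, entity_type: str, entity_id: str, event: str) -> None:
        record = {"type": entity_type, "id": entity_id,
                  "event": event, "ts": time.time()}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            # A real HDFS writer would call hflush()/hsync() on its output
            # stream here; flush() is the local-file stand-in.
            f.flush()


class TimelineIngester:
    """Hypothetical ATS-side reader: tails app logs into a backing store.

    Keeps a byte offset per log file so it can resume after a crash or a
    store outage without the writers noticing.
    """

    def __init__(self, log_dir: str) -> None:
        self.log_dir = log_dir
        self.offsets = {}  # log file name -> bytes consumed so far
        self.store = []    # stand-in for the real backing store

    def poll(self, store_available: bool = True) -> int:
        """Load new complete records; return how many were loaded."""
        if not store_available:
            return 0  # store hiccup: data stays in the logs for the next poll
        loaded = 0
        for name in sorted(os.listdir(self.log_dir)):
            start = self.offsets.get(name, 0)
            with open(os.path.join(self.log_dir, name), "rb") as f:
                f.seek(start)
                data = f.read()
            # Consume only complete lines; a partially written tail waits.
            end = data.rfind(b"\n") + 1
            for raw in data[:end].splitlines():
                self.store.append(json.loads(raw))
                loaded += 1
            self.offsets[name] = start + end
        return loaded


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    writer = TimelineEventWriter(log_dir, "application_0001")
    ingester = TimelineIngester(log_dir)
    writer.put("JOB", "job_1", "SUBMITTED")
    ingester.poll(store_available=False)  # outage: the app's write already succeeded
    writer.put("JOB", "job_1", "FINISHED")
    print(ingester.poll())  # catches up on both events
```

The per-file byte offset is what lets the ingester crash, restart, or wait out a store hiccup without coordinating with writers; only complete lines are consumed, so a partially written tail is picked up on a later poll.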

bq. given the whole roadmap of the timeline service, let's think critically of 
work that can improve the timeline service most significantly.

Exactly. Strong +1. If we can drop the high uptime + low write latency 
requirement from the ATS service, we can avoid tons of effort. ATS doesn't need 
to be as reliable as HDFS. We don't need to worry about insulating the write 
path from the read path. We don't need to worry about occasional hiccups in 
HBase (or whatever the store is). And at the end of all this, everybody sleeps 
better because "ATS service going down" isn't a catastrophic failure.

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
> ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
> application timeline design-20140116.pdf, 
> application timeline design-20140130.pdf, application timeline design-20140210.pdf
>
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework 
> data all by itself as YARN doesn't have a common solution. This JIRA attempts 
> to solve the storage, management and serving of per-framework data from 
> various applications, both running and finished. The aim is to change YARN to 
> collect and store data in a generic manner with plugin points for frameworks 
> to do their own thing w.r.t. interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
