[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449912#comment-15449912 ]
Li Lu commented on YARN-3981:
-----------------------------

Thanks [~rohithsharma]!

bq. As part of NM daemon, start new service same as TimeLineWriterWebService. Idea is NM reports all these collector addresses to RM. Introduce new API in clientRMservice to get collector address. Address is given by RM at random (this can be decided later). This address is used by timeline client. TimeLineClient exposes new constructor with a flowName. So system properties can be written at flow level.

This actually looks quite similar to the current collector discovery mechanism, where the NM reports app-level collector information to the RM, and the RM distributes that information to all containers. The difference is that we need to decide explicitly where and when to launch the collectors. The RM can decide where to launch them, but as of now all collectors are tied to the life cycle of a concrete application (launched as aux-services). Could we launch collectors as separate processes for this use case?

One concern is that this will increase the load on the RM again. I am not sure whether this will be a problem on busy clusters with a lot of client connections. However, it is definitely better than launching a central server daemon to handle all client requests (which would fall back to the old ATS v1 architecture).

For storing the entities posted by these clients, can we put them in the entity table and just leave the unknown fields empty? Would that be a concern for the storage API's semantics?
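To make the quoted proposal concrete, here is a minimal Java sketch of the RM-side bookkeeping it implies: NMs report collector addresses, and the RM hands one out at random when a client asks. Everything here is an assumption for illustration. `CollectorRegistry`, its method names, and the sample host/port strings are hypothetical and are not existing YARN classes or APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical RM-side registry of NM-hosted collector addresses (not a YARN class). */
final class CollectorRegistry {
  // Node id -> collector address, as it might be reported by each NM.
  private final Map<String, String> collectorsByNode = new ConcurrentHashMap<>();
  private final Random random;

  CollectorRegistry(Random random) {
    this.random = random;
  }

  /** Records (or refreshes) the collector address reported by a node. */
  void reportCollector(String nodeId, String collectorAddress) {
    collectorsByNode.put(nodeId, collectorAddress);
  }

  /** Forgets a node, e.g. when it is lost or decommissioned. */
  void removeNode(String nodeId) {
    collectorsByNode.remove(nodeId);
  }

  /** Picks one known collector address at random, or empty if none are registered. */
  Optional<String> pickCollector() {
    List<String> addresses = new ArrayList<>(collectorsByNode.values());
    if (addresses.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(addresses.get(random.nextInt(addresses.size())));
  }

  public static void main(String[] args) {
    CollectorRegistry registry = new CollectorRegistry(new Random());
    // Made-up addresses standing in for what NMs would report.
    registry.reportCollector("nm-1", "nm-1.example.com:8048");
    registry.reportCollector("nm-2", "nm-2.example.com:8048");
    System.out.println(registry.pickCollector().orElse("no collector available"));
  }
}
```

Random selection is used only because the proposal mentions it as a placeholder policy. The sketch also shows where the extra RM load would come from: each client lookup becomes one more request the RM has to serve.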
> support timeline clients not associated with an application
> -----------------------------------------------------------
>
>                 Key: YARN-3981
>                 URL: https://issues.apache.org/jira/browse/YARN-3981
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Rohith Sharma K S
>              Labels: YARN-5355
>
> In the current v.2 design, all timeline writes must belong in a
> flow/application context (cluster + user + flow + flow run + application).
> But there are use cases that require writing data outside the context of an
> application. One such example is a higher level client (e.g. tez client or
> hive/oozie/cascading client) writing flow-level data that spans multiple
> applications. We need to find a way to support them.