[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495039#comment-14495039 ]
Zhijie Shen commented on YARN-3051:
-----------------------------------

bq. I believe in Timeline Service v.2 it is (cluster id, entity type, entity id) that uniquely identify an entity.

Semantically, it matters whether or not we allow users to define an entity with the same identifier <type, id> in different apps. If we allow it, then, for example, MR Job_1 can create an entity <CURRENT_USER, zjshen> and MR Job_2 can create another entity <CURRENT_USER, zjshen>. Otherwise, creating the entity <CURRENT_USER, zjshen> in different apps is an invalid use case. This is a rule we need to tell users explicitly: whether or not they can name entities this way. That said, unlike the given example, as far as I can tell the entity identifier is usually unique enough not to conflict with others. And I guess for this reason, <cluster id, entity type, entity id> is usually sufficient to identify an entity.

But I'm not sure it is semantically useful. It means that if I have Job_1 and Job_2 running on YARN_Cluster_1, and Job_3 running on YARN_Cluster_2, then I can define entities with the same identifier between Job_1/2 and Job_3, but not between Job_1 and Job_2.

bq. The remaining attributes (user id, flow name, flow run id, app id) are part of the primary key, and are required when a new entity is inserted.

This may cause an issue with the storage too. Since the PK will include <user id, flow name, flow run id, app id>, the following two example PKs are both going to be valid:
* <cluster_1, user_1, flow_1, 1.0, 12345678, *app_1*, entity_type_1, entity_id_1>
* <cluster_1, user_1, flow_1, 1.0, 12345678, *app_2*, entity_type_1, entity_id_1>

However, if we look at <cluster id, entity type, entity id> only, these two entities are duplicates. Then, whether we use <cluster id, entity type, entity id> or <entity type, entity id> to get the entity, we are likely to get more than one entity.
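
To make the duplicate-key concern concrete, here is a minimal sketch. It is not the actual ATS storage schema: the in-memory dict, the `EntityKey` tuple, and the `find_by_short_id` helper are all hypothetical, and the field names `flow_version` and `flow_run_id` are assumed labels for the `1.0` and `12345678` positions in the example PKs. It only shows that two rows with distinct full PKs both match a lookup by <cluster id, entity type, entity id>.

```python
from collections import namedtuple

# Hypothetical full primary key; field names are assumptions that mirror
# the positions in the two example PKs from the comment.
EntityKey = namedtuple(
    "EntityKey",
    "cluster_id user_id flow_name flow_version flow_run_id "
    "app_id entity_type entity_id",
)

# Two rows that differ only in app_id. Both are valid under the full PK.
store = {
    EntityKey("cluster_1", "user_1", "flow_1", "1.0", 12345678,
              "app_1", "entity_type_1", "entity_id_1"): {"source": "app_1"},
    EntityKey("cluster_1", "user_1", "flow_1", "1.0", 12345678,
              "app_2", "entity_type_1", "entity_id_1"): {"source": "app_2"},
}


def find_by_short_id(store, cluster_id, entity_type, entity_id):
    """Return every full key matching <cluster id, entity type, entity id>."""
    wanted = (cluster_id, entity_type, entity_id)
    return [
        key for key in store
        if (key.cluster_id, key.entity_type, key.entity_id) == wanted
    ]


matches = find_by_short_id(store, "cluster_1", "entity_type_1", "entity_id_1")
# The short identifier is ambiguous: it matches one row per app.
print(sorted(key.app_id for key in matches))
```

In this sketch the short identifier resolves to one row per app, so a reader API keyed only on it would have to either return a list or require the extra context (user, flow, run, app) to disambiguate.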

Another problem is that because the PK is defined with a different schema, we cannot look up the entity directly by this identifier; we have to scan through the whole table for it.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be
> implemented by multiple backing storage implementations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)