[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

Zhijie Shen (JIRA) Tue, 14 Apr 2015 15:34:13 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495039#comment-14495039
 ]


Zhijie Shen commented on YARN-3051:
-----------------------------------

bq. I believe in Timeline Service v.2 it is (cluster id, entity type, entity 
id) that uniquely identify an entity.

Semantically, it matters whether or not we allow users to define entities with 
the same identifier <type, id> in different apps. If we allow it, then, for 
example, MR Job_1 can create an entity <CURRENT_USER, zjshen> and MR Job_2 can 
create another entity <CURRENT_USER, zjshen>. Otherwise, creating the entity 
<CURRENT_USER, zjshen> in different apps is an invalid use case.

This is a rule we need to state explicitly to users: whether or not they can 
name entities this way. Unlike the given example, though, as far as I can tell 
the entity identifier is usually unique enough not to conflict with others. I 
guess that is why <cluster id, entity type, entity id> is usually sufficient to 
identify an entity. But I'm not sure it is semantically useful. Suppose I have 
Job_1 and Job_2 running on YARN_Cluster_1, and Job_3 running on YARN_Cluster_2. 
Then I can define entities with the same identifier between Job_1/2 and Job_3, 
but not between Job_1 and Job_2.
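
To make the scoping above concrete, here is a minimal sketch, assuming 
uniqueness is enforced on <cluster id, entity type, entity id> alone. The class 
ClusterScopedIds and its register method are hypothetical names used only for 
illustration; they are not part of the timeline service API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: uniqueness keyed on
// <cluster id, entity type, entity id>, ignoring the app.
class ClusterScopedIds {
    // Keys registered so far, each a <cluster id, entity type, entity id> triple.
    private final Set<List<String>> seen = new HashSet<>();

    // Returns true if the entity is new under the cluster-scoped key,
    // false if it conflicts with an existing one.
    // The app argument is deliberately not part of the key.
    public boolean register(String cluster, String app, String type, String id) {
        return seen.add(Arrays.asList(cluster, type, id));
    }

    public static void main(String[] args) {
        ClusterScopedIds store = new ClusterScopedIds();
        // Job_1 and Job_2 share YARN_Cluster_1; Job_3 runs on YARN_Cluster_2.
        System.out.println(store.register("YARN_Cluster_1", "Job_1", "CURRENT_USER", "zjshen"));
        System.out.println(store.register("YARN_Cluster_1", "Job_2", "CURRENT_USER", "zjshen"));
        System.out.println(store.register("YARN_Cluster_2", "Job_3", "CURRENT_USER", "zjshen"));
    }
}
```

Under this assumption, Job_2 is rejected while Job_3 is accepted, which is the 
asymmetry described above.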

bq. The remaining attributes (user id, flow name, flow run id, app id) are part 
of the primary key, and are required when a new entity is inserted. 

This may cause an issue with the storage too. Since the PK will include <user 
id, flow name, flow run id, app id>, the following two example PKs are both 
going to be valid:

* <cluster_1, user_1, flow_1, 1.0, 12345678, *app_1*, entity_type_1, 
entity_id_1>
* <cluster_1, user_1, flow_1, 1.0, 12345678, *app_2*, entity_type_1, 
entity_id_1>

However, if we look at <cluster id, entity type, entity id> only, these two 
entities are duplicates. Then, whether we use <cluster id, entity type, entity 
id> or <entity type, entity id> to get the entity, we are likely to get more 
than one entity. Another problem is that, because the PK is defined with a 
different schema, we cannot look up the entity directly but have to scan 
through the whole table for it.
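
The duplicate-match problem can be sketched as follows. This is only an 
illustration under assumptions: the table is modeled as an in-memory list of 
string arrays, the column order follows the two example PKs above, and 
EntityKeyCollision, exampleTable and findByShortId are hypothetical names, not 
an actual storage implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: rows are full primary keys laid out as
// cluster, user, flow name, flow version, flow run id, app id, entity type, entity id.
class EntityKeyCollision {
    static final int CLUSTER = 0, APP = 5, TYPE = 6, ID = 7;

    // The two example PKs from above, plus one unrelated row.
    static List<String[]> exampleTable() {
        return Arrays.asList(
            new String[] {"cluster_1", "user_1", "flow_1", "1.0", "12345678",
                          "app_1", "entity_type_1", "entity_id_1"},
            new String[] {"cluster_1", "user_1", "flow_1", "1.0", "12345678",
                          "app_2", "entity_type_1", "entity_id_1"},
            new String[] {"cluster_1", "user_1", "flow_1", "1.0", "12345678",
                          "app_2", "entity_type_1", "entity_id_2"});
    }

    // Looks up by <cluster id, entity type, entity id>. Every row is visited,
    // because these columns are not a leading prefix of the full key.
    static List<String[]> findByShortId(List<String[]> table,
                                        String cluster, String type, String id) {
        List<String[]> hits = new ArrayList<>();
        for (String[] row : table) {
            if (row[CLUSTER].equals(cluster)
                    && row[TYPE].equals(type)
                    && row[ID].equals(id)) {
                hits.add(row);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String[]> hits = findByShortId(
            exampleTable(), "cluster_1", "entity_type_1", "entity_id_1");
        for (String[] row : hits) {
            System.out.println("match in " + row[APP]);
        }
    }
}
```

Both the app_1 and the app_2 rows come back for the same short identifier, and 
finding them requires visiting every row.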


> [Storage abstraction] Create backing storage read interface for ATS readers
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3051
>                 URL: https://issues.apache.org/jira/browse/YARN-3051
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>         Attachments: YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
