[ 
https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429456#comment-16429456
 ] 

Haibo Chen commented on YARN-3401:
----------------------------------

The goal/scope of this Jira is to ensure that AMs cannot forge or tamper with 
data generated by YARN itself, which we represent as system entities.

Here are some thoughts to get the discussion going. An entity can be 
categorized as either

1) a system entity generated by YARN, including FLOW_ACTIVITY, FLOW_RUN, 
YARN_APPLICATION, YARN_APPLICATION_ATTEMPT and YARN_CONTAINER (and YARN_QUEUE 
and YARN_USER in the future), which can only be posted/modified by YARN

or

2) an entity generated by an AM, which can be either a SubAppEntity of any 
custom type or an application entity of any custom type.
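The two-way categorization above can be sketched as a small Java helper. This is a hypothetical illustration, not code from YARN: the class name, the Category enum and categorize() are invented for the sketch, and type names use the YARN_-prefixed spelling from the pseudo-code further down (Set.of needs Java 9+).

```java
import java.util.Set;

/**
 * Hypothetical sketch: decides whether an incoming entity is a system entity
 * (only YARN may post/modify it) or an AM-generated entity.
 */
public class EntityCategorizer {

    enum Category { SYSTEM, AM_GENERATED }

    // Types reserved for YARN. YARN_QUEUE and YARN_USER are the
    // "in the future" system types mentioned in the proposal.
    static final Set<String> SYSTEM_TYPES = Set.of(
        "YARN_FLOW_ACTIVITY", "YARN_FLOW_RUN", "YARN_APPLICATION",
        "YARN_APPLICATION_ATTEMPT", "YARN_CONTAINER",
        "YARN_QUEUE", "YARN_USER");

    /**
     * @param isSubAppEntity whether the entity was posted as a SubAppEntity
     * @param type           the entity type (must not be null; Set.of rejects null lookups)
     */
    static Category categorize(boolean isSubAppEntity, String type) {
        // A SubAppEntity always comes from an AM, whatever type it carries.
        if (isSubAppEntity) {
            return Category.AM_GENERATED;
        }
        // Application-scoped entities are system entities only if their type is reserved.
        return SYSTEM_TYPES.contains(type) ? Category.SYSTEM : Category.AM_GENERATED;
    }
}
```

For example, categorize(false, "YARN_CONTAINER") yields SYSTEM, while categorize(true, "YARN_CONTAINER") and categorize(false, "MY_CUSTOM_TYPE") (a made-up AM type) both yield AM_GENERATED.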

 

The proposal is:

a) Since YARN does not write any SubAppEntity, AMs are free to do whatever they 
like with SubAppEntities. Upon receiving a SubAppEntity, no check is done; the 
SubAppEntity is simply stored in SubAppEntityTable.

b) Of the entities in scopes other than application, FLOW_ACTIVITY and FLOW_RUN 
are populated within HBaseTimelineWriter upon application creation/finish 
events, so we shall ignore any entity of either type sent from anyone.

c) Both YARN and AMs can generate application-scoped entities. We can reserve 
YARN_APPLICATION, YARN_APPLICATION_ATTEMPT and YARN_CONTAINER. AMs are free to 
create application entities of any custom type except the reserved ones. Each 
type of application entity would be stored in HBase as follows:
||Source||Type||Destination||
|yarn|YARN_APPLICATION|ApplicationTable|
|yarn|YARN_APPLICATION_ATTEMPT|EntityTable|
|yarn|YARN_CONTAINER|EntityTable|
|am|any unreserved type|EntityTable|
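The table can be read as a lookup from entity type to destination table, plus a rule that reserved types may only come from yarn. A minimal sketch, assuming the hypothetical names DestinationTable, destinationFor() and mayWrite(); whether yarn itself may write unreserved types is not stated in the proposal, and the sketch simply allows it.

```java
import java.util.Map;

/**
 * Hypothetical helper mirroring the Source/Type/Destination table:
 * reserved types map to fixed tables, everything else goes to EntityTable.
 */
public class DestinationTable {

    // Reserved application-scoped types and the HBase table each is stored in.
    static final Map<String, String> RESERVED_DESTINATIONS = Map.of(
        "YARN_APPLICATION", "ApplicationTable",
        "YARN_APPLICATION_ATTEMPT", "EntityTable",
        "YARN_CONTAINER", "EntityTable");

    /** Destination table for an application-scoped entity type (type must not be null). */
    static String destinationFor(String type) {
        // Any unreserved (AM-defined) type defaults to EntityTable.
        return RESERVED_DESTINATIONS.getOrDefault(type, "EntityTable");
    }

    /** Reserved types may only be written by yarn; unreserved types by anyone. */
    static boolean mayWrite(String type, boolean callerIsYarn) {
        return callerIsYarn || !RESERVED_DESTINATIONS.containsKey(type);
    }
}
```

Keeping the reserved types in one map means the routing and the permission check cannot drift apart when a new reserved type (e.g. YARN_QUEUE) is added.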

*A prerequisite for this is that we can identify the source of an entity, 
that is, who sent it.* The user indicated in the REST request should be a good 
indicator of the source of an entity. (If AMs run as the yarn user, the admin 
does not care about security at all, and AMs can do all sorts of crazy things 
beyond overriding/forging system entities.)

In pseudo-code, the above proposal is:
{code:java}
HBaseTimelineWriterImpl.writeEntity(TimelineCollectorContext context,
    TimelineEntity data, UserGroupInformation callerUgi) {
  if (data instanceof SubAppEntity) {
    // store data in SubAppEntityTable
  } else if (data.type == YARN_FLOW_ACTIVITY || data.type == YARN_FLOW_RUN) {
    // ignore
  } else if (data.type == YARN_APPLICATION) {
    // verify callerUgi is yarn && store data in ApplicationTable
    // update flow run upon application creation/finish events
  } else if (data.type == YARN_CONTAINER
      || data.type == YARN_APPLICATION_ATTEMPT) {
    // verify callerUgi is yarn && store data in EntityTable
  } else {
    // this is a custom application entity
    // store data in EntityTable
  }
}{code}
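A runnable, self-contained version of the same decision logic, for illustration only. WritePathCheck, Action and decide() are hypothetical: the entity is reduced to its type plus a SubAppEntity flag, the caller to a user name (standing in for callerUgi), and the HBase writes and flow-run update are left out. The proposal does not say what happens when the yarn check fails, so REJECT is an assumption.

```java
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of the write-path checks in the pseudo-code. */
public class WritePathCheck {

    /** What the writer would do with an incoming entity. */
    enum Action {
        STORE_IN_SUB_APP_ENTITY_TABLE,
        IGNORE,
        STORE_IN_APPLICATION_TABLE,
        STORE_IN_ENTITY_TABLE,
        REJECT // assumption: outcome of a failed "callerUgi is yarn" check
    }

    static final Set<String> FLOW_TYPES = Set.of("YARN_FLOW_ACTIVITY", "YARN_FLOW_RUN");
    static final Set<String> YARN_ONLY_ENTITY_TABLE_TYPES =
        Set.of("YARN_CONTAINER", "YARN_APPLICATION_ATTEMPT");

    /**
     * @param type           the entity type (must not be null)
     * @param isSubAppEntity whether the entity was posted as a SubAppEntity
     * @param callerUser     the (assumed authenticated) user from the REST request
     * @param yarnUser       the user the YARN daemons run as
     */
    static Action decide(String type, boolean isSubAppEntity,
                         String callerUser, String yarnUser) {
        Objects.requireNonNull(type, "type");
        boolean callerIsYarn = yarnUser.equals(callerUser);

        if (isSubAppEntity) {
            // YARN never writes SubAppEntities, so no check is done.
            return Action.STORE_IN_SUB_APP_ENTITY_TABLE;
        } else if (FLOW_TYPES.contains(type)) {
            // Flow entities are populated by the writer itself; ignore them from anyone.
            return Action.IGNORE;
        } else if (type.equals("YARN_APPLICATION")) {
            // The flow-run update on creation/finish events is omitted here.
            return callerIsYarn ? Action.STORE_IN_APPLICATION_TABLE : Action.REJECT;
        } else if (YARN_ONLY_ENTITY_TABLE_TYPES.contains(type)) {
            return callerIsYarn ? Action.STORE_IN_ENTITY_TABLE : Action.REJECT;
        } else {
            // A custom application entity from an AM.
            return Action.STORE_IN_ENTITY_TABLE;
        }
    }
}
```

Note that the whole check rests on callerIsYarn: if AMs run as the yarn user, every reserved-type write passes, which matches the caveat about AMs running as yarn.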
The complication, IMO, lies in the aggregated application-level metrics 
posted by TimelineCollector. Some metrics, such as MEMORY and CPU, are rolled 
up from container metrics; in that sense, the MEMORY and CPU values posted by 
TimelineCollector are system entities. Other aggregates are rolled up from AM 
custom metrics.
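To make the distinction concrete, here is a hypothetical roll-up: per-container values are summed into one application-level value per metric id, and MEMORY and CPU aggregates are flagged as system-derived. Summing is an assumption for the sketch; it does not reproduce TimelineCollector's actual aggregation code, and AppMetricRollup, ContainerMetric, rollUp() and isSystemDerived() are invented names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical roll-up of container-level metrics into app-level aggregates. */
public class AppMetricRollup {

    // Aggregates of these metrics come from YARN's own container metrics.
    static final Set<String> SYSTEM_METRICS = Set.of("MEMORY", "CPU");

    /** One metric value reported for one container (hypothetical shape). */
    static final class ContainerMetric {
        final String containerId;
        final String metricId;
        final long value;

        ContainerMetric(String containerId, String metricId, long value) {
            this.containerId = containerId;
            this.metricId = metricId;
            this.value = value;
        }
    }

    /** Sums values per metric id across all containers. */
    static Map<String, Long> rollUp(List<ContainerMetric> metrics) {
        Map<String, Long> totals = new HashMap<>();
        for (ContainerMetric m : metrics) {
            totals.merge(m.metricId, m.value, Long::sum);
        }
        return totals;
    }

    /** True if the aggregate is derived from system (YARN-generated) metrics. */
    static boolean isSystemDerived(String metricId) {
        return SYSTEM_METRICS.contains(metricId);
    }
}
```

For two containers reporting MEMORY values of 1024 and 2048, rollUp returns 3072 for MEMORY, and isSystemDerived("MEMORY") is true, so that aggregate is only trustworthy if the process computing it runs as yarn.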

Today, TimelineCollectors run as yarn inside NMs and write application-level 
aggregated metrics to ApplicationTable as YARN_APPLICATION entities (system 
entities). It is safe for the yarn user to take app-custom metrics and write 
them as system entities, but it is not safe for AM users to write system 
entities.

But if we were to run TimelineCollector alongside the AM as the AM user (I am 
not sure how far away we are from that, but it was indicated as the ultimate 
mode in YARN-2928), the collector could no longer be trusted. One implication 
is that we could not trust the MEMORY and CPU metrics sent from 
TimelineCollector either.

I am not sure what the best strategy is to address the intrinsic conflict 
between the fact that aggregated app-level MEMORY and CPU usage should be 
generated by YARN and the possibility of TimelineCollector not running as yarn.

> [Security] users should not be able to create a generic TimelineEntity and 
> associate arbitrary type
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3401
>                 URL: https://issues.apache.org/jira/browse/YARN-3401
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Haibo Chen
>            Priority: Major
>              Labels: YARN-5355
>
> IIUC it is possible for users to create a generic TimelineEntity and set an 
> arbitrary entity type. For example, for a YARN app, the right entity API is 
> ApplicationEntity. However, today nothing stops users from instantiating a 
> base TimelineEntity class and setting the application type on it. This 
> presents a problem, for example, in handling these YARN system entities in 
> the storage layer.
> We need to ensure that the API allows only the right type of the class to be 
> created for a given entity type.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
