[ https://issues.apache.org/jira/browse/YARN-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated YARN-7399:
----------------------------
    Description: 
In Slider, metadata is stored in each user's home directory. The Slider command
line interface interacts with HDFS directly to list deployed applications and
invokes the YARN API or HDFS API to provide information to the user. This design
works for a single user managing his/her own applications. Now that the design
has been ported to YARN services, it has become apparent that it makes it
difficult for an administrator to list and manage all applications deployed on
the Hadoop cluster. The Resource Manager needs to crawl through every user's
home directory to compile metadata about deployed applications. This can put
high load on the namenode, because it takes hundreds or thousands of
list-directory calls against directories owned by different users. Hence, it
might be best to centralize the metadata storage in Solr or HBase to reduce the
number of IO calls to the namenode needed to manage applications.
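
To make the cost difference concrete, here is a minimal in-memory sketch. It
does not use the HDFS, Solr, or HBase APIs; FakeNamenode, CentralStore, and
crawl_all_apps are hypothetical names invented for illustration. It only counts
how many lookups each layout needs to list every deployed application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeNamenode:
    """In-memory stand-in for HDFS (assumption); counts list-directory calls."""
    home_dirs: Dict[str, List[str]]
    list_calls: int = 0

    def list_dir(self, user: str) -> List[str]:
        self.list_calls += 1
        return self.home_dirs.get(user, [])


def crawl_all_apps(nn: FakeNamenode) -> List[Tuple[str, str]]:
    """Per-user layout: one list call for every home directory."""
    apps = []
    for user in sorted(nn.home_dirs):
        for app in nn.list_dir(user):
            apps.append((user, app))
    return apps


@dataclass
class CentralStore:
    """Hypothetical central metadata table, standing in for Solr or HBase."""
    rows: List[Tuple[str, str]] = field(default_factory=list)
    queries: int = 0

    def register(self, user: str, app: str) -> None:
        self.rows.append((user, app))

    def list_all(self) -> List[Tuple[str, str]]:
        """Centralized layout: a single query returns every application."""
        self.queries += 1
        return sorted(self.rows)
```

In this model the crawl issues one list call per user, so its cost grows with
the number of users, while the central store answers the same question with one
query regardless of how many users exist.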

In Slider, an application is composed of metainfo, specifications in JSON, and
a zip file payload that contains application code and deployment code. Both the
meta information and the zip file payload are stored in the same application
directory in HDFS. This works well for distributed applications without a
central application manager that oversees all applications.
In the next generation of application management, we would like to move the
metainfo and JSON specifications to centralized storage managed by the YARN
user, and keep the payload zip file in the user's home directory or in a Docker
registry. This arrangement can provide faster lookup of metainfo when we list
all deployed applications and services on the YARN dashboard.

When we centralize metainfo under the YARN user, we also need to build ACLs to
enforce who can manage and update applications. The current proposal is:
yarn.admin.acl - list of groups that can submit/reconfigure/pause/kill all
applications
normal users - can submit/reconfigure/pause/kill only their own applications
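
The proposed rule can be sketched as a small check. This is an illustration
under assumptions, not the YARN implementation: can_manage and its parameters
are hypothetical, and the admin groups are passed in as a plain set rather than
parsed from the real yarn.admin.acl configuration value.

```python
from typing import Iterable


def can_manage(user: str,
               user_groups: Iterable[str],
               app_owner: str,
               admin_groups: Iterable[str]) -> bool:
    """Return True if `user` may submit/reconfigure/pause/kill the application.

    admin_groups models the groups listed in yarn.admin.acl (assumption:
    already parsed into group names by the caller).
    """
    # Members of any admin group can manage all applications.
    if set(user_groups) & set(admin_groups):
        return True
    # Normal users can manage only the applications they own.
    return user == app_owner
```

For example, with admin_groups = {"hadoop-admins"}, a user in group "dev" can
manage an application they own but not one owned by someone else, while any
member of "hadoop-admins" can manage both.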


> Yarn services metadata storage improvement
> ------------------------------------------
>
>                 Key: YARN-7399
>                 URL: https://issues.apache.org/jira/browse/YARN-7399
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn-native-services
>            Reporter: Eric Yang



