[jira] [Commented] (YARN-7215) REST API to list all deployed services by the same user

2017-09-20 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172816#comment-16172816
 ] 

Eric Yang commented on YARN-7215:
-

Slider list was not robust: it took multiple seconds to respond to a query.  
If several hundred users rely on a UI to manage their own applications, 
retrieval should respond in the low-millisecond range.  When application data 
is persisted in a metastore with index and search capability, it is easier to 
use the same storage mechanism to build the application catalog.  It is easy to 
build a view of YARN-deployed applications by computing it from metadata stored 
on HDFS and ZooKeeper, but those services are not optimized for serving a web 
application REST API.  Let's go one step further and have the design account 
for reducing the too-many-small-files problem on HDFS and oversized z-nodes on 
ZooKeeper.  This will help steer developers toward good design patterns.  I am 
open to suggestions for listing YARN applications in a way that survives a 
ResourceManager restart.

> REST API to list all deployed services by the same user
> ---
>
> Key: YARN-7215
> URL: https://issues.apache.org/jira/browse/YARN-7215
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, applications
> Reporter: Eric Yang
> Assignee: Eric Yang
>
> In Slider, it is possible to list deployed applications from the same user by 
> using:
> {code}
> slider list
> {code}
> This API can help a UI display the applications and services deployed by the 
> same user.
> ApiServer does not have the ability to list all applications/services at this 
> time.  This API requires a fast response because listing applications is a 
> common UI operation.  ApiServer-deployed applications persist their 
> configuration in HDFS, similar to Slider, but using directory listings to 
> display deployed applications might put too much overhead on the namenode.  We 
> may want to use an alternative storage mechanism to cache deployed application 
> configuration and speed up listing deployed applications.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7215) REST API to list all deployed services by the same user

2017-09-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172740#comment-16172740
 ] 

Jian He commented on YARN-7215:
---

To clarify a bit more: the new YARN UI service tab already lists the services; 
it does this by passing a yarn-service type filter to the RM.  I meant to 
implement a similar thing in the CLI to list services.  Users can already get 
the same result today by passing a "yarn-service" filter to the "yarn 
application -list" command.  I think making it more explicit with a "yarn 
service list" command would be more convenient for the user.
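
For illustration, the same type filter can be expressed as a query against the 
RM REST API.  This is a minimal sketch, assuming the Cluster Applications 
endpoint (/ws/v1/cluster/apps) and its applicationTypes and user query 
parameters; the host name, user name, and helper function are hypothetical and 
not part of any patch on this jira.

```python
from urllib.parse import urlencode


def build_service_list_url(rm_address, user, app_type="yarn-service"):
    """Build an RM query URL that filters applications by type and user.

    rm_address is a hypothetical "host:port" string for the RM web endpoint.
    """
    query = urlencode({"applicationTypes": app_type, "user": user})
    return f"http://{rm_address}/ws/v1/cluster/apps?{query}"


# Hypothetical RM address and user name, for illustration only.
print(build_service_list_url("rm.example.com:8088", "alice"))
```

A CLI or UI would issue a GET on this URL and render the returned 
applications; only the filter construction is shown here.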




[jira] [Commented] (YARN-7215) REST API to list all deployed services by the same user

2017-09-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172724#comment-16172724
 ] 

Jian He commented on YARN-7215:
---

I didn't mean to make the RM store the app configs.  In fact, the RM has no way 
to get the app configs; yarn-service is just another app from the RM's point of 
view.  The RM can only store YARN's own metadata.

Isn't this jira, by its description, meant to implement "slider list"?  That is 
as simple as getting the list of apps with some meta status info, which the 
"yarn application -list" command already does today.  I don't think we need a 
Solr backend to support such a simple use case.  That is also how "slider list" 
worked before.  Users should be able to list services without Solr in the 
picture, just as they list apps.

I guess you meant the bigger things in YARN-7129, such as indexing apps by 
config with Solr or something similar?  
If this jira is meant to implement those bigger things, I can open a separate 
jira to implement the "yarn service list" command, which is a fairly simple 
patch.
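
As a rough illustration of "the list of apps with some meta status info", the 
sketch below reduces an RM application listing to name, user, and state for 
one application type.  It assumes the JSON shape returned by the RM Cluster 
Applications REST API ({"apps": {"app": [...]}}, with "apps" null when nothing 
matches); the sample records and the helper function are made up.

```python
import json


def summarize_services(response_text, app_type="yarn-service"):
    """Return (name, user, state) tuples for applications of the given type."""
    payload = json.loads(response_text)
    # "apps" may be null when no application matches the query.
    apps = (payload.get("apps") or {}).get("app", [])
    return [
        (app["name"], app["user"], app["state"])
        for app in apps
        if app.get("applicationType") == app_type
    ]


# Made-up sample data in the assumed response shape.
SAMPLE = json.dumps({"apps": {"app": [
    {"id": "application_1505800000000_0001", "name": "sleeper",
     "user": "alice", "state": "RUNNING", "applicationType": "yarn-service"},
    {"id": "application_1505800000000_0002", "name": "wordcount",
     "user": "alice", "state": "FINISHED", "applicationType": "MAPREDUCE"},
]}})

for name, user, state in summarize_services(SAMPLE):
    print(name, user, state)
```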





[jira] [Commented] (YARN-7215) REST API to list all deployed services by the same user

2017-09-19 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172617#comment-16172617
 ] 

Eric Yang commented on YARN-7215:
-

Data stored in ZooKeeper cannot exceed 1 MB per node by default.  A large-scale 
application can exceed that limit when hostnames and config key/value pairs are 
stored in the state or spec file.  Application state may be fine, but I can't 
recommend using ZooKeeper as low-latency storage for application 
configuration.  Ambari version 0.0 (HMS) implemented a similar use case and 
quickly hit the z-node size limitation.
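
To make the size concern concrete, here is a minimal sketch that serializes a 
spec to JSON and compares it with the z-node limit.  It assumes ZooKeeper's 
default jute.maxbuffer of 0xfffff bytes (just under 1 MB); the spec layout, 
host names, and helper functions are hypothetical and do not reflect the real 
yarn-service spec format.

```python
import json

# Assumption: default ZooKeeper jute.maxbuffer, just under 1 MB.
ZNODE_LIMIT_BYTES = 0xFFFFF


def fits_in_znode(spec, limit=ZNODE_LIMIT_BYTES):
    """Return (fits, size_in_bytes) for the JSON-serialized spec."""
    size = len(json.dumps(spec).encode("utf-8"))
    return size <= limit, size


def make_spec(num_hosts, keys_per_host):
    """Build a hypothetical spec: one config key/value map per host name."""
    return {
        f"host-{i:05d}.example.com": {
            f"config.key.{k}": f"value-{k}" for k in range(keys_per_host)
        }
        for i in range(num_hosts)
    }


for hosts in (10, 5000):
    fits, size = fits_in_znode(make_spec(hosts, 20))
    print(f"{hosts} hosts: {size} bytes, fits in one z-node: {fits}")
```

With a few hosts the serialized spec is small; with thousands of hosts and a 
handful of keys each, it passes the limit, which is the failure mode described 
above.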




[jira] [Commented] (YARN-7215) REST API to list all deployed services by the same user

2017-09-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172482#comment-16172482
 ] 

Jian He commented on YARN-7215:
---

bq. How does RM handle a service that is in stopped state?
Actually, the RM already remembers stopped apps in ZooKeeper today, and it has 
its own way to look up applications.  I'm not suggesting that the RM do any 
more reads/writes.
What is the scope of this jira?  By the description, it looks like it only 
needs to support the old "slider list"; Slider was also looking this up from 
the RM, not reading from HDFS.




[jira] [Commented] (YARN-7215) REST API to list all deployed services by the same user

2017-09-19 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172434#comment-16172434
 ] 

Eric Yang commented on YARN-7215:
-

[~jianhe] How does RM handle a service that is in stopped state?  A stopped 
Slider application does not have any record in the resource manager.  The same 
Slider application can have multiple application IDs when it has been 
restarted.  Slider uses an HDFS file to persist the paused application, but 
having the resource manager crawl through lists of HDFS directories to find 
stopped services seems like a potential load attack on the namenode.  It would 
be better to have the operational records indexed and cached by a well-known 
mechanism such as a SOLR collection.  This also avoids having to brew another 
random read/write, low-latency index and cache mechanism in YARN.  Both HBase 
and SOLR have solved random read/write on top of HDFS with some success.  It 
would be better to use existing libraries that have been baked for several 
years than to invent something new for a specialized purpose.
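
The record-keeping described above can be sketched as an index keyed by user 
and service name, where one service keeps its record across restarts and 
accumulates application IDs.  This is only an in-memory stand-in to show the 
data model; it is not SOLR or HBase, and every class, field, and ID below is 
hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    user: str
    name: str
    state: str = "STOPPED"
    app_ids: list = field(default_factory=list)  # one entry per (re)start


class ServiceIndex:
    """In-memory stand-in for an indexed store of service records."""

    def __init__(self):
        self._by_user = defaultdict(dict)  # user -> {service name -> record}

    def record_start(self, user, name, app_id):
        rec = self._by_user[user].setdefault(name, ServiceRecord(user, name))
        rec.app_ids.append(app_id)
        rec.state = "RUNNING"
        return rec

    def record_stop(self, user, name):
        # The record is kept, so a stopped service stays listable.
        self._by_user[user][name].state = "STOPPED"

    def list_services(self, user):
        # A dictionary lookup; no directory crawl is involved.
        return sorted(self._by_user.get(user, {}).values(),
                      key=lambda r: r.name)


index = ServiceIndex()
index.record_start("alice", "hbase", "application_1505800000000_0001")
index.record_stop("alice", "hbase")
index.record_start("alice", "hbase", "application_1505800000000_0007")
index.record_stop("alice", "hbase")
for rec in index.list_services("alice"):
    print(rec.name, rec.state, rec.app_ids)
```

A real deployment would swap the dictionary for a persistent, indexed store so 
the listing also survives a restart of the serving process.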




[jira] [Commented] (YARN-7215) REST API to list all deployed services by the same user

2017-09-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172369#comment-16172369
 ] 

Jian He commented on YARN-7215:
---

Another approach: we can simply get the list of services from the RM with a 
type filter set to "yarn-service".  In fact, I was trying to implement that but 
ran into a bug, YARN-7076.
