
[jira] [Commented] (YARN-10556) Web-app server does not work for Timeline V2

2020-12-30 Thread Li Lu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17256674#comment-17256674
 ] 

Li Lu commented on YARN-10556:
--

It has been quite a while, and I only vaguely remember that my fix was for 
binding conflicts on YARN WebApps. We used HttpServer2 instead of YARN WebApp 
to host the web server. After all these years the codebase may have changed 
quite a lot. 

In YARN-3087 the problem was a conflict between the NM and the per-node 
timeline collector. Judging from the exception here, it looks like it comes 
from the timeline reader server? I remember that being a standalone process, 
so a conflict is less likely (as I recall, the root cause back then was a 
static variable). It may be worth looking into the reader server for more 
info. cc [~varun_saxena]

> Web-app server does not work for Timeline V2
> 
>
> Key: YARN-10556
> URL: https://issues.apache.org/jira/browse/YARN-10556
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Ahmed Hussein
>Priority: Major
>
> {{TestDistributedShell}} for timeline version 2.0 shows the following 
> exception in the log files.
> YARN-3087 previously added a fix for the same issue. 
> We need to investigate whether this is a testing issue or whether the error 
> has resurfaced. 
> {code:bash}
> org.apache.hadoop.yarn.webapp.WebAppException: 
> /v2/timeline/clusters/yarn_cluster/apps/application_1609346161655_0001: 
> controller for v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:247)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:155)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:152)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
>   at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
>   at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
>   at 
> com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
>   at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>   at 
> org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
>   at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
>   at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110)
>   at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1702)
>   at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>   at org.apache.hadoop.http.NoCac

[jira] [Commented] (YARN-7075) [YARN-3368] Improvement of Web UI

2017-08-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147892#comment-16147892
 ] 

Li Lu commented on YARN-7075:
-

Maybe it is worth the effort to make the donuts slightly thicker? If there are 
a lot of small pieces within one donut, the current thickness does not look 
sufficient. 

> [YARN-3368] Improvement of Web UI 
> --
>
> Key: YARN-7075
> URL: https://issues.apache.org/jira/browse/YARN-7075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Da Ding
>Assignee: Da Ding
> Attachments: Screen Shot 2017-08-22 at 8.36.07 PM.png, Screen Shot 
> 2017-08-29 at 4.36.45 PM.png, yarn-7075.001.patch
>
>
> 1. Adjusted donut chart size to be slimmer
> 2. Modified chart container style to have modern feel.
> 3. Other changes like background and font.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7109) Extend aggregation operation for new ATS design

2017-08-28 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144140#comment-16144140
 ] 

Li Lu commented on YARN-7109:
-

BTW [~Zian Chen], you may find some useful documentation here:
http://hadoop.apache.org/docs/r3.0.0-alpha3/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html

> Extend aggregation operation for new ATS design
> ---
>
> Key: YARN-7109
> URL: https://issues.apache.org/jira/browse/YARN-7109
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zian Chen
>Assignee: Zian Chen
>  Labels: patch
>





[jira] [Commented] (YARN-7109) Extend aggregation operation for new ATS design

2017-08-28 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144138#comment-16144138
 ] 

Li Lu commented on YARN-7109:
-

Thanks for the proposal [~Zian Chen]! I've already added you to the contributor 
list and assigned the ticket to you. Please feel free to work on it. 

> Extend aggregation operation for new ATS design
> ---
>
> Key: YARN-7109
> URL: https://issues.apache.org/jira/browse/YARN-7109
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zian Chen
>Assignee: Zian Chen
>  Labels: patch
>





[jira] [Assigned] (YARN-7109) Extend aggregation operation for new ATS design

2017-08-28 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-7109:
---

Assignee: Zian Chen

> Extend aggregation operation for new ATS design
> ---
>
> Key: YARN-7109
> URL: https://issues.apache.org/jira/browse/YARN-7109
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zian Chen
>Assignee: Zian Chen
>  Labels: patch
>





[jira] [Updated] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2017-08-25 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-6999:

Labels: newbie  (was: beginner)

> Add log about how to solve Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> --
>
> Key: YARN-6999
> URL: https://issues.apache.org/jira/browse/YARN-6999
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, security
>Affects Versions: 3.0.0-beta1
> Environment: All operating systems.
>Reporter: Linlin Zhou
>Assignee: Linlin Zhou
>Priority: Minor
>  Labels: newbie
> Fix For: 3.0.0-beta1, 2.9
>
> Attachments: yarn-6999.002.patch, yarn-6999.003.patch, yarn-6999.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Even after following Setting up a Single Node Cluster 
> [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html],
>  we still fail to run the MapReduce job example. Due to a security 
> fix, YARN uses the user's environment variables to initialize, and the user's 
> environment usually doesn't include MapReduce-related settings. So we need to 
> add the related config to etc/hadoop/mapred-site.xml manually. Currently the 
> log only reports the error
> "Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster", without any suggestion on how 
> to solve it. I want to add a useful suggestion to the log.




[jira] [Updated] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2017-08-25 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-6999:

Fix Version/s: 2.9

> Add log about how to solve Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> --
>
> Key: YARN-6999
> URL: https://issues.apache.org/jira/browse/YARN-6999
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, security
>Affects Versions: 3.0.0-beta1
> Environment: All operating systems.
>Reporter: Linlin Zhou
>Assignee: Linlin Zhou
>Priority: Minor
>  Labels: newbie
> Fix For: 3.0.0-beta1, 2.9
>
> Attachments: yarn-6999.002.patch, yarn-6999.003.patch, yarn-6999.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>




[jira] [Commented] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2017-08-24 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141133#comment-16141133
 ] 

Li Lu commented on YARN-6999:
-

Patch LGTM. The change is too trivial to need unit tests. The Findbugs warning 
appears to be unrelated. I'll wait ~24 hours before committing. 

> Add log about how to solve Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> --
>
> Key: YARN-6999
> URL: https://issues.apache.org/jira/browse/YARN-6999
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, security
>Affects Versions: 3.0.0-beta1
> Environment: All operating systems.
>Reporter: Linlin Zhou
>Assignee: Linlin Zhou
>Priority: Minor
>  Labels: beginner
> Fix For: 3.0.0-beta1
>
> Attachments: yarn-6999.002.patch, yarn-6999.003.patch, yarn-6999.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>




[jira] [Commented] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2017-08-24 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140353#comment-16140353
 ] 

Li Lu commented on YARN-6999:
-

This looks much better, thanks for the work [~littlestone00]! Could you please 
rename the patch so that it ends in .patch, so that we can rerun Jenkins? Also, 
the concerns raised by checkstyle appear to be valid; could you please fix 
those as well? The warning from findbugs appears to be unrelated, so let's 
focus on checkstyle and whitespace first. 

> Add log about how to solve Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> --
>
> Key: YARN-6999
> URL: https://issues.apache.org/jira/browse/YARN-6999
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, security
>Affects Versions: 3.0.0-beta1
> Environment: All operating systems.
>Reporter: Linlin Zhou
>Assignee: Linlin Zhou
>Priority: Minor
>  Labels: beginner
> Fix For: 3.0.0-beta1
>
> Attachments: yarn-6999.patch, yarn-6999.patch.002
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>




[jira] [Commented] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2017-08-17 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131720#comment-16131720
 ] 

Li Lu commented on YARN-6999:
-

I kicked Jenkins for a precommit build. Not sure why this was missed. 

> Add log about how to solve Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> --
>
> Key: YARN-6999
> URL: https://issues.apache.org/jira/browse/YARN-6999
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, security
>Affects Versions: 3.0.0-beta1
> Environment: All operating systems.
>Reporter: Linlin Zhou
>Assignee: Linlin Zhou
>Priority: Minor
>  Labels: beginner
> Fix For: 3.0.0-beta1
>
> Attachments: yarn-6999.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>




[jira] [Commented] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2017-08-17 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131718#comment-16131718
 ] 

Li Lu commented on YARN-6999:
-

Thanks for the work [~littlestone00], this appears to be a real usability issue 
for a lot of new Hadoop developers. Since you have already uploaded a patch, 
I'm assigning this JIRA to you. The general direction of the fix looks fine. 
Adding a log message that clearly tells users the potential root cause sounds 
quite helpful. 

One potential issue is that the fix appears to be in the node manager's code, 
but it contains logic specific to MapReduce. Maybe we can make this error 
message less hard-coded? (I'm still thinking about possible ways to improve 
this, but so far I have no simple answer...)
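
One possible direction for a less hard-coded message, sketched under 
assumptions: keep a small registry that maps known error substrings to hints, 
so that the node manager code only looks up a hint and the MapReduce-specific 
text is registered from elsewhere. The class and method names below are 
hypothetical and are not part of the patch or of the Hadoop codebase.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: table-driven hints for known container launch errors. */
public class DiagnosticHints {
    // Maps a substring of the container's error output to a user-facing hint.
    // LinkedHashMap keeps registration order, so earlier entries win.
    private final Map<String, String> hints = new LinkedHashMap<>();

    /** Registers a hint for any diagnostics text containing errorSubstring. */
    public void register(String errorSubstring, String hint) {
        hints.put(errorSubstring, hint);
    }

    /** Returns the first registered hint whose key occurs in the diagnostics. */
    public Optional<String> hintFor(String diagnostics) {
        if (diagnostics == null) {
            return Optional.empty();
        }
        for (Map.Entry<String, String> entry : hints.entrySet()) {
            if (diagnostics.contains(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DiagnosticHints registry = new DiagnosticHints();
        // The MapReduce-specific text lives in the registration, not in the lookup.
        registry.register(
            "Could not find or load main class "
                + "org.apache.hadoop.mapreduce.v2.app.MRAppMaster",
            "Check that etc/hadoop/mapred-site.xml contains the "
                + "MapReduce-related settings.");

        String stderr = "Error: Could not find or load main class "
            + "org.apache.hadoop.mapreduce.v2.app.MRAppMaster";
        System.out.println(registry.hintFor(stderr).orElse("(no hint)"));
    }
}
```

With this shape the lookup code stays framework-agnostic, and each framework 
could contribute its own entries. Whether that is worth the extra indirection 
for a single message is an open question.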

> Add log about how to solve Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> --
>
> Key: YARN-6999
> URL: https://issues.apache.org/jira/browse/YARN-6999
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, security
>Affects Versions: 3.0.0-beta1
> Environment: All operating systems.
>Reporter: Linlin Zhou
>Assignee: Linlin Zhou
>Priority: Minor
>  Labels: beginner
> Fix For: 3.0.0-beta1
>
> Attachments: yarn-6999.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>




[jira] [Assigned] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2017-08-17 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-6999:
---

Assignee: Linlin Zhou

> Add log about how to solve Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> --
>
> Key: YARN-6999
> URL: https://issues.apache.org/jira/browse/YARN-6999
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, security
>Affects Versions: 3.0.0-beta1
> Environment: All operating systems.
>Reporter: Linlin Zhou
>Assignee: Linlin Zhou
>Priority: Minor
>  Labels: beginner
> Fix For: 3.0.0-beta1
>
> Attachments: yarn-6999.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>




[jira] [Commented] (YARN-5094) some YARN container events have timestamp of -1

2017-06-01 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033890#comment-16033890
 ] 

Li Lu commented on YARN-5094:
-

Sorry about the delay. Please feel free to take it. Let's not touch 
AbstractEvent as a whole, but treat different events separately. Also, even for 
NM-related events we should be careful about the actual performance impact. I 
vaguely remember that my last conclusion (a year ago) was that it's fine to 
assign a timestamp for NM events, but let's be careful. 
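
A minimal sketch of the "treat different events separately" idea, assuming 
the base event leaves the timestamp at -1 when none is supplied (as the -1 
values in the report suggest). The classes below are hypothetical stand-ins, 
not the actual YARN event classes: only the selected event type pays for a 
timestamp, and the base class keeps its cheap default.

```java
/** Hypothetical sketch: stamp only selected event types, leave the base default alone. */
public class EventTimestampSketch {

    /** Stand-in for a generic event base class; the default timestamp is -1. */
    public static class BaseEvent {
        private final String type;
        private final long timestamp;

        public BaseEvent(String type) {
            this(type, -1L); // no real timestamp unless the caller supplies one
        }

        public BaseEvent(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }

        public String getType() {
            return type;
        }

        public long getTimestamp() {
            return timestamp;
        }
    }

    /** An NM-side event that explicitly carries a creation time. */
    public static class ContainerCreatedEvent extends BaseEvent {
        public ContainerCreatedEvent(long createdAtMillis) {
            super("YARN_CONTAINER_CREATED", createdAtMillis);
        }
    }

    public static void main(String[] args) {
        BaseEvent other = new BaseEvent("SOME_OTHER_EVENT");
        BaseEvent created = new ContainerCreatedEvent(System.currentTimeMillis());
        System.out.println(other.getType() + " -> " + other.getTimestamp());
        System.out.println(created.getType() + " -> " + created.getTimestamp());
    }
}
```

The performance concern then reduces to one clock read per stamped event, 
which would still need to be measured on a busy node manager.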

> some YARN container events have timestamp of -1
> ---
>
> Key: YARN-5094
> URL: https://issues.apache.org/jira/browse/YARN-5094
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-5094.00.patch, YARN-5094-YARN-2928.001.patch
>
>
> Some events in the YARN container entities have timestamp of -1. The 
> RM-generated container events have proper timestamps. It appears that it's 
> the NM-generated events that have -1: YARN_CONTAINER_CREATED, 
> YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, 
> YARN_NM_CONTAINER_LOCALIZATION_STARTED.
> In the YARN container page,
> {noformat}
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: -1,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: -1,
> info: { }
> }
> {noformat}
> I think the data itself is OK, but the values are not being populated in the 
> REST output?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Created] (YARN-6323) Rolling upgrade/config change is broken on timeline v2.

2017-03-10 Thread Li Lu (JIRA)

Li Lu created YARN-6323:
---

 Summary: Rolling upgrade/config change is broken on timeline v2. 
 Key: YARN-6323
 URL: https://issues.apache.org/jira/browse/YARN-6323
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu


Found this issue when deploying on real clusters. If there are apps running 
when we enable timeline v2 (with work-preserving restart enabled), node 
managers will fail to start due to missing app context data. We should probably 
assign some default names to these "left over" apps. I believe it's suboptimal 
to make users clean up the whole cluster before enabling timeline v2. 
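
The fallback could look roughly like the sketch below, assuming the missing 
app context consists of a flow name, a flow version and a flow run id. The 
class name, the field names and the default values are all hypothetical and 
chosen only for illustration.

```java
/** Hypothetical sketch: fall back to default context for apps recovered without one. */
public class LeftoverAppContextSketch {

    /** Stand-in for the per-app context that timeline v2 needs. */
    public static final class FlowContext {
        public final String flowName;
        public final String flowVersion;
        public final long flowRunId;

        public FlowContext(String flowName, String flowVersion, long flowRunId) {
            this.flowName = flowName;
            this.flowVersion = flowVersion;
            this.flowRunId = flowRunId;
        }
    }

    /**
     * Returns the recovered context if present; otherwise builds a default
     * one derived from the application id instead of failing the restart.
     */
    public static FlowContext orDefault(FlowContext recovered, String appId) {
        if (recovered != null) {
            return recovered;
        }
        // Assumed defaults: the name is derived from the app id, version "1", run id 0.
        return new FlowContext("flow_" + appId, "1", 0L);
    }

    public static void main(String[] args) {
        FlowContext ctx = orDefault(null, "application_1_0001");
        System.out.println(ctx.flowName + " / " + ctx.flowVersion + " / " + ctx.flowRunId);
    }
}
```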




[jira] [Created] (YARN-6316) Provide help information and documentation for TimelineSchemaCreator

2017-03-09 Thread Li Lu (JIRA)

Li Lu created YARN-6316:
---

 Summary: Provide help information and documentation for 
TimelineSchemaCreator
 Key: YARN-6316
 URL: https://issues.apache.org/jira/browse/YARN-6316
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu


Right now there is no help information for the timeline schema creator. We 
probably want to provide an option to print help. Also, ideally, if users pass 
in no arguments, we should print the help text instead of directly creating the 
tables. This will simplify cluster operations and timeline v2 deployments. 
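
A minimal sketch of the proposed behavior, not the actual 
TimelineSchemaCreator code: the flag names (-h, --help) and the usage text are 
assumptions made for illustration. The point is only that an empty argument 
list prints help and returns before any table creation would happen.

```java
/** Hypothetical sketch: print help on -h/--help or when no arguments are given. */
public class SchemaCreatorCliSketch {

    // Assumed usage text; the real tool's options may differ.
    static final String USAGE =
        "Usage: schema-creator [options]\n"
            + "  -h, --help   print this help and exit\n"
            + "Run with explicit options to create the timeline schema tables.";

    /** Returns true if help should be printed instead of doing any work. */
    public static boolean shouldPrintHelp(String[] args) {
        if (args == null || args.length == 0) {
            return true; // no arguments: do not create tables silently
        }
        for (String arg : args) {
            if ("-h".equals(arg) || "--help".equals(arg)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (shouldPrintHelp(args)) {
            System.out.println(USAGE);
            return;
        }
        // Placeholder for the real work; this sketch creates nothing.
        System.out.println("Would create the timeline schema tables now.");
    }
}
```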




[jira] [Updated] (YARN-6294) ATS client should better handle Socket closed case

2017-03-07 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-6294:

Attachment: YARN-6294-trunk.001.patch
YARN-6294-branch-2.001.patch

Since trunk and branch-2 diverge on TimelineClientImpl, I've created two 
patches. We probably want to focus our review effort on the trunk one; then, 
before commit, we can finalize all changes and apply them to branch-2. 

> ATS client should better handle Socket closed case
> --
>
> Key: YARN-6294
> URL: https://issues.apache.org/jira/browse/YARN-6294
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineclient
>Reporter: Sumana Sathish
>Assignee: Li Lu
> Attachments: YARN-6294-branch-2.001.patch, YARN-6294-trunk.001.patch
>
>
> Exception stack:
> {noformat}
> 17/02/06 07:11:30 INFO distributedshell.ApplicationMaster: Container 
> completed successfully., containerId=container_1486362713048_0037_01_02
> 17/02/06 07:11:30 ERROR distributedshell.ApplicationMaster: Error in 
> RMCallbackHandler: 
> com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: 
> Socket closed
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:248)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:346)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1145)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:296)
> Caused by: java.net.SocketException: Socket closed
>   at java.net.SocketInputStream.read(SocketInputStream.java:204)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
>   ... 20 more
> Exception in thread "AMRM Callback Handler Thread" 
> {noformat}




[jira] [Commented] (YARN-6293) Investigate Java 7 compatibility for new YARN UI

2017-03-06 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898221#comment-15898221
 ] 

Li Lu commented on YARN-6293:
-

Actually, I just changed the ui module's pom.xml directly: I changed the 
parent's and the current module's version to 2.x. Right now the build passes. 
UI experts, does this hide any potential issues? Thanks! 

> Investigate Java 7 compatibility for new YARN UI
> 
>
> Key: YARN-6293
> URL: https://issues.apache.org/jira/browse/YARN-6293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>
> Right now, when trying the new YARN UI with Java 7, I get the following 
> warning:
> {code}
> [INFO] --- maven-enforcer-plugin:1.4.1:enforce (dist-enforce) @ 
> hadoop-yarn-ui ---
> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed 
> with message:
> Detected JDK Version: 1.7.0-67 is not in the allowed range [1.8,).
> {code}
> While this warning does not currently cause any trouble for trunk 
> integration, the JDK requirement would block users who want to package the 
> new UI with branch-2 based code. So the question here is: is there any 
> specific component in the new UI codebase that prevents us from using Java 7? 
> As I remember, it should be a JS-based implementation, right? 




[jira] [Created] (YARN-6293) Investigate Java 7 compatibility for new YARN UI

2017-03-06 Thread Li Lu (JIRA)

Li Lu created YARN-6293:
---

 Summary: Investigate Java 7 compatibility for new YARN UI
 Key: YARN-6293
 URL: https://issues.apache.org/jira/browse/YARN-6293
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu


Right now, when trying the new YARN UI with Java 7, I get the following 
warning:
{code}
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (dist-enforce) @ hadoop-yarn-ui 
---
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed 
with message:
Detected JDK Version: 1.7.0-67 is not in the allowed range [1.8,).
{code}

While this warning does not currently cause any trouble for trunk integration, 
the JDK requirement would block users who want to package the new UI with 
branch-2 based code. So the question here is: is there any specific component 
in the new UI codebase that prevents us from using Java 7? As I remember, it 
should be a JS-based implementation, right? 
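
For readers unfamiliar with the enforcer message above: the range [1.8,) is 
Maven's notation for "1.8 or newer". A minimal Java sketch of that check is 
below. The class and method names are hypothetical, and it compares only the 
major.minor components, which is much simpler than Maven's real version-range 
handling.

```java
public final class JdkRangeSketch {
    private JdkRangeSketch() {}

    // Extracts {major, minor} from strings such as "1.7.0-67" or "1.8".
    public static int[] majorMinor(String version) {
        String[] parts = version.split("[.\\-_]");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return new int[] {major, minor};
    }

    // True when version >= minimum, i.e. version lies in "[minimum,)".
    public static boolean inOpenEndedRange(String version, String minimum) {
        int[] v = majorMinor(version);
        int[] m = majorMinor(minimum);
        if (v[0] != m[0]) {
            return v[0] > m[0];
        }
        return v[1] >= m[1];
    }

    public static void main(String[] args) {
        // The JDK from the warning, checked against the lower bound 1.8.
        System.out.println(inOpenEndedRange("1.7.0-67", "1.8"));
    }
}
```

With the JDK version from the warning, the check is false, which is why the 
enforcer rule reports a failure.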




[jira] [Commented] (YARN-6030) Eliminate timelineServiceV2 boolean flag in TimelineClientImpl

2017-02-24 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883782#comment-15883782
 ] 

Li Lu commented on YARN-6030:
-

I think so. Please feel free to check and close. Thanks! 

> Eliminate timelineServiceV2 boolean flag in TimelineClientImpl
> --
>
> Key: YARN-6030
> URL: https://issues.apache.org/jira/browse/YARN-6030
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Li Lu
>Priority: Minor
>
> I just discovered that we're still using a boolean flag {{timelineServiceV2}} 
> after we introduced {{timelineServiceVersion}}. This sounds a little bit 
> error-prone. After the discussion, I think we should only use and trust 
> {{timelineServiceVersion}}. {{timelineServiceV2}} is set upon client 
> creation. Instead of creating a v2 client and setting this flag, maybe we'd 
> like to do a sanity check to make sure the creation call is consistent with 
> the configuration? 
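
The sanity check proposed in the description could look roughly like the 
sketch below. This is only an illustration under assumptions: the class and 
method names are hypothetical and are not the actual TimelineClientImpl code. 
The idea is that "is this v2?" is always derived from the configured version 
instead of being stored in a separate boolean.

```java
public final class TimelineVersionCheckSketch {
    private TimelineVersionCheckSketch() {}

    // Single source of truth: derive "is v2" from the configured version.
    public static boolean isV2(float timelineServiceVersion) {
        return (int) timelineServiceVersion == 2;
    }

    // Rejects a creation call that disagrees with the configuration.
    public static void checkCreation(float timelineServiceVersion,
                                     boolean v2ClientRequested) {
        if (v2ClientRequested != isV2(timelineServiceVersion)) {
            throw new IllegalStateException(
                "Client creation (v2 requested: " + v2ClientRequested
                + ") is inconsistent with configured timeline service version "
                + timelineServiceVersion);
        }
    }
}
```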




[jira] [Updated] (YARN-6228) EntityGroupFSTimelineStore should allow configurable cache stores.

2017-02-23 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-6228:

Attachment: YARN-6228-trunk.002.patch

I cannot reproduce the failures locally. Trying again... 

> EntityGroupFSTimelineStore should allow configurable cache stores. 
> ---
>
> Key: YARN-6228
> URL: https://issues.apache.org/jira/browse/YARN-6228
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-6228-trunk.001.patch, YARN-6228-trunk.002.patch
>
>
> We should allow users to configure which cache store to use for 
> EntityGroupFSTimelineStore. 




[jira] [Updated] (YARN-6228) EntityGroupFSTimelineStore should allow configurable cache stores.

2017-02-23 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-6228:

Attachment: YARN-6228-trunk.001.patch

Patch to make cache stores configurable. 

> EntityGroupFSTimelineStore should allow configurable cache stores. 
> ---
>
> Key: YARN-6228
> URL: https://issues.apache.org/jira/browse/YARN-6228
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-6228-trunk.001.patch
>
>
> We should allow users to configure which cache store to use for 
> EntityGroupFSTimelineStore. 




[jira] [Created] (YARN-6228) EntityGroupFSTimelineStore should allow configurable cache stores.

2017-02-23 Thread Li Lu (JIRA)

Li Lu created YARN-6228:
---

 Summary: EntityGroupFSTimelineStore should allow configurable 
cache stores. 
 Key: YARN-6228
 URL: https://issues.apache.org/jira/browse/YARN-6228
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu


We should allow users to configure which cache store to use for 
EntityGroupFSTimelineStore. 




[jira] [Commented] (YARN-6069) CORS support in timeline v2

2017-02-21 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876765#comment-15876765
 ] 

Li Lu commented on YARN-6069:
-

Sorry to chime in this late, but I have one general question about CORS itself. 
I'm not an expert in this area, so my concern may sound silly. In ATS v1, a 
single server acts as both the reader and the writer, so I assume the CORS 
setting affects both sides? In ATS v2, we're applying this setting only to the 
reader server, not to the collectors. Is this generally fine? Are writer APIs 
irrelevant in this case? Or is this difference significant enough that we need 
separate configs, or at least a special note? Thanks! 

> CORS support in timeline v2
> ---
>
> Key: YARN-6069
> URL: https://issues.apache.org/jira/browse/YARN-6069
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Sreenath Somarajapuram
>Assignee: Rohith Sharma K S
> Attachments: YARN-6069-YARN-5355.0001.patch, 
> YARN-6069-YARN-5355.0002.patch, YARN-6069-YARN-5355.0003.patch, 
> YARN-6069-YARN-5355.0004.patch
>
>
> By default, the browser prevents accessing resources from multiple domains. 
> In most cases the UIs would be loaded from a domain different from that of 
> the timeline server. Hence, without CORS support, it would be difficult for 
> the UIs to load data from timeline v2.
> YARN-2277 must provide more info on the implementation.




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-16 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870725#comment-15870725
 ] 

Li Lu commented on YARN-6177:
-

Committing... 

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, 
> YARN-6177.04.patch, YARN-6177.05.patch, YARN-6177.06.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-16 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870541#comment-15870541
 ] 

Li Lu commented on YARN-6177:
-

LGTM. Will commit in a few hours if nobody objects. 

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, 
> YARN-6177.04.patch, YARN-6177.05.patch, YARN-6177.06.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869304#comment-15869304
 ] 

Li Lu commented on YARN-6177:
-

bq. Set yarn.timeline-service.client.best-effort to true with this patch, so 
yarn client doesn't treat such failure as a fatal error.
This is actually my concern... My feeling is that we may not want to handle 
Errors as part of best effort. Not sure about this, cc/[~jlowe]...
Hi Jason, I saw you committed the original timelineBestEffort patch, so just a 
quick inquiry to see if you think handling this Error under best effort mode is 
a good idea. Thanks! 

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, 
> YARN-6177.04.patch, YARN-6177.05.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868817#comment-15868817
 ] 

Li Lu commented on YARN-6177:
-

OK, I see. Is it possible to disable the timeline service for the affected 
clients? 

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, 
> YARN-6177.04.patch, YARN-6177.05.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Updated] (YARN-5718) TimelineClient (and other places in YARN) shouldn't over-write HDFS client retry settings which could cause unexpected behavior

2017-02-15 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5718:

Hadoop Flags: Incompatible change

> TimelineClient (and other places in YARN) shouldn't over-write HDFS client 
> retry settings which could cause unexpected behavior
> ---
>
> Key: YARN-5718
> URL: https://issues.apache.org/jira/browse/YARN-5718
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineclient
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5718.patch, YARN-5718-v2.1.patch, YARN-5718-v2.patch
>
>
> In one HA cluster, after the NN failed over, we noticed that jobs were 
> failing because TimelineClient did not retry the connection against the 
> proper NN. This is because we overwrite the HDFS client settings and 
> hard-code the retry policy to be enabled, which conflicts with the NN 
> failover case: the HDFS client should fail fast so that it can retry on 
> another NN.
> We shouldn't assume any retry policy for the HDFS client anywhere in YARN. 
> It should stay consistent with the HDFS settings, which have different retry 
> policies for different deployment cases. Thus, we should clean up these 
> hard-coded settings in YARN, including FileSystemTimelineWriter, 
> FileSystemRMStateStore and FileSystemNodeLabelsStore.




[jira] [Commented] (YARN-6027) Support fromid(offset) filter for /flows API

2017-02-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868746#comment-15868746
 ] 

Li Lu commented on YARN-6027:
-

Thanks [~rohithsharma]! Generally fine, but one nit: we're exposing a lot of 
immediate values in the parsing process of {{FlowActivityEntityReader}}. I 
understand that managing those values after the split would be troublesome, 
but I think continuing to expose them may cause issues in the future. Any 
plans for centralized management of those values? 

> Support fromid(offset) filter for /flows API
> 
>
> Key: YARN-6027
> URL: https://issues.apache.org/jira/browse/YARN-6027
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6027-YARN-5355.0001.patch, 
> YARN-6027-YARN-5355.0002.patch
>
>
> In YARN-5585, fromId is supported for retrieving entities. We need a similar 
> filter for flows/flowRun apps and flow run and flow as well. 
> Along with supporting fromId, this JIRA should also discuss the following 
> points:
> * Should we throw an exception for entities/entity retrieval if duplicates 
> are found?
> * TimelineEntity:
> ** Should the equals method also check for idPrefix?
> ** Is idPrefix part of the identifiers?




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868682#comment-15868682
 ] 

Li Lu commented on YARN-6177:
-

Thanks [~cheersyang]. My concern is with these lines:
{code}
} catch (NoClassDefFoundError e) {
  if (timelineServiceBestEffort) {
    LOG.warn("Ignore a NoClassDefFoundError when attempting to get"
        + " delegation token from the timeline server: " + e.getMessage());
    return null;
  }
{code}

So if {{timelineServiceBestEffort}} is set to true, we'll log a message and 
then proceed? I was thinking we may not need to treat 
{{timelineServiceBestEffort}} separately here, since even with best effort we 
do not need to keep running on errors. 

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, 
> YARN-6177.04.patch, YARN-6177.05.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Commented] (YARN-4675) Reorganize TimelineClient and TimelineClientImpl into separate classes for ATSv1.x and ATSv2

2017-02-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868593#comment-15868593
 ] 

Li Lu commented on YARN-4675:
-

V10 patch looks good to me. 

> Reorganize TimelineClient and TimelineClientImpl into separate classes for 
> ATSv1.x and ATSv2
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, yarn-5355-merge-blocker
> Attachments: YARN-4675.v2.002.patch, YARN-4675.v2.003.patch, 
> YARN-4675.v2.004.patch, YARN-4675.v2.005.patch, YARN-4675.v2.006.patch, 
> YARN-4675.v2.007.patch, YARN-4675.v2.008.patch, YARN-4675.v2.009.patch, 
> YARN-4675.v2.010.patch, YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl, 
> TimeClientV2Impl and, if required, a base class, so that it's clear which 
> part of the code belongs to which version and the code is easier to maintain.




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868451#comment-15868451
 ] 

Li Lu commented on YARN-6177:
-

Thanks for the hard work [~cheersyang]! Continuing to use 
{{YarnConfiguration.TIMELINE_SERVICE_CLIENT_BEST_EFFORT}} looks fine to me. 
However, I'm still a little hesitant about swallowing the error when 
{{timelineServiceBestEffort}} is set to true. To me, handling errors (as 
opposed to exceptions) is beyond the scope of our "best effort". I would like 
to understand if there's anything I'm missing that makes the community think 
it is especially appealing to do so. 

Other than this, the patch LGTM. 

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, 
> YARN-6177.04.patch, YARN-6177.05.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-13 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865034#comment-15865034
 ] 

Li Lu commented on YARN-6177:
-

[~cheersyang] Looks fine to me. Thanks! 

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-13 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865015#comment-15865015
 ] 

Li Lu commented on YARN-6177:
-

bq. Use timeline best effort flag seems a better option for me than disabling 
it, are you suggesting we should still ask users to disable it?
Even with our "best effort", I don't think we should keep the program running 
on errors... Thoughts? 

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client

2017-02-13 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864950#comment-15864950
 ] 

Li Lu commented on YARN-6177:
-

One quick inquiry: are we catching every throwable and swallowing it if 
{{timelineServiceBestEffort}} is set to true? That sounds scary, since we'd be 
swallowing OutOfMemoryError, etc... 

I think we should limit the scope of {{timelineServiceBestEffort}} to 
exceptions and preserve the program's behavior on errors. Meanwhile, we can 
improve the output message when we hit {{NoClassDefFoundError}} to hint that 
users should disable the timeline service? 
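
To make the distinction concrete, here is a minimal, self-contained Java 
sketch of what I have in mind. It is not the actual client code: 
{{BestEffortSketch}} and {{callBestEffort}} are hypothetical names, and a plain 
{{Supplier}} stands in for the real timeline call. Best effort only covers 
exceptions; a {{NoClassDefFoundError}} is rethrown with a hint, and any other 
Error (OutOfMemoryError, etc.) is not caught at all.

```java
import java.util.function.Supplier;

public final class BestEffortSketch {
    private BestEffortSketch() {}

    // Runs the call; with bestEffort set, only exceptions are swallowed.
    public static <T> T callBestEffort(Supplier<T> call, boolean bestEffort) {
        try {
            return call.get();
        } catch (NoClassDefFoundError e) {
            // An Error is never swallowed; rethrow it with a hint for the user.
            NoClassDefFoundError hinted = new NoClassDefFoundError(
                e.getMessage() + " (hint: try disabling the timeline service)");
            hinted.initCause(e);
            throw hinted;
        } catch (RuntimeException e) {
            if (bestEffort) {
                // Best effort: log the failure and carry on without a result.
                System.err.println("Ignoring timeline failure: " + e);
                return null;
            }
            throw e;
        }
    }
}
```

With this shape, the best effort flag never changes how the program behaves on 
errors; it only decides whether an exception is fatal.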

> Yarn client should exit with an informative error message if an incompatible 
> Jersey library is used at client
> -
>
> Key: YARN-6177
> URL: https://issues.apache.org/jira/browse/YARN-6177
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: spark2-job-output-after-besteffort.out, 
> spark2-job-output-after.out, spark2-job-output-before.out, 
> YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch
>
>
> Per discussion in YARN-5271, let's provide an error message suggesting that 
> users disable the timeline service, instead of disabling it for them.




[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath

2017-02-10 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861715#comment-15861715
 ] 

Li Lu commented on YARN-5271:
-

Thanks [~jojochuang]! Let's not revert the change directly, since the code base 
has changed a lot since the commit. [~cheersyang] maybe you'd like to open a 
new JIRA and fix the issue there? Thanks! 

> ATS client doesn't work with Jersey 2 on the classpath
> --
>
> Key: YARN-5271
> URL: https://issues.apache.org/jira/browse/YARN-5271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, timelineserver
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>  Labels: oct16-medium
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5271.01.patch, YARN-5271.02.patch, 
> YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch
>
>
> see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a 
> timeline client, *even if the server is an ATS1.5 server and publishing is 
> via the FS*




[jira] [Commented] (YARN-6027) Improve /flows API for more flexible filters fromid, collapse, userid

2017-02-08 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858664#comment-15858664
 ] 

Li Lu commented on YARN-6027:
-

Thanks for the patch [~rohithsharma]! One big-picture question: I'm still not 
100% sure about the meaning of "collapse". It seems the use case behind it is 
to list all flow activities for a certain user, or to group flow activities by 
user? If so, maybe we want parameters like groupby=user or groupby=userflow 
for future improvements? 
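
As a rough illustration of the "group by user" reading, the sketch below 
groups flow names per user. It assumes a simplified, hypothetical 
{{FlowActivity}} type with only a user and a flow name; it is not the reader's 
actual entity model, and groupby is only a suggested parameter name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public final class GroupBySketch {
    private GroupBySketch() {}

    // Hypothetical, simplified stand-in for a flow activity record.
    public static final class FlowActivity {
        final String user;
        final String flowName;

        public FlowActivity(String user, String flowName) {
            this.user = user;
            this.flowName = flowName;
        }
    }

    // What a groupby=user request could return: flow names keyed by user.
    public static Map<String, List<String>> groupByUser(List<FlowActivity> activities) {
        return activities.stream().collect(Collectors.groupingBy(
            a -> a.user,
            TreeMap::new,
            Collectors.mapping(a -> a.flowName, Collectors.toList())));
    }
}
```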

> Improve /flows API for more flexible filters fromid, collapse, userid
> -
>
> Key: YARN-6027
> URL: https://issues.apache.org/jira/browse/YARN-6027
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6027-YARN-5355.0001.patch
>
>
> In YARN-5585, fromId is supported for retrieving entities. We need a similar 
> filter for flows/flowRun apps and flow run and flow as well. 
> Along with supporting fromId, this JIRA should also discuss the following 
> points:
> * Should we throw an exception for entities/entity retrieval if duplicates 
> are found?
> * TimelineEntity:
> ** Should the equals method also check for idPrefix?
> ** Is idPrefix part of the identifiers?




[jira] [Commented] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS

2017-02-08 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858544#comment-15858544
 ] 

Li Lu commented on YARN-6137:
-

Thanks [~jlowe] for the review and commit! 

> Yarn client implicitly invoke ATS client which accesses HDFS
> 
>
> Key: YARN-6137
> URL: https://issues.apache.org/jira/browse/YARN-6137
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Li Lu
> Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3
>
> Attachments: YARN-6137-trunk.001.patch, YARN-6137-trunk.002.patch
>
>
> Yarn implicitly tries to invoke the ATS client even though the client does 
> not need it, and the ATS client code tries to access HDFS. Because of that, 
> the service hits a GSS exception. 
> YarnClient implicitly creates an ATS client that tries to access HDFS.
> Servers that use YarnClient cannot all be expected to change to accommodate 
> this behavior.



[jira] [Updated] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS

2017-02-07 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-6137:

Attachment: YARN-6137-trunk.002.patch

Thanks [~jlowe] for the review! A new patch to address all review comments. 

> Yarn client implicitly invoke ATS client which accesses HDFS
> 
>
> Key: YARN-6137
> URL: https://issues.apache.org/jira/browse/YARN-6137
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Li Lu
> Attachments: YARN-6137-trunk.001.patch, YARN-6137-trunk.002.patch
>
>
> Yarn implicitly tries to invoke the ATS client even though the client does 
> not need it, and the ATS client code tries to access HDFS. Because of that, 
> the service hits a GSS exception. 
> YarnClient implicitly creates an ATS client that tries to access HDFS.
> Servers that use YarnClient cannot all be expected to change to accommodate 
> this behavior.



[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath

2017-02-07 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856995#comment-15856995
 ] 

Li Lu commented on YARN-5271:
-

Thanks [~cheersyang]. 
bq.  The fix here was trying to alleviate this pain, it prints a warning on 
console and warns user timeline client could not be initialized because of 
dependency issue, more user friendly.
The goal sounds reasonable, but I don't think it justifies catching and 
swallowing an Error. What we can do is clearly document this behavior as a 
known issue and *suggest* that users *try* disabling timeline services when 
they see this error, instead of directly assuming the root cause of an error? 

> ATS client doesn't work with Jersey 2 on the classpath
> --
>
> Key: YARN-5271
> URL: https://issues.apache.org/jira/browse/YARN-5271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, timelineserver
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>  Labels: oct16-medium
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5271.01.patch, YARN-5271.02.patch, 
> YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch
>
>
> see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a 
> timeline client, *even if the server is an ATS1.5 server and publishing is 
> via the FS*




[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath

2017-02-06 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854576#comment-15854576
 ] 

Li Lu commented on YARN-5271:
-

Thanks for the work [~cheersyang]! This looks like a pretty unfortunate case 
for users of the YARN client. I noticed we're not creating timeline clients 
if the timeline service is turned off in the config. One question: can we fail 
fast and let the user disable the timeline service? Raising errors as early as 
possible may avoid a lot of trouble in the future. 
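
A minimal sketch of the fail-fast idea, under stated assumptions: FailFastClient and its Runnable factory are hypothetical stand-ins (not the real YarnClientImpl), and a plain Map stands in for the Hadoop Configuration. Only the key yarn.timeline-service.enabled is the actual setting name.

```java
import java.util.Map;

// Hypothetical stand-in for a YARN-style client; not the real YarnClientImpl.
class FailFastClient {
  static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";

  private final boolean timelineEnabled;

  // timelineClientFactory is an assumed hook that creates the timeline client.
  FailFastClient(Map<String, String> conf, Runnable timelineClientFactory) {
    timelineEnabled =
        Boolean.parseBoolean(conf.getOrDefault(TIMELINE_ENABLED, "false"));
    if (timelineEnabled) {
      // Deliberately no try/catch: a linkage problem such as
      // NoClassDefFoundError surfaces here, at construction time, and the
      // user can react by turning the timeline service off in the config.
      timelineClientFactory.run();
    }
  }

  boolean isTimelineEnabled() {
    return timelineEnabled;
  }
}
```

With the service disabled the factory is never invoked, so a broken classpath does not affect users who do not need the timeline client.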

> ATS client doesn't work with Jersey 2 on the classpath
> --
>
> Key: YARN-5271
> URL: https://issues.apache.org/jira/browse/YARN-5271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, timelineserver
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>  Labels: oct16-medium
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5271.01.patch, YARN-5271.02.patch, 
> YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch
>
>
> see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a 
> timeline client, *even if the server is an ATS1.5 server and publishing is 
> via the FS*




[jira] [Updated] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS

2017-02-03 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-6137:

Attachment: YARN-6137-trunk.001.patch

First patch to fix this issue. Note that we always start a timeline client when 
we start a YarnClientImpl if the timeline service is enabled. In ATS v1.5, the 
timeline client checks HDFS access upon service start, which requires the 
yarn client user to be authenticated by the time it starts. In fact, users only 
need this client to renew timeline tokens in a secured environment, and it's 
totally fine to first start the client user process, authenticate it, and 
then renew the delegation token.

So in this patch I'm delaying the start of the timeline client until the first 
time the user needs a delegation token. For secured environments, this allows the 
parent process (running this client) to finish authentication after service 
start, then use the timeline client to renew tokens. One thing I'm not sure 
about is whether the yarn client itself should be thread safe. If so, I can 
add some synchronization for the timeline client initialization. 
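
To illustrate the delayed start with synchronization, here is a rough sketch. LazyTimelineClient and its Supplier-based factory are hypothetical names chosen for illustration and are not taken from the patch.

```java
import java.util.function.Supplier;

// Hypothetical holder that defers client creation until first use.
class LazyTimelineClient<T> {
  private final Supplier<T> factory; // assumed to create and start the client
  private T client;                  // guarded by "this"

  LazyTimelineClient(Supplier<T> factory) {
    this.factory = factory;
  }

  // Creates the client on the first call; later calls reuse the same instance.
  // Synchronization ensures the factory runs at most once across threads.
  synchronized T get() {
    if (client == null) {
      client = factory.get();
    }
    return client;
  }

  synchronized boolean isStarted() {
    return client != null;
  }
}
```

In this shape the token-renewal path would call get(), so authentication only has to be complete by the first renewal rather than at service start.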

Another change I made is to remove one unit test that checks whether YarnClient 
catches an Error and fails when the Error is not caught. To me this 
does not appear to be reasonable behavior. Since it blocks testing, I'm 
removing it. 

> Yarn client implicitly invoke ATS client which accesses HDFS
> 
>
> Key: YARN-6137
> URL: https://issues.apache.org/jira/browse/YARN-6137
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Li Lu
> Attachments: YARN-6137-trunk.001.patch
>
>
> YARN implicitly invokes the ATS client even though the client does not need 
> it, and the ATS client code tries to access HDFS. As a result, the service 
> hits a GSS exception. 
> YarnClient implicitly creates an ATS client that tries to access HDFS.
> Servers that use YarnClient cannot all be expected to change to accommodate 
> this behavior.




[jira] [Reopened] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath

2017-02-02 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reopened YARN-5271:
-

> ATS client doesn't work with Jersey 2 on the classpath
> --
>
> Key: YARN-5271
> URL: https://issues.apache.org/jira/browse/YARN-5271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, timelineserver
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>  Labels: oct16-medium
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5271.01.patch, YARN-5271.02.patch, 
> YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch
>
>
> see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a 
> timeline client, *even if the server is an ATS1.5 server and publishing is 
> via the FS*




[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath

2017-02-02 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850784#comment-15850784
 ] 

Li Lu commented on YARN-5271:
-

Quick note: are we catching an Error here and disabling the timeline service based 
on this? Catching Errors seems inappropriate as per the Java API doc:
bq. An Error is a subclass of Throwable that indicates serious problems that a 
reasonable application should not try to catch. Most such errors are abnormal 
conditions. 
(https://docs.oracle.com/javase/7/docs/api/java/lang/Error.html)

Reopening this JIRA for more investigation. 
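
For context, a small illustrative sketch (ErrorVsException is a made-up class, not code from the patch): an Error is not an Exception, so an ordinary catch (Exception e) never intercepts it. Swallowing one therefore requires a deliberate catch of Error or Throwable, which is what the API doc advises against.

```java
// Illustrative only: shows which throwables a plain Exception handler sees.
class ErrorVsException {
  // Returns true if a "catch (Exception e)" block would intercept t.
  static boolean caughtByExceptionHandler(Throwable t) {
    try {
      throw t;
    } catch (Exception e) {
      return true;   // checked and unchecked exceptions land here
    } catch (Throwable other) {
      return false;  // Errors such as NoClassDefFoundError land here
    }
  }
}
```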

> ATS client doesn't work with Jersey 2 on the classpath
> --
>
> Key: YARN-5271
> URL: https://issues.apache.org/jira/browse/YARN-5271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, timelineserver
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>  Labels: oct16-medium
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5271.01.patch, YARN-5271.02.patch, 
> YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch
>
>
> see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a 
> timeline client, *even if the server is an ATS1.5 server and publishing is 
> via the FS*




[jira] [Commented] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS

2017-02-01 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849020#comment-15849020
 ] 

Li Lu commented on YARN-6137:
-

This appears to be an ATS v1.5 only issue, but a bigger question is, why do we 
need a timeline client within YarnClientImpl? To me the only thing needed is to 
renew delegation token. If this reference is inevitable, can we avoid creating 
the client at service start of yarn client impl? We can lazily create the 
client only when we need to renew the token? 

> Yarn client implicitly invoke ATS client which accesses HDFS
> 
>
> Key: YARN-6137
> URL: https://issues.apache.org/jira/browse/YARN-6137
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Li Lu
>
> YARN implicitly invokes the ATS client even though the client does not need 
> it, and the ATS client code tries to access HDFS. As a result, the service 
> hits a GSS exception. 
> YarnClient implicitly creates an ATS client that tries to access HDFS.
> Servers that use YarnClient cannot all be expected to change to accommodate 
> this behavior.




[jira] [Assigned] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS

2017-02-01 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-6137:
---

Assignee: Li Lu

> Yarn client implicitly invoke ATS client which accesses HDFS
> 
>
> Key: YARN-6137
> URL: https://issues.apache.org/jira/browse/YARN-6137
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Li Lu
>
> YARN implicitly invokes the ATS client even though the client does not need 
> it, and the ATS client code tries to access HDFS. As a result, the service 
> hits a GSS exception. 
> YarnClient implicitly creates an ATS client that tries to access HDFS.
> Servers that use YarnClient cannot all be expected to change to accommodate 
> this behavior.




[jira] [Updated] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container

2017-01-24 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2355:

Hadoop Flags: Incompatible change

> MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
> --
>
> Key: YARN-2355
> URL: https://issues.apache.org/jira/browse/YARN-2355
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Darrell Taylor
>  Labels: newbie
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-2355.001.patch
>
>
> After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether 
> it has the chance to try based on MAX_APP_ATTEMPTS_ENV alone. We should be 
> able to notify the application of the up-to-date remaining retry quota.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2017-01-19 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830892#comment-15830892
 ] 

Li Lu commented on YARN-4675:
-

Thanks [~Naganarasimha]! I took a look at the 003 patch. Not sure why but I 
found some duplicated code in TimelineClientImpl with the newly introduced 
helper and v2 impl. Detailed comments:

AMRMClient (and AMRMClientAsync):
  - Maybe we'd like to be more clear about registerTimelineV2Client? This is 
something new and quite different to v1 clients. 
  - Do we allow registering timeline clients when the AMRMClient's timeline 
version is not set to 2? Maybe we should at least leave a warning/error there? 
This can save some time when debugging a misconfigured cluster. 
  
DistributedShell, ApplicationMaster.java:
  DS is the app we use to demo a lot of YARN features, so maybe we want 
to tidy up the timeline client related code a little bit...
  - Can we unify all {{if}}s for publishing timeline event? We may want to have 
centralized methods to dispatch timeline client calls to v1 and v2.
  - Also, instead of checking if a timeline client is null, shall we use flags 
like timelineServiceV2? 
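
A rough sketch of what a centralized dispatch method with an explicit version flag could look like. TimelinePublisher, V1Client, V2Client and their methods are hypothetical names for illustration only; they are not the real TimelineClient or TimelineV2Client APIs.

```java
// Hypothetical publisher: one entry point routes events to the v1 or v2 client.
class TimelinePublisher {
  // Stand-ins for the real timeline clients (assumed, simplified interfaces).
  interface V1Client { void publishV1(String event); }
  interface V2Client { void publishV2(String event); }

  private final boolean timelineServiceV2; // explicit flag instead of null checks
  private final V1Client v1;
  private final V2Client v2;

  TimelinePublisher(boolean timelineServiceV2, V1Client v1, V2Client v2) {
    this.timelineServiceV2 = timelineServiceV2;
    this.v1 = v1;
    this.v2 = v2;
  }

  // Callers never branch on the timeline version themselves.
  void publish(String event) {
    if (timelineServiceV2) {
      v2.publishV2(event);
    } else {
      v1.publishV1(event);
    }
  }
}
```

Keeping the branch in one method means each publishing site reduces to a single publish(...) call, whichever version the cluster is configured with.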
  
TimelineClient:
  - Let's refer to TimelineV2Client in the Java doc for v2 use cases? 
  {code}
  Creates an instance of the timeline v.1.x client.
  {code}
  - We may also want to update the class's javadoc to reflect API changes over 
Hadoop 2. At least mention this class is for timeline v1.x ONLY. 
  
TimelineV2Client:
  - Same javadoc issue. 
  - Shall we close the constructor to protected since we've experienced some 
unexpected calls to it in v1? Or at least add a testing only tag? 
  - Do client users need to know the context app id? If so, we may need to 
slightly relax the visibility of getContextAppId? 
  - Why do we need a setter for context app id? Maybe we want to make this 
information immutable for timeline clients? Do we allow reusing timeline v2 
clients across multiple applications? 
  
TimelineClientImpl:
  - Why do we need RESOURCE_URI_STR_V2? We need to further polish 
constructResURI as well. 
  - serviceRetryInterval is never used. 
  - Duplicated code for TimelineClientConnectionRetry and JerseyRetryFilter as 
in TimelineServiceHelper. 
  - pollTimelineServiceAddress and initConnConfigurator are never called? They 
duplicate code in V2. 
  - new ConnectionConfigurator in initSslConnConfigurator duplicates some code 
in TimelineServiceHelper. 
  
TimelineServiceHelper:
  - There are two {{TimelineServiceHelper}}s in our codebase? One is really 
trivial. Shall we merge them or eliminate one of them? 
  
TimelineV2ClientImpl:
  - connectionRetry is never used. 

Not necessarily to be addressed in this JIRA, but worth bringing to attention: we have a 
TimelineClient in YarnClientImpl. Shall we do this even though the cluster is 
configured with ATS v2? 

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, yarn-5355-merge-blocker
> Attachments: YARN-4675.v2.002.patch, YARN-4675.v2.003.patch, 
> YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl, TimeClientV2Impl, 
> and, if required, a base class, so that it's clear which part of the code 
> belongs to which version and the code is easier to maintain.




[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

2017-01-10 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815912#comment-15815912
 ] 

Li Lu commented on YARN-6054:
-

Thanks [~raviprak]. The committed patch LGTM. Once the old file is backed up we 
don't need to worry about whether the repair process would make things worse. 
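
A minimal sketch of the back-up-then-repair flow discussed on this JIRA, under stated assumptions: BackupThenRepair and the StoreOp callbacks are hypothetical, and the open and repair steps are passed in rather than calling any real LevelDB API. It is not the committed patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: back up the store directory before a single repair attempt.
class BackupThenRepair {
  // Assumed callback type for "open the store" and "repair the store".
  interface StoreOp { void run(Path dbDir) throws IOException; }

  // Copies the directory tree rooted at src to dst (dst must not exist yet).
  static void copyTree(Path src, Path dst) throws IOException {
    try (Stream<Path> paths = Files.walk(src)) {
      for (Path p : (Iterable<Path>) paths::iterator) {
        Path target = dst.resolve(src.relativize(p).toString());
        if (Files.isDirectory(p)) {
          Files.createDirectories(target);
        } else {
          Files.copy(p, target);
        }
      }
    }
  }

  // Tries to open; on failure backs up, repairs once, and retries the open.
  static void openWithRepair(Path dbDir, Path backupDir, StoreOp open, StoreOp repair)
      throws IOException {
    try {
      open.run(dbDir);
    } catch (IOException firstFailure) {
      copyTree(dbDir, backupDir); // preserve the pre-repair state
      repair.run(dbDir);
      open.run(dbDir);            // a second failure propagates to the caller
    }
  }
}
```

Because the copy happens before the repair runs, a repair that makes things worse still leaves the original files available for other recovery approaches.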

> TimelineServer fails to start when some LevelDb state files are missing.
> 
>
> Key: YARN-6054
> URL: https://issues.apache.org/jira/browse/YARN-6054
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-6054.01.patch, YARN-6054.02.patch, 
> YARN-6054.03.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.




[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

2017-01-10 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815599#comment-15815599
 ] 

Li Lu commented on YARN-6054:
-

Oops sorry [~Naganarasimha] I was trying to take a closer look at the updated 
patch, but never mind... Also, is the UT failure tracked somewhere else? 

> TimelineServer fails to start when some LevelDb state files are missing.
> 
>
> Key: YARN-6054
> URL: https://issues.apache.org/jira/browse/YARN-6054
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-6054.01.patch, YARN-6054.02.patch, 
> YARN-6054.03.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.




[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

2017-01-05 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803022#comment-15803022
 ] 

Li Lu commented on YARN-6054:
-

Thanks [~raviprak], failing the second attempt sounds like the right choice. I'm not 
very familiar with the repair method for leveldb jni, but would just like to 
verify that even if a repair fails, the data corruption will not end up in a 
worse form. We would like to avoid the case where the data was recoverable 
by some approach (other than repair) but becomes unrecoverable after a 
repair. Is this possible? Thanks! 

> TimelineServer fails to start when some LevelDb state files are missing.
> 
>
> Key: YARN-6054
> URL: https://issues.apache.org/jira/browse/YARN-6054
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: YARN-6054.01.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.




[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

2017-01-05 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802926#comment-15802926
 ] 

Li Lu commented on YARN-6054:
-

Thanks [~raviprak] for the patch! One quick concern is what will happen if the 
repair fails. IIUC we're repairing every time there is an IOException. Will this cause 
any false alarms and/or accidentally make things worse? Thanks! 

> TimelineServer fails to start when some LevelDb state files are missing.
> 
>
> Key: YARN-6054
> URL: https://issues.apache.org/jira/browse/YARN-6054
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: YARN-6054.01.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782102#comment-15782102
 ] 

Li Lu commented on YARN-6029:
-

Thanks [~wangda]! 
bq. But it could cause inconsistency read data, for example, queue acl could be 
updated while it being updated.
Makes sense to me. Let's keep and fix the synchronized blocks then... 

> CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by 
> Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to 
> release a reserved container
> --
>
> Key: YARN-6029
> URL: https://issues.apache.org/jira/browse/YARN-6029
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Blocker
> Attachments: YARN-6029.001.patch, deadlock.jstack
>
>
> When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls 
> YarnClient#getQueueAclsInfo) just at the moment that 
> LeafQueue#assignContainers is called and before the parent queue is notified 
> to release a resource (a reserved container should be released), the 
> ResourceManager can deadlock. I found this problem in our testing environment 
> for Hadoop 2.8.
> Steps to reproduce the deadlock, in chronological order:
> * 1. Thread A (ResourceManager Event Processor) calls synchronized 
> LeafQueue#assignContainers (acquiring the LeafQueue instance lock of queue 
> root.a).
> * 2. Thread B (IPC Server handler) calls synchronized 
> ParentQueue#getQueueUserAclInfo (acquiring the ParentQueue instance lock of 
> queue root), iterates over the child queues' ACLs, and blocks when calling 
> synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of 
> queue root.a is held by Thread A).
> * 3. Thread A wants to inform the parent queue that a container is being 
> completed and blocks when invoking the synchronized 
> ParentQueue#internalReleaseResource method (the ParentQueue instance lock of 
> queue root is held by Thread B).
> I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be 
> removed to solve this problem, since this method does not appear to modify 
> fields of the LeafQueue instance.
> Attached a patch with a unit test for review.
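
The lock-order inversion in the three steps above can be shown in miniature. 
The following is only a sketch under stated assumptions, not the real 
CapacityScheduler code: Parent, Leaf, and the latches are hypothetical 
stand-ins, and the leaf's ACL getter is left unsynchronized as the description 
proposes. Thread A holds the leaf lock while Thread B walks the hierarchy from 
the parent; because B no longer needs the leaf lock, both threads finish. 

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-ins for ParentQueue and LeafQueue; not the real scheduler classes. */
final class LockOrderSketch {

  static final class Parent {
    private final Leaf leaf = new Leaf(this);
    private int released;

    Leaf leaf() { return leaf; }

    // Takes the parent lock, then reads from the child queue.
    synchronized String getQueueUserAclInfo() {
      return "root," + leaf.getQueueUserAclInfo();
    }

    synchronized void internalReleaseResource() { released++; }

    synchronized int released() { return released; }
  }

  static final class Leaf {
    private final Parent parent;
    private final String acl = "root.a"; // final field, readable without a lock

    Leaf(Parent parent) { this.parent = parent; }

    // Deliberately NOT synchronized. If it were, Thread B would block here
    // while holding the parent lock, and Thread A would later block on the
    // parent lock while holding the leaf lock.
    String getQueueUserAclInfo() { return acl; }

    // Takes the leaf lock first, then the parent lock (child-to-parent order).
    synchronized void assignContainers(CountDownLatch leafLocked, CountDownLatch readerDone)
        throws InterruptedException {
      leafLocked.countDown();                // the leaf lock is now held
      readerDone.await(5, TimeUnit.SECONDS); // let Thread B run while we hold it
      parent.internalReleaseResource();      // needs the parent lock
    }
  }

  /** Returns true if both threads finished, i.e. no deadlock occurred. */
  static boolean runScenario() {
    Parent root = new Parent();
    CountDownLatch leafLocked = new CountDownLatch(1);
    CountDownLatch readerDone = new CountDownLatch(1);
    Thread a = new Thread(() -> {
      try {
        root.leaf().assignContainers(leafLocked, readerDone);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "Thread-A");
    Thread b = new Thread(() -> {
      try {
        leafLocked.await();
        root.getQueueUserAclInfo(); // parent lock, then an unlocked leaf read
        readerDone.countDown();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "Thread-B");
    a.setDaemon(true);
    b.setDaemon(true);
    a.start();
    b.start();
    try {
      a.join(10_000);
      b.join(10_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !a.isAlive() && !b.isAlive() && root.released() == 1;
  }

  public static void main(String[] args) {
    System.out.println("finished without deadlock: " + runScenario());
  }
}
```

The sketch only addresses lock ordering. Whether the unlocked read is safe 
with respect to visibility is a separate question, which the sketch sidesteps 
by reading a final field. 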




[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved

2016-12-27 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782007#comment-15782007
 ] 

Li Lu commented on YARN-6029:
-

I'm not a scheduler expert, but "not affecting any data structure" sounds like 
the wrong reason not to synchronize. [~wangda], will there be any potential 
data races according to the Java memory model[1]? If not, we can safely remove 
those synchronized keywords. Otherwise we have to keep them, no matter how 
appealing the removal appears to be. 

[1]:  http://www.cs.umd.edu/~pugh/java/memoryModel/
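
To make the data-race concern concrete: a reader can skip the lock without a 
race if the state it reads is published safely. The sketch below shows one 
common pattern, a volatile reference to an immutable snapshot. It is an 
illustration only; AclSnapshotSketch and its methods are hypothetical names 
and are not taken from LeafQueue. 

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example of lock-free reads over a safely published snapshot. */
final class AclSnapshotSketch {

  // A volatile write happens-before every later volatile read of the same
  // field, so a reader that sees the new reference also sees its contents.
  private volatile Map<String, String> acls = Collections.emptyMap();

  // Writers still serialize among themselves; each builds a fresh copy and
  // publishes it with a single volatile write.
  synchronized void setAcl(String user, String access) {
    Map<String, String> copy = new HashMap<>(acls);
    copy.put(user, access);
    acls = Collections.unmodifiableMap(copy);
  }

  // No lock needed: the returned map is never mutated after publication.
  Map<String, String> getAcls() {
    return acls;
  }
}
```

A getter written this way takes no monitor, so it cannot take part in a 
lock-order cycle, at the price of copying the map on every update. 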





[jira] [Created] (YARN-6030) Eliminate timelineServiceV2 boolean flag in TimelineClientImpl

2016-12-27 Thread Li Lu (JIRA)

Li Lu created YARN-6030:
---

 Summary: Eliminate timelineServiceV2 boolean flag in 
TimelineClientImpl
 Key: YARN-6030
 URL: https://issues.apache.org/jira/browse/YARN-6030
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-5355
Reporter: Li Lu
Priority: Minor


I just discovered that we're still using a boolean flag {{timelineServiceV2}} 
after we introduced {{timelineServiceVersion}}. This sounds a little bit 
error-prone. After the discussion I think we should only use and trust 
{{timelineServiceVersion}}. {{timelineServiceV2}} is set upon client creation. 
Instead of creating a v2 client and setting this flag, maybe we'd like to do a 
sanity check to make sure the creation call is consistent with the 
configuration? 
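
A minimal sketch of such a sanity check follows. It is not the 
TimelineClientImpl code: the class and method names are hypothetical, and a 
plain Map stands in for the Hadoop Configuration. The property name 
yarn.timeline-service.version is the existing YARN setting; defaulting to 1.0 
when it is unset is an assumption of the sketch. 

```java
import java.util.Map;

/** Hypothetical check that a client creation call matches the configured version. */
final class TimelineVersionCheckSketch {

  static final String VERSION_KEY = "yarn.timeline-service.version";

  // Reads the configured version; assumes 1.0 when the key is absent.
  static float configuredVersion(Map<String, String> conf) {
    return Float.parseFloat(conf.getOrDefault(VERSION_KEY, "1.0"));
  }

  /**
   * Returns the major version to run as, or throws if the caller asked for a
   * major version that the configuration does not enable.
   */
  static int checkRequestedVersion(Map<String, String> conf, int requestedMajor) {
    int configuredMajor = (int) configuredVersion(conf); // 1.5 counts as major version 1
    if (configuredMajor != requestedMajor) {
      throw new IllegalStateException("Client for timeline v" + requestedMajor
          + " requested, but " + VERSION_KEY + " is " + configuredVersion(conf));
    }
    return configuredMajor;
  }
}
```

With a check like this there is a single source of truth (the configured 
version), and a mismatched creation call fails fast instead of silently 
setting a second flag. 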




[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters

2016-12-22 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771143#comment-15771143
 ] 

Li Lu commented on YARN-5585:
-

I don't have a strong opinion on fromIdPrefix and fromId. Both ways make sense 
to me. 

bq. This JIRA focuses only on pagination for general entities, but we should 
also implement pagination for the other REST APIs. 

Yes, we can open another JIRA for pagination in the other APIs. Let's finish 
pagination for the entity table here. 

> [Atsv2] Reader side changes for entity prefix and support for pagination via 
> additional filters
> ---
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: yarn-5355-merge-blocker
> Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, 
> YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, 
> YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve the 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 entities 
> then a REST call gives the first/last 100 entities. How do we retrieve the 
> next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: applications are stored in the database as app-1 app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the 
> next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS is targeting storage of a large number of entities, getting the 
> next set of entities using fromId, rather than querying all the entities, is 
> a very common use case. This is very useful for pagination in the web UI.
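
The proposed semantics (return up to limit entries that come after fromId) can 
be sketched as below. This is an in-memory illustration only: the class and 
method names are hypothetical, and a real reader would seek to a row key in 
the backing store rather than search a list. 

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory illustration of limit + fromId pagination. */
final class FromIdPaginationSketch {

  /**
   * Returns at most limit ids that follow fromId in orderedIds.
   * A null fromId means "start from the beginning".
   */
  static List<String> page(List<String> orderedIds, String fromId, int limit) {
    if (limit < 0) {
      throw new IllegalArgumentException("limit must be non-negative");
    }
    int start = 0;
    if (fromId != null) {
      int idx = orderedIds.indexOf(fromId);
      if (idx < 0) {
        throw new IllegalArgumentException("unknown fromId: " + fromId);
      }
      start = idx + 1; // exclusive: the page begins after fromId
    }
    int end = Math.min(orderedIds.size(), start + limit);
    return new ArrayList<>(orderedIds.subList(start, end));
  }
}
```

A client pages through results by passing the last id of the previous page as 
the next fromId, which matches the app-5 to app-6 example in the description. 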




[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters

2016-12-20 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765533#comment-15765533
 ] 

Li Lu commented on YARN-5585:
-

Some of my comments:

TimelineUIDConverter
  - Consistency with comments: let's make the numbers in the comments 
consistent. 
  - Shall we avoid using those constants? We can define an enum to represent 
each part of the tuple list.

EntityRowKeyPrefix
  - I'm confused by the changes in EntityRowKeyPrefix(String clusterId, String 
userId, String flowName, Long flowRunId, String appId, String entityType, Long 
entityIdPrefix, String entityId). Why are we changing this method instead of 
adding a new overload? Some changes to existing call sites seem unrelated to 
the changes here. 
  - Inconsistent javadocs. We need to be very clear about which prefix we are 
generating, especially for the final qualifier.

TestRowKeys
  - Though there is no specific rule, let's not put specific author names in 
the test data. 





[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters

2016-12-20 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765001#comment-15765001
 ] 

Li Lu commented on YARN-5585:
-

I'm fine with only supporting inputs with idPrefix for fromId. As long as 
users can *query* entities/entity without a prefix, it sounds fine to me. 





[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters

2016-12-19 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762691#comment-15762691
 ] 

Li Lu commented on YARN-5585:
-

Thanks [~rohithsharma] for the update! With regard to the APIs, I think we can 
pretty much reuse the current set of APIs. IMO, we should not force a prefix 
all the time. Of course, if the user knows the exact entity prefix it's 
certainly beneficial to include it in the query (so that we can save a range 
scan and just use a get). When referring to timeline entity ids, how about the 
following patterns:
1. <string 1>!<string 2>: string 1 is the prefix and string 2 is the id
2. <string> or *!<string>: string is the entity id and the storage needs to 
query the entity prefix. If we have problems distinguishing <string> from the 
above case maybe we can use *!<string>

bq. If we plan to reuse same API's, then we need to handle one scenario where 
same entityId is published with 2 entityIdPrefix.
This sounds like a really messy situation. Semantically, we've got two ways to 
decide this: 1) we explicitly claim that the entity prefix is part of the id 
system. This means two entities are different even if they differ only in 
their entity prefixes; and 2) we claim that the entity prefix is _not_ part of 
the id system. Under this assumption, it is up to the storage system to decide 
how to deal with the case in which prefixes are updated. Therefore the 
behavior when one entity is associated with two prefixes is, at the API level, 
undefined. As [~varun_saxena] suggested, the storage may throw exceptions or 
return errors when multiple prefixes are found for the same entity. 
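
One possible reading of the two id patterns proposed in this comment (an 
explicit prefix written as prefix!id, and a bare id or *!id when the prefix is 
unknown) is sketched below. The class is hypothetical and not part of the 
reader API; treating the prefix as a long follows the Long entityIdPrefix 
parameter mentioned elsewhere in this thread. 

```java
/** Hypothetical parser for "prefix!id", "*!id" and bare "id" entity references. */
final class EntityRefSketch {

  final Long prefix; // null means the storage has to look the prefix up
  final String id;

  private EntityRefSketch(Long prefix, String id) {
    if (id.isEmpty()) {
      throw new IllegalArgumentException("entity id must not be empty");
    }
    this.prefix = prefix;
    this.id = id;
  }

  static EntityRefSketch parse(String ref) {
    int bang = ref.indexOf('!');
    if (bang < 0) {
      return new EntityRefSketch(null, ref); // bare id, prefix unknown
    }
    String left = ref.substring(0, bang);
    String id = ref.substring(bang + 1);
    if (left.equals("*")) {
      return new EntityRefSketch(null, id); // explicit "prefix unknown"
    }
    // Throws NumberFormatException if the prefix is not a number.
    return new EntityRefSketch(Long.parseLong(left), id);
  }
}
```

The *! form matters for ids that themselves contain '!': without it, a bare id 
such as a!b would be misread as prefix a and id b. 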

> [Atsv2] Reader side changes for entity prefix and support for pagination via 
> additional filters
> ---
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: yarn-5355-merge-blocker
> Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, 
> YARN-5585-YARN-5355.0002.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve the 
> applications. Along with those, it would be good to add a new filter, i.e. 
> fromId, so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 entities 
> then a REST call gives the first/last 100 entities. How do we retrieve the 
> next set of 100 entities, i.e. 101 to 200 OR 900 to 801?
> Example: applications are stored in the database as app-1 app-2 ... app-10.
> *getApps?limit=5* gives app-1 to app-5. But there is no way to retrieve the 
> next 5 apps. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS is targeting storage of a large number of entities, getting the 
> next set of entities using fromId, rather than querying all the entities, is 
> a very common use case. This is very useful for pagination in the web UI.




[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

2016-12-19 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762438#comment-15762438
 ] 

Li Lu commented on YARN-4061:
-

Thanks [~jrottinghuis]! 
bq. For our usecase, that makes the puts idempotent. Other use-cases may not 
need this requirement, but they do need to deal with duplicate puts.
That makes sense to me. However, do we think this is too specific (at least 
for now) to our use case in timeline v2? I can understand if there are 
concerns in the HBase community about putting this into the HBase codebase 
right away... Maybe what we can do is expose buffered mutators from HBase, and 
implement our own spooling buffered mutator in the timeline code? 

> [Fault tolerance] Fault tolerant writer for timeline v2
> ---
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Joep Rottinghuis
>  Labels: YARN-5355, yarn-5355-merge-blocker
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 




[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

2016-12-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752682#comment-15752682
 ] 

Li Lu commented on YARN-4061:
-

I went through the new design doc in HBASE-17018 and I think it's mostly good. 
As we discussed in the weekly sync meeting, one thing we may want to sort out 
here is how to handle the case where the collectors start up while the HBase 
cluster is down. From my point of view, the most conservative approach is to 
assume the HBase cluster is always BAD upon startup. However, the problem is 
that we then have to spool the very first writes anyway. Can we have a 
"PROBING" state in the coordinator, where we may tolerate a slightly longer 
submission time, to let the spooling mutator first probe the state of the 
HBase cluster? Also, this probing may happen before the first write ever 
arrives, so that we can do out-of-band probing? 
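
A rough sketch of the suggested PROBING state follows. It is not taken from 
the HBASE-17018 design doc: the class, the Route values, and the method names 
are all hypothetical, and only the state names PROBING and BAD come from this 
comment (GOOD is assumed as the counterpart of BAD). 

```java
/** Hypothetical coordinator that starts in PROBING instead of assuming BAD. */
final class CoordinatorStateSketch {

  enum State { PROBING, GOOD, BAD }

  enum Route { WAIT_FOR_PROBE, WRITE_TO_HBASE, SPOOL }

  // Neither GOOD nor BAD is assumed at startup.
  private State state = State.PROBING;

  // Called when a probe (possibly out-of-band, before any write) completes.
  synchronized void onProbeResult(boolean hbaseReachable) {
    state = hbaseReachable ? State.GOOD : State.BAD;
  }

  // Decides what to do with an incoming write in the current state.
  synchronized Route routeWrite() {
    switch (state) {
      case PROBING:
        return Route.WAIT_FOR_PROBE; // tolerate a slightly longer submission
      case GOOD:
        return Route.WRITE_TO_HBASE;
      default:
        return Route.SPOOL;
    }
  }
}
```

If the probe runs before the first write arrives, that write never sees the 
PROBING state and pays no extra latency. 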

My other question is about the idempotent write requirements. Moving my 
comments from the Google doc to here: 
bq. Does the spooling mutator itself guarantee "at least once" semantics? One 
thing I'd like to discuss here is the write timestamp of each timeline write. 
I'm not familiar with the HBase code, but are we generating one unique 
timestamp for each write when we actually write it to HBase? If this is the 
case, replaying timeline writes may generate different timestamps, and those 
repeated writes may not be idempotent from the timeline's perspective?
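
The timestamp concern can be illustrated with a toy versioned store. HBase 
keeps cell versions keyed by timestamp, so a put replayed with the same 
timestamp overwrites the same version, while a replay stamped with a new 
timestamp creates an additional version. The class below is a hypothetical 
in-memory model of that behavior, not HBase client code. 

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a cell store that keeps one value per timestamp. */
final class ReplaySketch {

  // cell name -> (timestamp -> value)
  private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

  // Same cell and same timestamp overwrite; a new timestamp adds a version.
  void put(String cell, long timestamp, String value) {
    cells.computeIfAbsent(cell, k -> new TreeMap<>()).put(timestamp, value);
  }

  int versions(String cell) {
    TreeMap<Long, String> v = cells.get(cell);
    return v == null ? 0 : v.size();
  }
}
```

In this model, at-least-once delivery is harmless only when the writer fixes 
the timestamp before spooling, so that every replay carries the same one. 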

> [Fault tolerance] Fault tolerant writer for timeline v2
> ---
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Joep Rottinghuis
>  Labels: YARN-5355
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 




[jira] [Commented] (YARN-5976) Update hbase version to 1.2

2016-12-14 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749184#comment-15749184
 ] 

Li Lu commented on YARN-5976:
-

Let's remove the Phoenix dependency and revert the affected patches. We can 
work on the Phoenix pieces later on. 

> Update hbase version to 1.2
> ---
>
> Key: YARN-5976
> URL: https://issues.apache.org/jira/browse/YARN-5976
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Vrushali C
> Fix For: YARN-5355
>
> Attachments: YARN-5976.001.wip.patch
>
>
> I believe Phoenix now works with HBase 1.2. We should upgrade the timeline 
> service to use HBase 1.2. 
> We should also update the documentation in timelineservice to reflect that 
> the HBase mode with all daemons in a single JVM, but writing to HDFS, is 
> supported. 




[jira] [Commented] (YARN-5647) [Security] Collector and reader side changes for loading auth filters and principals

2016-12-13 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746392#comment-15746392
 ] 

Li Lu commented on YARN-5647:
-

Sorry for the late reply folks, as I have been stuck on some other issues 
recently. After giving it some thought and discussing it with some other 
community folks, I think for now it's fine to proceed with the current 
proposal of reusing delegation tokens in this JIRA. Some extra 
facts/questions:
1. The most common reaction people had, when I talked about our security/token 
design, was "why not directly reuse mechanisms like the NM token or block 
token". Those two tokens reflect the typical security mechanism. A central 
server (RM or NN) shares a secret with slave nodes, and issues tokens to 
requestors on behalf of the slave nodes. Tokens are passed to requestors by the 
central server and the central server will handle all renewals. 
2. Our current proposal is a distributed solution: each launched collector 
issues tokens by itself, the token information is passed to a central 
server (RM), and the RM further distributes those tokens to the right party. 
3. I believe the fundamental differences between the two approaches are a) who 
issues the token and b) the channel through which we distribute the token. For 
a), if we have a working E2E POC for collectors to issue tokens, I'm fine with 
it. For b), it seems like we're using our collector discovery mechanism to 
distribute tokens. So we will change collector discovery once again? Are there 
any concerns with this? 

> [Security] Collector and reader side changes for loading auth filters and 
> principals
> 
>
> Key: YARN-5647
> URL: https://issues.apache.org/jira/browse/YARN-5647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: oct16-hard
> Attachments: YARN-5647-YARN-5355.wip.002.patch, 
> YARN-5647-YARN-5355.wip.003.patch, YARN-5647-YARN-5355.wip.01.patch
>
>





[jira] [Commented] (YARN-5974) Remove direct reference to TimelineClientImpl

2016-12-08 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733773#comment-15733773
 ] 

Li Lu commented on YARN-5974:
-

The timed out UT looks weird since I could not find anything useful in the test 
report. I tried to reproduce it locally, but the test passed successfully in 
47s. Not sure what happened on Jenkins. 

> Remove direct reference to TimelineClientImpl
> -
>
> Key: YARN-5974
> URL: https://issues.apache.org/jira/browse/YARN-5974
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Li Lu
>Assignee: Li Lu
>  Labels: newbie++
> Attachments: YARN-5974-YARN-5355.001.patch
>
>
> [~sjlee0]'s quick audit shows that things that are referencing 
> TimelineClientImpl directly today:
> JobHistoryFileReplayMapperV1 (MR)
> SimpleEntityWriterV1 (MR)
> TestDistributedShell (DS)
> TestDSAppMaster (DS)
> TestNMTimelinePublisher (node manager)
> TestTimelineWebServicesWithSSL (AHS)
> This is not the right way to use TimelineClient and we should avoid direct 
> reference to TimelineClientImpl as much as possible. 
> Any newcomers to the community are more than welcome to take this. If this 
> remains unassigned for ~24hrs I'll jump in and do a quick fix. 




[jira] [Updated] (YARN-5974) Remove direct reference to TimelineClientImpl

2016-12-07 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5974:

Attachment: YARN-5974-YARN-5355.001.patch

Took a look at all direct references to TimelineClientImpl in our code base. 
There are actually 3 types of non-trivial references:
a) Directly creating TimelineClientImpl in code. This is wrong. 
b) Creating anonymous class with a super class of TimelineClientImpl in test. 
c) Checking test-visible fields of TimelineClientImpl in related unit tests. 

The current (small) patch fixes all type a) problems in our code base. I 
believe type c) references are mostly fine since the author clearly knows the 
implication of the explicit test-visible method calls. I haven't decided yet 
on the type b) references. On one hand they're fine, since people also know 
the implication of an anonymous class in test code. On the other hand they're 
a little bit messy: once we split TimelineClientImpl we would have to 
duplicate the work. We could have some intermediate class like 
TimelineClientImplV1ForTest extends TimelineClientImpl, and put that in test 
code only. However, I'm not sure the benefit justifies the effort. Thoughts? 
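
The difference between a type a) reference and the intermediate test class 
idea can be sketched with stand-in classes. Nothing below is the real Hadoop 
code: Client and ClientImpl are hypothetical simplifications of TimelineClient 
and TimelineClientImpl, and the doPost hook is an assumed test-visible method. 
In Hadoop itself the factory to prefer over the constructor is 
TimelineClient.createTimelineClient(). 

```java
/** Hypothetical stand-ins for TimelineClient, TimelineClientImpl and a test subclass. */
final class TimelineClientRefsSketch {

  abstract static class Client {
    // Callers obtain a client through the factory instead of naming the
    // implementation class, so the implementation can be split later.
    static Client create() {
      return new ClientImpl();
    }

    abstract int putEntities(String... entities);
  }

  static class ClientImpl extends Client {
    @Override
    int putEntities(String... entities) {
      return doPost(entities);
    }

    // Test-visible hook that subclasses may override.
    int doPost(String[] entities) {
      return entities.length;
    }
  }

  // One named subclass shared by tests, replacing the scattered anonymous
  // subclasses of the implementation (the type b references).
  static class ClientImplV1ForTest extends ClientImpl {
    int posts;

    @Override
    int doPost(String[] entities) {
      posts++;
      return 0; // record the call; do not send anything
    }
  }
}
```

If the implementation is later split into v1 and v2 classes, only the named 
test subclass has to change its superclass; callers that use the factory are 
unaffected. 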






[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-07 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730203#comment-15730203
 ] 

Li Lu commented on YARN-4675:
-

For TimelineClientImpl, I'm totally fine with separating v1 and v2. I'm not 
too worried about code duplication in the security-related parts since they 
are yet to be finalized. For the rest, I'm totally fine with separating them. 
Let me work on YARN-5974 to unblock this. 

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl, 
> TimeClientV2Impl and, if required, a base class, so that it's clear which 
> part of the code belongs to which version and the code is easier to maintain.




[jira] [Commented] (YARN-5974) Remove direct reference to TimelineClientImpl

2016-12-07 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730205#comment-15730205
 ] 

Li Lu commented on YARN-5974:
-

Time is up... So I'll take this work...

> Remove direct reference to TimelineClientImpl
> -
>
> Key: YARN-5974
> URL: https://issues.apache.org/jira/browse/YARN-5974
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Li Lu
>  Labels: newbie++
>
> [~sjlee0]'s quick audit shows that things that are referencing 
> TimelineClientImpl directly today:
> JobHistoryFileReplayMapperV1 (MR)
> SimpleEntityWriterV1 (MR)
> TestDistributedShell (DS)
> TestDSAppMaster (DS)
> TestNMTimelinePublisher (node manager)
> TestTimelineWebServicesWithSSL (AHS)
> This is not the right way to use TimelineClient and we should avoid direct 
> reference to TimelineClientImpl as much as possible. 
> Any newcomers to the community are more than welcome to take this. If this 
> remains unassigned for ~24hrs I'll jump in and do a quick fix. 




[jira] [Assigned] (YARN-5974) Remove direct reference to TimelineClientImpl

2016-12-07 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-5974:
---

Assignee: Li Lu

> Remove direct reference to TimelineClientImpl
> -
>
> Key: YARN-5974
> URL: https://issues.apache.org/jira/browse/YARN-5974
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Li Lu
>Assignee: Li Lu
>  Labels: newbie++
>
> [~sjlee0]'s quick audit shows that things that are referencing 
> TimelineClientImpl directly today:
> JobHistoryFileReplayMapperV1 (MR)
> SimpleEntityWriterV1 (MR)
> TestDistributedShell (DS)
> TestDSAppMaster (DS)
> TestNMTimelinePublisher (node manager)
> TestTimelineWebServicesWithSSL (AHS)
> This is not the right way to use TimelineClient and we should avoid direct 
> reference to TimelineClientImpl as much as possible. 
> Any newcomers to the community are more than welcome to take this. If this 
> remains unassigned for ~24hrs I'll jump in and do a quick fix. 




[jira] [Commented] (YARN-5647) [Security] Collector and reader side changes for loading auth filters and principals

2016-12-06 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727044#comment-15727044
 ] 

Li Lu commented on YARN-5647:
-

Thanks [~varun_saxena]! Can we split the Kerberos-related work from the rest 
of the patch and focus on Kerberos here? I can see we reused some logic 
for TimelineDelegationToken, which was previously questioned by [~jianhe]. Shall 
we put aside the possible changes to tokens and focus on making all 
timeline-related server-side components Kerberos authenticated? That way YARN-5647 
can focus on Kerberos login, YARN-5648 on authentication filters, and we can 
have another JIRA for the new timeline tokens (generation and distribution). 

> [Security] Collector and reader side changes for loading auth filters and 
> principals
> 
>
> Key: YARN-5647
> URL: https://issues.apache.org/jira/browse/YARN-5647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: oct16-hard
> Attachments: YARN-5647-YARN-5355.wip.002.patch, 
> YARN-5647-YARN-5355.wip.003.patch, YARN-5647-YARN-5355.wip.01.patch
>
>





[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-06 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726783#comment-15726783
 ] 

Li Lu commented on YARN-4675:
-

Created YARN-5974 for removing unnecessary references to TimelineClientImpl. 

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl, 
> TimeClientV2Impl and, if required, a base class, so that it's clear which part 
> of the code belongs to which version and the code is easier to maintain.




[jira] [Created] (YARN-5974) Remove direct reference to TimelineClientImpl

2016-12-06 Thread Li Lu (JIRA)

Li Lu created YARN-5974:
---

 Summary: Remove direct reference to TimelineClientImpl
 Key: YARN-5974
 URL: https://issues.apache.org/jira/browse/YARN-5974
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-5355
Reporter: Li Lu


[~sjlee0]'s quick audit shows that the following things reference 
TimelineClientImpl directly today:

JobHistoryFileReplayMapperV1 (MR)
SimpleEntityWriterV1 (MR)
TestDistributedShell (DS)
TestDSAppMaster (DS)
TestNMTimelinePublisher (node manager)
TestTimelineWebServicesWithSSL (AHS)

This is not the right way to use TimelineClient and we should avoid direct 
reference to TimelineClientImpl as much as possible. 

Any newcomers to the community are more than welcome to take this. If this 
remains unassigned for ~24hrs I'll jump in and do a quick fix. 
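
To illustrate the point about not referencing TimelineClientImpl directly, here is a minimal sketch of the factory pattern involved. Every type below (SketchTimelineClient, SketchTimelineClientImpl, FactoryUsageDemo) is a hypothetical stand-in and not the Hadoop source. In Hadoop itself, the analogous entry point should be the static TimelineClient.createTimelineClient() factory.

```java
// Hypothetical stand-in types; this is not the Hadoop source.
class FactoryUsageDemo {
  static String publish(String entityId) {
    // Only the abstract type is named here, so the implementation can change freely.
    SketchTimelineClient client = SketchTimelineClient.create();
    return client.putEntity(entityId);
  }

  public static void main(String[] args) {
    System.out.println(publish("entity_1"));
  }
}

abstract class SketchTimelineClient {
  // Callers obtain instances through this factory method.
  static SketchTimelineClient create() {
    return new SketchTimelineClientImpl();
  }

  abstract String putEntity(String entityId);
}

// Implementation detail: downstream code and tests should not reference this type.
class SketchTimelineClientImpl extends SketchTimelineClient {
  @Override
  String putEntity(String entityId) {
    return "stored:" + entityId;
  }
}
```

With this shape, splitting or renaming the implementation class later only touches the factory, not the callers listed in the audit.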




[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-06 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726776#comment-15726776
 ] 

Li Lu commented on YARN-4675:
-

I'm fine with separating the v1 and v2 interfaces. Right now, mixing the v1 and v2 
APIs in one interface looks pretty confusing to me. Since we decided at the very 
beginning that timeline v2 is not backward compatible, I think it's fine 
to let users choose between TimelineClient v1 and v2. 

bq. things that are referencing TimelineClientImpl directly today
Yes, we should not refer to TimelineClientImpl directly in downstream usages. 
Shall I open a JIRA and remove all of them? 

bq.  the facility for getting delegation token and renewing it would be common 
to both the clients. We would not want to repeat such large amounts of code in 
both V1 and V2 client implementations. 
That's certainly a very valid concern, and addressing it may bring in a lot of 
discussion on security itself. My bottom line here is: let's *assume* that no 
security facilities exist in timeline v2, and start the design 
from scratch. We can then think about how to merge and reuse the code 
afterwards. For now, let's not think about maximizing code reuse between timeline v1 
and v2, especially for security. 

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl, 
> TimeClientV2Impl and, if required, a base class, so that it's clear which part 
> of the code belongs to which version and the code is easier to maintain.




[jira] [Commented] (YARN-5756) Add state-machine implementation for queues

2016-12-05 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723639#comment-15723639
 ] 

Li Lu commented on YARN-5756:
-

Thanks [~xgong]. Looks fine but I'm not extremely familiar with 
queue/schedulers. Maybe [~wangda] or [~jianhe] can take a look at it? 

> Add state-machine implementation for queues
> ---
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch, YARN-5756.3.patch, 
> YARN-5756.4.patch
>
>





[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-12-05 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5739:

Attachment: YARN-5739-YARN-5355.007.patch

Thanks [~varun_saxena] for the comments. Addressed all of them. 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, 
> YARN-5739-YARN-5355.006.patch, YARN-5739-YARN-5355.007.patch
>
>
> Right now we only show a part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out the available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On the timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is a way to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given that this kind of call is relatively rare (compared to 
> writes and updates), we can perform some scanning at read time. 
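
The read-time scanning idea in the description can be sketched roughly as follows. This is only an illustration under loud assumptions: the row-key layout (appId, "!", entityType, "!", entityId), the sorted in-memory TreeSet standing in for the storage backend, and the class name SketchEntityTypeScan are all hypothetical and not taken from any patch. The point is that with sorted keys, each distinct type costs one seek rather than a full pass over every entity row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SketchEntityTypeScan {
  // Assumed key layout: appId + "!" + entityType + "!" + entityId, with no '!' inside a type.
  static List<String> listEntityTypes(TreeSet<String> rowKeys, String appId) {
    List<String> types = new ArrayList<>();
    String prefix = appId + "!";
    String cursor = rowKeys.ceiling(prefix);
    while (cursor != null && cursor.startsWith(prefix)) {
      int end = cursor.indexOf('!', prefix.length());
      if (end < 0) {
        break; // malformed key; stop rather than guess
      }
      String type = cursor.substring(prefix.length(), end);
      types.add(type);
      // '"' is the character right after '!', so this seek skips all rows of this type.
      cursor = rowKeys.ceiling(prefix + type + "\"");
    }
    return types;
  }

  public static void main(String[] args) {
    TreeSet<String> keys = new TreeSet<>(Arrays.asList(
        "app_1!DS_CONTAINER!c1",
        "app_1!DS_CONTAINER!c2",
        "app_1!YARN_APPLICATION!a1",
        "app_2!TEZ_DAG!d1"));
    System.out.println(listEntityTypes(keys, "app_1"));
  }
}
```

Because nothing is stored per application, the cost is paid only when someone actually asks for the type list, which fits the expectation that such calls are rare compared to writes.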




[jira] [Commented] (YARN-5756) Add state-machine implementation for queues

2016-12-01 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713505#comment-15713505
 ] 

Li Lu commented on YARN-5756:
-

Thanks [~xgong] for the patch! Generally fine, some comments:

QueueState.java
  - The STOP_RUNNING state name is a little confusing. How about RUNNING, CLOSED (or 
DRAINING), and STOPPED? 
  - Javadoc inconsistency: the comment at the very beginning of the enum says there 
are only two possible states. 

QueueStateManager.java
  - Consistency issues between stopping and activating queues: we're using fine-grained 
locking to change each queue's status. We need to make the process of stopping 
a queue and its subqueues atomic (as in concurrency, not as in databases). Otherwise, 
concurrent activate-queue calls may produce inconsistent results. If coarse-grained 
locking is fine for the current use case, we may want to make 
activateQueues and stopQueues synchronized. 
  - QueueStateManager only needs the queue mapping in SchedulerQueueManager, so 
do we need to reference the whole SchedulerQueueManager here? I don't have 
a strong opinion here though...
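
A rough sketch of the coarse-grained option, to make the atomicity point concrete. The names (SketchQueueStateManager, SketchQueueState) and the dotted queue-path convention are assumptions for illustration only and are not the classes in the patch. Both stop and activate walk the subtree while holding the same monitor, so a concurrent caller can never observe a half-stopped hierarchy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SketchQueueStateManager {
  // Queue path -> state; the hierarchy is encoded by dotted prefixes such as "root.a.b".
  private final Map<String, SketchQueueState> states = new LinkedHashMap<>();

  synchronized void addQueue(String path) {
    states.put(path, SketchQueueState.RUNNING);
  }

  // Stops the queue and all of its subqueues under one lock.
  synchronized void stopQueue(String path) {
    setSubtree(path, SketchQueueState.STOPPED);
  }

  // Activates the queue and all of its subqueues under the same lock.
  synchronized void activateQueue(String path) {
    setSubtree(path, SketchQueueState.RUNNING);
  }

  synchronized SketchQueueState getState(String path) {
    return states.get(path);
  }

  // Only called while holding the monitor; setValue is not a structural change.
  private void setSubtree(String root, SketchQueueState target) {
    for (Map.Entry<String, SketchQueueState> e : states.entrySet()) {
      String q = e.getKey();
      if (q.equals(root) || q.startsWith(root + ".")) {
        e.setValue(target);
      }
    }
  }

  public static void main(String[] args) {
    SketchQueueStateManager m = new SketchQueueStateManager();
    m.addQueue("root.a");
    m.addQueue("root.a.b");
    m.stopQueue("root.a");
    System.out.println(m.getState("root.a.b"));
  }
}

// State names follow the suggestion above; DRAINING is listed but not exercised here.
enum SketchQueueState { RUNNING, DRAINING, STOPPED }
```

The trade-off is the usual one: a single lock serializes all state changes, which is acceptable only if queue state transitions stay rare.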

> Add state-machine implementation for queues
> ---
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch
>
>





[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-12-01 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5739:

Attachment: YARN-5739-YARN-5355.006.patch

New 006 patch to make Jenkins happy. 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, 
> YARN-5739-YARN-5355.006.patch
>
>
> Right now we only show a part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out the available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On the timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is a way to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given that this kind of call is relatively rare (compared to 
> writes and updates), we can perform some scanning at read time. 




[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-12-01 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5739:

Attachment: (was: YARN-5739-YARN-5355.006.patch)

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, 
> YARN-5739-YARN-5355.006.patch
>
>
> Right now we only show a part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out the available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On the timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is a way to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given that this kind of call is relatively rare (compared to 
> writes and updates), we can perform some scanning at read time. 




[jira] [Commented] (YARN-5756) Add state-machine implementation for queues

2016-12-01 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713290#comment-15713290
 ] 

Li Lu commented on YARN-5756:
-

It seems the second submission has been ignored by Jenkins. Kicking it for one 
more round of testing. 

> Add state-machine implementation for queues
> ---
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch
>
>





[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-12-01 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5739:

Attachment: YARN-5739-YARN-5355.006.patch

New patch to address review comments. 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, 
> YARN-5739-YARN-5355.006.patch
>
>
> Right now we only show a part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out the available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On the timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is a way to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given that this kind of call is relatively rare (compared to 
> writes and updates), we can perform some scanning at read time. 




[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2016-12-01 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712881#comment-15712881
 ] 

Li Lu commented on YARN-4675:
-

Yes, I agree we need to decide on this issue soon. +1 for reorganizing 
TimelineClientImpl. Do we also need to distinguish v2 APIs from timeline 
clients? As of now we will have timeline APIs for v1, v1.5, and v2, so I 
think it may be helpful to distinguish at least the v2 APIs. 

> Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
> 
>
> Key: YARN-4675
> URL: https://issues.apache.org/jira/browse/YARN-4675
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: YARN-5355, oct16-medium
> Attachments: YARN-4675-YARN-2928.v1.001.patch
>
>
> We need to reorganize TimeClientImpl into TimeClientV1Impl, 
> TimeClientV2Impl and, if required, a base class, so that it's clear which part 
> of the code belongs to which version and the code is easier to maintain.




[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710412#comment-15710412
 ] 

Li Lu commented on YARN-5739:
-

Also, the two augmentParams implementations in GenericEntityReader and 
ApplicationEntityReader seem quite similar. The only difference is that we need to 
distinguish whether a read is a single-entity read when we actually augment the 
params. Can we merge the two pieces of logic? I can expose the base 
implementation of augmentParams, but was wondering if we can further simplify 
the logic and just let ApplicationEntityReader#augmentParams call super. 
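
Roughly what I have in mind, as a sketch only. The classes below are hypothetical stand-ins (SketchGenericReader, SketchApplicationReader), and the parameter names and default values are made up for illustration; the real augmentParams bodies do different work. The shape is what matters: the shared defaulting lives once in the base class, branches on the single-entity flag there, and the subclass only adds its own piece after calling super.

```java
import java.util.HashMap;
import java.util.Map;

class AugmentParamsDemo {
  static Map<String, String> augment(SketchGenericReader reader) {
    Map<String, String> params = new HashMap<>();
    reader.augmentParams(params);
    return params;
  }

  public static void main(String[] args) {
    System.out.println(augment(new SketchApplicationReader(true)));
    System.out.println(augment(new SketchApplicationReader(false)));
  }
}

// Hypothetical base reader: owns the logic common to both readers.
class SketchGenericReader {
  private final boolean singleEntityRead;

  SketchGenericReader(boolean singleEntityRead) {
    this.singleEntityRead = singleEntityRead;
  }

  protected void augmentParams(Map<String, String> params) {
    params.putIfAbsent("userId", "looked-up-user");
    // The single-entity distinction is made once, here, instead of in each subclass.
    if (!singleEntityRead) {
      params.putIfAbsent("limit", "100");
    }
  }
}

// Hypothetical subclass: reuses the base logic and adds only what is specific to it.
class SketchApplicationReader extends SketchGenericReader {
  SketchApplicationReader(boolean singleEntityRead) {
    super(singleEntityRead);
  }

  @Override
  protected void augmentParams(Map<String, String> params) {
    super.augmentParams(params);
    params.putIfAbsent("entityType", "YARN_APPLICATION");
  }
}
```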

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch
>
>
> Right now we only show a part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out the available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On the timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is a way to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given that this kind of call is relatively rare (compared to 
> writes and updates), we can perform some scanning at read time. 




[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710400#comment-15710400
 ] 

Li Lu commented on YARN-5739:
-

But I'd certainly appreciate it if there are better names! 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch
>
>
> Right now we only show a part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out the available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On the timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is a way to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given that this kind of call is relatively rare (compared to 
> writes and updates), we can perform some scanning at read time. 




[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710357#comment-15710357
 ] 

Li Lu commented on YARN-5739:
-

bq. I hate to nitpick on the name, but AbstractTimelineStorageReader sounds a 
little awkward to me. Can we stick to the entity reader names? How about 
AbstractTimelineEntityReader or BaseTimelineEntityReader? Thoughts?
Avoiding the term "Entity" is a deliberate choice here. Now EntityTypeReader 
will not return any entity types so I'm avoiding using the term Entity in the 
base class. 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch
>
>
> Right now we only show a part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to 
> provide an "entity browser" for each YARN application. Actually, simply 
> dumping out the available timeline entities (with proper pagination, of course) 
> would be pretty helpful for UI users. 
> On the timeline side, we're not far away from this goal. Right now I believe the 
> only thing missing is a way to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given that this kind of call is relatively rare (compared to 
> writes and updates), we can perform some scanning at read time. 




[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM

2016-11-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710108#comment-15710108
 ] 

Li Lu commented on YARN-5933:
-

bq. AppLogs#parseSummaryLogs() can skip subsequent getAppState for Unknown apps 
and move them to complete after unknownActiveSecs
Wouldn't this give up the ATS's ability to quickly "recover" an application's 
state from unknown to known? Once an application's status becomes unknown, the 
timeline server will no longer check it, so the app's status can never 
change back to known. For example, if the ATS 
server is temporarily isolated from the rest of the cluster, it will stop 
checking every app's status even if the isolation lasts only a short while. To 
solve this we would need a separate scanning thread for "lost" applications, 
scanning at a different pace. 
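
A minimal sketch of the two-pace scanning idea, under stated assumptions: SketchAppTracker, its State enum, and the stateLookup function (a stand-in for asking the RM about an application) are all hypothetical and not part of the ATS code. Active apps are re-checked on the fast schedule, while apps that went unknown are still re-checked, just less often, so they can come back once the RM is reachable again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class SketchAppTracker {
  enum State { ACTIVE, UNKNOWN, COMPLETED }

  private final Map<String, State> apps = new ConcurrentHashMap<>();
  // Stand-in for the RM query; must never return null.
  private final Function<String, State> stateLookup;

  SketchAppTracker(Function<String, State> stateLookup) {
    this.stateLookup = stateLookup;
  }

  void track(String appId) {
    apps.put(appId, State.ACTIVE);
  }

  State getState(String appId) {
    return apps.get(appId);
  }

  // Fast path: only apps currently believed active are re-checked.
  void scanActive() {
    rescan(State.ACTIVE);
  }

  // Slow path: "lost" apps get another chance instead of being abandoned.
  void scanUnknown() {
    rescan(State.UNKNOWN);
  }

  private void rescan(State target) {
    apps.replaceAll((id, s) -> s == target ? stateLookup.apply(id) : s);
  }

  // Runs the two scans at different paces on the caller-supplied scheduler.
  void start(ScheduledExecutorService scheduler, long activeSecs, long unknownSecs) {
    scheduler.scheduleWithFixedDelay(this::scanActive, 0, activeSecs, TimeUnit.SECONDS);
    scheduler.scheduleWithFixedDelay(this::scanUnknown, unknownSecs, unknownSecs, TimeUnit.SECONDS);
  }

  public static void main(String[] args) {
    boolean[] rmReachable = {false};
    SketchAppTracker tracker =
        new SketchAppTracker(id -> rmReachable[0] ? State.ACTIVE : State.UNKNOWN);
    tracker.track("app_1");
    tracker.scanActive();   // RM unreachable: app_1 becomes UNKNOWN
    rmReachable[0] = true;
    tracker.scanUnknown();  // the slow scan brings it back
    System.out.println(tracker.getState("app_1"));
  }
}
```

In a real deployment the periods passed to start would come from configuration; the sketch calls the scan methods directly in main so that it terminates.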

> ATS stale entries in active directory causes ApplicationNotFoundException in 
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>
> On a secure cluster where ATS is down, a submitted Tez job fails while 
> getting TIMELINE_DELEGATION_TOKEN with the exception below:
> {code}
> 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from 
> alltypesorc group by csmallint;
> INFO  : Session is already open
> INFO  : Dag name: select csmallint from alltypesor...csmallint(Stage-1)
> INFO  : Tez session was closed. Reopening...
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
>   at org.apache.tez.client.TezClient.start(TezClient.java:409)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Tez YarnClient has received an applicationID from RM. On Restarting ATS now, 
> ATS tries to get the application report from RM and so RM will throw 
> ApplicationNotFoundException. ATS will keep on requesting and which f

[jira] [Commented] (YARN-5756) Add state-machine implementation for queues

2016-11-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710022#comment-15710022
 ] 

Li Lu commented on YARN-5756:
-

Hi [~xgong], I tried to apply the patch locally but hit several issues 
applying it to the latest trunk. One significant issue is that 
SchedulerQueueContext.java is missing in trunk. Could you please rebase your 
patch? Thanks! 

> Add state-machine implementation for queues
> ---
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5756.1.patch
>
>





[jira] [Commented] (YARN-5761) Separate QueueManager from Scheduler

2016-11-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709840#comment-15709840
 ] 

Li Lu commented on YARN-5761:
-

Will commit this patch shortly. 

> Separate QueueManager from Scheduler
> 
>
> Key: YARN-5761
> URL: https://issues.apache.org/jira/browse/YARN-5761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>  Labels: oct16-medium
> Attachments: YARN-5761.1.patch, YARN-5761.1.rebase.patch, 
> YARN-5761.2.patch, YARN-5761.3.patch, YARN-5761.4.patch, YARN-5761.5.patch, 
> YARN-5761.6.patch, YARN-5761.7.patch, YARN-5761.7.patch, YARN-5761.8.patch
>
>
> Currently, the scheduler code does both queue management and scheduling work. 
> We should separate the queue manager from the scheduler logic, which 
> would make it much easier and safer to extend.




[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709830#comment-15709830
 ] 

Li Lu commented on YARN-5739:
-

Any more comments folks? Thanks! 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch
>
>
> Right now we only show part of the available timeline entity data in the new 
> YARN UI. However, some data (especially library-specific data) cannot be 
> queried through the web UI. It would be appealing for the UI to provide an 
> "entity browser" for each YARN application. Actually, simply dumping out the 
> available timeline entities (with proper pagination, of course) would be 
> pretty helpful for UI users. 
> On the timeline side, we're not far from this goal. Right now I believe the 
> only thing missing is a way to list all available entity types within one 
> application. The challenge here is that we're not storing this data for each 
> application, but given that this kind of call is relatively rare (compared to 
> writes and updates), we can perform some scanning at read time. 




[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM

2016-11-29 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707125#comment-15707125
 ] 

Li Lu commented on YARN-5933:
-

Thanks [~Prabhu Joseph] for the clarification. Now I understand the point about 
the flood of exceptions. Looking through the code, it seems that in 
ApplicationClientProtocolPBServiceImpl we convert the app-not-found exception 
into a service exception. We could ignore the app-not-found exception there, 
but that feels risky as well. There seems to be no really quick fix for this 
issue, but one mitigation is to reduce unknownActiveSecs, which is set by 
yarn.timeline-service.entity-group-fs-store.unknown-active-seconds. This 
determines how long the timeline server waits before it declares a lost app to 
be done. The default value is one full day, but for some use cases it can be 
reduced to hours. 

For the long term, maybe we need another interval for checking applications in 
unknown states? 
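
To make the mitigation concrete, here is a minimal sketch of the wait-time decision. It is not the actual timeline store code: the class and method names are hypothetical, and it assumes that "lost" time is measured from the last modification of the app's active directory. Only the property name and the one-day default are taken from the comment above.

```java
import java.time.Duration;

/** Hypothetical model of the "unknown active seconds" wait; not Hadoop code. */
final class UnknownAppPolicy {
  /** Property name as quoted in the comment above. */
  static final String UNKNOWN_ACTIVE_SECS_KEY =
      "yarn.timeline-service.entity-group-fs-store.unknown-active-seconds";

  /** One full day, the default mentioned above. */
  static final long DEFAULT_UNKNOWN_ACTIVE_SECS = Duration.ofDays(1).getSeconds();

  private final long unknownActiveSecs;

  UnknownAppPolicy(long unknownActiveSecs) {
    if (unknownActiveSecs < 0) {
      throw new IllegalArgumentException("wait time must be non-negative");
    }
    this.unknownActiveSecs = unknownActiveSecs;
  }

  /**
   * Returns true once an app in an unknown state has been idle for longer
   * than the configured wait time (assumption: idleness is measured from the
   * last modification time of its active directory).
   */
  boolean shouldDeclareDone(long lastModifiedMillis, long nowMillis) {
    long idleSecs = (nowMillis - lastModifiedMillis) / 1000L;
    return idleSecs > unknownActiveSecs;
  }
}
```

With the default, an app the RM does not recognize keeps being checked for a full day; constructing the policy with 7200 would stop the checks after two idle hours.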

> ATS stale entries in active directory causes ApplicationNotFoundException in 
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>
> On a secure cluster where ATS is down, a submitted Tez job will fail while 
> getting TIMELINE_DELEGATION_TOKEN with the exception below:
> {code}
> 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from alltypesorc group by csmallint;
> INFO  : Session is already open
> INFO  : Dag name: select csmallint from alltypesor...csmallint(Stage-1)
> INFO  : Tez session was closed. Reopening...
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506)
>   at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
>   at org.apache.tez.client.TezClient.start(TezClient.java:409)
>   at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
>   at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
>   at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
>   at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Tez YarnClient has received an applicationID from RM. On Restarting ATS now, 
> ATS tries to get the application report from RM and so RM will throw 
> Applicati

[jira] [Commented] (YARN-5761) Separate QueueManager from Scheduler

2016-11-29 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706732#comment-15706732
 ] 

Li Lu commented on YARN-5761:
-

+1 LGTM. Will wait for ~24 hrs before committing this. 

> Separate QueueManager from Scheduler
> 
>
> Key: YARN-5761
> URL: https://issues.apache.org/jira/browse/YARN-5761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>  Labels: oct16-medium
> Attachments: YARN-5761.1.patch, YARN-5761.1.rebase.patch, 
> YARN-5761.2.patch, YARN-5761.3.patch, YARN-5761.4.patch, YARN-5761.5.patch, 
> YARN-5761.6.patch, YARN-5761.7.patch, YARN-5761.7.patch, YARN-5761.8.patch
>
>




[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-29 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706169#comment-15706169
 ] 

Li Lu commented on YARN-5739:
-

Kicking Jenkins again for the new patch.

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch
>
>




[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-28 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5739:

Attachment: YARN-5739-YARN-5355.005.patch

Refactored EntityTypeReader and TimelineEntityReader. EntityTypeReader has been 
separated from EntityReaders after this refactoring. 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch
>
>




[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-28 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703452#comment-15703452
 ] 

Li Lu commented on YARN-5739:
-

Sure. Let me try with some refactoring...

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch
>
>




[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM

2016-11-28 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703312#comment-15703312
 ] 

Li Lu commented on YARN-5933:
-

After giving this issue some thought, I am hesitant to directly remove the 
active directory when we see an unknown application exception. The fact that 
the RM does not recognize the application ID does not mean the application is 
not running. It certainly does not mean there is no concurrent writer to this 
active directory, although that happens to be true in the reported case. 
Therefore, simply removing the active directory may not work in cases where 
some "hidden" applications are actually writing to the directory even though 
the RM does not recognize them. 

> ATS stale entries in active directory causes ApplicationNotFoundException in 
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>
> Tez YarnClient has received an applicationID from RM. On Restarting ATS now, 
> ATS tries to get the application report from RM and so RM will throw 
> ApplicationNotFoundException. ATS will keep on requesting and which floods RM.
> {code}
> RM logs:
> 2016-11-23 13:53:57,345 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
> applicati

[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM

2016-11-28 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703184#comment-15703184
 ] 

Li Lu commented on YARN-5933:
-

bq. ATS will keep on requesting and which floods RM.

[~Prabhu Joseph], by "flood" do you mean that the ATS sent requests to the RM 
at a higher frequency than expected? 

> ATS stale entries in active directory causes ApplicationNotFoundException in 
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>
> Tez YarnClient has received an applicationID from RM. On Restarting ATS now, 
> ATS tries to get the application report from RM and so RM will throw 
> ApplicationNotFoundException. ATS will keep on requesting and which floods RM.
> {code}
> RM logs:
> 2016-11-23 13:53:57,345 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 5
> 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8050, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 172.26.71.120:37699 Call#26 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1479897867169_0005' doesn't exist in RM.
>   at org.apa

[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-22 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688032#comment-15688032
 ] 

Li Lu commented on YARN-5739:
-

Sure. Let's wait for more comments on this. 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch
>
>




[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-22 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5739:

Attachment: YARN-5739-YARN-5355.004.patch

Addressed the comments above. 

bq. Moreover, REST endpoint suggestion was both entity-types and entitytypes. I 
am fine with both as we do use hyphen in other REST endpoints in YARN. Let us 
go with majority opinion. 
Right now I'm following our practice in the node-label-related web services in 
the RM. Please let me know if the hyphens cause any trouble. Thanks! 

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, 
> YARN-5739-YARN-5355.004.patch
>
>




[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application

2016-11-21 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-5739:

Attachment: YARN-5739-YARN-5355.003.patch

Version 003 of the patch addresses more review comments. Specifically:
1. Added a "get next row key" API that is shared with the patch in YARN-5585. 
2. Removed the setCache call for scans, following a discussion with Enis from 
the HBase community. Now we just use setPageFilter(1) to limit the scan size; 
Enis suggested that this should be sufficient. 
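
As a rough illustration of how a "get next row key" helper and a one-row scan can combine to list distinct entity types, here is a self-contained sketch. It does not use the HBase client and is not the code in the patch: a sorted in-memory map stands in for the entity table, the row key layout `<type>!<entityId>` is an assumption made for the example, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical stand-in for the entity table; not the actual reader code. */
final class EntityTypeScanSketch {
  /** Assumed separator between entity type and entity id in a row key. */
  static final char SEP = '!';

  /** Smallest key that sorts after every row key starting with type + SEP. */
  static String nextRowKey(String type) {
    return type + (char) (SEP + 1);
  }

  /**
   * Lists distinct entity types by reading one row per type and then seeking
   * past all remaining rows of that type. Assumes every row key contains SEP
   * and that entity types themselves do not contain SEP.
   */
  static List<String> listEntityTypes(NavigableMap<String, String> table) {
    List<String> types = new ArrayList<>();
    String row = table.isEmpty() ? null : table.firstKey();
    while (row != null) {
      String type = row.substring(0, row.indexOf(SEP));
      types.add(type);
      // Analogous to starting a new scan limited to one row at the next key.
      row = table.ceilingKey(nextRowKey(type));
    }
    return types;
  }

  public static void main(String[] args) {
    NavigableMap<String, String> table = new TreeMap<>();
    table.put("YARN_CONTAINER!c1", "");
    table.put("YARN_CONTAINER!c2", "");
    table.put("TEZ_DAG!d1", "");
    System.out.println(listEntityTypes(table));
  }
}
```

The number of lookups grows with the number of distinct types rather than the number of entities, which is why a read-time scan can be acceptable for a call that is rare compared to writes.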

> Provide timeline reader API to list available timeline entity types for one 
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch, 
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch
>
>




[jira] [Commented] (YARN-3053) [Security] Review and implement security in ATS v.2

2016-11-15 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668416#comment-15668416
 ] 

Li Lu commented on YARN-3053:
-

bq. Can we capture that aspect as a future work as part of implementing the 
timeline collector as a full user container?
Sure. For now let's make the current (aux service based) model work with 
security. We may add a slight extension so that collectors in a separate 
process also work, if that turns out to be low-hanging fruit. 

> [Security] Review and implement security in ATS v.2
> ---
>
> Key: YARN-3053
> URL: https://issues.apache.org/jira/browse/YARN-3053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>  Labels: YARN-5355
> Attachments: ATSv2Authentication(draft).pdf
>
>
> Per design in YARN-2928, we want to evaluate and review the system for 
> security, and ensure proper security in the system.
> This includes proper authentication, token management, access control, and 
> any other relevant security aspects.




[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service

2016-11-14 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665393#comment-15665393
 ] 

Li Lu commented on YARN-5814:
-

Thanks [~BINGXUE QIU] for the doc! I have some quick questions:
1. According to the Design section, the writer may require Tranquility and/or 
Kafka as intermediate layers. I'm wondering if there are any issues with these 
dependencies? 
2. For the table design, right now in timeline v.2, container is not a 
top-level concept (although it is a top-level concept for YARN). Therefore I'm 
wondering whether it would be helpful to generalize the container table into an 
entity table, just as in the HBase implementation? We may still put 
container-level data into this table, but maybe it's possible not to limit 
this table to containers only? 

>  Add druid as storage backend in YARN Timeline Service
> --
>
> Key: YARN-5814
> URL: https://issues.apache.org/jira/browse/YARN-5814
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: ATSv2
>Affects Versions: 3.0.0-alpha2
>Reporter: Bingxue Qiu
> Attachments: Add-Druid-in-YARN-Timeline-Service.pdf
>
>
> h3. Introduction
> I propose to add Druid as a storage backend in YARN Timeline Service.
> We run more than 6000 applications and generate 450 million metrics daily in 
> Alibaba clusters with thousands of nodes. We need to collect and store 
> meta/events/metrics data, analyze utilization reports online across various 
> dimensions, and display trends in resource allocation/usage for the cluster 
> by joining and aggregating data. This helps us manage and optimize the 
> cluster by tracking resource utilization.
> To achieve this goal, we switched from HBase to Druid as the storage and have 
> achieved sub-second OLAP performance in our production environment for a few 
> months. 
> h3. Analysis
> Currently YARN Timeline Service only supports aggregating metrics at a) the 
> flow level, by FlowRunCoprocessor, and b) the application level, by 
> AppLevelTimelineCollector. Offline (time-based periodic) aggregation for 
> flows/users/queues for reporting and analysis is planned but not yet 
> implemented. YARN Timeline Service uses Apache HBase as its primary 
> storage backend. As is well known, HBase is not a good fit for OLAP.
> For arbitrary exploration of data, such as online analysis of utilization 
> reports across various dimensions (Queue, Flow, Users, Application, CPU, 
> Memory) by joining and aggregating data, Druid's custom column format enables 
> ad-hoc queries without pre-computation. The format also enables fast scans on 
> columns, which is important for good aggregation performance.
> To support online analysis of utilization reports across various dimensions, 
> display of trends in resource allocation/usage for the cluster, and 
> arbitrary exploration of data, we propose to add Druid storage and implement 
> DruidWriter/DruidReader in YARN Timeline Service.




[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service

2016-11-14 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665337#comment-15665337
 ] 

Li Lu commented on YARN-5814:
-

Linking this issue to the umbrella JIRA of timeline v.2. 




