[jira] [Commented] (YARN-10556) Web-app server does not work for Timeline V2
[ https://issues.apache.org/jira/browse/YARN-10556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17256674#comment-17256674 ] Li Lu commented on YARN-10556: -- It has been quite a while, and I only vaguely remember that my fix was for binding conflicts on YARN WebApps. We used HttpServer2 instead of YARN WebApp to host the web server. After all these years the codebase may have changed quite a lot. In YARN-3087 the problem was the conflict between the NM and the per-node timeline collector. Judging from the exception here, it looks like it comes from the timeline reader server? As far as I remember, that is a standalone process, so a conflict is less likely (I remember the root cause was a static variable). It may be worth looking into the reader server for more info. cc [~varun_saxena] > Web-app server does not work for Timeline V2 > > > Key: YARN-10556 > URL: https://issues.apache.org/jira/browse/YARN-10556 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Ahmed Hussein >Priority: Major > > {{TestDistributedShell}} for timeline version 2.0 shows the following errors > in the log files, with the below exception. > YARN-3087 previously added a fix for the same issue. > We need to investigate whether it is a testing issue or whether the error > has resurfaced. 
> {code:bash} > org.apache.hadoop.yarn.webapp.WebAppException: > /v2/timeline/clusters/yarn_cluster/apps/application_1609346161655_0001: > controller for v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:247) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:155) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:152) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) > at > com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) > at > 
org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1702) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at org.apache.hadoop.http.NoCac
[jira] [Commented] (YARN-7075) [YARN-3368] Improvement of Web UI
[ https://issues.apache.org/jira/browse/YARN-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147892#comment-16147892 ] Li Lu commented on YARN-7075: - Maybe it is worth the effort to make the donuts slightly thicker? If there are a lot of small pieces within one donut, the current thickness does not look sufficient. > [YARN-3368] Improvement of Web UI > -- > > Key: YARN-7075 > URL: https://issues.apache.org/jira/browse/YARN-7075 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Da Ding >Assignee: Da Ding > Attachments: Screen Shot 2017-08-22 at 8.36.07 PM.png, Screen Shot > 2017-08-29 at 4.36.45 PM.png, yarn-7075.001.patch > > > 1. Adjusted donut chart size to be slimmer > 2. Modified chart container style to have a modern feel. > 3. Other changes like background and font. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7109) Extend aggregation operation for new ATS design
[ https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144140#comment-16144140 ] Li Lu commented on YARN-7109: - BTW [~Zian Chen], you may want to check the documentation here: http://hadoop.apache.org/docs/r3.0.0-alpha3/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html > Extend aggregation operation for new ATS design > --- > > Key: YARN-7109 > URL: https://issues.apache.org/jira/browse/YARN-7109 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zian Chen >Assignee: Zian Chen > Labels: patch >
[jira] [Commented] (YARN-7109) Extend aggregation operation for new ATS design
[ https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144138#comment-16144138 ] Li Lu commented on YARN-7109: - Thanks for the proposal [~Zian Chen]! I've already added you to the contributor list and assigned the ticket to you. Please feel free to work on it. > Extend aggregation operation for new ATS design > --- > > Key: YARN-7109 > URL: https://issues.apache.org/jira/browse/YARN-7109 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zian Chen >Assignee: Zian Chen > Labels: patch >
[jira] [Assigned] (YARN-7109) Extend aggregation operation for new ATS design
[ https://issues.apache.org/jira/browse/YARN-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-7109: --- Assignee: Zian Chen > Extend aggregation operation for new ATS design > --- > > Key: YARN-7109 > URL: https://issues.apache.org/jira/browse/YARN-7109 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zian Chen >Assignee: Zian Chen > Labels: patch >
[jira] [Updated] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[ https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-6999: Labels: newbie (was: beginner) > Add log about how to solve Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > -- > > Key: YARN-6999 > URL: https://issues.apache.org/jira/browse/YARN-6999 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, security >Affects Versions: 3.0.0-beta1 > Environment: All operating systems. >Reporter: Linlin Zhou >Assignee: Linlin Zhou >Priority: Minor > Labels: newbie > Fix For: 3.0.0-beta1, 2.9 > > Attachments: yarn-6999.002.patch, yarn-6999.003.patch, yarn-6999.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Following Setting up a Single Node Cluster > [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html], > we would still fail to run the MapReduce job example. Due to a security > fix, YARN uses the user's environment variables to init, and the user's environment > variables usually don't include MapReduce-related settings. So we need to > add the related config in etc/hadoop/mapred-site.xml manually. Currently the > log only says there is an Error: > Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster, without any suggestion on how to > solve it. I want to add a useful suggestion to the log.
[jira] [Updated] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[ https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-6999: Fix Version/s: 2.9 > Add log about how to solve Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > -- > > Key: YARN-6999 > URL: https://issues.apache.org/jira/browse/YARN-6999 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, security >Affects Versions: 3.0.0-beta1 > Environment: All operating systems. >Reporter: Linlin Zhou >Assignee: Linlin Zhou >Priority: Minor > Labels: newbie > Fix For: 3.0.0-beta1, 2.9 > > Attachments: yarn-6999.002.patch, yarn-6999.003.patch, yarn-6999.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Following Setting up a Single Node Cluster > [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html], > we would still fail to run the MapReduce job example. Due to a security > fix, YARN uses the user's environment variables to init, and the user's environment > variables usually don't include MapReduce-related settings. So we need to > add the related config in etc/hadoop/mapred-site.xml manually. Currently the > log only says there is an Error: > Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster, without any suggestion on how to > solve it. I want to add a useful suggestion to the log.
[jira] [Commented] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[ https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141133#comment-16141133 ] Li Lu commented on YARN-6999: - Patch LGTM. The change is too trivial to need unit tests. The Findbugs warning appears to be unrelated. I'll wait ~24 hrs before committing. > Add log about how to solve Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > -- > > Key: YARN-6999 > URL: https://issues.apache.org/jira/browse/YARN-6999 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, security >Affects Versions: 3.0.0-beta1 > Environment: All operating systems. >Reporter: Linlin Zhou >Assignee: Linlin Zhou >Priority: Minor > Labels: beginner > Fix For: 3.0.0-beta1 > > Attachments: yarn-6999.002.patch, yarn-6999.003.patch, yarn-6999.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Following Setting up a Single Node Cluster > [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html], > we would still fail to run the MapReduce job example. Due to a security > fix, YARN uses the user's environment variables to init, and the user's environment > variables usually don't include MapReduce-related settings. So we need to > add the related config in etc/hadoop/mapred-site.xml manually. Currently the > log only says there is an Error: > Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster, without any suggestion on how to > solve it. I want to add a useful suggestion to the log.
[jira] [Commented] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[ https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140353#comment-16140353 ] Li Lu commented on YARN-6999: - This looks much better, thanks for the work [~littlestone00]! Could you please rename the patch so that it ends in .patch, so that we can rerun Jenkins? Also, the concerns raised by checkstyle appear to be valid; could you please fix those as well? The warning from findbugs appears to be unrelated, so let's focus on checkstyle and whitespace first. > Add log about how to solve Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > -- > > Key: YARN-6999 > URL: https://issues.apache.org/jira/browse/YARN-6999 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, security >Affects Versions: 3.0.0-beta1 > Environment: All operating systems. >Reporter: Linlin Zhou >Assignee: Linlin Zhou >Priority: Minor > Labels: beginner > Fix For: 3.0.0-beta1 > > Attachments: yarn-6999.patch, yarn-6999.patch.002 > > Original Estimate: 1h > Remaining Estimate: 1h > > Following Setting up a Single Node Cluster > [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html], > we would still fail to run the MapReduce job example. Due to a security > fix, YARN uses the user's environment variables to init, and the user's environment > variables usually don't include MapReduce-related settings. So we need to > add the related config in etc/hadoop/mapred-site.xml manually. Currently the > log only says there is an Error: > Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster, without any suggestion on how to > solve it. I want to add a useful suggestion to the log.
[jira] [Commented] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[ https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131720#comment-16131720 ] Li Lu commented on YARN-6999: - I kicked Jenkins for a precommit build. Not sure why this was missed. > Add log about how to solve Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > -- > > Key: YARN-6999 > URL: https://issues.apache.org/jira/browse/YARN-6999 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, security >Affects Versions: 3.0.0-beta1 > Environment: All operating systems. >Reporter: Linlin Zhou >Assignee: Linlin Zhou >Priority: Minor > Labels: beginner > Fix For: 3.0.0-beta1 > > Attachments: yarn-6999.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Following Setting up a Single Node Cluster > [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html], > we would still fail to run the MapReduce job example. Due to a security > fix, YARN uses the user's environment variables to init, and the user's environment > variables usually don't include MapReduce-related settings. So we need to > add the related config in etc/hadoop/mapred-site.xml manually. Currently the > log only says there is an Error: > Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster, without any suggestion on how to > solve it. I want to add a useful suggestion to the log.
[jira] [Commented] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[ https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131718#comment-16131718 ] Li Lu commented on YARN-6999: - Thanks for the work [~littlestone00], this appears to be a real usability issue for a lot of new Hadoop developers. Since you have already uploaded a patch, I'm assigning this JIRA to you. The general direction of the fix looks fine. Adding a log message that clearly points users to the potential root cause sounds quite helpful. One potential issue is that the fix appears to be in the node manager's code, but there is logic specifically for MapReduce. Maybe we can make this error message less hard-coded? (I'm still thinking about possible ways to improve this but so far I've got no trivial answer...) > Add log about how to solve Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > -- > > Key: YARN-6999 > URL: https://issues.apache.org/jira/browse/YARN-6999 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, security >Affects Versions: 3.0.0-beta1 > Environment: All operating systems. >Reporter: Linlin Zhou >Assignee: Linlin Zhou >Priority: Minor > Labels: beginner > Fix For: 3.0.0-beta1 > > Attachments: yarn-6999.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Following Setting up a Single Node Cluster > [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html], > we would still fail to run the MapReduce job example. Due to a security > fix, YARN uses the user's environment variables to init, and the user's environment > variables usually don't include MapReduce-related settings. So we need to > add the related config in etc/hadoop/mapred-site.xml manually. Currently the > log only says there is an Error: > Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster, without any suggestion on how to > solve it. I want to add a useful suggestion to the log. 
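One way to read the suggestion above about making the message less hard-coded is to keep framework-specific hints in a small lookup table, so that the node manager code only matches diagnostics against registered patterns. The following is a minimal sketch under that assumption. `DiagnosticHintSketch`, `register`, and `hintFor` are hypothetical names, not part of YARN, and the hint text only paraphrases the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: maps known error substrings to user-facing hints. */
final class DiagnosticHintSketch {
  // Insertion order is kept so that earlier registrations win on overlap.
  private final Map<String, String> hintsByErrorSubstring = new LinkedHashMap<>();

  /** Registers a hint to show when the diagnostics contain the given substring. */
  void register(String errorSubstring, String hint) {
    hintsByErrorSubstring.put(errorSubstring, hint);
  }

  /** Returns the first registered hint whose substring occurs in the diagnostics. */
  Optional<String> hintFor(String diagnostics) {
    for (Map.Entry<String, String> entry : hintsByErrorSubstring.entrySet()) {
      if (diagnostics.contains(entry.getKey())) {
        return Optional.of(entry.getValue());
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    DiagnosticHintSketch hints = new DiagnosticHintSketch();
    // The hint wording is illustrative only.
    hints.register(
        "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster",
        "Please check whether etc/hadoop/mapred-site.xml contains the MapReduce-related settings.");
    String stderr = "Error: Could not find or load main class "
        + "org.apache.hadoop.mapreduce.v2.app.MRAppMaster";
    hints.hintFor(stderr).ifPresent(System.out::println);
  }
}
```

With a table like this, the MapReduce-specific text could be registered from outside the node manager, which is the separation the comment asks about.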
[jira] [Assigned] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[ https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-6999: --- Assignee: Linlin Zhou > Add log about how to solve Error: Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster > -- > > Key: YARN-6999 > URL: https://issues.apache.org/jira/browse/YARN-6999 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, security >Affects Versions: 3.0.0-beta1 > Environment: All operating systems. >Reporter: Linlin Zhou >Assignee: Linlin Zhou >Priority: Minor > Labels: beginner > Fix For: 3.0.0-beta1 > > Attachments: yarn-6999.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Following Setting up a Single Node Cluster > [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html], > we would still fail to run the MapReduce job example. Due to a security > fix, YARN uses the user's environment variables to init, and the user's environment > variables usually don't include MapReduce-related settings. So we need to > add the related config in etc/hadoop/mapred-site.xml manually. Currently the > log only says there is an Error: > Could not find or load main class > org.apache.hadoop.mapreduce.v2.app.MRAppMaster, without any suggestion on how to > solve it. I want to add a useful suggestion to the log.
[jira] [Commented] (YARN-5094) some YARN container events have timestamp of -1
[ https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033890#comment-16033890 ] Li Lu commented on YARN-5094: - Sorry about the delay. Please feel free to take it. Let's not touch AbstractEvent as a whole but treat different events separately? Also, even for NM-related events we should be careful about the actual performance. I vaguely remember that my last conclusion (a year ago) was that it's fine (to assign a timestamp for NM events), but let's be careful. > some YARN container events have timestamp of -1 > --- > > Key: YARN-5094 > URL: https://issues.apache.org/jira/browse/YARN-5094 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Haibo Chen > Labels: YARN-5355 > Attachments: YARN-5094.00.patch, YARN-5094-YARN-2928.001.patch > > > Some events in the YARN container entities have timestamp of -1. The > RM-generated container events have proper timestamps. It appears that it's > the NM-generated events that have -1: YARN_CONTAINER_CREATED, > YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, > YARN_NM_CONTAINER_LOCALIZATION_STARTED. > In the YARN container page, > {noformat} > { > id: "YARN_CONTAINER_CREATED", > timestamp: -1, > info: { } > }, > { > id: "YARN_CONTAINER_FINISHED", > timestamp: -1, > info: { > YARN_CONTAINER_EXIT_STATUS: 0, > YARN_CONTAINER_STATE: "RUNNING", > YARN_CONTAINER_DIAGNOSTICS_INFO: "" > } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED", > timestamp: -1, > info: { } > }, > { > id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED", > timestamp: -1, > info: { } > } > {noformat} > I think the data itself is OK, but the values are not being populated in the > REST output? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
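A narrow fix along the lines of the comment above could stamp only the NM-generated container events when they are published, leaving AbstractEvent untouched. This is a minimal sketch, assuming -1 marks an unset timestamp as in the REST output quoted above. The class and method names are hypothetical and do not come from the YARN codebase.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: fills in a missing timestamp for one kind of event only. */
final class NmEventTimestampSketch {
  /** Sentinel for "no timestamp set", matching the -1 seen in the REST output. */
  static final long UNSET = -1L;

  /**
   * Returns the event's own timestamp when it has one, otherwise the clock value.
   * The clock is injected so the cost of reading time is paid only when needed.
   */
  static long resolveTimestamp(long eventTimestamp, LongSupplier clock) {
    return eventTimestamp == UNSET ? clock.getAsLong() : eventTimestamp;
  }

  public static void main(String[] args) {
    long resolved = resolveTimestamp(UNSET, System::currentTimeMillis);
    System.out.println("timestamp: " + resolved);
  }
}
```

Because the clock is read only for events that arrive without a timestamp, the performance concern raised in the comment is limited to those events.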
[jira] [Created] (YARN-6323) Rolling upgrade/config change is broken on timeline v2.
Li Lu created YARN-6323: --- Summary: Rolling upgrade/config change is broken on timeline v2. Key: YARN-6323 URL: https://issues.apache.org/jira/browse/YARN-6323 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Found this issue when deploying on real clusters. If there are apps running when we enable timeline v2 (with work-preserving restart enabled), node managers will fail to start due to missing app context data. We should probably assign some default names to these "left over" apps. I believe it's suboptimal to make users clean up the whole cluster before enabling timeline v2.
[jira] [Created] (YARN-6316) Provide help information and documentation for TimelineSchemaCreator
Li Lu created YARN-6316: --- Summary: Provide help information and documentation for TimelineSchemaCreator Key: YARN-6316 URL: https://issues.apache.org/jira/browse/YARN-6316 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Right now there is no help information for the timeline schema creator. We probably want to provide an option to print help. Also, ideally, if users pass in no arguments, we should print the help instead of directly creating the tables. This will simplify cluster operations and timeline v2 deployments.
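The behavior proposed above (print help when asked or when no argument is given, and create tables only on explicit request) could look roughly like the following sketch. The class name, the `-help` and `-create` flags, and the usage text are all hypothetical and are not taken from the real TimelineSchemaCreator.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the real TimelineSchemaCreator. */
final class SchemaCreatorCliSketch {
  enum Action { PRINT_HELP, CREATE_TABLES }

  static final String USAGE =
      "Usage: schema-creator [-help] [-create]\n"
      + "  -help    print this message and exit\n"
      + "  -create  create the timeline v2 tables";

  /** Decides what to do from the arguments alone, without touching any storage. */
  static Action decide(String[] args) {
    List<String> argList = Arrays.asList(args);
    // No arguments, an explicit help flag, or no create flag: never create implicitly.
    if (argList.isEmpty() || argList.contains("-help") || !argList.contains("-create")) {
      return Action.PRINT_HELP;
    }
    return Action.CREATE_TABLES;
  }

  public static void main(String[] args) {
    if (decide(args) == Action.PRINT_HELP) {
      System.out.println(USAGE);
      return;
    }
    // Placeholder: a real tool would create the tables at this point.
    System.out.println("Creating tables...");
  }
}
```

Keeping the decision in a pure function makes the "no argument means help" rule easy to test without a running cluster.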
[jira] [Updated] (YARN-6294) ATS client should better handle Socket closed case
[ https://issues.apache.org/jira/browse/YARN-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-6294: Attachment: YARN-6294-trunk.001.patch YARN-6294-branch-2.001.patch Since trunk and branch-2 diverge on TimelineClientImpl, I've created two patches. We probably want to focus our review effort on the trunk one; then, before commit, we can finalize all changes and apply them to branch-2. > ATS client should better handle Socket closed case > -- > > Key: YARN-6294 > URL: https://issues.apache.org/jira/browse/YARN-6294 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineclient >Reporter: Sumana Sathish >Assignee: Li Lu > Attachments: YARN-6294-branch-2.001.patch, YARN-6294-trunk.001.patch > > > Exception stack: > {noformat} > 17/02/06 07:11:30 INFO distributedshell.ApplicationMaster: Container > completed successfully., containerId=container_1486362713048_0037_01_02 > 17/02/06 07:11:30 ERROR distributedshell.ApplicationMaster: Error in > RMCallbackHandler: > com.sun.jersey.api.client.ClientHandlerException: java.net.SocketException: > Socket closed > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:248) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at > com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at > 
org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112) > at > org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:346) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishContainerEndEvent(ApplicationMaster.java:1145) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.access$400(ApplicationMaster.java:169) > at > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster$RMCallbackHandler.onContainersCompleted(ApplicationMaster.java:779) > at > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:296) > Caused by: java.net.SocketException: Socket closed > at java.net.SocketInputStream.read(SocketInputStream.java:204) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569) > at > 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240) > at > com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147) > ... 20 more > Exception in thread "AMRM Callback Handler Thread" > {noformat}
[jira] [Commented] (YARN-6293) Investigate Java 7 compatibility for new YARN UI
[ https://issues.apache.org/jira/browse/YARN-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898221#comment-15898221 ] Li Lu commented on YARN-6293: - Actually, I just changed the UI module's pom.xml directly: I changed the parent's and the current module's version to 2.x. The build passes now. UI experts, does this hide any potential issues? Thanks! > Investigate Java 7 compatibility for new YARN UI > > > Key: YARN-6293 > URL: https://issues.apache.org/jira/browse/YARN-6293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu > > Right now when trying the new YARN UI with Java 7, I get the following > warning: > {code} > [INFO] --- maven-enforcer-plugin:1.4.1:enforce (dist-enforce) @ > hadoop-yarn-ui --- > [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed > with message: > Detected JDK Version: 1.7.0-67 is not in the allowed range [1.8,). > {code} > While right now this warning does not cause any trouble for trunk > integration, when some users would like to package the new UI with some > branch-2 based code, the JDK requirement would block the effort. So the > question here is, is there any specific component in the new UI codebase that > prevents us from using Java 7? I remember it should be a JS-based implementation, > right?
[jira] [Created] (YARN-6293) Investigate Java 7 compatibility for new YARN UI
Li Lu created YARN-6293: --- Summary: Investigate Java 7 compatibility for new YARN UI Key: YARN-6293 URL: https://issues.apache.org/jira/browse/YARN-6293 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Right now when trying the new YARN UI with Java 7, I get the following warning: {code} [INFO] --- maven-enforcer-plugin:1.4.1:enforce (dist-enforce) @ hadoop-yarn-ui --- [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed with message: Detected JDK Version: 1.7.0-67 is not in the allowed range [1.8,). {code} While right now this warning does not cause any trouble for trunk integration, when some users would like to package the new UI with some branch-2 based code, the JDK requirement would block the effort. So the question here is, is there any specific component in the new UI codebase that prevents us from using Java 7? I remember it should be a JS-based implementation, right?
[jira] [Commented] (YARN-6030) Eliminate timelineServiceV2 boolean flag in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883782#comment-15883782 ] Li Lu commented on YARN-6030: - I think so. Please feel free to check and close. Thanks! > Eliminate timelineServiceV2 boolean flag in TimelineClientImpl > -- > > Key: YARN-6030 > URL: https://issues.apache.org/jira/browse/YARN-6030 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Li Lu >Priority: Minor > > I just discovered that we're still using a boolean flag {{timelineServiceV2}} > after we introduced {{timelineServiceVersion}}. This sounds a little bit > error-prone. After the discussion I think we should only use and trust > {{timelineServiceVersion}}. {{timelineServiceV2}} is set upon client > creation. Instead of creating a v2 client and setting this flag, maybe we'd like > to do some sanity check and make sure the creation call is consistent with > the configuration? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
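The sanity check suggested in the YARN-6030 description could look roughly like the sketch below. This is only an illustration under stated assumptions: {{TimelineVersionCheck}}, {{isV2}} and {{checkCreationCall}} are hypothetical names, not part of TimelineClientImpl, and the configured version is assumed to be a float such as 1.0, 1.5 or 2.0.

```java
/**
 * Hypothetical helper, not the real TimelineClientImpl API: derives "is v2"
 * from the configured version instead of keeping a separate boolean flag.
 */
final class TimelineVersionCheck {
    private TimelineVersionCheck() {}

    /** Returns true when the configured version falls in the 2.x line. */
    static boolean isV2(float configuredVersion) {
        return (int) configuredVersion == 2;
    }

    /**
     * Fails fast when the creation call (v2 requested or not) disagrees
     * with the configured timeline service version.
     */
    static void checkCreationCall(boolean v2Requested, float configuredVersion) {
        if (v2Requested != isV2(configuredVersion)) {
            throw new IllegalStateException(
                "Timeline client creation call (v2 requested: " + v2Requested
                + ") is inconsistent with configured version " + configuredVersion);
        }
    }
}
```

With something like this the version stays the single source of truth, and a mismatch surfaces at client creation rather than later at publish time.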
[jira] [Updated] (YARN-6228) EntityGroupFSTimelineStore should allow configurable cache stores.
[ https://issues.apache.org/jira/browse/YARN-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-6228: Attachment: YARN-6228-trunk.002.patch I cannot reproduce the failures locally. Try again... > EntityGroupFSTimelineStore should allow configurable cache stores. > --- > > Key: YARN-6228 > URL: https://issues.apache.org/jira/browse/YARN-6228 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-6228-trunk.001.patch, YARN-6228-trunk.002.patch > > > We should allow users to config which cache store to use for > EntityGroupFSTimelineStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6228) EntityGroupFSTimelineStore should allow configurable cache stores.
[ https://issues.apache.org/jira/browse/YARN-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-6228: Attachment: YARN-6228-trunk.001.patch Patch to make cache stores configurable. > EntityGroupFSTimelineStore should allow configurable cache stores. > --- > > Key: YARN-6228 > URL: https://issues.apache.org/jira/browse/YARN-6228 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-6228-trunk.001.patch > > > We should allow users to config which cache store to use for > EntityGroupFSTimelineStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6228) EntityGroupFSTimelineStore should allow configurable cache stores.
Li Lu created YARN-6228: --- Summary: EntityGroupFSTimelineStore should allow configurable cache stores. Key: YARN-6228 URL: https://issues.apache.org/jira/browse/YARN-6228 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We should allow users to config which cache store to use for EntityGroupFSTimelineStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6069) CORS support in timeline v2
[ https://issues.apache.org/jira/browse/YARN-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876765#comment-15876765 ] Li Lu commented on YARN-6069: - Sorry to chime in this late, but one general question about CORS itself. I'm not an expert in this area so my concern may sound silly. In ATS v1, the only server will serve as both reader and writer server, so my feeling is the CORS setting will affect both sides? In ATS v2, we're only applying this setting to the reader server, but not on collectors. Is this generally fine? Are writer APIs irrelevant in this case? Or, is this difference significant enough that we need to separate configs or specially note this? Thanks! > CORS support in timeline v2 > --- > > Key: YARN-6069 > URL: https://issues.apache.org/jira/browse/YARN-6069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Sreenath Somarajapuram >Assignee: Rohith Sharma K S > Attachments: YARN-6069-YARN-5355.0001.patch, > YARN-6069-YARN-5355.0002.patch, YARN-6069-YARN-5355.0003.patch, > YARN-6069-YARN-5355.0004.patch > > > By default the browser prevents accessing resources from multiple domains. In > most cases the UIs would be loaded form a domain different from that of > timeline server. Hence without CORS support, it would be difficult for the > UIs to load data from timeline v2. > YARN-2277 must provide more info on the implementation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870725#comment-15870725 ] Li Lu commented on YARN-6177: - Committing... > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, > YARN-6177.04.patch, YARN-6177.05.patch, YARN-6177.06.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870541#comment-15870541 ] Li Lu commented on YARN-6177: - LGTM. Will commit in a few hours if nobody objects. > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, > YARN-6177.04.patch, YARN-6177.05.patch, YARN-6177.06.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869304#comment-15869304 ] Li Lu commented on YARN-6177: - bq. Set yarn.timeline-service.client.best-effort to true with this patch, so yarn client doesn't treat such failure as a fatal error. This is actually my concern... My feeling is we may not want to deal with Errors as part of best effort. Not sure about this, cc/[~jlowe]... Hi Jason, I saw you committed the original timelineBestEffort patch, so just a quick inquiry to see if you think handling this Error under best effort mode is a good idea. Thanks! > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, > YARN-6177.04.patch, YARN-6177.05.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868817#comment-15868817 ] Li Lu commented on YARN-6177: - OK I see. Is it possible to disable timeline service for those affected clients? > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, > YARN-6177.04.patch, YARN-6177.05.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5718) TimelineClient (and other places in YARN) shouldn't over-write HDFS client retry settings which could cause unexpected behavior
[ https://issues.apache.org/jira/browse/YARN-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5718: Hadoop Flags: Incompatible change > TimelineClient (and other places in YARN) shouldn't over-write HDFS client > retry settings which could cause unexpected behavior > --- > > Key: YARN-5718 > URL: https://issues.apache.org/jira/browse/YARN-5718 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineclient >Reporter: Junping Du >Assignee: Junping Du > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5718.patch, YARN-5718-v2.1.patch, YARN-5718-v2.patch > > > In one HA cluster, after NN failed over, we noticed that job is getting > failed as TimelineClient failed to retry connection to proper NN. This is > because we are overwrite hdfs client settings that hard code retry policy to > be enabled that conflict NN failed-over case - hdfs client should fail fast > so can retry on another NN. > We shouldn't assume any retry policy for hdfs client at all places in YARN. > This should keep consistent with HDFS settings that has different retry > polices in different deployment case. Thus, we should clean up these hard > code settings in YARN, include: FileSystemTimelineWriter, > FileSystemRMStateStore and FileSystemNodeLabelsStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6027) Support fromid(offset) filter for /flows API
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868746#comment-15868746 ] Li Lu commented on YARN-6027: - Thanks [~rohithsharma]! Generally fine but one nit is that we're exposing a lot of immediate values in the parsing process of {{FlowActivityEntityReader}}. I understand managing those values after the split would be troublesome, but I think continuing to expose them may cause some future issues. Any plans to have some centralized management of those values? > Support fromid(offset) filter for /flows API > > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6027-YARN-5355.0001.patch, > YARN-6027-YARN-5355.0002.patch > > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868682#comment-15868682 ] Li Lu commented on YARN-6177: - Thanks [~cheersyang]. My concern is with these lines: {code} } catch (NoClassDefFoundError e) { if (timelineServiceBestEffort) { LOG.warn("Ignore a NoClassDefFoundError when attempting to get" + " delegation token from the timeline server: " + e.getMessage()); return null; } {code} So if {{timelineServiceBestEffort}} is set to true, we'll leave a message and then proceed? I was thinking we may not need to treat {{timelineServiceBestEffort}} separately here since even with best effort we do not need to keep running on errors. > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, > YARN-6177.04.patch, YARN-6177.05.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4675) Reorganize TimelineClient and TimelineClientImpl into separate classes for ATSv1.x and ATSv2
[ https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868593#comment-15868593 ] Li Lu commented on YARN-4675: - V10 patch looks good to me. > Reorganize TimelineClient and TimelineClientImpl into separate classes for > ATSv1.x and ATSv2 > > > Key: YARN-4675 > URL: https://issues.apache.org/jira/browse/YARN-4675 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: YARN-4675.v2.002.patch, YARN-4675.v2.003.patch, > YARN-4675.v2.004.patch, YARN-4675.v2.005.patch, YARN-4675.v2.006.patch, > YARN-4675.v2.007.patch, YARN-4675.v2.008.patch, YARN-4675.v2.009.patch, > YARN-4675.v2.010.patch, YARN-4675-YARN-2928.v1.001.patch > > > We need to reorganize TimeClientImpl into TimeClientV1Impl , > TimeClientV2Impl and if required a base class, so that its clear which part > of the code belongs to which version and thus better maintainable. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868451#comment-15868451 ] Li Lu commented on YARN-6177: - Thanks for the hard work [~cheersyang]! Continuing to use {{YarnConfiguration.TIMELINE_SERVICE_CLIENT_BEST_EFFORT}} looks fine to me. However, I'm still a little hesitant about swallowing the error when {{timelineServiceBestEffort}} is set to true. To me handling errors (but not exceptions) is beyond the range of our "best effort". I would like to understand if there's anything I'm missing that makes the community think it is especially appealing to do so. Other than this, the patch LGTM. > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch, > YARN-6177.04.patch, YARN-6177.05.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865034#comment-15865034 ] Li Lu commented on YARN-6177: - [~cheersyang] Looks fine to me. Thanks! > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865015#comment-15865015 ] Li Lu commented on YARN-6177: - bq. Use timeline best effort flag seems a better option for me than disabling it, are you suggesting we should still ask users to disable it? Even with our "best effort", I don't think we should keep the program running on errors... Thoughts? > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6177) Yarn client should exit with an informative error message if an incompatible Jersey library is used at client
[ https://issues.apache.org/jira/browse/YARN-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864950#comment-15864950 ] Li Lu commented on YARN-6177: - One quick inquiry: are we catching every throwable and swallowing it if {{timelineServiceBestEffort}} is set to true? That sounds scary since we're swallowing OutOfMemoryError, etc... I think we should limit the range of {{timelineServiceBestEffort}} to exceptions, but we still preserve the program's behavior on errors. Meanwhile, we can improve the output message if we hit {{NoClassDefFoundError}} to hint users to disable timeline service? > Yarn client should exit with an informative error message if an incompatible > Jersey library is used at client > - > > Key: YARN-6177 > URL: https://issues.apache.org/jira/browse/YARN-6177 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: spark2-job-output-after-besteffort.out, > spark2-job-output-after.out, spark2-job-output-before.out, > YARN-6177.01.patch, YARN-6177.02.patch, YARN-6177.03.patch > > > Per discussion in YARN-5271, lets provide an error message to suggest user to > disable timeline service instead of disabling for them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
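To make the proposal in the comment above concrete, here is a minimal sketch of "best effort applies to exceptions only, errors still propagate, and NoClassDefFoundError gets a more helpful message". It is not the YARN-6177 patch: {{BestEffortSketch}}, {{callBestEffort}} and the hint text are hypothetical, and a plain {{Callable}} stands in for the delegation token call.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch; names and message text are assumptions, not YarnClientImpl code. */
final class BestEffortSketch {
    static final String HINT =
        "The timeline client could not be loaded; consider disabling the timeline service";

    private BestEffortSketch() {}

    static <T> T callBestEffort(Callable<T> action, boolean bestEffort) throws Exception {
        try {
            return action.call();
        } catch (NoClassDefFoundError e) {
            // Never swallowed, even in best-effort mode: rethrow with a hint for the user.
            NoClassDefFoundError hinted = new NoClassDefFoundError(e.getMessage() + ". " + HINT);
            hinted.initCause(e);
            throw hinted;
        } catch (Exception e) {
            // Best effort covers exceptions only; other Errors (OutOfMemoryError,
            // etc.) are not caught here and propagate unchanged.
            if (bestEffort) {
                return null;
            }
            throw e;
        }
    }
}
```

The point of the shape is that there is no {{catch (Throwable)}} anywhere, so the best-effort flag cannot widen to cover errors by accident.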
[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath
[ https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861715#comment-15861715 ] Li Lu commented on YARN-5271: - Thanks [~jojochuang]! Let's not revert the change directly since the code base changed a lot since the commit. [~cheersyang] maybe you'd like to open a new JIRA and fix the issue there? Thanks! > ATS client doesn't work with Jersey 2 on the classpath > -- > > Key: YARN-5271 > URL: https://issues.apache.org/jira/browse/YARN-5271 > Project: Hadoop YARN > Issue Type: Bug > Components: client, timelineserver >Affects Versions: 2.7.2 >Reporter: Steve Loughran >Assignee: Weiwei Yang > Labels: oct16-medium > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5271.01.patch, YARN-5271.02.patch, > YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch > > > see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a > timeline client, *even if the server is an ATS1.5 server and publishing is > via the FS* -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6027) Improve /flows API for more flexible filters fromid, collapse, userid
[ https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858664#comment-15858664 ] Li Lu commented on YARN-6027: - Thanks for the patch [~rohithsharma]! One big picture question: I'm still not 100% sure the meaning of "collapse". Seems like the use case behind this is to list all flow activities for a certain user, or group flow activities by user? If this is the case, maybe we want some parameters like groupby=user or groupby = userflow for future improvements? > Improve /flows API for more flexible filters fromid, collapse, userid > - > > Key: YARN-6027 > URL: https://issues.apache.org/jira/browse/YARN-6027 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6027-YARN-5355.0001.patch > > > In YARN-5585 , fromId is supported for retrieving entities. We need similar > filter for flows/flowRun apps and flow run and flow as well. > Along with supporting fromId, this JIRA should also discuss following points > * Should we throw an exception for entities/entity retrieval if duplicates > found? > * TimelieEntity : > ** Should equals method also check for idPrefix? > ** Does idPrefix is part of identifiers? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS
[ https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858544#comment-15858544 ] Li Lu commented on YARN-6137: - Thanks [~jlowe] for the review and commit! > Yarn client implicitly invoke ATS client which accesses HDFS > > > Key: YARN-6137 > URL: https://issues.apache.org/jira/browse/YARN-6137 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Li Lu > Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3 > > Attachments: YARN-6137-trunk.001.patch, YARN-6137-trunk.002.patch > > > Yarn is implicitly trying to invoke ATS Client even though client does not > need it. and ATSClient code is trying to access hdfs. Due to that service is > hitting GSS exception. > Yarnclient is implicitly creating ats client that tries to access Hdfs. > All servers that use yarnclient cannot be expected to change to accommodate > this behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS
[ https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-6137: Attachment: YARN-6137-trunk.002.patch Thanks [~jlowe] for the review! A new patch to address all review comments. > Yarn client implicitly invoke ATS client which accesses HDFS > > > Key: YARN-6137 > URL: https://issues.apache.org/jira/browse/YARN-6137 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Li Lu > Attachments: YARN-6137-trunk.001.patch, YARN-6137-trunk.002.patch > > > Yarn is implicitly trying to invoke ATS Client even though client does not > need it. and ATSClient code is trying to access hdfs. Due to that service is > hitting GSS exception. > Yarnclient is implicitly creating ats client that tries to access Hdfs. > All servers that use yarnclient cannot be expected to change to accommodate > this behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath
[ https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856995#comment-15856995 ] Li Lu commented on YARN-5271: - Thanks [~cheersyang]. bq. The fix here was trying to alleviate this pain, it prints a warning on console and warns user timeline client could not be initialized because of dependency issue, more user friendly. The goal sounds reasonable but I don't think that justifies catching and swallowing an Error. What we can do is to clearly document this behavior as a known issue, *suggest* users *try* disabling timeline services when seeing this error, instead of directly assuming the root cause of an error? > ATS client doesn't work with Jersey 2 on the classpath > -- > > Key: YARN-5271 > URL: https://issues.apache.org/jira/browse/YARN-5271 > Project: Hadoop YARN > Issue Type: Bug > Components: client, timelineserver >Affects Versions: 2.7.2 >Reporter: Steve Loughran >Assignee: Weiwei Yang > Labels: oct16-medium > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5271.01.patch, YARN-5271.02.patch, > YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch > > > see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a > timeline client, *even if the server is an ATS1.5 server and publishing is > via the FS* -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath
[ https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854576#comment-15854576 ] Li Lu commented on YARN-5271: - Thanks for the work [~cheersyang]! This looks like a pretty unfortunate case for users of the YARN client. I noticed we're not creating timeline clients if timeline service is turned off in the config. One inquiry is, can we fail fast and let the user disable timeline service? Raising errors as early as possible may avoid a lot of trouble in the future? > ATS client doesn't work with Jersey 2 on the classpath > -- > > Key: YARN-5271 > URL: https://issues.apache.org/jira/browse/YARN-5271 > Project: Hadoop YARN > Issue Type: Bug > Components: client, timelineserver >Affects Versions: 2.7.2 >Reporter: Steve Loughran >Assignee: Weiwei Yang > Labels: oct16-medium > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5271.01.patch, YARN-5271.02.patch, > YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch > > > see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a > timeline client, *even if the server is an ATS1.5 server and publishing is > via the FS* -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS
[ https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-6137: Attachment: YARN-6137-trunk.001.patch First patch to fix this issue. Note that we always start a timeline client when we start a YarnClientImpl when timeline service is enabled. In ATS v1.5, timeline client will check HDFS access upon service start, this requires the yarn client user to be authenticated when it's started. In fact, users only need this client to renew timeline tokens under secured environment, and it's totally fine to first start the client user process, authenticate it, and then renew the delegation token. So in this patch I'm delaying the start of the timeline client to the first time user needs a delegation token. For secured environments, this allows the parent process (running this client) to finish authentication after service start, then use the timeline client to renew tokens. One thing I'm not sure about is if yarn client itself should be thread safe. If this is the case I can add some synchronization for the timeline client initialization. Another change I made is to remove one unit test that checks if YarnClient would catch an Error, and fails when we do not catch the Error. To me this does not appear to be a reasonable behavior. Since it blocks testing, I'm removing it. > Yarn client implicitly invoke ATS client which accesses HDFS > > > Key: YARN-6137 > URL: https://issues.apache.org/jira/browse/YARN-6137 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Li Lu > Attachments: YARN-6137-trunk.001.patch > > > Yarn is implicitly trying to invoke ATS Client even though client does not > need it. and ATSClient code is trying to access hdfs. Due to that service is > hitting GSS exception. > Yarnclient is implicitly creating ats client that tries to access Hdfs. > All servers that use yarnclient cannot be expected to change to accommodate > this behavior. 
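The delayed start described in the YARN-6137 comment, together with the synchronization it mentions as an open question, can be sketched as below. This is an assumption-laden illustration rather than the attached patch: {{LazyClientHolder}} is a hypothetical class, and a generic {{Supplier}} stands in for creating and starting the timeline client.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder that creates its client on first use instead of at
 * service start, so authentication can finish before the client is needed.
 */
final class LazyClientHolder<T> {
    private final Supplier<T> factory;
    private T client; // guarded by "this"

    LazyClientHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Creates the client on the first call; later calls reuse the same instance. */
    synchronized T get() {
        if (client == null) {
            client = factory.get();
        }
        return client;
    }

    /** Reports whether the client has been created yet. */
    synchronized boolean isStarted() {
        return client != null;
    }
}
```

A caller would invoke {{get()}} only at the point where a delegation token is actually needed. The {{synchronized}} methods keep the factory from running twice if several threads ask at once, which is one way to answer the thread-safety question above.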
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
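The delayed start described in this update, together with the open question about thread safety, can be illustrated with a small sketch. This is not the code in YARN-6137-trunk.001.patch; `LazyClient` and its methods are hypothetical names, and the supplier stands in for whatever creates and starts the real timeline client. It assumes callers may reach the client from several threads and uses double-checked locking on a volatile field so that the client is created at most once.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder that defers creating (and starting) an expensive client
 * until it is first needed. Illustrative only; not a Hadoop API.
 */
final class LazyClient<T> {
    private final Supplier<T> factory; // creates and starts the client
    private volatile T client;         // volatile gives safe publication

    LazyClient(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the client, creating it on first use (double-checked locking). */
    T get() {
        T local = client;
        if (local == null) {
            synchronized (this) {
                local = client;
                if (local == null) {
                    local = factory.get();
                    client = local;
                }
            }
        }
        return local;
    }

    /** True once the client exists, e.g. to decide whether it must be stopped. */
    boolean isCreated() {
        return client != null;
    }
}
```

With this shape, service start would only construct the `LazyClient`, and the first delegation-token request would call `get()`, by which time the parent process may already have authenticated.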
[jira] [Reopened] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath
[ https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reopened YARN-5271: - > ATS client doesn't work with Jersey 2 on the classpath > -- > > Key: YARN-5271 > URL: https://issues.apache.org/jira/browse/YARN-5271 > Project: Hadoop YARN > Issue Type: Bug > Components: client, timelineserver >Affects Versions: 2.7.2 >Reporter: Steve Loughran >Assignee: Weiwei Yang > Labels: oct16-medium > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5271.01.patch, YARN-5271.02.patch, > YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch > > > see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a > timeline client, *even if the server is an ATS1.5 server and publishing is > via the FS* -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath
[ https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850784#comment-15850784 ] Li Lu commented on YARN-5271: - Quick note: are we catching an Error here and disabling the timeline service based on it? Catching Errors seems inappropriate as per the Java API doc: bq. An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch. Most such errors are abnormal conditions. (https://docs.oracle.com/javase/7/docs/api/java/lang/Error.html) Reopening this JIRA for more investigation. > ATS client doesn't work with Jersey 2 on the classpath > -- > > Key: YARN-5271 > URL: https://issues.apache.org/jira/browse/YARN-5271 > Project: Hadoop YARN > Issue Type: Bug > Components: client, timelineserver >Affects Versions: 2.7.2 >Reporter: Steve Loughran >Assignee: Weiwei Yang > Labels: oct16-medium > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5271.01.patch, YARN-5271.02.patch, > YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch > > > see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a > timeline client, *even if the server is an ATS1.5 server and publishing is > via the FS*
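To make the concern concrete, here is a minimal sketch of a narrower alternative to catching `Error` wholesale. It assumes the failure being guarded against is a classpath or linkage problem, which is an assumption and not something this thread confirms; `OptionalFeatureLoader` and `isAvailable` are hypothetical names, not part of the YARN-5271 patches. The probe catches only `ClassNotFoundException` and `LinkageError` (which covers `NoClassDefFoundError`), so unrelated Errors such as `OutOfMemoryError` still propagate.

```java
/**
 * Hypothetical probe for an optional dependency. Illustrative only.
 */
final class OptionalFeatureLoader {
    private OptionalFeatureLoader() {
    }

    /** Returns true if the named class can be loaded, false if it is missing or incompatible. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: load and link the class without running static initializers
            Class.forName(className, false, OptionalFeatureLoader.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Only classpath-related failures are swallowed here;
            // every other Error is left to propagate to the caller.
            return false;
        }
    }
}
```

A caller could check `isAvailable(...)` once and skip creating the optional client when it returns false, instead of wrapping client creation in a broad `catch (Error e)`.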
[jira] [Commented] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS
[ https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849020#comment-15849020 ] Li Lu commented on YARN-6137: - This appears to be an ATS v1.5-only issue, but a bigger question is: why do we need a timeline client within YarnClientImpl? To me the only thing it is needed for is renewing the delegation token. If this reference is unavoidable, can we avoid creating the client at service start of YarnClientImpl? Could we lazily create the client only when we need to renew the token? > Yarn client implicitly invoke ATS client which accesses HDFS > > > Key: YARN-6137 > URL: https://issues.apache.org/jira/browse/YARN-6137 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Li Lu > > Yarn is implicitly trying to invoke ATS Client even though client does not > need it. and ATSClient code is trying to access hdfs. Due to that service is > hitting GSS exception. > Yarnclient is implicitly creating ats client that tries to access Hdfs. > All servers that use yarnclient cannot be expected to change to accommodate > this behavior.
[jira] [Assigned] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS
[ https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-6137: --- Assignee: Li Lu > Yarn client implicitly invoke ATS client which accesses HDFS > > > Key: YARN-6137 > URL: https://issues.apache.org/jira/browse/YARN-6137 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Li Lu > > Yarn is implicitly trying to invoke ATS Client even though client does not > need it. and ATSClient code is trying to access hdfs. Due to that service is > hitting GSS exception. > Yarnclient is implicitly creating ats client that tries to access Hdfs. > All servers that use yarnclient cannot be expected to change to accommodate > this behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
[ https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2355: Hadoop Flags: Incompatible change > MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container > -- > > Key: YARN-2355 > URL: https://issues.apache.org/jira/browse/YARN-2355 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Darrell Taylor > Labels: newbie > Fix For: 3.0.0-alpha1 > > Attachments: YARN-2355.001.patch > > > After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether > it has the chance to try based on MAX_APP_ATTEMPTS_ENV alone. We should be > able to notify the application of the up-to-date remaining retry quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
[ https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830892#comment-15830892 ] Li Lu commented on YARN-4675: - Thanks [~Naganarasimha]! I took a look at the 003 patch. Not sure why, but I found some code in TimelineClientImpl that duplicates the newly introduced helper and the v2 impl. Detailed comments: AMRMClient (and AMRMClientAsync): - Maybe we'd like to be clearer about registerTimelineV2Client? This is something new and quite different from v1 clients. - Do we allow registering timeline clients when the AMRMClient's timeline version is not set to 2? Maybe we should at least leave a warning/error there? This can save some time when debugging a misconfigured cluster. DistributedShell, ApplicationMaster.java: DS is an app that we use to demo a lot of YARN features, so maybe we want to tidy up the timeline client related code a little bit... - Can we unify all {{if}}s for publishing timeline events? We may want to have centralized methods to dispatch timeline client calls to v1 and v2. - Also, instead of checking if a timeline client is null, shall we use flags like timelineServiceV2? TimelineClient: - Let's refer to TimelineV2Client in the Java doc for v2 use cases? {code} Creates an instance of the timeline v.1.x client. {code} - We may also want to update the class's javadoc to reflect API changes over Hadoop 2. At least mention this class is for timeline v1.x ONLY. TimelineV2Client: - Same javadoc issue. - Shall we restrict the constructor to protected since we've experienced some unexpected calls to it in v1? Or at least add a testing-only tag? - Do client users need to know the context app id? If so, we may need to slightly relax the visibility of getContextAppId? - Why do we need a setter for context app id? Maybe we want to make this information immutable for timeline clients? Do we allow reusing timeline v2 clients across multiple applications? 
TimelineClientImpl: - Why do we need RESOURCE_URI_STR_V2? We need to further polish constructResURI as well. - serviceRetryInterval is never used. - Duplicated code for TimelineClientConnectionRetry and JerseyRetryFilter as in TimelineServiceHelper. - pollTimelineServiceAddress, initConnConfigurator never called? Duplicates with V2. - new ConnectionConfigurator in initSslConnConfigurator duplicates some code in TimelineServiceHelper. TimelineServiceHelper: - There are two {{TimelineServiceHelper}}s in our codebase? One is really trivial. Shall we merge them or eliminate one of them? TimelineV2ClientImpl: - connectionRetry is never used. Not necessarily addressed in this JIRA, but to bring into attention: We have a TimelineClient in YarnClientImpl. Shall we do this even though the cluster is configured with ATS v2? > Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl > > > Key: YARN-4675 > URL: https://issues.apache.org/jira/browse/YARN-4675 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: YARN-4675.v2.002.patch, YARN-4675.v2.003.patch, > YARN-4675-YARN-2928.v1.001.patch > > > We need to reorganize TimeClientImpl into TimeClientV1Impl , > TimeClientV2Impl and if required a base class, so that its clear which part > of the code belongs to which version and thus better maintainable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815912#comment-15815912 ] Li Lu commented on YARN-6054: - Thanks [~raviprak]. The committed patch LGTM. Once the old file is backed up we don't need to worry if the repair process would make things worse. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) 
> at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
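The point made in the comment above, that a repair is safe to attempt once the old files are backed up, can be sketched as follows. This is a minimal illustration and not the committed YARN-6054 patch: `BackupBeforeRepair`, `copyTree`, and the `Repair` callback are hypothetical names, and the callback stands in for whatever LevelDB repair routine is actually invoked. The sketch assumes the backup directory is new or empty.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper: copy the store directory aside before attempting a repair. */
final class BackupBeforeRepair {
    /** Stand-in for the real repair routine; an assumption made for this sketch. */
    interface Repair {
        void run(Path dbDir) throws IOException;
    }

    private BackupBeforeRepair() {
    }

    /** Recursively copies srcDir into dstDir, which should be new or empty. */
    static void copyTree(Path srcDir, Path dstDir) throws IOException {
        try (Stream<Path> paths = Files.walk(srcDir)) {
            // Files.walk visits a directory before its entries, so parents exist first
            paths.forEach(src -> {
                Path dst = dstDir.resolve(srcDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dst);
                    } else {
                        Files.copy(src, dst);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Backs up first; the repair only runs if the backup completed without error. */
    static void backupThenRepair(Path dbDir, Path backupDir, Repair repair) throws IOException {
        copyTree(dbDir, backupDir);
        repair.run(dbDir);
    }
}
```

Because the copy happens before the repair callback runs, a repair that fails or makes the store worse still leaves the original files recoverable from the backup directory.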
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815599#comment-15815599 ] Li Lu commented on YARN-6054: - Oops sorry [~Naganarasimha] I was trying to take a closer look at the updated patch, but never mind... Also, is the UT failure traced somewhere else? > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803022#comment-15803022 ] Li Lu commented on YARN-6054: - Thanks [~raviprak], failing on the second attempt sounds like the right choice. I'm not very familiar with the repair method for leveldb jni, but I would like to verify that even if a repair fails, it does not leave the data corruption in a worse state. We would like to avoid the case where the data was recoverable by some approach (other than repair) but becomes unrecoverable after a repair. Is this possible? Thanks! > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. 
> {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at 
org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802926#comment-15802926 ] Li Lu commented on YARN-6054: - Thanks [~raviprak] for the patch! One quick concern is what will happen if the repair fails. IIUC we're repairing every time there are IOEs, will this cause any false alarms and/or accidentally make things worse? Thanks! > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > 
at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved
[ https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782102#comment-15782102 ] Li Lu commented on YARN-6029: - Thanks [~wangda]! bq. But it could cause inconsistency read data, for example, queue acl could be updated while it being updated. Makes sense to me. Let's keep and fix the synchronized blocks then... > CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by > Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to > release a reserved container > -- > > Key: YARN-6029 > URL: https://issues.apache.org/jira/browse/YARN-6029 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.8.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Blocker > Attachments: YARN-6029.001.patch, deadlock.jstack > > > When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls > YarnClient#getQueueAclsInfo) just at the moment that > LeafQueue#assignContainers is called and before notifying parent queue to > release resource (should release a reserved container), then ResourceManager > can deadlock. I found this problem on our testing environment for hadoop2.8. > Reproduce the deadlock in chronological order > * 1. Thread A (ResourceManager Event Processor) calls synchronized > LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a) > * 2. Thread B (IPC Server handler) calls synchronized > ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue > root), iterates over children queue acls and is blocked when calling > synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of > queue root.a is hold by Thread A) > * 3. 
Thread A wants to inform the parent queue that a container is being > completed and is blocked when invoking synchronized > ParentQueue#internalReleaseResource method (the ParentQueue instance lock of > queue root is hold by Thread B) > I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be > removed to solve this problem, since this method appears to not affect fields > of LeafQueue instance. > Attach patch with UT for review. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
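The three-step reproduction above is a lock-order inversion: one thread holds the leaf lock and wants the parent lock, while the other holds the parent lock and wants the leaf lock. The sketch below shows one possible way to keep the synchronized methods while avoiding the nested leaf-then-parent acquisition. It is an assumption made for illustration, not the approach taken in YARN-6029.001.patch, and `ParentQueueSketch` and `LeafQueueSketch` are hypothetical stand-ins for the real scheduler classes. The leaf updates its own state under its own lock, releases that lock, and only then notifies the parent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parent queue; not the real ParentQueue. */
final class ParentQueueSketch {
    private final List<LeafQueueSketch> children = new ArrayList<>();
    private int released;

    synchronized void addChild(LeafQueueSketch leaf) {
        children.add(leaf);
    }

    /** Takes the parent lock, then each child's lock (parent -> leaf order). */
    synchronized List<String> getAclInfo() {
        List<String> out = new ArrayList<>();
        for (LeafQueueSketch leaf : children) {
            out.add(leaf.getAclInfo());
        }
        return out;
    }

    synchronized void releaseResource() {
        released++;
    }

    synchronized int releasedCount() {
        return released;
    }
}

/** Hypothetical stand-in for a leaf queue; not the real LeafQueue. */
final class LeafQueueSketch {
    private final ParentQueueSketch parent;
    private final String name;
    private int completed;

    LeafQueueSketch(ParentQueueSketch parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    /** Still synchronized, so readers see a consistent view of leaf state. */
    synchronized String getAclInfo() {
        return name + ":acl(completed=" + completed + ")";
    }

    void assignContainers() {
        synchronized (this) {
            completed++; // leaf state changes only under the leaf lock
        }
        // The leaf lock is released before calling up, so this thread never
        // holds the leaf lock while waiting for the parent lock.
        parent.releaseResource();
    }
}
```

The trade-off is that the leaf update and the parent notification are no longer one atomic step, so whether this is acceptable depends on invariants of the real scheduler that this sketch does not model.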
[jira] [Commented] (YARN-6029) CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to release a reserved
[ https://issues.apache.org/jira/browse/YARN-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782007#comment-15782007 ] Li Lu commented on YARN-6029: - I'm not a scheduler expert, but "not affecting any data structure" sounds like the wrong reason not to synchronize. [~wangda] will there be any potential data races according to the Java memory model[1]? If not, we can safely remove those synchronized keywords. Otherwise we have to keep them no matter how appealing removing them appears to be. [1]: http://www.cs.umd.edu/~pugh/java/memoryModel/ > CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by > Thread_A at the moment that Thread_B calls LeafQueue#assignContainers to > release a reserved container > -- > > Key: YARN-6029 > URL: https://issues.apache.org/jira/browse/YARN-6029 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.8.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Blocker > Attachments: YARN-6029.001.patch, deadlock.jstack > > > When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls > YarnClient#getQueueAclsInfo) just at the moment that > LeafQueue#assignContainers is called and before notifying parent queue to > release resource (should release a reserved container), then ResourceManager > can deadlock. I found this problem on our testing environment for hadoop2.8. > Reproduce the deadlock in chronological order > * 1. Thread A (ResourceManager Event Processor) calls synchronized > LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a) > * 2. Thread B (IPC Server handler) calls synchronized > ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue > root), iterates over children queue acls and is blocked when calling > synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of > queue root.a is hold by Thread A) > * 3. 
Thread A wants to inform the parent queue that a container is being > completed and is blocked when invoking synchronized > ParentQueue#internalReleaseResource method (the ParentQueue instance lock of > queue root is hold by Thread B) > I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be > removed to solve this problem, since this method appears to not affect fields > of LeafQueue instance. > Attach patch with UT for review. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6030) Eliminate timelineServiceV2 boolean flag in TimelineClientImpl
Li Lu created YARN-6030: --- Summary: Eliminate timelineServiceV2 boolean flag in TimelineClientImpl Key: YARN-6030 URL: https://issues.apache.org/jira/browse/YARN-6030 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: YARN-5355 Reporter: Li Lu Priority: Minor I just discovered that we're still using a boolean flag {{timelineServiceV2}} after we introduced {{timelineServiceVersion}}. This sounds a little bit error-prone. After the discussion I think we should only use and trust {{timelineServiceVersion}}. {{timelineServiceV2}} is set upon client creation. Instead of creating a v2 client and setting this flag, maybe we'd like to do a sanity check and make sure the creation call is consistent with the configuration?
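The sanity check proposed here could look roughly like the sketch below. It is a hypothetical illustration and not code from any YARN patch: `TimelineVersionCheck` and `requireV2` are invented names, and a plain `Map` stands in for the Hadoop configuration object so the example stays self-contained. The property name `yarn.timeline-service.version` and the 1.0 default are assumptions about the configuration and should be checked against the actual code.

```java
import java.util.Map;

/** Hypothetical check that a v2 client is only created on a v2-configured cluster. */
final class TimelineVersionCheck {
    // Assumed property name and default; verify against the real configuration class.
    static final String VERSION_KEY = "yarn.timeline-service.version";
    static final float DEFAULT_VERSION = 1.0f;

    private TimelineVersionCheck() {
    }

    /** Reads the configured timeline service version, falling back to the default. */
    static float configuredVersion(Map<String, String> conf) {
        String raw = conf.get(VERSION_KEY);
        return raw == null ? DEFAULT_VERSION : Float.parseFloat(raw.trim());
    }

    /** Throws if a v2 client is requested while the configuration says otherwise. */
    static void requireV2(Map<String, String> conf) {
        float version = configuredVersion(conf);
        if ((int) version != 2) {
            throw new IllegalStateException(
                "Timeline v2 client requested but " + VERSION_KEY + " is " + version);
        }
    }
}
```

Calling `requireV2(conf)` at the start of a v2 client factory would make a mismatch fail fast, instead of recording it in a separate boolean that can drift from the configured version.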
[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771143#comment-15771143 ] Li Lu commented on YARN-5585: - I don't have a strong opinion on fromIdPrefix and fromId. Both ways make sense to me. bq. This JIRA is focusing only on general entities pagination. But should also implement pagination for other REST API's. Yes, we can open another JIRA for pagination for other APIs. Let's finish pagination for entity table here. > [Atsv2] Reader side changes for entity prefix and support for pagination via > additional filters > --- > > Key: YARN-5585 > URL: https://issues.apache.org/jira/browse/YARN-5585 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, > YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, > YARN-5585-workaround.patch, YARN-5585.v0.patch > > > TimelineReader REST API's provides lot of filters to retrieve the > applications. Along with those, it would be good to add new filter i.e fromId > so that entities can be retrieved after the fromId. > Current Behavior : Default limit is set to 100. If there are 1000 entities > then REST call gives first/last 100 entities. How to retrieve next set of 100 > entities i.e 101 to 200 OR 900 to 801? > Example : If applications are stored database, app-1 app-2 ... app-10. > *getApps?limit=5* gives app-1 to app-5. But to retrieve next 5 apps, there is > no way to achieve this. > So proposal is to have fromId in the filter like > *getApps?limit=5&&fromId=app-5* which gives list of apps from app-6 to > app-10. > Since ATS is targeting large number of entities storage, it is very common > use case to get next set of entities using fromId rather than querying all > the entites. This is very useful for pagination in web UI. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
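The fromId proposal quoted above is essentially cursor-based pagination over a sorted key space. A minimal sketch of that idea, assuming an in-memory sorted set as a stand-in for the storage backend; `FromIdPager` and `page` are hypothetical names, not part of the timeline reader API, and ids are zero-padded here only so that string order matches numeric order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class FromIdPager {
    // Stand-in for the storage backend: ids kept in sorted order.
    private final NavigableSet<String> ids = new TreeSet<>();

    public void add(String id) {
        ids.add(id);
    }

    /**
     * Returns up to `limit` ids that sort strictly after `fromId`.
     * A null `fromId` starts from the first id.
     */
    public List<String> page(String fromId, int limit) {
        NavigableSet<String> view =
            (fromId == null) ? ids : ids.tailSet(fromId, false);
        List<String> out = new ArrayList<>();
        for (String id : view) {
            if (out.size() >= limit) {
                break;
            }
            out.add(id);
        }
        return out;
    }

    public static void main(String[] args) {
        FromIdPager pager = new FromIdPager();
        for (int i = 1; i <= 10; i++) {
            pager.add(String.format("app-%02d", i));
        }
        List<String> first = pager.page(null, 5);
        // The last id of one page is the cursor for the next page.
        List<String> second = pager.page(first.get(first.size() - 1), 5);
        System.out.println(first);
        System.out.println(second);
    }
}
```

The caller only has to remember the last id it saw, so the server never needs to skip over already-returned rows.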
[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765533#comment-15765533 ] Li Lu commented on YARN-5585: - Some of my comments:
TimelineUIDConverter
- Consistency with comments: let's make the numbers consistent in the comments.
- Shall we avoid using those constants? We can define an enum to represent each part of the tuple list.
EntityRowKeyPrefix
- I'm confused by the changes in EntityRowKeyPrefix(String clusterId, String userId, String flowName, Long flowRunId, String appId, String entityType, Long entityIdPrefix, String entityId). Why are we changing this method instead of adding a new overload? Some changes to existing call sites seem unrelated to the changes here.
- Inconsistent javadocs. We need to be very clear about which prefix we are generating, especially for the final qualifier.
TestRowKeys
- Though there is no specific rule, let's not put specific author names in the test data.
> [Atsv2] Reader side changes for entity prefix and support for pagination via > additional filters > --- > > Key: YARN-5585 > URL: https://issues.apache.org/jira/browse/YARN-5585 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, > YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, > YARN-5585-workaround.patch, YARN-5585.v0.patch > > > TimelineReader REST API's provides lot of filters to retrieve the > applications. Along with those, it would be good to add new filter i.e fromId > so that entities can be retrieved after the fromId. > Current Behavior : Default limit is set to 100. If there are 1000 entities > then REST call gives first/last 100 entities. How to retrieve next set of 100 > entities i.e 101 to 200 OR 900 to 801?
> Example : If applications are stored database, app-1 app-2 ... app-10. > *getApps?limit=5* gives app-1 to app-5. But to retrieve next 5 apps, there is > no way to achieve this. > So proposal is to have fromId in the filter like > *getApps?limit=5&&fromId=app-5* which gives list of apps from app-6 to > app-10. > Since ATS is targeting large number of entities storage, it is very common > use case to get next set of entities using fromId rather than querying all > the entites. This is very useful for pagination in web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765001#comment-15765001 ] Li Lu commented on YARN-5585: - I'm fine with only supporting inputs with idPrefix for fromId. Once users can *query* entities/entity without a prefix it sounds fine to me. > [Atsv2] Reader side changes for entity prefix and support for pagination via > additional filters > --- > > Key: YARN-5585 > URL: https://issues.apache.org/jira/browse/YARN-5585 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, > YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, > YARN-5585-workaround.patch, YARN-5585.v0.patch > > > TimelineReader REST API's provides lot of filters to retrieve the > applications. Along with those, it would be good to add new filter i.e fromId > so that entities can be retrieved after the fromId. > Current Behavior : Default limit is set to 100. If there are 1000 entities > then REST call gives first/last 100 entities. How to retrieve next set of 100 > entities i.e 101 to 200 OR 900 to 801? > Example : If applications are stored database, app-1 app-2 ... app-10. > *getApps?limit=5* gives app-1 to app-5. But to retrieve next 5 apps, there is > no way to achieve this. > So proposal is to have fromId in the filter like > *getApps?limit=5&&fromId=app-5* which gives list of apps from app-6 to > app-10. > Since ATS is targeting large number of entities storage, it is very common > use case to get next set of entities using fromId rather than querying all > the entites. This is very useful for pagination in web UI. 
[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters
[ https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762691#comment-15762691 ] Li Lu commented on YARN-5585: - Thanks [~rohithsharma] for the update! With regard to the APIs, I think we can pretty much reuse the current set of APIs. IMO, we should not force a prefix all the time. Of course, if the user knows the exact entity prefix it's certainly beneficial to include it in the query (so that we can save a range scan and just use a get). When referring to timeline entity ids, how about the following patterns:
1. <string 1>!<string 2>: string 1 is the prefix and string 2 is the id.
2. <string> or *\!<string>: string is the entity id and the storage needs to query the entity prefix. If we have problems distinguishing this from the above case, maybe we can use *\!<string>.
bq. If we plan to reuse same API's, then we need to handle one scenario where same entityId is published with 2 entityIdPrefix.
This sounds like a really messy situation. Semantically, we've got two ways to decide this: 1) we explicitly claim that the entity prefix is a part of the id system. This means two entities are different even if they differ only in their entity prefixes. 2) We claim that the entity prefix is _not_ a part of the id system. Under this assumption, it is up to the storage system to decide how to deal with the case in which prefixes are updated. Therefore the behavior when one entity is associated with two prefixes is, at the API level, undefined. As [~varun_saxena] suggested, the storage may throw exceptions or return errors when multiple prefixes are found for the same entity.
> [Atsv2] Reader side changes for entity prefix and support for pagination via > additional filters > --- > > Key: YARN-5585 > URL: https://issues.apache.org/jira/browse/YARN-5585 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Critical > Labels: yarn-5355-merge-blocker > Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, > YARN-5585-YARN-5355.0002.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch > > > TimelineReader REST API's provides lot of filters to retrieve the > applications. Along with those, it would be good to add new filter i.e fromId > so that entities can be retrieved after the fromId. > Current Behavior : Default limit is set to 100. If there are 1000 entities > then REST call gives first/last 100 entities. How to retrieve next set of 100 > entities i.e 101 to 200 OR 900 to 801? > Example : If applications are stored database, app-1 app-2 ... app-10. > *getApps?limit=5* gives app-1 to app-5. But to retrieve next 5 apps, there is > no way to achieve this. > So proposal is to have fromId in the filter like > *getApps?limit=5&&fromId=app-5* which gives list of apps from app-6 to > app-10. > Since ATS is targeting large number of entities storage, it is very common > use case to get next set of entities using fromId rather than querying all > the entites. This is very useful for pagination in web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
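To make the id patterns discussed in the comment above concrete, here is a minimal parsing sketch. It assumes `!` separates the prefix from the id, that the prefix is a long (as in the `Long entityIdPrefix` parameter mentioned elsewhere in this thread), and that `*` means "prefix unknown". `EntityIdRef` and its methods are hypothetical names, and escaping of `!` inside ids is deliberately not handled:

```java
import java.util.OptionalLong;

public class EntityIdRef {
    public final OptionalLong prefix;
    public final String entityId;

    private EntityIdRef(OptionalLong prefix, String entityId) {
        this.prefix = prefix;
        this.entityId = entityId;
    }

    /** Parses "prefix!id", "*!id", or a bare "id". */
    public static EntityIdRef parse(String raw) {
        int sep = raw.indexOf('!');
        if (sep < 0) {
            // Bare id: the storage has to look up the prefix.
            return new EntityIdRef(OptionalLong.empty(), raw);
        }
        String head = raw.substring(0, sep);
        String id = raw.substring(sep + 1);
        if (head.equals("*")) {
            // Explicit "unknown prefix" marker.
            return new EntityIdRef(OptionalLong.empty(), id);
        }
        return new EntityIdRef(OptionalLong.of(Long.parseLong(head)), id);
    }

    /** True when the reader must scan for the prefix instead of doing a point get. */
    public boolean needsPrefixLookup() {
        return !prefix.isPresent();
    }
}
```

With a known prefix the full row key can be built directly, which is the "save a range scan and just use a get" case from the comment.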
[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2
[ https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762438#comment-15762438 ] Li Lu commented on YARN-4061: - Thanks [~jrottinghuis]! bq. For our usecase, that makes the puts idempotent. Other use-cases may not need this requirement, but they do need to deal with duplicate puts. That makes sense to me. However, do we think this is too specific (at least for now) to our use case in timeline v2? I can understand if there are concerns in the HBase community about putting this into the HBase codebase immediately... Maybe what we can do is expose buffered mutators from HBase, and implement our own spooling buffered mutator in timeline code? > [Fault tolerance] Fault tolerant writer for timeline v2 > --- > > Key: YARN-4061 > URL: https://issues.apache.org/jira/browse/YARN-4061 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Joep Rottinghuis > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: FaulttolerantwriterforTimelinev2.pdf > > > We need to build a timeline writer that can be resistant to backend storage > down time and timeline collector failures.
[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2
[ https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752682#comment-15752682 ] Li Lu commented on YARN-4061: - I went through the new design doc in HBASE-17018 and I think it's mostly good. As we discussed in the weekly sync meeting, one thing we may want to sort out here is how to handle the case where the collectors start up while the HBase cluster is down. From my point of view, the most conservative approach is to assume the HBase cluster is always BAD on startup. However, the problem is that we then have to spool the very first writes anyway. Can we have a "PROBING" state in the coordinator, where we may tolerate a slightly longer submission time, to let the spooling mutator first probe the state of the HBase cluster? Also, this probing process may happen before the first write ever comes, so that we can do out-of-band probing? Another question of mine concerns the idempotent write requirements. Moving my comments from the Google doc to here: bq. The spooling mutator itself guarantees an "at least once" semantic? One thing I'd like to discuss here is the write timestamp of each timeline write. I'm not familiar with the HBase code, but are we generating one unique timestamp for each write when we actually write it to HBase? If this is the case, replaying timeline writes may generate different timestamps, and those repeated writes may not be idempotent from timeline's perspective? > [Fault tolerance] Fault tolerant writer for timeline v2 > --- > > Key: YARN-4061 > URL: https://issues.apache.org/jira/browse/YARN-4061 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Joep Rottinghuis > Labels: YARN-5355 > Attachments: FaulttolerantwriterforTimelinev2.pdf > > > We need to build a timeline writer that can be resistant to backend storage > down time and timeline collector failures.
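The timestamp concern in the comment above can be illustrated without HBase at all. The sketch below models a multi-version cell as a map from timestamp to value, which is a simplification assumed for illustration; `ReplayDemo` and its methods are hypothetical. It shows why a replayed write is only idempotent when the timestamp travels with the write instead of being assigned at apply time:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ReplayDemo {
    // cell key -> (timestamp -> value), a toy model of a versioned cell
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();
    // Toy "server clock" that advances on every server-stamped write.
    private long clock = 0;

    /** The writer chose the timestamp, so a replay overwrites the same version. */
    public void putWithClientTs(String key, long ts, String value) {
        cells.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
    }

    /** The store stamps the write when it is applied, so a replay adds a version. */
    public void putWithServerTs(String key, String value) {
        putWithClientTs(key, ++clock, value);
    }

    public int versions(String key) {
        TreeMap<Long, String> v = cells.get(key);
        return (v == null) ? 0 : v.size();
    }

    public static void main(String[] args) {
        ReplayDemo store = new ReplayDemo();
        // Spooled write replayed twice with the original timestamp.
        store.putWithClientTs("metric-a", 1000L, "42");
        store.putWithClientTs("metric-a", 1000L, "42");
        // Same replay, but stamped at apply time.
        store.putWithServerTs("metric-b", "42");
        store.putWithServerTs("metric-b", "42");
        System.out.println(store.versions("metric-a"));
        System.out.println(store.versions("metric-b"));
    }
}
```

Under these assumptions, an at-least-once spooling writer would want to capture the timestamp when the write is first accepted and replay it unchanged.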
[jira] [Commented] (YARN-5976) Update hbase version to 1.2
[ https://issues.apache.org/jira/browse/YARN-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749184#comment-15749184 ] Li Lu commented on YARN-5976: - Let's remove the Phoenix dependency and revert the affected patches. We can work on the Phoenix pieces later on. > Update hbase version to 1.2 > --- > > Key: YARN-5976 > URL: https://issues.apache.org/jira/browse/YARN-5976 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Vrushali C > Fix For: YARN-5355 > > Attachments: YARN-5976.001.wip.patch > > > I believe phoenix now works with hbase 1.2. We should now upgrade timeline > service to use hbase 1.2 now. > And also update documentation in timelineservice to reflect that hbase mode > of all daemons in single jvm but writing to hdfs is supported.
[jira] [Commented] (YARN-5647) [Security] Collector and reader side changes for loading auth filters and principals
[ https://issues.apache.org/jira/browse/YARN-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746392#comment-15746392 ] Li Lu commented on YARN-5647: - Sorry for the late reply folks, I have been stuck with some other issues recently. After giving this some thought and discussing it with some other community folks, I think for now it's fine to proceed with the current proposal on reusing delegation tokens in this JIRA. Some extra facts/questions:
1. The most common reaction people had, when I talked about our security/token design, was "why not directly reuse the mechanisms like NM token or block token". Those two tokens reflect the typical security mechanism. A central server (RM or NN) shares a secret with slave nodes, and issues tokens to requestors on behalf of the slave nodes. Tokens are passed to requestors by the central server and the central server will handle all renewals.
2. Our current proposal is a distributed solution: each launched collector issues tokens itself, the token information is passed to a central server (RM), and the RM further distributes those tokens to the right party.
3. I believe the fundamental differences between the two approaches are a) who issues the token and b) the channel through which we distribute the token. For a), if we have a working E2E POC for collectors to issue tokens, I'm fine with it. For b), it seems like we're utilizing our collector discovery mechanism to distribute tokens. So we will change collector discovery once again? Are there any concerns with this?
> [Security] Collector and reader side changes for loading auth filters and > principals > > > Key: YARN-5647 > URL: https://issues.apache.org/jira/browse/YARN-5647 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: oct16-hard > Attachments: YARN-5647-YARN-5355.wip.002.patch, > YARN-5647-YARN-5355.wip.003.patch, YARN-5647-YARN-5355.wip.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5974) Remove direct reference to TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733773#comment-15733773 ] Li Lu commented on YARN-5974: - The timed out UT looks weird since I could not find anything useful in the test report. I tried to reproduce it locally, but the test passed successfully in 47s. Not sure what happened on Jenkins. > Remove direct reference to TimelineClientImpl > - > > Key: YARN-5974 > URL: https://issues.apache.org/jira/browse/YARN-5974 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Li Lu >Assignee: Li Lu > Labels: newbie++ > Attachments: YARN-5974-YARN-5355.001.patch > > > [~sjlee0]'s quick audit shows that things that are referencing > TimelineClientImpl directly today: > JobHistoryFileReplayMapperV1 (MR) > SimpleEntityWriterV1 (MR) > TestDistributedShell (DS) > TestDSAppMaster (DS) > TestNMTimelinePublisher (node manager) > TestTimelineWebServicesWithSSL (AHS) > This is not the right way to use TimelineClient and we should avoid direct > reference to TimelineClientImpl as much as possible. > Any newcomers to the community are more than welcome to take this. If this > remains unassigned for ~24hrs I'll jump in and do a quick fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5974) Remove direct reference to TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5974: Attachment: YARN-5974-YARN-5355.001.patch Took a look at all direct references to TimelineClientImpl in our code base. There are actually 3 types of non-trivial references: a) Directly creating TimelineClientImpl in code. This is wrong. b) Creating anonymous class with a super class of TimelineClientImpl in test. c) Checking test-visible fields of TimelineClientImpl in related unit tests. The current (small) patch fixes all type a) problems in our code base. I believe type c) references are mostly fine since the author clearly knows the implication of the explicit test-visible method calls. I haven't decided yet on all type b) references. On one hand they're fine, since people also know the implication of an anonymous class in test code. On the other hand they're a little bit messy: once we'd like to split TimelineClientImpl we have to duplicate the work. We can have some intermediate class like TimelineClientImplV1ForTest extends TimelineClientImpl, and put that in test only. However, I'm not sure if the benefit justifies the efforts. Thoughts? > Remove direct reference to TimelineClientImpl > - > > Key: YARN-5974 > URL: https://issues.apache.org/jira/browse/YARN-5974 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Li Lu >Assignee: Li Lu > Labels: newbie++ > Attachments: YARN-5974-YARN-5355.001.patch > > > [~sjlee0]'s quick audit shows that things that are referencing > TimelineClientImpl directly today: > JobHistoryFileReplayMapperV1 (MR) > SimpleEntityWriterV1 (MR) > TestDistributedShell (DS) > TestDSAppMaster (DS) > TestNMTimelinePublisher (node manager) > TestTimelineWebServicesWithSSL (AHS) > This is not the right way to use TimelineClient and we should avoid direct > reference to TimelineClientImpl as much as possible. 
> Any newcomers to the community are more than welcome to take this. If this > remains unassigned for ~24hrs I'll jump in and do a quick fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
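The three kinds of references described in this thread come down to whether callers name the concrete class. The sketch below uses simplified stand-in classes (`ClientSketch`, `ClientImplSketch`, `ClientImplForTestSketch` are all hypothetical; the real classes have a much larger surface) to show callers going through a static factory on the abstract type, plus the test-only intermediate subclass floated in the comment:

```java
public class FactoryUsageSketch {
    /** Stand-in for the public abstract client type. */
    public abstract static class ClientSketch {
        /** Callers obtain a client here and never name the implementation. */
        public static ClientSketch create() {
            return new ClientImplSketch();
        }

        public abstract String putEntities(String entity);
    }

    /** Stand-in for the implementation class that callers should not reference. */
    public static class ClientImplSketch extends ClientSketch {
        @Override
        public String putEntities(String entity) {
            return "stored:" + entity;
        }
    }

    /** Test-only subclass: one place to adjust if the implementation is split. */
    public static class ClientImplForTestSketch extends ClientImplSketch {
        public int calls = 0;

        @Override
        public String putEntities(String entity) {
            calls++;
            return super.putEntities(entity);
        }
    }

    public static void main(String[] args) {
        // Production-style usage: depends only on the abstract type.
        ClientSketch client = ClientSketch.create();
        System.out.println(client.putEntities("entity-1"));
    }
}
```

If tests extend the single intermediate class instead of creating anonymous subclasses of the implementation, a later v1/v2 split only has to touch that one class.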
[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
[ https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730203#comment-15730203 ] Li Lu commented on YARN-4675: - For TimelineClientImpl, I'm totally fine with separating v1 and v2. I'm not too worried about code duplication in the security-related parts since they're yet to be finalized. For the rest, I'm totally fine with separating them. Let me work on YARN-5974 to unblock this. > Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl > > > Key: YARN-4675 > URL: https://issues.apache.org/jira/browse/YARN-4675 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: YARN-5355, oct16-medium > Attachments: YARN-4675-YARN-2928.v1.001.patch > > > We need to reorganize TimeClientImpl into TimeClientV1Impl , > TimeClientV2Impl and if required a base class, so that its clear which part > of the code belongs to which version and thus better maintainable.
[jira] [Commented] (YARN-5974) Remove direct reference to TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730205#comment-15730205 ] Li Lu commented on YARN-5974: - Time is up... So I'll take this work... > Remove direct reference to TimelineClientImpl > - > > Key: YARN-5974 > URL: https://issues.apache.org/jira/browse/YARN-5974 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Li Lu > Labels: newbie++ > > [~sjlee0]'s quick audit shows that things that are referencing > TimelineClientImpl directly today: > JobHistoryFileReplayMapperV1 (MR) > SimpleEntityWriterV1 (MR) > TestDistributedShell (DS) > TestDSAppMaster (DS) > TestNMTimelinePublisher (node manager) > TestTimelineWebServicesWithSSL (AHS) > This is not the right way to use TimelineClient and we should avoid direct > reference to TimelineClientImpl as much as possible. > Any newcomers to the community are more than welcome to take this. If this > remains unassigned for ~24hrs I'll jump in and do a quick fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5974) Remove direct reference to TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-5974: --- Assignee: Li Lu > Remove direct reference to TimelineClientImpl > - > > Key: YARN-5974 > URL: https://issues.apache.org/jira/browse/YARN-5974 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Li Lu >Assignee: Li Lu > Labels: newbie++ > > [~sjlee0]'s quick audit shows that things that are referencing > TimelineClientImpl directly today: > JobHistoryFileReplayMapperV1 (MR) > SimpleEntityWriterV1 (MR) > TestDistributedShell (DS) > TestDSAppMaster (DS) > TestNMTimelinePublisher (node manager) > TestTimelineWebServicesWithSSL (AHS) > This is not the right way to use TimelineClient and we should avoid direct > reference to TimelineClientImpl as much as possible. > Any newcomers to the community are more than welcome to take this. If this > remains unassigned for ~24hrs I'll jump in and do a quick fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5647) [Security] Collector and reader side changes for loading auth filters and principals
[ https://issues.apache.org/jira/browse/YARN-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727044#comment-15727044 ] Li Lu commented on YARN-5647: - Thanks [~varun_saxena]! Can we split the Kerberos-related work from the rest of the patch and focus on Kerberos here? I can see we reused some logic for TimelineDelegationToken, which was previously questioned by [~jianhe]. Shall we put aside the possible changes to tokens and focus on making all timeline-related server-side components Kerberos authenticated? This way YARN-5647 can focus on Kerberos login, YARN-5648 on authentication filters, and we can have another JIRA for the new timeline tokens (generation and distribution)? > [Security] Collector and reader side changes for loading auth filters and > principals > > > Key: YARN-5647 > URL: https://issues.apache.org/jira/browse/YARN-5647 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: oct16-hard > Attachments: YARN-5647-YARN-5355.wip.002.patch, > YARN-5647-YARN-5355.wip.003.patch, YARN-5647-YARN-5355.wip.01.patch > >
[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
[ https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726783#comment-15726783 ] Li Lu commented on YARN-4675: - Created YARN-5974 for removing unnecessary references to TimelineClientImpl. > Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl > > > Key: YARN-4675 > URL: https://issues.apache.org/jira/browse/YARN-4675 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: YARN-5355, oct16-medium > Attachments: YARN-4675-YARN-2928.v1.001.patch > > > We need to reorganize TimeClientImpl into TimeClientV1Impl , > TimeClientV2Impl and if required a base class, so that its clear which part > of the code belongs to which version and thus better maintainable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5974) Remove direct reference to TimelineClientImpl
Li Lu created YARN-5974: --- Summary: Remove direct reference to TimelineClientImpl Key: YARN-5974 URL: https://issues.apache.org/jira/browse/YARN-5974 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: YARN-5355 Reporter: Li Lu [~sjlee0]'s quick audit shows that things that are referencing TimelineClientImpl directly today: JobHistoryFileReplayMapperV1 (MR) SimpleEntityWriterV1 (MR) TestDistributedShell (DS) TestDSAppMaster (DS) TestNMTimelinePublisher (node manager) TestTimelineWebServicesWithSSL (AHS) This is not the right way to use TimelineClient and we should avoid direct reference to TimelineClientImpl as much as possible. Any newcomers to the community are more than welcome to take this. If this remains unassigned for ~24hrs I'll jump in and do a quick fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
[ https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726776#comment-15726776 ] Li Lu commented on YARN-4675: - I'm fine with separating the v1 and v2 interfaces. Right now, mixing the v1 and v2 interfaces in one interface looks pretty confusing to me. Since we decided at the very beginning that timeline v2 is not backward compatible, I think it's fine to let users choose between TimelineClient v1 and v2. bq. things that are referencing TimelineClientImpl directly today Yes, we should not directly refer to TimelineClientImpl in downstream usages. Shall I open a JIRA and remove all of them? bq. the facility for getting delegation token and renewing it would be common to both the clients. We would not want to repeat such large amounts of code in both V1 and V2 client implementations. That's certainly a very valid concern, and addressing it may bring in a lot of discussion on security itself. My bottom line here is: let's *assume* no security facilities exist in timeline v2, and start the design from scratch. We may then think about how to merge and reuse the code afterwards. For now, let's not think about maximizing code reuse between timeline v1 and v2, especially for security? > Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl > > > Key: YARN-4675 > URL: https://issues.apache.org/jira/browse/YARN-4675 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: YARN-5355, oct16-medium > Attachments: YARN-4675-YARN-2928.v1.001.patch > > > We need to reorganize TimeClientImpl into TimeClientV1Impl , > TimeClientV2Impl and if required a base class, so that its clear which part > of the code belongs to which version and thus better maintainable.
[jira] [Commented] (YARN-5756) Add state-machine implementation for queues
[ https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723639#comment-15723639 ] Li Lu commented on YARN-5756: - Thanks [~xgong]. Looks fine but I'm not extremely familiar with queue/schedulers. Maybe [~wangda] or [~jianhe] can take a look at it? > Add state-machine implementation for queues > --- > > Key: YARN-5756 > URL: https://issues.apache.org/jira/browse/YARN-5756 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-5756.1.patch, YARN-5756.2.patch, YARN-5756.3.patch, > YARN-5756.4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5739: Attachment: YARN-5739-YARN-5355.007.patch Thanks [~varun_saxena] for the comments. Addressed all of them. > Provide timeline reader API to list available timeline entity types for one > application > --- > > Key: YARN-5739 > URL: https://issues.apache.org/jira/browse/YARN-5739 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-5739-YARN-5355.001.patch, > YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, > YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, > YARN-5739-YARN-5355.006.patch, YARN-5739-YARN-5355.007.patch > > > Right now we only show a part of available timeline entity data in the new > YARN UI. However, some data (especially library specific data) are not > possible to be queried out by the web UI. It will be appealing for the UI to > provide an "entity browser" for each YARN application. Actually, simply > dumping out available timeline entities (with proper pagination, of course) > would be pretty helpful for UI users. > On timeline side, we're not far away from this goal. Right now I believe the > only thing missing is to list all available entity types within one > application. The challenge here is that we're not storing this data for each > application, but given this kind of call is relatively rare (compare to > writes and updates) we can perform some scanning during the read time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
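The description above mentions doing "some scanning during the read time" to list entity types. One way such a scan could avoid touching every row is to jump from one entity type to the next in sorted key order. The sketch below is only an illustration of that idea and is not necessarily what the attached patches do; it assumes row keys of the form `appId!entityType!entityId` held in a sorted set, and `EntityTypeScanSketch` and `listTypes` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class EntityTypeScanSketch {
    /**
     * Lists distinct entity types for one application.
     * Assumes sorted row keys shaped like "appId!entityType!entityId".
     */
    public static List<String> listTypes(NavigableSet<String> rowKeys, String appId) {
        List<String> types = new ArrayList<>();
        String appPrefix = appId + "!";
        String cursor = rowKeys.ceiling(appPrefix);
        while (cursor != null && cursor.startsWith(appPrefix)) {
            int end = cursor.indexOf('!', appPrefix.length());
            if (end < 0) {
                break; // malformed key: stop instead of guessing
            }
            String type = cursor.substring(appPrefix.length(), end);
            types.add(type);
            // '"' is the character right after '!', so this seek key sorts
            // after every row of the current type and before the next type.
            cursor = rowKeys.ceiling(appPrefix + type + '"');
        }
        return types;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>();
        keys.add("app_1!YARN_CONTAINER!c1");
        keys.add("app_1!YARN_CONTAINER!c2");
        keys.add("app_1!DS_APP_ATTEMPT!a1");
        keys.add("app_2!OTHER!x");
        System.out.println(listTypes(keys, "app_1"));
    }
}
```

The loop performs one seek per distinct type rather than one read per entity, which fits the expectation that this call is rare but should still be cheap when an application has many entities of few types.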
[jira] [Commented] (YARN-5756) Add state-machine implementation for queues
[ https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713505#comment-15713505 ]

Li Lu commented on YARN-5756:
-----------------------------

Thanks [~xgong] for the patch! Generally fine, some comments:

QueueState.java
- The STOP_RUNNING state is a little confusing. How about RUNNING, CLOSED (or DRAINING), and STOPPED?
- Javadoc inconsistency: at the very beginning of the enum we say there are only two possible states.

QueueStateManager.java
- Consistency issues when stopping and activating queues? We're using fine-grained locking to change each queue's status. We need to make the process of stopping a queue and its subqueues atomic (in the concurrency sense, not the database sense). Otherwise, concurrent activate-queue calls may produce inconsistent results. If coarse-grained locking is fine for the current use case, we may want to make activateQueues and stopQueues synchronized?
- QueueStateManager only needs the queue mapping in SchedulerQueueManager, so do we need to reference the whole SchedulerQueueManager here? I don't have a strong opinion here though...

> Add state-machine implementation for queues
> ---
>
> Key: YARN-5756
> URL: https://issues.apache.org/jira/browse/YARN-5756
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-5756.1.patch, YARN-5756.2.patch
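To make the locking point in the review above concrete, here is a minimal sketch of the coarse-grained option, assuming a simplified queue tree. The names QueueNode and SimpleQueueStateManager are made up for illustration and are not the classes in the patch; the manager holds only a name-to-queue mapping, and every subtree transition runs under a single lock.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * State names follow the suggestion in the review. DRAINING is listed only to
 * mirror that suggestion; this sketch does not model draining.
 */
enum QueueState { RUNNING, DRAINING, STOPPED }

/** Hypothetical queue node: a name, a state, and child queues. */
final class QueueNode {
  final String name;
  final List<QueueNode> children = new ArrayList<>();
  QueueState state = QueueState.RUNNING;

  QueueNode(String name) { this.name = name; }

  QueueNode addChild(QueueNode child) {
    children.add(child);
    return this;
  }
}

/**
 * Hypothetical state manager that keeps only the queue mapping and guards
 * every transition with one coarse lock (the monitor of this object).
 */
final class SimpleQueueStateManager {
  private final Map<String, QueueNode> queues = new LinkedHashMap<>();

  /** Registers a queue and, recursively, all of its subqueues. */
  synchronized void register(QueueNode root) {
    queues.put(root.name, root);
    for (QueueNode child : root.children) {
      register(child); // the monitor is reentrant, so this is safe
    }
  }

  /** Stops a queue and all of its subqueues as one atomic step. */
  synchronized void stopQueue(String name) {
    setSubtree(lookup(name), QueueState.STOPPED);
  }

  /** Activates a queue and all of its subqueues as one atomic step. */
  synchronized void activateQueue(String name) {
    setSubtree(lookup(name), QueueState.RUNNING);
  }

  synchronized QueueState getState(String name) {
    return lookup(name).state;
  }

  private QueueNode lookup(String name) {
    QueueNode queue = queues.get(name);
    if (queue == null) {
      throw new IllegalArgumentException("Unknown queue: " + name);
    }
    return queue;
  }

  // Called only while the lock is held, so no other caller of this manager
  // can observe a subtree that is half stopped and half running.
  private void setSubtree(QueueNode queue, QueueState target) {
    queue.state = target;
    for (QueueNode child : queue.children) {
      setSubtree(child, target);
    }
  }
}
```

The trade-off is the usual one: a single lock serializes all state changes, which is probably acceptable if stop and activate calls are rare admin operations, but it would need revisiting if they ever land on a hot path.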
[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5739: Attachment: YARN-5739-YARN-5355.006.patch New 006 patch to make Jenkins happy. > Provide timeline reader API to list available timeline entity types for one > application > --- > > Key: YARN-5739 > URL: https://issues.apache.org/jira/browse/YARN-5739 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-5739-YARN-5355.001.patch, > YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, > YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, > YARN-5739-YARN-5355.006.patch > > > Right now we only show a part of available timeline entity data in the new > YARN UI. However, some data (especially library specific data) are not > possible to be queried out by the web UI. It will be appealing for the UI to > provide an "entity browser" for each YARN application. Actually, simply > dumping out available timeline entities (with proper pagination, of course) > would be pretty helpful for UI users. > On timeline side, we're not far away from this goal. Right now I believe the > only thing missing is to list all available entity types within one > application. The challenge here is that we're not storing this data for each > application, but given this kind of call is relatively rare (compare to > writes and updates) we can perform some scanning during the read time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5739: Attachment: (was: YARN-5739-YARN-5355.006.patch) > Provide timeline reader API to list available timeline entity types for one > application > --- > > Key: YARN-5739 > URL: https://issues.apache.org/jira/browse/YARN-5739 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-5739-YARN-5355.001.patch, > YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, > YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, > YARN-5739-YARN-5355.006.patch > > > Right now we only show a part of available timeline entity data in the new > YARN UI. However, some data (especially library specific data) are not > possible to be queried out by the web UI. It will be appealing for the UI to > provide an "entity browser" for each YARN application. Actually, simply > dumping out available timeline entities (with proper pagination, of course) > would be pretty helpful for UI users. > On timeline side, we're not far away from this goal. Right now I believe the > only thing missing is to list all available entity types within one > application. The challenge here is that we're not storing this data for each > application, but given this kind of call is relatively rare (compare to > writes and updates) we can perform some scanning during the read time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5756) Add state-machine implementation for queues
[ https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713290#comment-15713290 ] Li Lu commented on YARN-5756: - Seems like the second submission has been ignored by Jenkins. Kick it for one more round of testing. > Add state-machine implementation for queues > --- > > Key: YARN-5756 > URL: https://issues.apache.org/jira/browse/YARN-5756 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-5756.1.patch, YARN-5756.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5739: Attachment: YARN-5739-YARN-5355.006.patch New patch to address review comments. > Provide timeline reader API to list available timeline entity types for one > application > --- > > Key: YARN-5739 > URL: https://issues.apache.org/jira/browse/YARN-5739 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-5739-YARN-5355.001.patch, > YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, > YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch, > YARN-5739-YARN-5355.006.patch > > > Right now we only show a part of available timeline entity data in the new > YARN UI. However, some data (especially library specific data) are not > possible to be queried out by the web UI. It will be appealing for the UI to > provide an "entity browser" for each YARN application. Actually, simply > dumping out available timeline entities (with proper pagination, of course) > would be pretty helpful for UI users. > On timeline side, we're not far away from this goal. Right now I believe the > only thing missing is to list all available entity types within one > application. The challenge here is that we're not storing this data for each > application, but given this kind of call is relatively rare (compare to > writes and updates) we can perform some scanning during the read time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl
[ https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712881#comment-15712881 ] Li Lu commented on YARN-4675: - Yes I agree we need to decide on this issue soon. +1 for reorganizing TimelineClientImpl. Do we also need to distinguish v2 APIs from timeline clients as well? As of now we will have timeline APIs for v1, v1.5, and v2 so I think it may be helpful to distinguish at least v2 APIs. > Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl > > > Key: YARN-4675 > URL: https://issues.apache.org/jira/browse/YARN-4675 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: YARN-5355, oct16-medium > Attachments: YARN-4675-YARN-2928.v1.001.patch > > > We need to reorganize TimeClientImpl into TimeClientV1Impl , > TimeClientV2Impl and if required a base class, so that its clear which part > of the code belongs to which version and thus better maintainable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710412#comment-15710412 ]

Li Lu commented on YARN-5739:
-----------------------------

Also, the two augmentParams methods in GenericEntityReader and ApplicationEntityReader seem quite similar. The only difference is that we need to distinguish whether a read is a single-entity read when we actually augment the params. Can we merge the two pieces of logic? I can expose the base implementation of augmentParams, but I was wondering if we can simplify the logic further and just let ApplicationEntityReader#augmentParams call super?

> Provide timeline reader API to list available timeline entity types for one
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch,
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch,
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch
>
>
> Right now we only show a part of available timeline entity data in the new
> YARN UI. However, some data (especially library specific data) are not
> possible to be queried out by the web UI. It will be appealing for the UI to
> provide an "entity browser" for each YARN application. Actually, simply
> dumping out available timeline entities (with proper pagination, of course)
> would be pretty helpful for UI users.
> On timeline side, we're not far away from this goal. Right now I believe the
> only thing missing is to list all available entity types within one
> application. The challenge here is that we're not storing this data for each
> application, but given this kind of call is relatively rare (compare to
> writes and updates) we can perform some scanning during the read time.
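One possible shape for the augmentParams merge raised in the comment above, as a rough sketch only. The class names and step names ("lookupFlowContext", "sharedSetup", "createFilters") are placeholders for illustration and are not taken from the patch; the point is just that the shared sequence lives once in the base class and the subclass supplies only the part that depends on whether the read is a single-entity read.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base reader: owns the single augmentParams implementation. */
class BaseReaderSketch {
  protected final boolean singleEntityRead;
  // Records which steps ran, purely so the sketch has observable behaviour.
  private final List<String> steps = new ArrayList<>();

  BaseReaderSketch(boolean singleEntityRead) {
    this.singleEntityRead = singleEntityRead;
  }

  /** Hook for subclasses: should the flow context be looked up first? */
  protected boolean needsFlowContextLookup() {
    return true;
  }

  /** Shared logic, written once. Subclasses customise it through the hook. */
  protected void augmentParams() {
    if (needsFlowContextLookup()) {
      steps.add("lookupFlowContext");
    }
    steps.add("sharedSetup");
    if (!singleEntityRead) {
      steps.add("createFilters");
    }
  }

  List<String> steps() {
    return steps;
  }
}

/** Hypothetical application reader: differs only in the lookup decision. */
class ApplicationReaderSketch extends BaseReaderSketch {
  ApplicationReaderSketch(boolean singleEntityRead) {
    super(singleEntityRead);
  }

  @Override
  protected boolean needsFlowContextLookup() {
    // Assumption for this sketch: only single-entity reads need the lookup.
    return singleEntityRead;
  }
}
```

With a hook like this, the subclass does not need its own augmentParams at all. If it later needs extra steps, it can override the method and start with a call to super.augmentParams().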
[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710400#comment-15710400 ] Li Lu commented on YARN-5739: - But I'd certainly appreciate if there are better names! > Provide timeline reader API to list available timeline entity types for one > application > --- > > Key: YARN-5739 > URL: https://issues.apache.org/jira/browse/YARN-5739 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-5739-YARN-5355.001.patch, > YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, > YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch > > > Right now we only show a part of available timeline entity data in the new > YARN UI. However, some data (especially library specific data) are not > possible to be queried out by the web UI. It will be appealing for the UI to > provide an "entity browser" for each YARN application. Actually, simply > dumping out available timeline entities (with proper pagination, of course) > would be pretty helpful for UI users. > On timeline side, we're not far away from this goal. Right now I believe the > only thing missing is to list all available entity types within one > application. The challenge here is that we're not storing this data for each > application, but given this kind of call is relatively rare (compare to > writes and updates) we can perform some scanning during the read time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710357#comment-15710357 ]

Li Lu commented on YARN-5739:
-----------------------------

bq. I hate to nitpick on the name, but AbstractTimelineStorageReader sounds a little awkward to me. Can we stick to the entity reader names? How about AbstractTimelineEntityReader or BaseTimelineEntityReader? Thoughts?

Avoiding the term "Entity" is a deliberate choice here. Now EntityTypeReader will not return any entity types so I'm avoiding using the term Entity in the base class.

> Provide timeline reader API to list available timeline entity types for one
> application
> ---
>
> Key: YARN-5739
> URL: https://issues.apache.org/jira/browse/YARN-5739
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelinereader
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-5739-YARN-5355.001.patch,
> YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch,
> YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch
>
>
> Right now we only show a part of available timeline entity data in the new
> YARN UI. However, some data (especially library specific data) are not
> possible to be queried out by the web UI. It will be appealing for the UI to
> provide an "entity browser" for each YARN application. Actually, simply
> dumping out available timeline entities (with proper pagination, of course)
> would be pretty helpful for UI users.
> On timeline side, we're not far away from this goal. Right now I believe the
> only thing missing is to list all available entity types within one
> application. The challenge here is that we're not storing this data for each
> application, but given this kind of call is relatively rare (compare to
> writes and updates) we can perform some scanning during the read time.
[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
[ https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710108#comment-15710108 ]

Li Lu commented on YARN-5933:
-----------------------------

bq. AppLogs#parseSummaryLogs() can skip subsequent getAppState for Unknown apps and move them to complete after unknownActiveSecs

Wouldn't this give up the ATS's ability to quickly "recover" an application's state from unknown to known? Under this proposal, once an application's status becomes unknown, the timeline server no longer checks it, so the status can never change back to known. For example, if the ATS server is temporarily isolated from the rest of the cluster, it will stop checking every app's status, even if the isolation lasts only a short while. To solve this, we need a separate scanning thread for "lost" applications that scans at a different pace.

> ATS stale entries in active directory causes ApplicationNotFoundException in
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
>
> On Secure cluster where ATS is down, Tez job submitted will fail while
> getting TIMELINE_DELEGATION_TOKEN with below exception
> {code}
> 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from
> alltypesorc group by csmallint;
> INFO : Session is already open
> INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1)
> INFO : Tez session was closed. Reopening...
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Failed to connect to timeline server. Connection
> retries limit exceeded.
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) > at > org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) > at org.apache.tez.client.TezClient.start(TezClient.java:409) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > at > org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Tez YarnClient has received an applicationID from RM. On Restarting ATS now, > ATS tries to get the application report from RM and so RM will throw > ApplicationNotFoundException. ATS will keep on requesting and which f
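The "separate scanning thread for lost applications" idea from the YARN-5933 comment above could look roughly like the sketch below. Everything here is hypothetical: LostAppScanner, its fields, and the stateLookup function (a stand-in for asking the RM) are made up for illustration and are not part of the timeline server. Apps in the unknown state are re-checked on their own, slower interval, can recover if the lookup starts answering again, and are treated as completed only after the full unknown-active window has passed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

enum AppState { ACTIVE, UNKNOWN, COMPLETED }

/** Hypothetical slow-path scanner for applications whose state is unknown. */
final class LostAppScanner {
  private final Function<String, AppState> stateLookup; // stand-in for an RM query
  private final long recheckIntervalMs;  // pace of the slow scan
  private final long unknownActiveMs;    // how long an app may stay unknown
  private final Map<String, Long> unknownSince = new HashMap<>();
  private final Map<String, Long> lastChecked = new HashMap<>();

  LostAppScanner(Function<String, AppState> stateLookup,
                 long recheckIntervalMs, long unknownActiveMs) {
    this.stateLookup = stateLookup;
    this.recheckIntervalMs = recheckIntervalMs;
    this.unknownActiveMs = unknownActiveMs;
  }

  /** Hands an app over to the slow path when its state first becomes unknown. */
  void markUnknown(String appId, long nowMs) {
    unknownSince.putIfAbsent(appId, nowMs);
    lastChecked.put(appId, nowMs);
  }

  /**
   * One pass of the slow scan. Returns the apps that left the unknown state in
   * this pass, mapped to the state they resolved to.
   */
  Map<String, AppState> scan(long nowMs) {
    Map<String, AppState> resolved = new HashMap<>();
    for (String appId : new ArrayList<>(unknownSince.keySet())) {
      if (nowMs - lastChecked.get(appId) >= recheckIntervalMs) {
        lastChecked.put(appId, nowMs);
        AppState state = stateLookup.apply(appId);
        if (state != AppState.UNKNOWN) {
          resolve(appId, state, resolved); // the app "recovered"
          continue;
        }
      }
      if (nowMs - unknownSince.get(appId) >= unknownActiveMs) {
        resolve(appId, AppState.COMPLETED, resolved); // waited long enough
      }
    }
    return resolved;
  }

  private void resolve(String appId, AppState state, Map<String, AppState> resolved) {
    resolved.put(appId, state);
    unknownSince.remove(appId);
    lastChecked.remove(appId);
  }
}
```

The scan method takes the current time as a parameter so it stays deterministic; in a real service it would presumably be driven by a scheduled executor running at the slower pace.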
[jira] [Commented] (YARN-5756) Add state-machine implementation for queues
[ https://issues.apache.org/jira/browse/YARN-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710022#comment-15710022 ] Li Lu commented on YARN-5756: - Hi [~xgong], I tried to apply the patch locally but there were several issues to apply to the latest trunk. One significant issue is SchedulerQueueContext.java is missing in trunk? Could you please rebase your patch? Thanks! > Add state-machine implementation for queues > --- > > Key: YARN-5756 > URL: https://issues.apache.org/jira/browse/YARN-5756 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-5756.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5761) Separate QueueManager from Scheduler
[ https://issues.apache.org/jira/browse/YARN-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709840#comment-15709840 ] Li Lu commented on YARN-5761: - Will commit this patch shortly. > Separate QueueManager from Scheduler > > > Key: YARN-5761 > URL: https://issues.apache.org/jira/browse/YARN-5761 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Xuan Gong >Assignee: Xuan Gong > Labels: oct16-medium > Attachments: YARN-5761.1.patch, YARN-5761.1.rebase.patch, > YARN-5761.2.patch, YARN-5761.3.patch, YARN-5761.4.patch, YARN-5761.5.patch, > YARN-5761.6.patch, YARN-5761.7.patch, YARN-5761.7.patch, YARN-5761.8.patch > > > Currently, in scheduler code, we are doing queue manager and scheduling work. > We'd better separate the queue manager out of scheduler logic. In that case, > it would be much easier and safer to extend. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709830#comment-15709830 ] Li Lu commented on YARN-5739: - Any more comments folks? Thanks! > Provide timeline reader API to list available timeline entity types for one > application > --- > > Key: YARN-5739 > URL: https://issues.apache.org/jira/browse/YARN-5739 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-5739-YARN-5355.001.patch, > YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, > YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch > > > Right now we only show a part of available timeline entity data in the new > YARN UI. However, some data (especially library specific data) are not > possible to be queried out by the web UI. It will be appealing for the UI to > provide an "entity browser" for each YARN application. Actually, simply > dumping out available timeline entities (with proper pagination, of course) > would be pretty helpful for UI users. > On timeline side, we're not far away from this goal. Right now I believe the > only thing missing is to list all available entity types within one > application. The challenge here is that we're not storing this data for each > application, but given this kind of call is relatively rare (compare to > writes and updates) we can perform some scanning during the read time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
[ https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707125#comment-15707125 ]

Li Lu commented on YARN-5933:
-----------------------------

Thanks [~Prabhu Joseph] for the clarification. Now I see the point about the flood of exceptions. Looking through the code, it seems that ApplicationClientProtocolPBServiceImpl converts the app-not-found exception into a service exception. We could ignore the app-not-found exception there, but that feels risky as well.

There seems to be no really quick solution to this issue, but one mitigation is to reduce unknownActiveSecs, which is set by yarn.timeline-service.entity-group-fs-store.unknown-active-seconds. It determines how long the timeline server waits before it declares a lost app done. The default is one full day, but for some use cases it can be reduced to hours. In the long term, maybe we need a separate interval for checking applications in unknown states?

> ATS stale entries in active directory causes ApplicationNotFoundException in
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
>
> On Secure cluster where ATS is down, Tez job submitted will fail while
> getting TIMELINE_DELEGATION_TOKEN with below exception
> {code}
> 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from
> alltypesorc group by csmallint;
> INFO : Session is already open
> INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1)
> INFO : Tez session was closed. Reopening...
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Failed to connect to timeline server. Connection
> retries limit exceeded.
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) > at > org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) > at org.apache.tez.client.TezClient.start(TezClient.java:409) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > at > org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Tez YarnClient has received an applicationID from RM. On Restarting ATS now, > ATS tries to get the application report from RM and so RM will throw > Applicati
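To illustrate the mitigation described in the YARN-5933 comment above, here is a small sketch of how the unknown-active window affects when a lost app is declared done. The property name and the one-day default come from that comment; the class, the constant names, and the choice to measure the window from the last observed activity are assumptions made for this sketch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper modelling the unknown-active "wait time". */
final class UnknownAppTimeout {
  /** Property named in the comment; the constant name is made up here. */
  static final String UNKNOWN_ACTIVE_SECS_KEY =
      "yarn.timeline-service.entity-group-fs-store.unknown-active-seconds";

  /** One full day, the default stated in the comment. */
  static final long DEFAULT_UNKNOWN_ACTIVE_SECS = TimeUnit.DAYS.toSeconds(1);

  private UnknownAppTimeout() { }

  /**
   * Returns true once an app in the unknown state has been quiet for longer
   * than the configured window and can be treated as done.
   */
  static boolean assumeDone(long nowMs, long lastActivityMs, long unknownActiveSecs) {
    return nowMs - lastActivityMs > TimeUnit.SECONDS.toMillis(unknownActiveSecs);
  }
}
```

For example, an app that has been quiet for five hours is still kept active under the one-day default, but is treated as done if the window is reduced to four hours (14400 seconds).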
[jira] [Commented] (YARN-5761) Separate QueueManager from Scheduler
[ https://issues.apache.org/jira/browse/YARN-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706732#comment-15706732 ] Li Lu commented on YARN-5761: - +1 LGTM. Will wait for ~24 hrs before committing this. > Separate QueueManager from Scheduler > > > Key: YARN-5761 > URL: https://issues.apache.org/jira/browse/YARN-5761 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Xuan Gong >Assignee: Xuan Gong > Labels: oct16-medium > Attachments: YARN-5761.1.patch, YARN-5761.1.rebase.patch, > YARN-5761.2.patch, YARN-5761.3.patch, YARN-5761.4.patch, YARN-5761.5.patch, > YARN-5761.6.patch, YARN-5761.7.patch, YARN-5761.7.patch, YARN-5761.8.patch > > > Currently, in scheduler code, we are doing queue manager and scheduling work. > We'd better separate the queue manager out of scheduler logic. In that case, > it would be much easier and safer to extend. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706169#comment-15706169 ] Li Lu commented on YARN-5739: - Kick Jenkins again for the new patch. > Provide timeline reader API to list available timeline entity types for one > application > --- > > Key: YARN-5739 > URL: https://issues.apache.org/jira/browse/YARN-5739 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-5739-YARN-5355.001.patch, > YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, > YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch > > > Right now we only show a part of available timeline entity data in the new > YARN UI. However, some data (especially library specific data) are not > possible to be queried out by the web UI. It will be appealing for the UI to > provide an "entity browser" for each YARN application. Actually, simply > dumping out available timeline entities (with proper pagination, of course) > would be pretty helpful for UI users. > On timeline side, we're not far away from this goal. Right now I believe the > only thing missing is to list all available entity types within one > application. The challenge here is that we're not storing this data for each > application, but given this kind of call is relatively rare (compare to > writes and updates) we can perform some scanning during the read time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5739: Attachment: YARN-5739-YARN-5355.005.patch Refactored EntityTypeReader and TimelineEntityReader. EntityTypeReader has been separated from EntityReaders after this refactoring. > Provide timeline reader API to list available timeline entity types for one > application > --- > > Key: YARN-5739 > URL: https://issues.apache.org/jira/browse/YARN-5739 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-5739-YARN-5355.001.patch, > YARN-5739-YARN-5355.002.patch, YARN-5739-YARN-5355.003.patch, > YARN-5739-YARN-5355.004.patch, YARN-5739-YARN-5355.005.patch > > > Right now we only show a part of available timeline entity data in the new > YARN UI. However, some data (especially library specific data) are not > possible to be queried out by the web UI. It will be appealing for the UI to > provide an "entity browser" for each YARN application. Actually, simply > dumping out available timeline entities (with proper pagination, of course) > would be pretty helpful for UI users. > On timeline side, we're not far away from this goal. Right now I believe the > only thing missing is to list all available entity types within one > application. The challenge here is that we're not storing this data for each > application, but given this kind of call is relatively rare (compare to > writes and updates) we can perform some scanning during the read time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703452#comment-15703452 ] Li Lu commented on YARN-5739: - Sure. Let me try with some refactoring...
[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
[ https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703312#comment-15703312 ] Li Lu commented on YARN-5933: - After giving this issue some thought, I am hesitant to remove the active directory directly when we see an unknown-application exception. The fact that the RM does not recognize the application ID does not mean the application is not running. It certainly does not mean there is no concurrent writer to this active directory, although in the reported case there is none. Therefore, simply removing the active directory may not work in cases where some "hidden" application is actually writing to the directory even though the RM does not recognize it. > ATS stale entries in active directory causes ApplicationNotFoundException in > RM > --- > > Key: YARN-5933 > URL: https://issues.apache.org/jira/browse/YARN-5933 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > On Secure cluster where ATS is down, Tez job submitted will fail while > getting TIMELINE_DELEGATION_TOKEN with below exception > {code} > 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from > alltypesorc group by csmallint; > INFO : Session is already open > INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1) > INFO : Tez session was closed. Reopening... > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Failed to connect to timeline server. Connection > retries limit exceeded. 
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) > at > org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) > at org.apache.tez.client.TezClient.start(TezClient.java:409) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > at > org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Tez YarnClient has received an applicationID from RM. On Restarting ATS now, > ATS tries to get the application report from RM and so RM will throw > ApplicationNotFoundException. ATS will keep on requesting and which floods RM. > {code} > RM logs: > 2016-11-23 13:53:57,345 INFO > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new > applicati
[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM
[ https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703184#comment-15703184 ] Li Lu commented on YARN-5933: - bq. ATS will keep on requesting and which floods RM. [~Prabhu Joseph] by "flood", do you mean that the ATS sent requests to the RM at a higher frequency than expected? > ATS stale entries in active directory causes ApplicationNotFoundException in > RM > --- > > Key: YARN-5933 > URL: https://issues.apache.org/jira/browse/YARN-5933 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > > On Secure cluster where ATS is down, Tez job submitted will fail while > getting TIMELINE_DELEGATION_TOKEN with below exception > {code} > 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from > alltypesorc group by csmallint; > INFO : Session is already open > INFO : Dag name: select csmallint from alltypesor...csmallint(Stage-1) > INFO : Tez session was closed. Reopening... > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Failed to connect to timeline server. Connection > retries limit exceeded. 
The posted timeline event may be missing > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250) > at > org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72) > at org.apache.tez.client.TezClient.start(TezClient.java:409) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196) > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154) > at > org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71) > at > org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Tez YarnClient has received an applicationID from RM. On Restarting ATS now, > ATS tries to get the application report from RM and so RM will throw > ApplicationNotFoundException. ATS will keep on requesting and which floods RM. > {code} > RM logs: > 2016-11-23 13:53:57,345 INFO > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new > applicationId: 5 > 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 9 on 8050, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 172.26.71.120:37699 Call#26 Retry#0 > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1479897867169_0005' doesn't exist in RM. > at > org.apa
[jira] [Commented] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688032#comment-15688032 ] Li Lu commented on YARN-5739: - Sure. Let's wait for more comments on this.
[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5739: Attachment: YARN-5739-YARN-5355.004.patch Addressed the comments above. bq. Moreover, REST endpoint suggestion was both entity-types and entitytypes. I am fine with both as we do use hyphen in other REST endpoints in YARN. Let us go with majority opinion. Right now I'm following our practice in the node-label-related web services in the RM. Please let me know if the hyphens cause any trouble. Thanks!
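To make the hyphenated form concrete, here is a minimal sketch of how a client might build the request URL for the entity-types endpoint. The `/ws/v2/timeline/clusters/{cluster}/apps/{app}/entity-types` path shape, the reader host and port, and the class and method names are all assumptions made for illustration; the patch under review defines the actual route.

```java
import java.net.URI;

public class EntityTypesUrlSketch {

    /**
     * Builds a URI for listing the entity types of one application.
     * The path layout is an assumption, not taken from the patch.
     */
    static URI entityTypesUri(String readerAddress, String clusterId, String appId) {
        String path = String.format(
            "/ws/v2/timeline/clusters/%s/apps/%s/entity-types", clusterId, appId);
        return URI.create("http://" + readerAddress + path);
    }

    public static void main(String[] args) {
        // Hypothetical reader address and application ID.
        URI uri = entityTypesUri("reader.example.com:8188", "yarn_cluster", "application_1_0001");
        System.out.println(uri);
    }
}
```

A hyphen is an unreserved character in URI paths, so `entity-types` needs no escaping on the client side.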
[jira] [Updated] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
[ https://issues.apache.org/jira/browse/YARN-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5739: Attachment: YARN-5739-YARN-5355.003.patch Version 003 of the patch addresses more review comments. Specifically: 1. Added a get-next-row-key API, shared with the patch in YARN-5585. 2. Removed the setCache call for scans, following a discussion with Enis from the HBase community. We now rely only on setPageFilter(1) to limit the scan size; Enis's suggestion is that this should be sufficient.
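The read-time scanning idea can be sketched without HBase at all. The example below is an illustration under stated assumptions, not the patch's code: a sorted `TreeSet` stands in for the entity table, row keys are assumed to look like `appId!entityType!entityId` with a hypothetical `!` separator, and `nextRowKey` is a stand-in for the "get next row key" helper mentioned above. Each loop iteration reads a single row (the role a one-row page limit plays in a real scan), records its entity type, and then jumps past every remaining row of that type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class EntityTypeScanSketch {

    // Hypothetical separator between row-key components; the real encoding may differ.
    static final char SEP = '!';

    /** Returns the smallest key that sorts after every key starting with the given prefix. */
    static String nextRowKey(String prefix) {
        // Increment the last character; assumes it is not Character.MAX_VALUE.
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    /** Lists the distinct entity types under appPrefix (which must end with SEP). */
    static List<String> listEntityTypes(NavigableSet<String> sortedRowKeys, String appPrefix) {
        List<String> types = new ArrayList<>();
        String stop = nextRowKey(appPrefix);   // first key outside this application
        String cursor = appPrefix;
        while (true) {
            // Stand-in for a scan limited to one row: the first key >= cursor.
            String row = sortedRowKeys.ceiling(cursor);
            if (row == null || row.compareTo(stop) >= 0) {
                break;
            }
            int typeEnd = row.indexOf(SEP, appPrefix.length());
            if (typeEnd < 0) {
                typeEnd = row.length();
            }
            String type = row.substring(appPrefix.length(), typeEnd);
            types.add(type);
            // Skip all remaining rows of this type in one jump.
            cursor = nextRowKey(appPrefix + type + SEP);
        }
        return types;
    }

    public static void main(String[] args) {
        NavigableSet<String> table = new TreeSet<>(List.of(
            "app_1!CONTAINER!c1", "app_1!CONTAINER!c2",
            "app_1!TEZ_DAG!d1", "app_2!CONTAINER!c9"));
        System.out.println(listEntityTypes(table, "app_1!")); // prints [CONTAINER, TEZ_DAG]
    }
}
```

The number of lookups grows with the number of distinct entity types rather than the number of entities, which is why scanning at read time can be acceptable for a call that is rare compared to writes.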
[jira] [Commented] (YARN-3053) [Security] Review and implement security in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668416#comment-15668416 ] Li Lu commented on YARN-3053: - bq. Can we capture that aspect as a future work as part of implementing the timeline collector as a full user container? Sure. For now let's make the current (aux service based) model work with security. We may add a slight extension so that collectors running in a separate process also work, if that turns out to be low-hanging fruit. > [Security] Review and implement security in ATS v.2 > --- > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355 > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects.
[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665393#comment-15665393 ] Li Lu commented on YARN-5814: - Thanks [~BINGXUE QIU] for the doc! I have some quick questions: 1. According to the Design section, the writer may require Tranquility and/or Kafka as intermediate layers. Are there any issues with taking on these dependencies? 2. For the table design, right now in timeline v.2 a container is not a top-level concept (although it is a top-level concept for YARN). I'm therefore wondering whether it would be helpful to generalize the container table into an entity table, as the HBase implementation does. We could still put container-level data into this table, but perhaps the table need not be limited to containers only? > Add druid as storage backend in YARN Timeline Service > -- > > Key: YARN-5814 > URL: https://issues.apache.org/jira/browse/YARN-5814 > Project: Hadoop YARN > Issue Type: New Feature > Components: ATSv2 >Affects Versions: 3.0.0-alpha2 >Reporter: Bingxue Qiu > Attachments: Add-Druid-in-YARN-Timeline-Service.pdf > > > h3. Introduction > I propose to add druid as storage backend in YARN Timeline Service. > We run more than 6000 applications and generate 450 million metrics daily in > Alibaba Clusters with thousands of nodes. We need to collect and store > meta/events/metrics data, online analyze the utilization reports of various > dimensions and display the trends of allocation/usage resources for cluster > by joining and aggregating data. It helps us to manage and optimize the > cluster by tracking resource utilization. > To achieve our goal we have changed to use druid as the storage instead of > HBase and have achieved sub-second OLAP performance in our production > environment for few months. > h3. 
Analysis > Currently YARN Timeline Service only supports aggregating metrics at a) flow > level by FlowRunCoprocessor and b) application level metrics aggregating by > AppLevelTimelineCollector, offline (time-based periodic) aggregation for > flows/users/queues for reporting and analysis is planned but not yet > implemented. YARN Timeline Service chooses Apache HBase as the primary > storage backend. As we all know that HBase doesn't fit for OLAP. > For arbitrary exploration of data,such as online analyze the utilization > reports of various dimensions(Queue,Flow,Users,Application,CPU,Memory) by > joining and aggregating data, Druid's custom column format enables ad-hoc > queries without pre-computation. The format also enables fast scans on > columns, which is important for good aggregation performance. > To achieve our goal that support to online analyze the utilization reports of > various dimensions, display the variation trends of allocation/usage > resources for cluster, and arbitrary exploration of data, we propose to add > druid storage and implement DruidWriter /DruidReader in YARN Timeline Service. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665337#comment-15665337 ] Li Lu commented on YARN-5814: - Linking this issue to the umbrella JIRA of timeline v.2.