[jira] [Created] (YARN-6535) Program needs to exit when SLS finishes.
Yufei Gu created YARN-6535:
------------------------------

             Summary: Program needs to exit when SLS finishes.
                 Key: YARN-6535
                 URL: https://issues.apache.org/jira/browse/YARN-6535
             Project: Hadoop YARN
          Issue Type: Bug
          Components: scheduler-load-simulator
    Affects Versions: 3.0.0-alpha2
            Reporter: Yufei Gu
            Assignee: Yufei Gu

The program needs to exit when SLS finishes, except in unit tests.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
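The fix direction can be sketched in plain Java. This is a minimal illustration only, not the actual SLS code; the class name `SlsRunner` and the `exitOnFinish` flag are hypothetical stand-ins for however the patch distinguishes production runs from unit tests:

```java
// Hedged sketch: terminate the JVM when the simulation ends, unless the
// runner was created in a unit test (where System.exit would kill the
// test JVM). Names here are illustrative, not from the SLS source.
public class SlsRunner {
    // Unit tests would construct the runner with exitOnFinish = false.
    private final boolean exitOnFinish;

    public SlsRunner(boolean exitOnFinish) {
        this.exitOnFinish = exitOnFinish;
    }

    /**
     * Returns the exit code instead of calling System.exit unconditionally,
     * which keeps the logic testable.
     */
    int finish(boolean simulationSucceeded) {
        int code = simulationSucceeded ? 0 : 1;
        if (exitOnFinish) {
            // Production path: force exit so lingering non-daemon threads
            // (simulator pools, scheduler threads) cannot keep the JVM alive.
            System.exit(code);
        }
        return code; // Test path: just report the code.
    }

    public static void main(String[] args) {
        SlsRunner runner = new SlsRunner(false);
        System.out.println(runner.finish(true)); // prints 0
    }
}
```

Returning the code from `finish` rather than exiting inline is one common way to keep such shutdown logic unit-testable.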
[jira] [Created] (YARN-6534) ResourceManager fails because TimelineClient tries to init SSLFactory even when https is not enabled
Junping Du created YARN-6534:
--------------------------------

             Summary: ResourceManager fails because TimelineClient tries to init SSLFactory even when https is not enabled
                 Key: YARN-6534
                 URL: https://issues.apache.org/jira/browse/YARN-6534
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.0.0-alpha3
            Reporter: Junping Du
            Priority: Blocker

In a non-secured cluster, the RM fails consistently because TimelineServiceV1Publisher initializes TimelineClient with an SSLFactory without checking whether https is in use.

{noformat}
2017-04-26 21:09:10,683 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1457)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: java.io.FileNotFoundException: /etc/security/clientKeys/all.jks (No such file or directory)
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:131)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractSystemMetricsPublisher.serviceInit(AbstractSystemMetricsPublisher.java:59)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.serviceInit(TimelineServiceV1Publisher.java:67)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:344)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1453)
Caused by: java.io.FileNotFoundException: /etc/security/clientKeys/all.jks (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:168)
        at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.<init>(ReloadingX509TrustManager.java:86)
        at org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:219)
        at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:179)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.getSSLFactory(TimelineConnector.java:176)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.serviceInit(TimelineConnector.java:106)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 11 more
{noformat}

CC [~rohithsharma] and [~gtCarrera9]
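The fix direction suggested by the report is to guard SSL factory construction on the configured HTTP policy. The sketch below models that guard in plain Java: the property key mirrors yarn.http.policy, but the class name, the stand-in configuration map, and the method names are all illustrative, not Hadoop's actual TimelineConnector code:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: only initialize an SSL factory when the cluster is
// actually configured for HTTPS. In the reported bug this check was
// missing, so a non-secured RM tried to load
// /etc/security/clientKeys/all.jks and crashed at startup.
public class TimelineConnectorSketch {
    // Mirrors the yarn.http.policy setting; the Map stands in for a
    // real Hadoop Configuration object.
    static final String HTTP_POLICY_KEY = "yarn.http.policy";

    static boolean useHttps(Map<String, String> conf) {
        return "HTTPS_ONLY".equals(
            conf.getOrDefault(HTTP_POLICY_KEY, "HTTP_ONLY"));
    }

    /** Returns true only when an SSL factory should be initialized. */
    static boolean shouldInitSslFactory(Map<String, String> conf) {
        // The guard that was missing: skip keystore loading entirely
        // unless HTTPS is in use.
        return useHttps(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // non-secured cluster
        System.out.println(shouldInitSslFactory(conf)); // prints false
    }
}
```

With the guard in place, a default (HTTP-only) configuration never touches the keystore path, which is what the stack trace shows failing.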
Re: About 2.7.4 Release
> On Apr 25, 2017, at 12:35 AM, Akira Ajisaka wrote:
>
> > Maybe we should create a jira to track this?
>
> I think now either way (reopen or create) is fine.
>
> Release doc maker creates change logs by fetching information from JIRA, so
> reopening the tickets should be avoided when a release process is in progress.

Keep in mind that the release documentation is part of the build process. Users who are doing their own builds will have incomplete documentation if we keep re-opening JIRAs after a release.

At one point, JIRA was configured to refuse re-opening after a release is cut. I'm not sure why it stopped doing that, but it might be time to see if we can re-enable that functionality.
[jira] [Created] (YARN-6533) Race condition in writing service record to registry in yarn native services
Billie Rinaldi created YARN-6533:
------------------------------------

             Summary: Race condition in writing service record to registry in yarn native services
                 Key: YARN-6533
                 URL: https://issues.apache.org/jira/browse/YARN-6533
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Billie Rinaldi
            Assignee: Billie Rinaldi

The ServiceRecord is written twice: once when the container is initially registered, and again in the Docker provider once the IP has been obtained for the container. These writes occur asynchronously, so the more important record (the one with the IP) can be overwritten by the initial record. Only one record needs to be written, so we can stop writing the initial record when the Docker provider is being used.
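The proposed fix — skip the initial write when a provider will publish a complete record later — can be sketched in plain Java. The class, the in-memory map standing in for the registry, and the record strings are all hypothetical illustrations, not the YARN native-services code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hedged sketch: when the provider (e.g. Docker) supplies the container
// IP asynchronously, the initial registration write is skipped entirely,
// so a late-arriving initial write can never clobber the IP-bearing record.
public class RegistryWriterSketch {
    // Stand-in for the YARN service registry.
    private final ConcurrentMap<String, String> registry =
        new ConcurrentHashMap<>();
    // True for providers like Docker that publish the record once the
    // container IP is known.
    private final boolean providerSuppliesIp;

    RegistryWriterSketch(boolean providerSuppliesIp) {
        this.providerSuppliesIp = providerSuppliesIp;
    }

    void onContainerRegistered(String containerId) {
        if (providerSuppliesIp) {
            // The fix: don't write a record that the later IP-bearing
            // write would race with.
            return;
        }
        registry.put(containerId, "record-without-ip");
    }

    void onIpObtained(String containerId, String ip) {
        registry.put(containerId, "record-with-ip:" + ip);
    }

    String recordFor(String containerId) {
        return registry.get(containerId);
    }
}
```

Because the two callbacks run asynchronously, eliminating the first write is simpler and safer than trying to order them.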
[jira] [Resolved] (YARN-6531) Check appStateData size before saving to Zookeeper
[ https://issues.apache.org/jira/browse/YARN-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S resolved YARN-6531.
-------------------------------------
    Resolution: Duplicate

> Check appStateData size before saving to Zookeeper
> --------------------------------------------------
>
>                 Key: YARN-6531
>                 URL: https://issues.apache.org/jira/browse/YARN-6531
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>
> An application with a large ApplicationSubmissionContext can cause the store
> to ZooKeeper to fail due to the znode size limit; ZooKeeper throws
> {{org.apache.zookeeper.KeeperException$ConnectionLossException}}. The
> ZKStateStore retries the configured number of times and then rethrows
> ConnectionLossException, which can cause the ResourceManager to switch from
> active to standby, and other submitted applications are not saved to ZK.
> Solution: validate the {{ApplicationStateData}} size before saving and
> reject the application so that the ResourceManager is not impacted.
[jira] [Created] (YARN-6532) Allocated container metrics per application should be exposed using Yarn & ATS rest APIs
Ravi Teja Chilukuri created YARN-6532:
-----------------------------------------

             Summary: Allocated container metrics per application should be exposed using Yarn & ATS rest APIs
                 Key: YARN-6532
                 URL: https://issues.apache.org/jira/browse/YARN-6532
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: ATSv2, resourcemanager, restapi
            Reporter: Ravi Teja Chilukuri

Currently *allocatedMB* and *allocatedVCores* are exposed by the RM and ATS rest APIs, but allocatedContainers is not exposed per application. This metric can be exposed as an additional param in the existing rest APIs.

*RM:* http:///ws/v1/cluster/apps/{appid}
*ATS:* http(s):///ws/v1/applicationhistory/apps/{appid}

This would be essential for application types like TEZ, where there is container re-use.
[jira] [Created] (YARN-6531) Sanity check appStateData size before saving to Zookeeper
Bibin A Chundatt created YARN-6531:
--------------------------------------

             Summary: Sanity check appStateData size before saving to Zookeeper
                 Key: YARN-6531
                 URL: https://issues.apache.org/jira/browse/YARN-6531
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Bibin A Chundatt
            Assignee: Bibin A Chundatt
            Priority: Critical

An application with a large ApplicationSubmissionContext can cause the store to ZooKeeper to fail due to the znode size limit; ZooKeeper throws {{org.apache.zookeeper.KeeperException$ConnectionLossException}}. The ZKStateStore retries the configured number of times and then rethrows ConnectionLossException, which can cause the ResourceManager to switch from active to standby, and other submitted applications are not saved to ZK.

Solution: validate the {{ApplicationStateData}} size before saving and reject the application so that the ResourceManager is not impacted.
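The proposed check can be sketched as follows. ZooKeeper's jute.maxbuffer defaults to roughly 1 MB, so the sketch measures the serialized state against a limit with some headroom before writing; the class name, the headroom value, and the exception choice are illustrative assumptions, not the actual patch:

```java
// Hedged sketch: measure serialized ApplicationStateData before writing
// to ZooKeeper and reject the application up front, instead of letting
// the ZK state store retry, throw ConnectionLossException, and trigger
// an RM failover. Names and constants here are illustrative.
public class ZnodeSizeCheck {
    // Leave headroom under ZooKeeper's ~1 MB jute.maxbuffer default to
    // account for request framing overhead.
    static final int DEFAULT_MAX_STATE_BYTES = 1024 * 1024 - 1024;

    /** True when the serialized state is small enough to store safely. */
    static boolean fitsZnode(byte[] appStateData, int maxBytes) {
        return appStateData.length <= maxBytes;
    }

    /** Reject oversized state before it ever reaches the ZK store. */
    static void validateSize(byte[] appStateData, int maxBytes) {
        if (!fitsZnode(appStateData, maxBytes)) {
            throw new IllegalArgumentException(
                "ApplicationStateData is " + appStateData.length
                + " bytes, exceeding the znode limit of " + maxBytes
                + " bytes; rejecting the application");
        }
    }

    public static void main(String[] args) {
        // A small submission context passes validation silently.
        validateSize(new byte[4096], DEFAULT_MAX_STATE_BYTES);
        System.out.println("small state accepted");
    }
}
```

Failing fast at submission time keeps a single oversized application from destabilizing the ResourceManager for everyone else.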
Re: About 2.7.4 Release
Okay. If you file a jira and attach a patch, I'll review it.

-Akira

On 2017/04/25 22:15, Brahma Reddy Battula wrote:
> Looks like the following JIRAs are not updated in CHANGES.txt:
> HADOOP-14066, HDFS-11608, HADOOP-14293, HDFS-11628, YARN-6274, YARN-6152,
> HADOOP-13119, HDFS-10733, HADOOP-13958, HDFS-11280, YARN-6024.
> Maybe we can raise one JIRA to track this?
>
> --Brahma Reddy Battula

-----Original Message-----
From: Akira Ajisaka [mailto:aajis...@apache.org]
Sent: 25 April 2017 15:36
To: Haohui Mai
Cc: Brahma Reddy Battula; Andrew Wang; Sangjin Lee; Vinod Kumar Vavilapalli; Marton Elek; Hadoop Common; yarn-dev@hadoop.apache.org; Hdfs-dev; mapreduce-...@hadoop.apache.org
Subject: Re: About 2.7.4 Release

> It would be great to backport HDFS-9710 to 2.7.4 as this is one of the
> critical fixes on scalability.

Sounds good.

> Maybe we should create a jira to track this?

I think now either way (reopen or create) is fine.

The release doc maker creates change logs by fetching information from JIRA, so reopening tickets should be avoided while a release process is in progress. HDFS-9710 (and HDFS-9726) have been fixed in 2.8.0 and 3.0.0-alpha1, and both versions have been released, so reopening this issue does not affect the release doc maker.

-Akira

On 2017/04/25 16:21, Haohui Mai wrote:
> It would be great to backport HDFS-9710 to 2.7.4 as this is one of the
> critical fixes on scalability. Maybe we should create a jira to track this?
>
> ~Haohui

On Tue, Apr 25, 2017 at 12:06 AM, Akira Ajisaka wrote:
> Ping. I too can help with the release process. There are now 0 blocker and
> 6 critical issues targeted for 2.7.4: https://s.apache.org/HsIu
> If there are critical/blocker issues that need to be fixed in branch-2.7,
> please set Target Version/s to 2.7.4 so the issues can be found by the
> above query. I'll check if there are conflicts among JIRA, the git commit
> log, and the change logs.
>
> Regards,
> Akira

On 2017/04/18 15:40, Brahma Reddy Battula wrote:
> Hi All
>
> Any update on 2.7.4? Gentle reminder!
> Let me know if there is anything I can help with.
>
> Regards
> Brahma Reddy Battula

-----Original Message-----
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: 08 March 2017 04:22
To: Sangjin Lee
Cc: Marton Elek; Hadoop Common; yarn-dev@hadoop.apache.org; Hdfs-dev; mapreduce-...@hadoop.apache.org
Subject: Re: About 2.7.4 Release

Our release steps are documented on the wiki:
2.6/2.7: https://wiki.apache.org/hadoop/HowToReleasePreDSBCR
2.8+: https://wiki.apache.org/hadoop/HowToRelease

I think given the push toward 2.8 and 3.0, there's less interest in streamlining the 2.6 and 2.7 release processes. CHANGES.txt is the biggest pain, and that's fixed in 2.8+. Current pain points for 2.8+ include:

# fixing up JIRA versions and the release notes, though I somewhat addressed this with the versions script for 3.x
# making and staging an RC and sending the vote email still requires a lot of manual steps
# publishing the release is also quite manual

I think the RC issues can be attacked with enough scripting. Steve had an ant file that automated a lot of this for slider. I think it'd be nice to have a nightly Jenkins job that builds an RC, since I've spent a day or two for each 3.x alpha fixing build issues.

Publishing can be attacked via a mix of scripting and revamping the darned website. Forrest is pretty bad compared to the newer static site generators out there (e.g. you need to write XML instead of markdown, it's hard to review a staging site because of all the absolute links, it's hard to customize, did I mention XML?), and the look and feel of the site is from the 00s. We don't actually have that much site content, so it should be possible to migrate to a new system.

On Tue, Mar 7, 2017 at 9:13 AM, Sangjin Lee wrote:
> I don't think there should be any linkage between releasing 2.8.0 and
> 2.7.4. If we have a volunteer for releasing 2.7.4, we should go full speed
> ahead. We still need a volunteer from a PMC member or a committer as some
> tasks may require certain privileges, but I don't think it precludes
> working with others to close down the release. I for one would like to see
> more frequent releases, and being able to automate release steps more
> would go a long way.

On Tue, Mar 7, 2017 at 2:16 AM, Marton Elek wrote:
> Is there any reason to wait for 2.8 with 2.7.4?
>
> Unfortunately the previous thread about release cadence ended without a
> final decision. But if I understood well, there was more or less an
> agreement that it would be great to achieve more frequent releases, if
> possible (with or without written rules and an EOL policy). I personally
> prefer to be closer to the scheduling part of the proposal: "A minor
> release on the latest major line should be every 6 months, and a
> maintenance release on a minor release (as there may be concurrently
> maintained minor releases) every 2 months."
>
> I don't know what is the hardest