[jira] [Created] (YARN-6535) Program need to exit when SLS finishes.

2017-04-26 Thread Yufei Gu (JIRA)
Yufei Gu created YARN-6535:
--

 Summary: Program need to exit when SLS finishes. 
 Key: YARN-6535
 URL: https://issues.apache.org/jira/browse/YARN-6535
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler-load-simulator
Affects Versions: 3.0.0-alpha2
Reporter: Yufei Gu
Assignee: Yufei Gu


The program needs to exit when SLS finishes, except in unit tests.
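
One way to picture the requested behaviour is sketched below. It is only an illustration, not the SLS code: the class name SimulationRunner, the exitWhenFinished flag and the injected exit handler are all hypothetical. The idea is that the command-line entry point passes System::exit, while unit tests pass a no-op handler so the test JVM keeps running.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical stand-in for the simulator's runner; not the actual SLS class. */
class SimulationRunner {
    private final AtomicInteger remainingApps;
    private final boolean exitWhenFinished;
    private final IntConsumer exitHandler;

    SimulationRunner(int totalApps, boolean exitWhenFinished, IntConsumer exitHandler) {
        this.remainingApps = new AtomicInteger(totalApps);
        this.exitWhenFinished = exitWhenFinished;
        this.exitHandler = exitHandler;
    }

    /** Called once per finished application; returns true if this call completed the run. */
    boolean appFinished() {
        if (remainingApps.decrementAndGet() != 0) {
            return false;
        }
        if (exitWhenFinished) {
            // Assumption: exit code 0 signals a normal end of the simulation.
            exitHandler.accept(0);
        }
        return true;
    }

    /** Command-line wiring: terminate the JVM once the last application finishes. */
    static SimulationRunner forCommandLine(int totalApps) {
        return new SimulationRunner(totalApps, true, System::exit);
    }

    /** Unit-test wiring: never terminate the JVM. */
    static SimulationRunner forUnitTest(int totalApps) {
        return new SimulationRunner(totalApps, false, code -> { });
    }
}
```

Injecting the exit handler keeps the exit decision in one place and lets a test observe the exit code without killing the test process.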



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6534) ResourceManager failed due to TimelineClient try to init SSLFactory even https is not enabled

2017-04-26 Thread Junping Du (JIRA)
Junping Du created YARN-6534:


 Summary: ResourceManager failed due to TimelineClient try to init 
SSLFactory even https is not enabled
 Key: YARN-6534
 URL: https://issues.apache.org/jira/browse/YARN-6534
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha3
Reporter: Junping Du
Priority: Blocker


In a non-secured cluster, the RM consistently fails to start because 
TimelineServiceV1Publisher tries to initialize TimelineClient with an SSLFactory 
without checking whether HTTPS is in use.

{noformat}
2017-04-26 21:09:10,683 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1457)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: java.io.FileNotFoundException: /etc/security/clientKeys/all.jks (No such file or directory)
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:131)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractSystemMetricsPublisher.serviceInit(AbstractSystemMetricsPublisher.java:59)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.serviceInit(TimelineServiceV1Publisher.java:67)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:344)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1453)
Caused by: java.io.FileNotFoundException: /etc/security/clientKeys/all.jks (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:168)
    at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.<init>(ReloadingX509TrustManager.java:86)
    at org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:219)
    at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:179)
    at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.getSSLFactory(TimelineConnector.java:176)
    at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.serviceInit(TimelineConnector.java:106)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    ... 11 more
{noformat}
CC [~rohithsharma] and [~gtCarrera9]
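
For illustration only, the missing check could look roughly like the sketch below. It does not use the Hadoop classes: a plain Map stands in for the configuration, and the class TimelineConnectorSketch and its methods are hypothetical. The sketch assumes the yarn.http.policy setting (HTTP_ONLY or HTTPS_ONLY) decides whether HTTPS is in use, and only runs the SSL initialization, passed in as a Supplier, when it is.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch of guarding SSL setup; not the actual TimelineConnector. */
class TimelineConnectorSketch {
    // Assumption: the HTTP policy setting decides whether HTTPS is in use.
    static final String HTTP_POLICY_KEY = "yarn.http.policy";
    static final String HTTP_ONLY = "HTTP_ONLY";
    static final String HTTPS_ONLY = "HTTPS_ONLY";

    /** Returns true only when the configuration explicitly asks for HTTPS. */
    static boolean useHttps(Map<String, String> conf) {
        return HTTPS_ONLY.equals(conf.getOrDefault(HTTP_POLICY_KEY, HTTP_ONLY));
    }

    /**
     * Runs the (possibly failing) SSL factory initialization only when HTTPS
     * is enabled; otherwise the keystore files are never touched.
     */
    static <T> Optional<T> initSslFactoryIfNeeded(Map<String, String> conf,
                                                  Supplier<T> sslFactoryInit) {
        if (!useHttps(conf)) {
            return Optional.empty();
        }
        return Optional.of(sslFactoryInit.get());
    }
}
```

With such a guard, a non-secured cluster never reaches the code path that opens the keystore, so a missing /etc/security/clientKeys/all.jks would no longer stop the RM from starting.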






Re: About 2.7.4 Release

2017-04-26 Thread Allen Wittenauer

> On Apr 25, 2017, at 12:35 AM, Akira Ajisaka wrote:
> > Maybe we should create a jira to track this?
> 
> I think now either way (reopen or create) is fine.
> 
> Release doc maker creates change logs by fetching information from JIRA, so 
> reopening the tickets should be avoided when a release process is in progress.
> 

Keep in mind that the release documentation is part of the build 
process.  Users who are doing their own builds will have incomplete 
documentation if we keep re-opening JIRAs after a release.  At one point, JIRA 
was configured to refuse re-opening after a release is cut.  I'm not sure why 
it stopped doing that, but it might be time to see if we can re-enable that 
functionality.





[jira] [Created] (YARN-6533) Race condition in writing service record to registry in yarn native services

2017-04-26 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-6533:


 Summary: Race condition in writing service record to registry in 
yarn native services
 Key: YARN-6533
 URL: https://issues.apache.org/jira/browse/YARN-6533
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


The ServiceRecord is written twice, once when the container is initially 
registered and again in the Docker provider once the IP has been obtained for 
the container. These occur asynchronously, so the more important record (the 
one with the IP) can be overwritten by the initial record. Only one record 
needs to be written, so we can stop writing the initial record when the Docker 
provider is being used.
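
A rough sketch of the proposed fix follows. It is an assumption-laden illustration rather than the native services code: ServiceRecordWriter, its two callbacks and the in-memory map standing in for the registry are all hypothetical. When the Docker provider is in use, the registration path skips its write, so the record carrying the IP is the only one written and cannot be overwritten, whatever order the two callbacks run in.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; an in-memory map stands in for the YARN registry. */
class ServiceRecordWriter {
    private final Map<String, String> registry = new ConcurrentHashMap<>();
    private final boolean dockerProvider;

    ServiceRecordWriter(boolean dockerProvider) {
        this.dockerProvider = dockerProvider;
    }

    /**
     * Registration path. Returns true if an initial record was written.
     * With the Docker provider the write is skipped, because the provider
     * writes the single, complete record once the IP is known.
     */
    boolean onContainerRegistered(String containerId) {
        if (dockerProvider) {
            return false;
        }
        registry.put(containerId, "ip=unknown");
        return true;
    }

    /** Docker provider path: writes the record that includes the container IP. */
    void onDockerIpObtained(String containerId, String ip) {
        registry.put(containerId, "ip=" + ip);
    }

    String lookup(String containerId) {
        return registry.get(containerId);
    }
}
```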






[jira] [Resolved] (YARN-6197) CS Leaf queue am usage gets updated for unmanaged AM

2017-04-26 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved YARN-6197.

Resolution: Not A Problem

> CS Leaf queue am usage gets updated for unmanaged AM
> 
>
> Key: YARN-6197
> URL: https://issues.apache.org/jira/browse/YARN-6197
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> In {{LeafQueue#activateApplication()}}, the am_usage for an unmanaged AM is 
> updated with the scheduler minimum allocation size. As a result, the cluster 
> resource/AM limit headroom for other apps in the queue is reduced.
> Solution: the FicaScheduler unManagedAM flag can be used to check the AM type.
> Based on the flag, the queue usage needs to be updated during activation and 
> removal.
> Thoughts?






[jira] [Resolved] (YARN-6531) Check appStateData size before saving to Zookeeper

2017-04-26 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-6531.
-
Resolution: Duplicate

> Check appStateData size before saving to Zookeeper
> --
>
> Key: YARN-6531
> URL: https://issues.apache.org/jira/browse/YARN-6531
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> An application with a large application submission context can cause the 
> store to Zookeeper to fail because of the znode size limit. When the limit is 
> exceeded, Zookeeper throws 
> {{org.apache.zookeeper.KeeperException$ConnectionLossException}}. 
> ZkStateStore retries the configured number of times and then throws 
> ConnectionLossException.
> This can cause the ResourceManager to switch from active to standby, and other 
> submitted applications are not saved to ZK.
> Solution: validate the {{ApplicationStateData}} size before saving and reject 
> the application, so that the ResourceManager is not impacted.






[jira] [Created] (YARN-6532) Allocated container metrics per applciation should be exposed using Yarn & ATS rest APIs

2017-04-26 Thread Ravi Teja Chilukuri (JIRA)
Ravi Teja Chilukuri created YARN-6532:
-

 Summary: Allocated container metrics per applciation should be 
exposed using Yarn & ATS rest APIs
 Key: YARN-6532
 URL: https://issues.apache.org/jira/browse/YARN-6532
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: ATSv2, resourcemanager, restapi
Reporter: Ravi Teja Chilukuri


Currently, *allocatedMB* and *allocatedVCores* are exposed by the RM and 
ATS REST APIs, but allocatedContainers is not exposed per application.

This metric can be exposed as an additional parameter in the existing REST APIs.

*RM:*  http:///ws/v1/cluster/apps/{appid}
*ATS:* http(s):///ws/v1/applicationhistory/apps/{appid}


This would be essential for application types like TEZ, where containers are 
re-used.






[jira] [Created] (YARN-6531) Sanity check appStateData size before saving to Zookeeper

2017-04-26 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-6531:
--

 Summary: Sanity check appStateData size before saving to Zookeeper
 Key: YARN-6531
 URL: https://issues.apache.org/jira/browse/YARN-6531
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical


An application with a large application submission context can cause the store 
to Zookeeper to fail because of the znode size limit. When the limit is 
exceeded, Zookeeper throws 
{{org.apache.zookeeper.KeeperException$ConnectionLossException}}. 
ZkStateStore retries the configured number of times and then throws 
ConnectionLossException.
This can cause the ResourceManager to switch from active to standby, and other 
submitted applications are not saved to ZK.
Solution: validate the {{ApplicationStateData}} size before saving and reject 
the application, so that the ResourceManager is not impacted.
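
The proposed validation could be sketched as below. This is a minimal illustration under stated assumptions, not the state store code: AppStateSizeValidator is a hypothetical class, the serialized state is just a byte array, and the 1 MB default used here is an assumed stand-in for whatever znode limit the cluster is configured with.

```java
/** Hypothetical sketch of a pre-save size check; not the actual state store. */
class AppStateSizeValidator {
    // Assumption: about 1 MB, standing in for the configured znode size limit.
    static final int DEFAULT_MAX_BYTES = 1024 * 1024;

    private final int maxBytes;

    AppStateSizeValidator(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    /**
     * Rejects an application whose serialized state would not fit in a znode,
     * before any write to the store is attempted.
     */
    void validate(String appId, byte[] serializedState) {
        if (serializedState.length > maxBytes) {
            throw new IllegalArgumentException(
                "Application " + appId + " state is " + serializedState.length
                    + " bytes, which exceeds the limit of " + maxBytes + " bytes");
        }
    }
}
```

Failing fast on the submission path means only the oversized application is rejected; the store never enters the retry loop that ends with the ResourceManager going to standby.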


