[jira] [Created] (YARN-6535) Program needs to exit when SLS finishes.

2017-04-26 Yufei Gu (JIRA)
Yufei Gu created YARN-6535:
--

 Summary: Program needs to exit when SLS finishes.
 Key: YARN-6535
 URL: https://issues.apache.org/jira/browse/YARN-6535
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler-load-simulator
Affects Versions: 3.0.0-alpha2
Reporter: Yufei Gu
Assignee: Yufei Gu


The program needs to exit when SLS finishes, except in unit tests.
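A minimal sketch of the idea (illustrative names, not the actual patch): gate the JVM exit on a flag so that unit tests, which drive the simulator in-process, are not killed.

{code:java}
// Hedged sketch, not the committed patch. The exitAtFinish flag is
// hypothetical: true for the command-line runner, false in unit tests.
public class SimulatorDriver {
  private final boolean exitAtFinish;

  public SimulatorDriver(boolean exitAtFinish) {
    this.exitAtFinish = exitAtFinish;
  }

  public void onSimulationFinished() {
    // ... flush metrics and stop simulator threads ...
    if (exitAtFinish) {
      System.exit(0); // terminate any lingering non-daemon threads
    }
  }
}
{code}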






[jira] [Created] (YARN-6534) ResourceManager fails because TimelineClient tries to init SSLFactory even when https is not enabled

2017-04-26 Junping Du (JIRA)
Junping Du created YARN-6534:


 Summary: ResourceManager fails because TimelineClient tries to init SSLFactory even when https is not enabled
 Key: YARN-6534
 URL: https://issues.apache.org/jira/browse/YARN-6534
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha3
Reporter: Junping Du
Priority: Blocker


In a non-secured cluster, the RM consistently fails to start because TimelineServiceV1Publisher initializes TimelineClient with an SSLFactory without first checking whether https is in use.

{noformat}
2017-04-26 21:09:10,683 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1457)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: java.io.FileNotFoundException: /etc/security/clientKeys/all.jks (No such file or directory)
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:131)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractSystemMetricsPublisher.serviceInit(AbstractSystemMetricsPublisher.java:59)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.serviceInit(TimelineServiceV1Publisher.java:67)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:344)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1453)
Caused by: java.io.FileNotFoundException: /etc/security/clientKeys/all.jks (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:168)
        at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.<init>(ReloadingX509TrustManager.java:86)
        at org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:219)
        at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:179)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.getSSLFactory(TimelineConnector.java:176)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.serviceInit(TimelineConnector.java:106)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 11 more
{noformat}
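A sketch of the kind of guard that would avoid this (illustrative only, not the committed fix; SSLFactory and YarnConfiguration.useHttps are existing Hadoop APIs):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative sketch, not the committed YARN-6534 patch: create the
// SSLFactory (which opens the keystore/truststore files) only when the
// cluster actually runs https.
class GuardedTimelineConnector {
  private SSLFactory sslFactory;

  void init(Configuration conf) throws Exception {
    if (YarnConfiguration.useHttps(conf)) {
      sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
      sslFactory.init();
    }
    // In a plain-http cluster we never touch /etc/security/clientKeys.
  }
}
{code}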
CC [~rohithsharma] and [~gtCarrera9]






Re: About 2.7.4 Release

2017-04-26 Allen Wittenauer

> On Apr 25, 2017, at 12:35 AM, Akira Ajisaka wrote:
> > Maybe we should create a jira to track this?
> 
> I think now either way (reopen or create) is fine.
> 
> Release doc maker creates change logs by fetching information from JIRA, so 
> reopening the tickets should be avoided when a release process is in progress.
> 

Keep in mind that the release documentation is part of the build 
process.  Users who are doing their own builds will have incomplete 
documentation if we keep re-opening JIRAs after a release.  At one point, JIRA 
was configured to refuse re-opening after a release is cut.  I'm not sure why 
it stopped doing that, but it might be time to see if we can re-enable that 
functionality.





[jira] [Created] (YARN-6533) Race condition in writing service record to registry in yarn native services

2017-04-26 Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-6533:


 Summary: Race condition in writing service record to registry in 
yarn native services
 Key: YARN-6533
 URL: https://issues.apache.org/jira/browse/YARN-6533
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


The ServiceRecord is written twice, once when the container is initially 
registered and again in the Docker provider once the IP has been obtained for 
the container. These occur asynchronously, so the more important record (the 
one with the IP) can be overwritten by the initial record. Only one record 
needs to be written, so we can stop writing the initial record when the Docker 
provider is being used.
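A minimal sketch of the proposed behavior, with a hypothetical RegistryClient interface standing in for the real registry calls (not the actual patch):

{code:java}
import java.io.IOException;
import org.apache.hadoop.registry.client.types.ServiceRecord;

// Hedged sketch with a hypothetical RegistryClient interface: skip the
// initial write when the Docker provider will publish the IP-bearing
// record later, removing the lost-update race between the two writes.
class RecordWriter {
  interface RegistryClient { // hypothetical stand-in for the real client
    void putServiceRecord(String containerId, ServiceRecord record)
        throws IOException;
  }

  private final RegistryClient registry;

  RecordWriter(RegistryClient registry) {
    this.registry = registry;
  }

  void onContainerRegistered(String containerId, ServiceRecord initial,
      boolean dockerProviderInUse) throws IOException {
    if (dockerProviderInUse) {
      return; // Docker provider writes the record once the IP is known
    }
    registry.putServiceRecord(containerId, initial);
  }
}
{code}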






[jira] [Resolved] (YARN-6531) Check appStateData size before saving to Zookeeper

2017-04-26 Rohith Sharma K S (JIRA)

[ https://issues.apache.org/jira/browse/YARN-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S resolved YARN-6531.
-
Resolution: Duplicate

> Check appStateData size before saving to Zookeeper
> --
>
> Key: YARN-6531
> URL: https://issues.apache.org/jira/browse/YARN-6531
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> An application with a very large application submission context can make the
> store to ZooKeeper fail because of the znode size limit. The failure surfaces
> as {{org.apache.zookeeper.KeeperException$ConnectionLossException}}: the ZK
> state store retries the configured number of times and then throws
> ConnectionLossException.
> This can cause the ResourceManager to fail over from active to standby, and
> other submitted applications are not saved to ZK.
> Proposed solution: validate the {{ApplicationStateData}} size before saving
> and reject oversized applications so that the ResourceManager is not impacted.






[jira] [Created] (YARN-6532) Allocated container metrics per application should be exposed using YARN and ATS REST APIs

2017-04-26 Ravi Teja Chilukuri (JIRA)
Ravi Teja Chilukuri created YARN-6532:
-

 Summary: Allocated container metrics per application should be exposed using YARN and ATS REST APIs
 Key: YARN-6532
 URL: https://issues.apache.org/jira/browse/YARN-6532
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: ATSv2, resourcemanager, restapi
Reporter: Ravi Teja Chilukuri


Currently *allocatedMB* and *allocatedVCores* are exposed by the RM and ATS REST APIs, but *allocatedContainers* is not exposed per application.

This metric can be exposed as an additional param in the existing REST APIs:

*RM:*  http://<rm-address:port>/ws/v1/cluster/apps/{appid}
*ATS:* http(s)://<timeline-server-address:port>/ws/v1/applicationhistory/apps/{appid}

This would be essential for application types such as Tez, where containers are re-used.
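For illustration, the RM response might then look like this (the *allocatedContainers* field name is the proposal here; the application id is made up, and the other fields already exist in the apps API):

{noformat}
{
  "app": {
    "id": "application_1493200000000_0001",
    "allocatedMB": 8192,
    "allocatedVCores": 4,
    "allocatedContainers": 12
  }
}
{noformat}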






[jira] [Created] (YARN-6531) Sanity check appStateData size before saving to Zookeeper

2017-04-26 Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-6531:
--

 Summary: Sanity check appStateData size before saving to Zookeeper
 Key: YARN-6531
 URL: https://issues.apache.org/jira/browse/YARN-6531
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical


An application with a very large application submission context can make the
store to ZooKeeper fail because of the znode size limit. The failure surfaces
as {{org.apache.zookeeper.KeeperException$ConnectionLossException}}: the ZK
state store retries the configured number of times and then throws
ConnectionLossException.
This can cause the ResourceManager to fail over from active to standby, and
other submitted applications are not saved to ZK.
Proposed solution: validate the {{ApplicationStateData}} size before saving
and reject oversized applications so that the ResourceManager is not impacted.
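A minimal sketch of such a check (illustrative names; the 1 MB default mirrors ZooKeeper's jute.maxbuffer limit; not a committed patch):

{code:java}
// Hedged sketch, illustrative names only. ZooKeeper rejects znodes larger
// than jute.maxbuffer (1 MB by default), so validate the serialized size
// up front and reject the application instead of letting the whole state
// store fail.
public final class AppStateSizeCheck {
  // hypothetical limit; in practice this would be configurable
  private static final int MAX_ZNODE_SIZE_BYTES = 1024 * 1024;

  public static void validate(String appId, byte[] serializedAppState) {
    if (serializedAppState.length > MAX_ZNODE_SIZE_BYTES) {
      throw new IllegalArgumentException("ApplicationStateData for " + appId
          + " is " + serializedAppState.length + " bytes, which exceeds the"
          + " state-store limit; reject the application rather than"
          + " destabilize the ResourceManager.");
    }
  }
}
{code}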






Re: About 2.7.4 Release

2017-04-26 Akira Ajisaka

Okay. If you file a jira and attach a patch, I'll review it.

-Akira

On 2017/04/25 22:15, Brahma Reddy Battula wrote:

Looks like the following JIRAs are not updated in CHANGES.txt:

HADOOP-14066, HDFS-11608, HADOOP-14293, HDFS-11628, YARN-6274, YARN-6152, HADOOP-13119, HDFS-10733, HADOOP-13958, HDFS-11280, YARN-6024.

Maybe we can raise one JIRA to track this?


--Brahma Reddy Battula

-Original Message-
From: Akira Ajisaka [mailto:aajis...@apache.org]
Sent: 25 April 2017 15:36
To: Haohui Mai
Cc: Brahma Reddy Battula; Andrew Wang; Sangjin Lee; Vinod Kumar Vavilapalli; 
Marton Elek; Hadoop Common; yarn-dev@hadoop.apache.org; Hdfs-dev; 
mapreduce-...@hadoop.apache.org
Subject: Re: About 2.7.4 Release

> It would be great to backport HDFS-9710 to 2.7.4 as this is one of the
> critical fixes on scalability.

Sounds good.

> Maybe we should create a jira to track this?

I think now either way (reopen or create) is fine.

Release doc maker creates change logs by fetching information from JIRA, so 
reopening the tickets should be avoided when a release process is in progress.

The issues HDFS-9710 and HDFS-9726 have been fixed in 2.8.0 and 3.0.0-alpha1, and both versions have been released, so reopening these issues does not affect the release doc maker.

-Akira

On 2017/04/25 16:21, Haohui Mai wrote:

It would be great to backport HDFS-9710 to 2.7.4 as this is one of the
critical fixes on scalability. Maybe we should create a jira to track
this?

~Haohui

On Tue, Apr 25, 2017 at 12:06 AM, Akira Ajisaka wrote:

Ping

I too can help with the release process.

Now there are 0 blocker and 6 critical issues targeted for 2.7.4.
https://s.apache.org/HsIu

If there are critical/blocker issues that need to be fixed in
branch-2.7, please set Target Version/s to 2.7.4. That way the issues
can be found by the above query.

I'll check if there are conflicts among JIRA, git commit log, and the
change logs.

Regards,
Akira


On 2017/04/18 15:40, Brahma Reddy Battula wrote:


Hi All

Any update on 2.7.4? Gentle reminder! Let me know if there is anything I
can help with.



Regards
Brahma Reddy Battula

-Original Message-
From: Andrew Wang [mailto:andrew.w...@cloudera.com]
Sent: 08 March 2017 04:22
To: Sangjin Lee
Cc: Marton Elek; Hadoop Common; yarn-dev@hadoop.apache.org;
Hdfs-dev; mapreduce-...@hadoop.apache.org
Subject: Re: About 2.7.4 Release

Our release steps are documented on the wiki:

2.6/2.7:

https://wiki.apache.org/hadoop/HowToReleasePreDSBCR

2.8+:
https://wiki.apache.org/hadoop/HowToRelease

I think given the push toward 2.8 and 3.0, there's less interest in
streamlining the 2.6 and 2.7 release processes. CHANGES.txt is the
biggest pain, and that's fixed in 2.8+.

Current pain points for 2.8+ include:

# fixing up JIRA versions and the release notes, though I somewhat
addressed this with the versions script for 3.x
# making and staging an RC and sending the vote email still requires a lot
of manual steps
# publishing the release is also quite manual

I think the RC issues can be attacked with enough scripting. Steve
had an ant file that automated a lot of this for slider. I think
it'd be nice to have a nightly Jenkins job that builds an RC, since
I've spent a day or two for each 3.x alpha fixing build issues.

Publishing can be attacked via a mix of scripting and revamping the
darned website. Forrest is pretty bad compared to the newer static
site generators out there (e.g. need to write XML instead of
markdown, it's hard to review a staging site because of all the
absolute links, hard to customize, did I mention XML?), and the look
and feel of the site is from the 00s. We don't actually have that
much site content, so it should be possible to migrate to a new system.

On Tue, Mar 7, 2017 at 9:13 AM, Sangjin Lee wrote:


I don't think there should be any linkage between releasing 2.8.0
and 2.7.4. If we have a volunteer for releasing 2.7.4, we should go
full speed ahead. We still need a volunteer from a PMC member or a
committer as some tasks may require certain privileges, but I don't
think it precludes working with others to close down the release.

I for one would like to see more frequent releases, and being able
to automate release steps more would go a long way.

On Tue, Mar 7, 2017 at 2:16 AM, Marton Elek wrote:


Is there any reason to wait for 2.8 with 2.7.4?

Unfortunately, the previous thread about release cadence ended without a
final decision. But if I understood correctly, there was more or less an
agreement that it would be great to achieve more frequent releases, if
possible (with or without written rules and an EOL policy).

I personally prefer to stay closer to the scheduling part of the
proposal:

"A minor release on the latest major line should be every 6
months, and a maintenance release on a minor release (as there may
be concurrently maintained minor releases) every 2 months".

I don't know what is the hardest