[jira] [Commented] (YARN-6413) Decouple Yarn Registry API from ZK

2017-06-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048702#comment-16048702
 ] 

Jian He commented on YARN-6413:
---

[~ellenfkh], there are a lot of format-only changes; can you clean those up?

> Decouple Yarn Registry API from ZK
> --
>
> Key: YARN-6413
> URL: https://issues.apache.org/jira/browse/YARN-6413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: amrmproxy, api, resourcemanager
>Reporter: Ellen Hui
>Assignee: Ellen Hui
> Attachments: 0001-Registry-API-v2.patch, 
> 0001-WIP-Registry-API-v2.patch
>
>
> Right now the Yarn Registry API (defined in the RegistryOperations interface) 
> is a very thin layer over Zookeeper. This jira proposes changing the 
> interface to abstract away the implementation details so that we can write a 
> FS-based implementation of the registry service, which will be used to 
> support AMRMProxy HA.
> The new interface will use register/delete/resolve APIs instead of 
> Zookeeper-specific operations like mknode. 
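A minimal sketch of what a storage-agnostic registry interface could look like, assuming records are passed around as serialized strings. The names ServiceRegistry, InMemoryServiceRegistry, RegistryDemo and the example path are hypothetical and are not the API in the attached patch. The point is only that callers depend on register/resolve/delete, while the backend (ZooKeeper, a FileSystem, or, here, an in-memory map) stays swappable.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage-agnostic registry API: callers only see register/resolve/delete,
// never ZooKeeper concepts such as znodes or mknode.
interface ServiceRegistry {
    // Binds a serialized service record to a registry path, replacing any existing record.
    void register(String path, String record);

    // Returns the record bound to the path, or empty if nothing is registered there.
    Optional<String> resolve(String path);

    // Removes the record at the path; returns true if a record was actually removed.
    boolean delete(String path);
}

// Hypothetical in-memory backend. A ZooKeeper- or FileSystem-based backend could
// implement the same interface without changing any callers.
final class InMemoryServiceRegistry implements ServiceRegistry {
    private final Map<String, String> records = new ConcurrentHashMap<>();

    @Override
    public void register(String path, String record) {
        records.put(path, record);
    }

    @Override
    public Optional<String> resolve(String path) {
        return Optional.ofNullable(records.get(path));
    }

    @Override
    public boolean delete(String path) {
        return records.remove(path) != null;
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        ServiceRegistry registry = new InMemoryServiceRegistry();
        String path = "/users/alice/services/example"; // hypothetical path
        registry.register(path, "{\"endpoint\":\"host1:1234\"}");
        System.out.println(registry.resolve(path).orElse("<none>"));
        registry.delete(path);
        System.out.println(registry.resolve(path).isPresent());
    }
}
```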



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the Fair Scheduler.
Although there is plenty of free resource in the ResourceManager, 46 applications 
are pending. 
Those applications still could not run after several hours, and in the end I had to 
stop them.

I reproduced the scenario in my test environment and found a bug in FSLeafQueue. 
In an extreme scenario, FSLeafQueue#amResourceUsage can become greater than its 
real value.
When the Fair Scheduler tries to assign a container to an application attempt, it 
performs the following check:

!screenshot-2.png!
!screenshot-3.png!

Because the accounting is wrong, the value of FSLeafQueue#amResourceUsage ends up 
larger than its real value.
Once amResourceUsage exceeds Resources.multiply(getFairShare(), maxAMShare), 
FSLeafQueue#canRunAppAM returns false, and the Fair Scheduler does not assign a 
container to the FSAppAttempt. 
In this scenario, every application attempt stays pending and never gets any 
resources.
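To make the check concrete, here is a simplified sketch that reduces resources to a single memory dimension. LeafQueueModel and AmShareCheckDemo are hypothetical stand-ins, not the real FSLeafQueue code (which appears only in the screenshots). The sketch assumes the check compares the current AM usage plus the new AM's resource against Resources.multiply(getFairShare(), maxAMShare), and that a maxAMShare of -1.0 disables the check.

```java
// Hypothetical, memory-only model of the per-queue AM share check (not the real FSLeafQueue).
final class LeafQueueModel {
    private final long fairShareMb;   // queue fair share in MB (single resource dimension, assumption)
    private final float maxAMShare;   // fraction of the fair share that AM containers may use
    long amResourceUsageMb;           // running total of AM container memory in this queue

    LeafQueueModel(long fairShareMb, float maxAMShare) {
        this.fairShareMb = fairShareMb;
        this.maxAMShare = maxAMShare;
    }

    // Compares current AM usage plus the new AM against fairShare * maxAMShare.
    boolean canRunAppAM(long amResourceMb) {
        if (Math.abs(maxAMShare - (-1.0f)) < 0.0001f) {
            return true; // a maxAMShare of -1.0 disables the check
        }
        long maxAMResourceMb = (long) (fairShareMb * maxAMShare);
        return amResourceUsageMb + amResourceMb <= maxAMResourceMb;
    }
}

final class AmShareCheckDemo {
    public static void main(String[] args) {
        LeafQueueModel queue = new LeafQueueModel(10240, 0.5f); // AM limit: 5120 MB
        System.out.println(queue.canRunAppAM(1024)); // empty queue: true

        // Simulate leaked accounting: usage stays high although no application is running.
        queue.amResourceUsageMb = 5120;
        System.out.println(queue.canRunAppAM(1024)); // false, so no new AM can start
    }
}
```

With a correct amResourceUsage the second call would succeed; once the counter is inflated, every new AM is rejected regardless of how idle the cluster is.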

I found the reason why so many applications in my leaf queue are pending. It 
works as follows:

When the Fair Scheduler assigns the first container to an application attempt, it 
does the following:
!screenshot-4.png!

When the Fair Scheduler removes the application attempt from the leaf queue, it 
does the following:
!screenshot-5.png!

But consider the case where the application attempt unregisters itself and all 
containers in SchedulerApplicationAttempt#liveContainers have completed. An 
APP_ATTEMPT_REMOVED event is then sent to the Fair Scheduler, but it is handled 
asynchronously.
If the FSAppAttempt still has pending requests before the application attempt is 
removed from the FSLeafQueue, the Fair Scheduler assigns a container to the 
FSAppAttempt, and because the size of liveContainers is now 1, the FSLeafQueue adds 
that container's resource to FSLeafQueue#amResourceUsage. This makes 
amResourceUsage larger than its real value.
Over time, FSLeafQueue#amResourceUsage becomes very large even though there are no 
applications in the queue.
When new applications arrive and FSLeafQueue#amResourceUsage is greater than 
Resources.multiply(getFairShare(), maxAMShare), the scheduler never assigns a 
container to the queue.
All of the applications in the queue stay pending forever.
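The race can be reproduced with a small model. It is a hypothetical simplification (memory only, a single attempt, event ordering forced by hand), and QueueAccounting, AttemptModel and AmUsageLeakDemo are illustrative names, not scheduler classes. The guard enabled by guardWithAmRunning is only one possible fix, assumed for illustration: count the AM at most once per attempt via an amRunning flag instead of relying on liveContainers.size() == 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, memory-only model of the accounting race; not the real FSLeafQueue/FSAppAttempt.
final class QueueAccounting {
    long amResourceUsageMb = 0;
}

final class AttemptModel {
    private final QueueAccounting queue;
    private final long amResourceMb;          // size of the AM container request
    private final boolean guardWithAmRunning; // false: the "liveContainers.size() == 1" rule
    private final List<Long> liveContainersMb = new ArrayList<>();
    private boolean amRunning = false;

    AttemptModel(QueueAccounting queue, long amResourceMb, boolean guardWithAmRunning) {
        this.queue = queue;
        this.amResourceMb = amResourceMb;
        this.guardWithAmRunning = guardWithAmRunning;
    }

    void assignContainer(long containerMb) {
        liveContainersMb.add(containerMb);
        boolean countAsAm = guardWithAmRunning
                ? !amRunning                      // count the AM at most once per attempt
                : liveContainersMb.size() == 1;   // "first live container is the AM"
        if (countAsAm) {
            queue.amResourceUsageMb += containerMb;
            amRunning = true;
        }
    }

    // The AM unregisters and every live container completes.
    void completeAllContainers() {
        liveContainersMb.clear();
    }

    // Handling of the asynchronous APP_ATTEMPT_REMOVED event.
    void removeFromQueue() {
        if (amRunning) {
            queue.amResourceUsageMb -= amResourceMb;
        }
    }
}

final class AmUsageLeakDemo {
    // Returns the queue's AM usage after one attempt finishes and is removed.
    static long runScenario(boolean guardWithAmRunning) {
        QueueAccounting queue = new QueueAccounting();
        AttemptModel attempt = new AttemptModel(queue, 1024, guardWithAmRunning);
        attempt.assignContainer(1024);   // AM container: usage becomes 1024
        attempt.completeAllContainers(); // liveContainers is now empty
        attempt.assignContainer(2048);   // a pending request is served before removal
        attempt.removeFromQueue();       // only the AM's 1024 MB is subtracted
        return queue.amResourceUsageMb;
    }

    public static void main(String[] args) {
        System.out.println(runScenario(false)); // 2048: leaked with no application left
        System.out.println(runScenario(true));  // 0
    }
}
```

Each finished attempt that hits this window leaks one container's worth of AM usage, which matches the description of the counter growing until canRunAppAM always fails.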


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png
>
>

[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-5.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png
>
>






[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-4.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>






[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the Fair Scheduler.
Although there is plenty of free resource in the ResourceManager, 46 applications 
are pending. 
Those applications still could not run after several hours, and in the end I had to 
stop them.

I reproduced the scenario in my test environment and found a bug in FSLeafQueue. 
In an extreme scenario, FSLeafQueue#amResourceUsage can become greater than its 
real value.
When the Fair Scheduler tries to assign a container to an application attempt, it 
performs the following check:

!screenshot-2.png!
!screenshot-3.png!

Because the accounting is wrong, the value of FSLeafQueue#amResourceUsage ends up 
larger than its real value.
Once amResourceUsage exceeds Resources.multiply(getFairShare(), maxAMShare), 
FSLeafQueue#canRunAppAM returns false, and the Fair Scheduler does not assign a 
container to the FSAppAttempt. 
In this scenario, every application attempt stays pending and never gets any 
resources.

I found the reason why so many applications in my leaf queue are pending. It 
works as follows:


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>






[jira] [Commented] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048658#comment-16048658
 ] 

daemon commented on YARN-6710:
--

[~dan...@cloudera.com] I am sorry; I am trying to explain the problem, but my English 
is poor, so it takes me a while to describe it.

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>






[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the Fair Scheduler.
Although there is plenty of free resource in the ResourceManager, 46 applications 
are pending. 
Those applications still could not run after several hours, and in the end I had to 
stop them.

I reproduced the scenario in my test environment and found a bug in FSLeafQueue. 
In an extreme scenario, FSLeafQueue#amResourceUsage can become greater than its 
real value.
When the Fair Scheduler tries to assign a container to an application attempt, it 
performs the following check:

!screenshot-2.png!
!screenshot-3.png!

Because the accounting is wrong, the value of FSLeafQueue#amResourceUsage ends up 
larger than its real value.
Once amResourceUsage exceeds Resources.multiply(getFairShare(), maxAMShare), 
FSLeafQueue#canRunAppAM returns false, and the Fair Scheduler does not assign a 
container to the FSAppAttempt. 
In this scenario, every application attempt stays pending and never gets any 
resources.


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>






[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-3.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>






[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-2.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>






[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the Fair Scheduler.
Although there is plenty of free resource in the ResourceManager, there are 


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>






[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Description: 
There are over three thousand nodes in my Hadoop production cluster, and we use 
the Fair Scheduler.
Although my cluster is idle, there are about 

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>






[jira] [Commented] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048642#comment-16048642
 ] 

Daniel Templeton commented on YARN-6710:


Can you give us more details?  Looking at the screenshot, I see 3600 completed 
apps, which doesn't tell me much.

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>







[jira] [Updated] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daemon updated YARN-6710:
-
Attachment: screenshot-1.png

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> ---
>
> Key: YARN-6710
> URL: https://issues.apache.org/jira/browse/YARN-6710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: daemon
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png
>
>







[jira] [Created] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue

2017-06-13 Thread daemon (JIRA)
daemon created YARN-6710:


 Summary: There is a heavy bug in FSLeafQueue#amResourceUsage which 
will let the fair scheduler not assign container to the queue
 Key: YARN-6710
 URL: https://issues.apache.org/jira/browse/YARN-6710
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.2
Reporter: daemon
 Fix For: 2.8.0









[jira] [Updated] (YARN-6413) Decouple Yarn Registry API from ZK

2017-06-13 Thread Ellen Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ellen Hui updated YARN-6413:

Attachment: 0001-Registry-API-v2.patch

Completed patch.

[~jianhe], could you please especially check the security parts of this? I 
tried to preserve the behaviour of the original registry, as best as I 
understood it, but I'm not sure I got it right.

yarn-native-services should compile against this with minimal effort, only 
needing some class references fixed on the other end.







[jira] [Commented] (YARN-6702) Zk connection leak during activeService fail if embedded elector is not curator

2017-06-13 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048103#comment-16048103
 ] 

Bibin A Chundatt commented on YARN-6702:


[~rohithsharma]
Thank you for providing the patch. Is it possible to add a test case for this?


> Zk connection leak during activeService fail if embedded elector is not 
> curator
> ---
>
> Key: YARN-6702
> URL: https://issues.apache.org/jira/browse/YARN-6702
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Bibin A Chundatt
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6702.01.patch
>
>
> In {{ResourceManager#transitionToActive}}, if startActiveServices fails, the active 
> services are reinitialized.
> {code}
> this.rmLoginUGI.doAs(new PrivilegedExceptionAction<Void>() {
>   @Override
>   public Void run() throws Exception {
> try {
>   startActiveServices();
>   return null;
> } catch (Exception e) {
>   reinitialize(true);
>   throw e;
> }
>   }
> });
> {code}
> On reinitialization, {{ZKRMStateStore#initInternal}} creates another ZK connection 
> whenever the ResourceManager has no Curator instance to share:
> {code}
> curatorFramework = resourceManager.getCurator();
> if (curatorFramework == null) {
>   curatorFramework = resourceManager.createAndStartCurator(conf);
> }
> {code}
> {quote}
> secureuser@vm1:/opt/hadoop/release/hadoop/sbin> netstat -aen | grep 2181
> tcp        0      0 192.168.56.101:49222    192.168.56.103:2181     ESTABLISHED 1004       31984
> tcp        0      0 192.168.56.101:46016    192.168.56.103:2181     ESTABLISHED 1004       26120
> tcp        0      0 192.168.56.101:50918    192.168.56.103:2181     ESTABLISHED 1004       34761
> tcp        0      0 192.168.56.101:49598    192.168.56.103:2181     ESTABLISHED 1004       32483
> tcp        0      0 192.168.56.101:49472    192.168.56.103:2181     ESTABLISHED 1004       32364
> tcp        0      0 192.168.56.101:50708    192.168.56.103:2181     ESTABLISHED 1004       34310
> {quote}
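A hypothetical model of the leak pattern: every failed activation builds a new state store whose initInternal opens its own connection when there is no shared Curator instance, and nothing closes it before the next attempt. FakeZkClient, StateStoreModel and ZkLeakDemo are illustrative names, not ResourceManager classes, and closing the store-owned client before reinitializing is just one assumed way to avoid the leak, not necessarily what the attached patch does.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ZooKeeper client; it only counts open connections.
final class FakeZkClient implements AutoCloseable {
    private final AtomicInteger openConnections;
    private boolean closed = false;

    FakeZkClient(AtomicInteger openConnections) {
        this.openConnections = openConnections;
        openConnections.incrementAndGet();
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            openConnections.decrementAndGet();
        }
    }
}

// Hypothetical state store mirroring the initInternal snippet: reuse the shared
// client if there is one, otherwise create (and own) a new one.
final class StateStoreModel {
    private final FakeZkClient sharedClient; // null when no shared Curator instance exists
    private final AtomicInteger openConnections;
    private FakeZkClient client;

    StateStoreModel(FakeZkClient sharedClient, AtomicInteger openConnections) {
        this.sharedClient = sharedClient;
        this.openConnections = openConnections;
    }

    void initInternal() {
        client = (sharedClient != null) ? sharedClient : new FakeZkClient(openConnections);
    }

    // Closes only a client this store created itself; a shared client is left open.
    void closeInternal() {
        if (client != null && client != sharedClient) {
            client.close();
        }
        client = null;
    }
}

final class ZkLeakDemo {
    // Each failed activation builds a fresh store, as reinitialize(true) would (assumption).
    static int openAfterFailedActivations(int failures, boolean closeOldStore) {
        AtomicInteger open = new AtomicInteger();
        for (int i = 0; i < failures; i++) {
            StateStoreModel store = new StateStoreModel(null, open);
            store.initInternal();
            if (closeOldStore) {
                store.closeInternal(); // release the connection before the next attempt
            }
        }
        return open.get();
    }

    public static void main(String[] args) {
        System.out.println(openAfterFailedActivations(3, false)); // 3 leaked connections
        System.out.println(openAfterFailedActivations(3, true));  // 0
    }
}
```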






[jira] [Resolved] (YARN-6709) Root privilege escalation in experimental Docker support

2017-06-13 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-6709.

Resolution: Fixed

This was fixed as part of a series of commits made via security@hadoop. This JIRA 
was filed for future release notes and changelog generation.

> Root privilege escalation in experimental Docker support
> 
>
> Key: YARN-6709
> URL: https://issues.apache.org/jira/browse/YARN-6709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, security
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Varun Vasudev
>Priority: Blocker
>  Labels: security
> Fix For: 3.0.0-alpha3, 2.8.1
>
>
> YARN-3853 and friends do not do enough input validation. They allow a user to 
> trivially escalate privileges to root.






[jira] [Updated] (YARN-6709) Root privilege escalation in experimental Docker support

2017-06-13 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6709:
---
Description: YARN-3853 and friends do not do enough input validation. They 
allow a user to trivially escalate privileges to root. See 
https://effectivemachines.com/2017/06/02/docker-security-in-framework-managed-multi-user-environments/
 for more information.  (was: YARN-3853 and friends do not do enough input 
validation. They allow a user to trivially escalate privileges to root.)

> Root privilege escalation in experimental Docker support
> 
>
> Key: YARN-6709
> URL: https://issues.apache.org/jira/browse/YARN-6709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, security
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Varun Vasudev
>Priority: Blocker
>  Labels: security
> Fix For: 2.8.1, 3.0.0-alpha3
>
>
> YARN-3853 and friends do not do enough input validation. They allow a user to 
> trivially escalate privileges to root. See 
> https://effectivemachines.com/2017/06/02/docker-security-in-framework-managed-multi-user-environments/
>  for more information.






[jira] [Created] (YARN-6709) Root privilege escalation in experimental Docker support

2017-06-13 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-6709:
--

 Summary: Root privilege escalation in experimental Docker support
 Key: YARN-6709
 URL: https://issues.apache.org/jira/browse/YARN-6709
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, security
Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.8.0
Reporter: Allen Wittenauer
Assignee: Varun Vasudev
Priority: Blocker
 Fix For: 3.0.0-alpha3, 2.8.1


YARN-3853 and friends do not do enough input validation. They allow a user to 
trivially escalate privileges to root.






[jira] [Commented] (YARN-6601) Allow service to be started as System Services during serviceapi start up

2017-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047820#comment-16047820
 ] 

Hadoop QA commented on YARN-6601:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 45s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 36s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  6s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 38s{color} | {color:green} yarn-native-services passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 10s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in yarn-native-services has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 32s{color} | {color:green} yarn-native-services passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 15s{color} | {color:red} hadoop-yarn-services-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  3m 14s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  3m 14s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  1m  1s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 7 new + 206 unchanged - 1 fixed = 213 total (was 207) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 16s{color} | {color:red} hadoop-yarn-services-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 15s{color} | {color:red} hadoop-yarn-services-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 27s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 21s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 15s{color} | {color:red} hadoop-yarn-services-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 46s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6601 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872841/YARN-6601-yarn-native-services.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  xml  |
| uname | Linux a40df69aced1 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | yarn-native-services / c9d9c94 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16183/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html

[jira] [Updated] (YARN-6601) Allow service to be started as System Services during serviceapi start up

2017-06-13 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated YARN-6601:
--
Attachment: YARN-6601-yarn-native-services.003.patch

> Allow service to be started as System Services during serviceapi start up
> -
>
> Key: YARN-6601
> URL: https://issues.apache.org/jira/browse/YARN-6601
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
> Attachments: SystemServices.pdf, 
> YARN-6601-yarn-native-services.001.patch, 
> YARN-6601-yarn-native-services.002.patch, 
> YARN-6601-yarn-native-services.003.patch
>
>
> This extends YARN-1593, focusing only on system services. System services 
> are started when the daemon boots up, or an admin can configure and start 
> them at any point in time. These services have special characteristics 
> that need to be respected; the attached document (SystemServices.pdf) 
> covers these characteristics in detail. 
> This JIRA focuses on configuring services using a JSON template placed in 
> a shared filesystem. During start-up of the YARN REST server 
> (native-service-api), service details are read from the shared location and 
> those services are started. Services that are already configured are 
> skipped, and start-up continues with the remaining services. 
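A minimal sketch of the skip-if-already-configured step described above, under stated assumptions: each service has one JSON template in the shared directory named {{<service>.json}}, and the set of already configured services is known up front. {{SystemServiceLoaderSketch}} and {{servicesToStart}} are hypothetical names, not part of the native-service-api code, and reading or parsing the templates is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SystemServiceLoaderSketch {
  private static final String TEMPLATE_SUFFIX = ".json";

  /**
   * Given the file names found in the shared template directory and the names of
   * services that are already configured, returns the services that still need
   * to be started, preserving the directory order.
   */
  static List<String> servicesToStart(List<String> templateFiles,
      Set<String> alreadyConfigured) {
    List<String> toStart = new ArrayList<>();
    for (String file : templateFiles) {
      if (!file.endsWith(TEMPLATE_SUFFIX)) {
        continue; // not a service template (assumed naming convention)
      }
      String name = file.substring(0, file.length() - TEMPLATE_SUFFIX.length());
      if (alreadyConfigured.contains(name)) {
        continue; // skip the existing service and keep going with the rest
      }
      toStart.add(name);
    }
    return toStart;
  }
}
```

Skipping rather than failing on an existing service keeps one pre-configured service from blocking start-up of the others.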






[jira] [Updated] (YARN-6601) Allow service to be started as System Services during serviceapi start up

2017-06-13 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated YARN-6601:
--
Attachment: (was: YARN-6601-yarn-native-services.003.patch)

> Allow service to be started as System Services during serviceapi start up
> -
>
> Key: YARN-6601
> URL: https://issues.apache.org/jira/browse/YARN-6601
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
> Attachments: SystemServices.pdf, 
> YARN-6601-yarn-native-services.001.patch, 
> YARN-6601-yarn-native-services.002.patch
>
>
> This extends YARN-1593, focusing only on system services. System services 
> are started when the daemon boots up, or an admin can configure and start 
> them at any point in time. These services have special characteristics 
> that need to be respected; the attached document (SystemServices.pdf) 
> covers these characteristics in detail. 
> This JIRA focuses on configuring services using a JSON template placed in 
> a shared filesystem. During start-up of the YARN REST server 
> (native-service-api), service details are read from the shared location and 
> those services are started. Services that are already configured are 
> skipped, and start-up continues with the remaining services. 






[jira] [Commented] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk

2017-06-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047505#comment-16047505
 ] 

Naganarasimha G R commented on YARN-5006:
-

Thanks [~bibinchundatt],
Overall, the approach and the patch look fine except for the following nits:
# Please add the configuration to {{yarn-default.xml}} and also document that 
it needs to be kept in sync with the ZooKeeper jute buffer size; otherwise, even 
if the check passes here, the write will still fail on the ZooKeeper side. I think 
{{TestYarnConfigurationFields}} is failing for the same reason.
# The {{StoreLimitException}} documentation refers only to "exceeds limit for ZK 
RM state store", but it should cover any state store, since we catch it in 
RMStateStore and hence it can be thrown by any store.
# ZKRMStateStore line 751: please add the app ID to the log so that it can be 
traced which app caused the problem. I would prefer to have the configured size 
limit in this log too.
# RMAppEvent line 51: the name {{storeApp}} does not convey its purpose well; I 
would prefer {{doStoreAppInfo}}, perhaps with a comment explaining that it 
relates to the state store.
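A minimal sketch of the kind of size check discussed in points 1 and 3, under stated assumptions: the serialized application state is compared against a configured limit before the write, and the exception message carries the app ID and the configured limit. {{SketchStoreLimitException}} and {{AppStateSizeCheck}} are hypothetical names; the patch's actual {{StoreLimitException}} and its configuration key are not shown here. As point 1 notes, such a limit only helps if it is no larger than ZooKeeper's {{jute.maxbuffer}} setting.

```java
/** Hypothetical exception; the real StoreLimitException in the patch may differ. */
class SketchStoreLimitException extends Exception {
  SketchStoreLimitException(String message) {
    super(message);
  }
}

class AppStateSizeCheck {
  /**
   * Rejects serialized application state that is larger than the configured limit.
   * The message includes the app ID and the configured limit so the offending
   * application can be traced from the log.
   */
  static void checkSize(String appId, byte[] serializedState, int configuredLimitBytes)
      throws SketchStoreLimitException {
    if (serializedState.length > configuredLimitBytes) {
      throw new SketchStoreLimitException("Application " + appId
          + " exceeds the maximum allowed size for application data: "
          + serializedState.length + " bytes > configured limit of "
          + configuredLimitBytes + " bytes");
    }
  }
}
```

Failing the single oversized application with a dedicated exception, rather than treating it like a generic store failure, is what would let the RM reject that app instead of exiting.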

> ResourceManager quit due to ApplicationStateData exceed the limit  size of 
> znode in zk
> --
>
> Key: YARN-5006
> URL: https://issues.apache.org/jira/browse/YARN-5006
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.2
>Reporter: dongtingting
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5006.001.patch, YARN-5006.002.patch
>
>
> A client submits a job that adds 1 file to the DistributedCache. When the 
> job is submitted, the ResourceManager stores ApplicationStateData in ZK. The 
> ApplicationStateData exceeds the znode size limit, and the RM exits with status 1.
> The related code in RMStateStore.java :
> {code}
> private static class StoreAppTransition
>     implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
>   @Override
>   public void transition(RMStateStore store, RMStateStoreEvent event) {
>     if (!(event instanceof RMStateStoreAppEvent)) {
>       // should never happen
>       LOG.error("Illegal event type: " + event.getClass());
>       return;
>     }
>     ApplicationState appState =
>         ((RMStateStoreAppEvent) event).getAppState();
>     ApplicationId appId = appState.getAppId();
>     ApplicationStateData appStateData =
>         ApplicationStateData.newInstance(appState);
>     LOG.info("Storing info for app: " + appId);
>     try {
>       // store the appStateData
>       store.storeApplicationStateInternal(appId, appStateData);
>       store.notifyApplication(new RMAppEvent(appId,
>           RMAppEventType.APP_NEW_SAVED));
>     } catch (Exception e) {
>       LOG.error("Error storing app: " + appId, e);
>       // handle the failure event; this leads to system exit
>       store.notifyStoreOperationFailed(e);
>     }
>   }
> }
> {code}
> The Exception log:
> {code}
>  ...
> 2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore AsyncDispatcher event handler: Error storing app: application_1461061795989_17671
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
> at 
> 

[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent

2017-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047468#comment-16047468
 ] 

Hadoop QA commented on YARN-5892:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  2s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 39s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  4s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 20s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m  8s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 58s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 16 new + 654 unchanged - 1 fixed = 670 total (was 655) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 27s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 4 new + 853 unchanged - 0 fixed = 857 total (was 853) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 25s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 18s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 15s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m 50s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-5892 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12872802/YARN-5892.015.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 7d1d3a782bbe 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |