from:"Devaraj K \(JIRA\)"

[jira] [Updated] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-29 Thread Devaraj K (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-9270:

Priority: Minor  (was: Major)
Hadoop Flags: Reviewed

+1, latest patch looks good to me.

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function<String, String>}} in the plugin class like {{Function<String, String> envProvider 
> = System::getenv}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.
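
A minimal sketch of the env-provider idea from the description, not the code from the attached patches. The class name {{EnvAwarePlugin}}, the method {{getSdkRoot()}} and the variable name {{EXAMPLE_SDK_ROOT}} are hypothetical and only illustrate how a test could swap the lookup without touching the real process environment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the discovery plugin; not the actual Hadoop class.
class EnvAwarePlugin {
  // Defaults to the real process environment (System.getenv(String)).
  private Function<String, String> envProvider = System::getenv;

  // Test hook: lets a test substitute its own lookup function.
  void setEnvProvider(Function<String, String> envProvider) {
    this.envProvider = envProvider;
  }

  // Returns the configured SDK root, or null when the variable is unset.
  // EXAMPLE_SDK_ROOT is an illustrative name, not a real Hadoop setting.
  String getSdkRoot() {
    return envProvider.apply("EXAMPLE_SDK_ROOT");
  }
}

class EnvAwarePluginDemo {
  public static void main(String[] args) {
    Map<String, String> fakeEnv = new HashMap<>();
    fakeEnv.put("EXAMPLE_SDK_ROOT", "/opt/fake-sdk");

    EnvAwarePlugin plugin = new EnvAwarePlugin();
    plugin.setEnvProvider(fakeEnv::get); // the test-controlled environment
    System.out.println(plugin.getSdkRoot());
  }
}
```

With this shape, each of the five scenarios can build its own small map and pass {{map::get}} to the setter, so no reflection on the JVM environment is needed.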



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-27 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803090#comment-16803090
 ] 

Devaraj K commented on YARN-9270:
-

[~pbacsko], can you rebase this patch?

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function<String, String>}} in the plugin class like {{Function<String, String> envProvider 
> = System::getenv}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.




[jira] [Updated] (YARN-9269) Minor cleanup in FpgaResourceAllocator

2019-03-27 Thread Devaraj K (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-9269:

Priority: Minor  (was: Major)
Hadoop Flags: Reviewed

+1, latest patch looks good to me, committing it shortly.

> Minor cleanup in FpgaResourceAllocator
> --
>
> Key: YARN-9269
> URL: https://issues.apache.org/jira/browse/YARN-9269
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
> Attachments: YARN-9269-001.patch, YARN-9269-002.patch, 
> YARN-9269-003.patch, YARN-9269-004.patch, YARN-9269-005.patch
>
>
> Some stuff that we observed:
>  * {{addFpga()}} - we check for duplicate devices, but we don't print any 
> error/warning if there's any.
>  * {{findMatchedFpga()}} should be called {{findMatchingFpga()}}. Also, is 
> this method even needed? We already receive an {{FpgaDevice}} instance in 
> {{updateFpga()}} which I believe is the same that we're looking up.
>  * variable {{IPIDpreference}} is confusing
>  * {{availableFpga}} / {{usedFpgaByRequestor}} are instances of 
> {{LinkedHashMap}}. What's the rationale behind this? Doesn't a simple 
> {{HashMap}} suffice?
>  * {{usedFpgaByRequestor}} should be renamed, naming is a bit unclear
>  * {{allowedFpgas}} should be an immutable list
>  * {{@VisibleForTesting}} methods should be package private
>  * get rid of {{*}} imports




[jira] [Updated] (YARN-9268) General improvements in FpgaDevice

2019-03-25 Thread Devaraj K (JIRA)



 [ 
https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-9268:

Hadoop Flags: Reviewed

> General improvements in FpgaDevice
> --
>
> Key: YARN-9268
> URL: https://issues.apache.org/jira/browse/YARN-9268
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9268-001.patch, YARN-9268-002.patch, 
> YARN-9268-003.patch, YARN-9268-004.patch, YARN-9268-005.patch, 
> YARN-9268-006.patch, YARN-9268-007.patch
>
>
> Need to fix the following in the class {{FpgaDevice}}:
>  * It implements {{Comparable}}, but returns 0 in every case. There is no 
> natural ordering among FPGA devices, perhaps "acl0" comes before "acl1", but 
> this seems too forced and unnecessary. We think this class should not 
> implement {{Comparable}} at all, at least not like that.
>  * Stores unnecessary fields: devName, busNum, temperature, power usage. For 
> one, these are never needed in the code. Secondly, temp and power usage 
> change constantly. It's pointless to store these in this POJO.
>  * {{serialVersionUID}} is 1L - let's generate a number for this
>  * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor 
> uniquely identifies the card, then let's demand them in the constructor and 
> don't store Integers that can be null.
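
A rough sketch of the shape these points suggest, under the assumption that type plus major/minor is enough to identify a card. {{SimpleFpgaDevice}} is a hypothetical name, the field set is illustrative, and the {{serialVersionUID}} value is an arbitrary placeholder, not the one from the patch.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical simplified device class; not the actual FpgaDevice.
final class SimpleFpgaDevice implements Serializable {
  // Arbitrary placeholder value, not taken from the actual patch.
  private static final long serialVersionUID = 7526472295622776147L;

  private final String type;
  private final int major; // primitive: cannot be null
  private final int minor; // primitive: cannot be null

  // The identifying values are demanded up front, so no half-built instances.
  SimpleFpgaDevice(String type, int major, int minor) {
    this.type = Objects.requireNonNull(type, "type");
    this.major = major;
    this.minor = minor;
  }

  String getType() { return type; }
  int getMajor() { return major; }
  int getMinor() { return minor; }

  // equals() and hashCode() use exactly the same fields.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SimpleFpgaDevice)) {
      return false;
    }
    SimpleFpgaDevice other = (SimpleFpgaDevice) o;
    return major == other.major && minor == other.minor
        && type.equals(other.type);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, major, minor);
  }
}
```

No {{Comparable}} is implemented here, in line with the first point: callers that need an ordering can pass an explicit {{Comparator}}.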




[jira] [Commented] (YARN-9268) General improvements in FpgaDevice

2019-03-25 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801025#comment-16801025
 ] 

Devaraj K commented on YARN-9268:
-

+1, latest patch looks good to me.

> General improvements in FpgaDevice
> --
>
> Key: YARN-9268
> URL: https://issues.apache.org/jira/browse/YARN-9268
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9268-001.patch, YARN-9268-002.patch, 
> YARN-9268-003.patch, YARN-9268-004.patch, YARN-9268-005.patch, 
> YARN-9268-006.patch, YARN-9268-007.patch
>
>
> Need to fix the following in the class {{FpgaDevice}}:
>  * It implements {{Comparable}}, but returns 0 in every case. There is no 
> natural ordering among FPGA devices, perhaps "acl0" comes before "acl1", but 
> this seems too forced and unnecessary. We think this class should not 
> implement {{Comparable}} at all, at least not like that.
>  * Stores unnecessary fields: devName, busNum, temperature, power usage. For 
> one, these are never needed in the code. Secondly, temp and power usage 
> change constantly. It's pointless to store these in this POJO.
>  * {{serialVersionUID}} is 1L - let's generate a number for this
>  * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor 
> uniquely identifies the card, then let's demand them in the constructor and 
> don't store Integers that can be null.




[jira] [Commented] (YARN-9268) General improvements in FpgaDevice

2019-03-21 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798723#comment-16798723
 ] 

Devaraj K commented on YARN-9268:
-

Thanks [~pbacsko] for quickly updating the patch.

* FpgaResourceAllocator.java
** {{aliasDevName}} is used in {{hashCode()}} but not in {{equals()}}.
** Some fields are not used in {{hashCode()}} and {{equals()}}; don't we 
need to include them here?
** Can you correct the typo here:
{code}
//key is requetor, aka. container ID
{code}

* TestFpgaResourceHandler.java
** It seems this change is not needed; the same applies to all occurrences in this 
test class.

{code}
-  for (FpgaDevice device : allowedDevices) {
+  for (FpgaResourceAllocator.FpgaDevice device : allowedDevices) {
{code}
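
To illustrate why a field used in {{hashCode()}} but not in {{equals()}} matters, here is a deliberately broken sketch. {{InconsistentDevice}} is a hypothetical class, not the one in the patch. Two instances that are equal by {{equals()}} end up with different hash codes, so a hash-based lookup misses.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Deliberately inconsistent: hashCode() uses alias, equals() ignores it.
final class InconsistentDevice {
  final int major;
  final int minor;
  final String alias;

  InconsistentDevice(int major, int minor, String alias) {
    this.major = major;
    this.minor = minor;
    this.alias = alias;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof InconsistentDevice)) {
      return false;
    }
    InconsistentDevice other = (InconsistentDevice) o;
    return major == other.major && minor == other.minor; // alias not compared
  }

  @Override
  public int hashCode() {
    return Objects.hash(major, minor, alias); // alias included: contract broken
  }
}

class InconsistentDeviceDemo {
  public static void main(String[] args) {
    InconsistentDevice a = new InconsistentDevice(247, 0, "acl0");
    InconsistentDevice b = new InconsistentDevice(247, 0, "acl1");

    Set<InconsistentDevice> used = new HashSet<>();
    used.add(a);

    System.out.println("equals:   " + a.equals(b));
    System.out.println("contains: " + used.contains(b));
  }
}
```

Either dropping the alias from {{hashCode()}} or adding it to {{equals()}} restores the contract; which one is right depends on whether the alias is part of the device's identity.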

> General improvements in FpgaDevice
> --
>
> Key: YARN-9268
> URL: https://issues.apache.org/jira/browse/YARN-9268
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9268-001.patch, YARN-9268-002.patch, 
> YARN-9268-003.patch, YARN-9268-004.patch, YARN-9268-005.patch
>
>
> Need to fix the following in the class {{FpgaDevice}}:
>  * It implements {{Comparable}}, but returns 0 in every case. There is no 
> natural ordering among FPGA devices, perhaps "acl0" comes before "acl1", but 
> this seems too forced and unnecessary. We think this class should not 
> implement {{Comparable}} at all, at least not like that.
>  * Stores unnecessary fields: devName, busNum, temperature, power usage. For 
> one, these are never needed in the code. Secondly, temp and power usage 
> change constantly. It's pointless to store these in this POJO.
>  * {{serialVersionUID}} is 1L - let's generate a number for this
>  * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor 
> uniquely identifies the card, then let's demand them in the constructor and 
> don't store Integers that can be null.




[jira] [Commented] (YARN-9267) General improvements in FpgaResourceHandlerImpl

2019-03-21 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798312#comment-16798312
 ] 

Devaraj K commented on YARN-9267:
-

+1, latest patch looks good to me, committing it shortly.

> General improvements in FpgaResourceHandlerImpl
> ---
>
> Key: YARN-9267
> URL: https://issues.apache.org/jira/browse/YARN-9267
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9267-001.patch, YARN-9267-002.patch, 
> YARN-9267-003.patch, YARN-9267-004.patch, YARN-9267-005.patch, 
> YARN-9267-006.patch, YARN-9267-007.patch, YARN-9267-008.patch, 
> YARN-9267-009.patch, YARN-9267-010.patch
>
>
> Fix some problems in {{FpgaResourceHandlerImpl}}:
>  * {{preStart()}} does not reconfigure card with the same IP - we see it as a 
> problem. If you recompile the FPGA application, you must rename the aocx file 
> because the card will not be reprogrammed. Suggestion: instead of storing 
> Node<\->IPID mapping, store Node<\->IPID hash (like the SHA-256 of the 
> localized file).
>  * Switch to slf4j from Apache Commons Logging
>  * Some unused imports




[jira] [Commented] (YARN-9267) General improvements in FpgaResourceHandlerImpl

2019-03-21 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798229#comment-16798229
 ] 

Devaraj K commented on YARN-9267:
-

Thanks [~pbacsko] for updating the patch. Can you also take care of this 
checkstyle warning?
{code}
-0  checkstyle  0m 23s  
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 46 unchanged - 6 fixed = 47 total (was 52)
{code}

{code}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/TestFpgaResourceHandler.java:322:
  throws ResourceHandlerException, PrivilegedOperationException, 
IOException {: Line is longer than 80 characters (found 82). [LineLength]
{code}

> General improvements in FpgaResourceHandlerImpl
> ---
>
> Key: YARN-9267
> URL: https://issues.apache.org/jira/browse/YARN-9267
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9267-001.patch, YARN-9267-002.patch, 
> YARN-9267-003.patch, YARN-9267-004.patch, YARN-9267-005.patch, 
> YARN-9267-006.patch, YARN-9267-007.patch, YARN-9267-008.patch, 
> YARN-9267-009.patch
>
>
> Fix some problems in {{FpgaResourceHandlerImpl}}:
>  * {{preStart()}} does not reconfigure card with the same IP - we see it as a 
> problem. If you recompile the FPGA application, you must rename the aocx file 
> because the card will not be reprogrammed. Suggestion: instead of storing 
> Node<\->IPID mapping, store Node<\->IPID hash (like the SHA-256 of the 
> localized file).
>  * Switch to slf4j from Apache Commons Logging
>  * Some unused imports




[jira] [Commented] (YARN-9267) General improvements in FpgaResourceHandlerImpl

2019-03-20 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797741#comment-16797741
 ] 

Devaraj K commented on YARN-9267:
-

Please remove this log message; it causes the error to be logged twice.
{code}
+  LOG.error("Could not calculate SHA-256", e);
{code}

Other than that, the patch looks good to me.

> General improvements in FpgaResourceHandlerImpl
> ---
>
> Key: YARN-9267
> URL: https://issues.apache.org/jira/browse/YARN-9267
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9267-001.patch, YARN-9267-002.patch, 
> YARN-9267-003.patch, YARN-9267-004.patch, YARN-9267-005.patch, 
> YARN-9267-006.patch, YARN-9267-007.patch, YARN-9267-008.patch
>
>
> Fix some problems in {{FpgaResourceHandlerImpl}}:
>  * {{preStart()}} does not reconfigure card with the same IP - we see it as a 
> problem. If you recompile the FPGA application, you must rename the aocx file 
> because the card will not be reprogrammed. Suggestion: instead of storing 
> Node<\->IPID mapping, store Node<\->IPID hash (like the SHA-256 of the 
> localized file).
>  * Switch to slf4j from Apache Commons Logging
>  * Some unused imports




[jira] [Commented] (YARN-9267) General improvements in FpgaResourceHandlerImpl

2019-03-20 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797302#comment-16797302
 ] 

Devaraj K commented on YARN-9267:
-

bq. It's usually a good practice to do in unit tests, but I'm not a 
fundamentalist so I can go a file-creation way if you think it's better.
It logs the original cause and then creates a 
{{ResourceHandlerException}} that is thrown without the original reason. Please 
update it to invoke {{getSha256ofFile}} directly.
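
A small sketch of the pattern being asked for: wrap the original exception as the cause instead of logging it and throwing a new exception without it. {{HandlerException}} is a hypothetical stand-in for {{ResourceHandlerException}}, and {{hashOf()}} is a made-up helper that always fails so the wrapping path is exercised.

```java
import java.io.IOException;

// Hypothetical stand-in for Hadoop's ResourceHandlerException.
class HandlerException extends Exception {
  HandlerException(String message, Throwable cause) {
    super(message, cause);
  }
}

class HashStep {
  // Made-up helper that always fails, to exercise the error path.
  static String hashOf(String path) throws IOException {
    throw new IOException("cannot read " + path);
  }

  static String hashOrFail(String path) throws HandlerException {
    try {
      return hashOf(path);
    } catch (IOException e) {
      // No LOG.error() here: the cause travels with the exception, so the
      // caller that finally handles it can log the full chain exactly once.
      throw new HandlerException("Could not calculate SHA-256", e);
    }
  }
}
```

The caller still sees the root cause through {{getCause()}}, and the stack trace printed at the top level contains both the wrapper and the original {{IOException}}.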

> General improvements in FpgaResourceHandlerImpl
> ---
>
> Key: YARN-9267
> URL: https://issues.apache.org/jira/browse/YARN-9267
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9267-001.patch, YARN-9267-002.patch, 
> YARN-9267-003.patch, YARN-9267-004.patch, YARN-9267-005.patch, 
> YARN-9267-006.patch, YARN-9267-007.patch
>
>
> Fix some problems in {{FpgaResourceHandlerImpl}}:
>  * {{preStart()}} does not reconfigure card with the same IP - we see it as a 
> problem. If you recompile the FPGA application, you must rename the aocx file 
> because the card will not be reprogrammed. Suggestion: instead of storing 
> Node<\->IPID mapping, store Node<\->IPID hash (like the SHA-256 of the 
> localized file).
>  * Switch to slf4j from Apache Commons Logging
>  * Some unused imports




[jira] [Commented] (YARN-9267) General improvements in FpgaResourceHandlerImpl

2019-03-19 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796800#comment-16796800
 ] 

Devaraj K commented on YARN-9267:
-

Thanks [~pbacsko] for updating the patch.

* FpgaResourceHandlerImpl.java

** I am not sure whether this is really needed. I think {{getSha256ofFile}} can 
be invoked directly, and with that {{if (!hashOpt.isPresent()) {}} can also be 
avoided.

{code:xml}
+  private Function> digestProvider =
+  this::getSha256ofFile;
{code}

** With the above fix, can you also update this to throw the exception directly 
as a wrapped one and avoid logging?

{code:xml}
+  LOG.error("Could not calculate SHA-256", e);
{code}

* TestFpgaResourceHandler.java

** Can we have a loop here to add the {{FpgaDevice}} objects to 
{{deviceList}}, so that this duplicate code can be removed?

{code:xml}
+deviceList.add(new FpgaDevice(vendorType, 247, 0, null));
.
+deviceList.add(new FpgaDevice(vendorType, 247, 4, null));
{code}
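
The loop suggested for the test could be sketched as follows. {{DemoFpgaDevice}} is a hypothetical stand-in that only mirrors the four-argument constructor seen in the snippet above, and the count of five assumes the elided lines cover minors 0 through 4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the four-argument constructor used above.
class DemoFpgaDevice {
  final String type;
  final int major;
  final int minor;
  final String ipId;

  DemoFpgaDevice(String type, int major, int minor, String ipId) {
    this.type = type;
    this.major = major;
    this.minor = minor;
    this.ipId = ipId;
  }
}

class DeviceListBuilder {
  // Builds devices with minor numbers 0..count-1 for one vendor and major.
  static List<DemoFpgaDevice> build(String vendorType, int major, int count) {
    List<DemoFpgaDevice> deviceList = new ArrayList<>();
    for (int minor = 0; minor < count; minor++) {
      deviceList.add(new DemoFpgaDevice(vendorType, major, minor, null));
    }
    return deviceList;
  }
}
```

A call such as {{DeviceListBuilder.build(vendorType, 247, 5)}} would then replace the repeated {{deviceList.add(...)}} lines.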

> General improvements in FpgaResourceHandlerImpl
> ---
>
> Key: YARN-9267
> URL: https://issues.apache.org/jira/browse/YARN-9267
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9267-001.patch, YARN-9267-002.patch, 
> YARN-9267-003.patch, YARN-9267-004.patch, YARN-9267-005.patch, 
> YARN-9267-006.patch, YARN-9267-007.patch
>
>
> Fix some problems in {{FpgaResourceHandlerImpl}}:
>  * {{preStart()}} does not reconfigure card with the same IP - we see it as a 
> problem. If you recompile the FPGA application, you must rename the aocx file 
> because the card will not be reprogrammed. Suggestion: instead of storing 
> Node<\->IPID mapping, store Node<\->IPID hash (like the SHA-256 of the 
> localized file).
>  * Switch to slf4j from Apache Commons Logging
>  * Some unused imports




[jira] [Commented] (YARN-9268) General improvements in FpgaDevice

2019-03-18 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795671#comment-16795671
 ] 

Devaraj K commented on YARN-9268:
-

Thanks [~pbacsko] for the patch. The latest patch does not apply to trunk; please 
update it.

> General improvements in FpgaDevice
> --
>
> Key: YARN-9268
> URL: https://issues.apache.org/jira/browse/YARN-9268
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9268-001.patch, YARN-9268-002.patch, 
> YARN-9268-003.patch
>
>
> Need to fix the following in the class {{FpgaDevice}}:
>  * It implements {{Comparable}}, but returns 0 in every case. There is no 
> natural ordering among FPGA devices, perhaps "acl0" comes before "acl1", but 
> this seems too forced and unnecessary. We think this class should not 
> implement {{Comparable}} at all, at least not like that.
>  * Stores unnecessary fields: devName, busNum, temperature, power usage. For 
> one, these are never needed in the code. Secondly, temp and power usage 
> change constantly. It's pointless to store these in this POJO.
>  * {{serialVersionUID}} is 1L - let's generate a number for this
>  * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor 
> uniquely identifies the card, then let's demand them in the constructor and 
> don't store Integers that can be null.




[jira] [Commented] (YARN-9267) General improvements in FpgaResourceHandlerImpl

2019-03-18 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795669#comment-16795669
 ] 

Devaraj K commented on YARN-9267:
-

Thanks [~pbacsko] for the patch. The latest patch has gone stale; can you update 
it?

> General improvements in FpgaResourceHandlerImpl
> ---
>
> Key: YARN-9267
> URL: https://issues.apache.org/jira/browse/YARN-9267
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9267-001.patch, YARN-9267-002.patch, 
> YARN-9267-003.patch, YARN-9267-004.patch, YARN-9267-005.patch, 
> YARN-9267-006.patch
>
>
> Fix some problems in {{FpgaResourceHandlerImpl}}:
>  * {{preStart()}} does not reconfigure card with the same IP - we see it as a 
> problem. If you recompile the FPGA application, you must rename the aocx file 
> because the card will not be reprogrammed. Suggestion: instead of storing 
> Node<\->IPID mapping, store Node<\->IPID hash (like the SHA-256 of the 
> localized file).
>  * Switch to slf4j from Apache Commons Logging
>  * Some unused imports




[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-03-18 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795675#comment-16795675
 ] 

Devaraj K commented on YARN-9270:
-

Thanks [~pbacsko] for the patch. The latest patch does not apply; please 
update it.

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, 
> YARN-9270-003.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function<String, String>}} in the plugin class like {{Function<String, String> envProvider 
> = System::getenv}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.




[jira] [Commented] (YARN-9269) Minor cleanup in FpgaResourceAllocator

2019-03-18 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795674#comment-16795674
 ] 

Devaraj K commented on YARN-9269:
-

Thanks [~pbacsko] for the patch. The latest patch does not apply; please 
update it.

> Minor cleanup in FpgaResourceAllocator
> --
>
> Key: YARN-9269
> URL: https://issues.apache.org/jira/browse/YARN-9269
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9269-001.patch, YARN-9269-002.patch, 
> YARN-9269-003.patch
>
>
> Some stuff that we observed:
>  * {{addFpga()}} - we check for duplicate devices, but we don't print any 
> error/warning if there's any.
>  * {{findMatchedFpga()}} should be called {{findMatchingFpga()}}. Also, is 
> this method even needed? We already receive an {{FpgaDevice}} instance in 
> {{updateFpga()}} which I believe is the same that we're looking up.
>  * variable {{IPIDpreference}} is confusing
>  * {{availableFpga}} / {{usedFpgaByRequestor}} are instances of 
> {{LinkedHashMap}}. What's the rationale behind this? Doesn't a simple 
> {{HashMap}} suffice?
>  * {{usedFpgaByRequestor}} should be renamed, naming is a bit unclear
>  * {{allowedFpgas}} should be an immutable list
>  * {{@VisibleForTesting}} methods should be package private
>  * get rid of {{*}} imports




[jira] [Commented] (YARN-9267) General improvements in FpgaResourceHandlerImpl

2019-03-14 Thread Devaraj K (JIRA)



[ 
https://issues.apache.org/jira/browse/YARN-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793322#comment-16793322
 ] 

Devaraj K commented on YARN-9267:
-

Sorry for coming in late.

Thanks [~pbacsko] for the patch and [~snemeth] & [~tangzhankun] for the reviews.

Overall the patch looks good to me.

* Have you thought of using an existing library API like 
org.apache.commons.codec.digest.DigestUtils.sha256Hex(InputStream data), so 
that we don't have to add Sha256Calculator and tests for it?

> General improvements in FpgaResourceHandlerImpl
> ---
>
> Key: YARN-9267
> URL: https://issues.apache.org/jira/browse/YARN-9267
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9267-001.patch, YARN-9267-002.patch, 
> YARN-9267-003.patch, YARN-9267-004.patch
>
>
> Fix some problems in {{FpgaResourceHandlerImpl}}:
>  * {{preStart()}} does not reconfigure card with the same IP - we see it as a 
> problem. If you recompile the FPGA application, you must rename the aocx file 
> because the card will not be reprogrammed. Suggestion: instead of storing 
> Node<\->IPID mapping, store Node<\->IPID hash (like the SHA-256 of the 
> localized file).
>  * Switch to slf4j from Apache Commons Logging
>  * Some unused imports




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-03-16 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402596#comment-16402596
 ] 

Devaraj K commented on YARN-5764:
-

Thanks [~miklos.szeg...@cloudera.com] for review and commit, [~leftnoteasy] and 
others for reviews.

[~miklos.szeg...@cloudera.com], is there any reason this is still 
'Unresolved'?

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v10.patch, 
> YARN-5764-v11.patch, YARN-5764-v2.patch, YARN-5764-v3.patch, 
> YARN-5764-v4.patch, YARN-5764-v5.patch, YARN-5764-v6.patch, 
> YARN-5764-v7.patch, YARN-5764-v8.patch, YARN-5764-v9.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2018-03-10 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v11.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v10.patch, 
> YARN-5764-v11.patch, YARN-5764-v2.patch, YARN-5764-v3.patch, 
> YARN-5764-v4.patch, YARN-5764-v5.patch, YARN-5764-v6.patch, 
> YARN-5764-v7.patch, YARN-5764-v8.patch, YARN-5764-v9.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2018-03-10 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v10.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v10.patch, 
> YARN-5764-v2.patch, YARN-5764-v3.patch, YARN-5764-v4.patch, 
> YARN-5764-v5.patch, YARN-5764-v6.patch, YARN-5764-v7.patch, 
> YARN-5764-v8.patch, YARN-5764-v9.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2018-03-09 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v9.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch, 
> YARN-5764-v6.patch, YARN-5764-v7.patch, YARN-5764-v8.patch, YARN-5764-v9.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2018-03-07 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v8.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch, 
> YARN-5764-v6.patch, YARN-5764-v7.patch, YARN-5764-v8.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-02-21 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372275#comment-16372275
 ] 

Devaraj K commented on YARN-5764:
-

[~miklos.szeg...@cloudera.com] Thanks for the comments.

bq. Is MB not supported?
MB is supported: the code converts the value to MB, and takes it directly if 
it is already in MB.

bq. Containers can change their resource usage. I do not see that supported, 
yet. It may need another jira.
Agreed, I will create another JIRA to handle this.

I have addressed the other comments in the patch; please have a look at the 
patch.
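
The conversion described above could look roughly like the following. This is 
only a sketch under assumptions: MemoryUnits, toMegabytes and the set of 
accepted unit strings are hypothetical names chosen for illustration and are 
not taken from the YARN-5764 patch.

```java
import java.util.Locale;

// Hypothetical helper, not taken from the YARN-5764 patch.
final class MemoryUnits {
  private MemoryUnits() {
  }

  /** Normalizes a size to MB; a value already in MB is returned unchanged. */
  static long toMegabytes(long value, String unit) {
    switch (unit.trim().toUpperCase(Locale.ROOT)) {
      case "KB":
        return value / 1024;   // integer division, rounds down
      case "MB":
        return value;          // already in MB, taken directly
      case "GB":
        return value * 1024;
      default:
        throw new IllegalArgumentException("Unsupported unit: " + unit);
    }
  }
}
```

Rejecting unknown units instead of guessing keeps a misread value from turning 
into a wrong node memory size.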

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch, 
> YARN-5764-v6.patch, YARN-5764-v7.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2018-02-21 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v7.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch, 
> YARN-5764-v6.patch, YARN-5764-v7.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2018-02-21 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v6.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch, YARN-5764-v6.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-02-12 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361509#comment-16361509
 ] 

Devaraj K commented on YARN-5764:
-

[~miklos.szeg...@cloudera.com] Sorry for the delay, I will update the patch. 
Thanks for reminding me.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-10-16 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206622#comment-16206622
 ] 

Devaraj K commented on YARN-5764:
-

Thanks [~miklos.szeg...@cloudera.com] for the review.

bq. I see commented code in the body of the function and also in the unit tests.
I commented out the recovery code and the related test code because YARN-7033 
had not been committed when the patch was created.

bq. is package-info.java necessary?
It is necessary; checkstyle reports an error if it is missing.

I will update the patch with the comments addressed and the code uncommented.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups

2017-09-14 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166780#comment-16166780
 ] 

Devaraj K commented on YARN-6620:
-

Thanks [~leftnoteasy] for the quick patch.

bq. Switched JAXB to handle XML parsing instead of check tags.
Overall it looks good. It would be better if you could group the adapter and 
other supporting classes as inner classes in PerGpuDeviceInformation.

> [YARN-6223] NM Java side code changes to support isolate GPU devices by using 
> CGroups
> -
>
> Key: YARN-6620
> URL: https://issues.apache.org/jira/browse/YARN-6620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6620.001.patch, YARN-6620.002.patch, 
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, 
> YARN-6620.006-WIP.patch
>
>
> This JIRA plan to add support of:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups. (Java side).
> 3) NM restart and recovery allocated GPU devices




[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups

2017-09-13 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165119#comment-16165119
 ] 

Devaraj K commented on YARN-6620:
-

Thanks [~leftnoteasy] for the responses.

bq. My understanding of JAXBContext is mostly used when we need to convert 
between object and XML/JSON. Since output of nvidia-smi is a customized XML 
format, which doesn't follow JAXB standard. Is it still best practice to use 
JAXBContext under such use case? For example, FairScheduler parses XML file 
directly: AllocationFileLoaderService#reloadAllocations.

JAXBContext can be used for any XML format; the document does not have to 
follow a specific format. The sample output in the patch can be converted to a 
Java object, so we can eliminate the traversing and parsing logic in 
GpuDeviceInformationParser.java.

bq. I considered this option before, unless there's strong need for this to run 
different command or call Nvidia native APIs directly, I would prefer to hard 
code to use nvidia-smi instead of introducing another abstraction layer. I'm 
open to do refactoring to support this case once we have such requirements.
I think it would be useful when users have symlinks created with names that 
differ from the hard-coded name. I feel we don't have to add a new 
configuration for the executable. Instead, the binary name can be part of 
DEFAULT_NM_GPU_PATH_TO_EXEC, and users can provide the path including the 
executable name in the configuration 
'yarn.nodemanager.resource.gpu.path-to-executables'.
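
To make the suggestion concrete, here is a minimal sketch of how the lookup 
could work. Only the property name comes from the comment above; 
GpuBinaryLocator, resolve, the default value "nvidia-smi" as a bare binary 
name, and the use of a plain Map in place of Configuration are assumptions 
for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical sketch; only the property name comes from the comment above.
final class GpuBinaryLocator {
  static final String PATH_TO_EXEC_KEY =
      "yarn.nodemanager.resource.gpu.path-to-executables";
  // Assumed default: the bare binary name, so the default is part of the
  // same setting instead of a separate hard-coded constant.
  static final String DEFAULT_PATH_TO_EXEC = "nvidia-smi";

  private GpuBinaryLocator() {
  }

  /** Returns the configured executable path, or the default if unset. */
  static Path resolve(Map<String, String> conf) {
    String configured = conf.get(PATH_TO_EXEC_KEY);
    if (configured == null || configured.trim().isEmpty()) {
      return Paths.get(DEFAULT_PATH_TO_EXEC);
    }
    // The configured value already includes the executable name, so a
    // symlink with a different name works without another property.
    return Paths.get(configured.trim());
  }
}
```

With this shape, a user with a symlink named differently only changes the one 
existing property, for example to /usr/local/bin/my-smi.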

> [YARN-6223] NM Java side code changes to support isolate GPU devices by using 
> CGroups
> -
>
> Key: YARN-6620
> URL: https://issues.apache.org/jira/browse/YARN-6620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6620.001.patch, YARN-6620.002.patch, 
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch
>
>
> This JIRA plan to add support of:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups. (Java side).
> 3) NM restart and recovery allocated GPU devices




[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups

2017-09-11 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162491#comment-16162491
 ] 

Devaraj K commented on YARN-6620:
-

Thanks [~leftnoteasy] for the patch, great work!

I have some comments on the patch.

1. For the XML reading in GpuDeviceInformationParser.java, can we use an 
existing library such as javax.xml.bind.JAXBContext to unmarshal the XML 
document to a Java object instead of reading it tag by tag?

2. If you prefer not to use an existing library for reading the XML, the 'in' 
stream may need to be closed after parsing.

{code:java}
  InputStream in = IOUtils.toInputStream(sanitizeXmlInput(xmlStr), "UTF-8");
  doc = builder.parse(in);
{code}

3. Instead of hardcoding BINARY_NAME, can it be included in 
DEFAULT_NM_GPU_PATH_TO_EXEC as the default value, so that it also becomes 
configurable in case users want to change it?
{code:java}
public static final String DEFAULT_NM_GPU_PATH_TO_EXEC = "";

protected static final String BINARY_NAME = "nvidia-smi";
{code}


4. Please update the comment here accordingly; it refers to disk, but the 
constant is for the GPU resource.
{code:java}
+  /**
+   * Disk as a resource is disabled by default.
+   **/
+  @Private
+  public static final boolean DEFAULT_NM_GPU_RESOURCE_ENABLED = false;
{code}

5. Can we use spaces instead of tab characters for indentation in 
nvidia-smi-sample-output.xml?

6. Are we going to support a limited number of containers/processes sharing 
the same GPU device?

7. 

{code:title=GpuResourceAllocator.java|borderStyle=solid}
  for (int deviceNum : allowedGpuDevices) {
    if (!usedDevices.containsKey(deviceNum)) {
      usedDevices.put(deviceNum, containerId);
      assignedGpus.add(deviceNum);
      if (assignedGpus.size() == numRequestedGpuDevices) {
        break;
      }
    }
  }

  // Record in state store if we allocated anything
  if (!assignedGpus.isEmpty()) {
    List allocatedDevices = new ArrayList<>();
    for (int gpu : assignedGpus) {
      allocatedDevices.add(String.valueOf(gpu));
    }
{code}

Can you merge these two for loops into one, like below?

{code:java}
  usedDevices.put(deviceNum, containerId);
  assignedGpus.add(deviceNum);
  allocatedDevices.add(String.valueOf(deviceNum));
{code}

Also, if the condition *if (assignedGpus.size() == numRequestedGpuDevices)* is 
never met, do we need to throw an exception or log an error?

8. I see that getGpuDeviceInformation() is invoked twice, which in turn 
executes a shell command and parses the XML output, both of which are costly 
operations. Do we need to execute it twice here?

{code:title=GpuResourceDiscoverPlugin.java|borderStyle=solid}
GpuDeviceInformation info = getGpuDeviceInformation();

LOG.info("Trying to discover GPU information ...");
GpuDeviceInformation info = getGpuDeviceInformation();
{code}
Also, I am not convinced that setConf() should contain logic other than 
assigning conf.

{code:java}
public synchronized void setConf(Configuration conf) {
  this.conf = conf;
  numOfErrorExecutionSinceLastSucceed = 0;
  featureEnabled = conf.getBoolean(YarnConfiguration.NM_GPU_RESOURCE_ENABLED,
      YarnConfiguration.DEFAULT_NM_GPU_RESOURCE_ENABLED);

  if (featureEnabled) {
    String dir = conf.get(YarnConfiguration.NM_GPU_PATH_TO_EXEC,
    .
{code}
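
As a rough illustration of point 8, one possible shape is a setConf() that 
only stores the configuration, with the costly discovery deferred and cached. 
This is a sketch under assumptions, not the patch's code: LazyDiscoverer is a 
hypothetical class, its type parameters stand in for Configuration and 
GpuDeviceInformation, and whether caching is acceptable depends on how fresh 
the GPU information needs to be.

```java
import java.util.function.Supplier;

// Hypothetical sketch: C stands in for Configuration and I for
// GpuDeviceInformation; neither real class is used here.
final class LazyDiscoverer<C, I> {
  private final Supplier<I> discovery; // e.g. runs the binary, parses XML
  private C conf;
  private I cached;

  LazyDiscoverer(Supplier<I> discovery) {
    this.discovery = discovery;
  }

  /** Only stores the configuration and drops any stale cached result. */
  synchronized void setConf(C conf) {
    this.conf = conf;
    this.cached = null;
  }

  synchronized C getConf() {
    return conf;
  }

  /** Runs the costly discovery at most once per configuration. */
  synchronized I getDeviceInformation() {
    if (cached == null) {
      cached = discovery.get();
    }
    return cached;
  }
}
```

Callers that previously triggered two shell executions would then share one 
result until the configuration changes.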

The issues reported by Hadoop QA also need to be fixed.
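
For point 7, the merged loop could be sketched as below, together with one 
possible answer to the question about an unmet request (throwing and rolling 
back). The variable names mirror the quoted snippet, but GpuAssignment, 
assign, the choice of IllegalStateException and the rollback are assumptions 
for illustration, not the patch's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names mirror the snippet in point 7, but this is
// not the code from the patch.
final class GpuAssignment {
  private GpuAssignment() {
  }

  /**
   * Assigns {@code requested} free devices to {@code containerId} in a single
   * pass and returns their ids as strings, ready for the state store.
   */
  static List<String> assign(List<Integer> allowedGpuDevices,
      Map<Integer, String> usedDevices, String containerId, int requested) {
    List<Integer> assignedGpus = new ArrayList<>();
    List<String> allocatedDevices = new ArrayList<>();
    for (int deviceNum : allowedGpuDevices) {
      if (assignedGpus.size() == requested) {
        break;
      }
      if (!usedDevices.containsKey(deviceNum)) {
        usedDevices.put(deviceNum, containerId);
        assignedGpus.add(deviceNum);
        allocatedDevices.add(String.valueOf(deviceNum));
      }
    }
    if (assignedGpus.size() < requested) {
      // Assumed policy: undo the partial assignment and fail loudly.
      for (Integer deviceNum : assignedGpus) {
        usedDevices.remove(deviceNum);
      }
      throw new IllegalStateException("Requested " + requested
          + " GPU devices but only " + assignedGpus.size() + " were free");
    }
    return allocatedDevices;
  }
}
```

Logging and returning a partial assignment would be the other option; the 
sketch throws so that a container never starts with fewer devices than it 
asked for.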

> [YARN-6223] NM Java side code changes to support isolate GPU devices by using 
> CGroups
> -
>
> Key: YARN-6620
> URL: https://issues.apache.org/jira/browse/YARN-6620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6620.001.patch, YARN-6620.002.patch, 
> YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch
>
>
> This JIRA plan to add support of:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups. (Java side).
> 3) NM restart and recovery allocated GPU devices




[jira] [Commented] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-09-06 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156530#comment-16156530
 ] 

Devaraj K commented on YARN-7033:
-

Thanks [~sunilg] for the review and agreeing with us. 

[~leftnoteasy], can you commit this if you don't have any further comments? 
Thanks

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch, YARN-7033-v1.patch, 
> YARN-7033-v2.patch, YARN-7033-v3.patch, YARN-7033-v4.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Commented] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-09-01 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150734#comment-16150734
 ] 

Devaraj K commented on YARN-7033:
-

[~sunilg], Thanks again for looking into this.

bq. In NMLeveldbStateStoreService, CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX will 
be updated only in case of GPU's, NUMA, FPGA's cases correct. If some one adds 
a custom resource after YARN-3926, will this code hit ?
CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX is combined with the resourceType and 
used as the key for each container's assigned resources of that type, so it 
should work for any resourceType. Nothing binds it to the GPU, NUMA, or FPGA 
types only. Please let me know if this doesn't clarify it or if you have any 
other thoughts.
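
A small sketch of the key scheme described above may help. The constant 
values, the CONTAINERS_KEY_PREFIX name and the AssignedResourceKeys class are 
assumptions for illustration; only the idea of appending the resourceType to 
CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX comes from the comment.

```java
// Hypothetical sketch; the constant values are assumptions for illustration,
// not the ones used by NMLeveldbStateStoreService.
final class AssignedResourceKeys {
  static final String CONTAINERS_KEY_PREFIX = "containers/";
  static final String CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX =
      "/assignedResources_";

  private AssignedResourceKeys() {
  }

  /** Builds one key per (container, resource type) pair. */
  static String keyFor(String containerId, String resourceType) {
    return CONTAINERS_KEY_PREFIX + containerId
        + CONTAINER_ASSIGNED_RESOURCES_KEY_SUFFIX + resourceType;
  }
}
```

Because the resource type is just a string appended to the suffix, a custom 
resource type gets its own key without any type-specific code.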

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch, YARN-7033-v1.patch, 
> YARN-7033-v2.patch, YARN-7033-v3.patch, YARN-7033-v4.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Commented] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-08-31 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149295#comment-16149295
 ] 

Devaraj K commented on YARN-7033:
-

[~sunilg], can you check the latest patch? Thanks

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch, YARN-7033-v1.patch, 
> YARN-7033-v2.patch, YARN-7033-v3.patch, YARN-7033-v4.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Updated] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-08-30 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-7033:

Attachment: YARN-7033-v4.patch

The previous patch could not be applied because of recent commits. Attaching 
the rebased patch with the latest changes.

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch, YARN-7033-v1.patch, 
> YARN-7033-v2.patch, YARN-7033-v3.patch, YARN-7033-v4.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Updated] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-08-30 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-7033:

Attachment: YARN-7033-v3.patch

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch, YARN-7033-v1.patch, 
> YARN-7033-v2.patch, YARN-7033-v3.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Commented] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-08-29 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146663#comment-16146663
 ] 

Devaraj K commented on YARN-7033:
-

Thanks [~leftnoteasy] and [~sunilg] for the confirmation. I will update the 
patch to revert the enum change.

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch, YARN-7033-v1.patch, 
> YARN-7033-v2.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Updated] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-08-29 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-7033:

Attachment: YARN-7033-v2.patch

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch, YARN-7033-v1.patch, 
> YARN-7033-v2.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2017-08-25 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v5.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2017-08-25 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v4.patch

Updated the patch to use the ResourceHandlerModule APIs.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-7079) to support nodemanager ports management

2017-08-23 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138895#comment-16138895
 ] 

Devaraj K commented on YARN-7079:
-

[~tianjuan428], thanks for the patch. The patch seems to be quite large, and I 
think you have put good effort into it. Can you also upload the design 
draft/details if you have any?

>  to support nodemanager  ports management
> -
>
> Key: YARN-7079
> URL: https://issues.apache.org/jira/browse/YARN-7079
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: 田娟娟
> Attachments: YARN_7079.001.patch
>
>
> Just like the vcores and memory, ports is also  important resource 
> information to job allocation . So we add the ports management logic to yarn. 
> It can meet the user jobs' ports  request, and  never allocate two jobs(with 
> same port requirement) to one machine.




[jira] [Updated] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-08-17 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-7033:

Attachment: YARN-7033-v1.patch

Attaching a patch with fixes for the checkstyle and whitespace errors.

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch, YARN-7033-v1.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Updated] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-08-17 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-7033:

Attachment: YARN-7033-v0.patch

Attaching the patch which contains the common code from YARN-6620 to handle the 
assigned resources recovery.

> Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to 
> container
> ---
>
> Key: YARN-7033
> URL: https://issues.apache.org/jira/browse/YARN-7033
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: YARN-7033-v0.patch
>
>
> This JIRA adds the common logic to store the assigned resources to container 
> such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
> recover upon restart of NM.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-08-16 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129994#comment-16129994
 ] 

Devaraj K commented on YARN-5764:
-

I've created YARN-7033 to move the common logic for recovering assigned 
resources out of YARN-6620.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Created] (YARN-7033) Add support for NM Recovery of assigned resources(GPU's, NUMA, FPGA's) to container

2017-08-16 Thread Devaraj K (JIRA)

Devaraj K created YARN-7033:
---

 Summary: Add support for NM Recovery of assigned resources(GPU's, 
NUMA, FPGA's) to container
 Key: YARN-7033
 URL: https://issues.apache.org/jira/browse/YARN-7033
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Devaraj K
Assignee: Devaraj K


This JIRA adds the common logic to store the assigned resources to container 
such as GPU's(YARN-6620), NUMA(YARN-5764) and FPGA's(YARN-5983) etc. and 
recover upon restart of NM.
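
A rough sketch of the store-and-recover idea described above, written in Python
for brevity. It is not the code from the patch: the class
AssignedResourceStore and its methods are hypothetical, and the JSON file only
stands in for whatever state store the NodeManager actually uses.

```python
import json
import os
import tempfile


class AssignedResourceStore:
    """Hypothetical store: container id -> {resource type: [assigned ids]}."""

    def __init__(self, path):
        self.path = path
        self.assignments = {}
        if os.path.exists(path):
            # Recovery path: reload what was persisted before the restart.
            with open(path) as f:
                self.assignments = json.load(f)

    def store(self, container_id, resource_type, resource_ids):
        per_container = self.assignments.setdefault(container_id, {})
        per_container[resource_type] = list(resource_ids)
        self._flush()

    def release(self, container_id):
        self.assignments.pop(container_id, None)
        self._flush()

    def _flush(self):
        # Write to a temp file and rename it, so a crash in the middle of a
        # write cannot leave a half-written state file behind.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(self.assignments, f)
        os.replace(tmp, self.path)
```

The point of keeping the mapping generic (resource type as a plain key) is that
GPU, NUMA and FPGA handlers can share the same store and recovery path.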




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-08-16 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129294#comment-16129294
 ] 

Devaraj K commented on YARN-5764:
-

Thanks [~leftnoteasy] for the details and the direction.

bq. however since ResourceHandler API is not added to DefaultContainerExecutor, 
it needs some extra effort to bring ResourceHandlerModule API to 
DefaultContainerExecutor, which I'm not sure if it worths
If it is not worth making changes to support DefaultContainerExecutor, we can 
proceed with LinuxContainerExecutor now and look at the feasibility for 
DefaultContainerExecutor in the future.

bq. If you plan to work on this feature in short term (say 1 month), we may 
need to split common libraries to a separate JIRA and commit to trunk first to 
unblock this one. I can do it two weeks after, if you want to speed it up, 
please feel free to take it up.
I can take up this task of splitting the common code from YARN-6620 into a 
separate JIRA to handle the NM recovery of assigned resources.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-08-16 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128521#comment-16128521
 ] 

Devaraj K commented on YARN-5764:
-

Hi [~leftnoteasy],
bq. It added numa controller for both default container executor and linux 
container executor, does it make sense to use this feature under default 
container executor since CPU asks might be ignored in RM side (so asking 100 
vcores is same as asking 1 vcores).
I think it would be useful when the user runs the default container executor 
with DominantResourceCalculator. Please correct me if I am wrong. Thanks.
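
To illustrate the point about vcore asks, here is a simplified Python sketch of
the difference between memory-only accounting and dominant-resource accounting
when working out how many containers fit on a node. It is an approximation for
discussion, not the YARN calculator code; the function names and dictionary
keys are made up.

```python
def containers_memory_only(available, required):
    # Memory-only accounting: the vcore ask never limits the result.
    return available["memory_mb"] // required["memory_mb"]


def containers_dominant(available, required):
    # Every resource type counts: the scarcest one decides.
    return min(available[k] // required[k] for k in required)
```

For a node with 8192 MB and 8 vcores, a request of 1024 MB and 100 vcores fits
8 times under memory-only accounting and not at all once vcores are counted,
which is why the vcore ask only becomes meaningful in the second case.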

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-08-15 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128107#comment-16128107
 ] 

Devaraj K commented on YARN-5764:
-

Thanks [~leftnoteasy] for looking into the patch and for the suggestions. I 
will update the patch accordingly.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2017-08-15 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v3.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5983) [Umbrella] Support for FPGA as a Resource in YARN

2017-04-28 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988319#comment-15988319
 ] 

Devaraj K commented on YARN-5983:
-

Thanks [~tangzhankun] and [~zyluo] for the design doc and hard work, and 
[~leftnoteasy] for the discussion.

1.
{code:xml}
The scheduler only considers non-exclusive resource. The exclusive resources may
have extra attributes needs to be matched when scheduling. Not just simply add or
reduce a number. For instance, in our PoC, a FPGA slot in one node may already
have one IP flashed so that the scheduler should try to match this IP attribute to
reuse it.
{code}

If you are passing all the attributes of the FPGA resources to the RM scheduler, 
why do you want NM-side resource management? Can you describe, in abstract 
terms, which attributes are passed to the RM and which details are maintained 
by the NM-side resource management?

2.
{code:xml}
Device resource needs additional preparation and isolation before container launch.
For instance, FPGA device may need to download an IP file from a repo then flash to
an allocated FPGA slot.
{code}
Does this need to be done for each container, or can it be done once during 
cluster installation?

3. Can FPGA slots be shared by multiple containers? How do we prevent a 
container or application that was not allocated FPGA resources from using 
them?

4. Are there any changes to ContainerExecutor? How does the application code 
running in the container learn about the allocated FPGA resource so that it 
can access/use the FPGA?

5. What configurations does the user need to set for the application to use 
FPGA resources?


> [Umbrella] Support for FPGA as a Resource in YARN
> -
>
> Key: YARN-5983
> URL: https://issues.apache.org/jira/browse/YARN-5983
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
> Attachments: YARN-5983-Support-FPGA-resource-on-NM-side_v1.pdf
>
>
> As various big data workload running on YARN, CPU will no longer scale 
> eventually and heterogeneous systems will become more important. ML/DL is a 
> rising star in recent years, applications focused on these areas have to 
> utilize GPU or FPGA to boost performance. Also, hardware vendors such as 
> Intel also invest in such hardware. It is most likely that FPGA will become 
> popular in data centers like CPU in the near future.
> So YARN as a resource managing and scheduling system, would be great to 
> evolve to support this. This JIRA proposes FPGA to be a first-class citizen. 
> The changes roughly includes:
> 1. FPGA resource detection and heartbeat
> 2. Scheduler changes
> 3. FPGA related preparation and isolation before launch container
> We know that YARN-3926 is trying to extend current resource model. But still 
> we can leave some FPGA related discussion here




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-04-10 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963602#comment-15963602
 ] 

Devaraj K commented on YARN-5764:
-

Thanks [~rajesh.balamohan] for taking a look into this.

bq. Was this flag (-XX:useNUMA) enabled in the tasks when running the benchmark?
Yes, I used {{-XX:+UseNUMA}} for running the benchmark.

bq. Hive on MR is outdated, network intensive and slow. It would be great, if 
BB benchmark can be run with Hive on Tez which optimizes queries to a great 
extent. It has much better resource utilization and also eliminates a lot of IO 
barriers and would be a lot more efficient than MR codebase.
I haven't tried BB with Hive on Tez. Here we are not evaluating the performance 
of the BB execution engines, and I think 'Hive on MR' or any other component 
would be OK to showcase the performance benefits of the NUMA patch.


> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2017-04-06 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v2.patch

Updating the patch with the 'interleave' option for memory.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2017-04-06 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: NUMA Performance Results.pdf

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-01-11 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819132#comment-15819132
 ] 

Devaraj K commented on YARN-5764:
-

bq. Do you have any benchmarks results that would illustrate the kind of 
performance gains that could potentially be realised with this patch?

Thanks [~raviprak] for going through this. I will share the performance results 
here.


Thanks [~sunilg] for the comments.
bq. if NM is taking the decision based on cores (NUMA cpus), it ll be more 
container specific. Could we apply it more of application specific where few 
apps containers only will be NUMA aware. 
bq. Also I think such NUMA aware nodes could be controlled within a specific 
nodelabel, I think it may yield better use cases for NUMA. So during NM init, 
such awareness info could be passed to RM and it can be made as node attribute. 
Such nodes could then be labelled together as well.

If we want to run an application only on NUMA aware nodes, we can group NUMA 
aware nodes into a node-label and specify this node-label for the application. 
I wonder why some applications would not want to run with NUMA when the NM 
supports it and it gives some perf gain, which is what making this application 
specific would imply. We can also include this as an attribute once the 
constraint node labels (YARN-3409) feature gets in.


> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, 
> YARN-5764-v0.patch, YARN-5764-v1.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-01-10 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816512#comment-15816512
 ] 

Devaraj K commented on YARN-5764:
-

Thanks a lot [~leftnoteasy] for the review and comments.

bq. What is the benefit to manually specify NUMA node? Since this is 
potentially complex for end user to specify, I think it's better to directly 
read data from OS.
If users want to share the NUMA resources on the Node Manager machine with 
non-Yarn applications, they can use this declaration to specify which NUMA 
nodes, and how much of each node's capacity, Yarn may use. I understand that 
this requires configurations for specifying the NUMA nodes and each node's 
memory and CPUs. But if we have no provision for setting aside NUMA resources 
for Yarn, we could end up with Yarn and non-Yarn applications overlapping on 
the same resources.

bq. Does the changes work on platform other than Linux?
This patch works for Linux. If this approach is agreeable, I will update it 
for Windows as well.

bq. I'm not quite sure about if this could happen: with this patch, YARN will 
launch process one by one on each NUMA node to bind memory/cpu. Is it possible 
that there's another process (outside of YARN) uses memory of NUMA node which 
causes processes launched by YARN failed to bind or run?
I do think it could happen for memory. We can avoid this by using the NUMA node 
topology declaration to specify the NUMA resources for Yarn applications. It 
would also not be an issue with the soft binding option which you mention in 
the comment below.

bq. This patch uses hard binding (get allocated resource on specified node or 
fail), is it better to specify soft binding (prefer to allocate and can also 
accept other node). I think soft binding should be default behavior to support 
NUMA.
I think it is a good suggestion. I can update the patch with this by changing 
'\--membind=nodes' to '\--preferred=node'.
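
For reference, the difference between the two binding modes could be sketched
as below. This is a minimal Python illustration of how a launch command might
be prefixed with numactl, not the code in the patch; the function name
numactl_prefix and the use of --cpunodebind alongside the memory option are
assumptions made for the example.

```python
def numactl_prefix(numa_node, soft=False, cpu_bind=True):
    """Build a numactl prefix for a container launch command (illustrative)."""
    args = ["numactl"]
    if cpu_bind:
        # Assumption: CPUs are pinned to the same NUMA node as the memory.
        args.append(f"--cpunodebind={numa_node}")
    if soft:
        # Soft binding: prefer this node, fall back to others when it is full.
        args.append(f"--preferred={numa_node}")
    else:
        # Hard binding: allocate only from this node; fail when it is full.
        args.append(f"--membind={numa_node}")
    return args
```

The returned list would be prepended to the container's own command, so
switching between hard and soft binding is a one-flag change at launch time.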

I look forward to your further comments.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, 
> YARN-5764-v0.patch, YARN-5764-v1.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-01-06 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806024#comment-15806024
 ] 

Devaraj K commented on YARN-5764:
-

Thanks [~rohithsharma] for going through this.

bq. NUMA resources is scheduled by NodeManager. Why can't RM make the 
decision of scheduling NUMA resources using resource profilers.?
With NUMA, the memory blocks and processors in a single machine are divided 
into NUMA nodes, and the processors in a NUMA node can access the memory local 
to that node faster. If we want the RM to schedule this, each NM has to send 
its NUMA node information (i.e. List{(numanode-id, processors, memory),..}) 
to the RM, and the RM has to maintain this information, including the usage 
details, for scheduling. At present the RM already schedules NM memory and 
vcores as a whole, and I think it is cumbersome to move NUMA node scheduling, 
which is fine-grained scheduling, to the RM.
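
As a rough picture of what keeping this on the NM side means, the sketch below
tracks the (numanode-id, processors, memory) list locally and picks a NUMA node
for each container at launch. It is written in Python for brevity and is not
the patch code; NumaNode and assign_numa_node are hypothetical names, and
first-fit is just one possible policy.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    node_id: str
    free_cpus: int
    free_memory_mb: int


def assign_numa_node(nodes, cpus, memory_mb):
    """Return the id of the first NUMA node that fits the request, or None."""
    for node in nodes:
        if node.free_cpus >= cpus and node.free_memory_mb >= memory_mb:
            # Book the resources so later containers see the reduced capacity.
            node.free_cpus -= cpus
            node.free_memory_mb -= memory_mb
            return node.node_id
    return None
```

None of this bookkeeping has to be reported to or maintained by the RM, which
continues to see only the node's total memory and vcores.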

bq. Could you elaborate, why there are multiple numa-awareness.node-ids in 
single machine?
In the Non-Uniform Memory Access (NUMA) model, the memory blocks and processors 
in a single machine are divided into multiple NUMA nodes, and each NUMA node 
has an id assigned to it. When the user/application wants to make use of the 
NUMA resources, the process should be bound to those NUMA node ids.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, 
> YARN-5764-v0.patch, YARN-5764-v1.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for fair scheduler

2017-01-06 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805654#comment-15805654
 ] 

Devaraj K commented on YARN-6061:
-

Shouldn't this already handle it?
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java#L1378

> Add a customized uncaughtexceptionhandler for fair scheduler
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, yarn
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>  Labels: fairscheduler
>
> There are several threads in fair scheduler. The thread will quit when there 
> is a runtime exception inside it. We should bring down the RM when that 
> happens. Otherwise, there may be some weird behavior in RM. 




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2016-12-23 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v1.patch

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, 
> YARN-5764-v0.patch, YARN-5764-v1.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2016-12-14 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: YARN-5764-v0.patch

Attaching the patch for this.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, 
> YARN-5764-v0.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2016-12-14 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Attachment: NUMA Awareness for YARN Containers.pdf

Please find the attached proposal document and provide your 
feedback/suggestions. I will upload a patch soon with this approach for better 
understanding.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Assigned] (YARN-5764) NUMA awareness support for launching containers

2016-12-02 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-5764:
---

Assignee: Devaraj K

I will upload the design proposal for this.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
> Environment: SW: CentOS 6.7, Hadoop 2.6.0
> Processors: Intel Xeon CPU E5-2699 v4 @2.20GHz
> Memory: 256GB 4 NUMA nodes
>Reporter: Olasoji
>Assignee: Devaraj K
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2016-12-02 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Environment: (was: SW: CentOS 6.7, Hadoop 2.6.0
Processors: Intel Xeon CPU E5-2699 v4 @2.20GHz
Memory: 256GB 4 NUMA nodes)

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-3409) Add constraint node labels

2016-11-25 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15697368#comment-15697368
 ] 

Devaraj K commented on YARN-3409:
-

Thanks [~Naganarasimha]/[~varun_saxena] for the document and others for the 
discussion.

- {code:xml}
String labelExpression,
String constraintLabelExpression, // New modification in the interface
{code}

- As Bibin mentioned above, the name 'constraintLabelExpression' is confusing 
because it raises the question of why we need two label expressions. I also 
think we need a different name if we are going to keep this param/config.

- Can NodeManagers have attribute names that are the same as a label/partition 
name in the cluster? Did you consider having a single (existing) expression 
that handles both the node label expression and the constraints expression, 
without a delimiter between them? Constraints expression support could then be 
added without any new configurations/interfaces.

- Can we have some details about how the NodeManager reports these attributes 
to the ResourceManager?

> Add constraint node labels
> --
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: Constraint-Node-Labels-Requirements-Design-doc_v1.pdf
>
>
> Specifying only one label for each node (in other words, partitioning a 
> cluster) is a way to determine how the resources of a special set of nodes 
> can be shared by a group of entities (like teams, departments, etc.). 
> Partitions of a cluster have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACL/priority can apply to a partition (only the market team can use the 
> partition, or the market team has priority to use it).
> - Percentages of capacity can apply to a partition (the Market team has 40% 
> minimum capacity and the Dev team has 60% minimum capacity of the partition).
> Constraints are orthogonal to partitions; they describe attributes of a 
> node’s hardware/software just for affinity. Some examples of constraints:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (windows, linux, etc.)
> With this, an application can ask for resources on nodes that satisfy 
> (glibc.version >= 2.20 && JDK.version >= 8u20 && x86_64).




[jira] [Commented] (YARN-3732) Change NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as abstract classes

2016-10-28 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616520#comment-15616520
 ] 

Devaraj K commented on YARN-3732:
-

Thanks [~rohithsharma] for the review and commit.

> Change NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as 
> abstract classes
> --
>
> Key: YARN-3732
> URL: https://issues.apache.org/jira/browse/YARN-3732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Minor
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-3732-1.patch, YARN-3732-2.patch, YARN-3732.patch
>
>
> All the other protocol record classes are abstract classes. Change 
> NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as abstract 
> classes to make it consistent with other protocol record classes.




[jira] [Commented] (YARN-3732) Change NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as abstract classes

2016-10-27 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15614197#comment-15614197
 ] 

Devaraj K commented on YARN-3732:
-

ASF License warnings are not related to the patch.

> Change NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as 
> abstract classes
> --
>
> Key: YARN-3732
> URL: https://issues.apache.org/jira/browse/YARN-3732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Minor
> Attachments: YARN-3732-1.patch, YARN-3732-2.patch, YARN-3732.patch
>
>
> All the other protocol record classes are abstract classes. Change 
> NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as abstract 
> classes to make it consistent with other protocol record classes.




[jira] [Updated] (YARN-3732) Change NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as abstract classes

2016-10-27 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3732:

Attachment: YARN-3732-2.patch

Updated the patch against trunk.

> Change NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as 
> abstract classes
> --
>
> Key: YARN-3732
> URL: https://issues.apache.org/jira/browse/YARN-3732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Minor
> Attachments: YARN-3732-1.patch, YARN-3732-2.patch, YARN-3732.patch
>
>
> All the other protocol record classes are abstract classes. Change 
> NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as abstract 
> classes to make it consistent with other protocol record classes.




[jira] [Commented] (YARN-3861) Add fav icon to YARN & MR daemons web UI

2016-10-27 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613564#comment-15613564
 ] 

Devaraj K commented on YARN-3861:
-

Thanks [~rchiang] for the comment. Can you provide the icon if you have it?

> Add fav icon to YARN & MR daemons web UI
> 
>
> Key: YARN-3861
> URL: https://issues.apache.org/jira/browse/YARN-3861
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: Devaraj K
>Assignee: Devaraj K
>  Labels: oct16-easy
> Attachments: RM UI in Chrome-With Patch.png, RM UI in Chrome-Without 
> Patch.png, RM UI in IE-With Patch.png, RM UI in IE-Without Patch.png.png, 
> YARN-3861.patch, hadoop-fav.png
>
>
> Add fav icon image to all YARN & MR daemons web UI.




[jira] [Commented] (YARN-3732) Change NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as abstract classes

2016-10-27 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611093#comment-15611093
 ] 

Devaraj K commented on YARN-3732:
-

Thanks [~rohithsharma] for checking this; I will update the patch for trunk.

> Change NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as 
> abstract classes
> --
>
> Key: YARN-3732
> URL: https://issues.apache.org/jira/browse/YARN-3732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Minor
> Attachments: YARN-3732-1.patch, YARN-3732.patch
>
>
> All the other protocol record classes are abstract classes. Change 
> NodeHeartbeatResponse.java and RegisterNodeManagerResponse.java as abstract 
> classes to make it consistent with other protocol record classes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2016-10-20 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Affects Version/s: (was: 2.6.0)

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
> Environment: SW: CentOS 6.7, Hadoop 2.6.0
> Processors: Intel Xeon CPU E5-2699 v4 @2.20GHz
> Memory: 256GB 4 NUMA nodes
>Reporter: Olasoji
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Updated] (YARN-5764) NUMA awareness support for launching containers

2016-10-20 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-5764:

Fix Version/s: (was: 2.6.0)

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 2.6.0
> Environment: SW: CentOS 6.7, Hadoop 2.6.0
> Processors: Intel Xeon CPU E5-2699 v4 @2.20GHz
> Memory: 256GB 4 NUMA nodes
>Reporter: Olasoji
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.




[jira] [Commented] (YARN-4547) LeafQueue#getApplications() is read-only interface, but it provides reference to caller

2016-02-16 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150015#comment-15150015
 ] 

Devaraj K commented on YARN-4547:
-

Shouldn't this be resolved as Duplicate instead of Done?

> LeafQueue#getApplications() is read-only interface, but it provides reference 
> to caller
> ---
>
> Key: YARN-4547
> URL: https://issues.apache.org/jira/browse/YARN-4547
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> The API below is a read-only interface, but it returns a reference to the 
> caller, which lets the caller modify the orderingPolicy entities. If a 
> reference to the ordering policy is required, the caller can use 
> {{LeafQueue#getOrderingPolicy()#getSchedulableEntities()}}.
> The returned object should be a clone of 
> orderingPolicy.getSchedulableEntities().
> {code}
>   /**
>    * Obtain (read-only) collection of active applications.
>    */
>   public Collection getApplications() {
>     return orderingPolicy.getSchedulableEntities();
>   }
> {code}
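
The description above asks for a clone rather than the live collection. A minimal sketch of that idea follows, assuming a simplified queue that stores application IDs as strings. `QueueSketch` is a hypothetical stand-in, not the real `LeafQueue`, and the actual fix may differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a queue holding schedulable entities.
final class QueueSketch {
  private final List<String> schedulableEntities = new ArrayList<>();

  void add(String appId) {
    schedulableEntities.add(appId);
  }

  /**
   * Obtain a read-only snapshot of active applications. The copy protects
   * the internal list; the unmodifiable wrapper makes misuse fail fast.
   */
  Collection<String> getApplications() {
    return Collections.unmodifiableCollection(
        new ArrayList<>(schedulableEntities));
  }
}
```

Copying costs O(n) per call. Returning only the unmodifiable wrapper avoids the copy but still exposes later internal changes to the caller.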




[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-02-11 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143175#comment-15143175
 ] 

Devaraj K commented on YARN-4624:
-

Thanks [~brahmareddy] for the updated patch.

{code:xml}
+  capacities.getMaxAMLimitPercentage() == 0
+ ? 0 : capacities.getMaxAMLimitPercentage())).
{code}

Don't we need to check for null instead of 0 here? Please verify the scenario 
with the patch changes.
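
To illustrate the concern, here is a sketch that assumes the NPE comes from auto-unboxing a null boxed value, which this thread does not confirm. `CapacitySketch` is a hypothetical class, not the real `PartitionQueueCapacitiesInfo`. Comparing a null `Float` with `0` would itself unbox and throw, so the guard has to test for null.

```java
// Hypothetical class for illustration; field and method names are assumptions.
final class CapacitySketch {
  private final Float maxAMLimitPercentage; // may be null when never set

  CapacitySketch(Float maxAMLimitPercentage) {
    this.maxAMLimitPercentage = maxAMLimitPercentage;
  }

  /** Returns 0 when the value was never set instead of unboxing null. */
  float getMaxAMLimitPercentage() {
    return maxAMLimitPercentage == null ? 0f : maxAMLimitPercentage;
  }
}
```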

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-2674-002.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure node labels and add them to the cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}




[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources

2016-02-11 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143169#comment-15143169
 ] 

Devaraj K commented on YARN-2266:
-

Duplicate of YARN-3813

> Add an application timeout service in RM to kill applications which are not 
> getting resources
> -
>
> Key: YARN-2266
> URL: https://issues.apache.org/jira/browse/YARN-2266
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Ashutosh Jindal
>
> Currently, if an application is submitted to the RM, the app keeps waiting until 
> resources are allocated for the AM. Such an application may be stuck until a 
> resource is allocated for the AM, which may be due to over-utilization of 
> queue or user limits, etc. In a production cluster, some periodically running 
> applications may have a smaller cluster share. So after waiting for some time, 
> if resources are not available, such applications can be marked as failed.




[jira] [Updated] (YARN-4667) RM Admin CLI for refreshNodesResources throws NPE when nothing is configured

2016-02-08 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-4667:

Hadoop Flags: Reviewed

+1, lgtm, committing it.

> RM Admin CLI for refreshNodesResources throws NPE when nothing is configured
> 
>
> Key: YARN-4667
> URL: https://issues.apache.org/jira/browse/YARN-4667
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4667.v1.001.patch
>
>
> {quote}
> $ ./yarn rmadmin -refreshNodesResources
> 16/02/03 10:54:27 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8033
> refreshNodesResources: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshNodesResources(AdminService.java:655)
>   at 
> org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshNodesResources(ResourceManagerAdministrationProtocolPBServiceImpl.java:246)
>   at 
> org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:287)
> {quote}




[jira] [Assigned] (YARN-65) Reduce RM app memory footprint once app has completed

2016-02-04 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-65:
-

Assignee: (was: Devaraj K)

> Reduce RM app memory footprint once app has completed
> -
>
> Key: YARN-65
> URL: https://issues.apache.org/jira/browse/YARN-65
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 0.23.3
>Reporter: Jason Lowe
>
> The ResourceManager holds onto a configurable number of completed 
> applications (yarn.resource.max-completed-applications, defaults to 1), 
> and the memory footprint of these completed applications can be significant.  
> For example, the {{submissionContext}} in RMAppImpl contains references to 
> protocolbuffer objects and other items that probably aren't necessary to keep 
> around once the application has completed.  We could significantly reduce the 
> memory footprint of the RM by releasing objects that are no longer necessary 
> once an application completes.




[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-02-02 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129844#comment-15129844
 ] 

Devaraj K commented on YARN-4624:
-

Thanks [~brahmareddy] for reporting this and providing the patch. 

Would you mind adding a test for this as part of the patch? 


> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: YARN-4624.patch
>
>
> Scenario:
> ===
> Configure node labels and add them to the cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}




[jira] [Updated] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-02-02 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-4624:

Priority: Major  (was: Blocker)

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-4624.patch
>
>
> Scenario:
> ===
> Configure node labels and add them to the cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}




[jira] [Updated] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature

2016-02-01 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-4100:

Hadoop Flags: Reviewed

+1, lgtm, will commit it shortly.

> Add Documentation for Distributed and Delegated-Centralized Node Labels 
> feature
> ---
>
> Key: YARN-4100
> URL: https://issues.apache.org/jira/browse/YARN-4100
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: NodeLabel.html, YARN-4100.v1.001.patch, 
> YARN-4100.v1.002.patch, YARN-4100.v1.003.patch, YARN-4100.v1.004.patch, 
> YARN-4100.v1.005.patch
>
>
> Add Documentation for Distributed Node Labels feature




[jira] [Commented] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature

2016-02-01 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125994#comment-15125994
 ] 

Devaraj K commented on YARN-4100:
-

Thanks [~Naganarasimha] for the updated patch that addresses the comments.

The latest patch looks good to me. I will commit it tomorrow unless there are 
comments from others.


> Add Documentation for Distributed and Delegated-Centralized Node Labels 
> feature
> ---
>
> Key: YARN-4100
> URL: https://issues.apache.org/jira/browse/YARN-4100
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: NodeLabel.html, YARN-4100.v1.001.patch, 
> YARN-4100.v1.002.patch, YARN-4100.v1.003.patch, YARN-4100.v1.004.patch, 
> YARN-4100.v1.005.patch
>
>
> Add Documentation for Distributed Node Labels feature




[jira] [Updated] (YARN-4411) RMAppAttemptImpl#createApplicationAttemptReport throws IllegalArgumentException

2016-01-29 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-4411:

Hadoop Flags: Reviewed
 Summary: RMAppAttemptImpl#createApplicationAttemptReport throws 
IllegalArgumentException  (was: ResourceManager IllegalArgumentException error)

+1, lgtm, will commit it shortly.

> RMAppAttemptImpl#createApplicationAttemptReport throws 
> IllegalArgumentException
> ---
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: Bibin A Chundatt
> Attachments: 0002-YARN-4411.patch, 0003-YARN-4411.patch, 
> YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 may cause an IllegalArgumentException in 
> RMAppAttemptImpl:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> because this.getState() returns type RMAppAttemptState, which may not be 
> convertible to YarnApplicationAttemptState.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error

2016-01-28 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121029#comment-15121029
 ] 

Devaraj K commented on YARN-4411:
-

Thanks [~bibinchundatt] for the explanation.

I don't see any problem if we remove this condition; the test still passes 
without it. I see you are trying to test the FINAL_SAVING state explicitly, but 
my argument is that there is no need to restrict the 
createApplicationAttemptReport() invocation based on the FINAL_SAVING state. We 
can check all the states, including FINAL_SAVING.

{code:xml}
+  if (!rmAppAttemptState.equals(RMAppAttemptState.FINAL_SAVING)) {
{code}
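
To illustrate the point about checking every state, here is a minimal, self-contained sketch. The enums InternalState and PublicState are hypothetical stand-ins for RMAppAttemptState and YarnApplicationAttemptState with only a few constants each, and the helper names are made up for illustration; this is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the RM-internal attempt state enum.
enum InternalState { NEW, RUNNING, FINAL_SAVING, FINISHED }

// Hypothetical stand-in for the public API enum; it has no FINAL_SAVING.
enum PublicState { NEW, RUNNING, FINISHED }

class AttemptStateCheck {

  // Name-based conversion, the pattern shown in the issue description.
  // Throws IllegalArgumentException when no constant has the same name.
  static PublicState convertByName(InternalState state) {
    return PublicState.valueOf(state.toString());
  }

  // Loops over every internal state with no per-state condition and
  // collects the ones whose conversion fails.
  static List<InternalState> unconvertibleStates() {
    List<InternalState> failed = new ArrayList<>();
    for (InternalState state : InternalState.values()) {
      try {
        convertByName(state);
      } catch (IllegalArgumentException e) {
        failed.add(state);
      }
    }
    return failed;
  }
}
```

Looping over values() without a condition means a newly added internal state with no public counterpart would show up in the result as well.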

> ResourceManager IllegalArgumentException error
> --
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: Bibin A Chundatt
> Attachments: 0002-YARN-4411.patch, YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 of RMAppAttemptImpl may throw an 
> IllegalArgumentException:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> This happens because this.getState() returns an RMAppAttemptState, which may 
> have no matching YarnApplicationAttemptState constant.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}




[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error

2016-01-27 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120871#comment-15120871
 ] 

Devaraj K commented on YARN-4411:
-

Thanks [~bibinchundatt] for the updated patch.

- I don't understand why we need this. Do you see any problem if we invoke 
attempt.createApplicationAttemptReport() when the state is other than 
RMAppAttemptState.FINAL_SAVING? I think we can create an 
ApplicationAttemptReport irrespective of the state.
{code:xml}
+  if (!rmAppAttemptState.equals(RMAppAttemptState.FINAL_SAVING)) {
{code}

- Can you tell me when the application attempt state would be null? If the 
check is not really needed, we can remove this assertion; if you decide to keep 
it, please add an assertion message.
{code:xml}
+  assertTrue(null != attemptreport.getYarnApplicationAttemptState());
{code}


[jira] [Commented] (YARN-4587) IllegalArgumentException in RMAppAttemptImpl#createApplicationAttemptReport

2016-01-26 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118793#comment-15118793
 ] 

Devaraj K commented on YARN-4587:
-

[~bibinchundatt], thanks for the quick response and the updated patch. I see 
you are uploading patches to both JIRAs. Please close one of them as a 
duplicate and continue with the other. Thanks.

> IllegalArgumentException in RMAppAttemptImpl#createApplicationAttemptReport
> ---
>
> Key: YARN-4587
> URL: https://issues.apache.org/jira/browse/YARN-4587
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4587.patch
>
>
> {noformat}
> it status: -102
> 2016-01-13 13:35:42,281 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1452672118921_0002_04 State change from RUNNING to FINAL_SAVING
> 2016-01-13 13:35:42,286 ERROR org.apache.hadoop.yarn.server.webapp.AppBlock: 
> Failed to read the attempts of the application application_1452672118921_0002.
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.FINAL_SAVING
> at java.lang.Enum.valueOf(Enum.java:238)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:2073)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttempts(ClientRMService.java:436)
> at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$2.run(AppBlock.java:230)
> at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$2.run(AppBlock.java:227)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
> at 
> org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:226)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:65)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54)
> at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
> {noformat}
> At {{RMAppAttemptImpl#createApplicationAttemptReport}}
> {noformat}
>attemptReport = ApplicationAttemptReport.newInstance(this
>   .getAppAttemptId(), this.getHost(), this.getRpcPort(), this
>   .getTrackingUrl(), this.getOriginalTrackingUrl(), 
> this.getDiagnostics(),
>   YarnApplicationAttemptState.valueOf(this.getState().toString()),
>   amId, this.startTime, this.finishTime);
> {noformat}
> {{YarnApplicationAttemptState}} has no constant matching the FINAL_SAVING 
> state of {{RMAppAttemptState}}.




[jira] [Commented] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature

2016-01-26 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118778#comment-15118778
 ] 

Devaraj K commented on YARN-4100:
-

Thanks [~Naganarasimha] for the patch, and sorry for the late response. The 
latest patch looks fine to me except for the points below.

- Can you rephrase the sentence below to something like "Administrators 
can configure the provider for the node labels by configuring this parameter in 
NM"?
{code:xml}
+in RM, Administrators can configure in NM the provider for the
 node labels by configuring this parameter.
{code}

- {{This would be helpfull}}: can you correct this to "helpful"?

- {{If user don’t specify “(exclusive=…)”, execlusive}}: please change 
"execlusive" to "exclusive".

- Can you remove the spaces between the package name and the class name in 
{{org.apache.hadoop.yarn.server.resourcemanager.nodelabels.   
RMNodeLabelsMappingProvider}}?

> Add Documentation for Distributed and Delegated-Centralized Node Labels 
> feature
> ---
>
> Key: YARN-4100
> URL: https://issues.apache.org/jira/browse/YARN-4100
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: NodeLabel.html, YARN-4100.v1.001.patch, 
> YARN-4100.v1.002.patch, YARN-4100.v1.003.patch, YARN-4100.v1.004.patch
>
>
> Add Documentation for Distributed Node Labels feature




[jira] [Commented] (YARN-4587) IllegalArgumentException in RMAppAttemptImpl#createApplicationAttemptReport

2016-01-25 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114998#comment-15114998
 ] 

Devaraj K commented on YARN-4587:
-

As per the conversation in YARN-4411, they both agreed that Bibin would provide 
a patch with a test. Providing the patch with a test either in YARN-4411 or in 
this JIRA would be OK with me.


[jira] [Commented] (YARN-4587) IllegalArgumentException in RMAppAttemptImpl#createApplicationAttemptReport

2016-01-25 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114911#comment-15114911
 ] 

Devaraj K commented on YARN-4587:
-

Thanks [~bibinchundatt] for the patch. The changes look good to me except for 
these points in the test.

1. I think we don't need to catch the Exception here just to make the test 
fail. Instead, we can drop the try/catch and let the Exception fail the test 
directly.

{code:xml}
} catch (Exception e) {
  Assert.fail("Exception not expected-->" + stateChecked);
}
{code}

2. Can we remove this condition here and test all the states without the if 
check?

{code:xml}
+if (rmAppAttemptState.equals(RMAppAttemptState.FINAL_SAVING)) {

{code}

3. I think there is some unnecessary code {+allocateApplicationAttempt();} 
and duplicate checking; you can remove these.
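
The first point can be sketched in plain Java without a test framework. Everything here is hypothetical and only for illustration: createReport stands in for the method under test, and the two check methods contrast the pattern from the patch with the suggested one.

```java
class PropagateDemo {

  // Hypothetical operation under test; fails for one particular input.
  static String createReport(String state) {
    if ("FINAL_SAVING".equals(state)) {
      throw new IllegalArgumentException("No enum constant " + state);
    }
    return "report:" + state;
  }

  // Pattern the review asks to avoid: the catch block replaces the
  // original exception, so its type, message and stack trace are lost.
  static void checkWithCatch(String state) {
    try {
      createReport(state);
    } catch (Exception e) {
      throw new AssertionError("Exception not expected-->" + state);
    }
  }

  // Suggested pattern: no try/catch, so an unexpected exception reaches
  // the test runner unchanged and fails the test with full detail.
  static void checkWithoutCatch(String state) {
    createReport(state);
  }
}
```

With checkWithoutCatch, a failure reports the original IllegalArgumentException and its stack trace, which is usually more useful than a generic failure message.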



[jira] [Assigned] (YARN-4480) Clean up some inappropriate imports

2015-12-20 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-4480:
---

Assignee: Kai Zheng

> Clean up some inappropriate imports
> ---
>
> Key: YARN-4480
> URL: https://issues.apache.org/jira/browse/YARN-4480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 2.8.0
>
> Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch
>
>
> It was noticed that there are some unnecessary dependencies on Directory classes.




[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-10 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3964:

Hadoop Flags: Reviewed

+1, committing it shortly.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.016.patch, 
> YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will give users more 
> flexibility.




[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-08 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949941#comment-14949941
 ] 

Devaraj K commented on YARN-3964:
-

Thanks [~dian.fu] for the updated patch. 

Latest patch looks good to me. I will commit it tomorrow if there are no 
further comments/objections.



[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-08 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948192#comment-14948192
 ] 

Devaraj K commented on YARN-3964:
-

Thanks [~leftnoteasy] for review and confirmation, [~Naganarasimha] and 
[~sunilg] for reviews. 

Thanks [~dian.fu] for the patch. It mostly looks good to me except for these 
minor comments.

1. Can you update the descriptions for the new configs added in yarn-default.xml?

{code:xml}
+The class to use as the node labels fetcher by ResourceManager. It should
+extend org.apache.hadoop.yarn.server.resourcemanager.nodelabels.
+RMNodeLabelsMappingProvider.
{code}

Can you update the description like below,
'When node labels "yarn.node-labels.configuration-type" is
of type "delegated-centralized", Administrators can configure 
the class for fetching node labels by ResourceManager. Configured
class needs to extend

org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider.'

{code:xml}
+The interval to use to update node labels by ResourceManager.
{code}

Can we phrase it like 'This interval is used to update the node labels 
by ResourceManager.'? Also, can we mention here that if the value is '-1', 
no timer task is created?

2. In TestRMDelegatedNodeLabelsUpdater.java, can we have an assertion in catch 
block to check the expected exception message?

   {code:xml}
} catch (Exception e) {
  // expected
}
   {code}

3. Can you file a Jira to update the documentation for this?
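
Point 2 above can be sketched as follows, in plain Java rather than the project's test framework. The configure method, its message text and the helper failsWithMessage are hypothetical and exist only to show the catch block verifying the exception message instead of being left empty.

```java
class ExpectedExceptionDemo {

  // Hypothetical operation that is expected to reject a bad interval.
  static void configure(long intervalMs) {
    if (intervalMs == 0) {
      throw new IllegalArgumentException("interval must not be zero");
    }
  }

  // Returns true only if the call fails with the expected message.
  static boolean failsWithMessage(long intervalMs, String expectedPart) {
    try {
      configure(intervalMs);
    } catch (IllegalArgumentException e) {
      // Verify the message instead of leaving the catch block empty.
      return e.getMessage() != null && e.getMessage().contains(expectedPart);
    }
    // No exception at all also counts as a failed expectation.
    return false;
  }
}
```

Checking the message guards against the test passing because of an unrelated exception thrown from the same block.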


> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will give users more 
> flexibility.




[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-21 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900290#comment-14900290
 ] 

Devaraj K commented on YARN-3964:
-

[~leftnoteasy], Sure, Thanks for your interest.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will give users more 
> flexibility.




[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-20 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900227#comment-14900227
 ] 

Devaraj K commented on YARN-3964:
-

Thanks [~dian.fu] for the patch.

The patch has gone stale. Can you please update it? Please also take 
care of the Jenkins warnings above in the updated patch.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will give users more 
> flexibility.




[jira] [Resolved] (YARN-842) Resource Manager & Node Manager UI's doesn't work with IE

2015-08-30 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K resolved YARN-842.

Resolution: Not A Problem

It is working fine in the latest version, so I am closing it now. Please 
reopen if you still see this issue. Thanks.

> Resource Manager & Node Manager UI's doesn't work with IE
> -
>
> Key: YARN-842
> URL: https://issues.apache.org/jira/browse/YARN-842
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.4-alpha
>Reporter: Devaraj K
>
> {code:xml}
> Webpage error details
> User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; 
> SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media 
> Center PC 6.0)
> Timestamp: Mon, 17 Jun 2013 12:06:03 UTC
> Message: 'JSON' is undefined
> Line: 41
> Char: 218
> Code: 0
> URI: http://10.18.40.24:8088/cluster/apps
> {code}
> The RM & NM UIs do not work with IE and show the above error for every 
> link on the UI.




[jira] [Commented] (YARN-3953) Nodemanager is shutting down while executing application

2015-07-22 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636482#comment-14636482
 ] 

Devaraj K commented on YARN-3953:
-

[~hemenglong], it could be due to a jar mismatch. Have you changed the jars in 
the installation by any chance?
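
One way to check for a jar mismatch is to print where a class was actually loaded from. The sketch below is a generic diagnostic using only standard JDK calls; the class name JarLocator and the marker string are made up for illustration, and it is not part of Hadoop.

```java
import java.security.CodeSource;

class JarLocator {

  // Returns where the given class was loaded from, or a marker string
  // when the JVM does not report a code source (e.g. core JDK classes).
  static String locationOf(Class<?> type) {
    CodeSource source = type.getProtectionDomain().getCodeSource();
    if (source == null || source.getLocation() == null) {
      return "(no code source reported)";
    }
    return source.getLocation().toString();
  }
}
```

On the affected node, passing the class named in the NoSuchMethodError (org.apache.hadoop.yarn.util.FSDownload in the trace below) would show which jar supplies it, which can then be compared with the NodeManager's own version.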

> Nodemanager is shutting down while executing application
> 
>
> Key: YARN-3953
> URL: https://issues.apache.org/jira/browse/YARN-3953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: hemenglong
>
> Container expired since it was unused
> cleanup failed for container container_1437442699625_0472_01_13 : 
> java.net.ConnectException: Call From hadoop2/192.168.16.2 to hadoop5:59546 
> failed on connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused 
> {code:xml}
> 2015-07-22 11:02:43,969 ERROR org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.yarn.util.FSDownload.createStatusCacheLoader(Lorg/apache/hadoop/conf/Configuration;)Lcom/google/common/cache/CacheLoader;
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handleInitContainerResources(ResourceLocalizationService.java:445)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:135)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3953) Nodemanager is shutting down while executing application

2015-07-22 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3953:

Fix Version/s: (was: 2.5.0)

> Nodemanager is shutting down while executing application
> 
>
> Key: YARN-3953
> URL: https://issues.apache.org/jira/browse/YARN-3953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: hemenglong
>
> Container expired since it was unused
> cleanup failed for container container_1437442699625_0472_01_13 : 
> java.net.ConnectException: Call From hadoop2/192.168.16.2 to hadoop5:59546 
> failed on connection exception: java.net.ConnectException: 拒绝连接; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused 
> {code:xml}
> 2015-07-22 11:02:43,969 ERROR org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.yarn.util.FSDownload.createStatusCacheLoader(Lorg/apache/hadoop/conf/Configuration;)Lcom/google/common/cache/CacheLoader;
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handleInitContainerResources(ResourceLocalizationService.java:445)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:135)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3953) Nodemanager is shutting down while executing application

2015-07-22 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3953:

Release Note:   (was: 2015-07-22 11:02:43,969 ERROR 
org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NoSuchMethodError: 
org.apache.hadoop.yarn.util.FSDownload.createStatusCacheLoader(Lorg/apache/hadoop/conf/Configuration;)Lcom/google/common/cache/CacheLoader;
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handleInitContainerResources(ResourceLocalizationService.java:445)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:398)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:135)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745))

> Nodemanager is shutting down while executing application
> 
>
> Key: YARN-3953
> URL: https://issues.apache.org/jira/browse/YARN-3953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: hemenglong
>
> Container expired since it was unused
> cleanup failed for container container_1437442699625_0472_01_13 : 
> java.net.ConnectException: Call From hadoop2/192.168.16.2 to hadoop5:59546 
> failed on connection exception: java.net.ConnectException: 拒绝连接; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3953) Nodemanager is shutting down while executing application

2015-07-22 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3953:

Description: 
Container expired since it was unused
cleanup failed for container container_1437442699625_0472_01_13 : 
java.net.ConnectException: Call From hadoop2/192.168.16.2 to hadoop5:59546 
failed on connection exception: java.net.ConnectException: 拒绝连接; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused 

{code:xml}
2015-07-22 11:02:43,969 ERROR org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NoSuchMethodError: 
org.apache.hadoop.yarn.util.FSDownload.createStatusCacheLoader(Lorg/apache/hadoop/conf/Configuration;)Lcom/google/common/cache/CacheLoader;
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handleInitContainerResources(ResourceLocalizationService.java:445)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:398)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:135)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
{code}


  was:
Container expired since it was unused
cleanup failed for container container_1437442699625_0472_01_13 : 
java.net.ConnectException: Call From hadoop2/192.168.16.2 to hadoop5:59546 
failed on connection exception: java.net.ConnectException: 拒绝连接; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused 



> Nodemanager is shutting down while executing application
> 
>
> Key: YARN-3953
> URL: https://issues.apache.org/jira/browse/YARN-3953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: hemenglong
>
> Container expired since it was unused
> cleanup failed for container container_1437442699625_0472_01_13 : 
> java.net.ConnectException: Call From hadoop2/192.168.16.2 to hadoop5:59546 
> failed on connection exception: java.net.ConnectException: 拒绝连接; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused 
> {code:xml}
> 2015-07-22 11:02:43,969 ERROR org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.yarn.util.FSDownload.createStatusCacheLoader(Lorg/apache/hadoop/conf/Configuration;)Lcom/google/common/cache/CacheLoader;
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handleInitContainerResources(ResourceLocalizationService.java:445)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:398)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:135)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-07-22 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636421#comment-14636421
 ] 

Devaraj K commented on YARN-3212:
-

[~djp], can you update the patch for this JIRA now that YARN-3445 has been 
committed, so that we can see this feature working?

> RMNode State Transition Update with DECOMMISSIONING state
> -
>
> Key: YARN-3212
> URL: https://issues.apache.org/jira/browse/YARN-3212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.patch
>
>
> As proposed in YARN-914, a new “DECOMMISSIONING” state will be added. A node 
> can transition to it from the “running” state, triggered by a new 
> “decommissioning” event. 
> This new state can transition to the “decommissioned” state on 
> Resource_Update if there are no running apps on this NM or the NM reconnects 
> after a restart, or when it receives a DECOMMISSIONED event (after a timeout 
> from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the 
> previous decommission by calling recommission on the same node. The reaction 
> to other events is similar to that of the RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-20 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633469#comment-14633469
 ] 

Devaraj K commented on YARN-3896:
-

Thanks [~hex108] for the updated patch. 

Here are some comments about the test.
# Can we have a separate new test for this case instead of adding it to an 
existing test?
# Can you avoid mentioning the JIRA ID in the comment?
   {code:xml}+// Simulate scenario from YARN-3896:{code}
# There are multiple sleep statements with hard-coded values in the newly added 
test code. Can you avoid these sleeps with hard-coded timeouts?
# Also, if I run the test without the source changes, it fails with the 
message "node shouldn't be null". Can we check for the REBOOTED state 
here?
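
One way to address the third comment above is to poll for the expected condition with an upper bound, rather than sleeping for a fixed time. The following is a minimal sketch in plain Java. PollingWait and waitFor are hypothetical names used only for illustration and are not taken from the patch; Hadoop's own test utilities may already offer a comparable helper, which would be preferable in the actual test.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of the YARN-3896 patch.
final class PollingWait {
  private PollingWait() {
  }

  /**
   * Re-evaluates the condition every checkEveryMillis until it holds.
   * Throws IllegalStateException once waitForMillis has elapsed.
   */
  static void waitFor(BooleanSupplier condition, long checkEveryMillis,
      long waitForMillis) {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException(
            "Condition not met within " + waitForMillis + " ms");
      }
      try {
        Thread.sleep(checkEveryMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop waiting.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }
  }
}
```

A test would then wait for something like "the node has reached the expected state" with a short poll interval and a generous upper bound, so it finishes as soon as the condition holds and fails with a clear message otherwise.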

> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset
> ---
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3896.01.patch, YARN-3896.02.patch, 
> YARN-3896.03.patch, YARN-3896.04.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node (10.208.132.153) reconnected with the RM. When it registered, the RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat arrived before the RM had succeeded in setting the id to 0.
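
The race described above can be illustrated with a simplified model of the response id comparison. This is only a sketch under assumptions: the class, the enum, and the exact comparison rules are hypothetical, inferred from the "Too far behind" log line, and should be checked against ResourceTrackerService itself.

```java
// Simplified, hypothetical model of the response id check; this is not the
// actual ResourceTrackerService code.
final class ResponseIdCheckSketch {
  enum Action { PROCESS, RESEND_LAST_RESPONSE, REBOOT }

  private ResponseIdCheckSketch() {
  }

  static Action check(int nmResponseId, int rmLastResponseId) {
    if (nmResponseId + 1 == rmLastResponseId) {
      // The NM repeated its previous heartbeat.
      return Action.RESEND_LAST_RESPONSE;
    }
    if (nmResponseId + 1 < rmLastResponseId) {
      // The NM is too far behind the id the RM remembers.
      return Action.REBOOT;
    }
    return Action.PROCESS;
  }
}
```

With the values from the log (NM id 0, stale RM id 2506413), check returns REBOOT. Once the RM-side id has been reset to 0, the same heartbeat would be processed normally, which is why the ordering of the reset and the first heartbeat matters.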



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode

2015-07-10 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621947#comment-14621947
 ] 

Devaraj K commented on YARN-3857:
-

Thanks [~mujunchao] for the updated patch with the test. 

Please also address these comments along with the fixes for [~zxu]'s comments.

1. I don't think adding this new method is required. Can we just use 
ClientToAMTokenSecretManagerInRM#getMasterKey() to check whether the master key 
is present?
{code:xml}
+  
+  @VisibleForTesting
+  public synchronized boolean hasMasterKey(
+ ApplicationAttemptId applicationAttemptID) {
+   return this.masterKeys.containsKey(applicationAttemptID);
+  }
{code}

2. I see there are some formatting issues in the patch w.r.t. braces and 
indentation with spaces. Please go through the 'Making Changes' section in 
https://wiki.apache.org/hadoop/HowToContribute and configure your IDE 
accordingly. It is a one-time job, and you won't have to worry about it when 
creating patches in the future.

{code:xml}
+if(isSecurityEnabled)
+{
{code}
{code:xml}
+}
+else
+{
{code}

3. Remove the unused imports in RMAppAttemptImpl.java.
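
The first comment above can be illustrated with a minimal sketch. SecretManagerSketch is a hypothetical stand-in for ClientToAMTokenSecretManagerInRM, with String ids and byte[] keys in place of the real types. It assumes that getMasterKey() returns null when no key is registered for an attempt, which should be confirmed against the actual class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the secret manager; names and types are
// simplified for illustration.
final class SecretManagerSketch {
  private final Map<String, byte[]> masterKeys = new HashMap<>();

  synchronized void registerApplication(String attemptId, byte[] key) {
    masterKeys.put(attemptId, key);
  }

  synchronized void unRegisterApplication(String attemptId) {
    masterKeys.remove(attemptId);
  }

  // Assumption: returns null when no key is registered for the attempt.
  synchronized byte[] getMasterKey(String attemptId) {
    return masterKeys.get(attemptId);
  }
}
```

Under that assumption, a test can assert that getMasterKey() returns null after the attempt is unregistered, so no extra hasMasterKey() method is needed just for testing.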

> Memory leak in ResourceManager with SIMPLE mode
> ---
>
> Key: YARN-3857
> URL: https://issues.apache.org/jira/browse/YARN-3857
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: mujunchao
>Assignee: mujunchao
>Priority: Critical
> Attachments: YARN-3857-1.patch, YARN-3857-2.patch, 
> hadoop-yarn-server-resourcemanager.patch
>
>
> We register the ClientTokenMasterKey so that a client does not hold an invalid 
> ClientToken after the RM restarts. In SIMPLE mode, we register 
> Pair , but we never remove it from the HashMap, as 
> unregister only runs in Security mode, so memory leaks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3409) Add constraint node labels

2015-07-09 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621811#comment-14621811
 ] 

Devaraj K commented on YARN-3409:
-

[~leftnoteasy], thanks for the details.
Are you going to include, as part of this JIRA, the scenario of having a service 
API to retrieve the labels in the Resource Manager, as discussed in YARN-3557?

Can we have a separate JIRA to discuss/handle the centralized configuration 
using a service API to retrieve the labels for nodes in the Resource Manager?

> Add constraint node labels
> --
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> Specifying only one label for each node (IAW, partitioning a cluster) is a way 
> to determine how the resources of a particular set of nodes can be shared by a 
> group of entities (like teams, departments, etc.). Partitions of a cluster 
> have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACL/priority can apply to a partition (only market team / market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (Market team has 40% 
> minimum capacity and Dev team has 60% of minimum capacity of the partition).
> Constraints are orthogonal to partitions; they describe attributes of a 
> node’s hardware/software just for affinity. Some examples of constraints:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (windows, linux, etc.)
> With this, an application can ask for resources that have (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3813) Support Application timeout feature in YARN.

2015-07-08 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619755#comment-14619755
 ] 

Devaraj K commented on YARN-3813:
-

Thanks [~nijel] and [~rohithsharma] for the design proposal.

{quote}
New auxiliary service: RMAppTimeOutService
Its responsibility is to track the running application. Simple logic:

//if the job is running and the time has elapsed, kill it
if ((RMAppState == SUBMITTED/ACCEPTED/RUNNING)
&& (currentTime - app.getSubmitTime()) >= timeout)
{quote}

How frequently are you going to check this condition for each application?

Can we have a monitor, something like RMAppTimeOutMonitor, which extends 
AbstractLivelinessMonitor? When the application gets submitted to the RM, we 
can register it with RMAppTimeOutMonitor using the user-specified timeout. 
When the timeout is reached, RMAppTimeOutMonitor can trigger an event to take 
further action.
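
The registration idea above can be sketched roughly as follows. This is a self-contained illustration in plain Java, not the proposed implementation: AppTimeoutMonitorSketch and its methods are hypothetical, it does not extend AbstractLivelinessMonitor, and the clock is passed in explicitly so that the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-application timeout registry.
final class AppTimeoutMonitorSketch {
  // Application id -> absolute deadline in milliseconds.
  private final Map<String, Long> deadlines = new HashMap<>();

  synchronized void register(String appId, long submitTimeMillis,
      long timeoutMillis) {
    deadlines.put(appId, submitTimeMillis + timeoutMillis);
  }

  synchronized void unregister(String appId) {
    deadlines.remove(appId);
  }

  /** Returns and forgets the applications whose deadline has passed. */
  synchronized List<String> expire(long nowMillis) {
    List<String> expired = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = deadlines.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMillis >= entry.getValue()) {
        expired.add(entry.getKey());
        it.remove();
      }
    }
    return expired;
  }
}
```

In a real monitor, a background thread would call something like expire() periodically and raise a timeout event for each returned application, while applications that finish on their own would be unregistered.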

bq. Yes, having a separate TIMEOUT event and TIMEOUT state is good approach and 
other option. Initially we consider to have new state TIMEOUT which require 
very huge changes across all the modules.
I feel having a TIMEOUT state for RMAppImpl would be proper here. When 
RMAppTimeOutMonitor triggers an event on timeout for an application, RMAppImpl 
can move to the TIMEOUT state from any of the non-final states, and during 
the transition it can handle stopping the running attempt and its containers. I 
don't see that achieving this would require that many changes.
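
The transition rule suggested above can be sketched as follows. The state set here is a reduced, hypothetical one chosen for illustration (the real RMAppState has more states), and AppStateSketch is not an existing class.

```java
import java.util.EnumSet;

// Hypothetical, reduced model of the application states.
final class AppStateSketch {
  enum State { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED, TIMEOUT }

  private static final EnumSet<State> FINAL_STATES =
      EnumSet.of(State.FINISHED, State.FAILED, State.KILLED, State.TIMEOUT);

  private AppStateSketch() {
  }

  /** A timeout event moves any non-final state to TIMEOUT. */
  static State onTimeout(State current) {
    // Final states ignore the event. A real transition would also stop the
    // running attempt and its containers before moving to TIMEOUT.
    return FINAL_STATES.contains(current) ? current : State.TIMEOUT;
  }
}
```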


> Support Application timeout feature in YARN. 
> -
>
> Key: YARN-3813
> URL: https://issues.apache.org/jira/browse/YARN-3813
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: nijel
> Attachments: YARN Application Timeout .pdf
>
>
> It would be useful to support an application timeout in YARN. Some use cases 
> do not care about the output of an application if the application has not 
> completed within a specific time. 
> *Background:*
> The requirement is to show the CDR statistics of the last few minutes, say 
> every 5 minutes. The same job will run continuously with a different dataset,
> so one job will be started every 5 minutes. The estimated time for this 
> task is 2 minutes or less. 
> If the application does not complete in the given time, the output is not 
> useful.
> *Proposal*
> The idea is to support an application timeout, where a timeout parameter is 
> given when submitting the job. 
> Here, the user expects the application to finish (complete or be killed) 
> within the given time.
> One option is to move this logic to the application client (which submits the 
> job), 
> but it would be nice if this could be generic logic, which would be more robust.
> Kindly provide your suggestions/opinions on this feature. If it sounds good, I 
> will update the design doc and a prototype patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
