[jira] [Assigned] (YARN-10487) Support getQueueUserAcls, listReservations, getApplicationAttempts, getContainerReport, getContainers, getResourceTypeInfo API's for Federation

2020-11-19 Thread D M Murali Krishna Reddy (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

D M Murali Krishna Reddy reassigned YARN-10487:
---

Assignee: D M Murali Krishna Reddy

> Support getQueueUserAcls, listReservations, getApplicationAttempts, 
> getContainerReport, getContainers, getResourceTypeInfo API's for Federation
> ---
>
> Key: YARN-10487
> URL: https://issues.apache.org/jira/browse/YARN-10487
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
> Attachments: YARN-10487.001.patch
>
>
> Support getQueueUserAcls, listReservations, getApplicationAttempts, 
> getContainerReport, getContainers, getResourceTypeInfo API's for Federation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9883) Reshape SchedulerHealth class

2020-11-19 Thread D M Murali Krishna Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235959#comment-17235959
 ] 

D M Murali Krishna Reddy commented on YARN-9883:


[~BilwaST] I have uploaded the patch. Could you please review it?

> Reshape SchedulerHealth class
> -
>
> Key: YARN-9883
> URL: https://issues.apache.org/jira/browse/YARN-9883
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: D M Murali Krishna Reddy
>Priority: Minor
> Attachments: YARN-9883.001.patch
>
>
> The {{SchedulerHealth}} class has some flaws, for example:
> - It has no javadoc at all
> - All its objects are package-private: they should be private
> - The internal maps should be (Concurrent) EnumMaps instead of HashMaps: they 
> are more efficient in storing Enums
> - schedulerHealthDetails only stores the last operation, its name should 
> reflect that (just like lastSchedulerRunDetails)
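
For illustration only (not part of the attached patch), here is a minimal sketch 
of what the EnumMap suggestion could look like; the enum and field names are 
hypothetical, and since the JDK has no concurrent EnumMap, a synchronized wrapper 
is shown as one possible stand-in:

{code:java}
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

public class SchedulerHealthSketch {
  // Hypothetical operation type; the real class tracks similar scheduler operations.
  enum Operation { ALLOCATION, RELEASE, PREEMPTION, RESERVATION }

  // An EnumMap stores values in an array indexed by the enum ordinal, so lookups
  // avoid hashing and iteration follows the declaration order of the enum.
  private final Map<Operation, Long> lastOperationTimes =
      Collections.synchronizedMap(new EnumMap<>(Operation.class));

  void recordOperation(Operation op, long timestampMs) {
    lastOperationTimes.put(op, timestampMs);
  }

  Long lastTime(Operation op) {
    return lastOperationTimes.get(op);
  }
}
{code}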



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9883) Reshape SchedulerHealth class

2020-11-19 Thread D M Murali Krishna Reddy (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

D M Murali Krishna Reddy updated YARN-9883:
---
Attachment: YARN-9883.001.patch

> Reshape SchedulerHealth class
> -
>
> Key: YARN-9883
> URL: https://issues.apache.org/jira/browse/YARN-9883
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: D M Murali Krishna Reddy
>Priority: Minor
> Attachments: YARN-9883.001.patch
>
>
> The {{SchedulerHealth}} class has some flaws, for example:
> - It has no javadoc at all
> - All its objects are package-private: they should be private
> - The internal maps should be (Concurrent) EnumMaps instead of HashMaps: they 
> are more efficient in storing Enums
> - schedulerHealthDetails only stores the last operation, its name should 
> reflect that (just like lastSchedulerRunDetails)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235931#comment-17235931
 ] 

angerszhu edited comment on YARN-10495 at 11/20/20, 6:49 AM:
-

[~ebadger]

Double checked, this can solve our problem. We add an rpath entry pointing to 
$HADOOP_HOME/lib/native.


was (Author: angerszhuuu):
[~ebadger]

Double check, this can solve our problem.

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machines don't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library path, 
> /usr/lib/x86_64-linux-gnu,
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  We build container-executor on the jenkins machine, so the libcrypto.so 
> versions do not match and the following error occurs when we start the 
> nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make RPATH of container-executor configurable to solve this problem 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235931#comment-17235931
 ] 

angerszhu edited comment on YARN-10495 at 11/20/20, 6:45 AM:
-

[~ebadger]

Double check, this can solve our problem.


was (Author: angerszhuuu):
[~ebadger]

Double check, this can solve our prblem.

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machines don't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library path, 
> /usr/lib/x86_64-linux-gnu,
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  We build container-executor on the jenkins machine, so the libcrypto.so 
> versions do not match and the following error occurs when we start the 
> nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make RPATH of container-executor configurable to solve this problem 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235931#comment-17235931
 ] 

angerszhu commented on YARN-10495:
--

[~ebadger]

Double check, this can solve our problem.

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machines don't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library path, 
> /usr/lib/x86_64-linux-gnu,
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  We build container-executor on the jenkins machine, so the libcrypto.so 
> versions do not match and the following error occurs when we start the 
> nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make RPATH of container-executor configurable to solve this problem 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235890#comment-17235890
 ] 

angerszhu commented on YARN-10495:
--

[~ebadger]

I have tested this patch in our env, and I will confirm again whether there are 
other problems.

 

Also, I am not familiar with the Hadoop QA results; it seems all unit tests passed?

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machines don't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library path, 
> /usr/lib/x86_64-linux-gnu,
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  We build container-executor on the jenkins machine, so the libcrypto.so 
> versions do not match and the following error occurs when we start the 
> nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make RPATH of container-executor configurable to solve this problem 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-11-19 Thread Drew Merrill (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235876#comment-17235876
 ] 

Drew Merrill commented on YARN-10427:
-

_*Anyone?*_ *I'd really appreciate a response from someone on this.* _*A 
developer? A fellow user? A computer?*_

Have I not included enough info or the right info needed to investigate this? 
If so, please let me know!

*At the very least, can someone else please _confirm_ _that the issue with 
duplicate Job IDs is reproducible?_ It's frustrating and stressful not knowing 
if the problem is due to something that _I'm doing wrong_ or if it's a bug in 
Hadoop.*

*There's either a teachable moment here where I can learn what I'm doing wrong, 
or else an opportunity to identify and fix a bug in Hadoop. Both are good 
outcomes!*

> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
>  
>Reporter: Drew Merrill
>Priority: Major
> Attachments: fair-scheduler.xml, inputsls.json, jobruntime.csv, 
> mapred-site.xml, sls-runner.xml, yarn-site.xml
>
>
> Hello, I'm hoping someone can help me resolve or understand some issues I've 
> been having with the YARN Scheduler Load Simulator (SLS). I've been 
> experimenting with SLS for several months now at work as we're trying to 
> build a simulation model to characterize our enterprise Hadoop infrastructure 
> for purposes of future capacity planning. In the process of attempting to 
> verify and validate the SLS output, I've encountered a number of issues 
> including runtime exceptions and bad output. The focus of this issue is the 
> bad output. In all my simulation runs, the jobruntime.csv output seems to 
> have one or more of the following problems: no output, duplicate job ids, 
> and/or missing job ids.
>  
> Because of where I work, I'm unable to provide the exact inputs I typically 
> use, but I'm able to reproduce the problem of the duplicate Job IDs using 
> some simplified inputs and configuration files, which I've attached, along 
> with the output I obtained.
>  
> The command I used to run the simulation:
> {{./runsls.sh --tracetype=SLS --tracelocation=./inputsls.json 
> --output-dir=sls-run-1 --print-simulation 
> --track-jobs=job_1,job_2,job_3,job_4,job_5,job_6,job_7,job_8,job_9,job_10}}
>  
> Can anyone help me understand what would cause the duplicate Job IDs in the 
> output? Is this a bug in Hadoop or a problem with my inputs? Thanks in 
> advance.
>  
> PS: This is my first issue I've ever opened so please be kind if I've missed 
> something or am not understanding something obvious about the way Hadoop 
> works. I'll gladly follow-up with more info as requested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10492) deadlock in rm

2020-11-19 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235833#comment-17235833
 ] 

Wangda Tan commented on YARN-10492:
---

That will be helpful, thanks Jufeng!

> deadlock in rm 
> ---
>
> Key: YARN-10492
> URL: https://issues.apache.org/jira/browse/YARN-10492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1
>Reporter: brick yang
>Priority: Critical
>  Labels: 3.1.1
>
> version: HDP-3.1.5.0-152 (Hadoop 3.1)
> capacity scheduler
> YARN sometimes does not change to active.
> We found that the jstack dump shows a deadlock:
> "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 
> tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry 
> [0x7fee09633000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323)
>  - waiting to lock <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>  
>  
>  
>  
>  
>  
>  
> "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 
> tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0003c0d0d6c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>  - locked <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at 

[jira] [Commented] (YARN-10492) deadlock in rm

2020-11-19 Thread jufeng li (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235828#comment-17235828
 ] 

jufeng li commented on YARN-10492:
--

We are using the same version (HDP-3.1.5.0-152, Hadoop 3.1) and we hit the same 
issue. I have solved it; do you want the patch?

> deadlock in rm 
> ---
>
> Key: YARN-10492
> URL: https://issues.apache.org/jira/browse/YARN-10492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1
>Reporter: brick yang
>Priority: Critical
>  Labels: 3.1.1
>
> version: HDP-3.1.5.0-152 (Hadoop 3.1)
> capacity scheduler
> YARN sometimes does not change to active.
> We found that the jstack dump shows a deadlock:
> "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 
> tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry 
> [0x7fee09633000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323)
>  - waiting to lock <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>  
>  
>  
>  
>  
>  
>  
> "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 
> tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0003c0d0d6c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>  - locked <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at 

[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2020-11-19 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235683#comment-17235683
 ] 

Wangda Tan commented on YARN-10496:
---

Worked with [~bteke] on a design doc; see the linked doc. I would like to see 
more comments from the community.

cc: [~epayne], [~jhung], [~tangzhankun], [~bilwa_st]

> [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
> -
>
> Key: YARN-10496
> URL: https://issues.apache.org/jira/browse/YARN-10496
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Priority: Major
>
> CapacityScheduler today doesn’t support auto queue creation that is flexible 
> enough. The current constraints: 
>  * Only leaf queues can be auto-created
>  * A parent can only have either static queues or dynamic ones. This causes 
> multiple constraints. For example:
>  * It isn’t possible to have a VIP user like Alice with a static queue 
> root.user.alice with 50% capacity while the other user queues (under 
> root.user) are created dynamically and they share the remaining 50% of 
> resources.
>  
>  * In comparison, FairScheduler allows the following scenarios, Capacity 
> Scheduler doesn’t:
>  ** This implies that there is no possibility to have both dynamically 
> created and static queues at the same time under root
>  * A new queue needs to be created under an existing parent, while the parent 
> already has static queues
>  * Nested queue mapping policy, like in the following example, where two 
> levels of queues may need to be created: if an application belongs to user 
> _alice_ (who has the primary_group of _engineering_), the scheduler checks 
> whether _root.engineering_ exists; if it doesn't, it'll be created. Then the 
> scheduler checks whether _root.engineering.alice_ exists, and creates it if it 
> doesn't.
>  
> When we try to move users from FairScheduler to CapacityScheduler, we face 
> feature gaps which block users from migrating from FS to CS.
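
As a rough illustration of the nested auto-creation flow described above, a 
simplified sketch with hypothetical names (this is not the CapacityScheduler 
API, just the two-level existence check in isolation):

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NestedQueueCreationSketch {
  // Hypothetical in-memory view of the existing queue paths.
  private final Set<String> queues =
      new HashSet<>(Arrays.asList("root", "root.user"));

  /** Ensures root.<primaryGroup>.<user> exists, creating any missing level. */
  String ensureUserQueue(String user, String primaryGroup) {
    String parent = "root." + primaryGroup;
    if (!queues.contains(parent)) {
      queues.add(parent);          // e.g. auto-create root.engineering
    }
    String leaf = parent + "." + user;
    if (!queues.contains(leaf)) {
      queues.add(leaf);            // e.g. auto-create root.engineering.alice
    }
    return leaf;
  }

  public static void main(String[] args) {
    NestedQueueCreationSketch sketch = new NestedQueueCreationSketch();
    System.out.println(sketch.ensureUserQueue("alice", "engineering"));
    // prints: root.engineering.alice
  }
}
{code}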



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235658#comment-17235658
 ] 

Hadoop QA commented on YARN-10495:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
9s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} codespell {color} | {color:blue}  0m  
2s{color} |  | {color:blue} codespell was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} |  | {color:red} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  8m 
20s{color} | 
[/branch-mvninstall-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-mvninstall-root.txt]
 | {color:red} root in trunk failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
10s{color} | 
[/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt]
 | {color:red} hadoop-yarn-server-nodemanager in trunk failed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
20s{color} | 
[/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt]
 | {color:red} hadoop-yarn-server-nodemanager in trunk failed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} |  | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 10m 
32s{color} | 
[/branch-shadedclient.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-shadedclient.txt]
 | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
20s{color} | 
[/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt]
 | {color:red} hadoop-yarn-server-nodemanager in trunk failed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
25s{color} | 
[/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/318/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt]
 | {color:red} hadoop-yarn-server-nodemanager in trunk failed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} |  | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
23s{color} | 

[jira] [Created] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2020-11-19 Thread Wangda Tan (Jira)
Wangda Tan created YARN-10496:
-

 Summary: [Umbrella] Support Flexible Auto Queue Creation in 
Capacity Scheduler
 Key: YARN-10496
 URL: https://issues.apache.org/jira/browse/YARN-10496
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacity scheduler
Reporter: Wangda Tan


CapacityScheduler today doesn’t support auto queue creation that is flexible 
enough. The current constraints: 
 * Only leaf queues can be auto-created
 * A parent can only have either static queues or dynamic ones. This causes 
multiple constraints. For example:

 * It isn’t possible to have a VIP user like Alice with a static queue 
root.user.alice with 50% capacity while the other user queues (under root.user) 
are created dynamically and they share the remaining 50% of resources.

 
 * In comparison, FairScheduler allows the following scenarios, Capacity 
Scheduler doesn’t:
 ** This implies that there is no possibility to have both dynamically created 
and static queues at the same time under root
 * A new queue needs to be created under an existing parent, while the parent 
already has static queues
 * Nested queue mapping policy, like in the following example, where two levels 
of queues may need to be created: if an application belongs to user _alice_ (who 
has the primary_group of _engineering_), the scheduler checks whether 
_root.engineering_ exists; if it doesn't, it'll be created. Then the scheduler 
checks whether _root.engineering.alice_ exists, and creates it if it doesn't.

 

When we try to move users from FairScheduler to CapacityScheduler, we face 
feature gaps which block users from migrating from FS to CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235627#comment-17235627
 ] 

Eric Badger commented on YARN-10495:


Also, I've added you as a contributor in Hadoop Common, HDFS, Map/Reduce, and 
YARN. So you will now be able to assign JIRAs to yourself (as I've already done 
for you on this JIRA).

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machines don't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library path, 
> /usr/lib/x86_64-linux-gnu,
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  We build container-executor on the jenkins machine, so the libcrypto.so 
> versions do not match and the following error occurs when we start the 
> nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make RPATH of container-executor configurable to solve this problem 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reassigned YARN-10495:
--

Assignee: angerszhu

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machines don't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library path, 
> /usr/lib/x86_64-linux-gnu,
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  We build container-executor on the jenkins machine, so the libcrypto.so 
> versions do not match and the following error occurs when we start the 
> nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make RPATH of container-executor configurable to solve this problem 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235625#comment-17235625
 ] 

Eric Badger commented on YARN-10495:


[~angerszhuuu], I imagine the {{-Dbundle.openssl}} adds the libcrypto.so 
library to {{../lib/native}} of the build that is created? I don't have 
experience with this flag. Also, have you tested this out in your environment?

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machines don't have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library path, 
> /usr/lib/x86_64-linux-gnu,
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
> libcrypto is):
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  We build container-executor on the jenkins machine, so the libcrypto.so 
> versions do not match and the following error occurs when we start the 
> nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make RPATH of container-executor configurable to solve this problem 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10482) Capacity Scheduler seems locked,RM cannot submit any new job,and change active RM manually return to normal

2020-11-19 Thread Wanqiang Ji (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235542#comment-17235542
 ] 

Wanqiang Ji commented on YARN-10482:


I discussed this with [~Jufeng] offline many days ago, and it seems to be caused 
by a JUC bug which has been fixed in JDK 9: 
[https://bugs.openjdk.java.net/browse/JDK-8134855] 

Maybe YARN-10492 encountered the same problem. cc: [~wangda], [~snemeth]

> Capacity Scheduler seems locked,RM cannot submit any new job,and change 
> active RM  manually return to normal
> 
>
> Key: YARN-10482
> URL: https://issues.apache.org/jira/browse/YARN-10482
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager, 
> RM
>Affects Versions: 3.1.1
>Reporter: jufeng li
>Priority: Blocker
> Attachments: RM_normal_state.stack, RM_unnormal_state.stack
>
>
> Capacity Scheduler seems locked: the RM cannot submit any new job, and switching 
> the active RM manually returns it to normal. It is a serious bug! I checked the 
> stack log and found some info about *ReentrantReadWriteLock*. Can anyone solve 
> this issue? I uploaded the stacks for when the RM is in its normal and abnormal 
> states. The RM hangs forever until I restart it or change the active RM manually!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8737) Race condition in ParentQueue when reinitializing and sorting child queues in the meanwhile

2020-11-19 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235386#comment-17235386
 ] 

Benjamin Teke commented on YARN-8737:
-

The test issue seems to be unrelated, so +1 (non-binding) on my part.

> Race condition in ParentQueue when reinitializing and sorting child queues in 
> the meanwhile
> ---
>
> Key: YARN-8737
> URL: https://issues.apache.org/jira/browse/YARN-8737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0, 2.9.3, 3.2.2, 3.1.4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8737.001.patch
>
>
> An administrator raised an update for queues through the REST API, so in the RM the 
> parent queue is refreshing its child queues by calling ParentQueue#reinitialize; 
> meanwhile, async-schedule threads are sorting the child queues when calling 
> ParentQueue#sortAndGetChildrenAllocationIterator. A race condition may happen 
> and throw the following exception, because TimSort does not handle concurrent 
> modification of the objects it is sorting:
> {noformat}
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>         at java.util.TimSort.mergeHi(TimSort.java:899)
>         at java.util.TimSort.mergeAt(TimSort.java:516)
>         at java.util.TimSort.mergeCollapse(TimSort.java:441)
>         at java.util.TimSort.sort(TimSort.java:245)
>         at java.util.Arrays.sort(Arrays.java:1512)
>         at java.util.ArrayList.sort(ArrayList.java:1454)
>         at java.util.Collections.sort(Collections.java:175)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:291)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:804)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:817)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:636)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:2494)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:2431)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersOnMultiNodes(CapacityScheduler.java:2588)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:2676)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.scheduleBasedOnNodeLabels(CapacityScheduler.java:927)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:962)
> {noformat}
> I think we can add a read lock in 
> ParentQueue#sortAndGetChildrenAllocationIterator to solve this problem; the 
> write lock will be held when updating the child queues in 
> ParentQueue#reinitialize.
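For readers skimming the thread, here is a minimal, hypothetical sketch of the read/write locking pattern proposed above. The class, field, and element types are simplified stand-ins, and this is not the actual ParentQueue/CapacityScheduler code:

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for ParentQueue: child queues are modeled as strings and
// the priority/utilization comparator is replaced by natural ordering.
public class QueueSortingSketch {

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> childQueues = new ArrayList<>();

  // Called by async-schedule threads. Holding the read lock means the child
  // queue list cannot be mutated by reinitialize() while TimSort runs, which
  // avoids the "Comparison method violates its general contract!" failure.
  public Iterator<String> sortAndGetChildrenAllocationIterator() {
    lock.readLock().lock();
    try {
      List<String> snapshot = new ArrayList<>(childQueues);
      snapshot.sort(String::compareTo);
      return snapshot.iterator();
    } finally {
      lock.readLock().unlock();
    }
  }

  // Called on a queue refresh (e.g. an admin update through the REST API).
  // The write lock excludes concurrent sorting.
  public void reinitialize(List<String> newChildQueues) {
    lock.writeLock().lock();
    try {
      childQueues.clear();
      childQueues.addAll(newChildQueues);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}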



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2020-11-19 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated YARN-3585:
-
Comment: was deleted

(was: LevelDB will have problems using the logger)

> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.1, 3.0.0-alpha1
>
> Attachments: 0001-YARN-3585.patch, YARN-3585.patch
>
>
> With NM recovery enabled, after decommission the nodemanager log shows that it has 
> stopped, but the process cannot end. 
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and the JNI leveldb thread stack:
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}
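As an aside for anyone debugging a similar hang, a small sketch like the one below (hypothetical class name, standard JDK APIs only) lists the live non-daemon threads that keep a JVM from exiting; the lingering leveldb JNI thread above would show up in exactly this kind of listing:

{code:java}
import java.util.Map;

public class NonDaemonThreadDump {

  // Print every live non-daemon thread with its stack. Apart from the thread
  // running this code, any thread listed here can keep the JVM alive after
  // all services have been stopped.
  public static void main(String[] args) {
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      Thread thread = entry.getKey();
      if (!thread.isDaemon()) {
        System.out.println("Non-daemon thread: " + thread.getName()
            + " (state=" + thread.getState() + ")");
        for (StackTraceElement frame : entry.getValue()) {
          System.out.println("    at " + frame);
        }
      }
    }
  }
}
{code}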



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2020-11-19 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235357#comment-17235357
 ] 

dzcxzl commented on YARN-3585:
--

LevelDB will have problems using the logger.

> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.1, 3.0.0-alpha1
>
> Attachments: 0001-YARN-3585.patch, YARN-3585.patch
>
>
> With NM recovery enabled, after decommission the nodemanager log shows that it has 
> stopped, but the process cannot end. 
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and the JNI leveldb thread stack:
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated YARN-10495:
-
Description: 
In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto 
to container-executor. We hit a case where our jenkins machine has 
libcrypto.so.1.0.0 in its shared library environment, but our nodemanager machine 
does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.

We use an internal custom dynamic link library path, 
/usr/lib/x86_64-linux-gnu,

and we build hadoop with the parameters below:
{code:java}
 -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
{code}
 

Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
-rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
-rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
-rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
{code}
 

Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
-rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
-rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
-rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r--  1 root root  298 Feb  7  2019 libc.so
{code}
We build container-executor with these flags.

The libcrypto.so versions are not the same, which causes an error when we start the 
nodemanager:

 
{code:java}
.. 3 more Caused by: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
object file: No such file or directory at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
 ... 4 more Caused by: ExitCodeException exitCode=127: 
/home/hadoop/hadoop/bin/container-executor: error while loading shared 
libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or 
directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
org.apache.hadoop.util.Shell.run(Shell.java:901) at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
 ... 6 more 
{code}
 

We should make the RPATH of container-executor configurable to solve this problem.

  was:
In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto 
to container-executor. We hit a case where our jenkins machine has 
libcrypto.so.1.0.0 in its shared library environment, but our nodemanager machine 
does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.

We use an internal custom dynamic link library path, 
/usr/lib/x86_64-linux-gnu,

and we build hadoop with the parameters below:
{code:java}
 -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
{code}
 

Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
-rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
-rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
-rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
{code}
 

Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
-rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
-rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
-rw-r--r--  1 root root  

[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated YARN-10495:
-
Description: 
In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto 
to container-executor. We hit a case where our jenkins machine has 
libcrypto.so.1.0.0 in its shared library environment, but our nodemanager machine 
does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.

We use an internal custom dynamic link library path, 
/usr/lib/x86_64-linux-gnu,

and we build hadoop with the parameters below:
{code:java}
 -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
{code}
 

Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
-rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
-rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
-rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
{code}
 

Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
-rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
-rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
-rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r--  1 root root  298 Feb  7  2019 libc.so
{code}
We build container-executor with these flags.

The libcrypto.so versions are not the same, which causes an error when we start the 
nodemanager:

 
{code:java}
.. 3 more Caused by: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
object file: No such file or directory at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
 ... 4 more Caused by: ExitCodeException exitCode=127: 
/home/hadoop/hadoop/bin/container-executor: error while loading shared 
libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or 
directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
org.apache.hadoop.util.Shell.run(Shell.java:901) at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
 ... 6 more 
{code}
 

We should make the RPATH of container-executor configurable to solve this problem.

  was:
In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto 
to container-executor. We hit a case where our jenkins machine has 
libcrypto.so.1.0.0 in its shared library environment, but our nodemanager machine 
does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.

We build hadoop with:
{code:java}
 -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
{code}
 

Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
-rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
-rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
-rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
{code}
 

Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
-rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
-rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
-rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 

[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated YARN-10495:
-
Description: 
In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto 
to container-executor. We hit a case where our jenkins machine has 
libcrypto.so.1.0.0 in its shared library environment, but our nodemanager machine 
does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.

We build hadoop with:
{code:java}
 -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
{code}
 

Under the jenkins machine's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
-rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
-rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
-rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
{code}
 

Under the nodemanager's shared library path /usr/lib/x86_64-linux-gnu (where 
libcrypto is):
{code:java}
-rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
-rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
-rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
-rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r--  1 root root  298 Feb  7  2019 libc.so
{code}
We build container-executor with these flags.

The libcrypto.so versions are not the same, which causes an error when we start the 
nodemanager:

 
{code:java}
.. 3 more Caused by: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
object file: No such file or directory at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
 ... 4 more Caused by: ExitCodeException exitCode=127: 
/home/hadoop/hadoop/bin/container-executor: error while loading shared 
libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or 
directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
org.apache.hadoop.util.Shell.run(Shell.java:901) at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
 ... 6 more 
{code}
 

We should make the RPATH of container-executor configurable to solve this problem.

  was:
In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto 
to container-executor. We hit a case where our jenkins machine has 
libcrypto.so.1.0.0 in its shared library environment, but our nodemanager machine 
does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.

We build hadoop with:
{code:java}
 -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
{code}
 

Under the jenkins machine's /usr/lib/x86_64-linux-gnu:
{code:java}
-rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
-rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
-rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
-rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
{code}
 

Under the nodemanager's /usr/lib/x86_64-linux-gnu:
{code:java}
-rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
-rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
-rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
-rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r--  1 root root  298 Feb  7  2019 libc.so
{code}
 

The libcrypto.so versions are not the same, which causes an error when we start the 
nodemanager:

 
{code:java}
.. 3 more 

[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated YARN-10495:
-
Attachment: YARN-10495.001.patch

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We build hadoop with:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's /usr/lib/x86_64-linux-gnu:
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's /usr/lib/x86_64-linux-gnu:
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  
> The libcrypto.so versions are not the same, which causes an error when we start 
> the nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make the RPATH of container-executor configurable to solve this problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235311#comment-17235311
 ] 

angerszhu commented on YARN-10495:
--

ping [~ebadger]  [~eyang]

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Priority: Major
> Attachments: YARN-10495.001.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We build hadoop with:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's /usr/lib/x86_64-linux-gnu:
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's /usr/lib/x86_64-linux-gnu:
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root  298 Feb  7  2019 libc.so
> {code}
>  
> The libcrypto.so versions are not the same, which causes an error when we start 
> the nodemanager:
>  
> {code:java}
> .. 3 more Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
> error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
> object file: No such file or directory at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>  ... 4 more Caused by: ExitCodeException exitCode=127: 
> /home/hadoop/hadoop/bin/container-executor: error while loading shared 
> libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file 
> or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
> org.apache.hadoop.util.Shell.run(Shell.java:901) at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>  ... 6 more 
> {code}
>  
> We should make the RPATH of container-executor configurable to solve this problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10031) Create a general purpose log request with additional query parameters

2020-11-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235310#comment-17235310
 ] 

Hadoop QA commented on YARN-10031:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
37s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} codespell {color} | {color:blue}  0m  
3s{color} |  | {color:blue} codespell was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 3 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} |  | {color:blue} Maven dependency ordering for branch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
24s{color} | 
[/branch-mvninstall-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-mvninstall-root.txt]
 | {color:red} root in trunk failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
24s{color} | 
[/branch-compile-root-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-compile-root-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt]
 | {color:red} root in trunk failed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
23s{color} | 
[/branch-compile-root-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-compile-root-jdkPrivateBuild-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10.txt]
 | {color:red} root in trunk failed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | 
[/buildtool-branch-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/buildtool-branch-checkstyle-root.txt]
 | {color:orange} The patch fails to run checkstyle in root {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
23s{color} | 
[/branch-mvnsite-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-mvnsite-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt]
 | {color:red} hadoop-mapreduce-client-hs in trunk failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
28s{color} | 
[/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt]
 | {color:red} hadoop-yarn-common in trunk failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
24s{color} | 
[/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt]
 | {color:red} hadoop-yarn-server-common in trunk failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
2m  2s{color} |  | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
23s{color} | 
[/branch-javadoc-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/317/artifact/out/branch-javadoc-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs-jdkUbuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1.txt]
 | {color:red} hadoop-mapreduce-client-hs in trunk failed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
24s{color} | 

[jira] [Updated] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated YARN-10495:
-
Description: 
In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on crypto 
to container-executor. We hit a case where our jenkins machine has 
libcrypto.so.1.0.0 in its shared library environment, but our nodemanager machine 
does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.

We build hadoop with:
{code:java}
 -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
{code}
 

Under the jenkins machine's /usr/lib/x86_64-linux-gnu:
{code:java}
-rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
-rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
-rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
-rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
{code}
 

Under the nodemanager's /usr/lib/x86_64-linux-gnu:
{code:java}
-rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
-rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
-rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
-rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
lrwxrwxrwx  1 root root   35 Feb  7  2019 libcrypt.so -> 
/lib/x86_64-linux-gnu/libcrypt.so.1
-rw-r--r--  1 root root  298 Feb  7  2019 libc.so
{code}
 

The libcrypto.so versions are not the same, which causes an error when we start the 
nodemanager:

 
{code:java}
.. 3 more Caused by: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: 
error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared 
object file: No such file or directory at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
 ... 4 more Caused by: ExitCodeException exitCode=127: 
/home/hadoop/hadoop/bin/container-executor: error while loading shared 
libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or 
directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at 
org.apache.hadoop.util.Shell.run(Shell.java:901) at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
 ... 6 more 
{code}
 

We should make the RPATH of container-executor configurable to solve this problem.

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Priority: Major
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager 
> machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We build hadoop with:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> Under the jenkins machine's /usr/lib/x86_64-linux-gnu:
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root   18 Sep 26  2019 libcrypto.so -> 
> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root   35 Jun 18  2017 libcrypt.so -> 
> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root  298 Jun 18  2017 libc.so
> {code}
>  
> Under the nodemanager's /usr/lib/x86_64-linux-gnu:
> {code:java}
> -rw-r--r--  1 root root55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root   16 Sep 28  2019 libcrypto.so -> 
> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep  

[jira] [Created] (YARN-10495) make the rpath of container-executor configurable

2020-11-19 Thread angerszhu (Jira)
angerszhu created YARN-10495:


 Summary: make the rpath of container-executor configurable
 Key: YARN-10495
 URL: https://issues.apache.org/jira/browse/YARN-10495
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10031) Create a general purpose log request with additional query parameters

2020-11-19 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235284#comment-17235284
 ] 

Andras Gyori commented on YARN-10031:
-

Fixed the checkstyle errors and javadoc issues.

> Create a general purpose log request with additional query parameters
> -
>
> Key: YARN-10031
> URL: https://issues.apache.org/jira/browse/YARN-10031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10031-WIP.001.patch, YARN-10031.001.patch, 
> YARN-10031.002.patch, YARN-10031.003.patch, YARN-10031.004.patch, 
> YARN-10031.005.patch
>
>
> The current endpoints are robust but not very flexible with regard to 
> filtering options. I suggest adding an endpoint which provides filtering 
> options.
> E.g.:
> In ATS we have multiple endpoints:
> /containers/{containerid}/logs/{filename}
> /containerlogs/{containerid}/{filename}
> We could add @QueryParam parameters to the REST endpoints like this:
> /containers/{containerid}/logs?fileName=stderr=FAILED=nm45
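To make the suggestion concrete, here is a hypothetical JAX-RS sketch of such a filterable endpoint. The class name and the containerState/nodeId parameter names are illustrative assumptions (only fileName=stderr survives intact in the example URL above), and this is not the actual YARN web service code:

{code:java}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/containers")
public class ContainerLogsResourceSketch {

  // e.g. GET /containers/{containerid}/logs?fileName=stderr&containerState=FAILED&nodeId=nm45
  @GET
  @Path("/{containerid}/logs")
  @Produces(MediaType.TEXT_PLAIN)
  public Response getLogs(
      @PathParam("containerid") String containerId,
      @QueryParam("fileName") String fileName,              // e.g. stderr
      @QueryParam("containerState") String containerState,  // hypothetical filter name
      @QueryParam("nodeId") String nodeId) {                 // hypothetical filter name
    // A real implementation would resolve the aggregated/local logs and apply
    // the optional filters; this sketch only echoes the requested filter set.
    String summary = String.format(
        "containerId=%s, fileName=%s, containerState=%s, nodeId=%s",
        containerId, fileName, containerState, nodeId);
    return Response.ok(summary).build();
  }
}
{code}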



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10031) Create a general purpose log request with additional query parameters

2020-11-19 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10031:

Attachment: YARN-10031.005.patch

> Create a general purpose log request with additional query parameters
> -
>
> Key: YARN-10031
> URL: https://issues.apache.org/jira/browse/YARN-10031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10031-WIP.001.patch, YARN-10031.001.patch, 
> YARN-10031.002.patch, YARN-10031.003.patch, YARN-10031.004.patch, 
> YARN-10031.005.patch
>
>
> The current endpoints are robust but not very flexible with regard to 
> filtering options. I suggest adding an endpoint which provides filtering 
> options.
> E.g.:
> In ATS we have multiple endpoints:
> /containers/{containerid}/logs/{filename}
> /containerlogs/{containerid}/{filename}
> We could add @QueryParam parameters to the REST endpoints like this:
> /containers/{containerid}/logs?fileName=stderr=FAILED=nm45



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org