[ 
https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183823#comment-17183823
 ] 

Siddharth Ahuja edited comment on YARN-1806 at 8/25/20, 7:47 AM:
-----------------------------------------------------------------

Testing done on the platform:

*+1. Test Jstack collection for non-RUNNING app:+*

                a. Ensure there is a YARN application that is already present 
from a previous run and is NOT currently RUNNING.
                b. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the non-running app. Jstack button should be visible.
                c. Click on Jstack button. An error message should be displayed 
-> "Jstack cannot be collected for an application that is not running." because 
a non-running application has no running containers from which a Jstack could 
be collected.

*+2. Test for Jstack collection for a RUNNING app:+*
                a. Ensure there is a YARN application that is currently in 
RUNNING state,
                b. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the running app. Jstack button should be visible.
                c. Click on Jstack button. A new Jstack panel with a drop-down 
that has the options - "None" and 
"<currently_running_app_attempt_for_the_selected_running_app>" should be shown,
                d. Select the currently running app attempt from the drop-down. 
A new drop-down that shows currently running containers for this app attempt 
should be shown in the drop-down panel,
                e. Select a container from this drop-down. A new panel with the 
header that shows the selected container and selected attempt-id should be shown 
along with Stdout logs for this container containing the thread dump from this 
container.
                f. Repeat step e. from above for another container. A thread 
dump should be captured and visible in the panel containing the stdout logs.
                g. Go back and repeat step e. for the same container that was 
first selected. Notice that 2 thread dumps are now present in the stdout logs 
with the latest thread dump shown later in the stdout logs.
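
The check in step g. can also be done on a saved copy of the stdout log. The 
following is a minimal sketch, assuming each dump is introduced by a line that 
starts with "Full thread dump" (which is what HotSpot JVMs print). The function 
name and the sample log lines are illustrative and are not taken from the patch.

```python
def count_thread_dumps(stdout_text: str) -> int:
    """Count thread dumps in the text of a container's stdout log."""
    # Assumption: every dump starts with a "Full thread dump ..." header line.
    return sum(
        1
        for line in stdout_text.splitlines()
        if line.startswith("Full thread dump")
    )


# Illustrative log content only; real stdout logs will contain full stacks.
sample = "\n".join([
    "2020-08-25 07:40:01",
    "Full thread dump OpenJDK 64-Bit Server VM (25.252-b09 mixed mode):",
    '"main" #1 prio=5 os_prio=0 tid=0x1 nid=0x2 waiting on condition',
    "2020-08-25 07:41:30",
    "Full thread dump OpenJDK 64-Bit Server VM (25.252-b09 mixed mode):",
    '"main" #1 prio=5 os_prio=0 tid=0x1 nid=0x2 waiting on condition',
])

print(count_thread_dumps(sample))
```

Since dumps are appended to stdout, the count should grow by one each time the 
same container is selected again.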

*+3. Error checking - Jstack fetch attempt for a container that is not running 
due to killed application:+*

                a. Kill the currently RUNNING application using: yarn 
application -kill <running_app_id_from_above>.
                b. Now try selecting a container from the drop-down containing 
the containers listing. Jstack collection is not possible and hence the error is 
displayed -> "Jstack fetch failed for container: <absent_container_id> due to: 
“Trying to signal an absent container <absent_container_id>”".

*+4. Error checking - Jstack fetch attempt for a container while RMs/NMs not 
available:+*
                a. Ensure there is a YARN application that is currently in 
RUNNING state,
                b. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the running app. Jstack button should be visible.
                c. Click on Jstack button. A new Jstack panel with a drop-down 
that has the options - "None" and 
"<currently_running_app_attempt_for_the_selected_running_app>" should be shown,
                d. Select the currently running app attempt from the drop-down. 
A new drop-down that shows currently running containers for this app attempt 
should be shown in the drop-down panel,
                e. Select a container from this drop-down. A new panel with the 
header that shows the selected container and selected attempt-id should be shown 
along with Stdout logs for this container containing the thread dump from this 
container.
                f. Stop the ResourceManager/s.
                g. Select a different container from the drop-down list. An 
error should be displayed -> "Jstack fetch failed for container: 
<selected_container_id> due to: “Error: Not able to connect to YARN!”".
                h. Restart the ResourceManager/s.
                i. Repeat steps a. through e.
                j. Stop NodeManager/s.
                k. Select a different container from the drop-down list. An 
error should be displayed -> "Logs fetch failed for container: 
<selected_container_id> due to: “Error: Not able to connect to YARN!”".
                l. Restart the NodeManager/s.

*+5. Check latest (and the ONLY) running app attempt id is displayed:+*
                a. Ensure there is a YARN application that is currently in 
RUNNING state,
                b. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the running app. Jstack button should be visible.
                c. Click on Jstack button. A new Jstack panel with a drop-down 
that has the options - "None" and 
"<currently_running_app_attempt_for_the_selected_running_app>" should be shown,
                d. Now, run the following command to terminate the currently 
running AM:
                        
                        yarn container -signal <am_container_id> GRACEFUL_SHUTDOWN
                        
                e. Run the following command to check the currently running 
app_attempt_id:

                yarn applicationattempt -list application_1598288770104_0003
                
                f. Reload the UI. Jstack button should still be selected and a 
drop-down for the attempts should be present in the Jstack panel.
                g. Open the drop-down. Notice that the new application attempt 
id (second attempt id for the app) should now be displayed in the drop-down. 
This should be the only option other than "None".
                h. Select the currently running app attempt from the drop-down. 
A new drop-down that shows currently running containers for this app attempt 
should be shown in the drop-down panel. Notice the containers' attempt id. They 
should be _02 i.e. not _01 anymore.
                i. Select a container from this drop-down. A new panel with the 
header that shows the selected container and selected attempt-id should be shown 
along with Stdout logs for this container containing the thread dump from this 
container.
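
The attempt check in step h. can be scripted. The following is a minimal 
sketch, assuming the usual YARN container ID layout 
container_[e<epoch>_]<cluster_timestamp>_<app_number>_<attempt_number>_<container_number>. 
The helper name is hypothetical, and the container ID in the usage line is 
made up from the application ID used in step e.

```python
import re

# Assumed layout; the "e<epoch>_" part only appears after an RM restart
# with work-preserving recovery.
_CONTAINER_ID = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?(?P<cluster_ts>\d+)_(?P<app>\d+)"
    r"_(?P<attempt>\d+)_(?P<container>\d+)$"
)


def attempt_number(container_id: str) -> int:
    """Return the application attempt number embedded in a container ID."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a recognised container ID: {container_id!r}")
    return int(match.group("attempt"))


# Hypothetical container ID for the second attempt of the app from step e.
print(attempt_number("container_1598288770104_0003_02_000001"))
```

After the AM has been shut down in step d., every container listed in the 
drop-down should yield 2 here.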
        
*+6. Test for Jstack user authorization on a secured cluster:+*
                a. Ensure a secure cluster with Kerberos (and preferably 
SSL/TLS) is used for testing.
                b. Ensure that a user who is neither the owner of this 
application nor present in the yarn.admin.acl list is currently logged into the 
UI i.e. the Kerberos ticket for a non-admin user is used for SPNEGO auth.
                c. Ensure there is a YARN application that is currently in 
RUNNING state (submitted by a different user to the one that is logged into the 
UI and is also not present in the yarn.admin.acl list),
                d. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the running app. Jstack button should be visible.
                e. Click on Jstack button. A new Jstack panel with a drop-down 
that has the options - "None" and 
"<currently_running_app_attempt_for_the_selected_running_app>" should be shown,
                f. Select the currently running app attempt from the drop-down. 
A new drop-down that shows currently running containers for this app attempt 
should be shown in the drop-down panel,
                g. Select a container from this drop-down. 
                h. Now try selecting a container from the drop-down containing 
containers listing. Jstack collection is not possible and hence an error is 
displayed.
                i. Add the user that is currently logged in to the 
yarn.admin.acl list and restart the YARN service.
                j. Visit the RM UI again, click on the running app that was 
submitted by a different user, select the running app attempt from the 
drop-down and try fetching Jstack by selecting a running container. The Jstack 
attempt should now be successful.
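
The outcome expected in steps h. and j. can be reasoned about with a small 
model of the check. This is a simplified sketch, not the ResourceManager's 
actual code path: it assumes the Hadoop ACL string format of comma-separated 
users, a space, then comma-separated groups, with "*" meaning everyone, and it 
ignores any application or queue ACLs that a real cluster may also consult. 
All function and user names are hypothetical.

```python
def parse_acl(acl: str):
    """Split a Hadoop-style ACL string into (allow_all, users, groups)."""
    if acl.strip() == "*":
        return True, set(), set()
    # Users come before the first space, groups after it.
    users_part, _, groups_part = acl.partition(" ")
    users = {u.strip() for u in users_part.split(",") if u.strip()}
    groups = {g.strip() for g in groups_part.split(",") if g.strip()}
    return False, users, groups


def may_collect_jstack(user, user_groups, app_owner, admin_acl):
    """Simplified model: allow the app owner or anyone in yarn.admin.acl."""
    if user == app_owner:
        return True
    allow_all, users, groups = parse_acl(admin_acl)
    return allow_all or user in users or bool(groups & set(user_groups))


# Step h.: "alice" is logged in, "bob" owns the app, alice is not an admin.
print(may_collect_jstack("alice", ["staff"], "bob", "yarn"))
# Step j.: alice has been added to yarn.admin.acl.
print(may_collect_jstack("alice", ["staff"], "bob", "yarn,alice"))
```

Under these assumptions the first call models the denied request and the 
second models the request after the ACL change.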





> webUI update to allow end users to request thread dump
> ------------------------------------------------------
>
>                 Key: YARN-1806
>                 URL: https://issues.apache.org/jira/browse/YARN-1806
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ming Ma
>            Assignee: Siddharth Ahuja
>            Priority: Major
>
> Both the individual container page and the containers page will support this. 
> After the end user clicks on the request link, they can follow it to the 
> stdout page for the thread dump content.


