[ https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183823#comment-17183823 ]
Siddharth Ahuja edited comment on YARN-1806 at 8/25/20, 7:45 AM:
-----------------------------------------------------------------

Testing done on the platform:

+* 1. Test Jstack collection for a non-RUNNING app:*+
a. Ensure there is a YARN application that is already present from a previous run and is NOT currently RUNNING.
b. Visit ResourceManager Web UI -> Applications -> click on the application_id link for the non-running app. The Jstack button should be visible.
c. Click on the Jstack button. An error message should be displayed -> "Jstack cannot be collected for an application that is not running." A non-running application has no running containers, so no Jstack can be collected for it.

+* 2. Test Jstack collection for a RUNNING app:*+
a. Ensure there is a YARN application that is currently in RUNNING state.
b. Visit ResourceManager Web UI -> Applications -> click on the application_id link for the running app. The Jstack button should be visible.
c. Click on the Jstack button. A new Jstack panel should be shown with a drop-down that has the options "None" and "<currently_running_app_attempt_for_the_selected_running_app>".
d. Select the currently running app attempt from the drop-down. A second drop-down listing the currently running containers for this app attempt should appear in the panel.
e. Select a container from this drop-down. A new panel should be shown whose header names the selected container and the selected attempt id, along with the stdout logs for this container, which contain the thread dump from this container.
f. Repeat step e. for another container. A thread dump should be captured and visible in the panel containing the stdout logs.
g. Go back and repeat step e. for the container that was selected first. The stdout logs should now contain 2 thread dumps, with the latest thread dump appearing later in the logs.

+* 3.
Error checking - Jstack fetch attempt for a container that is not running due to killed application:*+
a. Kill the currently RUNNING application using:
   yarn application -kill <running_app_id_from_above>
b. Now try selecting a container from the drop-down that lists the containers. Jstack collection is not possible, and hence the error is displayed -> "Jstack fetch failed for container: <absent_container_id> due to: “Trying to signal an absent container <absent_container_id>”".

+* 4. Error checking - Jstack fetch attempt for a container while RMs/NMs not available:*+
a. Ensure there is a YARN application that is currently in RUNNING state.
b. Visit ResourceManager Web UI -> Applications -> click on the application_id link for the running app. The Jstack button should be visible.
c. Click on the Jstack button. A new Jstack panel should be shown with a drop-down that has the options "None" and "<currently_running_app_attempt_for_the_selected_running_app>".
d. Select the currently running app attempt from the drop-down. A second drop-down listing the currently running containers for this app attempt should appear in the panel.
e. Select a container from this drop-down. A new panel should be shown whose header names the selected container and the selected attempt id, along with the stdout logs for this container, which contain the thread dump from this container.
f. Stop the ResourceManager/s.
g. Select a different container from the drop-down list. An error should be displayed -> "Jstack fetch failed for container: <selected_container_id> due to: “Error: Not able to connect to YARN!”".
h. Restart the ResourceManager/s.
i. Repeat steps a. through e.
j. Stop the NodeManager/s.
k. Select a different container from the drop-down list. An error should be displayed -> "Logs fetch failed for container: <selected_container_id> due to: “Error: Not able to connect to YARN!”".
l. Start the NodeManager/s again.

+* 5. Check latest (and the ONLY) running app attempt id is displayed:*+
a.
Ensure there is a YARN application that is currently in RUNNING state.
b. Visit ResourceManager Web UI -> Applications -> click on the application_id link for the running app. The Jstack button should be visible.
c. Click on the Jstack button. A new Jstack panel should be shown with a drop-down that has the options "None" and "<currently_running_app_attempt_for_the_selected_running_app>".
d. Now run the following command to terminate the currently running AM:
   yarn container -signal <am_container_id> GRACEFUL_SHUTDOWN
e. Run the following command to check the currently running app_attempt_id:
   yarn applicationattempt -list application_1598288770104_0003
f. Reload the UI. The Jstack button should still be selected and a drop-down for the attempts should be present in the Jstack panel.
g. Open the drop-down. The new application attempt id (the second attempt id for the app) should now be displayed in the drop-down. This should be the only option other than "None".
h. Select the currently running app attempt from the drop-down. A second drop-down listing the currently running containers for this app attempt should appear in the panel. Check the attempt id in the container ids: it should be _02, i.e. no longer _01.
i. Select a container from this drop-down. A new panel should be shown whose header names the selected container and the selected attempt id, along with the stdout logs for this container, which contain the thread dump from this container.

+* 6. Test for Jstack user authorization on a secured cluster:*+
a. Ensure a secure cluster with Kerberos (and preferably SSL/TLS) is used for testing.
b. Ensure that the user currently logged into the UI is neither the owner of this application nor present in the yarn.admin.acl list, i.e. the Kerberos ticket of a non-admin user is used for SPNEGO auth.
c.
Ensure there is a YARN application that is currently in RUNNING state (submitted by a different user from the one that is logged into the UI and that is also not present in the yarn.admin.acl list).
d. Visit ResourceManager Web UI -> Applications -> click on the application_id link for the running app. The Jstack button should be visible.
e. Click on the Jstack button. A new Jstack panel should be shown with a drop-down that has the options "None" and "<currently_running_app_attempt_for_the_selected_running_app>".
f. Select the currently running app attempt from the drop-down. A second drop-down listing the currently running containers for this app attempt should appear in the panel.
g. Select a container from this drop-down.
h. Now try selecting a container from the drop-down that lists the containers. Jstack collection is not possible, and hence an error is displayed.
i. Add the user that is currently logged in to the yarn.admin.acl list and restart the YARN service.
j. Visit the RM UI again, click on the previously running app from a different user, select the running app attempt from the drop-down, and try fetching the Jstack by selecting a running container. The Jstack attempt should now be successful.

was (Author: sahuja):
Testing done on the platform: the same tests 1 to 6 as above, except that test 6 was headed "Test for Jstack user authorization on a secured cluster (COULD NOT TEST because of unsecure cluster!)".

> webUI update to allow end users to request thread dump
> ------------------------------------------------------
>
> Key: YARN-1806
> URL: https://issues.apache.org/jira/browse/YARN-1806
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Ming Ma
> Assignee: Siddharth Ahuja
> Priority: Major
>
> Both individual container page and containers page will support this. After
> end user clicks on the request link, they can follow to get to stdout page
> for the thread dump content.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
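
For comparison with the UI flow in the test plan, a thread dump can presumably also be requested from the command line. The following is a minimal sketch in Python, assuming the `yarn` CLI is on the PATH and that its `container -signal` subcommand (the one used with GRACEFUL_SHUTDOWN in test 5) accepts OUTPUT_THREAD_DUMP. The helper names and the example container id are hypothetical, and this is not necessarily what the web UI does internally.

```python
import subprocess
from typing import List

# Signal names assumed to be accepted by `yarn container -signal`.
# OUTPUT_THREAD_DUMP asks the container to write a thread dump to its stdout log.
SIGNAL_COMMANDS = {"OUTPUT_THREAD_DUMP", "GRACEFUL_SHUTDOWN", "FORCEFUL_SHUTDOWN"}


def build_signal_command(container_id: str,
                         signal: str = "OUTPUT_THREAD_DUMP") -> List[str]:
    """Build the argv list for signalling a container (hypothetical helper)."""
    if not container_id.startswith("container_"):
        raise ValueError(f"not a container id: {container_id!r}")
    if signal not in SIGNAL_COMMANDS:
        raise ValueError(f"unknown signal: {signal!r}")
    return ["yarn", "container", "-signal", container_id, signal]


def request_thread_dump(container_id: str) -> subprocess.CompletedProcess:
    """Run the yarn CLI; the dump itself lands in the container's stdout log."""
    return subprocess.run(build_signal_command(container_id),
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Hypothetical container id under the application id used in test 5.
    print(" ".join(build_signal_command("container_1598288770104_0003_02_000002")))
```

As in test 2, requesting a second dump for the same container would be expected to append another thread dump to the same stdout log rather than replace the first.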