[jira] [Created] (YARN-9707) [UI2] App Attempt state data is missing
Yesha Vora created YARN-9707: Summary: [UI2] App Attempt state data is missing Key: YARN-9707 URL: https://issues.apache.org/jira/browse/YARN-9707 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora

Steps:
1) Launch a Dshell application or a Yarn service application.
2) Go to the app attempt page Grid view. The State column shows N/A.

Yarn UI1 shows the app attempt state for running and finished applications. This ability is missing from UI2. UI2 uses the REST call below, and this call does not return the app attempt state details.

{code:title=ws/v1/cluster/apps/application_1563946396350_0002/appattempts?_=1564004553389}
(raw response with markup stripped in the mail; it contains the attempt id, start timestamps, container_1563946396350_0002_01_01, the node address xx:yy, the log link http://ixx:yy/node/containerlogs/container_1563946396350_0002_01_01/hrt_qa and appattempt_1563946396350_0002_01, but no app attempt state field)
{code}
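For reference, the attempt state that the Grid view should surface is available through the YARN Java client API. A minimal client-side sketch (the class name and command-line handling here are illustrative, not part of UI2) that prints the state of each attempt:

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrintAttemptStates {
  public static void main(String[] args) throws Exception {
    // Application id is taken from the command line, e.g. application_1563946396350_0002.
    ApplicationId appId = ApplicationId.fromString(args[0]);

    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Each report carries the attempt state that the UI2 grid leaves as N/A.
      List<ApplicationAttemptReport> attempts = client.getApplicationAttempts(appId);
      for (ApplicationAttemptReport attempt : attempts) {
        System.out.println(attempt.getApplicationAttemptId() + " -> "
            + attempt.getYarnApplicationAttemptState());
      }
    } finally {
      client.stop();
    }
  }
}
{code}

UI2 itself would presumably consume a REST endpoint that carries the state field; the report above only shows that the RM appattempts call it uses today does not include it.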
[jira] [Created] (YARN-9706) [UI2] App Attempt state missing from Graph view
Yesha Vora created YARN-9706: Summary: [UI2] App Attempt state missing from Graph view Key: YARN-9706 URL: https://issues.apache.org/jira/browse/YARN-9706 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora

1) Launch a Dshell application or a Yarn service application.
2) Go to the app attempt page Grid view. The State column shows N/A.
3) Go to the app attempt Graph view. State data is not present on this page.

Apparently, app attempt data is only shown in the Grid view. The Grid and Graph views should show the same details.
[jira] [Created] (YARN-9705) [UI2] AM Node Web UI should not display full link
Yesha Vora created YARN-9705: Summary: [UI2] AM Node Web UI should not display full link Key: YARN-9705 URL: https://issues.apache.org/jira/browse/YARN-9705 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora

The App Attempt page shows the AM Node Web UI as a full link. It should not print the full URL as the display text. Rather, it should display the AM node name and link it to the node.
[jira] [Created] (YARN-9704) [UI2] Fix Pending, Allocated, Reserved Containers information for Fair Scheduler
Yesha Vora created YARN-9704: Summary: [UI2] Fix Pending, Allocated, Reserved Containers information for Fair Scheduler Key: YARN-9704 URL: https://issues.apache.org/jira/browse/YARN-9704 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora

YARN UI2 shows "Pending, Allocated, Reserved Containers" information for the fair scheduler. Here, the pending container information is not printed: UI2 shows ",0,0" instead of "0,0,0".

In UI1, the same information is displayed as the number of active and pending applications: Num Active Applications: 0, Num Pending Applications: 0.

It is not clear what UI2 intends to show in "Pending, Allocated, Reserved Containers". Is it really containers or apps?
[jira] [Created] (YARN-9609) Nodemanager Web Service should return logAggregationType for each file
Yesha Vora created YARN-9609: Summary: Nodemanager Web Service should return logAggregationType for each file Key: YARN-9609 URL: https://issues.apache.org/jira/browse/YARN-9609 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.1.1 Reporter: Yesha Vora

Steps:
1) Launch a sleeper yarn service.
2) When the sleeper component is in READY state, call the NM web service to list the container files and their log aggregation type: http://NMHost:NMPort/ws/v1/node/containers/CONTAINERID/logs

The NM web service response shows one common log aggregation type for all files. Instead, the NM web service should return a log aggregation type for each file.
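As an illustration of the requested change, each entry of a per-file response could carry its own aggregation type. The class and field names below are hypothetical; they only sketch the proposed shape, not an existing NM API:

{code:java}
/**
 * Hypothetical shape of one entry in the proposed per-file response:
 * each log file carries its own aggregation type instead of a single
 * container-wide value.
 */
public class PerFileLogInfo {
  private String fileName;            // e.g. "sleeper.log"
  private long fileSize;
  private long lastModifiedTime;
  private String logAggregationType;  // e.g. "AGGREGATED" or "LOCAL", per file

  // getters/setters omitted for brevity
}
{code}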
[jira] [Created] (YARN-9570) Application in pending-ordering-policy is not considered during container allocation
Yesha Vora created YARN-9570: Summary: Application in pending-ordering-policy is not considered during container allocation Key: YARN-9570 URL: https://issues.apache.org/jira/browse/YARN-9570 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Reporter: Yesha Vora

This is a 5-node cluster with 15GB total capacity.
1) Configure the Capacity Scheduler and set max cluster priority=10.
2) Launch app1 with no priority and wait for it to occupy the full cluster. application_1558135983180_0001 is launched with Priority=0.
3) Launch app2 with priority=2 and check that it is in ACCEPTED state. application_1558135983180_0002 is launched with Priority=2.
4) Launch app3 with priority=3 and check that it is in ACCEPTED state. application_1558135983180_0003 is launched with Priority=3.
5) Kill a container from app1.
6) Verify that app3, which has the higher priority, goes to RUNNING state.

When max-application-master-percentage is set to 0.1, app2 goes to RUNNING state even though app3 has the higher priority.

Root cause: In the CapacityScheduler LeafQueue, there are two ordering lists. If the queue's total application master usage is below maxAMResourcePerQueuePercent, the app is added to the "ordering-policy" list. Otherwise, the app is added to the "pending-ordering-policy" list. During allocation, only apps in "ordering-policy" are considered. If an app finishes, the queue config changes, or a node is added/removed, "pending-ordering-policy" is reconsidered and some apps from "pending-ordering-policy" are moved to "ordering-policy".

This behavior leads to the issue in this JIRA: the cluster has 15GB of resources and max-application-master-percentage is set to 0.1, so it can accept at most 2GB of AM resource (1.5GB rounded up to the 1GB allocation increment), which equals 2 applications. When app2 is submitted, it is added to ordering-policy. When app3 is submitted, it is added to pending-ordering-policy. When we kill app1, it does not finish immediately; it stays in "ordering-policy" until all containers of app1 are released. That keeps app3 in pending-ordering-policy, so app3 cannot pick up any resource released by app1.
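A simplified sketch of the activation behavior described in the root cause (illustrative only; the names and structure are not the actual CapacityScheduler LeafQueue code):

{code:java}
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

class LeafQueueSketch {
  // Apps eligible for container allocation, ordered by priority.
  private final Set<String> orderingPolicy = new LinkedHashSet<>();
  // Apps whose AM would push the queue over maxAMResourcePerQueuePercent.
  private final Set<String> pendingOrderingPolicy = new LinkedHashSet<>();

  private long usedAmMb = 0;
  private final long amLimitMb;        // e.g. 15 GB * 0.1 rounded up to 2048 MB
  private final long amSizeMb = 1024;  // assume 1 GB AM containers

  LeafQueueSketch(long amLimitMb) { this.amLimitMb = amLimitMb; }

  void submit(String appId) {
    if (usedAmMb + amSizeMb <= amLimitMb) {
      usedAmMb += amSizeMb;
      orderingPolicy.add(appId);          // considered during allocation
    } else {
      pendingOrderingPolicy.add(appId);   // parked until activateApplications()
    }
  }

  // Only triggered on app finish, queue config change, or node add/remove.
  // Killing containers of a still-running app never triggers it, which is
  // why app3 stays parked while app1 slowly releases resources.
  void activateApplications() {
    Iterator<String> it = pendingOrderingPolicy.iterator();
    while (it.hasNext() && usedAmMb + amSizeMb <= amLimitMb) {
      orderingPolicy.add(it.next());
      usedAmMb += amSizeMb;
      it.remove();
    }
  }

  void appFinished(String appId) {
    if (orderingPolicy.remove(appId)) {
      usedAmMb -= amSizeMb;
    }
    activateApplications();
  }
}
{code}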
[jira] [Created] (YARN-8913) Add helper scripts to launch MaWo App to run Hadoop unit tests on Hadoop Cluster
Yesha Vora created YARN-8913: Summary: Add helper scripts to launch MaWo App to run Hadoop unit tests on Hadoop Cluster Key: YARN-8913 URL: https://issues.apache.org/jira/browse/YARN-8913 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora

The MaWo application can be used to run Hadoop unit tests faster on a Hadoop cluster. Develop helper scripts to orchestrate the end-to-end workflow for running Hadoop UT using the MaWo app.

Pre-requisites:
* A Hadoop cluster with HDFS and YARN installed
* The Docker on YARN feature enabled

Helper scripts:
* MaWo_Driver
** Create a docker image with the latest Hadoop source code
** Create the payload for the MaWo app (this is the input to the MaWo app, where each MaWo Task = the UT execution of one Hadoop module)
** Upload the payload file to HDFS
** Update MaWo-Launch.json to resolve RM_HOST / Docker image etc. dynamically
** Launch the MaWo app in the Hadoop cluster
[jira] [Created] (YARN-8912) Fix MaWo_Config to read WORKER_WORK_SPACE and MASTER_TASKS_STATUS_LOG_PATH from env
Yesha Vora created YARN-8912: Summary: Fix MaWo_Config to read WORKER_WORK_SPACE and MASTER_TASKS_STATUS_LOG_PATH from env Key: YARN-8912 URL: https://issues.apache.org/jira/browse/YARN-8912 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Fix MaWo Configuration to read MASTER_TASKS_STATUS_LOG_PATH and WORKER_WORK_SPACE from env. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8901) Restart "NEVER" policy does not work with component dependency
Yesha Vora created YARN-8901: Summary: Restart "NEVER" policy does not work with component dependency Key: YARN-8901 URL: https://issues.apache.org/jira/browse/YARN-8901 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Scenario: 1) Launch an application with two components. master and worker. Here, worker is dependent on master. ( Worker should be launched only after master is launched ) 2) Set restart_policy = NEVER for both master and worker. {code:title=sample launch.json} { "name": "mawo-hadoop-ut", "artifact": { "type": "DOCKER", "id": "xxx" }, "configuration": { "env": { "YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK": "hadoop" }, "properties": { "docker.network": "hadoop" } }, "components": [{ "dependencies": [], "resource": { "memory": "2048", "cpus": "1" }, "name": "master", "run_privileged_container": true, "number_of_containers": 1, "launch_command": "start master", "restart_policy": "NEVER", }, { "dependencies": ["master"], "resource": { "memory": "8072", "cpus": "1" }, "name": "worker", "run_privileged_container": true, "number_of_containers": 10, "launch_command": "start worker", "restart_policy": "NEVER", }], "lifetime": -1, "version": 1.0 }{code} When restart policy is selected to NEVER, AM never launches Worker component. It get stuck with below message. {code} 2018-10-17 15:11:58,560 [Component dispatcher] INFO component.Component - [COMPONENT master] Transitioned from FLEXING to STABLE on CHECK_STABLE event. 2018-10-17 15:11:58,560 [pool-7-thread-1] INFO instance.ComponentInstance - [COMPINSTANCE master-0 : container_e41_1539027682947_0020_01_02] Transitioned from STARTED to READY on BECOME_READY event 2018-10-17 15:11:58,560 [pool-7-thread-1] INFO component.Component - [COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are ready or the dependent component has not completed 2018-10-17 15:12:28,556 [pool-7-thread-1] INFO component.Component - [COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are ready or the dependent component has not completed 2018-10-17 15:12:58,556 [pool-7-thread-1] INFO component.Component - [COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are ready or the dependent component has not completed 2018-10-17 15:13:28,556 [pool-7-thread-1] INFO component.Component - [COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are ready or the dependent component has not completed 2018-10-17 15:13:58,556 [pool-7-thread-1] INFO component.Component - [COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are ready or the dependent component has not completed 2018-10-17 15:14:28,556 [pool-7-thread-1] INFO component.Component - [COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are ready or the dependent component has not completed {code} 'NEVER' restart policy expects master component to be finished before starting workers. Master component can not finish the job without workers. Thus, it create a deadlock. The logic for 'NEVER' restart policy should be fixed to allow worker components to be launched as soon as master component is in READY state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
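A rough sketch of the direction proposed above: with the NEVER restart policy, a dependency could be treated as satisfied once its instances are READY or have already finished successfully. The class and field names are illustrative, not the actual yarn-service Component code:

{code:java}
enum RestartPolicy { ALWAYS, ON_FAILURE, NEVER }

class ComponentSketch {
  String name;
  RestartPolicy restartPolicy;
  int desiredInstances;
  int readyInstances;
  int succeededInstances;

  // With restart_policy = NEVER, waiting for the dependency to "complete"
  // before launching dependents can deadlock (master waits for workers,
  // workers wait for master). Counting READY plus successfully finished
  // instances avoids that.
  boolean dependencySatisfied() {
    if (restartPolicy == RestartPolicy.NEVER) {
      return readyInstances + succeededInstances >= desiredInstances;
    }
    // Default behavior: every desired instance must currently be READY.
    return readyInstances >= desiredInstances;
  }
}
{code}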
[jira] [Created] (YARN-8754) [UI2] Improve terms on Component Instance page
Yesha Vora created YARN-8754: Summary: [UI2] Improve terms on Component Instance page Key: YARN-8754 URL: https://issues.apache.org/jira/browse/YARN-8754 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora

The component instance page has "node" and "host" fields. These two fields represent "bare_host" and "hostname" respectively, which is not clear from the UI2 page. Thus, the table column should be renamed from "node" to "bare host". This page also has a "Host URL" field which is hard-coded to N/A; remove this field from the table.
[jira] [Created] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
Yesha Vora created YARN-8753: Summary: [UI2] Lost nodes representation missing from Nodemanagers Chart Key: YARN-8753 URL: https://issues.apache.org/jira/browse/YARN-8753 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.1 Reporter: Yesha Vora Nodemanagers Chart is present in Cluster overview and Nodes->Nodes Status page. This chart does not show nodemanagers if they are LOST. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8666) Remove application tab from Yarn Queue Page
Yesha Vora created YARN-8666: Summary: Remove application tab from Yarn Queue Page Key: YARN-8666 URL: https://issues.apache.org/jira/browse/YARN-8666 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora

The Yarn UI2 Queue page has an Application button. This button does not redirect to any other page, and the running application table is already available on the same page. Thus, there is no need for an application button on the Queue page.
[jira] [Created] (YARN-8629) Container cleanup failed
Yesha Vora created YARN-8629: Summary: Container cleanup failed Key: YARN-8629 URL: https://issues.apache.org/jira/browse/YARN-8629 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora When an application failed to launch container successfully, the cleanup of container also failed with below message. {code} 2018-08-06 03:28:20,351 WARN resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup tasks file. java.io.FileNotFoundException: /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn-tmp-cxx/container_e02_156898541_0010_20_02/tasks (No such file or directory) at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open(FileInputStream.java:195) at java.io.FileInputStream.(FileInputStream.java:138) at java.io.FileInputStream.(FileInputStream.java:93) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.checkAndDeleteCgroup(CGroupsHandlerImpl.java:507) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.deleteCGroup(CGroupsHandlerImpl.java:542) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.postComplete(CGroupsCpuResourceHandlerImpl.java:238) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.postComplete(ResourceHandlerChain.java:111) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.postComplete(LinuxContainerExecutor.java:964) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reapContainer(LinuxContainerExecutor.java:787) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:821) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:161) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:57) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) at java.lang.Thread.run(Thread.java:748) 2018-08-06 03:28:20,372 WARN resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup tasks file.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8599) Build Master module for MaWo app
Yesha Vora created YARN-8599: Summary: Build Master module for MaWo app Key: YARN-8599 URL: https://issues.apache.org/jira/browse/YARN-8599 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora

The Master component of the MaWo application is responsible for driving end-to-end job execution. Its responsibilities are:
* Get the job definition and create a queue of Tasks
* Assign Tasks to Workers
* Manage the Workers' lifecycle
[jira] [Created] (YARN-8598) Build Master Job Module for MaWo Application
Yesha Vora created YARN-8598: Summary: Build Master Job Module for MaWo Application Key: YARN-8598 URL: https://issues.apache.org/jira/browse/YARN-8598 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora

A job in the MaWo application is a collection of Tasks. A Job consists of a setup task, a list of tasks, and a teardown task.
* JobBuilder
** SimpleTaskJobBuilder: should be able to parse a simple job description file; in this file format, each line is treated as one Task
** SimpleTaskJsonJobBuilder: utility to parse a JSON job description file
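A hypothetical sketch of the line-per-task parsing described for SimpleTaskJobBuilder (the real MaWo class and Task API may differ):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class SimpleTaskJobBuilderSketch {
  /** Each non-empty line of the job description file becomes one task command. */
  List<String> buildTaskCommands(String jobDescriptionFile) throws IOException {
    List<String> commands = new ArrayList<>();
    for (String line : Files.readAllLines(Paths.get(jobDescriptionFile))) {
      String command = line.trim();
      if (!command.isEmpty()) {
        commands.add(command);
      }
    }
    return commands;
  }
}
{code}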
[jira] [Created] (YARN-8597) Build Worker utility for MaWo Application
Yesha Vora created YARN-8597: Summary: Build Worker utility for MaWo Application Key: YARN-8597 URL: https://issues.apache.org/jira/browse/YARN-8597 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora

The Worker is responsible for executing Tasks.
* Worker
** Create a Worker class which drives the worker lifecycle
** Create the WorkAssignment protocol; it should handle registering/deregistering a worker and sending heartbeats
** Lifecycle: register the worker, run the setup task, get a Task from the master and execute it using a TaskRunner, run the teardown task
* TaskRunner
** Simple Task Runner: this runner should be able to execute a simple task
** Composite Task Runner: this runner should be able to execute a composite task
* TaskWallTimeLimiter
** Create a utility which can abort a task if its execution time exceeds the task timeout.
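A hypothetical sketch of the TaskWallTimeLimiter idea, aborting a task process once it exceeds its timeout (the real MaWo implementation may differ):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TaskWallTimeLimiterSketch {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  /** Returns the task exit code, or -1 if the wall-time limit was hit. */
  int runWithTimeout(String command, long timeoutMillis) throws Exception {
    // Run the task command as a child process, inheriting stdout/stderr.
    Process process = new ProcessBuilder("bash", "-c", command).inheritIO().start();
    Future<Integer> exit = executor.submit(() -> process.waitFor());
    try {
      return exit.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Abort the task once the wall time is exceeded.
      process.destroyForcibly();
      exit.cancel(true);
      return -1;
    }
  }
}
{code}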
[jira] [Created] (YARN-8587) Delays are noticed to launch docker container
Yesha Vora created YARN-8587: Summary: Delays are noticed to launch docker container Key: YARN-8587 URL: https://issues.apache.org/jira/browse/YARN-8587 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.1 Reporter: Yesha Vora Launch dshell application. Wait for application to go in RUNNING state. {code:java} yarn jar /xx/hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar {code} Find out container allocation. Run docker inspect command for docker containers launched by app. Sometimes, the container is allocated to NM but docker PID is not up. {code:java} Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null xxx "sudo su - -c \"docker ps -a | grep container_e02_1531189225093_0003_01_02\" root" failed after 0 retries {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8580) yarn.resourcemanager.am.max-attempts is not respected for yarn services
Yesha Vora created YARN-8580: Summary: yarn.resourcemanager.am.max-attempts is not respected for yarn services Key: YARN-8580 URL: https://issues.apache.org/jira/browse/YARN-8580 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.1 Reporter: Yesha Vora 1) Max am attempt is set to 100 on all nodes. ( including gateway) {code} yarn.resourcemanager.am.max-attempts 100 {code} 2) Start a Yarn service ( Hbase tarball ) application 3) Kill AM 20 times Here, App fails with below diagnostics. {code} bash-4.2$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -status application_1532481557746_0001 18/07/25 18:43:34 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200 18/07/25 18:43:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 18/07/25 18:43:34 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.0.0.0-1634/0/resource-types.xml Application Report : Application-Id : application_1532481557746_0001 Application-Name : hbase-tarball-lr Application-Type : yarn-service User : hbase Queue : default Application Priority : 0 Start-Time : 1532481864863 Finish-Time : 1532522943103 Progress : 100% State : FAILED Final-State : FAILED Tracking-URL : https://xxx:8090/cluster/app/application_1532481557746_0001 RPC Port : -1 AM Host : N/A Aggregate Resource Allocation : 252150112 MB-seconds, 164141 vcore-seconds Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds Log Aggregation Status : SUCCEEDED Diagnostics : Application application_1532481557746_0001 failed 20 times (global limit =100; local limit is =20) due to AM Container for appattempt_1532481557746_0001_20 exited with exitCode: 137 Failing this attempt.Diagnostics: [2018-07-25 12:49:00.784]Container killed on request. Exit code is 137 [2018-07-25 12:49:03.045]Container exited with a non-zero exit code 137. [2018-07-25 12:49:03.045]Killed by external signal For more detailed output, check the application tracking page: https://xxx:8090/cluster/app/application_1532481557746_0001 Then click on links to logs of each attempt. . Failing the application. Unmanaged Application : false Application Node Label Expression : AM container Node Label Expression : TimeoutType : LIFETIME ExpiryTime : 2018-07-25T22:26:15.419+ RemainingTime : 0seconds {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8579) New AM attempt could not retrieve previous attempt component data
Yesha Vora created YARN-8579: Summary: New AM attempt could not retrieve previous attempt component data Key: YARN-8579 URL: https://issues.apache.org/jira/browse/YARN-8579 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.1 Reporter: Yesha Vora Steps: 1) Launch httpd-docker 2) Wait for app to be in STABLE state 3) Run validation for app (It takes around 3 mins) 4) Stop all Zks 5) Wait 60 sec 6) Kill AM 7) wait for 30 sec 8) Start all ZKs 9) Wait for application to finish 10) Validate expected containers of the app Expected behavior: New attempt of AM should start and docker containers launched by 1st attempt should be recovered by new attempt. Actual behavior: New AM attempt starts. It can not recover 1st attempt docker containers. It can not read component details from ZK. Thus, it starts new attempt for all containers. {code} 2018-07-19 22:42:47,595 [main] INFO service.ServiceScheduler - Registering appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into registry 2018-07-19 22:42:47,611 [main] INFO service.ServiceScheduler - Received 1 containers from previous attempt. 2018-07-19 22:42:47,642 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components 2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Handling container_e08_1531977563978_0015_01_03 from previous attempt 2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Record not found in registry for container container_e08_1531977563978_0015_01_03 from previous attempt, releasing 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019 2018-07-19 22:42:47,651 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component httpd 2018-07-19 22:42:47,652 [main] INFO component.Component - [INIT COMPONENT httpd]: 2 instances. 2018-07-19 22:42:47,652 [main] INFO component.Component - [COMPONENT httpd] Requesting for 2 container(s){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8551) Build Common module for MaWo application
Yesha Vora created YARN-8551: Summary: Build Common module for MaWo application Key: YARN-8551 URL: https://issues.apache.org/jira/browse/YARN-8551 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora

Build the Common module for the MaWo application. This module should include the definition of a Task. A Task should contain:
* TaskID
* Task Command
* Task Environment
* Task Timeout
* Task Type
** Simple Task: a single task
** Composite Task: a composition of multiple simple tasks
** Die Task: the last task, executed after a job is finished
** Null Task: a null task
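A hypothetical sketch of such a Task definition (field names and types are illustrative, not the actual MaWo classes):

{code:java}
import java.util.Map;

class TaskSketch {
  enum TaskType { SIMPLE, COMPOSITE, DIE, NULL }

  private final String taskId;
  private final String command;
  private final Map<String, String> environment;
  private final long timeoutMillis;
  private final TaskType type;

  TaskSketch(String taskId, String command, Map<String, String> environment,
      long timeoutMillis, TaskType type) {
    this.taskId = taskId;
    this.command = command;
    this.environment = environment;
    this.timeoutMillis = timeoutMillis;
    this.type = type;
  }
}
{code}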
[jira] [Created] (YARN-8522) Application fails with InvalidResourceRequestException
Yesha Vora created YARN-8522: Summary: Application fails with InvalidResourceRequestException Key: YARN-8522 URL: https://issues.apache.org/jira/browse/YARN-8522 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Launch multiple streaming app simultaneously. Here, sometimes one of the application fails with below stack trace. {code} 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying after sleeping for 3ms. 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: Invocation returned exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, only one resource request with * is allowed at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) on [rm2], so propagating back to caller. 
18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hrt_qa/.staging/job_1530515284077_0007 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, only one resource request with * is allowed at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8485) Privileged container app launch is failing intermittently
Yesha Vora created YARN-8485: Summary: Priviledged container app launch is failing intermittently Key: YARN-8485 URL: https://issues.apache.org/jira/browse/YARN-8485 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora Privileged application fails intermittently {code:java} yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 30" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} Here, container launch fails with 'Privileged containers are disabled' even though Docker privilege container is enabled in the cluster {code:java|title=nm log} 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - All checks pass. Launching privileged container for : container_e01_1530220647587_0001_01_02 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container container_e01_1530220647587_0001_01_02 is : 29 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from container-launch with container ID: container_e01_1530220647587_0001_01_02 and exit code: 29 org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e01_1530220647587_0001_01_02 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exit code: 29 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container failed 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell error output: check privileges failed for user: hrt_qa, error code: 0 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled for user: hrt_qa 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, docker error code=11, error message='Privileged containers are disabled' 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 4 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : run as user is hrt_qa 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is hrt_qa 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating script paths... 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating local dirs... 2018-06-28
[jira] [Created] (YARN-8465) Dshell docker container gets marked as lost after NM restart
Yesha Vora created YARN-8465: Summary: Dshell docker container gets marked as lost after NM restart Key: YARN-8465 URL: https://issues.apache.org/jira/browse/YARN-8465 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.1 Reporter: Yesha Vora scenario: 1) launch dshell application {code} yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 500" -num_containers 2 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xx/httpd:0.1 -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar{code} 2) wait for app to be in stable state ( container_e01_1529968198450_0001_01_02 is running on host7 and container_e01_1529968198450_0001_01_03 is running on host5) 3) restart NM (host7) Here, dshell application fails with below error {code}18/06/25 23:35:30 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, service: }, appDiagnostics=, appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, appUser=hbase 18/06/25 23:35:31 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=Application Failure: desired = 2, completed = 2, allocated = 2, failed = 1, diagnostics = [2018-06-25 23:35:28.000]Container exited with a non-zero exit code 154 [2018-06-25 23:35:28.001]Container exited with a non-zero exit code 154 , appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=FINISHED, distributedFinalState=FAILED, appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, appUser=hbase 18/06/25 23:35:31 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop 18/06/25 23:35:31 ERROR distributedshell.Client: Application failed to complete successfully{code} Here, the docker container marked as LOST after completion {code} 2018-06-25 23:35:27,970 WARN runtime.DockerLinuxContainerRuntime (DockerLinuxContainerRuntime.java:signalContainer(1034)) - Signal docker container failed. Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Liveliness check failed for PID: 423695. Container may have already completed. 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.executeLivelinessCheck(DockerLinuxContainerRuntime.java:1208) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1026) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:905) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:284) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:721) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2018-06-25 23:35:27,975 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:signalContainer(762)) - Error in signalling container 423695 with NULL; exit = -1 org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Signal docker container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1036) at
[jira] [Created] (YARN-8429) Improve diagnostic message when artifact is not set properly
Yesha Vora created YARN-8429: Summary: Improve diagnostic message when artifact is not set properly Key: YARN-8429 URL: https://issues.apache.org/jira/browse/YARN-8429 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.1 Reporter: Yesha Vora

Steps:
1) Create a launch json file. Replace "artifact" with "artifacts".
2) Launch the yarn service app with the CLI.

The application launch fails with the error below.
{code}
[xxx xxx]$ yarn app -launch test2-2 test.json
18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition from local FS: /xxx/test.json
18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be absolute path: /xxx/xxx
{code}

The artifact field is not mandatory. However, if that field is specified incorrectly, the launch command should fail with a proper error. Here, the error message about Dest_file is misleading.
[jira] [Created] (YARN-8413) Flow activity page is failing with "Timeline server failed with an error"
Yesha Vora created YARN-8413: Summary: Flow activity page is failing with "Timeline server failed with an error" Key: YARN-8413 URL: https://issues.apache.org/jira/browse/YARN-8413 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.1 Reporter: Yesha Vora

The Flow activity page fails to load with "Timeline server failed with an error". The page uses an incorrect flow REST call, "https://localhost:8188/ws/v2/timeline/flows?_=1528755339836", which fails to load:
1) It uses localhost instead of the ATS v2 hostname.
2) It uses the ATS v1.5 http port instead of the ATS v2 https port.

The correct REST call is "https://<ATS v2 host>:<ATS v2 https port>/ws/v2/timeline/flows?_=1528755339836".
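One way the UI could derive the correct endpoint is from the cluster configuration rather than a hard-coded localhost:8188. A sketch assuming the standard ATS v2 reader address properties in yarn-site.xml (property names may need adjusting for a given deployment):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class FlowsUrlSketch {
  static String flowsUrl(Configuration conf) {
    // Pick http or https based on the cluster-wide policy.
    boolean https = "HTTPS_ONLY".equals(conf.get("yarn.http.policy", "HTTP_ONLY"));
    // Assumed ATS v2 reader webapp address properties.
    String address = https
        ? conf.get("yarn.timeline-service.reader.webapp.https.address")
        : conf.get("yarn.timeline-service.reader.webapp.address");
    return (https ? "https://" : "http://") + address + "/ws/v2/timeline/flows";
  }

  public static void main(String[] args) {
    System.out.println(flowsUrl(new YarnConfiguration()));
  }
}
{code}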
[jira] [Created] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE
Yesha Vora created YARN-8409: Summary: ActiveStandbyElectorBasedElectorService is failing with NPE Key: YARN-8409 URL: https://issues.apache.org/jira/browse/YARN-8409 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.1 Reporter: Yesha Vora In RM-HA env, kill ZK leader and then perform RM failover. Sometimes, active RM gets NPE and fail to come up successfully {code:java} 2018-06-08 10:31:03,007 INFO client.ZooKeeperSaslClient (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL mechanism. 2018-06-08 10:31:03,008 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 'Client' 2018-06-08 10:31:03,009 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) 2018-06-08 10:31:03,344 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService failed in state INITED java.lang.NullPointerException at org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033) at org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030) at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095) at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087) at org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030) at org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347) at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479) 2018-06-08 10:31:03,345 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8407) Container launch exception in AM log should be printed in ERROR level
Yesha Vora created YARN-8407: Summary: Container launch exception in AM log should be printed in ERROR level Key: YARN-8407 URL: https://issues.apache.org/jira/browse/YARN-8407 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora

When a container launch fails because the docker image is not available, the failure is logged at INFO level in the AM log. Container launch failures should be logged at ERROR level.

Steps: launch an httpd yarn-service application with an invalid docker image.
{code:java}
2018-06-07 01:51:32,966 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE httpd-0 : container_e05_1528335963594_0001_01_02]: container_e05_1528335963594_0001_01_02 completed. Reinsert back to pending list and requested a new container. exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from container-launch.
Container id: container_e05_1528335963594_0001_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: Unable to find image 'xxx/httpd:0.1' locally
Trying to pull repository xxx/httpd ...
/usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on yyy: no such host.
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
main : run as user is hbase
main : requested yarn user is hbase
Creating script paths...
Creating local dirs...
Getting exit code file...
Changing effective user to root...
Wrote the exit code 7 to /grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode
[2018-06-07 01:51:02.393]Diagnostic message from attempt :
[2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last 4096 bytes of stderr.txt :
[2018-06-07 01:51:32.428]Could not find nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid in any of the directories
2018-06-07 01:51:32,966 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE httpd-0 : container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT on STOP event
{code}
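A minimal sketch of the requested change: the completed-container handling should log launch failures at ERROR instead of INFO. Class and method names here are assumed for illustration, not the actual ComponentInstance code:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ContainerCompletionLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(ContainerCompletionLoggingSketch.class);

  void onContainerCompleted(String containerId, int exitStatus, String diagnostics) {
    if (exitStatus != 0) {
      // Launch failures (e.g. a missing docker image) should stand out in the AM log.
      LOG.error("{} completed with exit status {}: {}", containerId, exitStatus,
          diagnostics);
    } else {
      LOG.info("{} completed successfully", containerId);
    }
  }
}
{code}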
[jira] [Created] (YARN-8386) App log can not be viewed from Logs tab in secure cluster
Yesha Vora created YARN-8386: Summary: App log can not be viewed from Logs tab in secure cluster Key: YARN-8386 URL: https://issues.apache.org/jira/browse/YARN-8386 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora

App logs can not be viewed from the UI2 Logs tab. Steps:
1) Launch a yarn service.
2) Let the application finish and go to the Logs tab to view the AM log.

Here, the service AM log request fails with a 401 authentication error.
{code}
Request URL: http://xxx:8188/ws/v1/applicationhistory/containers/container_e09_1527737134553_0034_01_01/logs/serviceam.log?_=1527799590942
Request Method: GET
Status Code: 401 Authentication required
Response: HTML error page "Error 401 Authentication required". HTTP ERROR 401. Problem accessing /ws/v1/applicationhistory/containers/container_e09_1527737134553_0034_01_01/logs/serviceam.log. Reason: Authentication required
[jira] [Created] (YARN-8368) yarn app start cli should print applicationId
Yesha Vora created YARN-8368: Summary: yarn app start cli should print applicationId Key: YARN-8368 URL: https://issues.apache.org/jira/browse/YARN-8368 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora yarn app start cli should print the application Id similar to yarn launch cmd. {code:java} bash-4.2$ yarn app -start hbase-app-test WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. 18/05/24 15:15:53 INFO client.RMProxy: Connecting to ResourceManager at xxx/xxx:8050 18/05/24 15:15:54 INFO client.RMProxy: Connecting to ResourceManager at xxx/xxx:8050 18/05/24 15:15:55 INFO client.ApiServiceClient: Service hbase-app-test is successfully started.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8330) An extra container got launched by RM for yarn-service
Yesha Vora created YARN-8330: Summary: An extra container got launched by RM for yarn-service Key: YARN-8330 URL: https://issues.apache.org/jira/browse/YARN-8330 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora Steps: launch Hbase tarball app list containers for hbase tarball app {code} /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list appattempt_1525463491331_0006_01 WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 Total number of containers :5 Container-IdStart Time Finish Time StateHost Node Http Address LOG-URL container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 N/A RUNNINGxxx:25454 http://xxx:8042 http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 Fri May 04 22:34:26 + 2018 N/A RUNNINGxxx:25454 http://xxx:8042 http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 Fri May 04 22:34:15 + 2018 N/A RUNNINGxxx:25454 http://xxx:8042 http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 Fri May 04 22:34:56 + 2018 N/A RUNNINGxxx:25454 http://xxx:8042 http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 Fri May 04 22:34:56 + 2018 N/A nullxxx:25454 http://xxx:8042 http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} Total expected containers = 4 ( 3 components container + 1 am). Instead, RM is listing 5 containers. container_e06_1525463491331_0006_01_04 is in null state. Yarn service utilized container 02, 03, 05 for component. There is no log available in NM & AM related to container 04. Only one line in RM log is printed {code} 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(489)) - container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8318) Ipaddress in component page shows N/A
Yesha Vora created YARN-8318: Summary: Ipaddress in component page shows N/A Key: YARN-8318 URL: https://issues.apache.org/jira/browse/YARN-8318 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora

The Component page shows the IP address value as N/A. It should print the IP address of the docker container.
[jira] [Created] (YARN-8316) Diagnostic message should improve when yarn service fails to launch due to ATS unavailability
Yesha Vora created YARN-8316: Summary: Diagnostic message should improve when yarn service fails to launch due to ATS unavailability Key: YARN-8316 URL: https://issues.apache.org/jira/browse/YARN-8316 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Scenario: 1) shutdown ATS 2) launch yarn service. yarn service launch cmd fails with below stack trace. There is no diagnostic message available in response. {code:java} bash-4.2$ yarn app -launch hbase-sec /tmp/hbase-secure.yar WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. 18/05/17 13:24:43 INFO client.RMProxy: Connecting to ResourceManager at xxx/xxx:8050 18/05/17 13:24:44 INFO client.AHSProxy: Connecting to Application History server at localhost/xxx:10200 18/05/17 13:24:44 INFO client.RMProxy: Connecting to ResourceManager at xxx/xxx:8050 18/05/17 13:24:44 INFO client.AHSProxy: Connecting to Application History server at localhost/127.0.0.1:10200 18/05/17 13:24:44 INFO client.ApiServiceClient: Loading service definition from local FS: /tmp/hbase-secure.yar 18/05/17 13:26:06 ERROR client.ApiServiceClient: bash-4.2$ echo $? 56{code} The Error message should provide ConnectionRefused exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8309) Diagnostic message for yarn service app failure due token renewal should be improved
Yesha Vora created YARN-8309: Summary: Diagnostic message for yarn service app failure due token renewal should be improved Key: YARN-8309 URL: https://issues.apache.org/jira/browse/YARN-8309 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora When Yarn service application failed due to token renewal issue , The diagonstic message was unclear . {code:java} Application application_1526413043392_0002 failed 20 times due to AM Container for appattempt_1526413043392_0002_20 exited with exitCode: 1 Failing this attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from container-launch. Container id: container_e04_1526413043392_0002_20_01 Exit code: 1 Exception message: Launch container failed Shell output: main : command provided 1 main : run as user is hbase main : requested yarn user is hbase Getting exit code file... Creating script paths... Writing pid file... Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp Writing to cgroup task files... Creating local dirs... Launching container... Getting exit code file... Creating script paths... [2018-05-15 23:15:28.806]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 23:15:28.807]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : For more detailed output, check the application tracking page: https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on links to logs of each attempt. . Failing the application.{code} Here, diagnostic message should be improved to specify that AM is failing due to token renewal issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8308) Yarn service app fails due to issues with Renew Token
Yesha Vora created YARN-8308: Summary: Yarn service app fails due to issues with Renew Token Key: YARN-8308 URL: https://issues.apache.org/jira/browse/YARN-8308 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Run Yarn service application beyond dfs.namenode.delegation.token.max-lifetime. Here, yarn service application fails with below error. {code} 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service Service Master failed in state INITED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) at org.apache.hadoop.ipc.Client.call(Client.java:1437) at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) at org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) at org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) at org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app master 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting service master org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
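For context, the failure window in YARN-8308 is governed by the HDFS delegation token settings named above. A minimal sketch reading the relevant keys; the key names and defaults are the standard HDFS ones, while the surrounding code is illustrative only:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class TokenLifetimeCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // A delegation token can be renewed repeatedly within renew-interval windows,
    // but never past max-lifetime; a service that runs longer must obtain a new
    // token rather than keep renewing the old one.
    long renewIntervalMs = conf.getLong(
        "dfs.namenode.delegation.token.renew-interval", 24L * 60 * 60 * 1000);
    long maxLifetimeMs = conf.getLong(
        "dfs.namenode.delegation.token.max-lifetime", 7L * 24 * 60 * 60 * 1000);
    System.out.println("renew-interval=" + renewIntervalMs
        + " ms, max-lifetime=" + maxLifetimeMs + " ms");
  }
}
{code}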
[jira] [Created] (YARN-8302) ATS v2 should handle HBase connection issue properly
Yesha Vora created YARN-8302: Summary: ATS v2 should handle HBase connection issue properly Key: YARN-8302 URL: https://issues.apache.org/jira/browse/YARN-8302 Project: Hadoop YARN Issue Type: Bug Components: ATSv2 Affects Versions: 3.1.0 Reporter: Yesha Vora The ATS v2 call times out with the error below when it cannot connect to the HBase instance. {code} bash-4.2$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 'Accept: application/json' --max-time 5 --negotiate -u : 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092' curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received {code} {code:title=ATS log} 2018-05-15 23:10:03,623 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1 2018-05-15 23:10:13,651 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1 2018-05-15 23:10:23,730 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1 2018-05-15 23:10:33,788 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1{code} There are two issues here. 1) Check why ATS cannot connect to HBase. 2) In case of a connection error, the ATS call should not time out; it should fail with a proper error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
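One way to get the fail-fast behavior asked for in item 2 is to bound the HBase client retries and timeouts used by the timeline reader. The property names below are standard HBase client settings; whether the reader should tune them or catch the connection exception itself is exactly the open question of this JIRA, so treat this as an illustrative sketch:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TimelineReaderHBaseConf {
  static Configuration boundedRetryConf() {
    Configuration conf = HBaseConfiguration.create();
    // Fewer retries and hard timeouts make a refused connection surface as an
    // error quickly instead of stalling the REST call until the client times out.
    conf.setInt("hbase.client.retries.number", 3);
    conf.setLong("hbase.rpc.timeout", 5000L);
    conf.setLong("hbase.client.operation.timeout", 10000L);
    return conf;
  }
}
{code}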
[jira] [Created] (YARN-8297) Incorrect ATS Url used for Wire encrypted cluster
Yesha Vora created YARN-8297: Summary: Incorrect ATS Url used for Wire encrypted cluster Key: YARN-8297 URL: https://issues.apache.org/jira/browse/YARN-8297 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora The "Service" page uses an incorrect web URL for ATS in a wire-encrypted environment. For ATS URLs, it uses the https protocol with the http port. This issue causes all ATS calls to fail, and the UI does not display component details. URL used: https://xxx:8198/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320 Expected URL: https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart
Yesha Vora created YARN-8290: Summary: Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart Key: YARN-8290 URL: https://issues.apache.org/jira/browse/YARN-8290 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Scenario: 1) Start 5 streaming applications in the background. 2) Kill the active RM and cause RM failover. After RM failover, the application failed with the error below. {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: Invocation returned exception on [rm2] : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1517520038847_0003' doesn't exist in RM. Please check that the job submission was successful. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) , so propagating back to caller. 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application application_1517520038847_0003 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hrt_qa/.staging/job_1517520038847_0003 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is not set in the application report Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8283) [Umbrella] MaWo - A Master Worker framework on top of YARN Services
Yesha Vora created YARN-8283: Summary: [Umbrella] MaWo - A Master Worker framework on top of YARN Services Key: YARN-8283 URL: https://issues.apache.org/jira/browse/YARN-8283 Project: Hadoop YARN Issue Type: New Feature Reporter: Yesha Vora There is a need for an application / framework to handle Master-Worker scenarios. There are existing frameworks on YARN which can be used to run a job in a distributed manner, such as MapReduce, Tez, Spark, etc. But master-worker use-cases are usually force-fed into one of these existing frameworks, which were designed primarily around data-parallelism rather than generic Master-Worker computations. In this JIRA, we’d like to contribute MaWo - a YARN Service based framework that achieves this goal. The overall goal is to create an app that can take an input job specification with tasks and their durations, and have a Master dish the tasks off to a predetermined set of workers. The components will be responsible for making sure that the tasks and the overall job finish in specific time durations. We have been using a version of the MaWo framework for running unit tests of Hadoop in a parallel manner on an existing Hadoop YARN cluster. What typically takes 10 hours to run all of the Hadoop project’s unit tests can finish in under 20 minutes on a MaWo app of about 50 containers! YARN-3307 was an original attempt at this but through a first-class YARN app. In this JIRA, we instead use YARN Service for orchestration so that our code can focus on the core Master Worker paradigm. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8271) Change UI2 labeling of certain tables to avoid confusion
Yesha Vora created YARN-8271: Summary: Change UI2 labeling of certain tables to avoid confusion Key: YARN-8271 URL: https://issues.apache.org/jira/browse/YARN-8271 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora Update labeling for a few items to avoid confusion:
- Cluster Page (/cluster-overview):
-- "Finished apps" --> "Finished apps from all users"
-- "Running apps" --> "Running apps from all users"
- Queues overview page (/yarn-queues/root) && Per queue page (/yarn-queue/root/apps):
-- "Running Apps" --> "Running apps from all users in queue "
- Nodes Page - side bar for all pages:
-- "List of Applications" --> "List of Applications on this node"
-- "List of Containers" --> "List of Containers on this node"
- Yarn Tools:
-- Yarn Tools --> YARN Tools
- Queue page:
-- Running Apps: --> Running Apps From All Users
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8266) Clicking on application from cluster view should redirect to application attempt page
Yesha Vora created YARN-8266: Summary: Clicking on application from cluster view should redirect to application attempt page Key: YARN-8266 URL: https://issues.apache.org/jira/browse/YARN-8266 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora Steps: 1) Start one application 2) Go to cluster overview page 3) Click on applicationId from Cluster Resource Usage By Application This action redirects to the [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] URL. This is an invalid URL; it does not show any details. Instead, it should redirect to the attempt page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8253) HTTPS Ats v2 api call fails with "bad HTTP parsed"
Yesha Vora created YARN-8253: Summary: HTTPS Ats v2 api call fails with "bad HTTP parsed" Key: YARN-8253 URL: https://issues.apache.org/jira/browse/YARN-8253 Project: Hadoop YARN Issue Type: Bug Components: ATSv2 Affects Versions: 3.1.0 Reporter: Yesha Vora When Yarn http policy is set to Https_only, ATS v2 should use HTTPS address. Here, ATS v2 call is failing with below error. {code:java} [hrt_qa@xxx root]$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 'Accept: application/json' --negotiate -u: 'https://xxx:8199/ws/v2/timeline/apps/application_1525238789838_0003/entities/COMPONENT_INSTANCE?fields=ALL' [hrt_qa@xxx root]$ echo $? 35{code} {code:java|title=Ats v2} 2018-05-02 05:45:40,427 WARN http.HttpParser (HttpParser.java:(1832)) - Illegal character 0x16 in state=START for buffer HeapByteBuffer@dba438[p=1,l=222,c=8192,r=221]={\x16<<<\x03\x01\x00\xD9\x01\x00\x00\xD5\x03\x03;X\xEd\xD1orq...\x01\x05\x01\x06\x01\x02\x01\x04\x02\x05\x02\x06\x02\x02\x02>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} 2018-05-02 05:45:40,428 WARN http.HttpParser (HttpParser.java:parseNext(1435)) - bad HTTP parsed: 400 Illegal character 0x16 for HttpChannelOverHttp@2efbda6c{r=0,c=false,a=IDLE,uri=null}{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8251) Clicking on app link at the header goes to Diagnostics Tab instead of AppAttempt Tab
Yesha Vora created YARN-8251: Summary: Clicking on app link at the header goes to Diagnostics Tab instead of AppAttempt Tab Key: YARN-8251 URL: https://issues.apache.org/jira/browse/YARN-8251 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora 1. Click on an application link under the Application tab. 2. It goes to the specific application page with the appAttempt Tab. 3. Click on the "Application \[app ID\]" link at the top. 4. It goes to the specific application page with the Diagnostics Tab instead of the appAttempt Tab. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8231) Dshell application fails when one of the docker containers gets killed
Yesha Vora created YARN-8231: Summary: Dshell application fails when one of the docker containers gets killed Key: YARN-8231 URL: https://issues.apache.org/jira/browse/YARN-8231 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora 1) Launch dshell application {code} yarn jar hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 300" -num_containers 2 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -keep_containers_across_application_attempts -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} 2) Kill container_1524681858728_0012_01_02 Expected behavior: The application should start a new container instance and finish successfully. Actual behavior: The application failed as soon as the container was killed. {code:title=AM log} 18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1 18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: appattempt_1524681858728_0012_01 got container status for containerID=container_1524681858728_0012_01_02, state=COMPLETE, exitStatus=137, diagnostics=[2018-04-27 23:05:09.310]Container killed on request. Exit code is 137 [2018-04-27 23:05:09.331]Container exited with a non-zero exit code 137. [2018-04-27 23:05:09.332]Killed by external signal 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: appattempt_1524681858728_0012_01 got container status for containerID=container_1524681858728_0012_01_03, state=COMPLETE, exitStatus=0, diagnostics= 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1524681858728_0012_01_03 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application completed. Stopping running containers 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Diagnostics., total=2, completed=2, allocated=2, failed=1 18/04/27 23:08:46 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
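The expected behavior in step 2 is that the AM asks the RM for a replacement container when a task container exits abnormally. A minimal sketch of that re-request logic using the public AMRMClientAsync callback; this is illustrative only and not the actual distributed shell ApplicationMaster code:
{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class ReplacementRequester {
  private final AMRMClientAsync<ContainerRequest> amRmClient;

  ReplacementRequester(AMRMClientAsync<ContainerRequest> amRmClient) {
    this.amRmClient = amRmClient;
  }

  // Intended to be called from the AMRMClientAsync callback handler's onContainersCompleted().
  void onContainersCompleted(List<ContainerStatus> statuses) {
    for (ContainerStatus status : statuses) {
      if (status.getExitStatus() != ContainerExitStatus.SUCCESS) {
        // Container was killed (e.g. exit code 137): ask for a replacement
        // instead of treating the whole application as failed.
        amRmClient.addContainerRequest(new ContainerRequest(
            Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));
      }
    }
  }
}
{code}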
[jira] [Created] (YARN-8215) Ats v2 returns invalid YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS
Yesha Vora created YARN-8215: Summary: Ats v2 returns invalid YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS Key: YARN-8215 URL: https://issues.apache.org/jira/browse/YARN-8215 Project: Hadoop YARN Issue Type: Bug Components: ATSv2 Affects Versions: 3.1.0 Reporter: Yesha Vora Steps: 1) Run Httpd yarn service 2) Stop Httpd yarn service 3) Validate application attempt page. ATS v2 call is returning invalid data for YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS {code:java} http://xxx:8198/ws/v2/timeline/apps/application_1524698886838_0005/entities/YARN_CONTAINER?fields=ALL&_=1524705653569{code} {code} [{"metrics":[{"type":"SINGLE_VALUE","id":"CPU","aggregationOp":"NOP","values":{"1524704571187":0}},{"type":"SINGLE_VALUE","id":"MEMORY","aggregationOp":"NOP","values":{"1524704562126":30973952}}],"events":[{"id":"YARN_CONTAINER_FINISHED","timestamp":1524704571552,"info":{}},{"id":"YARN_NM_CONTAINER_LOCALIZATION_FINISHED","timestamp":1524704488410,"info":{}},{"id":"YARN_CONTAINER_CREATED","timestamp":1524704482976,"info":{}},{"id":"YARN_NM_CONTAINER_LOCALIZATION_STARTED","timestamp":1524704482976,"info":{}}],"createdtime":1524704482973,"idprefix":9223370512150292834,"id":"container_e12_1524698886838_0005_01_03","info":{"YARN_CONTAINER_STATE":"COMPLETE","YARN_CONTAINER_ALLOCATED_HOST":"xxx","YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS":"xxx:0","YARN_CONTAINER_ALLOCATED_VCORE":1,"FROM_ID":"yarn-cluster!hbase!httpd-docker-config-3!1524704463727!application_1524698886838_0005!YARN_CONTAINER!9223370512150292834!container_e12_1524698886838_0005_01_03","YARN_CONTAINER_ALLOCATED_PORT":25454,"UID":"yarn-cluster!application_1524698886838_0005!YARN_CONTAINER!9223370512150292834!container_e12_1524698886838_0005_01_03","YARN_CONTAINER_ALLOCATED_MEMORY":1024,"SYSTEM_INFO_PARENT_ENTITY":{"type":"YARN_APPLICATION_ATTEMPT","id":"appattempt_1524698886838_0005_01"},"YARN_CONTAINER_EXIT_STATUS":-105,"YARN_CONTAINER_ALLOCATED_PRIORITY":"0","YARN_CONTAINER_DIAGNOSTICS_INFO":"[2018-04-26 01:02:34.486]Container killed by the ApplicationMaster.\n[2018-04-26 01:02:45.616]Container killed on request. Exit code is 137\n[2018-04-26 01:02:49.387]Container exited with a non-zero exit code 137. 
\n","YARN_CONTAINER_FINISHED_TIME":1524704571552},"relatesto":{},"configs":{},"isrelatedto":{},"type":"YARN_CONTAINER"},{"metrics":[{"type":"SINGLE_VALUE","id":"CPU","aggregationOp":"NOP","values":{"1524704564690":6}},{"type":"SINGLE_VALUE","id":"MEMORY","aggregationOp":"NOP","values":{"1524704564690":3710976}}],"events":[{"id":"YARN_CONTAINER_FINISHED","timestamp":1524704567244,"info":{}},{"id":"YARN_NM_CONTAINER_LOCALIZATION_FINISHED","timestamp":1524704487938,"info":{}},{"id":"YARN_CONTAINER_CREATED","timestamp":1524704483140,"info":{}},{"id":"YARN_NM_CONTAINER_LOCALIZATION_STARTED","timestamp":1524704483140,"info":{}}],"createdtime":1524704482919,"idprefix":9223370512150292888,"id":"container_e12_1524698886838_0005_01_04","info":{"YARN_CONTAINER_STATE":"COMPLETE","YARN_CONTAINER_ALLOCATED_HOST":"xxx","YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS":"xxx:0","YARN_CONTAINER_ALLOCATED_VCORE":1,"FROM_ID":"yarn-cluster!hbase!httpd-docker-config-3!1524704463727!application_1524698886838_0005!YARN_CONTAINER!9223370512150292888!container_e12_1524698886838_0005_01_04","YARN_CONTAINER_ALLOCATED_PORT":25454,"UID":"yarn-cluster!application_1524698886838_0005!YARN_CONTAINER!9223370512150292888!container_e12_1524698886838_0005_01_04","YARN_CONTAINER_ALLOCATED_MEMORY":1024,"SYSTEM_INFO_PARENT_ENTITY":{"type":"YARN_APPLICATION_ATTEMPT","id":"appattempt_1524698886838_0005_01"},"YARN_CONTAINER_EXIT_STATUS":-105,"YARN_CONTAINER_ALLOCATED_PRIORITY":"1","YARN_CONTAINER_DIAGNOSTICS_INFO":"[2018-04-26 01:02:34.500]Container killed by the ApplicationMaster.\n[2018-04-26 01:02:45.771]Container killed on request. Exit code is 137\n[2018-04-26 01:02:47.242]Container exited with a non-zero exit code 137.
[jira] [Created] (YARN-8211) Yarn registry dns log finds BufferUnderflowException
Yesha Vora created YARN-8211: Summary: Yarn registry dns log finds BufferUnderflowException Key: YARN-8211 URL: https://issues.apache.org/jira/browse/YARN-8211 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Yarn registry dns server is constantly getting BufferUnderflowException. {code:java} 2018-04-25 01:36:56,139 WARN concurrent.ExecutorHelper (ExecutorHelper.java:logThrowableFromAfterExecute(50)) - Execution exception when running task in RegistryDNS 76 2018-04-25 01:36:56,139 WARN concurrent.ExecutorHelper (ExecutorHelper.java:logThrowableFromAfterExecute(63)) - Caught exception in thread RegistryDNS 76: java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:500) at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:135) at org.apache.hadoop.registry.server.dns.RegistryDNS.getMessgeLength(RegistryDNS.java:820) at org.apache.hadoop.registry.server.dns.RegistryDNS.nioTCPClient(RegistryDNS.java:767) at org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:846) at org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:843) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
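The exception above comes from reading the two-byte DNS-over-TCP length prefix out of a buffer that does not yet hold both bytes. A minimal sketch of the guard that avoids the underflow (hypothetical helper, not the actual RegistryDNS code):
{code:java}
import java.nio.ByteBuffer;

public class DnsTcpLength {
  // DNS over TCP prefixes each message with a 2-byte big-endian length field.
  // Returning -1 lets the caller wait for more bytes instead of hitting
  // BufferUnderflowException when fewer than 2 bytes have arrived.
  static int readMessageLength(ByteBuffer buf) {
    if (buf.remaining() < 2) {
      return -1;
    }
    return ((buf.get() & 0xFF) << 8) | (buf.get() & 0xFF);
  }
}
{code}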
[jira] [Resolved] (YARN-7878) Docker container IP detail missing when service is in STABLE state
[ https://issues.apache.org/jira/browse/YARN-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora resolved YARN-7878. -- Resolution: Duplicate > Docker container IP detail missing when service is in STABLE state > -- > > Key: YARN-7878 > URL: https://issues.apache.org/jira/browse/YARN-7878 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Priority: Critical
[jira] [Created] (YARN-8167) Improve Diagnostic message when a user without privileged permission deploys a privileged app
Yesha Vora created YARN-8167: Summary: Improve Diagnostic message when a user without privileged permission deploys a privileged app Key: YARN-8167 URL: https://issues.apache.org/jira/browse/YARN-8167 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Steps: 1) Validate hrt_qa user is not mentioned in yarn.nodemanager.runtime.linux.docker.privileged-containers.acl 2) launch a dshell app with YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true {code} /usr/hdp/current/hadoop-yarn-client/bin/yarn jar /usr/hdp/3.0.0.0/hadoop-yarn/hadoop-yarn-applications-distributedshell-3.0.0.jar -shell_command "sleep 30" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar /usr/hdp/3.0.0.0/hadoop-yarn/hadoop-yarn-applications-distributedshell-3.0.0.jar{code} The application fails to launch the container. However, the diagnostic message of the app is not helpful; it returns "Diagnostics., total=1, completed=1, allocated=1, failed=1". The AM log also does not have a proper error message. {code:title=AppMaster.stderr} 18/04/16 20:45:56 INFO distributedshell.ApplicationMaster: appattempt_1523387473707_0049_01 got container status for containerID=container_e24_1523387473707_0049_01_02, state=COMPLETE, exitStatus=-1, diagnostics=[2018-04-16 20:45:49.062]Exception from container-launch. Container id: container_e24_1523387473707_0049_01_02 Exit code: -1 Exception message: Shell output: [2018-04-16 20:45:49.085]Container exited with a non-zero exit code -1. [2018-04-16 20:45:49.085]Container exited with a non-zero exit code -1. 18/04/16 20:45:56 INFO distributedshell.ApplicationMaster: Application completed. Stopping running containers{code} The diagnostic message should be improved to explicitly mention that "hrt_qa user does not have permission to launch privileged containers". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
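For illustration, the kind of check and message the report asks for: before launching, compare the requesting user against the privileged-containers ACL and say so explicitly when the user is missing. The property name is the real YARN setting from step 1; the surrounding code is a hypothetical sketch, not the actual container runtime:
{code:java}
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;

public class PrivilegedAclCheck {
  static void checkPrivilegedAllowed(Configuration conf, String user) throws IOException {
    String acl = conf.getTrimmed(
        "yarn.nodemanager.runtime.linux.docker.privileged-containers.acl", "");
    boolean allowed = Arrays.asList(acl.split(",")).contains(user);
    if (!allowed) {
      // Explicit diagnostic instead of a bare "exitStatus=-1".
      throw new IOException("User " + user
          + " is not allowed to launch privileged Docker containers; add the user to "
          + "yarn.nodemanager.runtime.linux.docker.privileged-containers.acl");
    }
  }
}
{code}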
[jira] [Created] (YARN-8166) Service AppId page throws HTTP Error 401
Yesha Vora created YARN-8166: Summary: Service AppId page throws HTTP Error 401 Key: YARN-8166 URL: https://issues.apache.org/jira/browse/YARN-8166 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Steps: 1) Launch a yarn service in an unsecured cluster. 2) Go to the component info page for sleeper-0. 3) Click on the sleeper link http://xxx:8088/ui2/#/yarn-component-instances/sleeper/components?service=yesha-sleeper&=application_1518804855867_0002 The above URL fails with HTTP Error 401: {code} 401, Authorization required. Please check your security settings. {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8142) yarn service application stops when AM is killed
Yesha Vora created YARN-8142: Summary: yarn service application stops when AM is killed Key: YARN-8142 URL: https://issues.apache.org/jira/browse/YARN-8142 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora Steps: 1) Launch a sleeper job (non-docker yarn service) {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch fault-test-am-sleeper /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History server at xxx:10200 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History server at xxx:10200 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: application_1522887500374_0010 Exit Code: 0{code} 2) Wait for the sleeper component to be up. 3) Kill the AM process PID. Expected behavior: A new AM attempt will be started and the pre-existing container will keep running. Actual behavior: The application finishes with State : FINISHED and Final-State : ENDED, and a new attempt is never launched. Note: when the AM gets a SIGTERM, it gracefully shuts itself down and shuts the entire app down instead of letting it continue to run for another attempt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8140) Improve log message when launch cmd is run for a stopped yarn service
Yesha Vora created YARN-8140: Summary: Improve log message when launch cmd is ran for stopped yarn service Key: YARN-8140 URL: https://issues.apache.org/jira/browse/YARN-8140 Project: Hadoop YARN Issue Type: Improvement Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Steps: 1) Launch sleeper app {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch sleeper2-duplicate-app-stopped /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: application_1523387473707_0007 Exit Code: 0\{code} 2) Stop the application {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop sleeper2-duplicate-app-stopped WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service sleeper2-duplicate-app-stopped Exit Code: 0\{code} 3) Launch the application with same name {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch sleeper2-duplicate-app-stopped /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already exists: hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json Exit Code: 56 {code} Here, the launch command fails with "Service Instance dir already exists: hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json". The log message should be more meaningful; it should instead report that "sleeper2-duplicate-app-stopped is in a stopped state". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
Yesha Vora created YARN-8116: Summary: Nodemanager fails with NumberFormatException: For input string: "" Key: YARN-8116 URL: https://issues.apache.org/jira/browse/YARN-8116 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Steps followed. 1) Update nodemanager debug delay config {code} yarn.nodemanager.delete.debug-delay-sec 350 {code} 2) Launch distributed shell application multiple times {code} /usr/hdp/current/hadoop-yarn-client/bin/yarn jar hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 120" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar hadoop-yarn-applications-distributedshell-*.jar{code} 3) restart NM Nodemanager fails to start with below error. {code} {code:title=NM log} 2018-03-23 21:32:14,437 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: true 2018-03-23 21:32:14,439 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set as 3600. The logs will be aggregated every 3600 seconds 2018-03-23 21:32:14,455 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED java.lang.NumberFormatException: For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:601) at java.lang.Long.parseLong(Long.java:631) at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960) 2018-03-23 21:32:14,458 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceStop(148)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit 2018-03-23 21:32:14,460 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED java.lang.NumberFormatException: For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:601) at java.lang.Long.parseLong(Long.java:631) at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960) 2018-03-23 21:32:14,463 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping NodeManager metrics system... 2018-03-23 21:32:14,464 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread
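The failure above is Long.parseLong("") on a value recovered from the NM leveldb state store. A defensive-parsing sketch of the fix idea (hypothetical helper; the real change would belong in NMLeveldbStateStoreService.loadContainerState):
{code:java}
public class RecoveryParsing {
  // Treat an empty or missing stored value as "not recorded" instead of
  // letting NumberFormatException abort NodeManager recovery.
  static long parseStoredLong(String value, long defaultValue) {
    if (value == null || value.isEmpty()) {
      return defaultValue;
    }
    return Long.parseLong(value);
  }
}
{code}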
[jira] [Created] (YARN-7961) Improve status response when yarn application is destroyed
Yesha Vora created YARN-7961: Summary: Improve status response when yarn application is destroyed Key: YARN-7961 URL: https://issues.apache.org/jira/browse/YARN-7961 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Yarn should provide some way to figure out whether a yarn service has been destroyed. If a yarn service application is stopped, "yarn app -status " shows that the service is Stopped. After destroying a yarn service, "yarn app -status " returns 404. {code} [hdpuser@cn005 sleeper]$ yarn app -status yesha-sleeper WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/02/16 11:02:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/02/16 11:02:31 INFO client.RMProxy: Connecting to ResourceManager at xxx/xx.xx.xx.xx:8050 18/02/16 11:02:31 INFO client.AHSProxy: Connecting to Application History server at xxx/xx.xx.xx.x:10200 18/02/16 11:02:31 INFO client.RMProxy: Connecting to ResourceManager at xxx/xx.xx.xx.x:8050 18/02/16 11:02:31 INFO client.AHSProxy: Connecting to Application History server at xxx/xx.xx.xx.x:10200 18/02/16 11:02:31 INFO util.log: Logging initialized @2075ms yesha-sleeper Failed : HTTP error code : 404 {code} Yarn should be able to tell the user whether a certain app was destroyed or never created; an HTTP 404 error does not explicitly provide that information. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7954) Component status stays "Ready" when yarn service is stopped
Yesha Vora created YARN-7954: Summary: Component status stays "Ready" when yarn service is stopped Key: YARN-7954 URL: https://issues.apache.org/jira/browse/YARN-7954 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Steps: 1) Launch yarn service application 2) Stop application 3) Run get status from yarn cli {code} [hdpuser@cn005 sleeper]$ yarn app -status yesha-sleeper WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/02/16 10:54:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/02/16 10:54:37 INFO client.RMProxy: Connecting to ResourceManager at xxx/xx.xx.xx.xx:8050 18/02/16 10:54:37 INFO client.AHSProxy: Connecting to Application History server at xxx/xx.xx.xx.xx:10200 18/02/16 10:54:37 INFO client.RMProxy: Connecting to ResourceManager at xxx/xx.xx.xx.xx:8050 18/02/16 10:54:37 INFO client.AHSProxy: Connecting to Application History server at xxx/xx.xx.xx.xx:10200 18/02/16 10:54:38 INFO util.log: Logging initialized @1957ms {"name":"yesha-sleeper","lifetime":-1,"components":[],"configuration":{"properties":{},"env":{},"files":[]},"state":"STOPPED","quicklinks":{},"kerberos_principal":{}} {code} 4) Validate UI2 for service status Here, Yarn service status is marked as "finished". However, components status still shows Ready. On stopping yarn service, component status should be updated to "Stop" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7957) Yarn service delete option disappears after stopping application
Yesha Vora created YARN-7957: Summary: Yarn service delete option disappears after stopping application Key: YARN-7957 URL: https://issues.apache.org/jira/browse/YARN-7957 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora Steps: 1) Launch a yarn service. 2) Go to the service page and click on the Setting button -> "Stop Service". The application will be stopped. 3) Refresh the page. Here, the Setting button disappears, so the user cannot delete the service from the UI after stopping the application. Expected behavior: The Setting button should still be present on the UI page after the application is stopped. If the application is stopped, the Setting button should only have the "Delete Service" action available. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7956) HOME/Services/ and HOME/Services//Components refer to same page
Yesha Vora created YARN-7956: Summary: HOME/Services/ and HOME/Services//Components refer to same page Key: YARN-7956 URL: https://issues.apache.org/jira/browse/YARN-7956 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora Scenario: 1) Start a Yarn service. 2) Click on a running yarn service (example: yesha-sleeper) http://:8088/ui2/#/yarn-app/application_1518804855867_0002/components?service=yesha-sleeper 3) Now click on the yesha-sleeper [application_1518804855867_0002] link. Both the components link and the yesha-sleeper [application_1518804855867_0002] link point to one page. HOME/Services/ and HOME/Services//Components refer to the same page. We should not need two links that refer to one page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7949) ArtifactsId should not be a compulsory field for new service
Yesha Vora created YARN-7949: Summary: ArtifactsId should not be a compulsory field for new service Key: YARN-7949 URL: https://issues.apache.org/jira/browse/YARN-7949 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora 1) Click on New Service. 2) Create a component. The Create Component page has Artifacts Id as a compulsory entry. Some yarn service examples, such as sleeper.json, do not need to provide an artifacts id. {code:java|title=sleeper.json} { "name": "sleeper-service", "components" : [ { "name": "sleeper", "number_of_containers": 2, "launch_command": "sleep 90", "resource": { "cpus": 1, "memory": "256" } } ] }{code} Thus, artifactsId should not be a compulsory field. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7944) Remove master node link from headers of application pages
Yesha Vora created YARN-7944: Summary: Remove master node link from headers of application pages Key: YARN-7944 URL: https://issues.apache.org/jira/browse/YARN-7944 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.1.0 Reporter: Yesha Vora RM UI2 has links for the master container log and the master node. These links are published on the application and service pages. These links are not required on all pages because the AM container node link and container log link are already present in the Application view. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7928) [UI2] Components details not present for Yarn service with Yarn authentication
Yesha Vora created YARN-7928: Summary: [UI2] Components details not present for Yarn service with Yarn authentication Key: YARN-7928 URL: https://issues.apache.org/jira/browse/YARN-7928 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Scenario: Launch an Hbase app in a secure hadoop cluster where yarn UI authentication is enabled, then validate the Components page. Here, component details are missing from the UI. {code:java} Failed to load http://xxx:8198/ws/v2/timeline/apps/application_1518564922635_0001/entities/SERVICE_ATTEMPT?fields=ALL&_=1518567830088: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://xxx:8088' is therefore not allowed access.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7907) Yarn app CLI client does not send Kerberos header to Resource Manager rest API
Yesha Vora created YARN-7907: Summary: Yarn app CLI client does not send Kerberos header to Resource Manager rest API Key: YARN-7907 URL: https://issues.apache.org/jira/browse/YARN-7907 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.0.0 Reporter: Yesha Vora Launching a yarn service app fails in secure mode with the error output below. {code:java} [hrt_qa@xxx root]$ kinit -kt /home/hrt_qa/hadoopqa/keytabs/hrt_qa.headless.keytab hrt_qa [hrt_qa@xxx root]$ yarn app -launch test2 sleeper WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/02/07 22:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/02/07 22:50:41 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200 18/02/07 22:50:41 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200 18/02/07 22:50:41 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/3.0.0.0-800/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json 18/02/07 22:50:42 ERROR client.ApiServiceClient: Authentication required{code} The CLI client does not send the Kerberos header to the Resource Manager REST API; a tcpdump indicates that no token is being sent. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
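For comparison, Hadoop's own auth client negotiates the Kerberos/SPNEGO header that the report says is missing. A minimal sketch of such a call against the RM REST endpoint (host and path are placeholders; this is illustrative, not the ApiServiceClient fix itself):
{code:java}
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.hadoop.security.authentication.client.AuthenticatedURL;

public class SpnegoCallExample {
  public static void main(String[] args) throws Exception {
    // AuthenticatedURL performs the SPNEGO handshake using the current
    // Kerberos ticket (e.g. obtained via kinit) and attaches the negotiated header.
    AuthenticatedURL.Token token = new AuthenticatedURL.Token();
    HttpURLConnection conn = new AuthenticatedURL()
        .openConnection(new URL("http://xxx:8088/app/v1/services"), token);
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}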
[jira] [Created] (YARN-7897) Invalid NM log & NM UI link published on Yarn UI when container fails
Yesha Vora created YARN-7897: Summary: Invalid NM log & NM UI link published on Yarn UI when container fails Key: YARN-7897 URL: https://issues.apache.org/jira/browse/YARN-7897 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Steps: 1) Launch the Httpd example via the REST API in unsecure mode. 2) container_e04_1517875972784_0001_01_02 fails with "Unable to find image 'centos/httpd-24-centos7:latest". 3) Go to RM UI2 to debug the issue. The Yarn app attempt page has incorrect values for Logs and Nodemanager UI: Logs = N/A, Nodemanager UI = http://nmhost:0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7896) AM log link in diagnostic is redirected to old RM UI
Yesha Vora created YARN-7896: Summary: AM log link in diagnostic is redirected to old RM UI Key: YARN-7896 URL: https://issues.apache.org/jira/browse/YARN-7896 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Scenario: 1) Run the Httpd yarn service in secure mode and make sure the application gets launched as dr.who. 2) Go to the Diagnostics tab. The message shown in Diagnostics refers to the AM UI link for the old UI. The diagnostic message should reference the UI2 link instead. {code:java|title=Diagnostics} Application application_1517253048795_0001 failed 20 times due to AM Container for appattempt_1517253048795_0001_20 exited with exitCode: -1000 Failing this attempt.Diagnostics: [2018-01-29 23:01:46.234]Application application_1517253048795_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : run as user is dr.who main : requested yarn user is dr.who User dr.who not found For more detailed output, check the application tracking page: http://xxx:8088/cluster/app/application_1517253048795_0001 Then click on links to logs of each attempt. . Failing the application.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7878) Docker container IP detail missing when service is in STABLE state
Yesha Vora created YARN-7878: Summary: Docker container IP detail missing when service is in STABLE state Key: YARN-7878 URL: https://issues.apache.org/jira/browse/YARN-7878 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora Scenario 1) Launch Hbase on docker app 2) Validate yarn service status using cli {code:java} {"name":"hbase-app-with-docker","id":"application_1517516543573_0012","artifact":{"id":"hbase-centos","type":"DOCKER"},"lifetime":3519,"components":[{"name":"hbasemaster","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_MASTER_OPTS":"-Xmx2048m -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_02","ip":"10.0.0.9","hostname":"hbasemaster-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029963,"bare_host":"xxx","component_name":"hbasemaster-0"}],"launch_command":"sleep 15; /usr/hdp/current/hbase-master/bin/hbase master start","number_of_containers":1,"run_privileged_container":false},{"name":"regionserver","dependencies":["hbasemaster"],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_REGIONSERVER_OPTS":"-XX:CMSInitiatingOccupancyFraction=70 -Xmx2048m -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.regionserver.hostname":"${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_05","state":"READY","launch_time":1517533059022,"bare_host":"xxx","component_name":"regionserver-0"}],"launch_command":"sleep 15; /usr/hdp/current/hbase-regionserver/bin/hbase regionserver 
start","number_of_containers":1,"run_privileged_container":false},{"name":"hbaseclient","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"1024"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_03","ip":"10.0.0.8","hostname":"hbaseclient-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029964,"bare_host":"xxx","component_name":"hbaseclient-0"}],"launch_command":"sleep
[jira] [Created] (YARN-7851) Graph view does not show all AM attempts
Yesha Vora created YARN-7851: Summary: Graph view does not show all AM attempts Key: YARN-7851 URL: https://issues.apache.org/jira/browse/YARN-7851 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Scenario: 1) Run an application where all AM attempts fail 2) Go to the Graph view for the application Here, the application started 10 AM attempts. However, the Graph view only depicts 4 of them. It should show all 10 attempts in the Graph view. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7850) New UI does not show status for Log Aggregation
Yesha Vora created YARN-7850: Summary: New UI does not show status for Log Aggregation Key: YARN-7850 URL: https://issues.apache.org/jira/browse/YARN-7850 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora The status of log aggregation is not shown anywhere. The new UI should show the log aggregation status for finished applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7832) Logs page does not work for Running applications
Yesha Vora created YARN-7832: Summary: Logs page does not work for Running applications Key: YARN-7832 URL: https://issues.apache.org/jira/browse/YARN-7832 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.0.0 Reporter: Yesha Vora Scenario * Run yarn service application * When application is Running, go to log page * Select AttemptId and Container Id Logs are not showed on UI. It complains "No log data available!" Here [http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358|http://ctr-e137-1514896590304-35963-01-04.hwx.site:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358] API fails with 500 Internal Server Error. {"exception":"WebApplicationException","message":"java.io.IOException: ","javaClassName":"javax.ws.rs.WebApplicationException"} {code:java} GET http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358 500 (Internal Server Error) (anonymous) @ VM779:1 send @ vendor.js:572 ajax @ vendor.js:548 (anonymous) @ vendor.js:5119 initializePromise @ vendor.js:2941 Promise @ vendor.js:3005 ajax @ vendor.js:5117 ajax @ yarn-ui.js:1 superWrapper @ vendor.js:1591 query @ vendor.js:5112 ember$data$lib$system$store$finders$$_query @ vendor.js:5177 query @ vendor.js:5334 fetchLogFilesForContainerId @ yarn-ui.js:132 showLogFilesForContainerId @ yarn-ui.js:126 run @ vendor.js:648 join @ vendor.js:648 run.join @ vendor.js:1510 closureAction @ vendor.js:1865 trigger @ vendor.js:302 (anonymous) @ vendor.js:339 each @ vendor.js:61 each @ vendor.js:51 trigger @ vendor.js:339 d.select @ vendor.js:5598 (anonymous) @ vendor.js:5598 d.invoke @ vendor.js:5598 d.trigger @ vendor.js:5598 e.trigger @ vendor.js:5598 (anonymous) @ vendor.js:5598 d.invoke @ vendor.js:5598 d.trigger @ vendor.js:5598 (anonymous) @ vendor.js:5598 dispatch @ vendor.js:306 elemData.handle @ vendor.js:281{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7830) If attempt has selected grid view, attempt info page should be opened with grid view
Yesha Vora created YARN-7830: Summary: If attempt has selected grid view, attempt info page should be opened with grid view Key: YARN-7830 URL: https://issues.apache.org/jira/browse/YARN-7830 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Steps: 1) Start an application and visit the attempt page 2) Click on Grid view 3) Click on attempt 1 Current behavior: The page is redirected to the attempt info page, which opens in graph view. Expected behavior: In this scenario, it should open in grid view. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7827) Stop and Delete Yarn Service from RM UI fails with HTTP ERROR 404
Yesha Vora created YARN-7827: Summary: Stop and Delete Yarn Service from RM UI fails with HTTP ERROR 404 Key: YARN-7827 URL: https://issues.apache.org/jira/browse/YARN-7827 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Steps: 1) Enable Ats v2 2) Start Httpd Yarn service 3) Go to UI2 attempts page for yarn service 4) Click on setting icon 5) Click on stop service 6) This action will pop up a box to confirm stop. click on "Yes" Expected behavior: Yarn service should be stopped Actual behavior: Yarn UI is not notifying on whether Yarn service is stopped or not. On checking network stack trace, the PUT request failed with HTTP error 404 {code} Sorry, got error 404 Please consult RFC 2616 for meanings of the error code. Error Details org.apache.hadoop.yarn.webapp.WebAppException: /v1/services/httpd-hrt-qa-n: controller for v1 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:247) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:155) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:143) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1578) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:534) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) at
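For the YARN-7827 report above, a minimal sketch of the stop request as the YARN Services REST API documents it (a PUT of {"state": "STOPPED"} to /app/v1/services/<service-name>); the RM host and port are placeholders, the service name is the one from the failing request, and on a secure cluster the call would also need SPNEGO authentication.
{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StopServiceCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder RM address; the service name is taken from the failing UI request above.
    URL url = new URL("http://rm-host:8088/app/v1/services/httpd-hrt-qa-n");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);

    // Stopping a running service is done by PUTting state=STOPPED.
    byte[] body = "{\"state\": \"STOPPED\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }

    // The UI's request went to /v1/services/... on the RM webapp and got 404;
    // a 2xx here would show that the services API path itself is reachable.
    System.out.println("HTTP status: " + conn.getResponseCode());
  }
}
{code}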
[jira] [Created] (YARN-7826) Yarn service status cli does not update lifetime if its updated with -appId
Yesha Vora created YARN-7826: Summary: Yarn service status cli does not update lifetime if its updated with -appId Key: YARN-7826 URL: https://issues.apache.org/jira/browse/YARN-7826 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora 1) Create Httpd yarn service with lifetime = 3600 sec. 2) Run yarn application -status , The lifetime field has 3600 sec. 3) Update lifetime of service using applicationId {code} yarn application -appId application_1516919074719_0001 -updateLifetime 48000{code} 4) Verify Application status using ApplicationId. Lifetime detail is updated correctly 5) Verify Lifetime using application name {code} [hrt_qa@xxx hadoopqe]$ yarn application -status httpd-hrt-qa-n { "uri" : null, "name" : "httpd-hrt-qa-n", "id" : "application_1516919074719_0001", "artifact" : null, "resource" : null, "launch_time" : null, "number_of_running_containers" : null, "lifetime" : 3600, "placement_policy" : null, "components" : [ { "name" : "httpd", "dependencies" : [ ], "readiness_check" : null, "artifact" : { "id" : "centos/httpd-24-centos7:latest", "type" : "DOCKER", "uri" : null }, "launch_command" : "/usr/bin/run-httpd", "resource" : { "uri" : null, "profile" : null, "cpus" : 1, "memory" : "1024", "additional" : null }, "number_of_containers" : 2, "run_privileged_container" : false, "placement_policy" : null, "state" : "STABLE", "configuration" : { "properties" : { }, "env" : { }, "files" : [ { "type" : "TEMPLATE", "dest_file" : "/var/www/html/index.html", "src_file" : null, "properties" : { "content" : "TitleHello from ${COMPONENT_INSTANCE_NAME}!" } } ] }, "quicklinks" : [ ], "containers" : [ { "uri" : null, "id" : "container_e07_1516919074719_0001_01_02", "launch_time" : 1516919372633, "ip" : "xxx.xxx.xxx.xxx", "hostname" : "httpd-0.httpd-hrt-qa-n.hrt_qa.test.com", "bare_host" : "xxx", "state" : "READY", "component_instance_name" : "httpd-0", "resource" : null, "artifact" : null, "privileged_container" : null }, { "uri" : null, "id" : "container_e07_1516919074719_0001_01_03", "launch_time" : 1516919372637, "ip" : "xxx.xxx.xxx.xxx", "hostname" : "httpd-1.httpd-hrt-qa-n.hrt_qa.test.com", "bare_host" : "xxx", "state" : "READY", "component_instance_name" : "httpd-1", "resource" : null, "artifact" : null, "privileged_container" : null } ] }, { "name" : "httpd-proxy", "dependencies" : [ ], "readiness_check" : null, "artifact" : { "id" : "centos/httpd-24-centos7:latest", "type" : "DOCKER", "uri" : null }, "launch_command" : "/usr/bin/run-httpd", "resource" : { "uri" : null, "profile" : null, "cpus" : 1, "memory" : "1024", "additional" : null }, "number_of_containers" : 1, "run_privileged_container" : false, "placement_policy" : null, "state" : "STABLE", "configuration" : { "properties" : { }, "env" : { }, "files" : [ { "type" : "TEMPLATE", "dest_file" : "/etc/httpd/conf.d/httpd-proxy.conf", "src_file" : "httpd-proxy.conf", "properties" : { } } ] }, "quicklinks" : [ ], "containers" : [ { "uri" : null, "id" : "container_e07_1516919074719_0001_01_04", "launch_time" : 1516919372638, "ip" : "xxx.xxx.xxx.xxx", "hostname" : "httpd-proxy-0.httpd-hrt-qa-n.hrt_qa.test.com", "bare_host" : "xxx", "state" : "READY", "component_instance_name" : "httpd-proxy-0", "resource" : null, "artifact" : null, "privileged_container" : null } ] } ], "configuration" : { "properties" : { }, "env" : { }, "files" : [ ] }, "state" : "STABLE", "quicklinks" : { "Apache HTTP Server" : "http://httpd-proxy-0.httpd-hrt-qa-n.hrt_qa.test.com:8080; }, "queue" : null, "kerberos_principal" : { 
"principal_name" : null, "keytab" : null } } {code} Here, App status with app-name did not have new lifetime. The application status with app name should also reflect the new lifetime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7825) Maintain constant horizontal application info bar for all pages
Yesha Vora created YARN-7825: Summary: Maintain constant horizontal application info bar for all pages Key: YARN-7825 URL: https://issues.apache.org/jira/browse/YARN-7825 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Steps: 1) Enable ATS v2 2) Start a Yarn service application (Httpd) 3) Check the horizontal application info bar; it should stay consistent across the pages below: * Component page * Component Instance info page * Application attempt info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7824) Yarn Component Instance page should include link to container logs
Yesha Vora created YARN-7824: Summary: Yarn Component Instance page should include link to container logs Key: YARN-7824 URL: https://issues.apache.org/jira/browse/YARN-7824 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Steps: 1) Launch the Httpd example 2) Visit the component instance page for httpd-proxy-0 This page has information about the httpd-proxy-0 component. It should also include a link to the container logs for this component. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7818) DistributedShell Container fails with exitCode=143 when NM restarts and recovers
Yesha Vora created YARN-7818: Summary: DistributedShell Container fails with exitCode=143 when NM restarts and recovers Key: YARN-7818 URL: https://issues.apache.org/jira/browse/YARN-7818 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora steps: 1) Run Dshell Application {code} yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/hdp/3.0.0.0-751/hadoop-yarn/hadoop-yarn-applications-distributedshell-*.jar -keep_containers_across_application_attempts -timeout 90 -shell_command "sleep 110" -num_containers 4{code} 2) Find out host where AM is running. 3) Find Containers launched by application 4) Restart NM where AM is running 5) Validate that new attempt is not started and containers launched before restart are in RUNNING state. In this test, step#5 fails because containers failed to launch with error 143 {code} 2018-01-24 09:48:30,547 INFO container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1516787230461_0001_01_03 transitioned from RUNNING to KILLING 2018-01-24 09:48:30,547 INFO launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container container_e04_1516787230461_0001_01_03 2018-01-24 09:48:30,552 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 143. Privileged Execution Operation Stderr: Stdout: main : command provided 1 main : run as user is hrt_qa main : requested yarn user is hrt_qa Getting exit code file... Creating script paths... Writing pid file... Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.pid.tmp Writing to cgroup task files... Creating local dirs... Launching container... Getting exit code file... Creating script paths... Full command array for failed execution: [/usr/hdp/3.0.0.0-751/hadoop-yarn/bin/container-executor, hrt_qa, hrt_qa, 1, application_1516787230461_0001, container_e04_1516787230461_0001_01_03, /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1516787230461_0001/container_e04_1516787230461_0001_01_03, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/launch_container.sh, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.tokens, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.pid, /grid/0/hadoop/yarn/local, /grid/0/hadoop/yarn/log, cgroups=none] 2018-01-24 09:48:30,553 WARN runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(127)) - Launch container failed. 
Exception: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=143: at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:152) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:549) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:465) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:285) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:95) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: ExitCodeException exitCode=143: at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009) at org.apache.hadoop.util.Shell.run(Shell.java:902) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) ... 10 more 2018-01-24
[jira] [Created] (YARN-7805) Yarn should update container as failed on docker container failure
Yesha Vora created YARN-7805: Summary: Yarn should update container as failed on docker container failure Key: YARN-7805 URL: https://issues.apache.org/jira/browse/YARN-7805 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Steps: Start the hbase yarn service example on docker. When the Hbase master fails, it causes the master daemon's docker container to exit. {code} [root@xx bin]# docker ps -a CONTAINER IDIMAGE COMMAND CREATED STATUS PORTS NAMES a57303b1a736x/xxxhbase:x.x.x.x.0.0.0 "bash /grid/0/hadoop/" 5 minutes ago Exited (1) 4 minutes ago container_e07_1516734339938_0018_01_02 [root@xxx bin]# docker exec -it a57303b1a736 bash Error response from daemon: Container a57303b1a7364a733428ec76581368253e5a701560a510204b8c302e3bbeed26 is not running {code} Expected behavior: Yarn should mark this container as failed and start a new docker container. Actual behavior: Yarn did not detect that the container failed; it kept showing the container status as Running. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7804) Refresh action on Grid view page should not be redirected to graph view
Yesha Vora created YARN-7804: Summary: Refresh action on Grid view page should not be redirected to graph view Key: YARN-7804 URL: https://issues.apache.org/jira/browse/YARN-7804 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Affects Versions: 3.0.0 Reporter: Yesha Vora Steps: 1) Go to the application attempt page http://host:8088/ui2/#/yarn-app/application_1516734339938_0020/attempts?service=abc 2) Click on grid view 3) Click the refresh button on the page Actual behavior: On refresh, the page goes back to graph view. Expected behavior: On refresh, the page should stay on grid view. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7802) Application regex search did not work properly with app name
Yesha Vora created YARN-7802: Summary: Application regex search did not work properly with app name Key: YARN-7802 URL: https://issues.apache.org/jira/browse/YARN-7802 Project: Hadoop YARN Issue Type: Bug Components: yarn-ui-v2 Reporter: Yesha Vora Steps: 1) Start a yarn service named "yesha-hbase-retry-2" 2) Put regex = yesha-hbase-retry-2 http://host:8088/ui2/#/yarn-apps/apps?searchText=yesha-hbase-retry-2 Here, the application does not get listed. The regex works with the "yesha-hbase-retry-" input but does not work with the full app name. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7768) yarn application -status appName does not return valid json
Yesha Vora created YARN-7768: Summary: yarn application -status appName does not return valid json Key: YARN-7768 URL: https://issues.apache.org/jira/browse/YARN-7768 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora yarn application -status does not return valid json 1) It has classname added to json content such as class Service, class KerberosPrincipal , class Component etc 2) The json object should be comma separated. {code} [hrt_qa@2 hadoopqe]$ yarn application -status httpd-hrt-qa WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/01/18 00:33:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/01/18 00:33:08 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 18/01/18 00:33:08 INFO utils.ServiceApiUtil: Loading service definition from hdfs://mycluster/user/hrt_qa/.yarn/services/httpd-hrt-qa/httpd-hrt-qa.json 18/01/18 00:33:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 class Service { name: httpd-hrt-qa id: application_1516234304810_0001 artifact: null resource: null launchTime: null numberOfRunningContainers: null lifetime: 3600 placementPolicy: null components: [class Component { name: httpd state: STABLE dependencies: [] readinessCheck: null artifact: class Artifact { id: centos/httpd-24-centos7:latest type: DOCKER uri: null } launchCommand: /usr/bin/run-httpd resource: class Resource { profile: null cpus: 1 memory: 1024 additional: null } numberOfContainers: 2 containers: [class Container { id: container_e05_1516234304810_0001_01_02 launchTime: Thu Jan 18 00:19:22 UTC 2018 ip: 172.17.0.2 hostname: httpd-0.httpd-hrt-qa.hrt_qa.test.com bareHost: 5.hwx.site state: READY componentInstanceName: httpd-0 resource: null artifact: null privilegedContainer: null }, class Container { id: container_e05_1516234304810_0001_01_03 launchTime: Thu Jan 18 00:19:23 UTC 2018 ip: 172.17.0.3 hostname: httpd-1.httpd-hrt-qa.hrt_qa.test.com bareHost: 5.hwx.site state: READY componentInstanceName: httpd-1 resource: null artifact: null privilegedContainer: null }] runPrivilegedContainer: false placementPolicy: null configuration: class Configuration { properties: {} env: {} files: [class ConfigFile { type: TEMPLATE destFile: /var/www/html/index.html srcFile: null properties: {content=TitleHello from ${COMPONENT_INSTANCE_NAME}!} }] } quicklinks: [] }, class Component { name: httpd-proxy state: FLEXING dependencies: [] readinessCheck: null artifact: class Artifact { id: centos/httpd-24-centos7:latest type: DOCKER uri: null } launchCommand: /usr/bin/run-httpd resource: class Resource { profile: null cpus: 1 memory: 1024 additional: null } numberOfContainers: 1 containers: [] runPrivilegedContainer: false placementPolicy: null configuration: class Configuration { properties: {} env: {} files: [class ConfigFile { type: TEMPLATE destFile: /etc/httpd/conf.d/httpd-proxy.conf srcFile: httpd-proxy.conf properties: {} }] } quicklinks: [] }] configuration: class Configuration { properties: {} env: {} files: [] } state: STARTED quicklinks: {Apache HTTP 
Server=http://httpd-proxy-0.httpd-hrt-qa.hrt_qa.test.com:8080} queue: null kerberosPrincipal: class KerberosPrincipal { principalName: null keytab: null {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
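For the YARN-7768 report above, a small sketch that makes the problem concrete by feeding the captured CLI output to a JSON parser; the file name is a placeholder, and with output like the above Jackson rejects it, which is the behavior this issue asks to fix.
{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class StatusOutputCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder file holding the output of "yarn application -status httpd-hrt-qa".
    String output = new String(Files.readAllBytes(Paths.get("status-output.txt")));
    try {
      // readTree() only accepts well-formed JSON; the "class Service { ... }"
      // style output shown above makes it throw a parse exception.
      new ObjectMapper().readTree(output);
      System.out.println("valid JSON");
    } catch (JsonProcessingException e) {
      System.out.println("not valid JSON: " + e.getOriginalMessage());
    }
  }
}
{code}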
[jira] [Created] (YARN-7744) Fix Get status rest api response when application is destroyed
Yesha Vora created YARN-7744: Summary: Fix Get status rest api response when application is destroyed Key: YARN-7744 URL: https://issues.apache.org/jira/browse/YARN-7744 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora Priority: Critical Steps: 1) Create a yarn service 2) Destroy the yarn service 3) Get the status of the application using the REST API: {code} response json = {u'diagnostics': u'Failed to retrieve service: File does not exist: hdfs://mycluster/user/yarn/.yarn/services/httpd-service/httpd-service.json'} status code = 500{code} The REST API should respond with proper JSON including diagnostics and HTTP status code 404, not 500. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
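For the YARN-7744 report above, a minimal sketch of the check this issue implies: a GET against the services API for a destroyed service should come back as 404 with a diagnostics body rather than 500. The RM host and port are placeholders; the service name is the one from the diagnostics above.
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class DestroyedServiceStatusCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder RM address; "httpd-service" is the destroyed service from this report.
    URL url = new URL("http://rm-host:8088/app/v1/services/httpd-service");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();

    int status = conn.getResponseCode();
    System.out.println("HTTP status: " + status + (status == 404 ? " (expected)" : " (unexpected)"));

    // The diagnostics JSON should still be returned in the error body.
    if (conn.getErrorStream() != null) {
      try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getErrorStream()))) {
        in.lines().forEach(System.out::println);
      }
    }
  }
}
{code}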
[jira] [Created] (YARN-7743) UI component placement overlaps on Safari
Yesha Vora created YARN-7743: Summary: UI component placement overlaps on Safari Key: YARN-7743 URL: https://issues.apache.org/jira/browse/YARN-7743 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Yesha Vora Browser: Safari Version 9.1.1 (11601.6.17) Issue with the new RM UI: The two tables on the application and service pages overlap in the Safari browser. See the attached screenshot for details. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7741) Eliminate extra log statement from yarn app -destroy cli
Yesha Vora created YARN-7741: Summary: Eliminate extra log statement from yarn app -destroy cli Key: YARN-7741 URL: https://issues.apache.org/jira/browse/YARN-7741 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora The "yarn app -destroy" cli prints a very long, unnecessary stack trace from the zookeeper client. The cli output is 44009 characters (38 lines and 358 words). It should only print a message indicating whether the app was destroyed successfully or not. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7740) Fix logging for destroy yarn service cli when app does not exist
Yesha Vora created YARN-7740: Summary: Fix logging for destroy yarn service cli when app does not exist Key: YARN-7740 URL: https://issues.apache.org/jira/browse/YARN-7740 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora Scenario: Run the "yarn app -destroy" cli with an application name which does not exist. Here, the cli should return a message such as "Application does not exist"; instead it returns the message "Destroyed cluster httpd-xxx". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7730) Add memory management configs to yarn-default
Yesha Vora created YARN-7730: Summary: Add memory management configs to yarn-default Key: YARN-7730 URL: https://issues.apache.org/jira/browse/YARN-7730 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Priority: Minor Add the configurations below, with descriptions, to yarn-default.xml {code} "yarn.nodemanager.resource.memory.enabled" // default is false; set to true to enable cgroups-based memory monitoring. "yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage" // default is 90.0f; under memory congestion the container can still keep/reserve 90% of its claimed memory. It cannot be set above 100 or to a negative value. "yarn.nodemanager.resource.memory.cgroups.swappiness" // the percentage of container memory that may be swapped; default is 0, meaning container memory cannot be swapped out. If not set, the Linux cgroup default of 60 applies, meaning up to 60% of memory can potentially be swapped out when system memory is not enough.{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
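For the YARN-7730 report above, an illustration only (property names and defaults are quoted from this report, not verified against yarn-default.xml) of setting these values programmatically with the YARN Configuration API:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MemoryCgroupSettings {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();

    // Enable cgroups-based memory monitoring (described above as defaulting to false).
    conf.setBoolean("yarn.nodemanager.resource.memory.enabled", true);

    // Soft limit as a percentage of the container's claimed memory (90.0f per the description).
    conf.setFloat("yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage", 90.0f);

    // Swappiness for container memory; 0 means container memory is not swapped out.
    conf.setInt("yarn.nodemanager.resource.memory.cgroups.swappiness", 0);

    System.out.println("memory.enabled = "
        + conf.getBoolean("yarn.nodemanager.resource.memory.enabled", false));
  }
}
{code}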
[jira] [Created] (YARN-7719) [Yarn services] Yarn application logs does not collect all AM log files
Yesha Vora created YARN-7719: Summary: [Yarn services] Yarn application logs does not collect all AM log files Key: YARN-7719 URL: https://issues.apache.org/jira/browse/YARN-7719 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Steps: 1) Run a Yarn Service application such as httpd 2) Gather the yarn application logs after the application is finished The log collection only shows the content of container-localizer-syslog. Log collection should also gather the files below from the AM. * directory.info * launch_container.sh * prelaunch.err * prelaunch.out * serviceam-err.txt * serviceam-out.txt Without these log files, debugging a failed app becomes impossible. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7717) Add configuration consistency for module.enabled and docker.privileged-containers.enabled
Yesha Vora created YARN-7717: Summary: Add configuration consistency for module.enabled and docker.privileged-containers.enabled Key: YARN-7717 URL: https://issues.apache.org/jira/browse/YARN-7717 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Yesha Vora container-executor.cfg has two properties related to dockerization. 1) module.enabled = true/false 2) docker.privileged-containers.enabled = 1/0 Here, the two properties take different kinds of values to enable / disable their features: module.enabled takes the strings true/false while docker.privileged-containers.enabled takes the integers 1/0. The behavior of these properties should be consistent: both should take the string true or false to enable or disable the feature. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7706) httpd yarn service example fails with "java.lang.IllegalArgumentException: Src_file does not exist for config file: httpd-proxy.conf"
Yesha Vora created YARN-7706: Summary: httpd yarn service example fails with "java.lang.IllegalArgumentException: Src_file does not exist for config file: httpd-proxy.conf" Key: YARN-7706 URL: https://issues.apache.org/jira/browse/YARN-7706 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora steps: * Enable yarn containerization in cluster * Launch httpd example. httpd.json and httpd-proxy.conf file are present at /yarn-service-examples/httpd {code} [hrt_qa@xxx httpd]$ ls -la total 8 drwxr-xr-x. 2 root root 46 Jan 5 02:52 . drwxr-xr-x. 5 root root 51 Jan 5 02:52 .. -rw-r--r--. 1 root root 1337 Jan 1 04:21 httpd.json -rw-r--r--. 1 root root 1065 Jan 1 04:21 httpd-proxy.conf{code} {code} [hrt_qa@xxx yarn-service-examples]$ yarn app -launch httpd-hrtqa httpd/httpd.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/01/05 20:39:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/01/05 20:39:23 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 18/01/05 20:39:23 INFO client.ServiceClient: Loading service definition from local FS: /xxx/yarn-service-examples/httpd/httpd.json Exception in thread "main" java.lang.IllegalArgumentException: Src_file does not exist for config file: httpd-proxy.conf at org.apache.hadoop.yarn.service.provider.AbstractClientProvider.validateConfigFiles(AbstractClientProvider.java:105) at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateComponent(ServiceApiUtil.java:224) at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateAndResolveService(ServiceApiUtil.java:189) at org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:213) at org.apache.hadoop.yarn.service.client.ServiceClient.actionLaunch(ServiceClient.java:204) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:447) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:111){code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7671) Improve Diagnostic message for stop yarn native service
Yesha Vora created YARN-7671: Summary: Improve Diagnostic message for stop yarn native service Key: YARN-7671 URL: https://issues.apache.org/jira/browse/YARN-7671 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Yesha Vora Steps: 1) Install a Hadoop 3.0 cluster 2) Run a Yarn service application {code:title=sleeper.json}{ "name": "sleeper-service", "components" : [ { "name": "sleeper", "number_of_containers": 1, "launch_command": "sleep 90", "resource": { "cpus": 1, "memory": "256" } } ] }{code} {code:title=cmd} yarn app -launch my-sleeper1 sleeper.json{code} 3) Stop the yarn service app {code:title=cmd} yarn app -stop my-sleeper1{code} On stopping the yarn service, the appId finishes with YarnApplicationState: FINISHED, FinalStatus Reported by AM: ENDED, and Diagnostics: Navigate to the failed component for more details. This Diagnostics message should be improved. When an application is explicitly stopped by the user, the diagnostics message should say "Application stopped by user". -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7597) testContainerLogsWithNewAPI and testContainerLogsWithOldAPI UT fails
Yesha Vora created YARN-7597: Summary: testContainerLogsWithNewAPI and testContainerLogsWithOldAPI UT fails Key: YARN-7597 URL: https://issues.apache.org/jira/browse/YARN-7597 Project: Hadoop YARN Issue Type: Test Reporter: Yesha Vora testContainerLogsWithNewAPI and testContainerLogsWithOldAPI UT fails {code} Stacktrace java.util.NoSuchElementException: null at java.util.LinkedList.getFirst(LinkedList.java:244) at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.getFileControllerForWrite(LogAggregationFileControllerFactory.java:149) at org.apache.hadoop.yarn.logaggregation.TestContainerLogsUtils.uploadContainerLogIntoRemoteDir(TestContainerLogsUtils.java:122) at org.apache.hadoop.yarn.logaggregation.TestContainerLogsUtils.createContainerLogFileInRemoteFS(TestContainerLogsUtils.java:96) at org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices.testContainerLogs(TestNMWebServices.java:541) at org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices.testContainerLogsWithNewAPI(TestNMWebServices.java:342){code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7268) testCompareXmlAgainstConfigurationClass fails due to 1 missing property from yarn-default
Yesha Vora created YARN-7268: Summary: testCompareXmlAgainstConfigurationClass fails due to 1 missing property from yarn-default Key: YARN-7268 URL: https://issues.apache.org/jira/browse/YARN-7268 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Yesha Vora {code} Error Message yarn-default.xml has 1 properties missing in class org.apache.hadoop.yarn.conf.YarnConfiguration Stacktrace java.lang.AssertionError: yarn-default.xml has 1 properties missing in class org.apache.hadoop.yarn.conf.YarnConfiguration at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareXmlAgainstConfigurationClass(TestConfigurationFieldsBase.java:414) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Standard Output File yarn-default.xml (253 properties) yarn-default.xml has 1 properties missing in class org.apache.hadoop.yarn.conf.YarnConfiguration yarn.log-aggregation.file-controller.TFile.class ={code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7175) Log collection fails when a container is acquired but not launched on NM
Yesha Vora created YARN-7175: Summary: Log collection fails when a container is acquired but not launched on NM Key: YARN-7175 URL: https://issues.apache.org/jira/browse/YARN-7175 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Yesha Vora Scenario: * Run a Spark app * As soon as the spark application finishes, run the "yarn application -status " cli in a loop for 2-3 mins to check the log aggregation status. The log aggregation status remains "RUNNING" and eventually ends up as "TIMED_OUT". This situation happens when an application has acquired a container that is never launched on the NM. This scenario should be handled better and should not delay retrieval of the application logs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7090) testRMRestartAfterNodeLabelDisabled[1] UT Fails
Yesha Vora created YARN-7090: Summary: testRMRestartAfterNodeLabelDisabled[1] UT Fails Key: YARN-7090 URL: https://issues.apache.org/jira/browse/YARN-7090 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Yesha Vora testRMRestartAfterNodeLabelDisabled[1] UT fails with below error. {code} Error Message expected:<[x]> but was:<[]> Stacktrace org.junit.ComparisonFailure: expected:<[x]> but was:<[]> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartAfterNodeLabelDisabled(TestRMRestart.java:2408) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7065) [RM UI] App status not getting updated in "All application" page
Yesha Vora created YARN-7065: Summary: [RM UI] App status not getting updated in "All application" page Key: YARN-7065 URL: https://issues.apache.org/jira/browse/YARN-7065 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Scenario: 1) Run Spark Long Running application 2) Do RM and NN failover randomly 3) Validate App state in Yarn The Spark applications are finished. Yarn-cli returns correct status of yarn application. {code} [hrt_qa@xxx hadoopqe]$ yarn application -status application_1503203977699_0014 17/08/21 16:56:10 INFO client.AHSProxy: Connecting to Application History server at host1 xxx.xx.xx.x:10200 17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]... 17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1] Application Report : Application-Id : application_1503203977699_0014 Application-Name : org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources Application-Type : SPARK User : hrt_qa Queue : default Application Priority : null Start-Time : 1503215983532 Finish-Time : 1503250203806 Progress : 0% State : FAILED Final-State : FAILED Tracking-URL : https://host1:8090/cluster/app/application_1503203977699_0014 RPC Port : -1 AM Host : N/A Aggregate Resource Allocation : 174722793 MB-seconds, 170603 vcore-seconds Log Aggregation Status : SUCCEEDED Diagnostics : Application application_1503203977699_0014 failed 20 times due to AM Container for appattempt_1503203977699_0014_20 exited with exitCode: 1 For more detailed output, check the application tracking page: https://host1:8090/cluster/app/application_1503203977699_0014 Then click on links to logs of each attempt. Diagnostics: Exception from container-launch. Container id: container_e04_1503203977699_0014_20_01 Exit code: 1 Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Shell output: main : command provided 1 main : run as user is hrt_qa main : requested yarn user is hrt_qa Getting exit code file... Creating script paths... Writing pid file... Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1503203977699_0014/container_e04_1503203977699_0014_20_01/container_e04_1503203977699_0014_20_01.pid.tmp Writing to cgroup task files... Creating local dirs... Launching container... Getting exit code file... Creating script paths... Container exited with a non-zero exit code 1 Failing this attempt. Failing the application. 
Unmanaged Application : false Application Node Label Expression : AM container Node Label Expression : {code} However, the RM UI "All Applications" page (https://host1:8090/cluster) still shows the application in "RUNNING" state. On clicking the application id (https://host1:8090/cluster/app/application_1503203977699_0014), it redirects to the application page, which shows the correct application state = FAILED. The app status is not getting updated on the Yarn All Applications page. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-6576) Improve Diagnostic by moving Error stack trace from NM to slider AM
Yesha Vora created YARN-6576: Summary: Improve Diagnostic by moving Error stack trace from NM to slider AM Key: YARN-6576 URL: https://issues.apache.org/jira/browse/YARN-6576 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Slider Master diagnostics should be improved to show the root cause of app failures for issues like a missing docker image. Currently, the Slider Master log does not show a proper error message to debug such failures. Users have to access Nodemanager logs to find the root cause of issues where a container failed to start. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-6233) FSRMStateStore UT fails with IO Timed out Error
Yesha Vora created YARN-6233: Summary: FSRMStateStore UT fails with IO Timed out Error Key: YARN-6233 URL: https://issues.apache.org/jira/browse/YARN-6233 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora FSRMStateStore UT fails with IO Timed out Error as below. {code:title=test cmd} export MAVEN_OPTS=-Xmx1024m; mvn -B -nsu test -Dtest=TestFifoScheduler,TestFairOrderingPolicy,TestFSAppAttempt,TestFSParentQueue,TestQueueManager,TestFairSchedulerFairShare,TestMaxRunningAppsEnforcer,TestAppRunnability,TestFairSchedulerConfiguration,TestFairSchedulerPreemption,TestSchedulingPolicy,TestComputeFairShares,TestFSLeafQueue,TestFairSchedulerEventLog,TestQueuePlacementPolicy,TestFairSchedulerQueueACLs,TestAllocationFileLoaderService,TestFairScheduler,TestDominantResourceFairnessPolicy,TestEmptyQueues,TestQueueCapacities,TestChildQueueOrder,TestQueueMappings,TestParentQueue,TestCapacitySchedulerNodeLabelUpdate,TestNodeLabelContainerAllocation,TestCapacityScheduler,TestApplicationLimits,TestWorkPreservingRMRestartForNodeLabel,TestReservationQueue,TestApplicationLimitsByPartition,TestCapacitySchedulerDynamicBehavior,TestQueueParsing,TestCapacitySchedulerLazyPreemption,TestContainerAllocation,TestLeafQueue,TestCapacitySchedulerSurgicalPreemption,TestReservations,TestCapacitySchedulerQueueACLs,TestUtils,TestPriorityUtilizationQueueOrderingPolicy,TestRMApplicationHistoryWriter,TestResources,TestResourceWeights,TestRMNMRPCResponseId,TestNMReconnect,TestNMExpiry,TestLeveldbRMStateStore,TestZKRMStateStore,TestMemoryRMStateStore,TestFSRMStateStore,TestZKRMStateStoreZKClientConnections,TestSystemMetricsPublisher,TestSimpleCapacityReplanner,TestInMemoryPlan,TestNoOverCommitPolicy,TestRLESparseResourceAllocation,TestCapacitySchedulerPlanFollower,TestInMemoryReservationAllocation,TestSchedulerPlanFollowerBase,TestGreedyReservationAgent,TestReservationInputValidator,TestRpcCall --projects :hadoop-yarn-server-resourcemanager,:hadoop-nfs{code} {code} Results : Tests in error: TestFSRMStateStore.testFSRMStateStoreClientRetry:385 » test timed out after 3... TestFSRMStateStore.testFSRMStateStore:168 » IO Timed out waiting for Mini HDFS... Tests run: 487, Failures: 0, Errors: 2, Skipped: 2 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop NFS .. SUCCESS [ 4.172 s] [INFO] hadoop-yarn-server-resourcemanager . FAILURE [21:57 min] [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 22:05 min [INFO] Finished at: 2017-02-23T21:33:03+00:00 [INFO] Final Memory: 53M/873M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-yarn-server-resourcemanager: There are test failures. [ERROR] [ERROR] Please refer to /xxx/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/surefire-reports for the individual test results. [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-yarn-server-resourcemanager{code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-6220) Few TestSecureRMRegistryOperations UT fails
Yesha Vora created YARN-6220: Summary: Few TestSecureRMRegistryOperations UT fails Key: YARN-6220 URL: https://issues.apache.org/jira/browse/YARN-6220 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora 8 Tests from TestSecureRMRegistryOperations fails as below. * testAlicePathRestrictedAnonAccess * testAnonNoWriteAccess * testAnonNoWriteAccessOffRoot * testAnonReadAccess * testDigestAccess * testUserHomedirsPermissionsRestricted * testUserZookeeperHomePathAccess * testZookeeperCanWriteUnderSystem {code} java.lang.reflect.UndeclaredThrowableException: null at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262) at java.util.concurrent.FutureTask.get(FutureTask.java:119) at org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations$1.run(TestSecureRMRegistryOperations.java:107) at org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations$1.run(TestSecureRMRegistryOperations.java:98) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations.startRMRegistryOperations(TestSecureRMRegistryOperations.java:97) at org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations.testAnonNoWriteAccess(TestSecureRMRegistryOperations.java:148){code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-6219) NM web server related UT fails with "NMWebapps failed to start."
Yesha Vora created YARN-6219: Summary: NM web server related UT fails with "NMWebapps failed to start." Key: YARN-6219 URL: https://issues.apache.org/jira/browse/YARN-6219 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora The TestNodeStatusUpdater.testCompletedContainerStatusBackup and TestNMWebServer unit tests fail with "NMWebapps failed to start." {code} Error Message NMWebapps failed to start. Stacktrace org.apache.hadoop.yarn.exceptions.YarnRuntimeException: NMWebapps failed to start. at org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices.(NMWebServices.java:108) at org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices$$FastClassByGuice$$84485dc9.newInstance() at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40) at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60) at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85) at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254) at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031) at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) at com.google.inject.Scopes$1$1.get(Scopes.java:65) at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40) at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978) at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024) at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974) at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013) at com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory$GuiceInstantiatedComponentProvider.getInstance(GuiceComponentProviderFactory.java:332) at com.sun.jersey.server.impl.component.IoCResourceFactory$SingletonWrapper.init(IoCResourceFactory.java:178) at com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:584) at com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:581) at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193) at com.sun.jersey.server.impl.application.WebApplicationImpl.getResourceComponentProvider(WebApplicationImpl.java:581) at com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:658) at com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:653) at com.sun.jersey.server.impl.application.RootResourceUriRules.(RootResourceUriRules.java:124) at com.sun.jersey.server.impl.application.WebApplicationImpl._initiate(WebApplicationImpl.java:1298) at com.sun.jersey.server.impl.application.WebApplicationImpl.access$700(WebApplicationImpl.java:169) at com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:775) at com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:771) at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193) at com.sun.jersey.server.impl.application.WebApplicationImpl.initiate(WebApplicationImpl.java:771) at com.sun.jersey.guice.spi.container.servlet.GuiceContainer.initiate(GuiceContainer.java:121) at 
com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.initiate(ServletContainer.java:318) at com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:609) at com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:210) at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:373) at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:710) at com.google.inject.servlet.FilterDefinition.init(FilterDefinition.java:114) at com.google.inject.servlet.ManagedFilterPipeline.initPipeline(ManagedFilterPipeline.java:98) at com.google.inject.servlet.GuiceFilter.init(GuiceFilter.java:172) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) at
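One common reason an embedded test web app fails to start is that its port is already taken; whether that is the cause of this failure is only a guess. A minimal sketch (class and method names are hypothetical) of pointing the NM web app at an ephemeral port in the test configuration:
{code:title=hypothetical ephemeral-port test configuration}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NmWebappTestConf {
  // Bind the NM web app to port 0 so the OS picks any free port; this avoids
  // failures when the default port is already held by another process.
  public static Configuration ephemeralWebappConf() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.NM_WEBAPP_ADDRESS, "localhost:0");
    return conf;
  }
}
{code}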
[jira] [Created] (YARN-6189) Improve application status log message when RM restarted when app is in NEW state
Yesha Vora created YARN-6189: Summary: Improve application status log message when RM restarted when app is in NEW state Key: YARN-6189 URL: https://issues.apache.org/jira/browse/YARN-6189 Project: Hadoop YARN Issue Type: Improvement Reporter: Yesha Vora When an RM restart/failover happens while an application is in the NEW state, the application status command for that application prints the stack trace below. Improve the exception message to reduce confusion, e.g. something like: "The application is not known to the RM; the previous submission may not have been successful." {code} hrt_qa@:/root> yarn application -status application_1470379565464_0001 16/08/05 17:24:29 INFO impl.TimelineClientImpl: Timeline service address: https://hostxxx:8190/ws/v1/timeline/ 16/08/05 17:24:30 INFO client.AHSProxy: Connecting to Application History server at hostxxx/xxx:10200 16/08/05 17:24:31 WARN retry.RetryInvocationHandler: Exception while invoking ApplicationClientProtocolPBClientImpl.getApplicationReport over rm1. Not retrying because try once and fail. org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1470379565464_0001' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:331) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:194) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176) at com.sun.proxy.$Proxy18.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:436) at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:481) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:160) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:83) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1470379565464_0001' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:331) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417) at
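A minimal sketch of the kind of client-side handling being asked for; the class name FriendlyStatusLookup and the exact message wording are assumptions, not the actual ApplicationCLI change.
{code:title=hypothetical friendlier status lookup}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

public class FriendlyStatusLookup {
  // Replace the raw ApplicationNotFoundException stack trace with a one-line
  // hint that the submission may never have been accepted before the RM restarted.
  public static void printStatus(YarnClient client, ApplicationId appId) throws Exception {
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      System.out.println("Application " + appId + " is in state "
          + report.getYarnApplicationState());
    } catch (ApplicationNotFoundException e) {
      System.err.println("Application " + appId + " is not known to the RM; "
          + "the previous submission may not have been successful.");
    }
  }
}
{code}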
[jira] [Created] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS
Yesha Vora created YARN-6137: Summary: Yarn client implicitly invoke ATS client which accesses HDFS Key: YARN-6137 URL: https://issues.apache.org/jira/browse/YARN-6137 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora YarnClient implicitly creates an ATS (timeline) client even when the caller does not need it, and the ATS client code tries to access HDFS. Because of that, the service hits a GSS exception. All services that use YarnClient cannot be expected to change to accommodate this behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
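A possible client-side workaround sketch, assuming the caller genuinely does not need ATS: disable the timeline service in the client configuration so YarnClient should skip creating the timeline client and therefore should not touch HDFS. The factory class name is hypothetical, and this only works around rather than fixes the reported behavior.
{code:title=hypothetical YarnClient setup with ATS disabled}
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NoAtsYarnClientFactory {
  public static YarnClient create() {
    YarnConfiguration conf = new YarnConfiguration();
    // With the timeline service disabled, YarnClient should not construct a
    // TimelineClient and so should not access HDFS on behalf of ATS.
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, false);
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    return client;
  }
}
{code}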
[jira] [Created] (YARN-5941) Slider handles "site.mawo-site.per.component" for multiple components incorrectly
Yesha Vora created YARN-5941: Summary: Slider handles "site.mawo-site.per.component" for multiple components incorrectly Key: YARN-5941 URL: https://issues.apache.org/jira/browse/YARN-5941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora When multiple components are started by Slider and each component should have its own property file, "per.component" should be set to true for each component. {code:title=component1} 'properties': { 'site.app-site.job-builder.class': 'xxx', 'site.app-site.rpc.server.hostname': 'xxx', 'site.app-site.per.component': 'true' } {code} {code:title=component2} 'properties': { 'site.app-site.job-builder.class.component2': 'yyy', 'site.app-site.rpc.server.hostname.component2': 'yyy', 'site.app-site.per.component': 'true' } {code} With this configuration, one of the components' Slider-generated property files ends up containing "per.component"="true". {code:title=property file for component1} #Generated by Apache Slider #Tue Nov 29 23:20:25 UTC 2016 per.component=true job-builder.class=xxx rpc.server.hostname=xxx{code} {code:title=property file for component2} #Generated by Apache Slider #Tue Nov 29 23:20:25 UTC 2016 job-builder.class.component2=yyy rpc.server.hostname.component2=yyy{code} "per.component" should not be added to any component's property file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
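A minimal sketch of the expected behavior (the writer class below is made up, not Slider's actual code): the "per.component" marker is consumed as a directive and stripped before the generated property file is written, so it never appears for either component.
{code:title=hypothetical property-file writer that strips per.component}
import java.io.IOException;
import java.io.Writer;
import java.util.Map;
import java.util.Properties;

public class ComponentPropertiesWriter {
  private static final String PER_COMPONENT_KEY = "per.component";

  // Copy every site property except the per.component directive into the
  // generated file, so the marker itself is never written out.
  public static void write(Map<String, String> siteProps, Writer out) throws IOException {
    Properties generated = new Properties();
    for (Map.Entry<String, String> entry : siteProps.entrySet()) {
      if (!PER_COMPONENT_KEY.equals(entry.getKey())) {
        generated.setProperty(entry.getKey(), entry.getValue());
      }
    }
    generated.store(out, "Generated by Apache Slider");
  }
}
{code}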
[jira] [Created] (YARN-5497) Use different color for Undefined and Succeeded final state in application page
Yesha Vora created YARN-5497: Summary: Use different color for Undefined and Succeeded final state in application page Key: YARN-5497 URL: https://issues.apache.org/jira/browse/YARN-5497 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Assignee: Yesha Vora Priority: Trivial When an application is in the Running state, the Final Status value is set to "Undefined". When an application has succeeded, the Final Status value is set to "SUCCEEDED". The Yarn UI uses the same green color for both of these final statuses. It would be good to have a different color for each final status value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-5496) Make Node Heatmap Chart categories clickable
Yesha Vora created YARN-5496: Summary: Make Node Heatmap Chart categories clickable Key: YARN-5496 URL: https://issues.apache.org/jira/browse/YARN-5496 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Make the Node Heatmap Chart categories clickable. The heatmap chart has a few categories such as "10% used", "30% used", etc. These tags should be clickable: if a user clicks on the "10% used" tag, the chart should show the hosts with 10% usage. This would be a useful feature for clusters with thousands of nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-5494) Nodes page throws "Sorry Error Occurred" message
Yesha Vora created YARN-5494: Summary: Nodes page throws "Sorry Error Occurred" message Key: YARN-5494 URL: https://issues.apache.org/jira/browse/YARN-5494 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Priority: Critical Steps to reproduce: * Click on Nodes. This page lists the nodes of the cluster. * Click on one of the nodes, such as node1. (It redirects to the http://:4200/#/yarn-node/:31924/:8042 url.) This URL shows a "Sorry Error Occurred" error. {code}jquery.js:8630 XMLHttpRequest cannot load http://xxx:xxx:8042/ws/v1/node. Cross origin requests are only supported for protocol schemes: http, data, chrome, chrome-extension, https, chrome-extension-resource.send @ jquery.js:8630 ember.debug.js:30877 Error: Adapter operation failed at new Error (native) at Error.EmberError (http://xxx:4200/assets/vendor.js:25278:21) at Error.ember$data$lib$adapters$errors$$AdapterError (http://xxx:4200/assets/vendor.js:91198:50) at Class.handleResponse (http://xxx:4200/assets/vendor.js:92494:16) at Class.hash.error (http://xxx:4200/assets/vendor.js:92574:33) at fire (http://xxx:4200/assets/vendor.js:3306:30) at Object.fireWith [as rejectWith] (http://xxx:4200/assets/vendor.js:3418:7) at done (http://xxx:4200/assets/vendor.js:8473:14) at XMLHttpRequest. (http://xxx:4200/assets/vendor.js:8806:9) at Object.send (http://xxx:4200/assets/vendor.js:8837:10)onerrorDefault @ ember.debug.js:30877{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-5493) In leaf queue page, list applications should only show applications from that leaf queues
Yesha Vora created YARN-5493: Summary: In leaf queue page, list applications should only show applications from that leaf queues Key: YARN-5493 URL: https://issues.apache.org/jira/browse/YARN-5493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Steps to reproduce: * Create 2 queues * Go to the leaf queue page at the http://:/#/yarn-queue-apps/ url * Click on the application list. Currently it lists all the applications; instead, it should list only the applications from that particular leaf queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
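The actual fix belongs in the UI2 Ember frontend, but the intended behavior can be sketched with the Java client API; the class name and the idea of filtering by queue name below are illustrative assumptions, not the UI2 change itself.
{code:title=illustrative per-queue application filter}
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class LeafQueueAppsFilter {
  // Keep only the applications whose queue matches the leaf queue being viewed,
  // rather than showing every application in the cluster.
  public static List<ApplicationReport> appsInQueue(YarnClient client, String leafQueue)
      throws Exception {
    List<ApplicationReport> all = client.getApplications();
    return all.stream()
        .filter(app -> leafQueue.equals(app.getQueue()))
        .collect(Collectors.toList());
  }
}
{code}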