[jira] [Created] (YARN-9707) [UI2] App Attempt state data is missing

2019-07-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-9707:


 Summary: [UI2] App Attempt state data is missing
 Key: YARN-9707
 URL: https://issues.apache.org/jira/browse/YARN-9707
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Steps:

1) Launch a Dshell application or Yarn service application.
2) Go to the app attempt page Grid view. The State column shows N/A.

Yarn UI1 shows the app attempt state for Running and Finished applications. This 
ability is missing from UI2.

UI2 uses the REST call below. This call does not return the app attempt 
state.

{code:title=ws/v1/cluster/apps/application_1563946396350_0002/appattempts?_=1564004553389}
<appAttempts>
  <appAttempt>
    <id>1</id>
    <startTime>1564004524290</startTime>
    <finishedTime>1564004541852</finishedTime>
    <containerId>container_1563946396350_0002_01_01</containerId>
    <nodeHttpAddress>xx:yy</nodeHttpAddress>
    <nodeId>xx:yy</nodeId>
    <logsLink>http://ixx:yy/node/containerlogs/container_1563946396350_0002_01_01/hrt_qa</logsLink>
    <appAttemptId>appattempt_1563946396350_0002_01</appAttemptId>
  </appAttempt>
</appAttempts>
{code}






[jira] [Created] (YARN-9706) [UI2] App Attempt state missing from Graph view

2019-07-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-9706:


 Summary: [UI2] App Attempt state missing from Graph view
 Key: YARN-9706
 URL: https://issues.apache.org/jira/browse/YARN-9706
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora



1) Launch a Dshell application or Yarn service application.
2) Go to the app attempt page Grid view. The State column shows N/A.
3) Go to the app attempt Graph view. State data is not present on this page.

Apparently, app attempt data is only shown in the Grid view. The Grid and Graph views 
should show the same details.






[jira] [Created] (YARN-9705) [UI2] AM Node Web UI should not display full link

2019-07-25 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-9705:


 Summary: [UI2] AM Node Web UI should not display full link
 Key: YARN-9705
 URL: https://issues.apache.org/jira/browse/YARN-9705
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


The App Attempt page shows the AM Node Web UI as a full link. 
It should not print the full link as the display text. Rather, it should 
display the AM node name and link it to the node.






[jira] [Created] (YARN-9704) [UI2] Fix Pending, Allocated, Reserved Containers information for Fair Scheduler

2019-07-25 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-9704:


 Summary: [UI2] Fix Pending, Allocated, Reserved Containers 
information for Fair Scheduler
 Key: YARN-9704
 URL: https://issues.apache.org/jira/browse/YARN-9704
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


YARN UI2 shows "Pending, Allocated, Reserved Containers" information for the Fair 
Scheduler. Here, the pending container count is not printed: UI2 shows 
",0,0" instead of "0,0,0".

In UI1, the same information is displayed as the number of active and 
pending applications:

Num Active Applications:    0
Num Pending Applications:   0

It is not clear from UI2 what "Pending, Allocated, 
Reserved Containers" is intended to show: is it really containers, or apps?






[jira] [Created] (YARN-9609) Nodemanager Web Service should return logAggregationType for each file

2019-06-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-9609:


 Summary: Nodemanager Web Service should return logAggregationType 
for each file
 Key: YARN-9609
 URL: https://issues.apache.org/jira/browse/YARN-9609
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.1.1
Reporter: Yesha Vora


Steps:
1) Launch a sleeper yarn service
2) When the sleeper component is in READY state, call the NM web service to list the 
container's log files and their log aggregation status:
http://NMHost:NMPort/ws/v1/node/containers/CONTAINERID/logs

The NM web service response shows a single, common log aggregation type for all 
files.
Instead, the NM web service should return a log aggregation type for each file.
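
For illustration only, a per-file entry carrying its own aggregation type might be modeled roughly as in the sketch below. The class and field names are hypothetical, not the actual NM web service data objects; only the per-file logAggregationType reflects the proposal in this issue.

{code:java}
// Hypothetical sketch of a per-file log entry that carries its own
// aggregation type, as proposed above. Names are illustrative only.
public class ContainerLogFileEntry {
    public enum LogAggregationType { AGGREGATED, LOCAL }

    private final String fileName;
    private final long fileSize;
    private final String lastModifiedTime;
    private final LogAggregationType logAggregationType; // per file, not per container

    public ContainerLogFileEntry(String fileName, long fileSize,
                                 String lastModifiedTime,
                                 LogAggregationType logAggregationType) {
        this.fileName = fileName;
        this.fileSize = fileSize;
        this.lastModifiedTime = lastModifiedTime;
        this.logAggregationType = logAggregationType;
    }

    public String getFileName() { return fileName; }
    public long getFileSize() { return fileSize; }
    public String getLastModifiedTime() { return lastModifiedTime; }
    public LogAggregationType getLogAggregationType() { return logAggregationType; }
}
{code}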








[jira] [Created] (YARN-9570) Application in pending-ordering-policy is not considered during container allocation

2019-05-20 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-9570:


 Summary: Application in pending-ordering-policy is not considered 
during container allocation
 Key: YARN-9570
 URL: https://issues.apache.org/jira/browse/YARN-9570
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Yesha Vora


This is a 5-node cluster with 15 GB total capacity.

1) Configure the Capacity Scheduler and set max cluster priority=10
2) Launch app1 with no priority and wait for it to occupy the full cluster
application_1558135983180_0001 is launched with Priority=0
3) Launch app2 with priority=2 and check it is in ACCEPTED state
application_1558135983180_0002 is launched with Priority=2
4) Launch app3 with priority=3 and check it is in ACCEPTED state
application_1558135983180_0003 is launched with Priority=3
5) Kill a container from app1
6) Verify that app3, with the higher priority, goes to RUNNING state.

When max-application-master-percentage is set to 0.1, app2 goes to RUNNING 
state even though app3 has higher priority.

Root cause:
In the CapacityScheduler LeafQueue, there are two ordering lists:

If the queue's total application master usage is below 
maxAMResourcePerQueuePercent, the app is added to the "ordering-policy" 
list.
Otherwise, the app is added to the "pending-ordering-policy" list.
During allocation, only apps in "ordering-policy" are considered.
When an app finishes, the queue config changes, or a node is added/removed, 
"pending-ordering-policy" is reconsidered and some of its apps are moved to 
"ordering-policy".

This behavior leads to the issue in this JIRA:

The cluster has 15 GB of resources and max-application-master-percentage is set to 
0.1, so the queue can accept at most 2 GB of AM resource (1.5 GB rounded up to the 
1 GB allocation increment), which equals 2 applications.
When app2 is submitted, it is added to ordering-policy.
When app3 is submitted, it is added to pending-ordering-policy.
When we kill app1, it does not finish immediately. Instead, it stays in 
"ordering-policy" until all of app1's containers are released, which keeps app3 
in pending-ordering-policy.
So app3 cannot pick up any resource released by app1.
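
For illustration, a simplified sketch of the admission behavior described above (this is not the actual CapacityScheduler code; the class, fields, and limit calculation are stand-ins):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified sketch of the LeafQueue behavior described above.
public class AmAdmissionSketch {
    static final long QUEUE_CAPACITY_MB = 15 * 1024;    // 15 GB cluster/queue
    static final double MAX_AM_PERCENT = 0.1;           // max-application-master-percentage
    static final long AM_SIZE_MB = 1024;                // 1 GB AM container

    final Queue<String> orderingPolicy = new ArrayDeque<>();        // considered for allocation
    final Queue<String> pendingOrderingPolicy = new ArrayDeque<>(); // not considered
    long amUsedMb = 0;

    void submit(String appId) {
        // 15 GB * 0.1 = 1.5 GB, rounded up to the 1 GB increment => 2 GB, i.e. two 1 GB AMs.
        long amLimitMb = (long) Math.ceil(QUEUE_CAPACITY_MB * MAX_AM_PERCENT / AM_SIZE_MB) * AM_SIZE_MB;
        if (amUsedMb + AM_SIZE_MB <= amLimitMb) {
            orderingPolicy.add(appId);
            amUsedMb += AM_SIZE_MB;
        } else {
            // Re-evaluated only when an app finishes, the queue config changes,
            // or a node is added/removed.
            pendingOrderingPolicy.add(appId);
        }
    }

    public static void main(String[] args) {
        AmAdmissionSketch queue = new AmAdmissionSketch();
        queue.submit("app1"); // occupies the cluster
        queue.submit("app2"); // still under the AM limit -> ordering-policy
        queue.submit("app3"); // over the AM limit -> pending-ordering-policy
        System.out.println("ordering-policy: " + queue.orderingPolicy);
        System.out.println("pending-ordering-policy: " + queue.pendingOrderingPolicy);
    }
}
{code}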






[jira] [Created] (YARN-8913) Add helper scripts to launch MaWo App to run Hadoop unit tests on Hadoop Cluster

2018-10-18 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8913:


 Summary: Add helper scripts to launch MaWo App to run Hadoop unit 
tests on Hadoop Cluster
 Key: YARN-8913
 URL: https://issues.apache.org/jira/browse/YARN-8913
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


The MaWo application can be used to run Hadoop unit tests faster on a Hadoop cluster.
 Develop helper scripts to orchestrate the end-to-end workflow for running Hadoop 
UT using the MaWo app.

Prerequisites:
 * A Hadoop cluster with HDFS and YARN installed
 * The Docker on YARN feature enabled

Helper scripts
 * MaWo_Driver
 ** Create a docker image with the latest Hadoop source code
 ** Create the payload for the MaWo app (this is the input to the MaWo app, where each 
MaWo Task = the UT execution of one Hadoop module)
 ** Upload the payload file to HDFS
 ** Update MaWo-Launch.json to resolve RM_HOST, Docker image, etc. dynamically
 ** Launch the MaWo app in the Hadoop cluster






[jira] [Created] (YARN-8912) Fix MaWo_Config to read WORKER_WORK_SPACE and MASTER_TASKS_STATUS_LOG_PATH from env

2018-10-18 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8912:


 Summary: Fix MaWo_Config to read WORKER_WORK_SPACE and 
MASTER_TASKS_STATUS_LOG_PATH from env
 Key: YARN-8912
 URL: https://issues.apache.org/jira/browse/YARN-8912
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


Fix the MaWo configuration to read MASTER_TASKS_STATUS_LOG_PATH and 
WORKER_WORK_SPACE from the environment.
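
A minimal sketch of reading these two variables from the environment with fallbacks (the default paths below are assumptions for illustration, not the actual MaWo defaults):

{code:java}
// Minimal sketch: prefer the environment variables named in this issue and
// fall back to a default when they are unset. The defaults are assumptions.
public class MaWoEnvConfigSketch {

    static String envOrDefault(String name, String defaultValue) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }

    public static void main(String[] args) {
        String workerWorkSpace =
            envOrDefault("WORKER_WORK_SPACE", "/tmp/mawo/worker");
        String masterTasksStatusLogPath =
            envOrDefault("MASTER_TASKS_STATUS_LOG_PATH", "/tmp/mawo/master-tasks-status.log");
        System.out.println("worker workspace       = " + workerWorkSpace);
        System.out.println("master task status log = " + masterTasksStatusLogPath);
    }
}
{code}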






[jira] [Created] (YARN-8901) Restart "NEVER" policy does not work with component dependency

2018-10-17 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8901:


 Summary: Restart "NEVER" policy does not work with component 
dependency
 Key: YARN-8901
 URL: https://issues.apache.org/jira/browse/YARN-8901
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Scenario:

1) Launch an application with two components: master and worker. Here, worker 
depends on master (worker should be launched only after master is launched).
2) Set restart_policy = NEVER for both master and worker.

{code:title=sample launch.json}
{
  "name": "mawo-hadoop-ut",
  "artifact": {
    "type": "DOCKER",
    "id": "xxx"
  },
  "configuration": {
    "env": {
      "YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK": "hadoop"
    },
    "properties": {
      "docker.network": "hadoop"
    }
  },
  "components": [{
    "dependencies": [],
    "resource": {
      "memory": "2048",
      "cpus": "1"
    },
    "name": "master",
    "run_privileged_container": true,
    "number_of_containers": 1,
    "launch_command": "start master",
    "restart_policy": "NEVER"
  }, {
    "dependencies": ["master"],
    "resource": {
      "memory": "8072",
      "cpus": "1"
    },
    "name": "worker",
    "run_privileged_container": true,
    "number_of_containers": 10,
    "launch_command": "start worker",
    "restart_policy": "NEVER"
  }],
  "lifetime": -1,
  "version": 1.0
}
{code}

When the restart policy is set to NEVER, the AM never launches the worker component. 
It gets stuck with the message below.
{code}
2018-10-17 15:11:58,560 [Component  dispatcher] INFO  component.Component - 
[COMPONENT master] Transitioned from FLEXING to STABLE on CHECK_STABLE event.
2018-10-17 15:11:58,560 [pool-7-thread-1] INFO  instance.ComponentInstance - 
[COMPINSTANCE master-0 : container_e41_1539027682947_0020_01_02] 
Transitioned from STARTED to READY on BECOME_READY event
2018-10-17 15:11:58,560 [pool-7-thread-1] INFO  component.Component - 
[COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are 
ready or the dependent component has not completed 
2018-10-17 15:12:28,556 [pool-7-thread-1] INFO  component.Component - 
[COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are 
ready or the dependent component has not completed 
2018-10-17 15:12:58,556 [pool-7-thread-1] INFO  component.Component - 
[COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are 
ready or the dependent component has not completed 
2018-10-17 15:13:28,556 [pool-7-thread-1] INFO  component.Component - 
[COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are 
ready or the dependent component has not completed 
2018-10-17 15:13:58,556 [pool-7-thread-1] INFO  component.Component - 
[COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are 
ready or the dependent component has not completed 
2018-10-17 15:14:28,556 [pool-7-thread-1] INFO  component.Component - 
[COMPONENT worker]: Dependency master not satisfied, only 1 of 1 instances are 
ready or the dependent component has not completed {code}

The 'NEVER' restart policy expects the master component to finish before starting 
the workers, but the master component cannot finish the job without workers. Thus, it 
creates a deadlock.

The logic for the 'NEVER' restart policy should be fixed to allow worker components 
to be launched as soon as the master component is in the READY state, as in the sketch below.
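
A simplified sketch of the proposed check (illustrative only; the real logic lives in the yarn-service Component class and differs in structure):

{code:java}
import java.util.List;

// Sketch of the dependency check proposed above. With restart_policy NEVER,
// requiring the dependency to COMPLETE deadlocks (master waits for workers,
// workers wait for master), so READY should be treated as satisfied.
public class DependencyCheckSketch {
    enum InstanceState { STARTED, READY, COMPLETED }
    enum RestartPolicy { ALWAYS, ON_FAILURE, NEVER }

    static boolean dependencySatisfied(RestartPolicy policy, List<InstanceState> instances) {
        if (policy == RestartPolicy.NEVER) {
            // Proposed behavior: launch dependents once every instance is READY
            // (or already finished).
            return instances.stream()
                .allMatch(s -> s == InstanceState.READY || s == InstanceState.COMPLETED);
        }
        // Other policies: all instances READY (simplified).
        return instances.stream().allMatch(s -> s == InstanceState.READY);
    }

    public static void main(String[] args) {
        // One master instance in READY state -> the worker may now be launched.
        System.out.println(dependencySatisfied(
            RestartPolicy.NEVER, List.of(InstanceState.READY)));
    }
}
{code}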






[jira] [Created] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8754:


 Summary: [UI2] Improve terms on Component Instance page 
 Key: YARN-8754
 URL: https://issues.apache.org/jira/browse/YARN-8754
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


The component instance page has "node" and "host" fields, which represent 
"bare_host" and "hostname" respectively.

From the UI2 page that is not clear. Thus, the table column should be renamed from 
"node" to "bare host".

This page also has "Host URL", which is hard-coded to N/A, so this field should be 
removed from the table.






[jira] [Created] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8753:


 Summary: [UI2] Lost nodes representation missing from Nodemanagers 
Chart
 Key: YARN-8753
 URL: https://issues.apache.org/jira/browse/YARN-8753
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.1
Reporter: Yesha Vora


The Nodemanagers chart is present on the Cluster Overview and Nodes->Nodes Status pages. 
This chart does not show nodemanagers that are LOST.






[jira] [Created] (YARN-8666) Remove application tab from Yarn Queue Page

2018-08-14 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8666:


 Summary: Remove application tab from Yarn Queue Page
 Key: YARN-8666
 URL: https://issues.apache.org/jira/browse/YARN-8666
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


The Yarn UI2 Queue page has an Application button. This button does not redirect to 
any other page. In addition, the running-applications table is already available 
on the same page.

Thus, there is no need for an Application button on the Queue page.








[jira] [Created] (YARN-8629) Container cleanup failed

2018-08-06 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8629:


 Summary: Container cleanup failed
 Key: YARN-8629
 URL: https://issues.apache.org/jira/browse/YARN-8629
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


When an application failed to launch a container successfully, the cleanup of the 
container also failed with the message below.
{code}
2018-08-06 03:28:20,351 WARN  resources.CGroupsHandlerImpl 
(CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
tasks file.
java.io.FileNotFoundException: 
/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn-tmp-cxx/container_e02_156898541_0010_20_02/tasks
 (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.checkAndDeleteCgroup(CGroupsHandlerImpl.java:507)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.deleteCGroup(CGroupsHandlerImpl.java:542)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.postComplete(CGroupsCpuResourceHandlerImpl.java:238)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.postComplete(ResourceHandlerChain.java:111)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.postComplete(LinuxContainerExecutor.java:964)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reapContainer(LinuxContainerExecutor.java:787)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:821)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:161)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:57)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:748)
2018-08-06 03:28:20,372 WARN  resources.CGroupsHandlerImpl 
(CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
tasks file.{code}






[jira] [Created] (YARN-8599) Build Master module for MaWo app

2018-07-27 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8599:


 Summary: Build Master module for MaWo app
 Key: YARN-8599
 URL: https://issues.apache.org/jira/browse/YARN-8599
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


The Master component of the MaWo application is responsible for driving end-to-end job 
execution. Its responsibilities are:
 * Get the job definition and create a queue of Tasks
 * Assign Tasks to Workers
 * Manage the Workers' lifecycle
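
A hypothetical sketch of those responsibilities (illustrative names only, not the actual MaWo master):

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: keep a queue of task commands built from the job
// definition and hand one task out per worker heartbeat.
public class MasterSketch {
    private final Queue<String> taskQueue = new ConcurrentLinkedQueue<>();
    private final Map<String, Long> workerLastSeen = new ConcurrentHashMap<>();

    MasterSketch(List<String> taskCommands) {
        taskQueue.addAll(taskCommands); // job definition -> queue of Tasks
    }

    void registerWorker(String workerId) {
        workerLastSeen.put(workerId, System.currentTimeMillis());
    }

    // Called on each worker heartbeat: record liveness and assign the next task.
    String heartbeat(String workerId) {
        workerLastSeen.put(workerId, System.currentTimeMillis());
        return taskQueue.poll(); // null means no work left
    }

    void deregisterWorker(String workerId) {
        workerLastSeen.remove(workerId);
    }
}
{code}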






[jira] [Created] (YARN-8598) Build Master Job Module for MaWo Application

2018-07-27 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8598:


 Summary: Build Master Job Module for MaWo Application
 Key: YARN-8598
 URL: https://issues.apache.org/jira/browse/YARN-8598
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


A job in the MaWo application is a collection of Tasks. A Job consists of a setup 
task, a list of tasks, and a teardown task.
 * JobBuilder
 ** SimpleTaskJobBuilder: should be able to parse a simple job description file; 
in this format, each line is considered one Task
 ** SimpleTaskJsonJobBuilder: utility to parse a JSON job description file
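
A hypothetical sketch of the line-per-task parsing described above (names are illustrative, not the actual MaWo code):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each non-empty line of a simple job description file
// becomes one task command.
public class SimpleTaskJobBuilderSketch {

    public static List<String> buildTaskCommands(String jobDescriptionFile) throws IOException {
        List<String> taskCommands = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(jobDescriptionFile))) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                taskCommands.add(trimmed); // each line is considered one Task
            }
        }
        return taskCommands;
    }

    public static void main(String[] args) throws IOException {
        buildTaskCommands(args[0]).forEach(System.out::println);
    }
}
{code}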

 






[jira] [Created] (YARN-8597) Build Worker utility for MaWo Application

2018-07-27 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8597:


 Summary: Build Worker utility for MaWo Application
 Key: YARN-8597
 URL: https://issues.apache.org/jira/browse/YARN-8597
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


The Worker is responsible for executing Tasks.
 * Worker
 ** Create a Worker class which drives the worker life cycle
 ** Create a WorkAssignment protocol. It should handle worker register/deregister 
and heartbeats
 ** Lifecycle: register the worker, run the setup task, get Tasks from the master and 
execute them using a TaskRunner, run the teardown task
 * TaskRunner
 ** Simple Task Runner: should be able to execute a simple task
 ** Composite Task Runner: should be able to execute a composite task
 * TaskWallTimeLimiter
 ** Create a utility which can abort a task if its execution time exceeds the 
task timeout
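
A hypothetical sketch of task execution with a wall-time limit (illustrative only; registration, heartbeats, and the master protocol are omitted):

{code:java}
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of TaskWallTimeLimiter-style execution: run a task
// command and abort it if it exceeds the task timeout.
public class WorkerTaskRunnerSketch {

    static int runWithWallTimeLimit(String command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder("bash", "-c", command).inheritIO().start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // abort: execution time exceeded the task timeout
            return -1;
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // Lifecycle sketch: setup task -> tasks from the master -> teardown task.
        runWithWallTimeLimit("echo setup", 60);
        runWithWallTimeLimit("echo 'task fetched from master'", 3600);
        runWithWallTimeLimit("echo teardown", 60);
    }
}
{code}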

 






[jira] [Created] (YARN-8587) Delays are noticed to launch docker container

2018-07-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8587:


 Summary: Delays are noticed to launch docker container
 Key: YARN-8587
 URL: https://issues.apache.org/jira/browse/YARN-8587
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Yesha Vora


Launch a dshell application. Wait for the application to go into RUNNING state.
{code:java}
yarn  jar /xx/hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
"sleep 300" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
-shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=httpd:0.1 -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-xx.jar
{code}
Find the container allocation. Run the docker inspect command for the docker containers 
launched by the app.

Sometimes, the container is allocated to the NM but the docker PID is not up.
{code:java}
Command ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null xxx 
"sudo su - -c \"docker ps  -a | grep 
container_e02_1531189225093_0003_01_02\" root" failed after 0 retries 
{code}






[jira] [Created] (YARN-8580) yarn.resourcemanager.am.max-attempts is not respected for yarn services

2018-07-25 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8580:


 Summary: yarn.resourcemanager.am.max-attempts is not respected for 
yarn services
 Key: YARN-8580
 URL: https://issues.apache.org/jira/browse/YARN-8580
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.1
Reporter: Yesha Vora


1) Max AM attempts is set to 100 on all nodes (including the gateway).
{code}
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>100</value>
</property>
{code}
2) Start a Yarn service (Hbase tarball) application
3) Kill the AM 20 times

Here, the app fails with the diagnostics below.

{code}
bash-4.2$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -status 
application_1532481557746_0001
18/07/25 18:43:34 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xxx:10200
18/07/25 18:43:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
18/07/25 18:43:34 INFO conf.Configuration: found resource resource-types.xml at 
file:/etc/hadoop/3.0.0.0-1634/0/resource-types.xml
Application Report : 
Application-Id : application_1532481557746_0001
Application-Name : hbase-tarball-lr
Application-Type : yarn-service
User : hbase
Queue : default
Application Priority : 0
Start-Time : 1532481864863
Finish-Time : 1532522943103
Progress : 100%
State : FAILED
Final-State : FAILED
Tracking-URL : 
https://xxx:8090/cluster/app/application_1532481557746_0001
RPC Port : -1
AM Host : N/A
Aggregate Resource Allocation : 252150112 MB-seconds, 164141 
vcore-seconds
Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
Log Aggregation Status : SUCCEEDED
Diagnostics : Application application_1532481557746_0001 failed 20 
times (global limit =100; local limit is =20) due to AM Container for 
appattempt_1532481557746_0001_20 exited with  exitCode: 137
Failing this attempt.Diagnostics: [2018-07-25 12:49:00.784]Container killed on 
request. Exit code is 137
[2018-07-25 12:49:03.045]Container exited with a non-zero exit code 137. 
[2018-07-25 12:49:03.045]Killed by external signal
For more detailed output, check the application tracking page: 
https://xxx:8090/cluster/app/application_1532481557746_0001 Then click on links 
to logs of each attempt.
. Failing the application.
Unmanaged Application : false
Application Node Label Expression : 
AM container Node Label Expression : 
TimeoutType : LIFETIME  ExpiryTime : 2018-07-25T22:26:15.419+   
RemainingTime : 0seconds
{code}






[jira] [Created] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-25 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8579:


 Summary: New AM attempt could not retrieve previous attempt 
component data
 Key: YARN-8579
 URL: https://issues.apache.org/jira/browse/YARN-8579
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Yesha Vora


Steps:
1) Launch httpd-docker
2) Wait for app to be in STABLE state
3) Run validation for app (It takes around 3 mins)
4) Stop all Zks 
5) Wait 60 sec
6) Kill AM
7) wait for 30 sec
8) Start all ZKs
9) Wait for application to finish
10) Validate expected containers of the app

Expected behavior:
A new AM attempt should start, and the docker containers launched by the first attempt 
should be recovered by the new attempt.

Actual behavior:
A new AM attempt starts, but it cannot recover the first attempt's docker containers 
because it cannot read the component details from ZK.
Thus, it requests new containers for all components.

{code}
2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into registry
2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
containers from previous attempt.
2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not read 
component paths: 
`/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components': 
No such file or directory: KeeperErrorCode = NoNode for 
/registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
container_e08_1531977563978_0015_01_03 from previous attempt
2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
found in registry for container container_e08_1531977563978_0015_01_03 from 
previous attempt, releasing
2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
initial evaluation of component httpd
2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
httpd]: 2 instances.
2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
Requesting for 2 container(s){code}







[jira] [Created] (YARN-8551) Build Common module for MaWo application

2018-07-18 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8551:


 Summary: Build Common module for MaWo application
 Key: YARN-8551
 URL: https://issues.apache.org/jira/browse/YARN-8551
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


Build the Common module for the MaWo application.
This module should include the definition of a Task. A Task should contain:
* TaskID
* Task Command
* Task Environment
* Task Timeout
* Task Type
** Simple Task
*** A single task
** Composite Task
*** A composition of multiple simple tasks
** Die Task
*** The last task, executed after a job is finished
** Null Task
*** A no-op task
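
A minimal, hypothetical sketch of such a Task definition (names are illustrative, not the actual MaWo classes):

{code:java}
import java.util.Map;

// Hypothetical sketch of the Task definition described above.
public class Task {
    public enum TaskType {
        SIMPLE,    // a single task
        COMPOSITE, // a composition of multiple simple tasks
        DIE,       // the last task, executed after a job is finished
        NULL       // a no-op task
    }

    private final String taskId;
    private final String command;
    private final Map<String, String> environment;
    private final long timeoutMillis;
    private final TaskType type;

    public Task(String taskId, String command, Map<String, String> environment,
                long timeoutMillis, TaskType type) {
        this.taskId = taskId;
        this.command = command;
        this.environment = environment;
        this.timeoutMillis = timeoutMillis;
        this.type = type;
    }

    public String getTaskId() { return taskId; }
    public String getCommand() { return command; }
    public Map<String, String> getEnvironment() { return environment; }
    public long getTimeoutMillis() { return timeoutMillis; }
    public TaskType getType() { return type; }
}
{code}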










[jira] [Created] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-12 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8522:


 Summary: Application fails with InvalidResourceRequestException
 Key: YARN-8522
 URL: https://issues.apache.org/jira/browse/YARN-8522
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Launch multiple streaming apps simultaneously. Here, one of the 
applications sometimes fails with the stack trace below.

{code}
18/07/02 07:14:32 INFO retry.RetryInvocationHandler: java.net.ConnectException: 
Call From xx.xx.xx.xx/xx.xx.xx.xx to xx.xx.xx.xx:8032 failed on connection 
exception: java.net.ConnectException: Connection refused; For more details see: 
 http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
after sleeping for 3ms.
18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: Invocation 
returned exception: 
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, only one resource request with * is allowed
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
 on [rm2], so propagating back to caller.
18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/hrt_qa/.staging/job_1530515284077_0007
18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, only one resource request with * is allowed
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

Streaming Command Failed!{code}






[jira] [Created] (YARN-8485) Privileged container app launch is failing intermittently

2018-06-30 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8485:


 Summary: Privileged container app launch is failing intermittently
 Key: YARN-8485
 URL: https://issues.apache.org/jira/browse/YARN-8485
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


Privileged application fails intermittently 
{code:java}
yarn  jar 
/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
  -shell_command "sleep 30" -num_containers 1 -shell_env 
YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
Here, the container launch fails with 'Privileged containers are disabled' even 
though Docker privileged containers are enabled in the cluster.
{code:java|title=nm log}
2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
(DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - All 
checks pass. Launching privileged container for : 
container_e01_1530220647587_0001_01_02
2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
container_e01_1530220647587_0001_01_02 is : 29
2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
container-launch with container ID: container_e01_1530220647587_0001_01_02 
and exit code: 29
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Launch container failed
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Container id: 
container_e01_1530220647587_0001_01_02
2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exit code: 29
2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
failed
2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell error output: check privileges 
failed for user: hrt_qa, error code: 0
2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
for user: hrt_qa
2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
docker error code=11, error message='Privileged containers are disabled'
2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) -
2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 
4
2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : run as user is hrt_qa
2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : requested yarn user is hrt_qa
2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating script paths...
2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating local dirs...
2018-06-28 

[jira] [Created] (YARN-8465) Dshell docker container gets marked as lost after NM restart

2018-06-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8465:


 Summary: Dshell docker container gets marked as lost after NM 
restart
 Key: YARN-8465
 URL: https://issues.apache.org/jira/browse/YARN-8465
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.1
Reporter: Yesha Vora


Scenario:
1) Launch a dshell application
{code}
yarn  jar 
/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar
  -shell_command "sleep 500" -num_containers 2 -shell_env 
YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xx/httpd:0.1 -jar 
/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar{code}

2) Wait for the app to be in a stable state 
(container_e01_1529968198450_0001_01_02 is running on host7 and 
container_e01_1529968198450_0001_01_03 is running on host5)
3) Restart the NM on host7

Here, the dshell application fails with the error below.
{code}18/06/25 23:35:30 INFO distributedshell.Client: Got application report 
from ASM for, appId=1, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, 
service:  }, appDiagnostics=, appMasterHost=host9/xxx, appQueue=default, 
appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=RUNNING, 
distributedFinalState=UNDEFINED, 
appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, 
appUser=hbase
18/06/25 23:35:31 INFO distributedshell.Client: Got application report from ASM 
for, appId=1, clientToAMToken=null, appDiagnostics=Application Failure: desired 
= 2, completed = 2, allocated = 2, failed = 1, diagnostics = [2018-06-25 
23:35:28.000]Container exited with a non-zero exit code 154
[2018-06-25 23:35:28.001]Container exited with a non-zero exit code 154
, appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, 
appStartTime=1529969211776, yarnAppState=FINISHED, 
distributedFinalState=FAILED, 
appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, 
appUser=hbase
18/06/25 23:35:31 INFO distributedshell.Client: Application did finished 
unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
loop
18/06/25 23:35:31 ERROR distributedshell.Client: Application failed to complete 
successfully{code}

Here, the docker container is marked as LOST after completion.
{code}
2018-06-25 23:35:27,970 WARN  runtime.DockerLinuxContainerRuntime 
(DockerLinuxContainerRuntime.java:signalContainer(1034)) - Signal docker 
container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Liveliness check failed for PID: 423695. Container may have already completed.
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.executeLivelinessCheck(DockerLinuxContainerRuntime.java:1208)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1026)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:905)
at 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:284)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:721)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-06-25 23:35:27,975 WARN  nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:signalContainer(762)) - Error in signalling 
container 423695 with NULL; exit = -1
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Signal docker container failed
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1036)
at 

[jira] [Created] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-06-14 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8429:


 Summary: Improve diagnostic message when artifact is not set 
properly
 Key: YARN-8429
 URL: https://issues.apache.org/jira/browse/YARN-8429
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Yesha Vora


Steps:

1) Create a launch json file. Replace "artifact" with "artifacts".
2) Launch the yarn service app with the CLI.

The application launch fails with the error below.
{code}
[xxx xxx]$ yarn app -launch test2-2 test.json 
18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xxx:10200
18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xxx:10200
18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition from 
local FS: /xxx/test.json
18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be absolute 
path: /xxx/xxx
{code}

The artifact field is not mandatory. However, if that field is specified 
incorrectly, the launch command should fail with a proper error.
Here, the error message about Dest_file is misleading.
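
One way to get a clearer error for a misspelled field is to fail on unknown JSON properties when the service definition is parsed. A minimal sketch with Jackson follows; the Service/Artifact POJOs are stand-ins, not the real yarn-service classes, and whether the service client already parses this way is not asserted here.

{code:java}
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch: rejecting unknown fields turns a typo like "artifacts" into a clear
// "Unrecognized field" message instead of an unrelated error later on.
public class StrictServiceJsonSketch {

    public static class Artifact {
        public String type;
        public String id;
    }

    public static class Service {
        public String name;
        public Artifact artifact; // note: singular "artifact"
    }

    public static void main(String[] args) throws Exception {
        String json =
            "{\"name\":\"test2-2\",\"artifacts\":{\"type\":\"DOCKER\",\"id\":\"xxx\"}}";
        ObjectMapper mapper = new ObjectMapper()
            .enable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
        // Throws UnrecognizedPropertyException: Unrecognized field "artifacts" ...
        Service service = mapper.readValue(json, Service.class);
        System.out.println(service.name);
    }
}
{code}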






[jira] [Created] (YARN-8413) Flow activity page is failing with "Timeline server failed with an error"

2018-06-11 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8413:


 Summary: Flow activity page is failing with "Timeline server 
failed with an error"
 Key: YARN-8413
 URL: https://issues.apache.org/jira/browse/YARN-8413
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.1
Reporter: Yesha Vora


The Flow Activity page fails to load with "Timeline server failed with an error".

This page uses an incorrect flows call, 
"https://localhost:8188/ws/v2/timeline/flows?_=1528755339836", and it fails 
to load.

1) It uses localhost instead of the ATS v2 hostname
2) It uses the ATS v1.5 HTTP port instead of the ATS v2 HTTPS port

The correct REST call is "https://:/ws/v2/timeline/flows?_=1528755339836"






[jira] [Created] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-08 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8409:


 Summary: ActiveStandbyElectorBasedElectorService is failing with 
NPE
 Key: YARN-8409
 URL: https://issues.apache.org/jira/browse/YARN-8409
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Yesha Vora


In an RM HA environment, kill the ZK leader and then perform an RM failover.

Sometimes, the active RM gets an NPE and fails to come up successfully.
{code:java}

2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
(ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL mechanism.

2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
(ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
'Client'

2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1146)) 
- Session 0x0 for server null, unexpected error, closing socket connection and 
attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

2018-06-08 10:31:03,344 INFO  service.AbstractService 
(AbstractService.java:noteFailure(267)) - Service 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
 failed in state INITED

java.lang.NullPointerException

at 
org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)

at 
org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)

at 
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)

at 
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)

at 
org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)

at 
org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)

at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)

at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)

at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)

at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)

at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)

at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)

2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
(ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}






[jira] [Created] (YARN-8407) Container launch exception in AM log should be printed in ERROR level

2018-06-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8407:


 Summary: Container launch exception in AM log should be printed in 
ERROR level
 Key: YARN-8407
 URL: https://issues.apache.org/jira/browse/YARN-8407
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


When a container launch fails because the docker image is not available, the failure 
is logged at INFO level in the AM log.
Container launch failures should be logged at ERROR level.

Steps:
Launch an httpd yarn-service application with an invalid docker image.

 
{code:java}
2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
container_e05_1528335963594_0001_01_02]: 
container_e05_1528335963594_0001_01_02 completed. Reinsert back to pending 
list and requested a new container.

exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from 
container-launch.

Container id: container_e05_1528335963594_0001_01_02

Exit code: 7

Exception message: Launch container failed

Shell error output: Unable to find image 'xxx/httpd:0.1' locally

Trying to pull repository xxx/httpd ...

/usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on yyy: 
no such host.

See '/usr/bin/docker-current run --help'.


Shell output: main : command provided 4

main : run as user is hbase

main : requested yarn user is hbase

Creating script paths...

Creating local dirs...

Getting exit code file...

Changing effective user to root...

Wrote the exit code 7 to 
/grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode

[2018-06-07 01:51:02.393]Diagnostic message from attempt :

[2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last 
4096 bytes of stderr.txt :

[2018-06-07 01:51:32.428]Could not find 
nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid
 in any of the directories

2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT 
on STOP event{code}
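
A minimal sketch of the proposed logging change (illustrative; the stand-in method below is not the actual component-instance code):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch: report container-launch/exit failures at ERROR instead of INFO.
public class ContainerCompletionLoggingSketch {
    private static final Logger LOG =
        LoggerFactory.getLogger(ContainerCompletionLoggingSketch.class);

    static void onContainerCompleted(String containerId, int exitStatus, String diagnostics) {
        if (exitStatus != 0) {
            // e.g. docker image not available -> this should be visible as an error
            LOG.error("{} failed with exitStatus={}, diagnostics={}",
                containerId, exitStatus, diagnostics);
        } else {
            LOG.info("{} completed successfully", containerId);
        }
    }
}
{code}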






[jira] [Created] (YARN-8386) App log can not be viewed from Logs tab in secure cluster

2018-05-31 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8386:


 Summary:  App log can not be viewed from Logs tab in secure cluster
 Key: YARN-8386
 URL: https://issues.apache.org/jira/browse/YARN-8386
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


App logs cannot be viewed from the UI2 Logs tab.

Steps:
1) Launch a yarn service
2) Let the application finish and go to the Logs tab to view the AM log

Here, the service AM log API fails with a 401 authentication error.

{code}
Request URL: 
http://xxx:8188/ws/v1/applicationhistory/containers/container_e09_1527737134553_0034_01_01/logs/serviceam.log?_=1527799590942
Request Method: GET
Status Code: 401 Authentication required
 Response:

Error 401 Authentication required

HTTP ERROR 401
Problem accessing /ws/v1/applicationhistory/containers/container_e09_1527737134553_0034_01_01/logs/serviceam.log.
Reason:
    Authentication required
{code}



[jira] [Created] (YARN-8368) yarn app start cli should print applicationId

2018-05-25 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8368:


 Summary: yarn app start cli should print applicationId
 Key: YARN-8368
 URL: https://issues.apache.org/jira/browse/YARN-8368
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


The yarn app -start CLI should print the application Id, similar to the yarn app -launch command.
{code:java}
bash-4.2$ yarn app -start hbase-app-test

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

18/05/24 15:15:53 INFO client.RMProxy: Connecting to ResourceManager at 
xxx/xxx:8050

18/05/24 15:15:54 INFO client.RMProxy: Connecting to ResourceManager at 
xxx/xxx:8050

18/05/24 15:15:55 INFO client.ApiServiceClient: Service hbase-app-test is 
successfully started.{code}






[jira] [Created] (YARN-8330) An extra container got launched by RM for yarn-service

2018-05-19 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8330:


 Summary: An extra container got launched by RM for yarn-service
 Key: YARN-8330
 URL: https://issues.apache.org/jira/browse/YARN-8330
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


Steps:
1) Launch the Hbase tarball app
2) List containers for the Hbase tarball app

{code}
/usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
appattempt_1525463491331_0006_01
WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.
WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.
WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xxx:10200
18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
Total number of containers :5
Container-IdStart Time Finish Time   
StateHost   Node Http Address   
 LOG-URL
container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018
   N/A RUNNINGxxx:25454  http://xxx:8042
http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
Fri May 04 22:34:26 + 2018   N/A 
RUNNINGxxx:25454  http://xxx:8042
http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
Fri May 04 22:34:15 + 2018   N/A 
RUNNINGxxx:25454  http://xxx:8042
http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
Fri May 04 22:34:56 + 2018   N/A 
RUNNINGxxx:25454  http://xxx:8042
http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
Fri May 04 22:34:56 + 2018   N/A
nullxxx:25454  http://xxx:8042
http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}

Total expected containers = 4 (3 component containers + 1 AM). Instead, the RM lists 
5 containers.
container_e06_1525463491331_0006_01_04 is in a null state.
The Yarn service used containers 02, 03, and 05 for its components. There is no log 
related to container 04 in the NM or AM; only one line is printed in the RM log:
{code}
2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
(RMContainerImpl.java:handle(489)) - container_e06_1525463491331_0006_01_04 
Container Transitioned from NEW to RESERVED{code}






[jira] [Created] (YARN-8318) IP address in component page shows N/A

2018-05-17 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8318:


 Summary: IP address in component page shows N/A
 Key: YARN-8318
 URL: https://issues.apache.org/jira/browse/YARN-8318
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


The Component page shows the IP address value as N/A. It should print the IP address 
of the docker container.






[jira] [Created] (YARN-8316) Diagnostic message should improve when yarn service fails to launch due to ATS unavailability

2018-05-17 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8316:


 Summary: Diagnostic message should improve when yarn service fails 
to launch due to ATS unavailability
 Key: YARN-8316
 URL: https://issues.apache.org/jira/browse/YARN-8316
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


Scenario:

1) shutdown ATS
2) launch yarn service.

The yarn service launch command fails with the stack trace below. No diagnostic 
message is available in the response.
{code:java}
bash-4.2$ yarn app -launch hbase-sec /tmp/hbase-secure.yar 
WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.
WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.
18/05/17 13:24:43 INFO client.RMProxy: Connecting to ResourceManager at 
xxx/xxx:8050
18/05/17 13:24:44 INFO client.AHSProxy: Connecting to Application History 
server at localhost/xxx:10200
18/05/17 13:24:44 INFO client.RMProxy: Connecting to ResourceManager at 
xxx/xxx:8050
18/05/17 13:24:44 INFO client.AHSProxy: Connecting to Application History 
server at localhost/127.0.0.1:10200
18/05/17 13:24:44 INFO client.ApiServiceClient: Loading service definition from 
local FS: /tmp/hbase-secure.yar
18/05/17 13:26:06 ERROR client.ApiServiceClient: 
bash-4.2$ echo $?
56{code}
The error message should surface the underlying ConnectionRefused exception.
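
A minimal sketch of what the client error path could do instead of printing an empty ERROR line: unwrap the exception chain and show the root cause (here a ConnectException). This is illustrative, not the actual ApiServiceClient code; the helper name and the simulated failure are assumptions.
{code:java}
import java.io.IOException;
import java.net.ConnectException;

public class LaunchErrorReporting {
  // Walk the cause chain and return the innermost exception's class and message.
  static String rootCause(Throwable t) {
    Throwable cur = t;
    while (cur.getCause() != null) {
      cur = cur.getCause();
    }
    return cur.getClass().getSimpleName() + ": " + cur.getMessage();
  }

  public static void main(String[] args) {
    try {
      // Simulates the failure mode: the timeline service at xxx:10200 is down.
      throw new IOException("Failed to contact Application History server",
          new ConnectException("Connection refused: xxx:10200"));
    } catch (IOException e) {
      // What the CLI could print instead of a bare "ERROR client.ApiServiceClient:".
      System.err.println("Service launch failed: " + rootCause(e));
    }
  }
}
{code}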



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8309) Diagnostic message for yarn service app failure due to token renewal should be improved

2018-05-16 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8309:


 Summary: Diagnostic message for yarn service app failure due to token 
renewal should be improved
 Key: YARN-8309
 URL: https://issues.apache.org/jira/browse/YARN-8309
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


When a YARN service application fails due to a token renewal issue, the 
diagnostic message is unclear.
{code:java}
Application application_1526413043392_0002 failed 20 times due to AM Container 
for appattempt_1526413043392_0002_20 exited with exitCode: 1 Failing this 
attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from container-launch. 
Container id: container_e04_1526413043392_0002_20_01 Exit code: 1 Exception 
message: Launch container failed Shell output: main : command provided 1 main : 
run as user is hbase main : requested yarn user is hbase Getting exit code 
file... Creating script paths... Writing pid file... Writing to tmp file 
/grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp
 Writing to cgroup task files... Creating local dirs... Launching container... 
Getting exit code file... Creating script paths... [2018-05-15 
23:15:28.806]Container exited with a non-zero exit code 1. Error file: 
prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 
23:15:28.807]Container exited with a non-zero exit code 1. Error file: 
prelaunch.err. Last 4096 bytes of prelaunch.err : For more detailed output, 
check the application tracking page: 
https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on links 
to logs of each attempt. . Failing the application.{code}
Here, the diagnostic message should be improved to state that the AM is failing 
due to token renewal issues.
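
As an illustration only (not the RM implementation), and assuming the failure cause seen in the AM log is available when the diagnostics string is assembled, an explicit hint could be appended whenever a token-expiry signature is present:
{code:java}
public class TokenFailureHint {
  private static final String INVALID_TOKEN_MARKER =
      "org.apache.hadoop.security.token.SecretManager$InvalidToken";

  // Append a human-readable hint when the diagnostics contain a token-expiry trace.
  static String annotate(String diagnostics) {
    if (diagnostics != null && diagnostics.contains(INVALID_TOKEN_MARKER)) {
      return diagnostics
          + "\nHint: the AM failed because its HDFS delegation token expired and"
          + " could not be renewed. Check token renewal and max-lifetime settings.";
    }
    return diagnostics;
  }

  public static void main(String[] args) {
    String diag = "AM container exited with exitCode 1; caused by "
        + INVALID_TOKEN_MARKER + ": token ... is expired";
    System.out.println(annotate(diag));
  }
}
{code}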



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-16 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8308:


 Summary: Yarn service app fails due to issues with Renew Token
 Key: YARN-8308
 URL: https://issues.apache.org/jira/browse/YARN-8308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


Run a YARN service application beyond dfs.namenode.delegation.token.max-lifetime. 
The application then fails with the error below.

{code}
2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service Service 
Master failed in state INITED
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1437)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
at 
org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
at 
org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
at 
org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
at 
org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app master
2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
service master
org.apache.hadoop.service.ServiceStateException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
at 
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
at 
org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)

[jira] [Created] (YARN-8302) ATS v2 should handle HBase connection issue properly

2018-05-15 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8302:


 Summary: ATS v2 should handle HBase connection issue properly
 Key: YARN-8302
 URL: https://issues.apache.org/jira/browse/YARN-8302
 Project: Hadoop YARN
  Issue Type: Bug
  Components: ATSv2
Affects Versions: 3.1.0
Reporter: Yesha Vora


The ATS v2 call times out with the error below when it cannot connect to the HBase instance.
{code}
bash-4.2$ curl -i -k -s -1  -H 'Content-Type: application/json'  -H 'Accept: 
application/json' --max-time 5   --negotiate -u : 
'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
{code}

{code:title=ATS log}
2018-05-15 23:10:03,623 INFO  client.RpcRetryingCallerImpl 
(RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, 
retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
failed on connection exception: 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=xxx,17020,1526348294182, seqNum=-1
2018-05-15 23:10:13,651 INFO  client.RpcRetryingCallerImpl 
(RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, 
retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
failed on connection exception: 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=xxx,17020,1526348294182, seqNum=-1
2018-05-15 23:10:23,730 INFO  client.RpcRetryingCallerImpl 
(RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, 
retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
failed on connection exception: 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=xxx,17020,1526348294182, seqNum=-1
2018-05-15 23:10:33,788 INFO  client.RpcRetryingCallerImpl 
(RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, 
retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
failed on connection exception: 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,
,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=xxx,17020,1526348294182, seqNum=-1{code}

There are two issues here:
1) Check why ATS cannot connect to HBase.
2) In case of a connection error, the ATS call should not simply time out; it 
should fail with a proper error (see the sketch below).
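
For the second point, a hedged sketch of bounding the HBase client deadlines so the reader fails fast and can return a clear error instead of letting the HTTP call hang. The config keys are standard HBase client settings; the wrapper method and chosen timeouts are illustrative assumptions, not the TimelineReader code.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class BoundedTimelineStorageAccess {
  static Connection connectWithDeadline(Configuration base) throws IOException {
    Configuration conf = HBaseConfiguration.create(base);
    conf.setInt("hbase.client.retries.number", 3);         // bound the retry count
    conf.setInt("hbase.rpc.timeout", 5000);                // per-RPC deadline, ms
    conf.setInt("hbase.client.operation.timeout", 15000);  // whole-operation deadline, ms
    try {
      return ConnectionFactory.createConnection(conf);
    } catch (IOException e) {
      // Fail fast with a message the REST layer can surface (e.g. as 503)
      // instead of the curl side timing out with no response body.
      throw new IOException("Timeline storage (HBase) is not reachable: "
          + e.getMessage(), e);
    }
  }
}
{code}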



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8297) Incorrect ATS Url used for Wire encrypted cluster

2018-05-15 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8297:


 Summary: Incorrect ATS Url used for Wire encrypted cluster
 Key: YARN-8297
 URL: https://issues.apache.org/jira/browse/YARN-8297
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


"Service" page uses incorrect web url for ATS in wire encrypted env. For ATS 
urls, it uses https protocol with http port.

This issue causes all ATS call to fail and UI does not display component 
details.

url used: 
https://xxx:8198/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320

expected url : 
https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320
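
A hedged sketch of deriving the reader base URL consistently from yarn.http.policy, so the scheme and the port always come from the same setting. The config keys are the standard timeline reader keys; the helper is illustrative, not the UI2 code.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class TimelineReaderUrl {
  static String readerBaseUrl(Configuration conf) {
    boolean https = "HTTPS_ONLY".equals(conf.get("yarn.http.policy", "HTTP_ONLY"));
    // Pick the address and the scheme from the same policy; never mix them.
    String address = https
        ? conf.get("yarn.timeline-service.reader.webapp.https.address")
        : conf.get("yarn.timeline-service.reader.webapp.address");
    return (https ? "https://" : "http://") + address;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.http.policy", "HTTPS_ONLY");
    conf.set("yarn.timeline-service.reader.webapp.address", "xxx:8198");
    conf.set("yarn.timeline-service.reader.webapp.https.address", "xxx:8199");
    System.out.println(readerBaseUrl(conf));   // prints https://xxx:8199
  }
}
{code}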



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart

2018-05-14 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8290:


 Summary: Yarn application failed to recover with "Error Launching 
job : User is not set in the application report" error after RM restart
 Key: YARN-8290
 URL: https://issues.apache.org/jira/browse/YARN-8290
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Scenario:

1) Start 5 streaming application in background

2) Kill Active RM and cause RM failover

After the RM failover, the application failed with the error below.

{code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: 
Invocation returned exception on [rm2] : 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1517520038847_0003' doesn't exist in RM. Please check that 
the job submission was successful.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
, so propagating back to caller.
18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application 
application_1517520038847_0003
18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/hrt_qa/.staging/job_1517520038847_0003
18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is not 
set in the application report
Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8283) [Umbrella] MaWo - A Master Worker framework on top of YARN Services

2018-05-11 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8283:


 Summary: [Umbrella] MaWo - A Master Worker framework on top of 
YARN Services
 Key: YARN-8283
 URL: https://issues.apache.org/jira/browse/YARN-8283
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Yesha Vora


There is a need for an application / framework to handle Master-Worker 
scenarios. There are existing frameworks on YARN which can be used to run a job 
in distributed manner such as Mapreduce, Tez, Spark etc. But master-worker 
use-cases usually are force-fed into one of these existing frameworks which 
have been designed primarily around data-parallelism instead of generic Master 
Worker type of computations.

In this JIRA, we’d like to contribute MaWo - a YARN Service based framework 
that achieves this goal. The overall goal is to create an app that can take an 
input job specification with tasks, their durations and have a Master dish the 
tasks off to a predetermined set of workers. The components will be responsible 
for making sure that the tasks and the overall job finish in specific time 
durations.

We have been using a version of the MaWo framework for running unit tests of 
Hadoop in a parallel manner on an existing Hadoop YARN cluster. What typically 
takes 10 hours to run all of Hadoop project’s unit-tests can finish under 20 
minutes on a MaWo app of about 50 containers!

YARN-3307 was an original attempt at this but through a first-class YARN app. 
In this JIRA, we instead use YARN Service for orchestration so that our code 
can focus on the core Master Worker paradigm.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8271) Change UI2 labeling of certain tables to avoid confusion

2018-05-09 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8271:


 Summary: Change UI2 labeling of certain tables to avoid confusion
 Key: YARN-8271
 URL: https://issues.apache.org/jira/browse/YARN-8271
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


Update the labeling of a few items to avoid confusion:
 - Cluster Page (/cluster-overview):
 -- "Finished apps" --> "Finished apps from all users"
 -- "Running apps" --> "Running apps from all users"
 - Queues overview page (/yarn-queues/root) && Per queue page 
(/yarn-queue/root/apps)
 -- "Running Apps" --> "Running apps from all users in queue "
 - Nodes Page - side bar for all pages 
 -- "List of Applications" --> "List of Applications on this node"
 -- "List of Containers" --> "List of Containers on this node"
 - Yarn Tools
 ** Yarn Tools --> YARN Tools
 - Queue page
 ** Running Apps: --> Running Apps From All Users



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8266) Clicking on application from cluster view should redirect to application attempt page

2018-05-08 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8266:


 Summary: Clicking on application from cluster view should redirect 
to application attempt page
 Key: YARN-8266
 URL: https://issues.apache.org/jira/browse/YARN-8266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


Steps:

1) Start one application
 2) Go to cluster overview page
 3) Click on applicationId from Cluster Resource Usage By Application

This action redirects to 
[http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005]. This is an 
invalid URL and does not show any details.

Instead, it should redirect to the attempt page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8253) HTTPS Ats v2 api call fails with "bad HTTP parsed"

2018-05-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8253:


 Summary: HTTPS Ats v2 api call fails with "bad HTTP parsed"
 Key: YARN-8253
 URL: https://issues.apache.org/jira/browse/YARN-8253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: ATSv2
Affects Versions: 3.1.0
Reporter: Yesha Vora


When the YARN HTTP policy is set to HTTPS_ONLY, ATS v2 should use the HTTPS address.

Here, the ATS v2 call fails with the error below. (The 0x16 byte in the ATS log is 
the first byte of a TLS handshake, i.e. TLS traffic is hitting a plain-HTTP connector.)
{code:java}
[hrt_qa@xxx root]$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 
'Accept: application/json' --negotiate -u: 
'https://xxx:8199/ws/v2/timeline/apps/application_1525238789838_0003/entities/COMPONENT_INSTANCE?fields=ALL'

[hrt_qa@xxx root]$ echo $?

35{code}
{code:java|title=Ats v2}
2018-05-02 05:45:40,427 WARN  http.HttpParser (HttpParser.java:(1832)) - 
Illegal character 0x16 in state=START for buffer 
HeapByteBuffer@dba438[p=1,l=222,c=8192,r=221]={\x16<<<\x03\x01\x00\xD9\x01\x00\x00\xD5\x03\x03;X\xEd\xD1orq...\x01\x05\x01\x06\x01\x02\x01\x04\x02\x05\x02\x06\x02\x02\x02>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
2018-05-02 05:45:40,428 WARN  http.HttpParser (HttpParser.java:parseNext(1435)) 
- bad HTTP parsed: 400 Illegal character 0x16 for 
HttpChannelOverHttp@2efbda6c{r=0,c=false,a=IDLE,uri=null}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8251) Clicking on app link at the header goes to Diagnostics Tab instead of AppAttempt Tab

2018-05-04 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8251:


 Summary: Clicking on app link at the header goes to Diagnostics 
Tab instead of AppAttempt Tab
 Key: YARN-8251
 URL: https://issues.apache.org/jira/browse/YARN-8251
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


1. Click on Application link under Application tab
2. It goes to Specific Application page with appAttempt Tab
3. Click on the "Application \[app ID\]" link at the top
4. It goes to Specific Application page with Diagnostic Tab instead of 
appAttempt Tab

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8231) Dshell application fails when one of the docker container gets killed

2018-04-27 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8231:


 Summary: Dshell application fails when one of the docker container 
gets killed
 Key: YARN-8231
 URL: https://issues.apache.org/jira/browse/YARN-8231
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


1) Launch dshell application
{code}
yarn  jar hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
"sleep 300" -num_containers 2 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
-shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest 
-keep_containers_across_application_attempts -jar 
/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
2) Kill container_1524681858728_0012_01_02

Expected behavior:
The application should start a new container instance and finish successfully.

Actual behavior:
The application failed as soon as the container was killed.

{code:title=AM log}
18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: Got response from RM 
for container ask, completedCnt=1
18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: 
appattempt_1524681858728_0012_01 got container status for 
containerID=container_1524681858728_0012_01_02, state=COMPLETE, 
exitStatus=137, diagnostics=[2018-04-27 23:05:09.310]Container killed on 
request. Exit code is 137
[2018-04-27 23:05:09.331]Container exited with a non-zero exit code 137. 
[2018-04-27 23:05:09.332]Killed by external signal

18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Got response from RM 
for container ask, completedCnt=1
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: 
appattempt_1524681858728_0012_01 got container status for 
containerID=container_1524681858728_0012_01_03, state=COMPLETE, 
exitStatus=0, diagnostics=
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Container completed 
successfully., containerId=container_1524681858728_0012_01_03
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application 
completed. Stopping running containers
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application 
completed. Signalling finish to RM
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Diagnostics., 
total=2, completed=2, allocated=2, failed=1
18/04/27 23:08:46 INFO impl.AMRMClientImpl: Waiting for application to be 
successfully unregistered.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8215) Ats v2 returns invalid YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS

2018-04-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8215:


 Summary: Ats v2 returns invalid 
YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS
 Key: YARN-8215
 URL: https://issues.apache.org/jira/browse/YARN-8215
 Project: Hadoop YARN
  Issue Type: Bug
  Components: ATSv2
Affects Versions: 3.1.0
Reporter: Yesha Vora


Steps:

1) Run Httpd yarn service

2) Stop Httpd yarn service

3) Validate application attempt page.

ATS v2 call is returning invalid data for 
YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS
{code:java}
http://xxx:8198/ws/v2/timeline/apps/application_1524698886838_0005/entities/YARN_CONTAINER?fields=ALL&_=1524705653569{code}

{code}
[{"metrics":[{"type":"SINGLE_VALUE","id":"CPU","aggregationOp":"NOP","values":{"1524704571187":0}},{"type":"SINGLE_VALUE","id":"MEMORY","aggregationOp":"NOP","values":{"1524704562126":30973952}}],"events":[{"id":"YARN_CONTAINER_FINISHED","timestamp":1524704571552,"info":{}},{"id":"YARN_NM_CONTAINER_LOCALIZATION_FINISHED","timestamp":1524704488410,"info":{}},{"id":"YARN_CONTAINER_CREATED","timestamp":1524704482976,"info":{}},{"id":"YARN_NM_CONTAINER_LOCALIZATION_STARTED","timestamp":1524704482976,"info":{}}],"createdtime":1524704482973,"idprefix":9223370512150292834,"id":"container_e12_1524698886838_0005_01_03","info":{"YARN_CONTAINER_STATE":"COMPLETE","YARN_CONTAINER_ALLOCATED_HOST":"xxx","YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS":"xxx:0","YARN_CONTAINER_ALLOCATED_VCORE":1,"FROM_ID":"yarn-cluster!hbase!httpd-docker-config-3!1524704463727!application_1524698886838_0005!YARN_CONTAINER!9223370512150292834!container_e12_1524698886838_0005_01_03","YARN_CONTAINER_ALLOCATED_PORT":25454,"UID":"yarn-cluster!application_1524698886838_0005!YARN_CONTAINER!9223370512150292834!container_e12_1524698886838_0005_01_03","YARN_CONTAINER_ALLOCATED_MEMORY":1024,"SYSTEM_INFO_PARENT_ENTITY":{"type":"YARN_APPLICATION_ATTEMPT","id":"appattempt_1524698886838_0005_01"},"YARN_CONTAINER_EXIT_STATUS":-105,"YARN_CONTAINER_ALLOCATED_PRIORITY":"0","YARN_CONTAINER_DIAGNOSTICS_INFO":"[2018-04-26
 01:02:34.486]Container killed by the ApplicationMaster.\n[2018-04-26 
01:02:45.616]Container killed on request. Exit code is 137\n[2018-04-26 
01:02:49.387]Container exited with a non-zero exit code 137. 
\n","YARN_CONTAINER_FINISHED_TIME":1524704571552},"relatesto":{},"configs":{},"isrelatedto":{},"type":"YARN_CONTAINER"},{"metrics":[{"type":"SINGLE_VALUE","id":"CPU","aggregationOp":"NOP","values":{"1524704564690":6}},{"type":"SINGLE_VALUE","id":"MEMORY","aggregationOp":"NOP","values":{"1524704564690":3710976}}],"events":[{"id":"YARN_CONTAINER_FINISHED","timestamp":1524704567244,"info":{}},{"id":"YARN_NM_CONTAINER_LOCALIZATION_FINISHED","timestamp":1524704487938,"info":{}},{"id":"YARN_CONTAINER_CREATED","timestamp":1524704483140,"info":{}},{"id":"YARN_NM_CONTAINER_LOCALIZATION_STARTED","timestamp":1524704483140,"info":{}}],"createdtime":1524704482919,"idprefix":9223370512150292888,"id":"container_e12_1524698886838_0005_01_04","info":{"YARN_CONTAINER_STATE":"COMPLETE","YARN_CONTAINER_ALLOCATED_HOST":"xxx","YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS":"xxx:0","YARN_CONTAINER_ALLOCATED_VCORE":1,"FROM_ID":"yarn-cluster!hbase!httpd-docker-config-3!1524704463727!application_1524698886838_0005!YARN_CONTAINER!9223370512150292888!container_e12_1524698886838_0005_01_04","YARN_CONTAINER_ALLOCATED_PORT":25454,"UID":"yarn-cluster!application_1524698886838_0005!YARN_CONTAINER!9223370512150292888!container_e12_1524698886838_0005_01_04","YARN_CONTAINER_ALLOCATED_MEMORY":1024,"SYSTEM_INFO_PARENT_ENTITY":{"type":"YARN_APPLICATION_ATTEMPT","id":"appattempt_1524698886838_0005_01"},"YARN_CONTAINER_EXIT_STATUS":-105,"YARN_CONTAINER_ALLOCATED_PRIORITY":"1","YARN_CONTAINER_DIAGNOSTICS_INFO":"[2018-04-26
 01:02:34.500]Container killed by the ApplicationMaster.\n[2018-04-26 
01:02:45.771]Container killed on request. Exit code is 137\n[2018-04-26 
01:02:47.242]Container exited with a non-zero exit code 137. 

[jira] [Created] (YARN-8211) Yarn registry dns log finds BufferUnderflowException

2018-04-25 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8211:


 Summary: Yarn registry dns log finds BufferUnderflowException
 Key: YARN-8211
 URL: https://issues.apache.org/jira/browse/YARN-8211
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


The YARN registry DNS server constantly hits BufferUnderflowException.
{code:java}
2018-04-25 01:36:56,139 WARN  concurrent.ExecutorHelper 
(ExecutorHelper.java:logThrowableFromAfterExecute(50)) - Execution exception 
when running task in RegistryDNS 76

2018-04-25 01:36:56,139 WARN  concurrent.ExecutorHelper 
(ExecutorHelper.java:logThrowableFromAfterExecute(63)) - Caught exception in 
thread RegistryDNS 76:

java.nio.BufferUnderflowException

        at java.nio.Buffer.nextGetIndex(Buffer.java:500)

        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:135)

        at 
org.apache.hadoop.registry.server.dns.RegistryDNS.getMessgeLength(RegistryDNS.java:820)

        at 
org.apache.hadoop.registry.server.dns.RegistryDNS.nioTCPClient(RegistryDNS.java:767)

        at 
org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:846)

        at 
org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:843)

        at java.util.concurrent.FutureTask.run(FutureTask.java:266)

        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748){code}
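
For context, DNS over TCP prefixes each message with a two-byte length; reading that prefix from a buffer that does not yet hold both bytes throws exactly this BufferUnderflowException. A minimal, illustrative guard (not the RegistryDNS code) is to check remaining() and wait for more data instead of throwing:
{code:java}
import java.nio.ByteBuffer;

public class DnsLengthPrefix {
  /** Returns the message length, or -1 if the two-byte prefix is not fully buffered yet. */
  static int readMessageLength(ByteBuffer buf) {
    if (buf.remaining() < 2) {
      return -1;                       // wait for more bytes instead of throwing
    }
    return buf.getShort() & 0xffff;    // DNS-over-TCP length is an unsigned short
  }

  public static void main(String[] args) {
    ByteBuffer partial = ByteBuffer.allocate(2);
    partial.put((byte) 0);             // only one byte received so far
    partial.flip();
    System.out.println(readMessageLength(partial));   // -1, no exception

    ByteBuffer full = ByteBuffer.allocate(2);
    full.putShort((short) 512);
    full.flip();
    System.out.println(readMessageLength(full));      // 512
  }
}
{code}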
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7878) Docker container IP detail missing when service is in STABLE state

2018-04-16 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora resolved YARN-7878.
--
Resolution: Duplicate

> Docker container IP detail missing when service is in STABLE state
> --
>
> Key: YARN-7878
> URL: https://issues.apache.org/jira/browse/YARN-7878
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Priority: Critical
>
> Scenario
>  1) Launch Hbase on docker app
>  2) Validate yarn service status using cli
> {code:java}
> {"name":"hbase-app-with-docker","id":"application_1517516543573_0012","artifact":{"id":"hbase-centos","type":"DOCKER"},"lifetime":3519,"components":[{"name":"hbasemaster","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_MASTER_OPTS":"-Xmx2048m
>  
> -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_02","ip":"10.0.0.9","hostname":"hbasemaster-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029963,"bare_host":"xxx","component_name":"hbasemaster-0"}],"launch_command":"sleep
>  15; /usr/hdp/current/hbase-master/bin/hbase master 
> start","number_of_containers":1,"run_privileged_container":false},{"name":"regionserver","dependencies":["hbasemaster"],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_REGIONSERVER_OPTS":"-XX:CMSInitiatingOccupancyFraction=70
>  -Xmx2048m 
> -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.regionserver.hostname":"${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_05","state":"READY","launch_time":1517533059022,"bare_host":"xxx","component_name":"regionserver-0"}],"launch_command":"sleep
>  15; /usr/hdp/current/hbase-regionserver/bin/hbase regionserver 
> start","number_of_containers":1,"run_privileged_container":false},{"name":"hbaseclient","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"1024"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_03","ip":"10.0.0.8","hostname":"hbaseclient-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029964,"bare_host":"xxx","component_name":"hbaseclient-0"}],"launch_command":"sleep
>  
> 

[jira] [Created] (YARN-8167) Improve Diagnostic message when a user without privileged permission deploys a privileged app

2018-04-16 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8167:


 Summary: Improve Diagnostic message when a user without privileged 
permission deploys a privileged app
 Key: YARN-8167
 URL: https://issues.apache.org/jira/browse/YARN-8167
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


Steps:

1) Validate hrt_qa user is not mentioned in 
yarn.nodemanager.runtime.linux.docker.privileged-containers.acl 
2) launch a dshell app with 
YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true
{code}
/usr/hdp/current/hadoop-yarn-client/bin/yarn jar 
/usr/hdp/3.0.0.0/hadoop-yarn/hadoop-yarn-applications-distributedshell-3.0.0.jar
 -shell_command "sleep 30" -num_containers 1 -shell_env 
YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
/usr/hdp/3.0.0.0/hadoop-yarn/hadoop-yarn-applications-distributedshell-3.0.0.jar{code}

The application fails to launch the container, but the diagnostic message of the 
app is not helpful. It only returns "Diagnostics., total=1, completed=1, allocated=1, failed=1".

The AM log also does not contain a proper error message:
{code:title=AppMaster.stderr}
18/04/16 20:45:56 INFO distributedshell.ApplicationMaster: 
appattempt_1523387473707_0049_01 got container status for 
containerID=container_e24_1523387473707_0049_01_02, state=COMPLETE, 
exitStatus=-1, diagnostics=[2018-04-16 20:45:49.062]Exception from 
container-launch.
Container id: container_e24_1523387473707_0049_01_02
Exit code: -1
Exception message: 
Shell output: 

[2018-04-16 20:45:49.085]Container exited with a non-zero exit code -1.
[2018-04-16 20:45:49.085]Container exited with a non-zero exit code -1.

18/04/16 20:45:56 INFO distributedshell.ApplicationMaster: Application 
completed. Stopping running containers{code}

The diagnostic message should be improved to explicitly state that "user hrt_qa 
does not have permission to launch privileged containers".
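
A hedged sketch of the kind of up-front check that could produce such a message. The config key is the one quoted above; the class, exception and exact wording are illustrative assumptions, not the container-executor code.
{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;

public class PrivilegedContainerAclCheck {
  static final String ACL_KEY =
      "yarn.nodemanager.runtime.linux.docker.privileged-containers.acl";

  static void checkAllowed(Configuration conf, String user) throws IOException {
    List<String> acl = Arrays.asList(conf.getTrimmedStrings(ACL_KEY));
    if (!acl.contains(user)) {
      throw new IOException("User " + user + " does not have permission to launch"
          + " privileged containers: not listed in " + ACL_KEY);
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.set(ACL_KEY, "admin,yarn");
    checkAllowed(conf, "hrt_qa");   // throws with an explicit, user-facing diagnostic
  }
}
{code}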



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8166) Service AppId page throws HTTP Error 401

2018-04-16 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8166:


 Summary: Service AppId page throws HTTP Error 401
 Key: YARN-8166
 URL: https://issues.apache.org/jira/browse/YARN-8166
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Steps:

1) Launch a yarn service in unsecure cluster

2) Go to component info page for sleeper-0

3) click on sleeper link

http://xxx:8088/ui2/#/yarn-component-instances/sleeper/components?service=yesha-sleeper&=application_1518804855867_0002

Above url fails with HTTP Error 401

 {code}
401, Authorization required.
Please check your security settings.
 {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8142) yarn service application stops when AM is killed

2018-04-10 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8142:


 Summary: yarn service application stops when AM is killed
 Key: YARN-8142
 URL: https://issues.apache.org/jira/browse/YARN-8142
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


Steps:

1) Launch sleeper job ( non-docker yarn service)

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
fault-test-am-sleeper 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms

18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
application_1522887500374_0010

Exit Code: 0{code}

2) Wait for sleeper component to be up

3) Kill AM process PID

 

Expected behavior:

New attempt of AM will be started. The pre-existing container will keep running

 

Actual behavior:

Application finishes with State : FINISHED and Final-State : ENDED

New attempt was never launched

Note:

When the AM gets a SIGTERM, it gracefully shuts itself down, but it shuts the 
entire app down instead of letting it continue to run for another attempt.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service

2018-04-10 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8140:


 Summary: Improve log message when launch cmd is ran for stopped 
yarn service
 Key: YARN-8140
 URL: https://issues.apache.org/jira/browse/YARN-8140
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


Steps:

 1) Launch sleeper app

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms

18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
application_1523387473707_0007

Exit Code: 0{code}

2) Stop the application

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
sleeper2-duplicate-app-stopped

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms

18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
sleeper2-duplicate-app-stopped

Exit Code: 0{code}

3) Launch the application with same name

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms

18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json

Exit Code: 56

{code}

 

Here, the launch command fails with "Service Instance dir already exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".

The log message should be more meaningful. It should state that 
"sleeper2-duplicate-app-stopped already exists in a stopped state" (see the sketch below).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-04 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8116:


 Summary: Nodemanager fails with NumberFormatException: For input 
string: ""
 Key: YARN-8116
 URL: https://issues.apache.org/jira/browse/YARN-8116
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Steps followed.
1) Update nodemanager debug delay config
{code}
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>350</value>
</property>
{code}
2) Launch distributed shell application multiple times
{code}
/usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
-num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
hadoop-yarn-applications-distributedshell-*.jar{code}
3) restart NM

Nodemanager fails to start with the error below.
{code:title=NM log}
2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: true
2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
(LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
as 3600. The logs will be aggregated every 3600 seconds
2018-03-23 21:32:14,455 INFO  service.AbstractService 
(AbstractService.java:noteFailure(267)) - Service 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl 
failed in state INITED
java.lang.NumberFormatException: For input string: ""
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:601)
at java.lang.Long.parseLong(Long.java:631)
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
(LogAggregationService.java:serviceStop(148)) - 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
 waiting for pending aggregation during exit
2018-03-23 21:32:14,460 INFO  service.AbstractService 
(AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
INITED
java.lang.NumberFormatException: For input string: ""
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:601)
at java.lang.Long.parseLong(Long.java:631)
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
2018-03-23 21:32:14,463 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(210)) - Stopping NodeManager metrics system...
2018-03-23 21:32:14,464 INFO  impl.MetricsSinkAdapter 
(MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread 

[jira] [Created] (YARN-7961) Improve status response when yarn application is destroyed

2018-02-22 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7961:


 Summary: Improve status response when yarn application is destroyed
 Key: YARN-7961
 URL: https://issues.apache.org/jira/browse/YARN-7961
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


YARN should provide some way to figure out whether a yarn service has been destroyed.

If the yarn service application is stopped, "yarn app -status <service-name>" shows 
that the service is STOPPED.

After destroying the yarn service, "yarn app -status <service-name>" returns 404:
{code}
[hdpuser@cn005 sleeper]$ yarn app -status yesha-sleeper

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/02/16 11:02:30 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/02/16 11:02:31 INFO client.RMProxy: Connecting to ResourceManager at 
xxx/xx.xx.xx.xx:8050

18/02/16 11:02:31 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xx.xx.xx.x:10200

18/02/16 11:02:31 INFO client.RMProxy: Connecting to ResourceManager at 
xxx/xx.xx.xx.x:8050

18/02/16 11:02:31 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xx.xx.xx.x:10200

18/02/16 11:02:31 INFO util.log: Logging initialized @2075ms

yesha-sleeper Failed : HTTP error code : 404
{code}
YARN should be able to tell the user whether a given app was destroyed or was 
never created; a bare HTTP 404 error does not convey that explicitly.
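
A minimal illustration of a friendlier client-side mapping (names are assumptions, not the actual ApiServiceClient methods): a 404 from the API server means no service definition exists, which covers both a destroyed and a never-created service, and the message can say so instead of printing the raw code.
{code:java}
public class StatusFailureMessages {
  static String describe(String service, int httpCode) {
    if (httpCode == 404) {
      return "Service " + service + " not found: it was either destroyed or was"
          + " never created.";
    }
    return "Failed to get status for " + service + ": HTTP error code " + httpCode;
  }

  public static void main(String[] args) {
    System.out.println(describe("yesha-sleeper", 404));
  }
}
{code}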
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7954) Component status stays "Ready" when yarn service is stopped

2018-02-21 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7954:


 Summary: Component status stays "Ready" when yarn service is 
stopped
 Key: YARN-7954
 URL: https://issues.apache.org/jira/browse/YARN-7954
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


Steps:

1) Launch yarn service application

2) Stop application

3) Run get status from yarn cli

 {code}
[hdpuser@cn005 sleeper]$ yarn app -status yesha-sleeper

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/02/16 10:54:37 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/02/16 10:54:37 INFO client.RMProxy: Connecting to ResourceManager at 
xxx/xx.xx.xx.xx:8050

18/02/16 10:54:37 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xx.xx.xx.xx:10200

18/02/16 10:54:37 INFO client.RMProxy: Connecting to ResourceManager at 
xxx/xx.xx.xx.xx:8050

18/02/16 10:54:37 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xx.xx.xx.xx:10200

18/02/16 10:54:38 INFO util.log: Logging initialized @1957ms

{"name":"yesha-sleeper","lifetime":-1,"components":[],"configuration":{"properties":{},"env":{},"files":[]},"state":"STOPPED","quicklinks":{},"kerberos_principal":{}}

 {code}
4) Validate UI2 for service status

Here, the YARN service status is marked as "Finished". However, the component 
status still shows "Ready".

On stopping a yarn service, the component status should be updated to "Stopped".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7957) Yarn service delete option disappears after stopping application

2018-02-21 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7957:


 Summary: Yarn service delete option disappears after stopping 
application
 Key: YARN-7957
 URL: https://issues.apache.org/jira/browse/YARN-7957
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


Steps:

1) Launch yarn service
2) Go to service page and click on Setting button->"Stop Service". The 
application will be stopped.
3) Refresh page

Here, the settings button disappears, so the user cannot delete the service from 
the UI after stopping the application.

Expected behavior:
The settings button should remain present on the UI page after the application is 
stopped. If the application is stopped, the settings button should only offer the 
"Delete Service" action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7956) HOME/Services/<service-name> and HOME/Services/<service-name>/Components refer to same page

2018-02-21 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7956:


 Summary: HOME/Services/<service-name> and 
HOME/Services/<service-name>/Components refer to same page
 Key: YARN-7956
 URL: https://issues.apache.org/jira/browse/YARN-7956
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


Scenario:

1) Start Yarn service
2) Click on a running yarn service (example: yesha-sleeper)
http://:8088/ui2/#/yarn-app/application_1518804855867_0002/components?service=yesha-sleeper
3) Now click on yesha-sleeper [application_1518804855867_0002] link

Both the Components link and the yesha-sleeper [application_1518804855867_0002] 
link point to the same page:
HOME/Services/<service-name> and HOME/Services/<service-name>/Components refer to 
the same page.

We should not need two links that refer to one page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7949) ArtifactsId should not be a compulsory field for new service

2018-02-20 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7949:


 Summary: ArtifactsId should not be a compulsory field for new 
service
 Key: YARN-7949
 URL: https://issues.apache.org/jira/browse/YARN-7949
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


1) Click on New Service 
2) Create a component

The Create Component page has Artifact Id as a compulsory entry. Some yarn service 
examples, such as sleeper.json, do not need to provide an artifact id.
{code:java|title=sleeper.json}
{
  "name": "sleeper-service",
  "components" :
  [
{
  "name": "sleeper",
  "number_of_containers": 2,
  "launch_command": "sleep 90",
  "resource": {
"cpus": 1,
"memory": "256"
  }
}
  ]
}{code}
Thus, artifactsId should not be a compulsory field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7944) Remove master node link from headers of application pages

2018-02-16 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7944:


 Summary: Remove master node link from headers of application pages
 Key: YARN-7944
 URL: https://issues.apache.org/jira/browse/YARN-7944
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.1.0
Reporter: Yesha Vora


RM UI2 has links for the master container log and the master node. 

These links are published on the application and service pages. They are not 
required on all pages because the AM container node link and container log link are 
already present in the Application view. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7928) [UI2] Components details not present for Yarn service with Yarn authentication

2018-02-13 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7928:


 Summary: [UI2] Components details not present for Yarn service 
with Yarn authentication
 Key: YARN-7928
 URL: https://issues.apache.org/jira/browse/YARN-7928
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Scenario:

Launch an Hbase app in a secure hadoop cluster where yarn UI authentication is 
enabled.
Validate the Components page.

Here, component details are missing from the UI.


{code:java}
Failed to load 
http://xxx:8198/ws/v2/timeline/apps/application_1518564922635_0001/entities/SERVICE_ATTEMPT?fields=ALL&_=1518567830088:
 No 'Access-Control-Allow-Origin' header is present on the requested resource. 
Origin 'http://xxx:8088' is therefore not allowed access.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7907) Yarn app CLI client does not send Kerberos header to Resource Manager rest API

2018-02-08 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7907:


 Summary: Yarn app CLI client does not send Kerberos header to 
Resource Manager rest API
 Key: YARN-7907
 URL: https://issues.apache.org/jira/browse/YARN-7907
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.0.0
Reporter: Yesha Vora


Launch of a yarn service app is failing in secure mode with the stacktrace below.
{code:java}
[hrt_qa@xxx root]$ kinit -kt 
/home/hrt_qa/hadoopqa/keytabs/hrt_qa.headless.keytab hrt_qa
[hrt_qa@xxx root]$ yarn app -launch test2 sleeper
WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.
WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.
WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
18/02/07 22:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/02/07 22:50:41 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xxx:10200
18/02/07 22:50:41 INFO client.AHSProxy: Connecting to Application History 
server at xxx/xxx:10200
18/02/07 22:50:41 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-800/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
18/02/07 22:50:42 ERROR client.ApiServiceClient: Authentication required{code}

The CLI client does not send the Kerberos header to the Resource Manager REST API. Tcpdump 
indicates that no token is being sent.
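
One way to narrow this down is to hit the REST endpoint directly with SPNEGO and compare against the CLI behaviour (a sketch, assuming curl is built with GSS-Negotiate support and the services API is served by the RM at /app/v1/services; the RM host is a placeholder):
{code:title=SPNEGO check (sketch)}
# If this returns 200, the RM API server itself accepts SPNEGO, which narrows
# the problem down to the CLI client not attaching the Kerberos header.
kinit -kt /home/hrt_qa/hadoopqa/keytabs/hrt_qa.headless.keytab hrt_qa
curl --negotiate -u : -s -o /dev/null -w '%{http_code}\n' \
  "http://rm-host:8088/app/v1/services"
{code}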



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7897) Invalid NM log & NM UI link published on Yarn UI when container fails

2018-02-05 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7897:


 Summary: Invalid NM log & NM UI link published on Yarn UI when 
container fails
 Key: YARN-7897
 URL: https://issues.apache.org/jira/browse/YARN-7897
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Steps:

1) Launch the Httpd example via the REST API in unsecure mode
2) container_e04_1517875972784_0001_01_02 fails with "Unable to find image 
'centos/httpd-24-centos7:latest'"
3) Go to RM UI2 to debug the issue.

The Yarn app attempt page has incorrect values for Logs and Nodemanager UI:

Logs = N/A
Nodemanager UI = http://nmhost:0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7896) AM log link in diagnostic is redirected to old RM UI

2018-02-05 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7896:


 Summary: AM log link in diagnostic is redirected to old RM UI
 Key: YARN-7896
 URL: https://issues.apache.org/jira/browse/YARN-7896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Scenario:

1) Run Httpd yarn service in secure mode and make sure application gets 
launched as dr.who
2) Go to Diagnostic tab



The message shown in Diagnostics refers to the AM UI link for the old UI. 

The diagnostic message should reference the UI2 link instead.
{code:java|title=Diagnostics}
Application application_1517253048795_0001 failed 20 times due to AM Container 
for appattempt_1517253048795_0001_20 exited with exitCode: -1000 Failing 
this attempt.Diagnostics: [2018-01-29 23:01:46.234]Application 
application_1517253048795_0001 initialization failed (exitCode=255) with 
output: main : command provided 0 main : run as user is dr.who main : requested 
yarn user is dr.who User dr.who not found For more detailed output, check the 
application tracking page: 
http://xxx:8088/cluster/app/application_1517253048795_0001 Then click on links 
to logs of each attempt. . Failing the application.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7878) Docker container IP detail missing when service is in STABLE state

2018-02-01 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7878:


 Summary: Docker container IP detail missing when service is in 
STABLE state
 Key: YARN-7878
 URL: https://issues.apache.org/jira/browse/YARN-7878
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


Scenario
 1) Launch the Hbase-on-docker app
 2) Validate the yarn service status using the cli:
{code:java}
{"name":"hbase-app-with-docker","id":"application_1517516543573_0012","artifact":{"id":"hbase-centos","type":"DOCKER"},"lifetime":3519,"components":[{"name":"hbasemaster","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_MASTER_OPTS":"-Xmx2048m
 
-Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_02","ip":"10.0.0.9","hostname":"hbasemaster-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029963,"bare_host":"xxx","component_name":"hbasemaster-0"}],"launch_command":"sleep
 15; /usr/hdp/current/hbase-master/bin/hbase master 
start","number_of_containers":1,"run_privileged_container":false},{"name":"regionserver","dependencies":["hbasemaster"],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_REGIONSERVER_OPTS":"-XX:CMSInitiatingOccupancyFraction=70
 -Xmx2048m 
-Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.regionserver.hostname":"${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_05","state":"READY","launch_time":1517533059022,"bare_host":"xxx","component_name":"regionserver-0"}],"launch_command":"sleep
 15; /usr/hdp/current/hbase-regionserver/bin/hbase regionserver 
start","number_of_containers":1,"run_privileged_container":false},{"name":"hbaseclient","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"1024"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_03","ip":"10.0.0.8","hostname":"hbaseclient-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029964,"bare_host":"xxx","component_name":"hbaseclient-0"}],"launch_command":"sleep
 

[jira] [Created] (YARN-7851) Graph view does not show all AM attempts

2018-01-29 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7851:


 Summary: Graph view does not show all AM attempts
 Key: YARN-7851
 URL: https://issues.apache.org/jira/browse/YARN-7851
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Scenario:

1) Run an application where all AM attempt fails
2) Go to Graph view for application

Here, the application started 10 AM attempts. However, the Graph view has a 
pictorial representation of only 4 AM attempts. 
It should show all 10 attempts in the Graph view.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7850) New UI does not show status for Log Aggregation

2018-01-29 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7850:


 Summary: New UI does not show status for Log Aggregation
 Key: YARN-7850
 URL: https://issues.apache.org/jira/browse/YARN-7850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


The status of Log Aggregation is not shown anywhere.

The new UI should show the log aggregation status for finished applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7832) Logs page does not work for Running applications

2018-01-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7832:


 Summary: Logs page does not work for Running applications
 Key: YARN-7832
 URL: https://issues.apache.org/jira/browse/YARN-7832
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.0.0
Reporter: Yesha Vora


Scenario
 * Run yarn service application
 * When application is Running, go to log page
 * Select AttemptId and Container Id

Logs are not shown on the UI. It complains "No log data available!"

 

Here 
[http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358|http://ctr-e137-1514896590304-35963-01-04.hwx.site:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358]
 API fails with a 500 Internal Server Error:

{"exception":"WebApplicationException","message":"java.io.IOException: 
","javaClassName":"javax.ws.rs.WebApplicationException"}
{code:java}
GET 
http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358
 500 (Internal Server Error)
(anonymous) @ VM779:1
send @ vendor.js:572
ajax @ vendor.js:548
(anonymous) @ vendor.js:5119
initializePromise @ vendor.js:2941
Promise @ vendor.js:3005
ajax @ vendor.js:5117
ajax @ yarn-ui.js:1
superWrapper @ vendor.js:1591
query @ vendor.js:5112
ember$data$lib$system$store$finders$$_query @ vendor.js:5177
query @ vendor.js:5334
fetchLogFilesForContainerId @ yarn-ui.js:132
showLogFilesForContainerId @ yarn-ui.js:126
run @ vendor.js:648
join @ vendor.js:648
run.join @ vendor.js:1510
closureAction @ vendor.js:1865
trigger @ vendor.js:302
(anonymous) @ vendor.js:339
each @ vendor.js:61
each @ vendor.js:51
trigger @ vendor.js:339
d.select @ vendor.js:5598
(anonymous) @ vendor.js:5598
d.invoke @ vendor.js:5598
d.trigger @ vendor.js:5598
e.trigger @ vendor.js:5598
(anonymous) @ vendor.js:5598
d.invoke @ vendor.js:5598
d.trigger @ vendor.js:5598
(anonymous) @ vendor.js:5598
dispatch @ vendor.js:306
elemData.handle @ vendor.js:281{code}
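
The failing call can be reproduced outside the UI (a sketch; the host is the placeholder from the URL above):
{code:title=direct repro (sketch)}
# Calling the applicationhistory endpoint directly reproduces the 500 outside
# of the UI, which rules out a browser-side problem.
curl -s -o /dev/null -w '%{http_code}\n' \
  "http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs"
{code}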



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7830) If attempt has selected grid view, attempt info page should be opened with grid view

2018-01-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7830:


 Summary:  If attempt has selected grid view, attempt info page 
should be opened with grid view
 Key: YARN-7830
 URL: https://issues.apache.org/jira/browse/YARN-7830
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Steps:

1) Start an application and visit the attempts page

2) Click on Grid view

3) Click on attempt 1

Current behavior:

The page is redirected to the attempt info page, which opens in graph view.

Expected behavior:

In this scenario, it should redirect to the grid view.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7827) Stop and Delete Yarn Service from RM UI fails with HTTP ERROR 404

2018-01-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7827:


 Summary: Stop and Delete Yarn Service from RM UI fails with HTTP 
ERROR 404
 Key: YARN-7827
 URL: https://issues.apache.org/jira/browse/YARN-7827
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Steps:
1) Enable Ats v2
2) Start Httpd Yarn service
3) Go to UI2 attempts page for yarn service 
4) Click on setting icon
5) Click on stop service
6) This action will pop up a box to confirm stop. click on "Yes"

Expected behavior:
Yarn service should be stopped

Actual behavior:
The Yarn UI does not notify whether the Yarn service was stopped or not.
On checking the network trace, the PUT request failed with HTTP error 404:
{code}
Sorry, got error 404
Please consult RFC 2616 for meanings of the error code.
Error Details
org.apache.hadoop.yarn.webapp.WebAppException: /v1/services/httpd-hrt-qa-n: 
controller for v1 not found
at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:247)
at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:155)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:143)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
at 
com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1578)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at 

[jira] [Created] (YARN-7826) Yarn service status cli does not update lifetime if its updated with -appId

2018-01-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7826:


 Summary: Yarn service status cli does not update lifetime if its 
updated with -appId
 Key: YARN-7826
 URL: https://issues.apache.org/jira/browse/YARN-7826
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


1) Create an Httpd yarn service with lifetime = 3600 sec.
2) Run yarn application -status ; the lifetime field shows 3600 sec.
3) Update the lifetime of the service using the applicationId:
{code}
 yarn application -appId application_1516919074719_0001 -updateLifetime 
48000{code}
4) Verify the application status using the ApplicationId. The lifetime detail is updated 
correctly.
5) Verify the lifetime using the application name:
{code}
 [hrt_qa@xxx hadoopqe]$ yarn application -status httpd-hrt-qa-n
{
  "uri" : null,
  "name" : "httpd-hrt-qa-n",
  "id" : "application_1516919074719_0001",
  "artifact" : null,
  "resource" : null,
  "launch_time" : null,
  "number_of_running_containers" : null,
  "lifetime" : 3600,
  "placement_policy" : null,
  "components" : [ {
"name" : "httpd",
"dependencies" : [ ],
"readiness_check" : null,
"artifact" : {
  "id" : "centos/httpd-24-centos7:latest",
  "type" : "DOCKER",
  "uri" : null
},
"launch_command" : "/usr/bin/run-httpd",
"resource" : {
  "uri" : null,
  "profile" : null,
  "cpus" : 1,
  "memory" : "1024",
  "additional" : null
},
"number_of_containers" : 2,
"run_privileged_container" : false,
"placement_policy" : null,
"state" : "STABLE",
"configuration" : {
  "properties" : { },
  "env" : { },
  "files" : [ {
"type" : "TEMPLATE",
"dest_file" : "/var/www/html/index.html",
"src_file" : null,
"properties" : {
  "content" : "TitleHello 
from ${COMPONENT_INSTANCE_NAME}!"
}
  } ]
},
"quicklinks" : [ ],
"containers" : [ {
  "uri" : null,
  "id" : "container_e07_1516919074719_0001_01_02",
  "launch_time" : 1516919372633,
  "ip" : "xxx.xxx.xxx.xxx",
  "hostname" : "httpd-0.httpd-hrt-qa-n.hrt_qa.test.com",
  "bare_host" : "xxx",
  "state" : "READY",
  "component_instance_name" : "httpd-0",
  "resource" : null,
  "artifact" : null,
  "privileged_container" : null
}, {
  "uri" : null,
  "id" : "container_e07_1516919074719_0001_01_03",
  "launch_time" : 1516919372637,
  "ip" : "xxx.xxx.xxx.xxx",
  "hostname" : "httpd-1.httpd-hrt-qa-n.hrt_qa.test.com",
  "bare_host" : "xxx",
  "state" : "READY",
  "component_instance_name" : "httpd-1",
  "resource" : null,
  "artifact" : null,
  "privileged_container" : null
} ]
  }, {
"name" : "httpd-proxy",
"dependencies" : [ ],
"readiness_check" : null,
"artifact" : {
  "id" : "centos/httpd-24-centos7:latest",
  "type" : "DOCKER",
  "uri" : null
},
"launch_command" : "/usr/bin/run-httpd",
"resource" : {
  "uri" : null,
  "profile" : null,
  "cpus" : 1,
  "memory" : "1024",
  "additional" : null
},
"number_of_containers" : 1,
"run_privileged_container" : false,
"placement_policy" : null,
"state" : "STABLE",
"configuration" : {
  "properties" : { },
  "env" : { },
  "files" : [ {
"type" : "TEMPLATE",
"dest_file" : "/etc/httpd/conf.d/httpd-proxy.conf",
"src_file" : "httpd-proxy.conf",
"properties" : { }
  } ]
},
"quicklinks" : [ ],
"containers" : [ {
  "uri" : null,
  "id" : "container_e07_1516919074719_0001_01_04",
  "launch_time" : 1516919372638,
  "ip" : "xxx.xxx.xxx.xxx",
  "hostname" : "httpd-proxy-0.httpd-hrt-qa-n.hrt_qa.test.com",
  "bare_host" : "xxx",
  "state" : "READY",
  "component_instance_name" : "httpd-proxy-0",
  "resource" : null,
  "artifact" : null,
  "privileged_container" : null
} ]
  } ],
  "configuration" : {
"properties" : { },
"env" : { },
"files" : [ ]
  },
  "state" : "STABLE",
  "quicklinks" : {
"Apache HTTP Server" : 
"http://httpd-proxy-0.httpd-hrt-qa-n.hrt_qa.test.com:8080;
  },
  "queue" : null,
  "kerberos_principal" : {
"principal_name" : null,
"keytab" : null
  }
}
{code}

Here, the app status queried by app name did not have the new lifetime. The application 
status queried by app name should also reflect the updated lifetime.
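
A side-by-side check of the two lookups (a sketch, assuming both forms print a lifetime field as steps 4 and 5 above indicate; the ids and names are the ones from those steps):
{code:title=side-by-side check (sketch)}
# Status looked up by application id reflects the updated lifetime...
yarn application -status application_1516919074719_0001 | grep -i lifetime

# ...while status looked up by service name still reports the original 3600 seconds.
yarn application -status httpd-hrt-qa-n | grep -i lifetime
{code}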





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7825) Maintain constant horizontal application info bar for all pages

2018-01-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7825:


 Summary: Maintain constant horizontal application info bar for all 
pages
 Key: YARN-7825
 URL: https://issues.apache.org/jira/browse/YARN-7825
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Steps:
1) Enable ATS v2
2) Start a Yarn service application (Httpd)
3) Fix the horizontal info bar for the pages below:
 * component page
 * Component Instance info page 
 * Application attempt Info 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7824) Yarn Component Instance page should include link to container logs

2018-01-26 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7824:


 Summary: Yarn Component Instance page should include link to 
container logs
 Key: YARN-7824
 URL: https://issues.apache.org/jira/browse/YARN-7824
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Steps:
1) Launch Httpd example
2) Visit component Instance page for httpd-proxy-0

This page has information regarding the httpd-proxy-0 component.

This page should also include a link to the container logs for this component.
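
Until the link exists, the logs can be fetched from the CLI using the container id shown on this page (a sketch; the ids are placeholders):
{code:title=CLI fallback (sketch)}
# Fetch the logs of the container backing httpd-proxy-0 directly,
# given the container id shown on the component instance page.
yarn logs -applicationId <application id> -containerId <container id>
{code}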



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7818) DistributedShell Container fails with exitCode=143 when NM restarts and recovers

2018-01-25 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7818:


 Summary: DistributedShell Container fails with exitCode=143 when 
NM restarts and recovers
 Key: YARN-7818
 URL: https://issues.apache.org/jira/browse/YARN-7818
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


steps:
1) Run Dshell Application
{code}
yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
/usr/hdp/3.0.0.0-751/hadoop-yarn/hadoop-yarn-applications-distributedshell-*.jar
 -keep_containers_across_application_attempts -timeout 90 -shell_command 
"sleep 110" -num_containers 4{code}
2) Find out the host where the AM is running. 
3) Find the containers launched by the application
4) Restart the NM where the AM is running
5) Validate that a new attempt is not started and that containers launched before the 
restart are in RUNNING state.

In this test, step #5 fails because containers fail to launch with error 143:
{code}
2018-01-24 09:48:30,547 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1516787230461_0001_01_03 transitioned from RUNNING to KILLING
2018-01-24 09:48:30,547 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container 
container_e04_1516787230461_0001_01_03
2018-01-24 09:48:30,552 WARN  privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell 
execution returned exit code: 143. Privileged Execution Operation Stderr:

Stdout: main : command provided 1
main : run as user is hrt_qa
main : requested yarn user is hrt_qa
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...

Full command array for failed execution:
[/usr/hdp/3.0.0.0-751/hadoop-yarn/bin/container-executor, hrt_qa, hrt_qa, 1, 
application_1516787230461_0001, container_e04_1516787230461_0001_01_03, 
/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1516787230461_0001/container_e04_1516787230461_0001_01_03,
 
/grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/launch_container.sh,
 
/grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.tokens,
 
/grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.pid,
 /grid/0/hadoop/yarn/local, /grid/0/hadoop/yarn/log, cgroups=none]
2018-01-24 09:48:30,553 WARN  runtime.DefaultLinuxContainerRuntime 
(DefaultLinuxContainerRuntime.java:launchContainer(127)) - Launch container 
failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=143:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:152)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:549)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:465)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:285)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:95)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=143:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
at org.apache.hadoop.util.Shell.run(Shell.java:902)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
... 10 more
2018-01-24 

[jira] [Created] (YARN-7805) Yarn should update container as failed on docker container failure

2018-01-23 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7805:


 Summary: Yarn should update container as failed on docker 
container failure
 Key: YARN-7805
 URL: https://issues.apache.org/jira/browse/YARN-7805
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Steps:

Start the hbase yarn service example on docker.
When the Hbase master fails, it leads the master daemon's docker container to fail.

{code}
[root@xx bin]# docker ps -a
CONTAINER IDIMAGE   
  COMMAND  CREATED STATUS 
PORTS   NAMES
a57303b1a736x/xxxhbase:x.x.x.x.0.0.0   "bash /grid/0/hadoop/"   5 
minutes ago   Exited (1) 4 minutes ago   
container_e07_1516734339938_0018_01_02
[root@xxx bin]# docker exec -it a57303b1a736 bash
Error response from daemon: Container 
a57303b1a7364a733428ec76581368253e5a701560a510204b8c302e3bbeed26 is not running
{code}

Expected behavior:
Yarn should mark this container as failed and start a new docker container.

Actual behavior:
Yarn did not detect that the container had failed. It kept showing the container status 
as Running.
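
One way to confirm the mismatch is to compare the docker-side state with what YARN reports (a sketch; run on the NM host, the container hash is the one from the output above):
{code:title=docker-side check (sketch)}
# Confirm from the docker side that the container exited, and with which code,
# to compare against the RUNNING status YARN keeps reporting.
docker inspect -f '{{.State.Status}} exit={{.State.ExitCode}}' a57303b1a736
{code}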



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7804) Refresh action on Grid view page should not be redirected to graph view

2018-01-23 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7804:


 Summary: Refresh action on Grid view page should not be redirected 
to graph view
 Key: YARN-7804
 URL: https://issues.apache.org/jira/browse/YARN-7804
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Affects Versions: 3.0.0
Reporter: Yesha Vora


Steps:

1) Go to application attempt page
http://host:8088/ui2/#/yarn-app/application_1516734339938_0020/attempts?service=abc
2) Click on grid view
3) Click the refresh button on the page

Actual behavior:
On refreshing the page, it comes back to graph view.

Expected behavior:
On refreshing the page, it should stay in grid view.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7802) Application regex search did not work properly with app name

2018-01-23 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7802:


 Summary: Application regex search did not work properly with app 
name
 Key: YARN-7802
 URL: https://issues.apache.org/jira/browse/YARN-7802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Yesha Vora


Steps:
1) Start a yarn service named "yesha-hbase-retry-2"
2) Put regex = yesha-hbase-retry-2
http://host:8088/ui2/#/yarn-apps/apps?searchText=yesha-hbase-retry-2

Here, the application does not get listed. The regex works with the 
"yesha-hbase-retry-" input but does not work with the full app name.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7768) yarn application -status appName does not return valid json

2018-01-17 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7768:


 Summary: yarn application -status appName does not return valid 
json
 Key: YARN-7768
 URL: https://issues.apache.org/jira/browse/YARN-7768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


yarn application -status  does not return valid json:

1) It has class names added to the json content, such as class Service, class 
KerberosPrincipal, class Component, etc.
2) The json objects should be comma separated.

{code}
[hrt_qa@2 hadoopqe]$ yarn application -status httpd-hrt-qa
WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.
WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.
WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
18/01/18 00:33:07 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/01/18 00:33:08 WARN shortcircuit.DomainSocketFactory: The short-circuit 
local reads feature cannot be used because libhadoop cannot be loaded.
18/01/18 00:33:08 INFO utils.ServiceApiUtil: Loading service definition from 
hdfs://mycluster/user/hrt_qa/.yarn/services/httpd-hrt-qa/httpd-hrt-qa.json
18/01/18 00:33:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
class Service {
name: httpd-hrt-qa
id: application_1516234304810_0001
artifact: null
resource: null
launchTime: null
numberOfRunningContainers: null
lifetime: 3600
placementPolicy: null
components: [class Component {
name: httpd
state: STABLE
dependencies: []
readinessCheck: null
artifact: class Artifact {
id: centos/httpd-24-centos7:latest
type: DOCKER
uri: null
}
launchCommand: /usr/bin/run-httpd
resource: class Resource {
profile: null
cpus: 1
memory: 1024
additional: null
}
numberOfContainers: 2
containers: [class Container {
id: container_e05_1516234304810_0001_01_02
launchTime: Thu Jan 18 00:19:22 UTC 2018
ip: 172.17.0.2
hostname: httpd-0.httpd-hrt-qa.hrt_qa.test.com
bareHost: 5.hwx.site
state: READY
componentInstanceName: httpd-0
resource: null
artifact: null
privilegedContainer: null
}, class Container {
id: container_e05_1516234304810_0001_01_03
launchTime: Thu Jan 18 00:19:23 UTC 2018
ip: 172.17.0.3
hostname: httpd-1.httpd-hrt-qa.hrt_qa.test.com
bareHost: 5.hwx.site
state: READY
componentInstanceName: httpd-1
resource: null
artifact: null
privilegedContainer: null
}]
runPrivilegedContainer: false
placementPolicy: null
configuration: class Configuration {
properties: {}
env: {}
files: [class ConfigFile {
type: TEMPLATE
destFile: /var/www/html/index.html
srcFile: null
properties: 
{content=TitleHello from 
${COMPONENT_INSTANCE_NAME}!}
}]
}
quicklinks: []
}, class Component {
name: httpd-proxy
state: FLEXING
dependencies: []
readinessCheck: null
artifact: class Artifact {
id: centos/httpd-24-centos7:latest
type: DOCKER
uri: null
}
launchCommand: /usr/bin/run-httpd
resource: class Resource {
profile: null
cpus: 1
memory: 1024
additional: null
}
numberOfContainers: 1
containers: []
runPrivilegedContainer: false
placementPolicy: null
configuration: class Configuration {
properties: {}
env: {}
files: [class ConfigFile {
type: TEMPLATE
destFile: /etc/httpd/conf.d/httpd-proxy.conf
srcFile: httpd-proxy.conf
properties: {}
}]
}
quicklinks: []
}]
configuration: class Configuration {
properties: {}
env: {}
files: []
}
state: STARTED
quicklinks: {Apache HTTP 
Server=http://httpd-proxy-0.httpd-hrt-qa.hrt_qa.test.com:8080}
queue: null
kerberosPrincipal: class KerberosPrincipal {
principalName: null
keytab: null
  {code}
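
A quick way to demonstrate the problem is to pipe the CLI output through a strict JSON parser (a sketch, assuming python is available on the gateway host):
{code:title=validation sketch}
# A strict parser rejects the "class Service {" wrapper and the missing commas,
# which is what makes this output unusable as JSON.
yarn application -status httpd-hrt-qa | python -m json.tool
{code}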



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Created] (YARN-7744) Fix Get status rest api response when application is destroyed

2018-01-12 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7744:


 Summary: Fix Get status rest api response when application is 
destroyed
 Key: YARN-7744
 URL: https://issues.apache.org/jira/browse/YARN-7744
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora
Priority: Critical


Steps:
1) Create a yarn service 
2) Destroy a yarn service

Run get-status for the application using the REST API:
{code}
response json = {u'diagnostics': u'Failed to retrieve service: File does not 
exist: 
hdfs://mycluster/user/yarn/.yarn/services/httpd-service/httpd-service.json'}
status code = 500{code}

The REST API should respond with proper json including the diagnostics, and with HTTP 
status code 404.
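
The status codes can be compared directly (a sketch, assuming the services API is served by the RM at /app/v1/services; the RM host and service name are placeholders):
{code:title=status-code check (sketch)}
# After the service is destroyed this currently comes back as 500;
# per this report it should be a 404 with a diagnostics body.
curl -s -o /dev/null -w '%{http_code}\n' \
  "http://rm-host:8088/app/v1/services/httpd-service"
{code}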



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7743) UI component placement overlaps on Safari

2018-01-12 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7743:


 Summary: UI component placement overlaps on Safari
 Key: YARN-7743
 URL: https://issues.apache.org/jira/browse/YARN-7743
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Yesha Vora


Browser: Safari Version 9.1.1 (11601.6.17)

Issue with new RM UI:

The two tables on the application and service pages overlap in the Safari browser. See the 
attached screenshot for details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7741) Eliminate extra log statement from yarn app -destroy cli

2018-01-11 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7741:


 Summary: Eliminate extra log statement from yarn app -destroy cli
 Key: YARN-7741
 URL: https://issues.apache.org/jira/browse/YARN-7741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


"Yarn destroy -app " cli prints very long stacktrace from zookeeper client 
which is not required. 
This cli prints 44009 characters (38 lines and 358 words)

This api should only print the message whether app was destroyed successfully 
or not.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7740) Fix logging for destroy yarn service cli when app does not exist

2018-01-11 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7740:


 Summary: Fix logging for destroy yarn service cli when app does 
not exist
 Key: YARN-7740
 URL: https://issues.apache.org/jira/browse/YARN-7740
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


Scenario:

Run "yarn app -destroy" cli with a application name which does not exist.

Here, The cli should return a message " Application does not exists" instead it 
is returning a message "Destroyed cluster httpd-xxx"





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7730) Add memory management configs to yarn-default

2018-01-10 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7730:


 Summary: Add memory management configs to yarn-default
 Key: YARN-7730
 URL: https://issues.apache.org/jira/browse/YARN-7730
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Priority: Minor


Add below configuration and description to yarn-defaults.xml
{code}
"yarn.nodemanager.resource.memory.enabled"
// the default value is false, we need to set to true here to enable the 
cgroups based memory monitoring.


"yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage"
// the default value is 90.0f, which means in memory congestion case, the 
container can still keep/reserve 90% resource for its claimed value. It cannot 
be set to above 100 or set as negative value.

"yarn.nodemanager.resource.memory.cgroups.swappiness"
// The percentage that memory can be swapped or not. default value is 0, which 
means container memory cannot be swapped out. If not set, linux cgroup setting 
by default set to 60 which means 60% of memory can potentially be swapped out 
when system memory is not enough.{code}
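
Once the entries are added, the effective values can be confirmed on a running NodeManager through the standard /conf servlet (a sketch; the NM host is a placeholder):
{code:title=runtime check (sketch)}
# Print the effective cgroups memory settings from a running NodeManager,
# to verify they match what yarn-default.xml documents.
curl -s "http://nm-host:8042/conf" | grep 'yarn.nodemanager.resource.memory'
{code}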



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7719) [Yarn services] Yarn application logs does not collect all AM log files

2018-01-08 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7719:


 Summary: [Yarn services] Yarn application logs does not collect 
all AM log files
 Key: YARN-7719
 URL: https://issues.apache.org/jira/browse/YARN-7719
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Steps:

1) Run a Yarn Service application such as httpd
2) Gather the yarn application logs after the application is finished 

The log collection only shows the content of container-localizer-syslog. 
Log collection should also gather the files below from the AM:

* directory.info
* launch_container.sh
* prelaunch.err
* prelaunch.out
* serviceam-err.txt
* serviceam-out.txt

Without these log files, debugging a failed app becomes impossible.
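
For comparison, this is the CLI call that should return all of the files above but currently only returns container-localizer-syslog (a sketch; the app id is a placeholder and the flags are the ones listed by yarn logs -help):
{code:title=CLI check (sketch)}
# Dump every aggregated log file for the AM container; with the current
# behaviour only container-localizer-syslog comes back.
yarn logs -applicationId <application id> -am 1 -log_files ALL
{code}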



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7717) Add configuration consistency for module.enabled and docker.privileged-containers.enabled

2018-01-08 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7717:


 Summary: Add configuration consistency for module.enabled and 
docker.privileged-containers.enabled
 Key: YARN-7717
 URL: https://issues.apache.org/jira/browse/YARN-7717
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Yesha Vora


container-executor.cfg has two properties related to dockerization. 
1)  module.enabled = true/false
2) docker.privileged-containers.enabled = 1/0

Here, both properties take different values to enable / disable the feature. module.enabled 
takes a true/false string while docker.privileged-containers.enabled 
takes a 1/0 integer value. 

These properties' behavior should be consistent. Both properties should take a true 
or false string as the value to enable or disable the feature.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7706) httpd yarn service example fails with "java.lang.IllegalArgumentException: Src_file does not exist for config file: httpd-proxy.conf"

2018-01-05 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7706:


 Summary: httpd yarn service example fails with 
"java.lang.IllegalArgumentException: Src_file does not exist for config file: 
httpd-proxy.conf"
 Key: YARN-7706
 URL: https://issues.apache.org/jira/browse/YARN-7706
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Steps:

* Enable yarn containerization in the cluster
* Launch the httpd example.

The httpd.json and httpd-proxy.conf files are present at /yarn-service-examples/httpd:

{code}
[hrt_qa@xxx httpd]$ ls -la
total 8
drwxr-xr-x. 2 root root   46 Jan  5 02:52 .
drwxr-xr-x. 5 root root   51 Jan  5 02:52 ..
-rw-r--r--. 1 root root 1337 Jan  1 04:21 httpd.json
-rw-r--r--. 1 root root 1065 Jan  1 04:21 httpd-proxy.conf{code}

{code}
[hrt_qa@xxx yarn-service-examples]$ yarn app -launch httpd-hrtqa 
httpd/httpd.json
WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.
WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.
WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
18/01/05 20:39:22 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/01/05 20:39:23 WARN shortcircuit.DomainSocketFactory: The short-circuit 
local reads feature cannot be used because libhadoop cannot be loaded.
18/01/05 20:39:23 INFO client.ServiceClient: Loading service definition from 
local FS: /xxx/yarn-service-examples/httpd/httpd.json
Exception in thread "main" java.lang.IllegalArgumentException: Src_file does 
not exist for config file: httpd-proxy.conf
at 
org.apache.hadoop.yarn.service.provider.AbstractClientProvider.validateConfigFiles(AbstractClientProvider.java:105)
at 
org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateComponent(ServiceApiUtil.java:224)
at 
org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateAndResolveService(ServiceApiUtil.java:189)
at 
org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:213)
at 
org.apache.hadoop.yarn.service.client.ServiceClient.actionLaunch(ServiceClient.java:204)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:447)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:111){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7671) Improve Diagnostic message for stop yarn native service

2017-12-18 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7671:


 Summary: Improve Diagnostic message for stop yarn native service 
 Key: YARN-7671
 URL: https://issues.apache.org/jira/browse/YARN-7671
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Yesha Vora


Steps:

1) Install Hadoop 3.0 cluster
2) Run Yarn service application
{code:title=sleeper.json}{
  "name": "sleeper-service",
  "components" : 
[
  {
"name": "sleeper",
"number_of_containers": 1,
"launch_command": "sleep 90",
"resource": {
  "cpus": 1, 
  "memory": "256"
   }
  }
]
}{code}
{code:title=cmd}
yarn app -launch my-sleeper1 sleeper.json{code}
3) stop yarn service app 
{code:title=cmd}
yarn app -stop my-sleeper1{code}

On stopping the yarn service, the appId finishes with YarnApplicationState: FINISHED, 
FinalStatus Reported by AM: ENDED and Diagnostics: "Navigate to the failed 
component for more details."

Here, the Diagnostics message should be improved. When an application is 
explicitly stopped by the user, the diagnostics message should say "Application 
stopped by user". 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7597) testContainerLogsWithNewAPI and testContainerLogsWithOldAPI UT fails

2017-12-01 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7597:


 Summary: testContainerLogsWithNewAPI and 
testContainerLogsWithOldAPI UT fails
 Key: YARN-7597
 URL: https://issues.apache.org/jira/browse/YARN-7597
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Yesha Vora


testContainerLogsWithNewAPI and testContainerLogsWithOldAPI UT fails
{code}
Stacktrace

java.util.NoSuchElementException: null
at java.util.LinkedList.getFirst(LinkedList.java:244)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.getFileControllerForWrite(LogAggregationFileControllerFactory.java:149)
at 
org.apache.hadoop.yarn.logaggregation.TestContainerLogsUtils.uploadContainerLogIntoRemoteDir(TestContainerLogsUtils.java:122)
at 
org.apache.hadoop.yarn.logaggregation.TestContainerLogsUtils.createContainerLogFileInRemoteFS(TestContainerLogsUtils.java:96)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices.testContainerLogs(TestNMWebServices.java:541)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices.testContainerLogsWithNewAPI(TestNMWebServices.java:342){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7268) testCompareXmlAgainstConfigurationClass fails due to 1 missing property from yarn-default

2017-09-28 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7268:


 Summary: testCompareXmlAgainstConfigurationClass fails due to 1 
missing property from yarn-default
 Key: YARN-7268
 URL: https://issues.apache.org/jira/browse/YARN-7268
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Yesha Vora


{code}
Error Message

yarn-default.xml has 1 properties missing in  class 
org.apache.hadoop.yarn.conf.YarnConfiguration
Stacktrace

java.lang.AssertionError: yarn-default.xml has 1 properties missing in  class 
org.apache.hadoop.yarn.conf.YarnConfiguration
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareXmlAgainstConfigurationClass(TestConfigurationFieldsBase.java:414)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Standard Output

File yarn-default.xml (253 properties)

yarn-default.xml has 1 properties missing in  class 
org.apache.hadoop.yarn.conf.YarnConfiguration

  yarn.log-aggregation.file-controller.TFile.class

={code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7175) Log collection fails when a container is acquired but not launched on NM

2017-09-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7175:


 Summary: Log collection fails when a container is acquired but not 
launched on NM
 Key: YARN-7175
 URL: https://issues.apache.org/jira/browse/YARN-7175
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Yesha Vora


Scenario:
* Run Spark App
* As soon as the spark application finishes, run the "yarn application -status " 
cli in a loop for 2-3 mins to check the log aggregation status. 

I'm noticing that the log aggregation status remains in "RUNNING" and eventually 
ends up with "TIMED_OUT" status.  

This situation happens when an application has acquired a container but it has 
not been launched on the NM. 

This scenario should be handled better and should not delay retrieval of 
the application logs. 
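
The polling loop described above, as a sketch (the app id is a placeholder; interval and count are arbitrary):
{code:title=polling sketch}
# Poll the log aggregation status for ~3 minutes after the app finishes;
# with this bug it stays RUNNING and eventually flips to TIMED_OUT.
for i in $(seq 1 36); do
  yarn application -status <application id> | grep 'Log Aggregation Status'
  sleep 5
done
{code}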



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7090) testRMRestartAfterNodeLabelDisabled[1] UT Fails

2017-08-23 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7090:


 Summary: testRMRestartAfterNodeLabelDisabled[1] UT Fails
 Key: YARN-7090
 URL: https://issues.apache.org/jira/browse/YARN-7090
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Yesha Vora


testRMRestartAfterNodeLabelDisabled[1] UT fails with below error. 
{code}
Error Message

expected:<[x]> but was:<[]>
Stacktrace

org.junit.ComparisonFailure: expected:<[x]> but was:<[]>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartAfterNodeLabelDisabled(TestRMRestart.java:2408)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7065) [RM UI] App status not getting updated in "All application" page

2017-08-21 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-7065:


 Summary: [RM UI] App status not getting updated in "All 
application" page
 Key: YARN-7065
 URL: https://issues.apache.org/jira/browse/YARN-7065
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Scenario:
1) Run Spark Long Running application
2) Do RM and NN failover randomly
3) Validate App state in Yarn

The Spark applications have finished. The yarn cli returns the correct status of the yarn 
application:
{code}
[hrt_qa@xxx hadoopqe]$ yarn application -status application_1503203977699_0014
17/08/21 16:56:10 INFO client.AHSProxy: Connecting to Application History 
server at host1 xxx.xx.xx.x:10200
17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Looking 
for the active RM in [rm1, rm2]...
17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Found 
active RM [rm1]
Application Report : 
Application-Id : application_1503203977699_0014
Application-Name : 
org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources
Application-Type : SPARK
User : hrt_qa
Queue : default
Application Priority : null
Start-Time : 1503215983532
Finish-Time : 1503250203806
Progress : 0%
State : FAILED
Final-State : FAILED
Tracking-URL : 
https://host1:8090/cluster/app/application_1503203977699_0014
RPC Port : -1
AM Host : N/A
Aggregate Resource Allocation : 174722793 MB-seconds, 170603 
vcore-seconds
Log Aggregation Status : SUCCEEDED
Diagnostics : Application application_1503203977699_0014 failed 20 
times due to AM Container for appattempt_1503203977699_0014_20 exited with  
exitCode: 1
For more detailed output, check the application tracking page: 
https://host1:8090/cluster/app/application_1503203977699_0014 Then click on 
links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e04_1503203977699_0014_20_01
Exit code: 1
Stack trace: 
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Launch container failed
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Shell output: main : command provided 1
main : run as user is hrt_qa
main : requested yarn user is hrt_qa
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/grid/0/hadoop/yarn/local/nmPrivate/application_1503203977699_0014/container_e04_1503203977699_0014_20_01/container_e04_1503203977699_0014_20_01.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
Unmanaged Application : false
Application Node Label Expression : 
AM container Node Label Expression : {code}

However, the RM UI "All Applications" page (https://host1:8090/cluster) still 
shows the application in the "RUNNING" state.
Clicking the application id 
(https://host1:8090/cluster/app/application_1503203977699_0014) redirects to 
the application page, which shows the correct application state, FAILED.

The app status is not getting updated on the YARN "All Applications" page.
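
One way to confirm which state the RM itself reports, independently of the web 
UI, is the cluster apps REST endpoint; a sketch, assuming curl is available and 
using the application id from above:

{code:title=cross-checking the app state via the RM REST API (sketch)}
# The "state" and "finalStatus" fields should match what "yarn application -status" prints.
curl -sk "https://host1:8090/ws/v1/cluster/apps/application_1503203977699_0014" \
  | grep -Eo '"(state|finalStatus)":"[^"]*"'
{code}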




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6576) Improve Diagnostic by moving Error stack trace from NM to slider AM

2017-05-09 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-6576:


 Summary: Improve Diagnostic by moving Error stack trace from NM to 
slider AM
 Key: YARN-6576
 URL: https://issues.apache.org/jira/browse/YARN-6576
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


The Slider AM diagnostics should be improved to show the root cause of app 
failures for issues such as a missing Docker image.

Currently, the Slider AM log does not show a proper error message for debugging 
such failures. The user has to access the NodeManager logs to find the root 
cause when a container fails to start.
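
As a stop-gap while debugging, the aggregated container logs (including the 
container launch stderr) can usually be pulled without logging into the 
NodeManager host; a sketch, assuming log aggregation is enabled and using a 
placeholder application id:

{code:title=fetching container logs without NM access (sketch)}
# Retrieves the aggregated logs for the failed application once it has finished.
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX
{code}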



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6233) FSRMStateStore UT fails with IO Timed out Error

2017-02-24 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-6233:


 Summary: FSRMStateStore UT fails with IO Timed out Error
 Key: YARN-6233
 URL: https://issues.apache.org/jira/browse/YARN-6233
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


The FSRMStateStore unit tests fail with an IO timed-out error, as shown below.
{code:title=test cmd}
 export MAVEN_OPTS=-Xmx1024m; mvn -B -nsu test 
-Dtest=TestFifoScheduler,TestFairOrderingPolicy,TestFSAppAttempt,TestFSParentQueue,TestQueueManager,TestFairSchedulerFairShare,TestMaxRunningAppsEnforcer,TestAppRunnability,TestFairSchedulerConfiguration,TestFairSchedulerPreemption,TestSchedulingPolicy,TestComputeFairShares,TestFSLeafQueue,TestFairSchedulerEventLog,TestQueuePlacementPolicy,TestFairSchedulerQueueACLs,TestAllocationFileLoaderService,TestFairScheduler,TestDominantResourceFairnessPolicy,TestEmptyQueues,TestQueueCapacities,TestChildQueueOrder,TestQueueMappings,TestParentQueue,TestCapacitySchedulerNodeLabelUpdate,TestNodeLabelContainerAllocation,TestCapacityScheduler,TestApplicationLimits,TestWorkPreservingRMRestartForNodeLabel,TestReservationQueue,TestApplicationLimitsByPartition,TestCapacitySchedulerDynamicBehavior,TestQueueParsing,TestCapacitySchedulerLazyPreemption,TestContainerAllocation,TestLeafQueue,TestCapacitySchedulerSurgicalPreemption,TestReservations,TestCapacitySchedulerQueueACLs,TestUtils,TestPriorityUtilizationQueueOrderingPolicy,TestRMApplicationHistoryWriter,TestResources,TestResourceWeights,TestRMNMRPCResponseId,TestNMReconnect,TestNMExpiry,TestLeveldbRMStateStore,TestZKRMStateStore,TestMemoryRMStateStore,TestFSRMStateStore,TestZKRMStateStoreZKClientConnections,TestSystemMetricsPublisher,TestSimpleCapacityReplanner,TestInMemoryPlan,TestNoOverCommitPolicy,TestRLESparseResourceAllocation,TestCapacitySchedulerPlanFollower,TestInMemoryReservationAllocation,TestSchedulerPlanFollowerBase,TestGreedyReservationAgent,TestReservationInputValidator,TestRpcCall
 --projects :hadoop-yarn-server-resourcemanager,:hadoop-nfs{code}

{code}
Results :

Tests in error: 
  TestFSRMStateStore.testFSRMStateStoreClientRetry:385 »  test timed out after 
3...
  TestFSRMStateStore.testFSRMStateStore:168 » IO Timed out waiting for Mini 
HDFS...

Tests run: 487, Failures: 0, Errors: 2, Skipped: 2

[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop NFS .. SUCCESS [  4.172 s]
[INFO] hadoop-yarn-server-resourcemanager . FAILURE [21:57 min]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 22:05 min
[INFO] Finished at: 2017-02-23T21:33:03+00:00
[INFO] Final Memory: 53M/873M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-yarn-server-resourcemanager: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/xxx/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/surefire-reports
 for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-yarn-server-resourcemanager{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6220) Few TestSecureRMRegistryOperations UT fails

2017-02-22 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-6220:


 Summary: Few TestSecureRMRegistryOperations UT fails
 Key: YARN-6220
 URL: https://issues.apache.org/jira/browse/YARN-6220
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Eight tests from TestSecureRMRegistryOperations fail, as listed below:
* testAlicePathRestrictedAnonAccess
* testAnonNoWriteAccess
* testAnonNoWriteAccessOffRoot
* testAnonReadAccess
* testDigestAccess
* testUserHomedirsPermissionsRestricted
* testUserZookeeperHomePathAccess
* testZookeeperCanWriteUnderSystem

{code}
java.lang.reflect.UndeclaredThrowableException: null
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
at java.util.concurrent.FutureTask.get(FutureTask.java:119)
at 
org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations$1.run(TestSecureRMRegistryOperations.java:107)
at 
org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations$1.run(TestSecureRMRegistryOperations.java:98)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at 
org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations.startRMRegistryOperations(TestSecureRMRegistryOperations.java:97)
at 
org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations.testAnonNoWriteAccess(TestSecureRMRegistryOperations.java:148){code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6219) NM web server related UT fails with "NMWebapps failed to start."

2017-02-22 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-6219:


 Summary: NM web server related UT fails with "NMWebapps failed to 
start."
 Key: YARN-6219
 URL: https://issues.apache.org/jira/browse/YARN-6219
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


TestNodeStatusUpdater.testCompletedContainerStatusBackup and the TestNMWebServer 
unit tests fail with "NMWebapps failed to start."

{code}
Error Message

NMWebapps failed to start.
Stacktrace

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: NMWebapps failed to 
start.
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices.(NMWebServices.java:108)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices$$FastClassByGuice$$84485dc9.newInstance()
at 
com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
at 
com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
at 
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
at 
com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at 
com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at 
com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
at 
com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
at 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory$GuiceInstantiatedComponentProvider.getInstance(GuiceComponentProviderFactory.java:332)
at 
com.sun.jersey.server.impl.component.IoCResourceFactory$SingletonWrapper.init(IoCResourceFactory.java:178)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:584)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:581)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.getResourceComponentProvider(WebApplicationImpl.java:581)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:658)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:653)
at 
com.sun.jersey.server.impl.application.RootResourceUriRules.(RootResourceUriRules.java:124)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._initiate(WebApplicationImpl.java:1298)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.access$700(WebApplicationImpl.java:169)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:775)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:771)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.initiate(WebApplicationImpl.java:771)
at 
com.sun.jersey.guice.spi.container.servlet.GuiceContainer.initiate(GuiceContainer.java:121)
at 
com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.initiate(ServletContainer.java:318)
at 
com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:609)
at 
com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:210)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:373)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:710)
at 
com.google.inject.servlet.FilterDefinition.init(FilterDefinition.java:114)
at 
com.google.inject.servlet.ManagedFilterPipeline.initPipeline(ManagedFilterPipeline.java:98)
at com.google.inject.servlet.GuiceFilter.init(GuiceFilter.java:172)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
{code}

[jira] [Created] (YARN-6189) Improve application status log message when RM restarted when app is in NEW state

2017-02-14 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-6189:


 Summary: Improve application status log message when RM restarted 
when app is in NEW state
 Key: YARN-6189
 URL: https://issues.apache.org/jira/browse/YARN-6189
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Yesha Vora


When an RM restart or failover happens while an application is in the NEW 
state, the application status command for that application prints the stack 
trace below. The exception message should be improved to be less confusing, 
for example: "application  is unknown, maybe the previous submission was not 
successful."
{code}
hrt_qa@:/root> yarn application -status application_1470379565464_0001
16/08/05 17:24:29 INFO impl.TimelineClientImpl: Timeline service address: 
https://hostxxx:8190/ws/v1/timeline/
16/08/05 17:24:30 INFO client.AHSProxy: Connecting to Application History 
server at hostxxx/xxx:10200
16/08/05 17:24:31 WARN retry.RetryInvocationHandler: Exception while invoking 
ApplicationClientProtocolPBClientImpl.getApplicationReport over rm1. Not 
retrying because try once and fail.
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1470379565464_0001' doesn't exist in RM.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:331)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:194)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
at com.sun.proxy.$Proxy18.getApplicationReport(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:436)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:481)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:160)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:83)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException):
 Application with id 'application_1470379565464_0001' doesn't exist in RM.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:331)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
at 
{code}

[jira] [Created] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS

2017-02-01 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-6137:


 Summary: Yarn client implicitly invoke ATS client which accesses 
HDFS
 Key: YARN-6137
 URL: https://issues.apache.org/jira/browse/YARN-6137
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


YarnClient implicitly tries to create an ATS (timeline) client even when the 
caller does not need it, and the ATS client code tries to access HDFS. Because 
of that, the calling service hits a GSS exception.
All services that use YarnClient cannot be expected to change to accommodate 
this behavior.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5941) Slider handles "site.mawo-site.per.component" for multiple components incorrectly

2016-11-29 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-5941:


 Summary: Slider handles "site.mawo-site.per.component" for 
multiple components incorrectly
 Key: YARN-5941
 URL: https://issues.apache.org/jira/browse/YARN-5941
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


When multiple components are started by Slider and each component needs a 
different property file, "per.component" should be set to true for each 
component.

{code:title=component1}
'properties': {
'site.app-site.job-builder.class': 'xxx',
'site.app-site.rpc.server.hostname': 'xxx',
'site.app-site.per.component': 'true'
}
{code}

{code:title=component2}
'properties': {
'site.app-site.job-builder.class.component2': 
'yyy',
'site.app-site.rpc.server.hostname.component2': 
'yyy',
'site.app-site.per.component': 'true'
}
{code}

When this is done, one of the components' Slider-generated property files ends 
up containing "per.component"="true".

{code:title=property file for component1}
#Generated by Apache Slider
#Tue Nov 29 23:20:25 UTC 2016
per.component=true
job-builder.class=xxx
rpc.server.hostname=xxx{code}

{code:title=property file for component2}
#Generated by Apache Slider
#Tue Nov 29 23:20:25 UTC 2016
job-builder.class.component2=yyy
rpc.server.hostname.component2=yyy{code}

"per.component" should not be added in any component's property file. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5497) Use different color for Undefined and Succeeded final state in application page

2016-08-09 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-5497:


 Summary: Use different color for Undefined and Succeeded final 
state in application page
 Key: YARN-5497
 URL: https://issues.apache.org/jira/browse/YARN-5497
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Assignee: Yesha Vora
Priority: Trivial



When an application is in the RUNNING state, the final status is set to 
"Undefined". When an application has succeeded, the final status is set to 
"SUCCEEDED".

The YARN UI uses the same green color for both of these final statuses. It 
would be good to use a different color for each final status value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5496) Make Node Heatmap Chart categories clickable

2016-08-09 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-5496:


 Summary: Make Node Heatmap Chart categories clickable
 Key: YARN-5496
 URL: https://issues.apache.org/jira/browse/YARN-5496
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


Make the Node Heatmap Chart categories clickable.

The heatmap chart has a few categories such as "10% used", "30% used", etc.

These tags should be clickable: if the user clicks on the "10% used" tag, it 
should show the hosts with 10% usage. This would be a useful feature for 
clusters with thousands of nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5494) Nodes page throws "Sorry Error Occurred" message

2016-08-09 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-5494:


 Summary: Nodes page throws "Sorry Error Occurred" message
 Key: YARN-5494
 URL: https://issues.apache.org/jira/browse/YARN-5494
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Priority: Critical


Steps to reproduce:
* Click on Nodes. This page lists the nodes of the cluster.
* Click on one of the nodes, such as node1. (It redirects to the 
http://:4200/#/yarn-node/:31924/:8042 url.)

This URL shows a "Sorry Error Occurred" message.

{code}jquery.js:8630 XMLHttpRequest cannot load http://xxx:xxx:8042/ws/v1/node. 
Cross origin requests are only supported for protocol schemes: http, data, 
chrome, chrome-extension, https, chrome-extension-resource.send @ jquery.js:8630
ember.debug.js:30877 Error: Adapter operation failed
at new Error (native)
at Error.EmberError (http://xxx:4200/assets/vendor.js:25278:21)
at Error.ember$data$lib$adapters$errors$$AdapterError 
(http://xxx:4200/assets/vendor.js:91198:50)
at Class.handleResponse (http://xxx:4200/assets/vendor.js:92494:16)
at Class.hash.error (http://xxx:4200/assets/vendor.js:92574:33)
at fire (http://xxx:4200/assets/vendor.js:3306:30)
at Object.fireWith [as rejectWith] (http://xxx:4200/assets/vendor.js:3418:7)
at done (http://xxx:4200/assets/vendor.js:8473:14)
at XMLHttpRequest. (http://xxx:4200/assets/vendor.js:8806:9)
at Object.send (http://xxx:4200/assets/vendor.js:8837:10)onerrorDefault @ 
ember.debug.js:30877{code}
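
The failure above is a browser-side CORS rejection rather than an NM outage; a 
quick way to check whether the NM web service returns CORS headers at all is 
shown below (a sketch, with placeholder host names):

{code:title=checking CORS headers on the NM web service (sketch)}
# If no Access-Control-Allow-Origin header comes back, the browser blocks the UI's request.
curl -si -H "Origin: http://ui-host:4200" "http://nm-host:8042/ws/v1/node" \
  | grep -i "access-control-allow-origin" || echo "no CORS header returned"
{code}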



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5493) In leaf queue page, list applications should only show applications from that leaf queues

2016-08-09 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-5493:


 Summary: In leaf queue page, list applications should only show 
applications from that leaf queues
 Key: YARN-5493
 URL: https://issues.apache.org/jira/browse/YARN-5493
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


Steps to reproduce:
* Create 2 queues.
* Go to the leaf queue page at the http://:/#/yarn-queue-apps/ 
url.
* Click on the application list.

Here, it lists all the applications. Instead, it should list only the 
applications from that particular leaf queue.
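
For comparison, the RM REST API already supports filtering the application list 
by queue, which is the behavior the leaf queue page should mirror; a sketch, 
assuming the standard queue query parameter and placeholder host/queue names:

{code:title=listing apps for a single queue via the RM REST API (sketch)}
# Only applications submitted to the named leaf queue should be returned.
curl -s "http://rm-host:8088/ws/v1/cluster/apps?queue=myqueue"
{code}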



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org


