[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id

2013-10-15 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794942#comment-13794942
 ] 

Devaraj K commented on YARN-1299:
-

Can you also take care of the guideline of avoiding lines longer than 80 
characters in the changes?

 Improve 'checking for deactivate...' log message by adding app id
 -

 Key: YARN-1299
 URL: https://issues.apache.org/jira/browse/YARN-1299
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.1.1-beta
Reporter: Devaraj K
 Attachments: yarn-1299.patch


 {code:xml}
 2013-10-07 19:28:35,365 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: 
 checking for deactivate...
 2013-10-07 19:28:35,365 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: 
 checking for deactivate...
 {code}
 The RM log prints a message saying 'checking for deactivate...'. This log 
 message would be more meaningful if it included the app id.
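 For illustration, a minimal sketch of the kind of change being suggested (the 
 class, method and variable names are assumptions, not the actual 
 AppSchedulingInfo code):
 {code}
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.yarn.api.records.ApplicationId;

 // Hypothetical sketch only; not the actual AppSchedulingInfo code.
 class DeactivateLogSketch {
   private static final Log LOG = LogFactory.getLog(DeactivateLogSketch.class);

   void checkForDeactivate(ApplicationId applicationId) {
     // Include the application id so the log line identifies which app is checked.
     LOG.info("checking for deactivate of application: " + applicationId);
   }
 }
 {code}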



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794949#comment-13794949
 ] 

Hadoop QA commented on YARN-1068:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608393/yarn-1068-11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2177//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2177//console

This message is automatically generated.

 Add admin support for HA operations
 ---

 Key: YARN-1068
 URL: https://issues.apache.org/jira/browse/YARN-1068
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
 yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
 yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
 yarn-1068-9.patch, yarn-1068-prelim.patch


 Support HA admin operations to facilitate transitioning the RM to Active and 
 Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1259) In Fair Scheduler web UI, queue num pending and num active apps switched

2013-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795069#comment-13795069
 ] 

Hudson commented on YARN-1259:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #363 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/363/])
YARN-1259. In Fair Scheduler web UI, queue num pending and num active apps 
switched. (Robert Kanter via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532094)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java


 In Fair Scheduler web UI, queue num pending and num active apps switched
 

 Key: YARN-1259
 URL: https://issues.apache.org/jira/browse/YARN-1259
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Robert Kanter
  Labels: newbie
 Fix For: 2.2.1

 Attachments: YARN-1259.patch


 The values returned in FairSchedulerLeafQueueInfo by numPendingApplications 
 and numActiveApplications should be switched.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795071#comment-13795071
 ] 

Hudson commented on YARN-1182:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #363 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/363/])
YARN-1182. MiniYARNCluster creates and inits the RM/NM only on start() (Karthik 
Kambatla via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532109)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 MiniYARNCluster creates and inits the RM/NM only on start()
 ---

 Key: YARN-1182
 URL: https://issues.apache.org/jira/browse/YARN-1182
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.3.0

 Attachments: yarn-1182-1.patch, yarn-1182-2.patch


 MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
 and init() during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1259) In Fair Scheduler web UI, queue num pending and num active apps switched

2013-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795173#comment-13795173
 ] 

Hudson commented on YARN-1259:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1553 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1553/])
YARN-1259. In Fair Scheduler web UI, queue num pending and num active apps 
switched. (Robert Kanter via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532094)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java


 In Fair Scheduler web UI, queue num pending and num active apps switched
 

 Key: YARN-1259
 URL: https://issues.apache.org/jira/browse/YARN-1259
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Robert Kanter
  Labels: newbie
 Fix For: 2.2.1

 Attachments: YARN-1259.patch


 The values returned in FairSchedulerLeafQueueInfo by numPendingApplications 
 and numActiveApplications should be switched.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795175#comment-13795175
 ] 

Hudson commented on YARN-1182:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1553 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1553/])
YARN-1182. MiniYARNCluster creates and inits the RM/NM only on start() (Karthik 
Kambatla via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532109)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 MiniYARNCluster creates and inits the RM/NM only on start()
 ---

 Key: YARN-1182
 URL: https://issues.apache.org/jira/browse/YARN-1182
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.3.0

 Attachments: yarn-1182-1.patch, yarn-1182-2.patch


 MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
 and init() during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1259) In Fair Scheduler web UI, queue num pending and num active apps switched

2013-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795222#comment-13795222
 ] 

Hudson commented on YARN-1259:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1579 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1579/])
YARN-1259. In Fair Scheduler web UI, queue num pending and num active apps 
switched. (Robert Kanter via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532094)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerLeafQueueInfo.java


 In Fair Scheduler web UI, queue num pending and num active apps switched
 

 Key: YARN-1259
 URL: https://issues.apache.org/jira/browse/YARN-1259
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.1.1-beta
Reporter: Sandy Ryza
Assignee: Robert Kanter
  Labels: newbie
 Fix For: 2.2.1

 Attachments: YARN-1259.patch


 The values returned in FairSchedulerLeafQueueInfo by numPendingApplications 
 and numActiveApplications should be switched.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1182) MiniYARNCluster creates and inits the RM/NM only on start()

2013-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795224#comment-13795224
 ] 

Hudson commented on YARN-1182:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1579 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1579/])
YARN-1182. MiniYARNCluster creates and inits the RM/NM only on start() (Karthik 
Kambatla via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532109)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 MiniYARNCluster creates and inits the RM/NM only on start()
 ---

 Key: YARN-1182
 URL: https://issues.apache.org/jira/browse/YARN-1182
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Fix For: 2.3.0

 Attachments: yarn-1182-1.patch, yarn-1182-2.patch


 MiniYARNCluster creates and inits the RM/NM only on start(). It should create 
 and init() during init() itself.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-451) Add more metrics to RM page

2013-10-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795266#comment-13795266
 ] 

Jason Lowe commented on YARN-451:
-

bq. current allocation can be seen from the scheduler page.

I took a look at the scheduler page, and all I see for current allocation is 
per-user-per-queue and not per app.  Where are you seeing the current 
assignment for each app on the scheduler page?

As for the instance you recently encountered, showing the current ask would 
have quickly isolated the issue as all 30K maps would have been asked for once 
the app launched.

My main concern with a current-plus-estimated-future approach is that it's 
optional for AMs to implement and requires an API change.  I see showing the 
current allocation and/or the ask as more robust across different app 
frameworks (it doesn't require AMs to implement anything), easier to implement, 
and it should solve most of the problems with identifying where the bottlenecks 
currently are in scheduling containers.  Doing so doesn't preclude adding a 
total estimate metric at some point.

Quick question on the estimate -- is it a calculation of the total app weight 
at the start of the app or do the values decrease as containers are granted?  
The former is useful as a gauge of how big an app is/was overall, while the 
latter is more useful for identifying upcoming demands if the application has 
been running for some time.

 Add more metrics to RM page
 ---

 Key: YARN-451
 URL: https://issues.apache.org/jira/browse/YARN-451
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Sangjin Lee
Priority: Blocker
 Attachments: in_progress_2x.png, yarn-451-trunk-20130916.1.patch


 The ResourceManager web UI shows the list of RUNNING applications, but it does 
 not tell which applications are requesting more resources compared to others. 
 With a cluster running hundreds of applications at once, it would be useful to 
 have some kind of metric to show high-resource-usage applications vs 
 low-resource-usage ones. At the minimum, showing the number of containers is a 
 good option.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-975) Add a file-system implementation for history-storage

2013-10-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795350#comment-13795350
 ] 

Zhijie Shen commented on YARN-975:
--

Having thought more about the implementation details:

1. It seems that the cache mechanism is required immediately. It is the common 
case that users will access the information of an application, its attempts and 
its containers consecutively by clicking the links on the web page. If we don't 
have a cache, then for every single piece of information we need to read the 
TFile again from HDFS, which results in poor performance.

To cache the complete history data of an application, we have two choices: one 
is to cache the raw TFile, and the other is to cache all the protobuf objects 
recovered from the TFile. I'm inclined toward the latter, because we can 
organize the objects in a better data structure for quick access.
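As a rough sketch of the latter option (the class below is made up for 
illustration and is not an existing YARN class), the recovered protobuf objects 
could be held per application in a small LRU map:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: an LRU cache of per-application history data recovered
// from a TFile, so that consecutive page views don't re-read HDFS.
class AppHistoryCacheSketch<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  AppHistoryCacheSketch(int maxEntries) {
    super(16, 0.75f, true); // access-order iteration gives LRU behavior
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries; // evict the least recently used application
  }
}
{code}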

2. The current APIs allow users to write each piece of information in the scope 
of one application individually. Limited by this API design, we need to open a 
TFile on the first write operation for a given application and keep it open 
until the last write operation is finished.

The problem then is how we judge that all the information for one application 
has been written. One method is to tell the history storage how many attempts 
and containers the application has. Another is to let the caller explicitly 
request closing the TFile. However, both methods involve an interface change, 
exposing more methods.

3. This further raises a question w.r.t. the integrity of the history data. In 
the normal case, we expect the application, its attempts and its containers to 
all be written into the TFile. However, if for some reason one piece of 
information is missing and its write operation never happens, the TFile will 
stay open forever waiting for the missing piece.

We probably need a timeout trigger to close the TFile whether or not all the 
data has come in. But then, should we still persist the TFile into HDFS, given 
that the history data for this application is incomplete?

4. However, if we have a timeout trigger for a TFile, the RM cannot write each 
piece of history information at the end of each object's life cycle without 
coordination. We will then want the write operations for all the pieces to be 
scheduled together, so the RM side needs more work to coordinate the write 
operations (YARN-953).

[~vinodkv], any suggestions? 

 Add a file-system implementation for history-storage
 

 Key: YARN-975
 URL: https://issues.apache.org/jira/browse/YARN-975
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, 
 YARN-975.4.patch, YARN-975.5.patch


 HDFS implementation should be a standard persistence strategy of history 
 storage



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-975) Add a file-system implementation for history-storage

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795391#comment-13795391
 ] 

Vinod Kumar Vavilapalli commented on YARN-975:
--

Looks good overall. One file per app is reasonable. We already use TFile for 
log-aggregation, so yeah it is good to pick that up too.

Regarding the implementation, see JobHistoryEventHandler. There we flush based 
on two triggers: An upper limit on unflushed records and a time-based trigger. 
We can add one more trigger: Application state change.
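A rough sketch of how the three triggers could combine (the names below are 
illustrative and do not come from the actual JobHistoryEventHandler code):

{code}
// Illustrative only; not the actual JobHistoryEventHandler implementation.
class FlushPolicySketch {
  private final int maxUnflushedRecords;
  private final long flushIntervalMs;

  FlushPolicySketch(int maxUnflushedRecords, long flushIntervalMs) {
    this.maxUnflushedRecords = maxUnflushedRecords;
    this.flushIntervalMs = flushIntervalMs;
  }

  // Flush when too many records are buffered, too much time has passed,
  // or the application changed state (the extra trigger suggested above).
  boolean shouldFlush(int unflushedRecords, long msSinceLastFlush,
      boolean appStateChanged) {
    return unflushedRecords >= maxUnflushedRecords
        || msSinceLastFlush >= flushIntervalMs
        || appStateChanged;
  }
}
{code}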

 Add a file-system implementation for history-storage
 

 Key: YARN-975
 URL: https://issues.apache.org/jira/browse/YARN-975
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, 
 YARN-975.4.patch, YARN-975.5.patch


 HDFS implementation should be a standard persistence strategy of history 
 storage



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1292) De-link container life cycle from the process it runs

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1292.
---

Resolution: Duplicate

YARN-1040 existed before this. Closing as duplicate.

 De-link container life cycle from the process it runs
 -

 Key: YARN-1292
 URL: https://issues.apache.org/jira/browse/YARN-1292
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha

 Currently, a container is considered done when its OS process exits. This 
 makes it cumbersome for apps to be able to reuse containers for different 
 processes. Long running daemons may want to run in the same containers as the 
 previous versions. So e.g. if an HBase region server crashes or is upgraded, 
 it would want to restart in the same container, where everything it needs 
 would already be warm and ready.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Reopened] (YARN-925) HistoryStorage Reader Interface for Application History Server

2013-10-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reopened YARN-925:
--


The interface may need to be changed accordingly, like the writer interface.

 HistoryStorage Reader Interface for Application History Server
 --

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: YARN-321

 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
 YARN-925-4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1308) set default value for nodemanager aux service

2013-10-15 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795418#comment-13795418
 ] 

Arpit Gupta commented on YARN-1308:
---

I think we should set the defaults to

{code}
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Auxiliary services of NodeManager</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
{code}

so nodemanagers will start out of the box.

 set default value for nodemanager aux service
 -

 Key: YARN-1308
 URL: https://issues.apache.org/jira/browse/YARN-1308
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Priority: Minor

 Currently in order to get the nodemanagers to start you have to define 
 yarn.nodemanager.aux-services and 
 yarn.nodemanager.aux-services.mapreduce_shuffle.class.
 We should set these as defaults.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1308) set default value for nodemanager aux service

2013-10-15 Thread Arpit Gupta (JIRA)
Arpit Gupta created YARN-1308:
-

 Summary: set default value for nodemanager aux service
 Key: YARN-1308
 URL: https://issues.apache.org/jira/browse/YARN-1308
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Priority: Minor


Currently in order to get the nodemanagers to start you have to define 
yarn.nodemanager.aux-services and 
yarn.nodemanager.aux-services.mapreduce_shuffle.class.

We should set these as defaults.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-975) Add a file-system implementation for history-storage

2013-10-15 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795423#comment-13795423
 ] 

Mayank Bansal commented on YARN-975:


[~zjshen]
Is it one TFile per application or 3 files (1 for the application, 1 for the 
attempts and 1 for all the containers)?
The protobuf cache is a good idea.
[~vinodkv] I agree with you that we should have all three triggers.

Thanks,
Mayank 

 Add a file-system implementation for history-storage
 

 Key: YARN-975
 URL: https://issues.apache.org/jira/browse/YARN-975
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, 
 YARN-975.4.patch, YARN-975.5.patch


 HDFS implementation should be a standard persistence strategy of history 
 storage



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-925) HistoryStorage Reader Interface for Application History Server

2013-10-15 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795425#comment-13795425
 ] 

Mayank Bansal commented on YARN-925:


[~zjshen] Why do you think the readers will be changed?

 HistoryStorage Reader Interface for Application History Server
 --

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: YARN-321

 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
 YARN-925-4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1042) add ability to specify affinity/anti-affinity in container requests

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1042:
--

Issue Type: Sub-task  (was: New Feature)
Parent: YARN-397

 add ability to specify affinity/anti-affinity in container requests
 ---

 Key: YARN-1042
 URL: https://issues.apache.org/jira/browse/YARN-1042
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran
Assignee: Junping Du
 Attachments: YARN-1042-demo.patch


 Container requests to the AM should be able to request anti-affinity to 
 ensure that things like Region Servers don't come up in the same failure 
 zones. 
 Similarly, you may want to be able to specify affinity to the same host or 
 rack without specifying which specific host/rack. Example: bringing up a small 
 Giraph cluster in a large YARN cluster would benefit from having the 
 processes in the same rack purely for bandwidth reasons.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1308) set default value for nodemanager aux service

2013-10-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795428#comment-13795428
 ] 

Sandy Ryza commented on YARN-1308:
--

This looks like a duplicate of YARN-1289

 set default value for nodemanager aux service
 -

 Key: YARN-1308
 URL: https://issues.apache.org/jira/browse/YARN-1308
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Priority: Minor

 Currently in order to get the nodemanagers to start you have to define 
 yarn.nodemanager.aux-services and 
 yarn.nodemanager.aux-services.mapreduce_shuffle.class.
 We should set these as defaults.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1308) set default value for nodemanager aux service

2013-10-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved YARN-1308.
--

Resolution: Duplicate

 set default value for nodemanager aux service
 -

 Key: YARN-1308
 URL: https://issues.apache.org/jira/browse/YARN-1308
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Priority: Minor

 Currently in order to get the nodemanagers to start you have to define 
 yarn.nodemanager.aux-services and 
 yarn.nodemanager.aux-services.mapreduce_shuffle.class.
 We should set these as defaults.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-796:
-

Issue Type: Sub-task  (was: New Feature)
Parent: YARN-397

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-445) Ability to signal containers

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-445:


Assignee: Andrey Klochkov

 Ability to signal containers
 

 Key: YARN-445
 URL: https://issues.apache.org/jira/browse/YARN-445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Jason Lowe
Assignee: Andrey Klochkov
 Attachments: YARN-445--n2.patch, YARN-445--n3.patch, 
 YARN-445--n4.patch, YARN-445.patch


 It would be nice if an ApplicationMaster could send signals to containers 
 such as SIGQUIT, SIGUSR1, etc.
 For example, in order to replicate the jstack-on-task-timeout feature 
 implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
 interface for sending SIGQUIT to a container.  For that specific feature we 
 could implement it as an additional field in the StopContainerRequest.  
 However that would not address other potential features like the ability for 
 an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
 latter feature would be a very useful debugging tool for users who do not 
 have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-15 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795461#comment-13795461
 ] 

Omkar Vinit Joshi commented on YARN-1185:
-

I think it would be fair to assume that the rename operation is atomic in 
nature, so we can split the existing writeFile operation into two calls:
* First write the data to a .tmp file
* Rename it to the actual file.

Similarly, when we are loading the state, if we encounter any file with a .tmp 
extension we will discard it. Attaching a patch which does exactly that. Let me 
know your thoughts.
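A minimal sketch of the write path being described (class, method and variable 
names are illustrative, not the actual FileSystemRMStateStore code):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch of the two-step write: write to a ".tmp" sibling first,
// then rename into place. The rename is assumed atomic, so recovery never sees
// a partially written state file; any leftover "*.tmp" file is discarded.
class AtomicWriteSketch {
  static void writeFileAtomically(FileSystem fs, Path target, byte[] data)
      throws IOException {
    Path tmp = new Path(target.getParent(), target.getName() + ".tmp");
    FSDataOutputStream out = fs.create(tmp, true);
    try {
      out.write(data);
    } finally {
      out.close();
    }
    fs.rename(tmp, target);
  }
}
{code}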

 FileSystemRMStateStore can leave partial files that prevent subsequent 
 recovery
 ---

 Key: YARN-1185
 URL: https://issues.apache.org/jira/browse/YARN-1185
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1185.1.patch


 FileSystemRMStateStore writes directly to the destination file when storing 
 state. However if the RM were to crash in the middle of the write, the 
 recovery method could encounter a partially-written file and either outright 
 crash during recovery or silently load incomplete state.
 To avoid this, the data should be written to a temporary file and renamed to 
 the destination file afterwards.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-15 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1185:


Attachment: YARN-1185.1.patch

 FileSystemRMStateStore can leave partial files that prevent subsequent 
 recovery
 ---

 Key: YARN-1185
 URL: https://issues.apache.org/jira/browse/YARN-1185
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1185.1.patch


 FileSystemRMStateStore writes directly to the destination file when storing 
 state. However if the RM were to crash in the middle of the write, the 
 recovery method could encounter a partially-written file and either outright 
 crash during recovery or silently load incomplete state.
 To avoid this, the data should be written to a temporary file and renamed to 
 the destination file afterwards.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795460#comment-13795460
 ] 

Vinod Kumar Vavilapalli commented on YARN-445:
--

Sorry for jumping real late on this. I see Andrey has been working on patches, 
but haven't looked at them. Trying to see if we are doing it right.

bq. Add YARN API support for ContainerLaunchContext to accept a mapping of 
externally-triggered command names to code. (i.e. 
ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")).
I think this is a better approach overall. We already support running arbitrary 
command lines as part of start-container. Even without signalling, we have a 
stopContainer API which clearly indicates that the container be shut down. So 
for signalling containers, either via a flag or a new API, why don't we just 
implement it as an additional command that is run on the NM? The NM can provide 
important information, like user-name, pid, pgrpid, sid etc., in a 
platform-agnostic manner for that command, and we should be all done.

 Ability to signal containers
 

 Key: YARN-445
 URL: https://issues.apache.org/jira/browse/YARN-445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Jason Lowe
Assignee: Andrey Klochkov
 Attachments: YARN-445--n2.patch, YARN-445--n3.patch, 
 YARN-445--n4.patch, YARN-445.patch


 It would be nice if an ApplicationMaster could send signals to containers 
 such as SIGQUIT, SIGUSR1, etc.
 For example, in order to replicate the jstack-on-task-timeout feature 
 implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
 interface for sending SIGQUIT to a container.  For that specific feature we 
 could implement it as an additional field in the StopContainerRequest.  
 However that would not address other potential features like the ability for 
 an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
 latter feature would be a very useful debugging tool for users who do not 
 have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-896) Roll up for long-lived services in YARN

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-896:
-

Summary: Roll up for long-lived services in YARN  (was: Roll up for long 
lived YARN)

 Roll up for long-lived services in YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1292) De-link container life cycle from the process it runs

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795489#comment-13795489
 ] 

Vinod Kumar Vavilapalli commented on YARN-1292:
---

bq. Please also copy relevant comments from duplicate jira into the parent so 
that they dont get lost. I have done it for this one.
If a duplicate isn't caught early, it is difficult to capture all the 
conversation on both the tickets. We just link both the tickets and assume that 
conversation just moves over.

 De-link container life cycle from the process it runs
 -

 Key: YARN-1292
 URL: https://issues.apache.org/jira/browse/YARN-1292
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.1-beta
Reporter: Bikas Saha

 Currently, a container is considered done when its OS process exits. This 
 makes it cumbersome for apps to be able to reuse containers for different 
 processes. Long running daemons may want to run in the same containers as the 
 previous versions. So e.g. if an HBase region server crashes or is upgraded, 
 it would want to restart in the same container, where everything it needs 
 would already be warm and ready.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795511#comment-13795511
 ] 

Hadoop QA commented on YARN-1185:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608545/YARN-1185.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2178//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2178//console

This message is automatically generated.

 FileSystemRMStateStore can leave partial files that prevent subsequent 
 recovery
 ---

 Key: YARN-1185
 URL: https://issues.apache.org/jira/browse/YARN-1185
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1185.1.patch


 FileSystemRMStateStore writes directly to the destination file when storing 
 state. However if the RM were to crash in the middle of the write, the 
 recovery method could encounter a partially-written file and either outright 
 crash during recovery or silently load incomplete state.
 To avoid this, the data should be written to a temporary file and renamed to 
 the destination file afterwards.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Reopened] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface

2013-10-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reopened YARN-947:
--


Need to add more history records

 Defining the history data classes for the implementation of the 
 reading/writing interface
 -

 Key: YARN-947
 URL: https://issues.apache.org/jira/browse/YARN-947
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: YARN-321

 Attachments: YARN-947.1.patch, YARN-947.2.patch


 We need to define history data classes that have the exact fields to be 
 stored, so that the implementations don't need duplicate logic to extract the 
 required information from RMApp, RMAppAttempt and RMContainer.
 We use protobuf to define these classes, such that they can be ser/des 
 to/from bytes, which are easier for persistence.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-925) HistoryStorage Reader Interface for Application History Server

2013-10-15 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795590#comment-13795590
 ] 

Mayank Bansal commented on YARN-925:


As discussed, closing it as there is no change here.

Thanks,
Mayank

 HistoryStorage Reader Interface for Application History Server
 --

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: YARN-321

 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
 YARN-925-4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-925) HistoryStorage Reader Interface for Application History Server

2013-10-15 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal resolved YARN-925.


Resolution: Fixed

 HistoryStorage Reader Interface for Application History Server
 --

 Key: YARN-925
 URL: https://issues.apache.org/jira/browse/YARN-925
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: YARN-321

 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
 YARN-925-4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.

2013-10-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795597#comment-13795597
 ] 

Hitesh Shah commented on YARN-1289:
---

I believe this jira should be considered invalid. YARN does not and should not 
have any implicit dependencies on mapreduce. *If* anyone wants to run MapReduce 
jobs on a YARN cluster, configuration of the mapreduce shuffle service is 
mandatory - however this does not hold true for cases where someone is using 
YARN to run non-MR applications. 

If someone wanted to change the implementation of the shuffle service to a 
potentially better/faster implementation, defining a default would create a 
problem. Also, in terms of future-proofing, what is the expectation if MR in 
later versions switches to use a different service? Is the expectation that 
default services will keep on changing over time based on MR implementation 
changes? 

 Configuration yarn.nodemanager.aux-services should have default value for 
 mapreduce_shuffle.
 --

 Key: YARN-1289
 URL: https://issues.apache.org/jira/browse/YARN-1289
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: wenwupeng
Assignee: Junping Du
 Attachments: YARN-1289.patch


 Failed to run a benchmark when the yarn.nodemanager.aux-services value is not 
 configured in yarn-site.xml; it would be better to configure a default value.
 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
 attempt_1381371516570_0001_m_00_1, Status : FAILED
 Container launch failed for container_1381371516570_0001_01_05 : 
 org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
 auxService:mapreduce_shuffle does not exist
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.

2013-10-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795604#comment-13795604
 ] 

Hitesh Shah commented on YARN-1289:
---

[~wenwu] Can you confirm whether your yarn-site.xml has no aux-services 
configured or the mapreduce shuffle service was mis-configured using 
mapreduce.shuffle instead of mapreduce_shuffle?



 Configuration yarn.nodemanager.aux-services should have default value for 
 mapreduce_shuffle.
 --

 Key: YARN-1289
 URL: https://issues.apache.org/jira/browse/YARN-1289
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: wenwupeng
Assignee: Junping Du
 Attachments: YARN-1289.patch


 Failed to run a benchmark when the yarn.nodemanager.aux-services value is not 
 configured in yarn-site.xml; it would be better to configure a default value.
 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
 attempt_1381371516570_0001_m_00_1, Status : FAILED
 Container launch failed for container_1381371516570_0001_01_05 : 
 org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
 auxService:mapreduce_shuffle does not exist
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795610#comment-13795610
 ] 

Bikas Saha commented on YARN-1068:
--

Looks good to me. Will give a day or so for some other committers to take a 
look.

There isn't any need for this to wrap the IOException in another exception. The 
base AdminService protocol signature (ResourceManagerAdministrationProtocol) 
already supports throwing IOException. If it's small enough, we could fix it 
here or do it in a separate jira.
{code}
   private UserGroupInformation checkAcls(String method) throws YarnException {
-    UserGroupInformation user;
     try {
-      user = UserGroupInformation.getCurrentUser();
+      return RMServerUtils.verifyAccess(adminAcl, method, LOG);
     } catch (IOException ioe) {
-      LOG.warn("Couldn't get current user", ioe);
-
-      RMAuditLogger.logFailure("UNKNOWN", method,
-          adminAcl.toString(), "AdminService",
-          "Couldn't get current user");
       throw RPCUtil.getRemoteException(ioe);
     }
{code}

 Add admin support for HA operations
 ---

 Key: YARN-1068
 URL: https://issues.apache.org/jira/browse/YARN-1068
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
 yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
 yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
 yarn-1068-9.patch, yarn-1068-prelim.patch


 Support HA admin operations to facilitate transitioning the RM to Active and 
 Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.

2013-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795611#comment-13795611
 ] 

Karthik Kambatla commented on YARN-1289:


+1 to not adding an MR specific config value by default to YARN.

 Configuration yarn.nodemanager.aux-services should have default value for 
 mapreduce_shuffle.
 --

 Key: YARN-1289
 URL: https://issues.apache.org/jira/browse/YARN-1289
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: wenwupeng
Assignee: Junping Du
 Attachments: YARN-1289.patch


 Failed to run a benchmark when the yarn.nodemanager.aux-services value is not 
 configured in yarn-site.xml; it would be better to configure a default value.
 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
 attempt_1381371516570_0001_m_00_1, Status : FAILED
 Container launch failed for container_1381371516570_0001_01_05 : 
 org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
 auxService:mapreduce_shuffle does not exist
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1308) set default value for nodemanager aux service

2013-10-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795605#comment-13795605
 ] 

Hitesh Shah commented on YARN-1308:
---

[~arpitgupta] Can you confirm whether your yarn-site.xml has no aux-services 
configured or the mapreduce shuffle service was mis-configured using 
mapreduce.shuffle instead of mapreduce_shuffle? This clarification will 
help get to the underlying issue. 
 


 set default value for nodemanager aux service
 -

 Key: YARN-1308
 URL: https://issues.apache.org/jira/browse/YARN-1308
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Priority: Minor

 Currently in order to get the nodemanagers to start you have to define 
 yarn.nodemanager.aux-services and 
 yarn.nodemanager.aux-services.mapreduce_shuffle.class.
 We should set these as defaults.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1309) AdminService unnecessarily wraps an IOException into a YarnException

2013-10-15 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1309:
--

 Summary: AdminService unnecessarily wraps an IOException into a 
YarnException
 Key: YARN-1309
 URL: https://issues.apache.org/jira/browse/YARN-1309
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


ResourceManagerAdministrationProtocol allows methods to throw an IOException. 
Still, AdminService wraps IOExceptions as YarnExceptions before throwing them.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795629#comment-13795629
 ] 

Karthik Kambatla commented on YARN-1068:


Thanks Bikas. 

bq. There isnt any need for this to wrap the IOException in another exception. 
The base AdminService protocol signature already supports throwing IOException. 
Agree. I did consider leaving it as IOE. However, there are several places in 
AdminService where an IOE is being wrapped into a YarnException. We should 
probably address all of them together in another JIRA. Created YARN-1309.

 Add admin support for HA operations
 ---

 Key: YARN-1068
 URL: https://issues.apache.org/jira/browse/YARN-1068
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
 yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
 yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
 yarn-1068-9.patch, yarn-1068-prelim.patch


 Support HA admin operations to facilitate transitioning the RM to Active and 
 Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1295) In UnixLocalWrapperScriptBuilder, using bash -c can cause Text file busy errors

2013-10-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795669#comment-13795669
 ] 

Hudson commented on YARN-1295:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4609 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4609/])
YARN-1295. In UnixLocalWrapperScriptBuilder, using bash -c can cause Text file 
busy errors. (Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1532532)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java


 In UnixLocalWrapperScriptBuilder, using bash -c can cause Text file busy 
 errors
 -

 Key: YARN-1295
 URL: https://issues.apache.org/jira/browse/YARN-1295
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.2.1

 Attachments: YARN-1295.patch


 I missed this when working on YARN-1271.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1309) AdminService unnecessarily wraps an IOException into a YarnException

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1309.
---

Resolution: Invalid

All YARN protocols only allow IOException for the sake of RPC layer exceptions. 
Exceptions coming from the application layer (as opposed to RPC layer) should 
all be YarnExceptions. See the discussion at YARN-142.

 AdminService unnecessarily wraps an IOException into a YarnException
 

 Key: YARN-1309
 URL: https://issues.apache.org/jira/browse/YARN-1309
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 ResourceManagerAdministrationProtocol allows methods to throw an IOException. 
 Still, AdminService wraps IOExceptions as YarnExceptions before throwing them.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-202) Log Aggregation generates a storm of fsync() for namenode

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-202:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-431

 Log Aggregation generates a storm of fsync() for namenode
 -

 Key: YARN-202
 URL: https://issues.apache.org/jira/browse/YARN-202
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha, 0.23.4
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: yarn-202.patch


 When the log aggregation is on, write to each aggregated container log causes 
 hflush() to be called. For large clusters, this can creates a lot of fsync() 
 calls for namenode. 
 We have seen 6-7x increase in the average number of fsync operations compared 
 to 1.0.x on a large busy cluster. Over 99% of fsync ops were for log 
 aggregation writing to tmp files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1303:


Attachment: YARN-1303.1.patch

In this patch, the Client code checks whether the client gives multiple 
--shell_command or --shell_script options; if so, ds outputs a message saying 
something like "Do not support it, please create a shell script for them".

It also checks whether the --shell_command value contains ';' or '|'; if it 
does, ds outputs a message saying something like "Do not support multiple shell 
commands and command pipelines, please create a shell script for them".
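
For illustration only, here is a minimal sketch of what such a client-side check 
could look like. The option names mirror the distributed shell CLI, but the helper 
class and messages below are assumptions, not the code in the attached patch:
{code}
// Hypothetical sketch of the validation described above; not the patch itself.
import org.apache.commons.cli.CommandLine;

public class ShellCommandOptionCheck {

  /** Rejects unsupported option combinations before the app is submitted. */
  public static void validate(CommandLine cliParser) {
    String[] commands = cliParser.getOptionValues("shell_command");
    String[] scripts = cliParser.getOptionValues("shell_script");
    if ((commands != null && commands.length > 1)
        || (scripts != null && scripts.length > 1)) {
      throw new IllegalArgumentException(
          "Do not support multiple --shell_command or --shell_script options;"
              + " please create a shell script instead.");
    }
    String command = cliParser.getOptionValue("shell_command");
    if (command != null && (command.contains(";") || command.contains("|"))) {
      throw new IllegalArgumentException(
          "Do not support multiple shell commands or command pipelines;"
              + " please create a shell script and use --shell_script.");
    }
  }
}
{code}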

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795696#comment-13795696
 ] 

Xuan Gong commented on YARN-1303:
-

Ran the test on a single cluster:
Input:
{code}
hadoop jar 
hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar
 org.apache.hadoop.yarn.applications.distributedshell.Client --jar 
hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar
 --shell_command pwd --help
{code}

Chosen output:
{code}
 -shell_command <arg>    Shell command to be executed by the
                         Application Master. Does not support multiple
                         --shell_command options, multiple shell
                         commands and command pipline. For multiple
                         shell commands or command pipeline, please
                         create a shell script and use --shell_script
                         option
 -shell_script <arg>     Location of the shell script to be executed.
                         Support only one --shell_script option. For
                         multiple shell scripts, combine them into one
                         shell script
{code}


 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795698#comment-13795698
 ] 

Xuan Gong commented on YARN-1303:
-

Input:
{code}
hadoop-3.0.0-SNAPSHOT/bin/hadoop jar 
hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar
 org.apache.hadoop.yarn.applications.distributedshell.Client --jar 
hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar
 --shell_command pwd --shell_command ls
{code}

Part of the output:
{code}
INFO distributedshell.Client: Initializing Client
DistributedShell does not support multiple shell commands. Please create a 
shell script and use --shell_script option.
{code}

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1309) AdminService unnecessarily wraps an IOException into a YarnException

2013-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795702#comment-13795702
 ] 

Karthik Kambatla commented on YARN-1309:


Thanks [~vinodkv]. Makes sense. IIUC, on YARN-1068, even the 
RMHAProtocolService should throw YarnExceptions and not IOException. 

 AdminService unnecessarily wraps an IOException into a YarnException
 

 Key: YARN-1309
 URL: https://issues.apache.org/jira/browse/YARN-1309
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 ResourceManagerAdministrationProtocol allows methods to throw an IOException. 
 Still, AdminService wraps IOExceptions as YarnExceptions before throwing them.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795701#comment-13795701
 ] 

Xuan Gong commented on YARN-1303:
-

Input:
{code}
hadoop-3.0.0-SNAPSHOT/bin/hadoop jar 
hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar
 org.apache.hadoop.yarn.applications.distributedshell.Client --jar 
hadoop-yarn-project-3.0.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar
 --shell_command "ls|pwd"
{code}

Part of the output:
{code}
13/10/15 14:37:41 INFO distributedshell.Client: Initializing Client
DistributedShell does not support multiple commands or command pipeline. Please 
create a shell script for them and use --shell_script option
{code}

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1303:


Attachment: YARN-1303.2.patch

Fix a typo

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795708#comment-13795708
 ] 

Hitesh Shah commented on YARN-1303:
---

[~xgong] Could you clarify why ls;ls or ls | grep foo does not work in the 
first place? Is there a bug in the implementation that needs to be fixed to 
address this basic functionality? 

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795719#comment-13795719
 ] 

Hadoop QA commented on YARN-1303:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608581/YARN-1303.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2179//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2179//console

This message is automatically generated.

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795723#comment-13795723
 ] 

Karthik Kambatla commented on YARN-1068:


Per discussion on YARN-1309 and YARN-142, looks like we should throw 
YarnException and not IOException. However, the actual exceptions to be thrown 
are defined in HAServiceProtocol which doesn't have YarnException listed.

So, I guess we will have to leave the RMHAProtocolService as is.

 Add admin support for HA operations
 ---

 Key: YARN-1068
 URL: https://issues.apache.org/jira/browse/YARN-1068
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: ha
 Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
 yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
 yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
 yarn-1068-9.patch, yarn-1068-prelim.patch


 Support HA admin operations to facilitate transitioning the RM to Active and 
 Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795727#comment-13795727
 ] 

Hadoop QA commented on YARN-1303:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608583/YARN-1303.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2180//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2180//console

This message is automatically generated.

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-261) Ability to kill AM attempts

2013-10-15 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated YARN-261:
-

Attachment: YARN-261--n5.patch

Jason, thanks for the review. All your points make sense to me. Attaching a patch 
with the fixes.

 Ability to kill AM attempts
 ---

 Key: YARN-261
 URL: https://issues.apache.org/jira/browse/YARN-261
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Andrey Klochkov
 Attachments: YARN-261--n2.patch, YARN-261--n3.patch, 
 YARN-261--n4.patch, YARN-261--n5.patch, YARN-261.patch


 It would be nice if clients could ask for an AM attempt to be killed.  This 
 is analogous to the task attempt kill support provided by MapReduce.
 This feature would be useful in a scenario where AM retries are enabled, the 
 AM supports recovery, and a particular AM attempt is stuck.  Currently if 
 this occurs the user's only recourse is to kill the entire application, 
 requiring them to resubmit a new application and potentially breaking 
 downstream dependent jobs if it's part of a bigger workflow.  Killing the 
 attempt would allow a new attempt to be started by the RM without killing the 
 entire application, and if the AM supports recovery it could potentially save 
 a lot of work.  It could also be useful in workflow scenarios where the 
 failure of the entire application kills the workflow, but the ability to kill 
 an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-15 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1172:
-

Attachment: YARN-1172.1.patch

This patch is for verifying which strategy is better: converting *SecretManagers 
to services with a composite pattern, or converting SecretManager to be an 
AbstractService.

In this patch, *SecretManagers are converted to services with the composite 
pattern. Note that this implementation has a lot of code duplication. The code is 
also a bit tricky, because we need to implement the Service interface to compose 
an instance of AbstractService. If this change is not acceptable, we should 
convert SecretManager to be an AbstractService in HADOOP-10043.

 Convert *SecretManagers in the RM to services
 -

 Key: YARN-1172
 URL: https://issues.apache.org/jira/browse/YARN-1172
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1172.1.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795743#comment-13795743
 ] 

Xuan Gong commented on YARN-1303:
-

[~hitesh]
bq. Could you clarify why ls;ls or ls | grep foo does not work in the first 
place? Is there a bug in the implementation that needs to be fixed to address 
this basic functionality?

I am not sure whether this counts as an implementation bug. The reason those 
commands do not work is how bash reads them.

For example, if I give --shell_command ls;pwd (a command pipeline has the same 
issue), the script used to launch the ApplicationMaster contains something like 
this:
{code}
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx512m 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
--container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 
--shell_command ls;pwd 
1>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_01/AppMaster.stdout 
2>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_01/AppMaster.stderr"
{code}

Bash treats that as two separate commands.

The first one is
{code}
$JAVA_HOME/bin/java -Xmx512m 
org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
--container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 
--shell_command ls
{code}
so all the containers execute the shell command ls. This can be verified by 
checking the container's launch script:
{code}
exec /bin/bash -c "ls 
1>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_02/stdout 
2>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_02/stderr"
{code}

The second one is
{code}
pwd 
1>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_01/AppMaster.stdout 
2>/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1381875664135_0001/container_1381875664135_0001_01_01/AppMaster.stderr
{code}
In AppMaster.stdout we can only find this message:
{code}
/Users/xuan/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-localDir-nm-0_0/usercache/xuan/appcache/application_1381875664135_0001/container_1381875664135_0001_01_01
{code}
which is the result of running pwd.

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-15 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795753#comment-13795753
 ] 

Andrey Klochkov commented on YARN-445:
--

Vinod,
Accepting a mapping of arbitrary commands is indeed the most powerful approach. 
However, it would require a lot of changes in YARN, as well as additional 
complexity for app writers. At the same time, are we sure that this flexibility 
is needed, and that it won't be over-engineering and possibly an abstraction leak 
in the YARN framework? By the latter I mean that we would give app writers the 
ability to run arbitrary commands on any node at any point in time; is that 
really within YARN's responsibilities? I'm not a YARN expert, so I'm just asking.

Anyway, the scope of what I have proposed with the patch is much smaller and 
solves the task stated in the initial description of this JIRA: troubleshooting 
timed-out containers by dumping jstack. This would be useful for many YARN use 
cases, so I thought it might make sense to implement it this way now and extend 
it in the future if there is demand. I agree that the way it is exposed in the 
API could be changed to a signal value in the stopContainers request instead of a 
separate call, which is indeed a bit confusing.

 Ability to signal containers
 

 Key: YARN-445
 URL: https://issues.apache.org/jira/browse/YARN-445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Jason Lowe
Assignee: Andrey Klochkov
 Attachments: YARN-445--n2.patch, YARN-445--n3.patch, 
 YARN-445--n4.patch, YARN-445.patch


 It would be nice if an ApplicationMaster could send signals to contaniers 
 such as SIGQUIT, SIGUSR1, etc.
 For example, in order to replicate the jstack-on-task-timeout feature 
 implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
 interface for sending SIGQUIT to a container.  For that specific feature we 
 could implement it as an additional field in the StopContainerRequest.  
 However that would not address other potential features like the ability for 
 an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
 latter feature would be a very useful debugging tool for users who do not 
 have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.

2013-10-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795755#comment-13795755
 ] 

Junping Du commented on YARN-1289:
--

OK. I am good with removing other MR related configurations from YARN and agree 
this (decoupling MR and YARN) is the right direction. Will file a JIRA soon. 
Thanks for sharing the vision, [~hitesh]!

 Configuration yarn.nodemanager.aux-services should have default value for 
 mapreduce_shuffle.
 --

 Key: YARN-1289
 URL: https://issues.apache.org/jira/browse/YARN-1289
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: wenwupeng
Assignee: Junping Du
 Attachments: YARN-1289.patch


 Failed to run benchmark when yarn.nodemanager.aux-services is not configured 
 in yarn-site.xml; it would be better to provide a default value.
 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
 attempt_1381371516570_0001_m_00_1, Status : FAILED
 Container launch failed for container_1381371516570_0001_01_05 : 
 org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
 auxService:mapreduce_shuffle does not exist
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1303) Allow multiple commands separating with ;

2013-10-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795762#comment-13795762
 ] 

Hitesh Shah commented on YARN-1303:
---

[~xgong] From what you mention, it seems like there is a bug in the client code, 
which is not escaping and quoting the command line args for the ApplicationMaster 
correctly. i.e. it should be doing something like:

org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster 
--container_memory 128 --container_vcores 1 --num_containers 2 --priority 
0 --shell_command "ls;pwd"
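
For illustration, a small sketch of the kind of quoting being suggested when the 
client assembles the AM arguments; the helper below is hypothetical and only shows 
the idea:
{code}
// Hypothetical sketch: quote the user-supplied value so that ';' and '|'
// survive the launching shell and reach the ApplicationMaster intact.
public class ShellArgQuoting {

  /** Wraps a value in double quotes and escapes embedded double quotes. */
  static String quote(String value) {
    return "\"" + value.replace("\"", "\\\"") + "\"";
  }

  /** Builds the --shell_command fragment of the AM command line. */
  static String shellCommandArg(String shellCommand) {
    return "--shell_command " + quote(shellCommand);
  }

  public static void main(String[] args) {
    // Prints: --shell_command "ls;pwd"
    System.out.println(shellCommandArg("ls;pwd"));
  }
}
{code}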

 Allow multiple commands separating with ;
 -

 Key: YARN-1303
 URL: https://issues.apache.org/jira/browse/YARN-1303
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.2.1

 Attachments: YARN-1303.1.patch, YARN-1303.2.patch


 In shell, we can do ls; ls to run 2 commands at once. 
 In distributed shell, this is not working. We should improve to allow this to 
 occur. There are practical use cases that I know of to run multiple commands 
 or to set environment variables before a command.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1310) Get rid of MR settings in YARN configuration

2013-10-15 Thread Junping Du (JIRA)
Junping Du created YARN-1310:


 Summary: Get rid of MR settings in YARN configuration
 Key: YARN-1310
 URL: https://issues.apache.org/jira/browse/YARN-1310
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Junping Du


Per discussion in YARN-1289, we should get rid of MR settings (like below) and 
default values in YARN configuration which put unnecessary dependency for YARN 
on MR. 

{code}
  <!--Map Reduce configuration-->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>mapreduce.job.jar</name>
    <value/>
  </property>

  <property>
    <name>mapreduce.job.hdfs-servers</name>
    <value>${fs.defaultFS}</value>
  </property>
{code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.

2013-10-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1289:
-

Assignee: (was: Junping Du)

 Configuration yarn.nodemanager.aux-services should have default value for 
 mapreduce_shuffle.
 --

 Key: YARN-1289
 URL: https://issues.apache.org/jira/browse/YARN-1289
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: wenwupeng
 Attachments: YARN-1289.patch


 Failed to run benchmark when yarn.nodemanager.aux-services is not configured 
 in yarn-site.xml; it would be better to provide a default value.
 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
 attempt_1381371516570_0001_m_00_1, Status : FAILED
 Container launch failed for container_1381371516570_0001_01_05 : 
 org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
 auxService:mapreduce_shuffle does not exist
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1289) Configuration yarn.nodemanager.aux-services should have default value for mapreduce_shuffle.

2013-10-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795770#comment-13795770
 ] 

Junping Du commented on YARN-1289:
--

Filed YARN-1310 to track removing MR settings from the YARN configuration. If 
nobody objects, I will mark this JIRA as invalid later.

 Configuration yarn.nodemanager.aux-services should have default value for 
 mapreduce_shuffle.
 --

 Key: YARN-1289
 URL: https://issues.apache.org/jira/browse/YARN-1289
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: wenwupeng
Assignee: Junping Du
 Attachments: YARN-1289.patch


 Failed to run benchmark when yarn.nodemanager.aux-services is not configured 
 in yarn-site.xml; it would be better to provide a default value.
 13/10/09 22:19:23 INFO mapreduce.Job: Task Id : 
 attempt_1381371516570_0001_m_00_1, Status : FAILED
 Container launch failed for container_1381371516570_0001_01_05 : 
 org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The 
 auxService:mapreduce_shuffle does not exist
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
 at 
 org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
 at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-677) Increase coverage to FairScheduler

2013-10-15 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov resolved YARN-677.
--

Resolution: Won't Fix

 Increase coverage to FairScheduler
 --

 Key: YARN-677
 URL: https://issues.apache.org/jira/browse/YARN-677
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Vadim Bondarev
Assignee: Andrey Klochkov
 Attachments: HADOOP-4536-branch-2-a.patch, 
 HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, 
 HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, 
 HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, 
 HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, 
 HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1310) Get rid of MR settings in YARN configuration

2013-10-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1310:
---

Hadoop Flags: Incompatible change

Marking this as an incompatible change, as it requires user action.

 Get rid of MR settings in YARN configuration
 

 Key: YARN-1310
 URL: https://issues.apache.org/jira/browse/YARN-1310
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Junping Du

 Per discussion in YARN-1289, we should get rid of MR settings (like below) 
 and default values in YARN configuration which put unnecessary dependency for 
 YARN on MR. 
 {code}
   <!--Map Reduce configuration-->
   <property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
   <property>
     <name>mapreduce.job.jar</name>
     <value/>
   </property>
   <property>
     <name>mapreduce.job.hdfs-servers</name>
     <value>${fs.defaultFS}</value>
   </property>
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1310) Get rid of MR settings in YARN configuration

2013-10-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1310:
---

Affects Version/s: 2.2.0

 Get rid of MR settings in YARN configuration
 

 Key: YARN-1310
 URL: https://issues.apache.org/jira/browse/YARN-1310
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Junping Du

 Per discussion in YARN-1289, we should get rid of MR settings (like below) 
 and default values in YARN configuration which put unnecessary dependency for 
 YARN on MR. 
 {code}
   <!--Map Reduce configuration-->
   <property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
   <property>
     <name>mapreduce.job.jar</name>
     <value/>
   </property>
   <property>
     <name>mapreduce.job.hdfs-servers</name>
     <value>${fs.defaultFS}</value>
   </property>
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796209#comment-13796209
 ] 

Karthik Kambatla commented on YARN-1172:


Copying the code from AbstractService doesn't seem like a good idea. I think we 
should avoid it if possible. 

 Convert *SecretManagers in the RM to services
 -

 Key: YARN-1172
 URL: https://issues.apache.org/jira/browse/YARN-1172
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1172.1.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796212#comment-13796212
 ] 

Karthik Kambatla commented on YARN-1172:


Just thinking out loud. Along the lines of Suresh's suggestion, how about creating 
a Service for each of the YARN-related SecretManagers, which actually 
instantiates, starts, and stops the SecretManager? For instance, there could be an 
RMContainerTokenSecretManagerService that holds an instance of 
RMContainerTokenSecretManager: it creates it on init(), starts it on start(), and 
stops it on stop(). Thoughts?
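
For illustration, a rough sketch of that wrapper pattern follows; the lifecycle 
calls on the secret manager are assumptions based on the comment above, not code 
from an attached patch:
{code}
// Hypothetical sketch of a per-SecretManager wrapper service.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager;

public class RMContainerTokenSecretManagerService extends AbstractService {

  private RMContainerTokenSecretManager secretManager;

  public RMContainerTokenSecretManagerService() {
    super(RMContainerTokenSecretManagerService.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Create the wrapped secret manager on init().
    secretManager = new RMContainerTokenSecretManager(conf);
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // Assumes the secret manager exposes start()/stop() for key rolling.
    secretManager.start();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (secretManager != null) {
      secretManager.stop();
    }
    super.serviceStop();
  }

  public RMContainerTokenSecretManager getSecretManager() {
    return secretManager;
  }
}
{code}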

 Convert *SecretManagers in the RM to services
 -

 Key: YARN-1172
 URL: https://issues.apache.org/jira/browse/YARN-1172
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1172.1.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-261) Ability to kill AM attempts

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796225#comment-13796225
 ] 

Hadoop QA commented on YARN-261:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608589/YARN-261--n5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.TestJobCleanup

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.mapreduce.v2.TestUberAM

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2181//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2181//console

This message is automatically generated.

 Ability to kill AM attempts
 ---

 Key: YARN-261
 URL: https://issues.apache.org/jira/browse/YARN-261
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Jason Lowe
Assignee: Andrey Klochkov
 Attachments: YARN-261--n2.patch, YARN-261--n3.patch, 
 YARN-261--n4.patch, YARN-261--n5.patch, YARN-261.patch


 It would be nice if clients could ask for an AM attempt to be killed.  This 
 is analogous to the task attempt kill support provided by MapReduce.
 This feature would be useful in a scenario where AM retries are enabled, the 
 AM supports recovery, and a particular AM attempt is stuck.  Currently if 
 this occurs the user's only recourse is to kill the entire application, 
 requiring them to resubmit a new application and potentially breaking 
 downstream dependent jobs if it's part of a bigger workflow.  Killing the 
 attempt would allow a new attempt to be started by the RM without killing the 
 entire application, and if the AM supports recovery it could potentially save 
 a lot of work.  It could also be useful in workflow scenarios where the 
 failure of the entire application kills the workflow, but the ability to kill 
 an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1308) set default value for nodemanager aux service

2013-10-15 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796235#comment-13796235
 ] 

Arpit Gupta commented on YARN-1308:
---

[~hitesh]

Confirmed that when yarn.nodemanager.aux-services and 
yarn.nodemanager.aux-services.mapreduce_shuffle.class are not defined, the 
nodemanager comes up and registers with the RM.

 set default value for nodemanager aux service
 -

 Key: YARN-1308
 URL: https://issues.apache.org/jira/browse/YARN-1308
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Priority: Minor

 Currently in order to get the nodemanagers to start you have to define 
 yarn.nodemanager.aux-services and 
 yarn.nodemanager.aux-services.mapreduce_shuffle.class.
 We should set these as defaults.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1311:
-

 Summary: Fix app specific scheduler-events' names to be 
app-attempt based
 Key: YARN-1311
 URL: https://issues.apache.org/jira/browse/YARN-1311
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Trivial


Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers 
as schedulers only deal with AppAttempts today. This JIRA is for fixing their 
names so that we can add App-level events in the near future, notably for 
work-preserving RM-restart.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1311:
--

Attachment: YARN-1311-20131015.txt

Straightforward patch with event renaming. No change to any logic.

 Fix app specific scheduler-events' names to be app-attempt based
 

 Key: YARN-1311
 URL: https://issues.apache.org/jira/browse/YARN-1311
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Trivial
 Attachments: YARN-1311-20131015.txt


 Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are 
 misnomers as schedulers only deal with AppAttempts today. This JIRA is for 
 fixing their names so that we can add App-level events in the near future, 
 notably for work-preserving RM-restart.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-15 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796248#comment-13796248
 ] 

Tsuyoshi OZAWA commented on YARN-1172:
--

I came up with creating SecretManagerService<T> as a base class for the 
YARN-related SecretManagers, like this: https://gist.github.com/oza/7000796.

This may be better than creating a Service for each of the YARN-related 
SecretManagers, because we can avoid code duplication across the 
*SecretManagerServices.
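
For illustration (this is not the contents of the linked gist), a hypothetical 
reconstruction of such a generic base class:
{code}
// Hypothetical sketch of a shared base service that owns a secret manager.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.SecretManager;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.service.AbstractService;

public abstract class SecretManagerService<T extends SecretManager<? extends TokenIdentifier>>
    extends AbstractService {

  private T secretManager;

  protected SecretManagerService(String name) {
    super(name);
  }

  /** Subclasses create the concrete secret manager from the configuration. */
  protected abstract T createSecretManager(Configuration conf);

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    secretManager = createSecretManager(conf);
    super.serviceInit(conf);
  }

  public T getSecretManager() {
    return secretManager;
  }
}
{code}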

 Convert *SecretManagers in the RM to services
 -

 Key: YARN-1172
 URL: https://issues.apache.org/jira/browse/YARN-1172
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1172.1.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796275#comment-13796275
 ] 

Hadoop QA commented on YARN-1311:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12608614/YARN-1311-20131015.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2182//console

This message is automatically generated.

 Fix app specific scheduler-events' names to be app-attempt based
 

 Key: YARN-1311
 URL: https://issues.apache.org/jira/browse/YARN-1311
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Trivial
 Attachments: YARN-1311-20131015.txt


 Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are 
 misnomers as schedulers only deal with AppAttempts today. This JIRA is for 
 fixing their names so that we can add App-level events in the near future, 
 notably for work-preserving RM-restart.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-15 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1172:
-

Attachment: YARN-1172.2.patch

 Convert *SecretManagers in the RM to services
 -

 Key: YARN-1172
 URL: https://issues.apache.org/jira/browse/YARN-1172
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1172.1.patch, YARN-1172.2.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1181) Implement MiniYARNHACluster

2013-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796292#comment-13796292
 ] 

Karthik Kambatla commented on YARN-1181:


Have worked on this some. I think the best way to do this is to actually 
augment MiniYARNCluster to allow creating a cluster with multiple RMs, instead 
of duplicating the code in another class.

 Implement MiniYARNHACluster
 ---

 Key: YARN-1181
 URL: https://issues.apache.org/jira/browse/YARN-1181
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for 
 end-to-end HA tests.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1181) Implement MiniYARNHACluster

2013-10-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1181:
---

Attachment: yarn-1181-1.patch

First-cut patch that adds the functionality.

 Implement MiniYARNHACluster
 ---

 Key: YARN-1181
 URL: https://issues.apache.org/jira/browse/YARN-1181
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-1181-1.patch


 MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for 
 end-to-end HA tests.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796321#comment-13796321
 ] 

Hadoop QA commented on YARN-1172:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608627/YARN-1172.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2183//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2183//console

This message is automatically generated.

 Convert *SecretManagers in the RM to services
 -

 Key: YARN-1172
 URL: https://issues.apache.org/jira/browse/YARN-1172
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1172.1.patch, YARN-1172.2.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1312) Job History server queue attribute incorrectly reports default when username is actually used for queue at runtime

2013-10-15 Thread Philip Zeyliger (JIRA)
Philip Zeyliger created YARN-1312:
-

 Summary: Job History server queue attribute incorrectly reports 
default when username is actually used for queue at runtime
 Key: YARN-1312
 URL: https://issues.apache.org/jira/browse/YARN-1312
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Philip Zeyliger


If you run a MapReduce job with the fair scheduler and you query the JobHistory 
server for its metadata, you might see something like the following at 
http://jh_host:19888/ws/v1/history/mapreduce/jobs/job_1381878638171_0001/

{code}
<job>
  <startTime>1381890132608</startTime>
  <finishTime>1381890141988</finishTime>
  <id>job_1381878638171_0001</id>
  <name>TeraGen</name>
  <queue>default</queue>
  <user>hdfs</user>
  ...
</job>
{code}

The same is true if you query the RM while it's running via 
http://rm_host:8088/ws/v1/cluster/apps/application_1381878638171_0002:
{code}
<app>
  <id>application_1381878638171_0002</id>
  <user>hdfs</user>
  <name>TeraGen</name>
  <queue>default</queue>
  ...
</app>
{code}

As it turns out, in both of these cases, the job is actually executing in 
root.hdfs and not in root.default because 
{{yarn.scheduler.fair.user-as-default-queue}} is set to true.

This makes it hard to figure out after the fact (or during!) what queue the MR 
job was running under.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface

2013-10-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-947:
-

Attachment: YARN-947.3.patch

Created a new patch incrementally, which includes the following modifications:

1. The biggest change here is to add another two sets of protobuf records, in 
addition to the set of HistoryData records: the set of StartData and the set of 
FinishData (a rough sketch follows this list). In effect, HistoryData = StartData + 
FinishData. The duplicated part is the Id, which serves as the key. StartData 
contains the fields that are determined when the object (RMApp, RMAppAttempt or 
RMContainer) starts, while FinishData contains the fields that are determined when 
the object finishes. With the separated records, we can redesign the writer 
interface to write part of the data when the object starts and the rest when the 
object finishes, thereby reducing the loss of information when the history data 
cannot be completely recorded (e.g. on an RM crash).

2. Change all protobuf records from interfaces to abstract classes, and add a 
built-in newInstance method for users to call.

3. Improve toString() of the PBImpls here as well, which was filed as YARN-1066; 
I'll therefore close that jira as a duplicate.

4. Fix a bug in ContainerHistoryDataPBImpl.

5. Instead of recording ContainerState, I changed to recording ContainerExitStatus. 
The reason is stated in YARN-1123: 
https://issues.apache.org/jira/browse/YARN-1123?focusedCommentId=13793962page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13793962
ContainerState is always FINISHED for all the containers, which is meaningless. 
ContainerExitStatus, which is the exit code, can instead indicate problems in 
the container.
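For illustration, a minimal, plain-Java sketch of the shape described in items 1 and 
2 above. The class name, fields and the trivial in-memory implementation are 
hypothetical stand-ins; the records in the actual patch are protobuf-backed and also 
cover the attempt and container types:

{code:java}
// Hypothetical sketch only: a "start" record holding the fields known when an
// application starts, exposed as an abstract class with a built-in factory.
public abstract class ApplicationStartData {

  // Built-in factory so callers never construct the concrete impl directly.
  public static ApplicationStartData newInstance(String applicationId,
                                                 String applicationName,
                                                 long submitTime) {
    ApplicationStartData data = new InMemoryApplicationStartData();
    data.setApplicationId(applicationId);
    data.setApplicationName(applicationName);
    data.setSubmitTime(submitTime);
    return data;
  }

  // The id is the key shared with the corresponding FinishData record.
  public abstract String getApplicationId();
  public abstract void setApplicationId(String applicationId);

  public abstract String getApplicationName();
  public abstract void setApplicationName(String applicationName);

  public abstract long getSubmitTime();
  public abstract void setSubmitTime(long submitTime);

  // Trivial in-memory implementation standing in for a protobuf-backed PBImpl.
  private static final class InMemoryApplicationStartData extends ApplicationStartData {
    private String applicationId;
    private String applicationName;
    private long submitTime;

    @Override public String getApplicationId() { return applicationId; }
    @Override public void setApplicationId(String id) { this.applicationId = id; }
    @Override public String getApplicationName() { return applicationName; }
    @Override public void setApplicationName(String name) { this.applicationName = name; }
    @Override public long getSubmitTime() { return submitTime; }
    @Override public void setSubmitTime(long t) { this.submitTime = t; }
  }
}
{code}

A FinishData record would look the same but carry the finish-time fields (e.g. 
finish time and final status), and HistoryData would simply aggregate both sets of 
fields around the shared id.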

[~vinodkv], would you please review it again?

 Defining the history data classes for the implementation of the 
 reading/writing interface
 -

 Key: YARN-947
 URL: https://issues.apache.org/jira/browse/YARN-947
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: YARN-321

 Attachments: YARN-947.1.patch, YARN-947.2.patch, YARN-947.3.patch


 We need to define the history data classes that have the exact fields to be 
 stored, so that the implementations don't need to duplicate the logic to extract 
 the required information from RMApp, RMAppAttempt and RMContainer.
 We use protobuf to define these classes, such that they can be serialized 
 to/from bytes, which is easier for persistence.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1066) Improve toString implementation for PBImpls for AHS

2013-10-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796392#comment-13796392
 ] 

Zhijie Shen commented on YARN-1066:
---

YARN-947 is reopened, so let's fix the issue there together. Closing this ticket 
as a duplicate.
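For context, the improvement in question typically means delegating a PBImpl's 
toString() to protobuf's TextFormat rather than hand-building the string. Below is 
a minimal, self-contained sketch of that pattern, using a message type that ships 
with protobuf-java instead of a YARN record; the exact change made in the patch may 
differ:

{code:java}
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.TextFormat;

// Sketch: any protobuf message can be rendered in one compact line with
// TextFormat.shortDebugString, which is what a PBImpl's toString() can
// delegate to (e.g. return TextFormat.shortDebugString(getProto());).
public class PBImplToStringSketch {
  public static void main(String[] args) {
    FileDescriptorProto proto =
        FileDescriptorProto.newBuilder().setName("example.proto").build();
    System.out.println(TextFormat.shortDebugString(proto)); // name: "example.proto"
  }
}
{code}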

 Improve toString implementation for PBImpls for AHS
 ---

 Key: YARN-1066
 URL: https://issues.apache.org/jira/browse/YARN-1066
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 YARN-1045 improves the toString implementation for PBImpls; AHS's PBImpls 
 should be changed accordingly.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1066) Improve toString implementation for PBImpls for AHS

2013-10-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1066.
---

Resolution: Duplicate

 Improve toString implementation for PBImpls for AHS
 ---

 Key: YARN-1066
 URL: https://issues.apache.org/jira/browse/YARN-1066
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 YARN-1045 improves the toString implementation for PBImpls; AHS's PBImpls 
 should be changed accordingly.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-947) Defining the history data classes for the implementation of the reading/writing interface

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796399#comment-13796399
 ] 

Hadoop QA commented on YARN-947:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608648/YARN-947.3.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2184//console

This message is automatically generated.

 Defining the history data classes for the implementation of the 
 reading/writing interface
 -

 Key: YARN-947
 URL: https://issues.apache.org/jira/browse/YARN-947
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: YARN-321

 Attachments: YARN-947.1.patch, YARN-947.2.patch, YARN-947.3.patch


 We need to define the history data classes that have the exact fields to be 
 stored, so that the implementations don't need to duplicate the logic to extract 
 the required information from RMApp, RMAppAttempt and RMContainer.
 We use protobuf to define these classes, such that they can be serialized 
 to/from bytes, which is easier for persistence.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-934) HistoryStorage writer interface for Application History Server

2013-10-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-934:
-

Attachment: YARN-934.4.patch

Talked to [~vinodkv], and we thought it's better to split each writing operation 
into two. One is executed when the object (RMApp, RMAppAttempt or RMContainer) is 
started, recording the information that is already available. The other is executed 
when the object reaches its finishing stage, recording the information that is 
finally determined.

I uploaded a new incremental patch to draft the new writer interface. In addition, 
I modified ApplicationHistoryStore as well, changing it from an interface to an 
abstract class that extends AbstractService. Therefore, its implementations (e.g. 
FS storage, DB storage) can make use of the life cycle of a service, doing the 
necessary initialization and cleanup work in the corresponding stage. A rough 
sketch of this shape is below.
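A minimal sketch of that shape, assuming hypothetical method names and a 
simplified, application-only signature (the real store in the patch is 
protobuf-backed and also covers attempts and containers):

{code:java}
import org.apache.hadoop.service.AbstractService;

// Hypothetical sketch: as an abstract service, the store inherits the
// init/start/stop life cycle from AbstractService, so concrete stores
// (FS storage, DB storage, ...) only implement the write operations,
// each split into a "started" half and a "finished" half.
public abstract class HistoryStoreSketch extends AbstractService {

  protected HistoryStoreSketch(String name) {
    super(name);
  }

  // Called when the RMApp starts: persist the fields known up front.
  public abstract void applicationStarted(String appId, String appName, long submitTime);

  // Called when the RMApp finishes: persist the fields known only at the end.
  public abstract void applicationFinished(String appId, String finalStatus, long finishTime);

  // A concrete store would override serviceInit/serviceStart/serviceStop to
  // open and close its FileSystem or DB connection at the right stage.
}
{code}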

 HistoryStorage writer interface for Application History Server
 --

 Key: YARN-934
 URL: https://issues.apache.org/jira/browse/YARN-934
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: YARN-321

 Attachments: YARN-934.1.patch, YARN-934.2.patch, YARN-934.3.patch, 
 YARN-934.4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1002) Optimizing the reading/writing operations of FileSystemHistoryStorage

2013-10-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796420#comment-13796420
 ] 

Zhijie Shen commented on YARN-1002:
---

Brainstormed with [~vinodkv] and [~mayank_bansal]. Since we've already made the 
proof-of-concept end-to-end AHS work, we should move on to making AHS 
production-ready, which means making sure FileSystemHistoryStorage performs well 
before merging AHS into trunk. For now, this optimization work will be done as part 
of YARN-975. Closing this ticket as a duplicate.

 Optimizing the reading/writing operations of FileSystemHistoryStorage
 -

 Key: YARN-1002
 URL: https://issues.apache.org/jira/browse/YARN-1002
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Once the end-to-end system is done, we need to improve the performance of 
 the reading/writing operations of FileSystemHistoryStorage.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1002) Optimizing the reading/writing operations of FileSystemHistoryStorage

2013-10-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1002.
---

Resolution: Duplicate

 Optimizing the reading/writing operations of FileSystemHistoryStorage
 -

 Key: YARN-1002
 URL: https://issues.apache.org/jira/browse/YARN-1002
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Once the end-to-end system is done, we need to improve the performance of 
 the reading/writing operations of FileSystemHistoryStorage.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-934) HistoryStorage writer interface for Application History Server

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796422#comment-13796422
 ] 

Hadoop QA commented on YARN-934:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608654/YARN-934.4.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2185//console

This message is automatically generated.

 HistoryStorage writer interface for Application History Server
 --

 Key: YARN-934
 URL: https://issues.apache.org/jira/browse/YARN-934
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: YARN-321

 Attachments: YARN-934.1.patch, YARN-934.2.patch, YARN-934.3.patch, 
 YARN-934.4.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)