[jira] [Assigned] (YARN-511) ConverterUtils's getPathFromYarnURL and getYarnUrlFromPath work with fully qualified paths alone but don't state or check that
[ https://issues.apache.org/jira/browse/YARN-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-511: Assignee: (was: Harsh J) > ConverterUtils's getPathFromYarnURL and getYarnUrlFromPath work with fully > qualified paths alone but don't state or check that > -- > > Key: YARN-511 > URL: https://issues.apache.org/jira/browse/YARN-511 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.0.0-alpha >Reporter: Harsh J >Priority: Major > > See thread: http://search-hadoop.com/m/IFGhp1C1o4j > Aside: Naming discrepancy here: getPathFromYarnURL and getYarnUrlFromPath > should have consistent URL capitalization. Generally, Url is preferred to fit > the camel case and other classnames around Hadoop these days. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-349) Send out last-minute load averages in TaskTrackerStatus
[ https://issues.apache.org/jira/browse/YARN-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-349: Assignee: (was: Harsh J) > Send out last-minute load averages in TaskTrackerStatus > --- > > Key: YARN-349 > URL: https://issues.apache.org/jira/browse/YARN-349 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.0.0-alpha >Reporter: Harsh J > Attachments: mapreduce.loadaverage.r3.diff, > mapreduce.loadaverage.r4.diff, mapreduce.loadaverage.r5.diff, > mapreduce.loadaverage.r6.diff > > Original Estimate: 20m > Remaining Estimate: 20m > > Load averages could be useful in scheduling. This patch looks to extend the > existing Linux resource plugin (via /proc/loadavg file) to allow transmitting > load averages of the last one minute via the TaskTrackerStatus. > Patch is up for review, with test cases added, at: > https://reviews.apache.org/r/20/ -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
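For context, the last-minute load average the description refers to is the first field of the Linux /proc/loadavg file. A minimal, self-contained sketch of reading it — plain Java for illustration only, not the plugin code from the attached patches:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LoadAvgReader {
  /** Returns the 1-minute load average, or -1 if /proc/loadavg is unavailable. */
  public static float readOneMinuteLoadAverage() {
    try (BufferedReader reader = new BufferedReader(new FileReader("/proc/loadavg"))) {
      // /proc/loadavg looks like: "0.42 0.37 0.31 1/543 12345"
      String[] fields = reader.readLine().trim().split("\\s+");
      return Float.parseFloat(fields[0]);
    } catch (IOException | NumberFormatException e) {
      return -1f;
    }
  }
}
{code}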
[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage
[ https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180718#comment-15180718 ] Harsh J commented on YARN-4767: --- One add-on note that we can likely also address with this one: The AmIpFilter resolves the proxy addresses to host addresses (getAllByName, getHostAddress) every single time a request is made to it, vs. caching it upfront. I think we should not try to resolve it on-request unless we have errors, cause the proxy address list does not usually change over time on an already running AM? > Network issues can cause persistent RM UI outage > > > Key: YARN-4767 > URL: https://issues.apache.org/jira/browse/YARN-4767 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.9.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > > If a network issue causes an AM web app to resolve the RM proxy's address to > something other than what's listed in the allowed proxies list, the > AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy. > The RM proxy will then consume all available handler threads connecting to > itself over and over, resulting in an outage of the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
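A rough sketch of the caching idea suggested in the comment above — resolve the proxy hosts once up front and fall back to DNS only on a miss — using plain java.net calls. The class and method names here are illustrative, not the actual AmIpFilter code:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

public class ProxyAddressCache {
  private final String[] proxyHosts;
  private volatile Set<String> cachedAddresses = new HashSet<>();

  public ProxyAddressCache(String[] proxyHosts) {
    this.proxyHosts = proxyHosts;
    refresh();  // resolve once up front instead of on every request
  }

  public boolean isProxyAddress(String remoteAddr) {
    if (cachedAddresses.contains(remoteAddr)) {
      return true;
    }
    refresh();  // only go back to DNS when the cached set misses
    return cachedAddresses.contains(remoteAddr);
  }

  private void refresh() {
    Set<String> resolved = new HashSet<>();
    for (String host : proxyHosts) {
      try {
        for (InetAddress addr : InetAddress.getAllByName(host)) {
          resolved.add(addr.getHostAddress());
        }
      } catch (UnknownHostException ignored) {
        // skip hosts that do not currently resolve
      }
    }
    cachedAddresses = resolved;
  }
}
{code}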
[jira] [Commented] (YARN-4263) Capacity scheduler 60%-40% formatting floating point issue
[ https://issues.apache.org/jira/browse/YARN-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966500#comment-14966500 ] Harsh J commented on YARN-4263:
-------------------------------
Thank you for the fix and tests! Some comments:
- Could you remove the whitespace in the test addition? Also, did you check whether the tests fail reliably without the accompanying change, just to rule out formatting-only differences?
- Let's switch to using the {{org.apache.hadoop.util.StringUtils.formatPercent}} method instead of adding a duplicate inside YARN.
- I am also wondering whether we should use a single decimal place instead of two, to stay compatible with the usual 40.0/60.0/0.0/100.0 outputs.

> Capacity scheduler 60%-40% formatting floating point issue
> -----------------------------------------------------------
>
> Key: YARN-4263
> URL: https://issues.apache.org/jira/browse/YARN-4263
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Affects Versions: 2.7.1
> Reporter: Adrian Kalaszi
> Priority: Trivial
> Labels: easyfix
> Attachments: YARN-4263.001.patch
>
> If the capacity scheduler is set up with two queues at 60% and 40% capacity, then due to
> a Java float representation issue:
> {code}
> > hadoop queue -list
> ==
> Queue Name : default
> Queue State : running
> Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0
> ==
> Queue Name : large
> Queue State : running
> Scheduling Info : Capacity: 60.000004, MaximumCapacity: 100.0, CurrentCapacity: 0.0
> {code}
> Because
> {code}
> System.err.println((0.6f) * 100);
> {code}
> results in 60.000004.
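As a quick illustration of the rounding artifact and the single-decimal formatting being discussed (standalone Java; the commented {{StringUtils.formatPercent}} line refers to the hadoop-common helper mentioned above and is only a sketch of the suggestion, not the patch itself):
{code}
public class CapacityFormatDemo {
  public static void main(String[] args) {
    float capacity = 0.6f;                       // 60% as configured
    System.out.println(capacity * 100);          // prints 60.000004 (float artifact)
    // Formatting with a fixed number of decimals hides the artifact:
    System.out.println(String.format("%.1f", capacity * 100));          // 60.0
    // Roughly equivalent idea via the hadoop-common helper (takes a 0..1 fraction):
    // org.apache.hadoop.util.StringUtils.formatPercent(capacity, 1)    // "60.0%"
  }
}
{code}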
[jira] [Commented] (YARN-4222) Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
[ https://issues.apache.org/jira/browse/YARN-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942274#comment-14942274 ] Harsh J commented on YARN-4222:
-------------------------------
Failed tests aren't related. Thanks for the changes! +1, committing shortly. Quick notes:
- Please do not set a Fix Version; use the Target Version field instead. The Fix Version must indicate only the branches where the change has *already* been committed. The latter field is for requesting the branches it should go to, so it is the more appropriate one here.
- For more typo corrections in future, please also feel free to roll multiple corrections up into the same patch.

> Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
> -----------------------------------------------------------------------------
>
> Key: YARN-4222
> URL: https://issues.apache.org/jira/browse/YARN-4222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.7.1
> Reporter: Neelesh Srinivas Salian
> Assignee: Neelesh Srinivas Salian
> Priority: Minor
> Attachments: YARN-4222.001.patch
>
> Spotted this typo in the code while working on a separate YARN issue.
> E.g. DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES
> Checked the whole project and found a few occurrences of the typo in code/comments.
> This JIRA is meant to help fix those typos.
[jira] [Updated] (YARN-4222) Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
[ https://issues.apache.org/jira/browse/YARN-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-4222: -- Fix Version/s: (was: 2.8.0) > Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common > > > Key: YARN-4222 > URL: https://issues.apache.org/jira/browse/YARN-4222 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Neelesh Srinivas Salian >Assignee: Neelesh Srinivas Salian >Priority: Minor > Attachments: YARN-4222.001.patch > > > Spotted this typo in the code while working on a separate YARN issue. > E.g DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES > Checked in the whole project. Found a few occurrences of the typo in > code/comment. > The JIRA is meant to help fix those typos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4222) Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
[ https://issues.apache.org/jira/browse/YARN-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-4222: -- Target Version/s: 2.8.0 > Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common > > > Key: YARN-4222 > URL: https://issues.apache.org/jira/browse/YARN-4222 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Neelesh Srinivas Salian >Assignee: Neelesh Srinivas Salian >Priority: Minor > Attachments: YARN-4222.001.patch > > > Spotted this typo in the code while working on a separate YARN issue. > E.g DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES > Checked in the whole project. Found a few occurrences of the typo in > code/comment. > The JIRA is meant to help fix those typos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4065) container-executor error should include effective user id
[ https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-4065:
-----------------------------
Assignee: Casey Brotherton

container-executor error should include effective user id
----------------------------------------------------------
Key: YARN-4065
URL: https://issues.apache.org/jira/browse/YARN-4065
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Reporter: Casey Brotherton
Assignee: Casey Brotherton
Priority: Trivial

When container-executor fails to access its config file, the following message is thrown:
{code}
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 24
ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf/container-executor.cfg
{code}
The real problem may be that the container-executor is no longer running as set-uid root. From https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html:
{quote}
The container-executor program must be owned by root and have the permission set ---sr-s---.
{quote}
The error message could be improved by printing the effective user id along with the error, and possibly the executable trying to access the config file.
[jira] [Commented] (YARN-4065) container-executor error should include effective user id
[ https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708940#comment-14708940 ] Harsh J commented on YARN-4065: --- Agreed - and figuring this has wasted a few mins at another customer I worked with last week. This would be a welcome change - would you be willing to submit a patch adding the context to the error message? container-executor error should include effective user id - Key: YARN-4065 URL: https://issues.apache.org/jira/browse/YARN-4065 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Casey Brotherton Priority: Trivial When container-executor fails to access it's config file, the following message will be thrown: {code} org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 24 ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf/container-executor.cfg {code} The real problem may be a change in the container-executor not running as set uid root. From: https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html {quote} The container-executor program must be owned by root and have the permission set ---sr-s---. {quote} The error message could be improved by printing out the effective user id with the error message, and possibly the executable trying to access the config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-3462: -- Target Version/s: 2.8.0, 2.7.1 Hadoop Flags: Reviewed Thanks [~Naganarasimha], lgtm, +1. Committing shortly. Patches applied for YARN-2424 are inconsistent between trunk and branch-2 - Key: YARN-3462 URL: https://issues.apache.org/jira/browse/YARN-3462 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Sidharta Seethana Assignee: Naganarasimha G R Attachments: YARN-3462.20150508-1.patch It looks like the changes for YARN-2424 are not the same for trunk (commit 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
[ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482505#comment-14482505 ] Harsh J commented on YARN-2424: --- [~sidharta-s] - Yes, it appears the warning was skipped in the branch-2 patch, likely by accident. Thanks for spotting this! Could you file a new YARN JIRA to port the warning back into branch-2? LCE should support non-cgroups, non-secure mode --- Key: YARN-2424 URL: https://issues.apache.org/jira/browse/YARN-2424 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Fix For: 2.6.0 Attachments: Y2424-1.patch, YARN-2424.patch After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. This is a fairly serious regression, as turning on LCE prior to turning on full-blown security is a fairly standard procedure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377378#comment-14377378 ] Harsh J commented on YARN-1880:
-------------------------------
+1, this still applies. Committing shortly, thanks [~ozawa] (and [~ajisakaa] for the earlier review)!

Cleanup TestApplicationClientProtocolOnHA
-----------------------------------------
Key: YARN-1880
URL: https://issues.apache.org/jira/browse/YARN-1880
Project: Hadoop YARN
Issue Type: Test
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
Attachments: YARN-1880.1.patch

The tests introduced in YARN-1521 include multiple assertions combined with &&. We should separate them because it's difficult to identify which condition is illegal.
[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-1880:
--------------------------
Component/s: test

Cleanup TestApplicationClientProtocolOnHA
-----------------------------------------
Key: YARN-1880
URL: https://issues.apache.org/jira/browse/YARN-1880
Project: Hadoop YARN
Issue Type: Test
Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
Fix For: 2.8.0
Attachments: YARN-1880.1.patch

The tests introduced in YARN-1521 include multiple assertions combined with &&. We should separate them because it's difficult to identify which condition is illegal.
[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-1880:
--------------------------
Affects Version/s: 2.6.0

Cleanup TestApplicationClientProtocolOnHA
-----------------------------------------
Key: YARN-1880
URL: https://issues.apache.org/jira/browse/YARN-1880
Project: Hadoop YARN
Issue Type: Test
Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
Fix For: 2.8.0
Attachments: YARN-1880.1.patch

The tests introduced in YARN-1521 include multiple assertions combined with &&. We should separate them because it's difficult to identify which condition is illegal.
[jira] [Moved] (YARN-3376) [MR-279] NM UI should get a read-only view instead of the actual NMContext
[ https://issues.apache.org/jira/browse/YARN-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-2745 to YARN-3376: -- Component/s: (was: mrv2) nodemanager Affects Version/s: (was: 0.23.0) 2.6.0 Key: YARN-3376 (was: MAPREDUCE-2745) Project: Hadoop YARN (was: Hadoop Map/Reduce) [MR-279] NM UI should get a read-only view instead of the actual NMContext --- Key: YARN-3376 URL: https://issues.apache.org/jira/browse/YARN-3376 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Vinod Kumar Vavilapalli Assignee: Anupam Seth Priority: Trivial Labels: newbie Attachments: MAPREDUCE-2745-branch-0_23.patch, MAPREDUCE-2745-branch-0_23_v2.patch NMContext is modifiable, the UI should only get read-only access. Just like the AM web-ui. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
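The change being asked for here is the usual read-only-view pattern: hand the web UI a narrow interface while the mutable object stays internal. A generic sketch with hypothetical names — the real NMContext API is considerably wider than this:
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ReadOnlyViewSketch {

  /** Narrow, read-only surface the web UI would be handed. */
  public interface ReadOnlyContext {
    String getNodeId();
    Map<String, String> getContainers();
  }

  /** Mutable context that stays internal to the daemon. */
  public static class Context implements ReadOnlyContext {
    private String nodeId;
    private final Map<String, String> containers = new HashMap<>();

    public void setNodeId(String nodeId) { this.nodeId = nodeId; }
    public void addContainer(String id, String state) { containers.put(id, state); }

    @Override public String getNodeId() { return nodeId; }

    @Override public Map<String, String> getContainers() {
      // Hand out an unmodifiable view so UI code cannot mutate daemon state.
      return Collections.unmodifiableMap(containers);
    }
  }
}
{code}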
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313593#comment-14313593 ] Harsh J commented on YARN-3021: --- Thanks again [~vinodkv] and [~yzhangal], bq. bq. RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match it's own address. This way, we don't need an explicit API in the submission context. bq. I think this will work, and is a preferable solution to me. What do others think? I'd be willing to accept that approach, but for one small worry: Any app sending in a token with a bad renewer set could get through with such a change, whereas previously it'd be rejected outright. Not that it'd be harmful (as it is ignored), but it could still be seen as a behaviour change, no? The current patch OTOH, is explicit in demanding a config/flag to be set for direct awareness of such a thing. That sounds more cleaner to me to do. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
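For clarity, a minimal sketch of the renewer-inspection idea quoted above — skip scheduling renewal when the token's renewer is not this RM — with assumed method names and wiring, not the actual DelegationTokenRenewer change:
{code}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

public class RenewerCheck {
  /**
   * Decide whether the RM should schedule renewal for an incoming token. If the
   * token's renewer is not this RM, renewal would be skipped (instead of failing
   * the app submission outright). Sketch only; names are assumptions.
   */
  public static boolean shouldRenew(
      Token<? extends AbstractDelegationTokenIdentifier> token,
      Text rmRenewerAddress) throws IOException {
    Text renewer = token.decodeIdentifier().getRenewer();
    return renewer != null && renewer.equals(rmRenewerAddress);
  }
}
{code}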
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-3021: -- Summary: YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp (was: YARN's delegation-token handling disallows certain trust setups to operate properly) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310570#comment-14310570 ] Harsh J commented on YARN-3021: --- [~vinodkv], Many thanks for the response here! bq. Though the patch unblocks the jobs in the short term, it seems like long term this is still bad. I agree in that it does not resolve the problem. The goal we're seeking is also short-term, in that of bringing back a behaviour that got allowed on MR1, in MR2 - even though both end up facing the same issue. The longer term approach sounds like the most optimal thing to do for proper resolution, but given some users are getting blocked by this behaviour change I'd like to know if there'll be any objections in adding the current approach as an interim-fix (the doc for the property does/will claim it disables several necessary features of the job), and file subsequent JIRAs for implementing the standalone renewer? bq. Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely stuck. In our simple tests the app did run through successfully with such an approach, but there was multiple factors we did not test for (app recovery, task failures, etc. which could be impacted). Would it be better if we added in a morphed DelegationTokenRenewer (which does NOP as part of actual renewal logic), instead of skipping adding in the renewer completely? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298513#comment-14298513 ] Harsh J commented on YARN-3021: --- Overall the patch looks fine to me, but please do hold up for [~vinodkv] or another YARN active committer to take a look. Could you conceive a test case for this as well, to catch regressions in behaviour in future? For example it could be done by adding an invalid token with the app, but with this option turned on. With the option turned off, such a thing will always fail and app gets rejected, but with the fix in proper behaviour it will pass through the submit procedure at least. Checkout the test-case modified in the earlier patch for a reusable reference. Also, could you document the added MR config in mapred-default.xml, describing its use and marking it also as advanced, as it disables some features of a regular resilient application such as token reuse and renewals. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-3021: -- Attachment: YARN-3021.patch A patch that illustrates the change. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
Harsh J created YARN-3021: - Summary: YARN's delegation-token handling disallows certain trust setups to operate properly Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2999) Compilation error in AllocationConfiguration.java in java1.7 while running tests
[ https://issues.apache.org/jira/browse/YARN-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2999:
--------------------------
Labels: jdk7 (was: )

Compilation error in AllocationConfiguration.java in java1.7 while running tests
---------------------------------------------------------------------------------
Key: YARN-2999
URL: https://issues.apache.org/jira/browse/YARN-2999
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Labels: jdk7
Attachments: 0001-YARN-2999.patch

In AllocationConfiguration, in the object creation below, the generic type must be specified on the instance variable; otherwise running the RM and NM tests with a source level below Java 1.7 leads to a compilation error:
{{reservableQueues = new HashSet<>();}}
Report:
{code}
java.lang.Error: Unresolved compilation problem:
    '<>' operator is not allowed for source level below 1.7
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfiguration.init(AllocationConfiguration.java:150)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1276)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1320)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:559)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:985)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart$TestSecurityMockRM.init(TestRMRestart.java:2027)
    at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart$TestSecurityMockRM.init(TestRMRestart.java:2020)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testAppAttemptTokensRestoredOnRMRestart(TestRMRestart.java:1199)
{code}
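For reference, a standalone illustration of the compiler behaviour described in this issue: the diamond operator requires -source 1.7, while an explicit type argument compiles at older source levels (illustrative class and field names, not the YARN code itself):
{code}
import java.util.HashSet;
import java.util.Set;

public class DiamondDemo {
  // Fails with "'<>' operator is not allowed for source level below 1.7"
  // when compiled with javac -source 1.6:
  private Set<String> reservable = new HashSet<>();

  // Compiles at any source level, because the type argument is explicit:
  private Set<String> reservableCompat = new HashSet<String>();
}
{code}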
[jira] [Updated] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2950: -- Attachment: YARN-2950-2.patch Thanks Dustin! Looks good to me. I've gone ahead and made a small change to keep the line lengths less than 80 characters as per formatting requirements. Committing in a bit. Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.5.0 Reporter: Harsh J Assignee: Dustin Cote Priority: Minor Labels: newbie Attachments: YARN-2950-1.patch, YARN-2950-2.patch Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242829#comment-14242829 ] Harsh J commented on YARN-2950: --- File of message is {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java}} Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
Harsh J created YARN-2950: - Summary: Change message to mandate, not suggest JS requirement on UI Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2950: -- Labels: newbie (was: ) Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Labels: newbie Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2950: -- Affects Version/s: 2.5.0 Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.5.0 Reporter: Harsh J Priority: Minor Labels: newbie Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2891) Failed Container Executor does not provide a clear error message
[ https://issues.apache.org/jira/browse/YARN-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2891: -- Hadoop Flags: Reviewed Failed Container Executor does not provide a clear error message Key: YARN-2891 URL: https://issues.apache.org/jira/browse/YARN-2891 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.5.1 Environment: any Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor Attachments: YARN-2891-1.patch When checking access to directories, the container executor does not provide clear information on which directory actually could not be accessed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2891) Failed Container Executor does not provide a clear error message
[ https://issues.apache.org/jira/browse/YARN-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-2891: - Assignee: Dustin Cote Assigning to Dustin as he mentioned offline that he'd like to contribute on this. [~rohithsharma] - The clarity issue is within the LinuxContainerExecutor (C++) code. Failed Container Executor does not provide a clear error message Key: YARN-2891 URL: https://issues.apache.org/jira/browse/YARN-2891 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.5.1 Environment: any Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor When checking access to directories, the container executor does not provide clear information on which directory actually could not be accessed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails
[ https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214806#comment-14214806 ] Harsh J commented on YARN-2578:
-------------------------------
bq. We never implemented health monitoring like in ZKFC with HDFS
Was this not desired for some reason, or just punted in the early implementation? It seems worth always having such a thing.

NM does not failover timely if RM node network connection fails
----------------------------------------------------------------
Key: YARN-2578
URL: https://issues.apache.org/jira/browse/YARN-2578
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
Attachments: YARN-2578.patch

The NM does not fail over correctly when the network cable of the RM is unplugged, or when the failure is simulated by a "service network stop" or a firewall that drops all traffic on the node. The RM fails over to the standby node when the failure is detected, as expected. The NM should then re-register with the new active RM. This re-register takes a long time (15 minutes or more). Until then the cluster has no nodes for processing and applications are stuck.

Reproduction test case which can be used in any environment:
- create a cluster with 3 nodes
  node 1: ZK, NN, JN, ZKFC, DN, RM, NM
  node 2: ZK, NN, JN, ZKFC, DN, RM, NM
  node 3: ZK, JN, DN, NM
- start all services and make sure they are in good health
- kill the network connection of the active RM using one of the network kills from above
- observe the NN and RM failover
- the DNs fail over to the new active NN
- the NM does not recover for a long time
- the logs show a long delay and traces show no change at all

The stack traces of the NM all show the same set of threads. The main thread which should be used in the re-register is the Node Status Updater. This thread is stuck in:
{code}
Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in Object.wait() [0x7f5a51fc1000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.hadoop.ipc.Client.call(Client.java:1395)
    - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
    at org.apache.hadoop.ipc.Client.call(Client.java:1362)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
    at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
{code}
The client connection which goes through the proxy can be traced back to the ResourceTrackerPBClientImpl. The generated proxy does not time out, and we should be using a version which takes the RPC timeout (from the configuration) as a parameter.
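The direction suggested at the end — a proxy that honours an RPC timeout — can be approximated from the client side with the generic Hadoop IPC timeout knob. This is only a sketch: the configuration key name is an assumption based on hadoop-common, and it is not the YARN-2578 patch.
{code}
import org.apache.hadoop.conf.Configuration;

public class HeartbeatTimeoutSketch {
  public static Configuration withRpcTimeout() {
    Configuration conf = new Configuration();
    // Assumption: the hadoop-common IPC client timeout key. With a non-zero value
    // the client abandons a dead connection instead of waiting indefinitely on the
    // Client$Call monitor seen in the thread dump above.
    conf.setInt("ipc.client.rpc-timeout.ms", 60 * 1000);
    return conf;  // this conf would then be used when building the ResourceTracker proxy
  }
}
{code}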
[jira] [Updated] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2760: -- Attachment: YARN-2760.patch Re-uploading patch to retry after the patching issue was fixed in buildbot. Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
Harsh J created YARN-2760: - Summary: Completely remove word 'experimental' from FairScheduler docs Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2760: -- Attachment: YARN-2760.patch Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Attachments: YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186325#comment-14186325 ] Harsh J commented on YARN-2760:
-------------------------------
Patch can certainly be applied. Script or build box is having issues:
{code}
YARN-2760 patch is being downloaded at Tue Oct 28 03:11:10 UTC 2014 from
http://issues.apache.org/jira/secure/attachment/12677508/YARN-2760.patch
cp: cannot stat '/home/jenkins/buildSupport/lib/*': No such file or directory
Error: Patch dryrun couldn't detect changes the patch would make. Exiting.
PATCH APPLICATION FAILED
{code}

Completely remove word 'experimental' from FairScheduler docs
--------------------------------------------------------------
Key: YARN-2760
URL: https://issues.apache.org/jira/browse/YARN-2760
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
Attachments: YARN-2760.patch

After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental.
[jira] [Assigned] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-281: Assignee: Wangda Tan (was: Harsh J) Sorry on delay, reassigned. Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1918) Typo in description and error message for 'yarn.resourcemanager.cluster-id'
[ https://issues.apache.org/jira/browse/YARN-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-1918:
-----------------------------
Assignee: Anandha L Ranganathan

Typo in description and error message for 'yarn.resourcemanager.cluster-id'
----------------------------------------------------------------------------
Key: YARN-1918
URL: https://issues.apache.org/jira/browse/YARN-1918
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Anandha L Ranganathan
Priority: Trivial
Labels: newbie

1. In yarn-default.xml:
{code:xml}
<property>
  <description>Name of the cluster. In a HA setting,
    this is used to ensure the RM participates in leader
    election fo this cluster and ensures it does not affect
    other clusters</description>
  <name>yarn.resourcemanager.cluster-id</name>
  <!--value>yarn-cluster</value-->
</property>
{code}
Here the line 'election fo this cluster and ensures it does not affect' should be replaced with 'election for this cluster and ensures it does not affect'.

2.
{code:xml}
org.apache.hadoop.HadoopIllegalArgumentException: Configuration doesn't specifyyarn.resourcemanager.cluster-id
    at org.apache.hadoop.yarn.conf.YarnConfiguration.getClusterId(YarnConfiguration.java:1336)
{code}
In the above exception message, it is missing a space between message and configuration name.
[jira] [Resolved] (YARN-1487) How to develop with Eclipse
[ https://issues.apache.org/jira/browse/YARN-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved YARN-1487. --- Resolution: Invalid The plugin effort has moved out of Apache Hadoop into its own Apache (incubator) project called Hadoop Developer Tools (HDT), which you can visit and ask further questions at, http://hdt.incubator.apache.org. In future, please do not open JIRAs to ask general questions. Please post them to the u...@hadoop.apache.org mailing lists instead. The JIRA instance exists for the project developers and contributors to use for tracking validated bugs, features and enhancements, not for serving the user community. How to develop with Eclipse --- Key: YARN-1487 URL: https://issues.apache.org/jira/browse/YARN-1487 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 2.2.0 Environment: Linux,Hadoop2 Reporter: Yang Hao Labels: eclipse, plugin, yarn Fix For: 2.2.0 We can develop an application on Eclipse, but the Eclipse plugin is not provided on Hadoop2. Will the new version provide Eclipse plugin for developers? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (YARN-1486) How to develop an application with Eclipse
[ https://issues.apache.org/jira/browse/YARN-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved YARN-1486. --- Resolution: Invalid Resolving as Invalid. Please see my comment on YARN-1487 on why. How to develop an application with Eclipse -- Key: YARN-1486 URL: https://issues.apache.org/jira/browse/YARN-1486 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 2.2.0 Reporter: Yang Hao Fix For: trunk-win -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1200) Provide a central view for rack topologies
[ https://issues.apache.org/jira/browse/YARN-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768195#comment-13768195 ] Harsh J commented on YARN-1200: --- A reasonable regression-fixing first step is to match that of the HDFS functionality: Each NameNode (and NameNode alone) needs the rack resolution script, not all the DNs. Provide a central view for rack topologies -- Key: YARN-1200 URL: https://issues.apache.org/jira/browse/YARN-1200 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Harsh J It appears that with YARN, any AM (such as the MRv2 AM) that tries to do rack-info-based work, will need to resolve racks locally rather than get rack info from YARN directly: https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1054 and its use of a simple implementation of https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java This is a regression, as we've traditionally only had users maintain rack mappings and its associated script on a single master role node (JobTracker), not at every compute node. Task spawning hosts have never done/needed rack resolution of their own. It is silly to have to maintain rack configs and their changes on all nodes. We should have the RM host a stable interface service so that there's only a single view of the topology across the cluster, and document for AMs to use that instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
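For reference, this is roughly what per-node resolution looks like for an AM today via the yarn-common utility linked in the description, which is why every host running an AM needs the topology mapping available locally. A sketch assuming the standard script-based mapping configuration; the hostname below is made up:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.yarn.util.RackResolver;

public class LocalRackLookup {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Reads the net.topology.* settings from the *local* node's configuration,
    // so the mapping script/table must be present on every host that resolves racks.
    RackResolver.init(conf);
    Node node = RackResolver.resolve("worker-host-01.example.com");
    System.out.println(node.getNetworkLocation());  // e.g. /rack01
  }
}
{code}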
[jira] [Created] (YARN-1200) Provide a central view for rack topologies
Harsh J created YARN-1200: - Summary: Provide a central view for rack topologies Key: YARN-1200 URL: https://issues.apache.org/jira/browse/YARN-1200 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Harsh J It appears that with YARN, any AM (such as the MRv2 AM) that tries to do rack-info-based work, will need to resolve racks locally rather than get rack info from YARN directly: https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1054 and its use of a simple implementation of https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java This is a regression, as we've traditionally only had users maintain rack mappings and its associated script on a single master role node (JobTracker), not at every compute node. Task spawning hosts have never done/needed rack resolution of their own. It is silly to have to maintain rack configs and their changes on all nodes. We should have the RM host a stable interface service so that there's only a single view of the topology across the cluster, and document for AMs to use that instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-553) Have YarnClient generate a directly usable ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686972#comment-13686972 ] Harsh J commented on YARN-553:
------------------------------
I'm fine with what Arun's proposed above - a single API call that does it all for you (since it has the relevant context) would be very nice for app writers.

Have YarnClient generate a directly usable ApplicationSubmissionContext
------------------------------------------------------------------------
Key: YARN-553
URL: https://issues.apache.org/jira/browse/YARN-553
Project: Hadoop YARN
Issue Type: Sub-task
Components: client
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Assignee: Karthik Kambatla
Priority: Minor
Attachments: yarn-553-1.patch, yarn-553-2.patch

Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse:
{code}
GetNewApplicationResponse newApp = yarnClient.getNewApplication();
ApplicationId appId = newApp.getApplicationId();
ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);
appContext.setApplicationId(appId);
{code}
A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like:
{code}
GetNewApplicationResponse newApp = yarnClient.getNewApplication();
ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext();
{code}
[The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside of just associating the application ID automatically.]
[jira] [Commented] (YARN-842) Resource Manager Node Manager UI's doesn't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685549#comment-13685549 ] Harsh J commented on YARN-842:
------------------------------
This seems relevant: http://stackoverflow.com/questions/9433789/users-report-occasional-message-message-json-is-undefined. Was your IE also IE7?

Resource Manager Node Manager UI's doesn't work with IE
---------------------------------------------------------
Key: YARN-842
URL: https://issues.apache.org/jira/browse/YARN-842
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Devaraj K
Assignee: Devaraj K

{code:xml}
Webpage error details
User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Timestamp: Mon, 17 Jun 2013 12:06:03 UTC
Message: 'JSON' is undefined
Line: 41
Char: 218
Code: 0
URI: http://10.18.40.24:8088/cluster/apps
{code}
RM NM UI's are not working with IE and showing the above error for every link on the UI.
[jira] [Commented] (YARN-356) Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env
[ https://issues.apache.org/jira/browse/YARN-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656454#comment-13656454 ] Harsh J commented on YARN-356: -- Hey Lohit, I think we can doc these on the yarn-env.sh template we ship, to make users aware of its presence. I hadn't closed the ticket due to that non-doc factor, but wanted to note that these are already being looked-for. Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env --- Key: YARN-356 URL: https://issues.apache.org/jira/browse/YARN-356 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Lohit Vijayarenu At present it is difficult to set different Xmx values for RM and NM without having different yarn-env.sh. Like HDFS, it would be good to have YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-356) Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env
[ https://issues.apache.org/jira/browse/YARN-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened YARN-356: -- Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env --- Key: YARN-356 URL: https://issues.apache.org/jira/browse/YARN-356 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Lohit Vijayarenu At present it is difficult to set different Xmx values for RM and NM without having different yarn-env.sh. Like HDFS, it would be good to have YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened YARN-20: - More information for yarn.resourcemanager.webapp.address in yarn-default.xml -- Key: YARN-20 URL: https://issues.apache.org/jira/browse/YARN-20 Project: Hadoop YARN Issue Type: Improvement Components: documentation, resourcemanager Affects Versions: 2.0.0-alpha Reporter: nemon lou Priority: Trivial Attachments: YARN-20.patch Original Estimate: 1h Remaining Estimate: 1h The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is in host:port format, which is noted in the cluster setup guide (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html). When I read through the code, I find that a host-only format is also supported. In host-only format, the port will be random. So we may add more documentation in yarn-default.xml to make this easier to understand. I will submit a patch if it's helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629907#comment-13629907 ] Harsh J commented on YARN-570: -- Thanks for the report and the patch! With this patch it now renders it this way: renderHadoopDate() - Wed, 10 Apr 2013 08:29:56 GMT+05:30 format() - 10-Apr-2013 08:29:56 Which I think is still inconsistent. Ideally, I think, we'd want the former everywhere for consistency. Can you update format() as well to print in the same style, if you agree? Time strings are formated in different timezone --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: PengZhang Attachments: MAPREDUCE-5141.patch Time strings on different page are displayed in different timezone. If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56 Same value, but different timezone. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
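A rough sketch of the kind of change being asked of format(): pin the formatter to one explicit timezone (GMT here) so yarn.util.Times and the JS renderer show the same instant identically. The pattern and zone below are illustrative, not necessarily what the final patch uses.
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ConsistentTimeFormat {
  public static void main(String[] args) {
    // Formatting with an explicit, fixed timezone avoids the server's local
    // zone leaking into one page while the browser renders GMT on another.
    SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz");
    fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    long ts = 1365582596000L; // example epoch millis
    System.out.println(fmt.format(new Date(ts)));
  }
}
{code}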
[jira] [Updated] (YARN-555) ContainerLaunchContext is buggy when it comes to setter methods on a new instance
[ https://issues.apache.org/jira/browse/YARN-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-555: - Description: If you look at the API of ContainerLaunchContext, its got setter methods, such as for setResource, setCommands, etc…: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setCommands(java.util.List) However, there's certain things broken in its use here that am trying to understand. Let me explain with some code context: 1. I initialize a proper CLC for an ApplicationSubmissionContext (appContext). {code} ContainerLaunchContext appMasterLaunchContext = Records.newRecord(ContainerLaunchContext.class); appContext.setAMContainerSpec(appMasterLaunchContext); {code} 2. I create a resource request of 130 MB, as applicationMasterResource, and try to set it into the CLC via: {code} appContext.getAMContainerSpec().setResource(applicationMasterResource); {code} 3. This works OK. If I query it back now, it returns 130 for a {{getMemory()}} call. 4. So I attempt to do the same with setCommands/setEnvironment/etc., all of which fail to mutate cause the check in CLC's implementation class disregards whatever I try to set for some reason. Edit: It seems like the issue is that when I do a appContext.getAMContainerSpec().getLocalResources() or similar call to get existing initialized data structures to populate further on, what I really get underneath is a silently non-mutative data structure that I can call .put or .add on, but it won't really reflect it. was: If you look at the API of ContainerLaunchContext, its got setter methods, such as for setResource, setCommands, etc…: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setCommands(java.util.List) However, there's certain things broken in its use here that am trying to understand. Let me explain with some code context: 1. I initialize a proper CLC for an ApplicationSubmissionContext (appContext). {code} ContainerLaunchContext appMasterLaunchContext = Records.newRecord(ContainerLaunchContext.class); appContext.setAMContainerSpec(appMasterLaunchContext); {code} 2. I create a resource request of 130 MB, as applicationMasterResource, and try to set it into the CLC via: {code} appContext.getAMContainerSpec().setResource(applicationMasterResource); {code} 3. This works OK. If I query it back now, it returns 130 for a {{getMemory()}} call. 4. So I attempt to do the same with setCommands/setEnvironment/etc., all of which fail to mutate cause the check in CLC's implementation class disregards whatever I try to set. This is cause of these null checks which keep passing: {code} // ContainerLaunchContextPBImpl.java @Override public void setCommands(final List<String> commands) { if (commands == null) return; initCommands(); this.commands.clear(); this.commands.addAll(commands); } {code} This is rather non intuitive as a check. If I am to set something, setting it should take place. If it is null, do not return but instead set whats provided? I'm not even sure why that null check exists - it seems to do so from the start of time. 
However, {{setResource(…)}} works pretty fine, as the call has no such odd check: {code} @Override public void setResource(Resource resource) { maybeInitBuilder(); if (resource == null) builder.clearResource(); this.resource = resource; } {code} ContainerLaunchContext is buggy when it comes to setter methods on a new instance - Key: YARN-555 URL: https://issues.apache.org/jira/browse/YARN-555 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor If you look at the API of ContainerLaunchContext, its got setter methods, such as for setResource, setCommands, etc…: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setCommands(java.util.List) However, there's certain things broken in its use here that am trying to understand. Let me explain with some code context: 1. I initialize a proper CLC for an ApplicationSubmissionContext (appContext). {code} ContainerLaunchContext appMasterLaunchContext = Records.newRecord(ContainerLaunchContext.class); appContext.setAMContainerSpec(appMasterLaunchContext); {code} 2. I create a resource request of 130 MB, as applicationMasterResource, and try to set it into the CLC via: {code} appContext.getAMContainerSpec().setResource(applicationMasterResource); {code} 3. This works OK. If I query it back now, it returns 130 for a {{getMemory()}} call. 4. So I attempt to do the same with
[jira] [Commented] (YARN-555) ContainerLaunchContext is buggy when it comes to setter methods on a new instance
[ https://issues.apache.org/jira/browse/YARN-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625236#comment-13625236 ] Harsh J commented on YARN-555: -- If I do: {code} Map<String, LocalResource> localResources = new HashMap<String, LocalResource>(); localResources.put("node-ring-app-master.jar", appMasterJarResource); appContext.getAMContainerSpec().setLocalResources(localResources); {code} Things work fine. If I instead do the more extending form: {code} Map<String, LocalResource> localResources = appContext.getAMContainerSpec().getLocalResources(); localResources.put("node-ring-app-master.jar", appMasterJarResource); appContext.getAMContainerSpec().setLocalResources(localResources); {code} Then the mutations don't stick. Wonder if this is somehow a Java oddity? ContainerLaunchContext is buggy when it comes to setter methods on a new instance - Key: YARN-555 URL: https://issues.apache.org/jira/browse/YARN-555 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor If you look at the API of ContainerLaunchContext, its got setter methods, such as for setResource, setCommands, etc…: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setCommands(java.util.List) However, there's certain things broken in its use here that am trying to understand. Let me explain with some code context: 1. I initialize a proper CLC for an ApplicationSubmissionContext (appContext). {code} ContainerLaunchContext appMasterLaunchContext = Records.newRecord(ContainerLaunchContext.class); appContext.setAMContainerSpec(appMasterLaunchContext); {code} 2. I create a resource request of 130 MB, as applicationMasterResource, and try to set it into the CLC via: {code} appContext.getAMContainerSpec().setResource(applicationMasterResource); {code} 3. This works OK. If I query it back now, it returns 130 for a {{getMemory()}} call. 4. So I attempt to do the same with setCommands/setEnvironment/etc., all of which fail to mutate cause the check in CLC's implementation class disregards whatever I try to set for some reason. Edit: It seems like the issue is that when I do a appContext.getAMContainerSpec().getLocalResources() or similar call to get existing initialized data structures to populate further on, what I really get underneath is a silently non-mutative data structure that I can call .put or .add on, but it won't really reflect it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
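A sketch of the workaround this behaviour suggests, assuming the PB-backed record returns a view that does not write through: copy into a plainly mutable map and push the whole map back through the setter. Names follow the snippets above.
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.LocalResource;

public class ClcSetterWorkaround {
  // appMasterJarResource is assumed to be a LocalResource built elsewhere.
  static void addJar(ApplicationSubmissionContext appContext, LocalResource appMasterJarResource) {
    // Copy whatever the record currently holds into a fresh, mutable map...
    Map<String, LocalResource> localResources =
        new HashMap<String, LocalResource>(appContext.getAMContainerSpec().getLocalResources());
    localResources.put("node-ring-app-master.jar", appMasterJarResource);
    // ...and set the whole map back, rather than relying on the getter's
    // view to reflect the put().
    appContext.getAMContainerSpec().setLocalResources(localResources);
  }
}
{code}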
[jira] [Created] (YARN-552) Expose resource metrics as part of YarnClusterMetrics
Harsh J created YARN-552: Summary: Expose resource metrics as part of YarnClusterMetrics Key: YARN-552 URL: https://issues.apache.org/jira/browse/YARN-552 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor Right now, the YarnClusterMetrics just has the total number of node managers returned in it (when queried from a Client - RM). It would be useful to also expose NodeManager resource capacities and scheduler max/min resource limits over it to allow clients to pre-determine or pre-compute runtime feasibility without having to request an Application first to get some of this information. This does not need to be an incompatible change, and we can continue exposing the same values as part of the GetNewApplicationResponse too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
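For reference, roughly what a client can see today; this sketch uses the later YarnClient wrapper for brevity (the underlying metrics request is the same), and shows that only the NodeManager count comes back, which is why resource capacities would be a useful addition.
{code}
import org.apache.hadoop.yarn.api.records.YarnClusterMetrics;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterMetricsProbe {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      YarnClusterMetrics metrics = yarnClient.getYarnClusterMetrics();
      // Essentially all the cluster-wide information available from this call.
      System.out.println("NodeManagers: " + metrics.getNumNodeManagers());
    } finally {
      yarnClient.stop();
    }
  }
}
{code}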
[jira] [Created] (YARN-553) Have GetNewApplicationResponse generate a directly usable ApplicationSubmissionContext
Harsh J created YARN-553: Summary: Have GetNewApplicationResponse generate a directly usable ApplicationSubmissionContext Key: YARN-553 URL: https://issues.apache.org/jira/browse/YARN-553 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse. {code} GetNewApplicationResponse newApp = yarnClient.getNewApplication(); ApplicationId appId = newApp.getApplicationId(); ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class); appContext.setApplicationId(appId); {code} A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like: {code} GetNewApplicationResponse newApp = yarnClient.getNewApplication(); ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext(); {code} [The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside of just associating the application ID automatically.] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-356) Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env
[ https://issues.apache.org/jira/browse/YARN-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562876#comment-13562876 ] Harsh J commented on YARN-356: -- These are already present and used (in the yarn script) but aren't doc'd in the yarn-env.sh template. Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env --- Key: YARN-356 URL: https://issues.apache.org/jira/browse/YARN-356 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Lohit Vijayarenu At present it is difficult to set different Xmx values for RM and NM without having different yarn-env.sh. Like HDFS, it would be good to have YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-349) Send out last-minute load averages in TaskTrackerStatus
[ https://issues.apache.org/jira/browse/YARN-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-2170 to YARN-349: - Tags: (was: load average, tasktracker) Component/s: (was: jobtracker) nodemanager Fix Version/s: (was: 0.24.0) Affects Version/s: (was: 0.22.0) 2.0.0-alpha Release Note: (was: Add support for transmitting previous-minute load averages in TaskTrackerStatus) Key: YARN-349 (was: MAPREDUCE-2170) Project: Hadoop YARN (was: Hadoop Map/Reduce) Send out last-minute load averages in TaskTrackerStatus --- Key: YARN-349 URL: https://issues.apache.org/jira/browse/YARN-349 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Harsh J Attachments: mapreduce.loadaverage.r3.diff, mapreduce.loadaverage.r4.diff, mapreduce.loadaverage.r5.diff, mapreduce.loadaverage.r6.diff Original Estimate: 20m Remaining Estimate: 20m Load averages could be useful in scheduling. This patch looks to extend the existing Linux resource plugin (via /proc/loadavg file) to allow transmitting load averages of the last one minute via the TaskTrackerStatus. Patch is up for review, with test cases added, at: https://reviews.apache.org/r/20/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-40) Provide support for missing yarn commands
[ https://issues.apache.org/jira/browse/YARN-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555176#comment-13555176 ] Harsh J commented on YARN-40: - Junping, Do you mean an equivalent for the yarn node command for MRv1 tasktrackers? I guess it could be done if there is value in it (personally I've not seen people interested in monitoring a single TT's state of maps/reduces via the CLI). Other than that, the commands seem to be YARN specific? Provide support for missing yarn commands - Key: YARN-40 URL: https://issues.apache.org/jira/browse/YARN-40 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.0-alpha Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4155-1.patch, MAPREDUCE-4155.patch, YARN-40-1.patch, YARN-40-20120917.1.txt, YARN-40-20120917.txt, YARN-40-20120924.txt, YARN-40-20121008.txt, YARN-40.patch 1. status app-id 2. kill app-id (Already issue present with Id : MAPREDUCE-3793) 3. list-apps [all] 4. nodes-report -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-4171 to YARN-281: - Component/s: (was: mrv2) (was: test) scheduler Affects Version/s: (was: 2.0.0-alpha) 2.0.0-alpha Key: YARN-281 (was: MAPREDUCE-4171) Project: Hadoop YARN (was: Hadoop Map/Reduce) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Harsh J Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-284) YARN capacity scheduler doesn't spread MR tasks evenly on an underutilized cluster
[ https://issues.apache.org/jira/browse/YARN-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-3268 to YARN-284: - Component/s: (was: scheduler) scheduler Affects Version/s: (was: 0.23.0) 2.0.0-alpha Key: YARN-284 (was: MAPREDUCE-3268) Project: Hadoop YARN (was: Hadoop Map/Reduce) YARN capacity scheduler doesn't spread MR tasks evenly on an underutilized cluster -- Key: YARN-284 URL: https://issues.apache.org/jira/browse/YARN-284 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon The fair scheduler in MR1 has the behavior that, if a job is submitted to an under-utilized cluster and the cluster has more open slots than tasks in the job, the tasks are spread evenly throughout the cluster. This improves job latency since more spindles and NICs are utilized to complete the job. In MR2 I see this issue causing significantly longer job runtimes when there is excess capacity in the cluster -- especially on reducers which sometimes end up clumping together on a smaller set of nodes which then become disk/network constrained. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-239) Make link in Aggregation is not enabled. Try the nodemanager at
[ https://issues.apache.org/jira/browse/YARN-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-4509 to YARN-239: - Component/s: (was: webapps) nodemanager Fix Version/s: (was: 0.23.5) (was: 3.0.0) Affects Version/s: (was: 0.23.0) 2.0.0-alpha Key: YARN-239 (was: MAPREDUCE-4509) Project: Hadoop YARN (was: Hadoop Map/Reduce) Make link in Aggregation is not enabled. Try the nodemanager at - Key: YARN-239 URL: https://issues.apache.org/jira/browse/YARN-239 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Radim Kolar Priority: Trivial If log aggregation is disabled, the message *Aggregation is not enabled. Try the nodemanager at reavers.com:9006* is displayed. It would be helpful to make the link to the nodemanager clickable. This message is located in /hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java but I could not figure out how to make a link in the Hamlet framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-239) Make link in Aggregation is not enabled. Try the nodemanager at
[ https://issues.apache.org/jira/browse/YARN-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened YARN-239: -- Worth linking the NM if it can be done. Apologies if no one had got back to you on the Hamlet question yet, it is a rather new part in the framework - but I think this is worth having. Make link in Aggregation is not enabled. Try the nodemanager at - Key: YARN-239 URL: https://issues.apache.org/jira/browse/YARN-239 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Radim Kolar Priority: Trivial If log aggregation is disabled, the message *Aggregation is not enabled. Try the nodemanager at reavers.com:9006* is displayed. It would be helpful to make the link to the nodemanager clickable. This message is located in /hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java but I could not figure out how to make a link in the Hamlet framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-238) ClientRMProtocol needs to allow the specification of a ResourceRequest so that the Application Master's Container can be placed on the specified host
[ https://issues.apache.org/jira/browse/YARN-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13502897#comment-13502897 ] Harsh J commented on YARN-238: -- Is there also need for this to be a strict need or is it good to be flexible (i.e. non guaranteeing) like other resource requests (we do a good locality job to not have this concern very frequently, but still)? ClientRMProtocol needs to allow the specification of a ResourceRequest so that the Application Master's Container can be placed on the specified host - Key: YARN-238 URL: https://issues.apache.org/jira/browse/YARN-238 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Vinayak Borkar Currently a client is able to specify only resource requirements in terms of amount of memory required while launching an ApplicationMaster. There needs to be a way to ask for resources using a ResourceRequest so that a host name could be specified in addition to the amount of memory required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-168) No way to turn off virtual memory limits without turning off physical memory limits
Harsh J created YARN-168: Summary: No way to turn off virtual memory limits without turning off physical memory limits Key: YARN-168 URL: https://issues.apache.org/jira/browse/YARN-168 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Harsh J Asked and reported by a user (Krishna) on ML: {quote} This is possible to do, but you've hit a bug with the current YARN implementation. Ideally you should be able to configure the vmem-pmem ratio (or an equivalent config) to be -1, to indicate disabling of virtual memory checks completely (and there's indeed checks for this), but it seems like we are enforcing the ratio to be at least 1.0 (and hence negatives are disallowed). You can't workaround by setting the NM's offered resource.mb to -1 either, as you'll lose out on controlling maximum allocations. Please file a YARN bug on JIRA. The code at fault lies under ContainersMonitorImpl#init(…). On Thu, Oct 18, 2012 at 4:00 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Is there a way we can ask the YARN RM for not killing a container when it uses excess virtual memory than the maximum it can use as per the specification in the configuration file yarn-site.xml? We can't always estimate the amount of virtual memory needed for our application running on a container, but we don't want to get it killed in a case it exceeds the maximum limit. Please suggest as to how can we come across this issue. Thanks, Kishore {quote} Basically, we're doing: {code} // Virtual memory configuration float vmemRatio = conf.getFloat( YarnConfiguration.NM_VMEM_PMEM_RATIO, YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO); Preconditions.checkArgument(vmemRatio > 0.99f, YarnConfiguration.NM_VMEM_PMEM_RATIO + " should be at least 1.0"); this.maxVmemAllottedForContainers = (long)(vmemRatio * maxPmemAllottedForContainers); {code} For virtual memory monitoring to be disabled, maxVmemAllottedForContainers has to be -1. For that to be -1, given the above buggy computation, vmemRatio must be -1 or maxPmemAllottedForContainers must be -1. If vmemRatio were -1, we fail the precondition check and exit. If maxPmemAllottedForContainers were -1, we also end up disabling physical memory monitoring. Or perhaps that makes sense - to disable both physical and virtual memory monitoring, but that way your NM becomes infinite in resource grants, I think. We need a way to selectively disable kills done via virtual memory monitoring, which is the base request here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
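One possible shape for the fix is an explicit switch for the virtual-memory check, separate from the ratio; later releases did add a yarn.nodemanager.vmem-check-enabled flag along these lines. The sketch below is illustrative, and the constant names are assumptions rather than the project's actual code.
{code}
import com.google.common.base.Preconditions;
import org.apache.hadoop.conf.Configuration;

public class VmemCheckSketch {
  // Illustrative keys/defaults; not necessarily the names the project chose.
  static final String NM_VMEM_CHECK_ENABLED = "yarn.nodemanager.vmem-check-enabled";
  static final String NM_VMEM_PMEM_RATIO = "yarn.nodemanager.vmem-pmem-ratio";

  static long maxVmemAllowed(Configuration conf, long maxPmemAllottedForContainers) {
    // An explicit boolean switch, instead of overloading the ratio with -1.
    if (!conf.getBoolean(NM_VMEM_CHECK_ENABLED, true)) {
      return -1; // disables only the virtual-memory kills
    }
    float vmemRatio = conf.getFloat(NM_VMEM_PMEM_RATIO, 2.1f);
    Preconditions.checkArgument(vmemRatio > 0.99f,
        NM_VMEM_PMEM_RATIO + " should be at least 1.0");
    return (long) (vmemRatio * maxPmemAllottedForContainers);
  }
}
{code}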
[jira] [Moved] (YARN-149) ZK-based High Availability (HA) for ResourceManager (RM)
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-4345 to YARN-149: - Issue Type: New Feature (was: Improvement) Key: YARN-149 (was: MAPREDUCE-4345) Project: Hadoop YARN (was: Hadoop Map/Reduce) ZK-based High Availability (HA) for ResourceManager (RM) Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Reporter: Harsh J Assignee: Bikas Saha One of the goals presented on MAPREDUCE-279 was to have high availability. One way that was discussed, per Mahadev/others on https://issues.apache.org/jira/browse/MAPREDUCE-2648 and other places, was ZK: {quote} Am not sure, if you already know about the MR-279 branch (the next version of MR framework). We've been trying to integrate ZK into the framework from the beginning. As for now, we are just doing restart with ZK but soon we should have a HA soln with ZK. {quote} There is now MAPREDUCE-4343 that tracks recoverability via ZK. This JIRA is meant to track HA via ZK. Currently there isn't a HA solution for RM, via ZK or otherwise. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-138) Improve default config values for YARN
[ https://issues.apache.org/jira/browse/YARN-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466357#comment-13466357 ] Harsh J commented on YARN-138: -- Thanks, Sid! Improve default config values for YARN -- Key: YARN-138 URL: https://issues.apache.org/jira/browse/YARN-138 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.0-alpha Reporter: Arun C Murthy Assignee: Harsh J Labels: performance Attachments: MAPREDUCE-4316.patch, YARN138.txt Currently some of our configs are way off e.g. min-alloc is 128M while max-alloc is 10240. This leads to poor out-of-box performance as noticed by some users: http://s.apache.org/avd -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-130) Yarn examples use wrong configuration
[ https://issues.apache.org/jira/browse/YARN-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462920#comment-13462920 ] Harsh J commented on YARN-130: -- This is interesting. In trunk today, the HDFS clients do not require instantiating a HdfsConfiguration instance manually. They are auto-loaded by the classes that get loaded for HDFS FS. Similar should be done by YARN, given we use YARN client classes to interact with YARN anyway? Regarding the error message improvements, can you file a new JIRA with what you get and what to expect rather? Yarn examples use wrong configuration - Key: YARN-130 URL: https://issues.apache.org/jira/browse/YARN-130 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.0.3-alpha Reporter: Erich Schubert Priority: Minor AFAICT the example applications are broken when you don't use default ports. So it probably won't show in a single node setup. The bug fix seems to be: -conf = new Configuration(); +conf = new YarnConfiguration(); Then the yarn settings file (containing relevant host and port information) will also be read. The error messages *need* to be improved. For me, they said something like protocol not supported. The reason was that a different hadoop RPC was running on the port it was connecting to. It took me a lot of debugging to find out that it was just talking to the wrong service because it had not read it's configuration file... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
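The one-line fix being described, shown in context: YarnConfiguration extends Configuration and additionally loads yarn-default.xml and yarn-site.xml, so non-default RM addresses are actually picked up by the client.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClientConfSetup {
  public static void main(String[] args) {
    // new Configuration() loads only core-default.xml/core-site.xml, so a
    // client built on it silently falls back to default YARN host/ports.
    // new YarnConfiguration() also loads yarn-default.xml and yarn-site.xml.
    Configuration conf = new YarnConfiguration();
    System.out.println(conf.get(YarnConfiguration.RM_ADDRESS));
  }
}
{code}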
[jira] [Updated] (YARN-116) Add exclude/include file , need restart NN or RM.
[ https://issues.apache.org/jira/browse/YARN-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-116: - Environment: (was: suse) Add exclude/include file , need restart NN or RM. - Key: YARN-116 URL: https://issues.apache.org/jira/browse/YARN-116 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: xieguiming Attachments: HADOOP-835-0.patch, HADOOP-835-1.patch, HADOOP-835.patch yarn.resourcemanager.nodes.include-path default value is , if we need add one include file. and we must restart the RM. I suggest that adding one include or exclude file, no need restart the RM. only execute the refresh command. NN is the same. Modify the HostsFileReader class: public HostsFileReader(String inFile, String exFile) to public HostsFileReader(Configuration conf, String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH, String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH) and thus, we can read the config file dynamic. and no need to restart the NM/NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-116) RM is missing ability to add include/exclude files without a restart
[ https://issues.apache.org/jira/browse/YARN-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-116: - Summary: RM is missing ability to add include/exclude files without a restart (was: Add exclude/include file , need restart NN or RM.) RM is missing ability to add include/exclude files without a restart Key: YARN-116 URL: https://issues.apache.org/jira/browse/YARN-116 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.0-alpha Reporter: xieguiming Attachments: HADOOP-835-0.patch, HADOOP-835-1.patch, HADOOP-835.patch yarn.resourcemanager.nodes.include-path default value is , if we need add one include file. and we must restart the RM. I suggest that adding one include or exclude file, no need restart the RM. only execute the refresh command. NN is the same. Modify the HostsFileReader class: public HostsFileReader(String inFile, String exFile) to public HostsFileReader(Configuration conf, String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH, String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH) and thus, we can read the config file dynamic. and no need to restart the NM/NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-97) nodemanager depends on /bin/bash
[ https://issues.apache.org/jira/browse/YARN-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461472#comment-13461472 ] Harsh J commented on YARN-97: - bq. It should be well documented for system not having bash installed by default such as FreeBSD. Why don't we simply document requirements then? I've recently seen /bin/sh shbanged scripts cause trouble on Ubuntu cause /bin/sh points to Ubuntu's dash (https://wiki.ubuntu.com/DashAsBinSh). You don't wanna run into such a trouble and end up changing things (hadoop or OS side) post-deploy. I'll still vote we stick to one shell (bash) and be clear we need it. nodemanager depends on /bin/bash Key: YARN-97 URL: https://issues.apache.org/jira/browse/YARN-97 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: FreeBSD 8.2 / 64 bit Reporter: Radim Kolar Labels: patch Attachments: bash-replace-by-sh.txt Currently nodemanager depends on bash shell. It should be well documented for system not having bash installed by default such as FreeBSD. Because only basic functionality of bash is used, probably changing bash to /bin/sh would work enough. i found 2 cases: 1. DefaultContainerExecutor.java creates file with /bin/bash hardcoded in writeLocalWrapperScript. (this needs bash in /bin) 2. yarn-hduser-nodemanager-ponto.amerinoc.com.log:2012-04-03 19:50:10,798 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /tmp/nm-local-dir/usercache/hduser/appcache/application_1333474251533_0002/container_1333474251533_0002_01_12/default_container_executor.sh] this created script is also launched by bash - bash anywhere in path works - in freebsd it is /usr/local/bin/bash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-97) nodemanager depends on /bin/bash
[ https://issues.apache.org/jira/browse/YARN-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461473#comment-13461473 ] Harsh J commented on YARN-97: - bq. Why don't we simply document requirements then? We can additionally be clear that we demand bash exists in /bin if thats the whole trouble here? Or rely on {{env bash}}, but no idea if thats cross platform properly as well. nodemanager depends on /bin/bash Key: YARN-97 URL: https://issues.apache.org/jira/browse/YARN-97 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: FreeBSD 8.2 / 64 bit Reporter: Radim Kolar Labels: patch Attachments: bash-replace-by-sh.txt Currently nodemanager depends on bash shell. It should be well documented for system not having bash installed by default such as FreeBSD. Because only basic functionality of bash is used, probably changing bash to /bin/sh would work enough. i found 2 cases: 1. DefaultContainerExecutor.java creates file with /bin/bash hardcoded in writeLocalWrapperScript. (this needs bash in /bin) 2. yarn-hduser-nodemanager-ponto.amerinoc.com.log:2012-04-03 19:50:10,798 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /tmp/nm-local-dir/usercache/hduser/appcache/application_1333474251533_0002/container_1333474251533_0002_01_12/default_container_executor.sh] this created script is also launched by bash - bash anywhere in path works - in freebsd it is /usr/local/bin/bash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-101) If the heartbeat message is lost, the node status info of completed containers will be lost too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-101: - Description: see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread("Node Status Updater") { @Override @SuppressWarnings("unchecked") public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the completed containers, so the RM never gets to know about them. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," + " hence shutting down."); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info("Node is out of sync with ResourceManager," + " hence rebooting."); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); List<ContainerId> containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } List<ApplicationId> appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. LOG.error("Caught exception in status-updater", e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>(); for (Iterator<Entry<ContainerId, Container>> i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { Entry<ContainerId, Container> e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info("Sending out status for container: " + containerStatus); {color:red} // Here is the part that removes the completed containers. 
if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove i.remove(); {color} LOG.info("Removed completed container " + containerId); } } nodeStatus.setContainersStatuses(containersStatuses); LOG.debug(this.nodeId + " sending out status for " + numActiveContainers + " containers"); NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus(); nodeHealthStatus.setHealthReport(healthChecker.getHealthReport()); nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy()); nodeHealthStatus.setLastHealthReportTime( healthChecker.getLastHealthReportTime()); if (LOG.isDebugEnabled()) { LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy() + ", " + nodeHealthStatus.getHealthReport()); } nodeStatus.setNodeHealthStatus(nodeHealthStatus); List<ApplicationId> keepAliveAppIds = createKeepAliveApplicationList(); nodeStatus.setKeepAliveApplications(keepAliveAppIds);
[jira] [Commented] (YARN-101) If the heartbeat message is lost, the node status info of completed containers will be lost too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461315#comment-13461315 ] Harsh J commented on YARN-101: -- [~xieguiming] - I tweaked the sentences a bit so you're sounding more clear. You're essentially saying that we may be removing completed containers completely, and that in case of a node-heartbeat failure we should make sure they still get propagated eventually (on the next successful heartbeat), correct? If the heartbeat message is lost, the node status info of completed containers will be lost too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Priority: Minor see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread("Node Status Updater") { @Override @SuppressWarnings("unchecked") public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the completed containers, so the RM never gets to know about them. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," + " hence shutting down."); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info("Node is out of sync with ResourceManager," + " hence rebooting."); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); List<ContainerId> containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } List<ApplicationId> appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. 
LOG.error("Caught exception in status-updater", e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>(); for (Iterator<Entry<ContainerId, Container>> i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { Entry<ContainerId, Container> e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info("Sending out status for container: " + containerStatus); {color:red} // Here is the part that removes the completed containers. if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove i.remove(); {color}
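A hedged sketch of the pattern implied by the report: keep completed container statuses in a pending buffer and only drop them once a heartbeat carrying them has succeeded, so a failed RPC cannot lose them. Class, method, and field names here are illustrative, not the eventual patch.
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerStatus;

public class CompletedContainerBuffer {
  // Completed containers reported to the RM but not yet acknowledged.
  private final List<ContainerStatus> pendingCompleted = new ArrayList<ContainerStatus>();

  // Called while building the NodeStatus: include everything still pending.
  synchronized List<ContainerStatus> snapshotForHeartbeat(List<ContainerStatus> newlyCompleted) {
    pendingCompleted.addAll(newlyCompleted);
    return new ArrayList<ContainerStatus>(pendingCompleted);
  }

  // Called only after resourceTracker.nodeHeartbeat(...) returns normally;
  // if the RPC fails this is skipped, so the statuses are resent next time.
  synchronized void heartbeatSucceeded(List<ContainerStatus> sent) {
    pendingCompleted.removeAll(sent);
  }
}
{code}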
[jira] [Commented] (YARN-56) Handle container requests that request more resources than available in the cluster
[ https://issues.apache.org/jira/browse/YARN-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461316#comment-13461316 ] Harsh J commented on YARN-56: - bq. Handle container requests that request more resources than available in the cluster Won't this be better as a summary if it read "Handle container requests that request more resources than _presently_ available in the cluster"? Since there's another case where the maximum allowed request itself needs to be capped first, so that scheduling may occur. Handle container requests that request more resources than available in the cluster --- Key: YARN-56 URL: https://issues.apache.org/jira/browse/YARN-56 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.3 Reporter: Hitesh Shah In heterogenous clusters, a simple check at the scheduler to check if the allocation request is within the max allocatable range is not enough. If there are large nodes in the cluster which are not available, there may be situations where some allocation requests will never be fulfilled. Need an approach to decide when to invalidate such requests. For application submissions, there will need to be a feedback loop for applications that could not be launched. For running AMs, AllocationResponse may need to augmented with information for invalidated/cancelled container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-56) Handle container requests that request more resources than available in the cluster
[ https://issues.apache.org/jira/browse/YARN-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461317#comment-13461317 ] Harsh J commented on YARN-56: - +1 on Robert's timeout suggestion though (per app, with a reasonable default). Handle container requests that request more resources than available in the cluster --- Key: YARN-56 URL: https://issues.apache.org/jira/browse/YARN-56 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.3 Reporter: Hitesh Shah In heterogenous clusters, a simple check at the scheduler to check if the allocation request is within the max allocatable range is not enough. If there are large nodes in the cluster which are not available, there may be situations where some allocation requests will never be fulfilled. Need an approach to decide when to invalidate such requests. For application submissions, there will need to be a feedback loop for applications that could not be launched. For running AMs, AllocationResponse may need to augmented with information for invalidated/cancelled container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
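On the capping point raised in the first comment, a minimal sketch of clamping an ask to the scheduler's advertised maximum before admitting it; the helper is hypothetical, and a complete fix would still need the timeout/feedback path discussed here for requests that can never be satisfied.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

public class RequestClamp {
  // Hypothetical helper: cap a requested capability at the cluster maximum.
  static Resource clamp(Resource asked, Resource max) {
    Resource result = Records.newRecord(Resource.class);
    result.setMemory(Math.min(asked.getMemory(), max.getMemory()));
    return result;
  }
}
{code}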
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-71: Issue Type: Test (was: Bug) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Test Components: nodemanager Reporter: Vinod Kumar Vavilapalli We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-71: Labels: (was: test) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Test Components: nodemanager Reporter: Vinod Kumar Vavilapalli We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-71: Labels: test (was: ) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-111) Application level priority in Resource Manager Schedulers
[ https://issues.apache.org/jira/browse/YARN-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459016#comment-13459016 ] Harsh J commented on YARN-111: -- Robert, I still see Job priority exist in MR1 (1.x). Which JIRA removed this, per your comment above? Or is this something CapacityScheduler specific we're discussing? In YARN I see Priority coming in for generally all resource requests (which I assume does apply to the AM too) and hence http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.html#setPriority(org.apache.hadoop.yarn.api.records.Priority) ought to work, as the CS's LeafQueue does look at it? Application level priority in Resource Manager Schedulers - Key: YARN-111 URL: https://issues.apache.org/jira/browse/YARN-111 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.1-alpha Reporter: nemon lou We need application level priority for Hadoop 2.0,both in FIFO scheduler and Capacity Scheduler. In Hadoop 1.0.x,job priority is supported. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
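For reference, the submission-side call the comment points at; whether and how a given scheduler honours it is exactly what this issue is about. A minimal sketch:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.util.Records;

public class SubmitWithPriority {
  static void setAppPriority(ApplicationSubmissionContext appContext) {
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(10); // how the value is interpreted depends on the scheduler
    appContext.setPriority(priority);
  }
}
{code}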
[jira] [Commented] (YARN-80) Support delay scheduling for node locality in MR2's capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451772#comment-13451772 ] Harsh J commented on YARN-80: - Hi Arun, Thanks very much for doing this! We could probably address this in a new JIRA but I had two questions: - Why was the feature decided to be disabled by default? - Is there no way to not have people change configuration based on their # of racks (i.e. make it automated)? Support delay scheduling for node locality in MR2's capacity scheduler -- Key: YARN-80 URL: https://issues.apache.org/jira/browse/YARN-80 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Todd Lipcon Assignee: Arun C Murthy Fix For: 2.0.2-alpha Attachments: YARN-80.patch, YARN-80.patch The capacity scheduler in MR2 doesn't support delay scheduling for achieving node-level locality. So, jobs exhibit poor data locality even if they have good rack locality. Especially on clusters where disk throughput is much better than network capacity, this hurts overall job performance. We should optionally support node-level delay scheduling heuristics similar to what the fair scheduler implements in MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
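For readers wondering what the rack-count-dependent configuration looks like in practice, a hedged example of the node-locality-delay knob (expressed in missed scheduling opportunities); the value 40 is purely illustrative and, per the questions above, the feature stays disabled unless this is set.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DelaySchedulingConfig {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Scheduling opportunities to wait for a node-local container before
    // falling back to rack-local; key as used by the CapacityScheduler.
    conf.setInt("yarn.scheduler.capacity.node-locality-delay", 40);
    System.out.println(conf.getInt("yarn.scheduler.capacity.node-locality-delay", -1));
  }
}
{code}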