[jira] [Assigned] (YARN-511) ConverterUtils's getPathFromYarnURL and getYarnUrlFromPath work with fully qualified paths alone but don't state or check that
[ https://issues.apache.org/jira/browse/YARN-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-511: Assignee: (was: Harsh J) > ConverterUtils's getPathFromYarnURL and getYarnUrlFromPath work with fully > qualified paths alone but don't state or check that > -- > > Key: YARN-511 > URL: https://issues.apache.org/jira/browse/YARN-511 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.0.0-alpha >Reporter: Harsh J >Priority: Major > > See thread: http://search-hadoop.com/m/IFGhp1C1o4j > Aside: Naming discrepancy here: getPathFromYarnURL and getYarnUrlFromPath > should have consistent URL capitalization. Generally, Url is preferred to fit > the camel case and other classnames around Hadoop these days. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-349) Send out last-minute load averages in TaskTrackerStatus
[ https://issues.apache.org/jira/browse/YARN-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-349: Assignee: (was: Harsh J) > Send out last-minute load averages in TaskTrackerStatus > --- > > Key: YARN-349 > URL: https://issues.apache.org/jira/browse/YARN-349 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.0.0-alpha >Reporter: Harsh J > Attachments: mapreduce.loadaverage.r3.diff, > mapreduce.loadaverage.r4.diff, mapreduce.loadaverage.r5.diff, > mapreduce.loadaverage.r6.diff > > Original Estimate: 20m > Remaining Estimate: 20m > > Load averages could be useful in scheduling. This patch looks to extend the > existing Linux resource plugin (via /proc/loadavg file) to allow transmitting > load averages of the last one minute via the TaskTrackerStatus. > Patch is up for review, with test cases added, at: > https://reviews.apache.org/r/20/ -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
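For context, the last-minute load average the description refers to is the first field of the Linux /proc/loadavg file. A minimal, self-contained sketch of reading it — plain Java for illustration only, not the plugin code from the attached patches:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LoadAvgReader {
  /** Returns the 1-minute load average, or -1 if /proc/loadavg is unavailable. */
  public static float readOneMinuteLoadAverage() {
    try (BufferedReader reader = new BufferedReader(new FileReader("/proc/loadavg"))) {
      // /proc/loadavg looks like: "0.42 0.37 0.31 1/543 12345"
      String[] fields = reader.readLine().trim().split("\\s+");
      return Float.parseFloat(fields[0]);
    } catch (IOException | NumberFormatException e) {
      return -1f;
    }
  }
}
{code}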
[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage
[ https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180718#comment-15180718 ] Harsh J commented on YARN-4767: --- One add-on note that we can likely also address with this one: The AmIpFilter resolves the proxy addresses to host addresses (getAllByName, getHostAddress) every single time a request is made to it, vs. caching it upfront. I think we should not try to resolve it on-request unless we have errors, cause the proxy address list does not usually change over time on an already running AM? > Network issues can cause persistent RM UI outage > > > Key: YARN-4767 > URL: https://issues.apache.org/jira/browse/YARN-4767 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.9.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > > If a network issue causes an AM web app to resolve the RM proxy's address to > something other than what's listed in the allowed proxies list, the > AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy. > The RM proxy will then consume all available handler threads connecting to > itself over and over, resulting in an outage of the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
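A rough sketch of the caching idea suggested in the comment above — resolve the proxy hosts once up front and fall back to DNS only on a miss — using plain java.net calls. The class and method names here are illustrative, not the actual AmIpFilter code:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

public class ProxyAddressCache {
  private final String[] proxyHosts;
  private volatile Set<String> cachedAddresses = new HashSet<>();

  public ProxyAddressCache(String[] proxyHosts) {
    this.proxyHosts = proxyHosts;
    refresh();  // resolve once up front instead of on every request
  }

  public boolean isProxyAddress(String remoteAddr) {
    if (cachedAddresses.contains(remoteAddr)) {
      return true;
    }
    refresh();  // only go back to DNS when the cached set misses
    return cachedAddresses.contains(remoteAddr);
  }

  private void refresh() {
    Set<String> resolved = new HashSet<>();
    for (String host : proxyHosts) {
      try {
        for (InetAddress addr : InetAddress.getAllByName(host)) {
          resolved.add(addr.getHostAddress());
        }
      } catch (UnknownHostException ignored) {
        // skip hosts that do not currently resolve
      }
    }
    cachedAddresses = resolved;
  }
}
{code}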
[jira] [Commented] (YARN-4263) Capacity scheduler 60%-40% formatting floating point issue
[ https://issues.apache.org/jira/browse/YARN-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966500#comment-14966500 ] Harsh J commented on YARN-4263:
-------------------------------
Thank you for the fix and tests! Some comments:
- Could you remove the whitespace in the test addition? Also, did you check whether the tests fail reliably without the accompanying change, just to rule out formatting-only differences?
- Let's switch to using the {{org.apache.hadoop.util.StringUtils.formatPercent}} method instead of adding a duplicate inside YARN.
- I am also wondering whether we should use a single decimal place instead of two, to stay compatible with the usual 40.0/60.0/0.0/100.0 outputs.

> Capacity scheduler 60%-40% formatting floating point issue
> -----------------------------------------------------------
>
> Key: YARN-4263
> URL: https://issues.apache.org/jira/browse/YARN-4263
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Affects Versions: 2.7.1
> Reporter: Adrian Kalaszi
> Priority: Trivial
> Labels: easyfix
> Attachments: YARN-4263.001.patch
>
> If the capacity scheduler is set up with two queues at 60% and 40% capacity, then due to
> a Java float representation issue:
> {code}
> > hadoop queue -list
> ==
> Queue Name : default
> Queue State : running
> Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0
> ==
> Queue Name : large
> Queue State : running
> Scheduling Info : Capacity: 60.000004, MaximumCapacity: 100.0, CurrentCapacity: 0.0
> {code}
> Because
> {code}
> System.err.println((0.6f) * 100);
> {code}
> results in 60.000004.
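As a quick illustration of the rounding artifact and the single-decimal formatting being discussed (standalone Java; the commented {{StringUtils.formatPercent}} line refers to the hadoop-common helper mentioned above and is only a sketch of the suggestion, not the patch itself):
{code}
public class CapacityFormatDemo {
  public static void main(String[] args) {
    float capacity = 0.6f;                       // 60% as configured
    System.out.println(capacity * 100);          // prints 60.000004 (float artifact)
    // Formatting with a fixed number of decimals hides the artifact:
    System.out.println(String.format("%.1f", capacity * 100));          // 60.0
    // Roughly equivalent idea via the hadoop-common helper (takes a 0..1 fraction):
    // org.apache.hadoop.util.StringUtils.formatPercent(capacity, 1)    // "60.0%"
  }
}
{code}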
[jira] [Commented] (YARN-4222) Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
[ https://issues.apache.org/jira/browse/YARN-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942274#comment-14942274 ] Harsh J commented on YARN-4222:
-------------------------------
Failed tests aren't related. Thanks for the changes! +1, committing shortly. Quick notes:
- Please do not set a Fix Version; use the Target Version field instead. The Fix Version must indicate only the branches where the change has *already* been committed. The latter field is for requesting the branches it should go to, so it is the more appropriate one here.
- For more typo corrections in future, please also feel free to roll multiple corrections up into the same patch.

> Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
> -----------------------------------------------------------------------------
>
> Key: YARN-4222
> URL: https://issues.apache.org/jira/browse/YARN-4222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.7.1
> Reporter: Neelesh Srinivas Salian
> Assignee: Neelesh Srinivas Salian
> Priority: Minor
> Attachments: YARN-4222.001.patch
>
> Spotted this typo in the code while working on a separate YARN issue.
> E.g. DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES
> Checked the whole project and found a few occurrences of the typo in code/comments.
> This JIRA is meant to help fix those typos.
[jira] [Updated] (YARN-4222) Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
[ https://issues.apache.org/jira/browse/YARN-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-4222: -- Fix Version/s: (was: 2.8.0) > Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common > > > Key: YARN-4222 > URL: https://issues.apache.org/jira/browse/YARN-4222 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Neelesh Srinivas Salian >Assignee: Neelesh Srinivas Salian >Priority: Minor > Attachments: YARN-4222.001.patch > > > Spotted this typo in the code while working on a separate YARN issue. > E.g DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES > Checked in the whole project. Found a few occurrences of the typo in > code/comment. > The JIRA is meant to help fix those typos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4222) Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
[ https://issues.apache.org/jira/browse/YARN-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-4222: -- Target Version/s: 2.8.0 > Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common > > > Key: YARN-4222 > URL: https://issues.apache.org/jira/browse/YARN-4222 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Neelesh Srinivas Salian >Assignee: Neelesh Srinivas Salian >Priority: Minor > Attachments: YARN-4222.001.patch > > > Spotted this typo in the code while working on a separate YARN issue. > E.g DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES > Checked in the whole project. Found a few occurrences of the typo in > code/comment. > The JIRA is meant to help fix those typos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4065) container-executor error should include effective user id
[ https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-4065:
-----------------------------
Assignee: Casey Brotherton

container-executor error should include effective user id
----------------------------------------------------------
Key: YARN-4065
URL: https://issues.apache.org/jira/browse/YARN-4065
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Reporter: Casey Brotherton
Assignee: Casey Brotherton
Priority: Trivial

When container-executor fails to access its config file, the following message is thrown:
{code}
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 24
ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf/container-executor.cfg
{code}
The real problem may be that the container-executor is no longer running as set-uid root. From https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html:
{quote}
The container-executor program must be owned by root and have the permission set ---sr-s---.
{quote}
The error message could be improved by printing the effective user id along with the error, and possibly the executable trying to access the config file.
[jira] [Commented] (YARN-4065) container-executor error should include effective user id
[ https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708940#comment-14708940 ] Harsh J commented on YARN-4065: --- Agreed - and figuring this has wasted a few mins at another customer I worked with last week. This would be a welcome change - would you be willing to submit a patch adding the context to the error message? container-executor error should include effective user id - Key: YARN-4065 URL: https://issues.apache.org/jira/browse/YARN-4065 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Casey Brotherton Priority: Trivial When container-executor fails to access it's config file, the following message will be thrown: {code} org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 24 ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf/container-executor.cfg {code} The real problem may be a change in the container-executor not running as set uid root. From: https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html {quote} The container-executor program must be owned by root and have the permission set ---sr-s---. {quote} The error message could be improved by printing out the effective user id with the error message, and possibly the executable trying to access the config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-3462: -- Target Version/s: 2.8.0, 2.7.1 Hadoop Flags: Reviewed Thanks [~Naganarasimha], lgtm, +1. Committing shortly. Patches applied for YARN-2424 are inconsistent between trunk and branch-2 - Key: YARN-3462 URL: https://issues.apache.org/jira/browse/YARN-3462 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Sidharta Seethana Assignee: Naganarasimha G R Attachments: YARN-3462.20150508-1.patch It looks like the changes for YARN-2424 are not the same for trunk (commit 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
[ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482505#comment-14482505 ] Harsh J commented on YARN-2424: --- [~sidharta-s] - Yes, it appears the warning was skipped in the branch-2 patch, likely by accident. Thanks for spotting this! Could you file a new YARN JIRA to port the warning back into branch-2? LCE should support non-cgroups, non-secure mode --- Key: YARN-2424 URL: https://issues.apache.org/jira/browse/YARN-2424 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Fix For: 2.6.0 Attachments: Y2424-1.patch, YARN-2424.patch After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. This is a fairly serious regression, as turning on LCE prior to turning on full-blown security is a fairly standard procedure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377378#comment-14377378 ] Harsh J commented on YARN-1880:
-------------------------------
+1, this still applies. Committing shortly, thanks [~ozawa] (and [~ajisakaa] for the earlier review)!

Cleanup TestApplicationClientProtocolOnHA
-----------------------------------------
Key: YARN-1880
URL: https://issues.apache.org/jira/browse/YARN-1880
Project: Hadoop YARN
Issue Type: Test
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
Attachments: YARN-1880.1.patch

The tests introduced in YARN-1521 include multiple assertions combined with &&. We should separate them because it's difficult to identify which condition is illegal.
[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-1880:
--------------------------
Component/s: test

Cleanup TestApplicationClientProtocolOnHA
-----------------------------------------
Key: YARN-1880
URL: https://issues.apache.org/jira/browse/YARN-1880
Project: Hadoop YARN
Issue Type: Test
Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
Fix For: 2.8.0
Attachments: YARN-1880.1.patch

The tests introduced in YARN-1521 include multiple assertions combined with &&. We should separate them because it's difficult to identify which condition is illegal.
[jira] [Updated] (YARN-1880) Cleanup TestApplicationClientProtocolOnHA
[ https://issues.apache.org/jira/browse/YARN-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-1880:
--------------------------
Affects Version/s: 2.6.0

Cleanup TestApplicationClientProtocolOnHA
-----------------------------------------
Key: YARN-1880
URL: https://issues.apache.org/jira/browse/YARN-1880
Project: Hadoop YARN
Issue Type: Test
Components: test
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Trivial
Fix For: 2.8.0
Attachments: YARN-1880.1.patch

The tests introduced in YARN-1521 include multiple assertions combined with &&. We should separate them because it's difficult to identify which condition is illegal.
[jira] [Moved] (YARN-3376) [MR-279] NM UI should get a read-only view instead of the actual NMContext
[ https://issues.apache.org/jira/browse/YARN-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-2745 to YARN-3376: -- Component/s: (was: mrv2) nodemanager Affects Version/s: (was: 0.23.0) 2.6.0 Key: YARN-3376 (was: MAPREDUCE-2745) Project: Hadoop YARN (was: Hadoop Map/Reduce) [MR-279] NM UI should get a read-only view instead of the actual NMContext --- Key: YARN-3376 URL: https://issues.apache.org/jira/browse/YARN-3376 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Vinod Kumar Vavilapalli Assignee: Anupam Seth Priority: Trivial Labels: newbie Attachments: MAPREDUCE-2745-branch-0_23.patch, MAPREDUCE-2745-branch-0_23_v2.patch NMContext is modifiable, the UI should only get read-only access. Just like the AM web-ui. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
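The change being asked for here is the usual read-only-view pattern: hand the web UI a narrow interface while the mutable object stays internal. A generic sketch with hypothetical names — the real NMContext API is considerably wider than this:
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ReadOnlyViewSketch {

  /** Narrow, read-only surface the web UI would be handed. */
  public interface ReadOnlyContext {
    String getNodeId();
    Map<String, String> getContainers();
  }

  /** Mutable context that stays internal to the daemon. */
  public static class Context implements ReadOnlyContext {
    private String nodeId;
    private final Map<String, String> containers = new HashMap<>();

    public void setNodeId(String nodeId) { this.nodeId = nodeId; }
    public void addContainer(String id, String state) { containers.put(id, state); }

    @Override public String getNodeId() { return nodeId; }

    @Override public Map<String, String> getContainers() {
      // Hand out an unmodifiable view so UI code cannot mutate daemon state.
      return Collections.unmodifiableMap(containers);
    }
  }
}
{code}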
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313593#comment-14313593 ] Harsh J commented on YARN-3021: --- Thanks again [~vinodkv] and [~yzhangal], bq. bq. RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match it's own address. This way, we don't need an explicit API in the submission context. bq. I think this will work, and is a preferable solution to me. What do others think? I'd be willing to accept that approach, but for one small worry: Any app sending in a token with a bad renewer set could get through with such a change, whereas previously it'd be rejected outright. Not that it'd be harmful (as it is ignored), but it could still be seen as a behaviour change, no? The current patch OTOH, is explicit in demanding a config/flag to be set for direct awareness of such a thing. That sounds more cleaner to me to do. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
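For clarity, a minimal sketch of the renewer-inspection idea quoted above — skip scheduling renewal when the token's renewer is not this RM — with assumed method names and wiring, not the actual DelegationTokenRenewer change:
{code}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

public class RenewerCheck {
  /**
   * Decide whether the RM should schedule renewal for an incoming token. If the
   * token's renewer is not this RM, renewal would be skipped (instead of failing
   * the app submission outright). Sketch only; names are assumptions.
   */
  public static boolean shouldRenew(
      Token<? extends AbstractDelegationTokenIdentifier> token,
      Text rmRenewerAddress) throws IOException {
    Text renewer = token.decodeIdentifier().getRenewer();
    return renewer != null && renewer.equals(rmRenewerAddress);
  }
}
{code}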
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-3021: -- Summary: YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp (was: YARN's delegation-token handling disallows certain trust setups to operate properly) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310570#comment-14310570 ] Harsh J commented on YARN-3021: --- [~vinodkv], Many thanks for the response here! bq. Though the patch unblocks the jobs in the short term, it seems like long term this is still bad. I agree in that it does not resolve the problem. The goal we're seeking is also short-term, in that of bringing back a behaviour that got allowed on MR1, in MR2 - even though both end up facing the same issue. The longer term approach sounds like the most optimal thing to do for proper resolution, but given some users are getting blocked by this behaviour change I'd like to know if there'll be any objections in adding the current approach as an interim-fix (the doc for the property does/will claim it disables several necessary features of the job), and file subsequent JIRAs for implementing the standalone renewer? bq. Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely stuck. In our simple tests the app did run through successfully with such an approach, but there was multiple factors we did not test for (app recovery, task failures, etc. which could be impacted). Would it be better if we added in a morphed DelegationTokenRenewer (which does NOP as part of actual renewal logic), instead of skipping adding in the renewer completely? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298513#comment-14298513 ] Harsh J commented on YARN-3021: --- Overall the patch looks fine to me, but please do hold up for [~vinodkv] or another YARN active committer to take a look. Could you conceive a test case for this as well, to catch regressions in behaviour in future? For example it could be done by adding an invalid token with the app, but with this option turned on. With the option turned off, such a thing will always fail and app gets rejected, but with the fix in proper behaviour it will pass through the submit procedure at least. Checkout the test-case modified in the earlier patch for a reusable reference. Also, could you document the added MR config in mapred-default.xml, describing its use and marking it also as advanced, as it disables some features of a regular resilient application such as token reuse and renewals. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-3021: -- Attachment: YARN-3021.patch A patch that illustrates the change. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
Harsh J created YARN-3021: - Summary: YARN's delegation-token handling disallows certain trust setups to operate properly Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2999) Compilation error in AllocationConfiguration.java in java1.7 while running tests
[ https://issues.apache.org/jira/browse/YARN-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2999:
--------------------------
Labels: jdk7 (was: )

Compilation error in AllocationConfiguration.java in java1.7 while running tests
---------------------------------------------------------------------------------
Key: YARN-2999
URL: https://issues.apache.org/jira/browse/YARN-2999
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Labels: jdk7
Attachments: 0001-YARN-2999.patch

In AllocationConfiguration, in the object creation below, the generic type must be specified on the instance variable; otherwise running the RM and NM tests with a source level below Java 1.7 leads to a compilation error:
{{reservableQueues = new HashSet<>();}}
Report:
{code}
java.lang.Error: Unresolved compilation problem:
    '<>' operator is not allowed for source level below 1.7
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfiguration.init(AllocationConfiguration.java:150)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1276)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1320)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:559)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:985)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart$TestSecurityMockRM.init(TestRMRestart.java:2027)
    at org.apache.hadoop.yarn.server.resourcemanager.MockRM.init(MockRM.java:108)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart$TestSecurityMockRM.init(TestRMRestart.java:2020)
    at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testAppAttemptTokensRestoredOnRMRestart(TestRMRestart.java:1199)
{code}
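For reference, a standalone illustration of the compiler behaviour described in this issue: the diamond operator requires -source 1.7, while an explicit type argument compiles at older source levels (illustrative class and field names, not the YARN code itself):
{code}
import java.util.HashSet;
import java.util.Set;

public class DiamondDemo {
  // Fails with "'<>' operator is not allowed for source level below 1.7"
  // when compiled with javac -source 1.6:
  private Set<String> reservable = new HashSet<>();

  // Compiles at any source level, because the type argument is explicit:
  private Set<String> reservableCompat = new HashSet<String>();
}
{code}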
[jira] [Updated] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2950: -- Attachment: YARN-2950-2.patch Thanks Dustin! Looks good to me. I've gone ahead and made a small change to keep the line lengths less than 80 characters as per formatting requirements. Committing in a bit. Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.5.0 Reporter: Harsh J Assignee: Dustin Cote Priority: Minor Labels: newbie Attachments: YARN-2950-1.patch, YARN-2950-2.patch Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242829#comment-14242829 ] Harsh J commented on YARN-2950: --- File of message is {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java}} Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
Harsh J created YARN-2950: - Summary: Change message to mandate, not suggest JS requirement on UI Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2950: -- Labels: newbie (was: ) Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Reporter: Harsh J Priority: Minor Labels: newbie Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2950) Change message to mandate, not suggest JS requirement on UI
[ https://issues.apache.org/jira/browse/YARN-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2950: -- Affects Version/s: 2.5.0 Change message to mandate, not suggest JS requirement on UI --- Key: YARN-2950 URL: https://issues.apache.org/jira/browse/YARN-2950 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 2.5.0 Reporter: Harsh J Priority: Minor Labels: newbie Most of YARN's UIs do not work with JavaScript disabled on the browser, cause they appear to send back data as JS arrays instead of within the actual HTML content. The JQueryUI prints only a mild warning about this suggesting that {{This page works best with javascript enabled.}}, when in fact it ought to be {{This page will not function without javascript enabled. Please enable javascript on your browser.}} or something as such (more direct). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2891) Failed Container Executor does not provide a clear error message
[ https://issues.apache.org/jira/browse/YARN-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2891: -- Hadoop Flags: Reviewed Failed Container Executor does not provide a clear error message Key: YARN-2891 URL: https://issues.apache.org/jira/browse/YARN-2891 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.5.1 Environment: any Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor Attachments: YARN-2891-1.patch When checking access to directories, the container executor does not provide clear information on which directory actually could not be accessed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2891) Failed Container Executor does not provide a clear error message
[ https://issues.apache.org/jira/browse/YARN-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-2891: - Assignee: Dustin Cote Assigning to Dustin as he mentioned offline that he'd like to contribute on this. [~rohithsharma] - The clarity issue is within the LinuxContainerExecutor (C++) code. Failed Container Executor does not provide a clear error message Key: YARN-2891 URL: https://issues.apache.org/jira/browse/YARN-2891 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.5.1 Environment: any Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor When checking access to directories, the container executor does not provide clear information on which directory actually could not be accessed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2578) NM does not failover timely if RM node network connection fails
[ https://issues.apache.org/jira/browse/YARN-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214806#comment-14214806 ] Harsh J commented on YARN-2578:
-------------------------------
bq. We never implemented health monitoring like in ZKFC with HDFS
Was this not desired for some reason, or just punted in the early implementation? It seems worth always having such a thing.

NM does not failover timely if RM node network connection fails
----------------------------------------------------------------
Key: YARN-2578
URL: https://issues.apache.org/jira/browse/YARN-2578
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.5.1
Reporter: Wilfred Spiegelenburg
Attachments: YARN-2578.patch

The NM does not fail over correctly when the network cable of the RM is unplugged, or when the failure is simulated by a "service network stop" or a firewall that drops all traffic on the node. The RM fails over to the standby node when the failure is detected, as expected. The NM should then re-register with the new active RM. This re-register takes a long time (15 minutes or more). Until then the cluster has no nodes for processing and applications are stuck.

Reproduction test case which can be used in any environment:
- create a cluster with 3 nodes
  node 1: ZK, NN, JN, ZKFC, DN, RM, NM
  node 2: ZK, NN, JN, ZKFC, DN, RM, NM
  node 3: ZK, JN, DN, NM
- start all services and make sure they are in good health
- kill the network connection of the active RM using one of the network kills from above
- observe the NN and RM failover
- the DNs fail over to the new active NN
- the NM does not recover for a long time
- the logs show a long delay and traces show no change at all

The stack traces of the NM all show the same set of threads. The main thread which should be used in the re-register is the Node Status Updater. This thread is stuck in:
{code}
Node Status Updater prio=10 tid=0x7f5a6cc99800 nid=0x18d0 in Object.wait() [0x7f5a51fc1000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.hadoop.ipc.Client.call(Client.java:1395)
    - locked 0xed62f488 (a org.apache.hadoop.ipc.Client$Call)
    at org.apache.hadoop.ipc.Client.call(Client.java:1362)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy26.nodeHeartbeat(Unknown Source)
    at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
{code}
The client connection which goes through the proxy can be traced back to the ResourceTrackerPBClientImpl. The generated proxy does not time out, and we should be using a version which takes the RPC timeout (from the configuration) as a parameter.
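The direction suggested at the end — a proxy that honours an RPC timeout — can be approximated from the client side with the generic Hadoop IPC timeout knob. This is only a sketch: the configuration key name is an assumption based on hadoop-common, and it is not the YARN-2578 patch.
{code}
import org.apache.hadoop.conf.Configuration;

public class HeartbeatTimeoutSketch {
  public static Configuration withRpcTimeout() {
    Configuration conf = new Configuration();
    // Assumption: the hadoop-common IPC client timeout key. With a non-zero value
    // the client abandons a dead connection instead of waiting indefinitely on the
    // Client$Call monitor seen in the thread dump above.
    conf.setInt("ipc.client.rpc-timeout.ms", 60 * 1000);
    return conf;  // this conf would then be used when building the ResourceTracker proxy
  }
}
{code}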
[jira] [Updated] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2760: -- Attachment: YARN-2760.patch Re-uploading patch to retry after the patching issue was fixed in buildbot. Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
Harsh J created YARN-2760: - Summary: Completely remove word 'experimental' from FairScheduler docs Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2760: -- Attachment: YARN-2760.patch Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Attachments: YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186325#comment-14186325 ] Harsh J commented on YARN-2760:
-------------------------------
Patch can certainly be applied. Script or build box is having issues:
{code}
YARN-2760 patch is being downloaded at Tue Oct 28 03:11:10 UTC 2014 from
http://issues.apache.org/jira/secure/attachment/12677508/YARN-2760.patch
cp: cannot stat '/home/jenkins/buildSupport/lib/*': No such file or directory
Error: Patch dryrun couldn't detect changes the patch would make. Exiting.
PATCH APPLICATION FAILED
{code}

Completely remove word 'experimental' from FairScheduler docs
--------------------------------------------------------------
Key: YARN-2760
URL: https://issues.apache.org/jira/browse/YARN-2760
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
Attachments: YARN-2760.patch

After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental.
[jira] [Assigned] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-281: Assignee: Wangda Tan (was: Harsh J) Sorry on delay, reassigned. Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1918) Typo in description and error message for 'yarn.resourcemanager.cluster-id'
[ https://issues.apache.org/jira/browse/YARN-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reassigned YARN-1918:
-----------------------------
Assignee: Anandha L Ranganathan

Typo in description and error message for 'yarn.resourcemanager.cluster-id'
----------------------------------------------------------------------------
Key: YARN-1918
URL: https://issues.apache.org/jira/browse/YARN-1918
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Anandha L Ranganathan
Priority: Trivial
Labels: newbie

1. In yarn-default.xml:
{code:xml}
<property>
  <description>Name of the cluster. In a HA setting,
    this is used to ensure the RM participates in leader
    election fo this cluster and ensures it does not affect
    other clusters</description>
  <name>yarn.resourcemanager.cluster-id</name>
  <!--value>yarn-cluster</value-->
</property>
{code}
Here the line 'election fo this cluster and ensures it does not affect' should be replaced with 'election for this cluster and ensures it does not affect'.

2.
{code:xml}
org.apache.hadoop.HadoopIllegalArgumentException: Configuration doesn't specifyyarn.resourcemanager.cluster-id
    at org.apache.hadoop.yarn.conf.YarnConfiguration.getClusterId(YarnConfiguration.java:1336)
{code}
In the above exception message, it is missing a space between message and configuration name.
[jira] [Resolved] (YARN-1487) How to develop with Eclipse
[ https://issues.apache.org/jira/browse/YARN-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved YARN-1487. --- Resolution: Invalid The plugin effort has moved out of Apache Hadoop into its own Apache (incubator) project called Hadoop Developer Tools (HDT), which you can visit and ask further questions at, http://hdt.incubator.apache.org. In future, please do not open JIRAs to ask general questions. Please post them to the u...@hadoop.apache.org mailing lists instead. The JIRA instance exists for the project developers and contributors to use for tracking validated bugs, features and enhancements, not for serving the user community. How to develop with Eclipse --- Key: YARN-1487 URL: https://issues.apache.org/jira/browse/YARN-1487 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 2.2.0 Environment: Linux,Hadoop2 Reporter: Yang Hao Labels: eclipse, plugin, yarn Fix For: 2.2.0 We can develop an application on Eclipse, but the Eclipse plugin is not provided on Hadoop2. Will the new version provide Eclipse plugin for developers? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (YARN-1486) How to develop an application with Eclipse
[ https://issues.apache.org/jira/browse/YARN-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved YARN-1486. --- Resolution: Invalid Resolving as Invalid. Please see my comment on YARN-1487 on why. How to develop an application with Eclipse -- Key: YARN-1486 URL: https://issues.apache.org/jira/browse/YARN-1486 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 2.2.0 Reporter: Yang Hao Fix For: trunk-win -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-1200) Provide a central view for rack topologies
[ https://issues.apache.org/jira/browse/YARN-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768195#comment-13768195 ] Harsh J commented on YARN-1200: --- A reasonable regression-fixing first step is to match that of the HDFS functionality: Each NameNode (and NameNode alone) needs the rack resolution script, not all the DNs. Provide a central view for rack topologies -- Key: YARN-1200 URL: https://issues.apache.org/jira/browse/YARN-1200 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Harsh J It appears that with YARN, any AM (such as the MRv2 AM) that tries to do rack-info-based work, will need to resolve racks locally rather than get rack info from YARN directly: https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1054 and its use of a simple implementation of https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java This is a regression, as we've traditionally only had users maintain rack mappings and its associated script on a single master role node (JobTracker), not at every compute node. Task spawning hosts have never done/needed rack resolution of their own. It is silly to have to maintain rack configs and their changes on all nodes. We should have the RM host a stable interface service so that there's only a single view of the topology across the cluster, and document for AMs to use that instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
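For reference, this is roughly what per-node resolution looks like for an AM today via the yarn-common utility linked in the description, which is why every host running an AM needs the topology mapping available locally. A sketch assuming the standard script-based mapping configuration; the hostname below is made up:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.yarn.util.RackResolver;

public class LocalRackLookup {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Reads the net.topology.* settings from the *local* node's configuration,
    // so the mapping script/table must be present on every host that resolves racks.
    RackResolver.init(conf);
    Node node = RackResolver.resolve("worker-host-01.example.com");
    System.out.println(node.getNetworkLocation());  // e.g. /rack01
  }
}
{code}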
[jira] [Created] (YARN-1200) Provide a central view for rack topologies
Harsh J created YARN-1200: - Summary: Provide a central view for rack topologies Key: YARN-1200 URL: https://issues.apache.org/jira/browse/YARN-1200 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Harsh J It appears that with YARN, any AM (such as the MRv2 AM) that tries to do rack-info-based work, will need to resolve racks locally rather than get rack info from YARN directly: https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1054 and its use of a simple implementation of https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java This is a regression, as we've traditionally only had users maintain rack mappings and its associated script on a single master role node (JobTracker), not at every compute node. Task spawning hosts have never done/needed rack resolution of their own. It is silly to have to maintain rack configs and their changes on all nodes. We should have the RM host a stable interface service so that there's only a single view of the topology across the cluster, and document for AMs to use that instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-553) Have YarnClient generate a directly usable ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686972#comment-13686972 ] Harsh J commented on YARN-553:
------------------------------
I'm fine with what Arun's proposed above - a single API call that does it all for you (since it has the relevant context) would be very nice for app writers.

Have YarnClient generate a directly usable ApplicationSubmissionContext
------------------------------------------------------------------------
Key: YARN-553
URL: https://issues.apache.org/jira/browse/YARN-553
Project: Hadoop YARN
Issue Type: Sub-task
Components: client
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Assignee: Karthik Kambatla
Priority: Minor
Attachments: yarn-553-1.patch, yarn-553-2.patch

Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse:
{code}
GetNewApplicationResponse newApp = yarnClient.getNewApplication();
ApplicationId appId = newApp.getApplicationId();
ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);
appContext.setApplicationId(appId);
{code}
A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like:
{code}
GetNewApplicationResponse newApp = yarnClient.getNewApplication();
ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext();
{code}
[The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside of just associating the application ID automatically.]
[jira] [Commented] (YARN-842) Resource Manager Node Manager UI's doesn't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685549#comment-13685549 ] Harsh J commented on YARN-842:
------------------------------
This seems relevant: http://stackoverflow.com/questions/9433789/users-report-occasional-message-message-json-is-undefined. Was your IE also IE7?

Resource Manager Node Manager UI's doesn't work with IE
---------------------------------------------------------
Key: YARN-842
URL: https://issues.apache.org/jira/browse/YARN-842
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Devaraj K
Assignee: Devaraj K

{code:xml}
Webpage error details
User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Timestamp: Mon, 17 Jun 2013 12:06:03 UTC
Message: 'JSON' is undefined
Line: 41
Char: 218
Code: 0
URI: http://10.18.40.24:8088/cluster/apps
{code}
RM NM UI's are not working with IE and showing the above error for every link on the UI.
[jira] [Commented] (YARN-356) Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env
[ https://issues.apache.org/jira/browse/YARN-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656454#comment-13656454 ] Harsh J commented on YARN-356: -- Hey Lohit, I think we can doc these on the yarn-env.sh template we ship, to make users aware of its presence. I hadn't closed the ticket due to that non-doc factor, but wanted to note that these are already being looked-for. Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env --- Key: YARN-356 URL: https://issues.apache.org/jira/browse/YARN-356 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Lohit Vijayarenu At present it is difficult to set different Xmx values for RM and NM without having different yarn-env.sh. Like HDFS, it would be good to have YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-356) Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env
[ https://issues.apache.org/jira/browse/YARN-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened YARN-356: -- Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env --- Key: YARN-356 URL: https://issues.apache.org/jira/browse/YARN-356 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Lohit Vijayarenu At present it is difficult to set different Xmx values for RM and NM without having different yarn-env.sh. Like HDFS, it would be good to have YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-20) More information for yarn.resourcemanager.webapp.address in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened YARN-20: - More information for yarn.resourcemanager.webapp.address in yarn-default.xml -- Key: YARN-20 URL: https://issues.apache.org/jira/browse/YARN-20 Project: Hadoop YARN Issue Type: Improvement Components: documentation, resourcemanager Affects Versions: 2.0.0-alpha Reporter: nemon lou Priority: Trivial Attachments: YARN-20.patch Original Estimate: 1h Remaining Estimate: 1h The parameter yarn.resourcemanager.webapp.address in yarn-default.xml is in host:port format, which is noted in the cluster setup guide (http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html). When I read through the code, I find that a host-only format is also supported. In host-only format, the port will be random. So we may add more documentation in yarn-default.xml to make this easier to understand. I will submit a patch if it's helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629907#comment-13629907 ] Harsh J commented on YARN-570: -- Thanks for the report and the patch! With this patch it now renders it this way: renderHadoopDate() - Wed, 10 Apr 2013 08:29:56 GMT+05:30 format() - 10-Apr-2013 08:29:56 Which I think is still inconsistent. Ideally, I think, we'd want the former everywhere for consistency. Can you update format() as well to print in the same style, if you agree? Time strings are formated in different timezone --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: PengZhang Attachments: MAPREDUCE-5141.patch Time strings on different page are displayed in different timezone. If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56 Same value, but different timezone. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
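A rough sketch of the kind of change being asked of format(): pin the formatter to one explicit timezone (GMT here) so yarn.util.Times and the JS renderer show the same instant identically. The pattern and zone below are illustrative, not necessarily what the final patch uses.
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ConsistentTimeFormat {
  public static void main(String[] args) {
    // Formatting with an explicit, fixed timezone avoids the server's local
    // zone leaking into one page while the browser renders GMT on another.
    SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz");
    fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    long ts = 1365582596000L; // example epoch millis
    System.out.println(fmt.format(new Date(ts)));
  }
}
{code}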
[jira] [Updated] (YARN-555) ContainerLaunchContext is buggy when it comes to setter methods on a new instance
[ https://issues.apache.org/jira/browse/YARN-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-555: - Description: If you look at the API of ContainerLaunchContext, its got setter methods, such as for setResource, setCommands, etc…: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setCommands(java.util.List) However, there's certain things broken in its use here that am trying to understand. Let me explain with some code context: 1. I initialize a proper CLC for an ApplicationSubmissionContext (appContext). {code} ContainerLaunchContext appMasterLaunchContext = Records.newRecord(ContainerLaunchContext.class); appContext.setAMContainerSpec(appMasterLaunchContext); {code} 2. I create a resource request of 130 MB, as applicationMasterResource, and try to set it into the CLC via: {code} appContext.getAMContainerSpec().setResource(applicationMasterResource); {code} 3. This works OK. If I query it back now, it returns 130 for a {{getMemory()}} call. 4. So I attempt to do the same with setCommands/setEnvironment/etc., all of which fail to mutate cause the check in CLC's implementation class disregards whatever I try to set for some reason. Edit: It seems like the issue is that when I do a appContext.getAMContainerSpec().getLocalResources() or similar call to get existing initialized data structures to populate further on, what I really get underneath is a silently non-mutative data structure that I can call .put or .add on, but it won't really reflect it. was: If you look at the API of ContainerLaunchContext, its got setter methods, such as for setResource, setCommands, etc…: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setCommands(java.util.List) However, there's certain things broken in its use here that am trying to understand. Let me explain with some code context: 1. I initialize a proper CLC for an ApplicationSubmissionContext (appContext). {code} ContainerLaunchContext appMasterLaunchContext = Records.newRecord(ContainerLaunchContext.class); appContext.setAMContainerSpec(appMasterLaunchContext); {code} 2. I create a resource request of 130 MB, as applicationMasterResource, and try to set it into the CLC via: {code} appContext.getAMContainerSpec().setResource(applicationMasterResource); {code} 3. This works OK. If I query it back now, it returns 130 for a {{getMemory()}} call. 4. So I attempt to do the same with setCommands/setEnvironment/etc., all of which fail to mutate cause the check in CLC's implementation class disregards whatever I try to set. This is cause of these null checks which keep passing: {code} // ContainerLaunchContextPBImpl.java @Override public void setCommands(final List<String> commands) { if (commands == null) return; initCommands(); this.commands.clear(); this.commands.addAll(commands); } {code} This is rather non intuitive as a check. If I am to set something, setting it should take place. If it is null, do not return but instead set whats provided? I'm not even sure why that null check exists - it seems to do so from the start of time. 
However, {{setResource(…)}} works pretty fine, as the call has no such odd check: {code} @Override public void setResource(Resource resource) { maybeInitBuilder(); if (resource == null) builder.clearResource(); this.resource = resource; } {code} ContainerLaunchContext is buggy when it comes to setter methods on a new instance - Key: YARN-555 URL: https://issues.apache.org/jira/browse/YARN-555 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor If you look at the API of ContainerLaunchContext, its got setter methods, such as for setResource, setCommands, etc…: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setCommands(java.util.List) However, there's certain things broken in its use here that am trying to understand. Let me explain with some code context: 1. I initialize a proper CLC for an ApplicationSubmissionContext (appContext). {code} ContainerLaunchContext appMasterLaunchContext = Records.newRecord(ContainerLaunchContext.class); appContext.setAMContainerSpec(appMasterLaunchContext); {code} 2. I create a resource request of 130 MB, as applicationMasterResource, and try to set it into the CLC via: {code} appContext.getAMContainerSpec().setResource(applicationMasterResource); {code} 3. This works OK. If I query it back now, it returns 130 for a {{getMemory()}} call. 4. So I attempt to do the same with
[jira] [Commented] (YARN-555) ContainerLaunchContext is buggy when it comes to setter methods on a new instance
[ https://issues.apache.org/jira/browse/YARN-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625236#comment-13625236 ] Harsh J commented on YARN-555: -- If I do: {code} Map<String, LocalResource> localResources = new HashMap<String, LocalResource>(); localResources.put("node-ring-app-master.jar", appMasterJarResource); appContext.getAMContainerSpec().setLocalResources(localResources); {code} Things work fine. If I instead do the more extending form: {code} Map<String, LocalResource> localResources = appContext.getAMContainerSpec().getLocalResources(); localResources.put("node-ring-app-master.jar", appMasterJarResource); appContext.getAMContainerSpec().setLocalResources(localResources); {code} Then the mutations don't stick. Wonder if this is somehow a Java oddity? ContainerLaunchContext is buggy when it comes to setter methods on a new instance - Key: YARN-555 URL: https://issues.apache.org/jira/browse/YARN-555 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor If you look at the API of ContainerLaunchContext, its got setter methods, such as for setResource, setCommands, etc…: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setCommands(java.util.List) However, there's certain things broken in its use here that am trying to understand. Let me explain with some code context: 1. I initialize a proper CLC for an ApplicationSubmissionContext (appContext). {code} ContainerLaunchContext appMasterLaunchContext = Records.newRecord(ContainerLaunchContext.class); appContext.setAMContainerSpec(appMasterLaunchContext); {code} 2. I create a resource request of 130 MB, as applicationMasterResource, and try to set it into the CLC via: {code} appContext.getAMContainerSpec().setResource(applicationMasterResource); {code} 3. This works OK. If I query it back now, it returns 130 for a {{getMemory()}} call. 4. So I attempt to do the same with setCommands/setEnvironment/etc., all of which fail to mutate cause the check in CLC's implementation class disregards whatever I try to set for some reason. Edit: It seems like the issue is that when I do a appContext.getAMContainerSpec().getLocalResources() or similar call to get existing initialized data structures to populate further on, what I really get underneath is a silently non-mutative data structure that I can call .put or .add on, but it won't really reflect it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
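A sketch of the workaround this behaviour suggests, assuming the PB-backed record returns a view that does not write through: copy into a plainly mutable map and push the whole map back through the setter. Names follow the snippets above.
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.LocalResource;

public class ClcSetterWorkaround {
  // appMasterJarResource is assumed to be a LocalResource built elsewhere.
  static void addJar(ApplicationSubmissionContext appContext, LocalResource appMasterJarResource) {
    // Copy whatever the record currently holds into a fresh, mutable map...
    Map<String, LocalResource> localResources =
        new HashMap<String, LocalResource>(appContext.getAMContainerSpec().getLocalResources());
    localResources.put("node-ring-app-master.jar", appMasterJarResource);
    // ...and set the whole map back, rather than relying on the getter's
    // view to reflect the put().
    appContext.getAMContainerSpec().setLocalResources(localResources);
  }
}
{code}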
[jira] [Created] (YARN-552) Expose resource metrics as part of YarnClusterMetrics
Harsh J created YARN-552: Summary: Expose resource metrics as part of YarnClusterMetrics Key: YARN-552 URL: https://issues.apache.org/jira/browse/YARN-552 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor Right now, the YarnClusterMetrics just has the total number of node managers returned in it (when queried from a Client - RM). It would be useful to also expose NodeManager resource capacities and scheduler max/min resource limits over it to allow clients to pre-determine or pre-compute runtime feasibility without having to request an Application first to get some of this information. This does not need to be an incompatible change, and we can continue exposing the same values as part of the GetNewApplicationResponse too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
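For reference, roughly what a client can see today; this sketch uses the later YarnClient wrapper for brevity (the underlying metrics request is the same), and shows that only the NodeManager count comes back, which is why resource capacities would be a useful addition.
{code}
import org.apache.hadoop.yarn.api.records.YarnClusterMetrics;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterMetricsProbe {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      YarnClusterMetrics metrics = yarnClient.getYarnClusterMetrics();
      // Essentially all the cluster-wide information available from this call.
      System.out.println("NodeManagers: " + metrics.getNumNodeManagers());
    } finally {
      yarnClient.stop();
    }
  }
}
{code}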
[jira] [Created] (YARN-553) Have GetNewApplicationResponse generate a directly usable ApplicationSubmissionContext
Harsh J created YARN-553: Summary: Have GetNewApplicationResponse generate a directly usable ApplicationSubmissionContext Key: YARN-553 URL: https://issues.apache.org/jira/browse/YARN-553 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse. {code} GetNewApplicationResponse newApp = yarnClient.getNewApplication(); ApplicationId appId = newApp.getApplicationId(); ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class); appContext.setApplicationId(appId); {code} A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like: {code} GetNewApplicationResponse newApp = yarnClient.getNewApplication(); ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext(); {code} [The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside of just associating the application ID automatically.] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-356) Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env
[ https://issues.apache.org/jira/browse/YARN-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562876#comment-13562876 ] Harsh J commented on YARN-356: -- These are already present and used (in the yarn script) but aren't doc'd in the yarn-env.sh template. Add YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS to yarn.env --- Key: YARN-356 URL: https://issues.apache.org/jira/browse/YARN-356 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Lohit Vijayarenu At present it is difficult to set different Xmx values for RM and NM without having different yarn-env.sh. Like HDFS, it would be good to have YARN_NODEMANAGER_OPTS and YARN_RESOURCEMANAGER_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-349) Send out last-minute load averages in TaskTrackerStatus
[ https://issues.apache.org/jira/browse/YARN-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-2170 to YARN-349: - Tags: (was: load average, tasktracker) Component/s: (was: jobtracker) nodemanager Fix Version/s: (was: 0.24.0) Affects Version/s: (was: 0.22.0) 2.0.0-alpha Release Note: (was: Add support for transmitting previous-minute load averages in TaskTrackerStatus) Key: YARN-349 (was: MAPREDUCE-2170) Project: Hadoop YARN (was: Hadoop Map/Reduce) Send out last-minute load averages in TaskTrackerStatus --- Key: YARN-349 URL: https://issues.apache.org/jira/browse/YARN-349 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Harsh J Attachments: mapreduce.loadaverage.r3.diff, mapreduce.loadaverage.r4.diff, mapreduce.loadaverage.r5.diff, mapreduce.loadaverage.r6.diff Original Estimate: 20m Remaining Estimate: 20m Load averages could be useful in scheduling. This patch looks to extend the existing Linux resource plugin (via /proc/loadavg file) to allow transmitting load averages of the last one minute via the TaskTrackerStatus. Patch is up for review, with test cases added, at: https://reviews.apache.org/r/20/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-40) Provide support for missing yarn commands
[ https://issues.apache.org/jira/browse/YARN-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555176#comment-13555176 ] Harsh J commented on YARN-40: - Junping, Do you mean an equivalent for the yarn node command for MRv1 tasktrackers? I guess it could be done if there is value in it (personally I've not seen people interested in monitoring a single TT's state of maps/reduces via the CLI). Other than that, the commands seem to be YARN specific? Provide support for missing yarn commands - Key: YARN-40 URL: https://issues.apache.org/jira/browse/YARN-40 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.0-alpha Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4155-1.patch, MAPREDUCE-4155.patch, YARN-40-1.patch, YARN-40-20120917.1.txt, YARN-40-20120917.txt, YARN-40-20120924.txt, YARN-40-20121008.txt, YARN-40.patch 1. status app-id 2. kill app-id (Already issue present with Id : MAPREDUCE-3793) 3. list-apps [all] 4. nodes-report -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-4171 to YARN-281: - Component/s: (was: mrv2) (was: test) scheduler Affects Version/s: (was: 2.0.0-alpha) 2.0.0-alpha Key: YARN-281 (was: MAPREDUCE-4171) Project: Hadoop YARN (was: Hadoop Map/Reduce) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Harsh J Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-284) YARN capacity scheduler doesn't spread MR tasks evenly on an underutilized cluster
[ https://issues.apache.org/jira/browse/YARN-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-3268 to YARN-284: - Component/s: (was: scheduler) scheduler Affects Version/s: (was: 0.23.0) 2.0.0-alpha Key: YARN-284 (was: MAPREDUCE-3268) Project: Hadoop YARN (was: Hadoop Map/Reduce) YARN capacity scheduler doesn't spread MR tasks evenly on an underutilized cluster -- Key: YARN-284 URL: https://issues.apache.org/jira/browse/YARN-284 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon The fair scheduler in MR1 has the behavior that, if a job is submitted to an under-utilized cluster and the cluster has more open slots than tasks in the job, the tasks are spread evenly throughout the cluster. This improves job latency since more spindles and NICs are utilized to complete the job. In MR2 I see this issue causing significantly longer job runtimes when there is excess capacity in the cluster -- especially on reducers which sometimes end up clumping together on a smaller set of nodes which then become disk/network constrained. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-239) Make link in Aggregation is not enabled. Try the nodemanager at
[ https://issues.apache.org/jira/browse/YARN-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-4509 to YARN-239: - Component/s: (was: webapps) nodemanager Fix Version/s: (was: 0.23.5) (was: 3.0.0) Affects Version/s: (was: 0.23.0) 2.0.0-alpha Key: YARN-239 (was: MAPREDUCE-4509) Project: Hadoop YARN (was: Hadoop Map/Reduce) Make link in Aggregation is not enabled. Try the nodemanager at - Key: YARN-239 URL: https://issues.apache.org/jira/browse/YARN-239 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Radim Kolar Priority: Trivial If log aggregation is disabled, the message *Aggregation is not enabled. Try the nodemanager at reavers.com:9006* is displayed. It would be helpful to make the link to the nodemanager clickable. This message is located in /hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java but I could not figure out how to make a link in the Hamlet framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-239) Make link in Aggregation is not enabled. Try the nodemanager at
[ https://issues.apache.org/jira/browse/YARN-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened YARN-239: -- Worth linking the NM if it can be done. Apologies if no one had got back to you on the Hamlet question yet, it is a rather new part in the framework - but I think this is worth having. Make link in Aggregation is not enabled. Try the nodemanager at - Key: YARN-239 URL: https://issues.apache.org/jira/browse/YARN-239 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Radim Kolar Priority: Trivial If log aggregation is disabled, the message *Aggregation is not enabled. Try the nodemanager at reavers.com:9006* is displayed. It would be helpful to make the link to the nodemanager clickable. This message is located in /hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java but I could not figure out how to make a link in the Hamlet framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-238) ClientRMProtocol needs to allow the specification of a ResourceRequest so that the Application Master's Container can be placed on the specified host
[ https://issues.apache.org/jira/browse/YARN-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13502897#comment-13502897 ] Harsh J commented on YARN-238: -- Is there also need for this to be a strict need or is it good to be flexible (i.e. non guaranteeing) like other resource requests (we do a good locality job to not have this concern very frequently, but still)? ClientRMProtocol needs to allow the specification of a ResourceRequest so that the Application Master's Container can be placed on the specified host - Key: YARN-238 URL: https://issues.apache.org/jira/browse/YARN-238 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Vinayak Borkar Currently a client is able to specify only resource requirements in terms of amount of memory required while launching an ApplicationMaster. There needs to be a way to ask for resources using a ResourceRequest so that a host name could be specified in addition to the amount of memory required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-168) No way to turn off virtual memory limits without turning off physical memory limits
Harsh J created YARN-168: Summary: No way to turn off virtual memory limits without turning off physical memory limits Key: YARN-168 URL: https://issues.apache.org/jira/browse/YARN-168 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Harsh J Asked and reported by a user (Krishna) on ML: {quote} This is possible to do, but you've hit a bug with the current YARN implementation. Ideally you should be able to configure the vmem-pmem ratio (or an equivalent config) to be -1, to indicate disabling of virtual memory checks completely (and there's indeed checks for this), but it seems like we are enforcing the ratio to be at least 1.0 (and hence negatives are disallowed). You can't workaround by setting the NM's offered resource.mb to -1 either, as you'll lose out on controlling maximum allocations. Please file a YARN bug on JIRA. The code at fault lies under ContainersMonitorImpl#init(…). On Thu, Oct 18, 2012 at 4:00 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Is there a way we can ask the YARN RM for not killing a container when it uses excess virtual memory than the maximum it can use as per the specification in the configuration file yarn-site.xml? We can't always estimate the amount of virtual memory needed for our application running on a container, but we don't want to get it killed in a case it exceeds the maximum limit. Please suggest as to how can we come across this issue. Thanks, Kishore {quote} Basically, we're doing: {code} // Virtual memory configuration float vmemRatio = conf.getFloat( YarnConfiguration.NM_VMEM_PMEM_RATIO, YarnConfiguration.DEFAULT_NM_VMEM_PMEM_RATIO); Preconditions.checkArgument(vmemRatio > 0.99f, YarnConfiguration.NM_VMEM_PMEM_RATIO + " should be at least 1.0"); this.maxVmemAllottedForContainers = (long)(vmemRatio * maxPmemAllottedForContainers); {code} For virtual memory monitoring to be disabled, maxVmemAllottedForContainers has to be -1. For that to be -1, given the above buggy computation, vmemRatio must be -1 or maxPmemAllottedForContainers must be -1. If vmemRatio were -1, we fail the precondition check and exit. If maxPmemAllottedForContainers were -1, we also end up disabling physical memory monitoring. Or perhaps that makes sense - to disable both physical and virtual memory monitoring, but that way your NM becomes infinite in resource grants, I think. We need a way to selectively disable kills done via virtual memory monitoring, which is the base request here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
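One possible shape for the fix is an explicit switch for the virtual-memory check, separate from the ratio; later releases did add a yarn.nodemanager.vmem-check-enabled flag along these lines. The sketch below is illustrative, and the constant names are assumptions rather than the project's actual code.
{code}
import com.google.common.base.Preconditions;
import org.apache.hadoop.conf.Configuration;

public class VmemCheckSketch {
  // Illustrative keys/defaults; not necessarily the names the project chose.
  static final String NM_VMEM_CHECK_ENABLED = "yarn.nodemanager.vmem-check-enabled";
  static final String NM_VMEM_PMEM_RATIO = "yarn.nodemanager.vmem-pmem-ratio";

  static long maxVmemAllowed(Configuration conf, long maxPmemAllottedForContainers) {
    // An explicit boolean switch, instead of overloading the ratio with -1.
    if (!conf.getBoolean(NM_VMEM_CHECK_ENABLED, true)) {
      return -1; // disables only the virtual-memory kills
    }
    float vmemRatio = conf.getFloat(NM_VMEM_PMEM_RATIO, 2.1f);
    Preconditions.checkArgument(vmemRatio > 0.99f,
        NM_VMEM_PMEM_RATIO + " should be at least 1.0");
    return (long) (vmemRatio * maxPmemAllottedForContainers);
  }
}
{code}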
[jira] [Moved] (YARN-149) ZK-based High Availability (HA) for ResourceManager (RM)
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved MAPREDUCE-4345 to YARN-149: - Issue Type: New Feature (was: Improvement) Key: YARN-149 (was: MAPREDUCE-4345) Project: Hadoop YARN (was: Hadoop Map/Reduce) ZK-based High Availability (HA) for ResourceManager (RM) Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Reporter: Harsh J Assignee: Bikas Saha One of the goals presented on MAPREDUCE-279 was to have high availability. One way that was discussed, per Mahadev/others on https://issues.apache.org/jira/browse/MAPREDUCE-2648 and other places, was ZK: {quote} Am not sure, if you already know about the MR-279 branch (the next version of MR framework). We've been trying to integrate ZK into the framework from the beginning. As for now, we are just doing restart with ZK but soon we should have a HA soln with ZK. {quote} There is now MAPREDUCE-4343 that tracks recoverability via ZK. This JIRA is meant to track HA via ZK. Currently there isn't a HA solution for RM, via ZK or otherwise. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-138) Improve default config values for YARN
[ https://issues.apache.org/jira/browse/YARN-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466357#comment-13466357 ] Harsh J commented on YARN-138: -- Thanks, Sid! Improve default config values for YARN -- Key: YARN-138 URL: https://issues.apache.org/jira/browse/YARN-138 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.0-alpha Reporter: Arun C Murthy Assignee: Harsh J Labels: performance Attachments: MAPREDUCE-4316.patch, YARN138.txt Currently some of our configs are way off e.g. min-alloc is 128M while max-alloc is 10240. This leads to poor out-of-box performance as noticed by some users: http://s.apache.org/avd -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-130) Yarn examples use wrong configuration
[ https://issues.apache.org/jira/browse/YARN-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462920#comment-13462920 ] Harsh J commented on YARN-130: -- This is interesting. In trunk today, the HDFS clients do not require instantiating a HdfsConfiguration instance manually. They are auto-loaded by the classes that get loaded for HDFS FS. Similar should be done by YARN, given we use YARN client classes to interact with YARN anyway? Regarding the error message improvements, can you file a new JIRA with what you get and what to expect rather? Yarn examples use wrong configuration - Key: YARN-130 URL: https://issues.apache.org/jira/browse/YARN-130 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.0.3-alpha Reporter: Erich Schubert Priority: Minor AFAICT the example applications are broken when you don't use default ports. So it probably won't show in a single node setup. The bug fix seems to be: -conf = new Configuration(); +conf = new YarnConfiguration(); Then the yarn settings file (containing relevant host and port information) will also be read. The error messages *need* to be improved. For me, they said something like protocol not supported. The reason was that a different hadoop RPC was running on the port it was connecting to. It took me a lot of debugging to find out that it was just talking to the wrong service because it had not read it's configuration file... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
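The one-line fix being described, shown in context: YarnConfiguration extends Configuration and additionally loads yarn-default.xml and yarn-site.xml, so non-default RM addresses are actually picked up by the client.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClientConfSetup {
  public static void main(String[] args) {
    // new Configuration() loads only core-default.xml/core-site.xml, so a
    // client built on it silently falls back to default YARN host/ports.
    // new YarnConfiguration() also loads yarn-default.xml and yarn-site.xml.
    Configuration conf = new YarnConfiguration();
    System.out.println(conf.get(YarnConfiguration.RM_ADDRESS));
  }
}
{code}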
[jira] [Updated] (YARN-116) Add exclude/include file , need restart NN or RM.
[ https://issues.apache.org/jira/browse/YARN-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-116: - Environment: (was: suse) Add exclude/include file , need restart NN or RM. - Key: YARN-116 URL: https://issues.apache.org/jira/browse/YARN-116 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: xieguiming Attachments: HADOOP-835-0.patch, HADOOP-835-1.patch, HADOOP-835.patch yarn.resourcemanager.nodes.include-path default value is , if we need add one include file. and we must restart the RM. I suggest that adding one include or exclude file, no need restart the RM. only execute the refresh command. NN is the same. Modify the HostsFileReader class: public HostsFileReader(String inFile, String exFile) to public HostsFileReader(Configuration conf, String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH, String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH) and thus, we can read the config file dynamic. and no need to restart the NM/NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-116) RM is missing ability to add include/exclude files without a restart
[ https://issues.apache.org/jira/browse/YARN-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-116: - Summary: RM is missing ability to add include/exclude files without a restart (was: Add exclude/include file , need restart NN or RM.) RM is missing ability to add include/exclude files without a restart Key: YARN-116 URL: https://issues.apache.org/jira/browse/YARN-116 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.0-alpha Reporter: xieguiming Attachments: HADOOP-835-0.patch, HADOOP-835-1.patch, HADOOP-835.patch yarn.resourcemanager.nodes.include-path default value is , if we need add one include file. and we must restart the RM. I suggest that adding one include or exclude file, no need restart the RM. only execute the refresh command. NN is the same. Modify the HostsFileReader class: public HostsFileReader(String inFile, String exFile) to public HostsFileReader(Configuration conf, String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH, String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH) and thus, we can read the config file dynamic. and no need to restart the NM/NN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-97) nodemanager depends on /bin/bash
[ https://issues.apache.org/jira/browse/YARN-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461472#comment-13461472 ] Harsh J commented on YARN-97: - bq. It should be well documented for system not having bash installed by default such as FreeBSD. Why don't we simply document requirements then? I've recently seen /bin/sh shbanged scripts cause trouble on Ubuntu cause /bin/sh points to Ubuntu's dash (https://wiki.ubuntu.com/DashAsBinSh). You don't wanna run into such a trouble and end up changing things (hadoop or OS side) post-deploy. I'll still vote we stick to one shell (bash) and be clear we need it. nodemanager depends on /bin/bash Key: YARN-97 URL: https://issues.apache.org/jira/browse/YARN-97 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: FreeBSD 8.2 / 64 bit Reporter: Radim Kolar Labels: patch Attachments: bash-replace-by-sh.txt Currently nodemanager depends on bash shell. It should be well documented for system not having bash installed by default such as FreeBSD. Because only basic functionality of bash is used, probably changing bash to /bin/sh would work enough. i found 2 cases: 1. DefaultContainerExecutor.java creates file with /bin/bash hardcoded in writeLocalWrapperScript. (this needs bash in /bin) 2. yarn-hduser-nodemanager-ponto.amerinoc.com.log:2012-04-03 19:50:10,798 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /tmp/nm-local-dir/usercache/hduser/appcache/application_1333474251533_0002/container_1333474251533_0002_01_12/default_container_executor.sh] this created script is also launched by bash - bash anywhere in path works - in freebsd it is /usr/local/bin/bash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-97) nodemanager depends on /bin/bash
[ https://issues.apache.org/jira/browse/YARN-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461473#comment-13461473 ] Harsh J commented on YARN-97: - bq. Why don't we simply document requirements then? We can additionally be clear that we demand bash exists in /bin if thats the whole trouble here? Or rely on {{env bash}}, but no idea if thats cross platform properly as well. nodemanager depends on /bin/bash Key: YARN-97 URL: https://issues.apache.org/jira/browse/YARN-97 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: FreeBSD 8.2 / 64 bit Reporter: Radim Kolar Labels: patch Attachments: bash-replace-by-sh.txt Currently nodemanager depends on bash shell. It should be well documented for system not having bash installed by default such as FreeBSD. Because only basic functionality of bash is used, probably changing bash to /bin/sh would work enough. i found 2 cases: 1. DefaultContainerExecutor.java creates file with /bin/bash hardcoded in writeLocalWrapperScript. (this needs bash in /bin) 2. yarn-hduser-nodemanager-ponto.amerinoc.com.log:2012-04-03 19:50:10,798 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /tmp/nm-local-dir/usercache/hduser/appcache/application_1333474251533_0002/container_1333474251533_0002_01_12/default_container_executor.sh] this created script is also launched by bash - bash anywhere in path works - in freebsd it is /usr/local/bin/bash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-101) If the heartbeat message is lost, the node status info of completed containers will be lost too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-101: - Description: see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread("Node Status Updater") { @Override @SuppressWarnings("unchecked") public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the completed containers, so the RM never gets to know about them. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," + " hence shutting down."); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info("Node is out of sync with ResourceManager," + " hence rebooting."); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); List<ContainerId> containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } List<ApplicationId> appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. LOG.error("Caught exception in status-updater", e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>(); for (Iterator<Entry<ContainerId, Container>> i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { Entry<ContainerId, Container> e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info("Sending out status for container: " + containerStatus); {color:red} // Here is the part that removes the completed containers. 
if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove i.remove(); {color} LOG.info("Removed completed container " + containerId); } } nodeStatus.setContainersStatuses(containersStatuses); LOG.debug(this.nodeId + " sending out status for " + numActiveContainers + " containers"); NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus(); nodeHealthStatus.setHealthReport(healthChecker.getHealthReport()); nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy()); nodeHealthStatus.setLastHealthReportTime( healthChecker.getLastHealthReportTime()); if (LOG.isDebugEnabled()) { LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy() + ", " + nodeHealthStatus.getHealthReport()); } nodeStatus.setNodeHealthStatus(nodeHealthStatus); List<ApplicationId> keepAliveAppIds = createKeepAliveApplicationList(); nodeStatus.setKeepAliveApplications(keepAliveAppIds);
[jira] [Commented] (YARN-101) If the heartbeat message is lost, the node status info of completed containers will be lost too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461315#comment-13461315 ] Harsh J commented on YARN-101: -- [~xieguiming] - I tweaked the sentences a bit so you're sounding more clear. You're essentially saying that we may be removing completed containers completely, and that in case of a node-heartbeat failure we should make sure they still get propagated eventually (on the next successful heartbeat), correct? If the heartbeat message is lost, the node status info of completed containers will be lost too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Priority: Minor see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread("Node Status Updater") { @Override @SuppressWarnings("unchecked") public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the completed containers, so the RM never gets to know about them. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," + " hence shutting down."); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info("Node is out of sync with ResourceManager," + " hence rebooting."); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); List<ContainerId> containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } List<ApplicationId> appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. 
LOG.error("Caught exception in status-updater", e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>(); for (Iterator<Entry<ContainerId, Container>> i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { Entry<ContainerId, Container> e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info("Sending out status for container: " + containerStatus); {color:red} // Here is the part that removes the completed containers. if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove i.remove(); {color}
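A hedged sketch of the pattern implied by the report: keep completed container statuses in a pending buffer and only drop them once a heartbeat carrying them has succeeded, so a failed RPC cannot lose them. Class, method, and field names here are illustrative, not the eventual patch.
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerStatus;

public class CompletedContainerBuffer {
  // Completed containers reported to the RM but not yet acknowledged.
  private final List<ContainerStatus> pendingCompleted = new ArrayList<ContainerStatus>();

  // Called while building the NodeStatus: include everything still pending.
  synchronized List<ContainerStatus> snapshotForHeartbeat(List<ContainerStatus> newlyCompleted) {
    pendingCompleted.addAll(newlyCompleted);
    return new ArrayList<ContainerStatus>(pendingCompleted);
  }

  // Called only after resourceTracker.nodeHeartbeat(...) returns normally;
  // if the RPC fails this is skipped, so the statuses are resent next time.
  synchronized void heartbeatSucceeded(List<ContainerStatus> sent) {
    pendingCompleted.removeAll(sent);
  }
}
{code}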
[jira] [Commented] (YARN-56) Handle container requests that request more resources than available in the cluster
[ https://issues.apache.org/jira/browse/YARN-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461316#comment-13461316 ] Harsh J commented on YARN-56: - bq. Handle container requests that request more resources than available in the cluster Won't this be better as a summary if it read "Handle container requests that request more resources than _presently_ available in the cluster"? Since there's another case where the maximum allowed request itself needs to be capped first, so that scheduling may occur. Handle container requests that request more resources than available in the cluster --- Key: YARN-56 URL: https://issues.apache.org/jira/browse/YARN-56 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.3 Reporter: Hitesh Shah In heterogenous clusters, a simple check at the scheduler to check if the allocation request is within the max allocatable range is not enough. If there are large nodes in the cluster which are not available, there may be situations where some allocation requests will never be fulfilled. Need an approach to decide when to invalidate such requests. For application submissions, there will need to be a feedback loop for applications that could not be launched. For running AMs, AllocationResponse may need to augmented with information for invalidated/cancelled container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-56) Handle container requests that request more resources than available in the cluster
[ https://issues.apache.org/jira/browse/YARN-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461317#comment-13461317 ] Harsh J commented on YARN-56: - +1 on Robert's timeout suggestion though (per app, with a reasonable default). Handle container requests that request more resources than available in the cluster --- Key: YARN-56 URL: https://issues.apache.org/jira/browse/YARN-56 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 0.23.3 Reporter: Hitesh Shah In heterogenous clusters, a simple check at the scheduler to check if the allocation request is within the max allocatable range is not enough. If there are large nodes in the cluster which are not available, there may be situations where some allocation requests will never be fulfilled. Need an approach to decide when to invalidate such requests. For application submissions, there will need to be a feedback loop for applications that could not be launched. For running AMs, AllocationResponse may need to augmented with information for invalidated/cancelled container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
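On the capping point raised in the first comment, a minimal sketch of clamping an ask to the scheduler's advertised maximum before admitting it; the helper is hypothetical, and a complete fix would still need the timeout/feedback path discussed here for requests that can never be satisfied.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

public class RequestClamp {
  // Hypothetical helper: cap a requested capability at the cluster maximum.
  static Resource clamp(Resource asked, Resource max) {
    Resource result = Records.newRecord(Resource.class);
    result.setMemory(Math.min(asked.getMemory(), max.getMemory()));
    return result;
  }
}
{code}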
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-71: Issue Type: Test (was: Bug) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Test Components: nodemanager Reporter: Vinod Kumar Vavilapalli We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-71: Labels: (was: test) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Test Components: nodemanager Reporter: Vinod Kumar Vavilapalli We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-71) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart
[ https://issues.apache.org/jira/browse/YARN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-71: Labels: test (was: ) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart Key: YARN-71 URL: https://issues.apache.org/jira/browse/YARN-71 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Vinod Kumar Vavilapalli We have to make sure that NodeManagers cleanup their local files on restart. It may already be working like that in which case we should have tests validating this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-111) Application level priority in Resource Manager Schedulers
[ https://issues.apache.org/jira/browse/YARN-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13459016#comment-13459016 ] Harsh J commented on YARN-111: -- Robert, I still see Job priority exist in MR1 (1.x). Which JIRA removed this, per your comment above? Or is this something CapacityScheduler specific we're discussing? In YARN I see Priority coming in for generally all resource requests (which I assume does apply to the AM too) and hence http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.html#setPriority(org.apache.hadoop.yarn.api.records.Priority) ought to work, as the CS's LeafQueue does look at it? Application level priority in Resource Manager Schedulers - Key: YARN-111 URL: https://issues.apache.org/jira/browse/YARN-111 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.1-alpha Reporter: nemon lou We need application level priority for Hadoop 2.0,both in FIFO scheduler and Capacity Scheduler. In Hadoop 1.0.x,job priority is supported. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
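For reference, the submission-side call the comment points at; whether and how a given scheduler honours it is exactly what this issue is about. A minimal sketch:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.util.Records;

public class SubmitWithPriority {
  static void setAppPriority(ApplicationSubmissionContext appContext) {
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(10); // how the value is interpreted depends on the scheduler
    appContext.setPriority(priority);
  }
}
{code}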
[jira] [Commented] (YARN-80) Support delay scheduling for node locality in MR2's capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451772#comment-13451772 ] Harsh J commented on YARN-80: - Hi Arun, Thanks very much for doing this! We could probably address this in a new JIRA but I had two questions: - Why was the feature decided to be disabled by default? - Is there no way to not have people change configuration based on their # of racks (i.e. make it automated)? Support delay scheduling for node locality in MR2's capacity scheduler -- Key: YARN-80 URL: https://issues.apache.org/jira/browse/YARN-80 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Todd Lipcon Assignee: Arun C Murthy Fix For: 2.0.2-alpha Attachments: YARN-80.patch, YARN-80.patch The capacity scheduler in MR2 doesn't support delay scheduling for achieving node-level locality. So, jobs exhibit poor data locality even if they have good rack locality. Especially on clusters where disk throughput is much better than network capacity, this hurts overall job performance. We should optionally support node-level delay scheduling heuristics similar to what the fair scheduler implements in MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
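For readers wondering what the rack-count-dependent configuration looks like in practice, a hedged example of the node-locality-delay knob (expressed in missed scheduling opportunities); the value 40 is purely illustrative and, per the questions above, the feature stays disabled unless this is set.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DelaySchedulingConfig {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Scheduling opportunities to wait for a node-local container before
    // falling back to rack-local; key as used by the CapacityScheduler.
    conf.setInt("yarn.scheduler.capacity.node-locality-delay", 40);
    System.out.println(conf.getInt("yarn.scheduler.capacity.node-locality-delay", -1));
  }
}
{code}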