[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901762#comment-16901762 ] Szilard Nemeth commented on YARN-9667: -- Thanks [~eyang]! > Container-executor.c duplicates messages to stdout > -- > > Key: YARN-9667 > URL: https://issues.apache.org/jira/browse/YARN-9667 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9667-001.patch > > > When a container is killed by its AM we get an error message similar to this: > {noformat} > 2019-06-30 12:09:04,412 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > Shell execution returned exit code: 143. Privileged Execution Operation > Stderr: > Stdout: main : command provided 1 > main : run as user is systest > main : requested yarn user is systest > Getting exit code file... > Creating script paths... > Writing pid file... > Writing to tmp file > /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp > Writing to cgroup task files... > Creating local dirs... > Launching container... > Getting exit code file... > Creating script paths... > {noformat} > In container-executor.c the fork point is right after the "Creating script > paths..." part, yet in the stdout log we can clearly see it has been > written twice. After consulting with [~pbacsko] it seems like there's a > missing flush in container-executor.c before the fork, and that causes the > duplication. > I suggest adding a flush there so that the output won't be duplicated: it's a bit > misleading that the child process writes out "Getting exit code file" and > "Creating script paths" even though it is clearly not doing that. > A more appealing solution could be to revisit the fprintf-fflush pairs in the > code and change them to a single call, so that the fflush calls would not be > forgotten accidentally. (A forgotten fflush can cause problems in every place where > fork is used.) > Note: this issue probably affects every occurrence of fork(), not just the one > from {{launch_container_as_user}} in {{main.c}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
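A minimal sketch of the fix being discussed, for illustration only (the helper print_and_flush and the function launch_example are made-up names, and this is not the attached YARN-9667-001.patch): flush stdout before the fork, or fold fprintf and fflush into one helper so the flush cannot be forgotten.
{code:c}
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print a progress message and flush immediately, so no buffered output
 * survives to be written a second time by a forked child. */
static void print_and_flush(const char *msg) {
  fprintf(stdout, "%s\n", msg);
  fflush(stdout);
}

/* Illustrative call site, loosely modeled on the container launch path. */
static int launch_example(void) {
  print_and_flush("Creating script paths...");

  /* Flush anything still buffered right before forking; otherwise the
   * child inherits the buffer and the same lines appear twice in stdout. */
  fflush(stdout);
  fflush(stderr);

  pid_t child = fork();
  if (child == 0) {
    /* child: exec the container launch script here */
    _exit(0);
  }
  return (child > 0) ? 0 : -1;
}

int main(void) {
  return launch_example();
}
{code}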
[jira] [Updated] (YARN-9715) [YARN UI2] yarn-container-log support for https Knox Gateway url in nodes page
[ https://issues.apache.org/jira/browse/YARN-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-9715: --- Summary: [YARN UI2] yarn-container-log support for https Knox Gateway url in nodes page (was: YARN UI2 - yarn-container-log support for https Knox Gateway url) > [YARN UI2] yarn-container-log support for https Knox Gateway url in nodes page > -- > > Key: YARN-9715 > URL: https://issues.apache.org/jira/browse/YARN-9715 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Prabhu Joseph >Assignee: Akhil PB >Priority: Major > > Currently yarn-container-log (UI2 - Nodes - List of Containers - log file) > creates the URL with the node scheme (http) and nodeHttpAddress. This does not work > with a Knox Gateway https URL. The URL construction logic can be improved to > accept both the normal and the Knox case. A similar approach is used in the Applications -> > Logs section. > Also, UI2 - Nodes - List of Containers - log file does not have pagination > support for the log file. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9688) Variable description error of method in stateMachine class
[ https://issues.apache.org/jira/browse/YARN-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhou wu updated YARN-9688: - Labels: Newbie (was: ) > Variable description error of method in stateMachine class > --- > > Key: YARN-9688 > URL: https://issues.apache.org/jira/browse/YARN-9688 > Project: Hadoop YARN > Issue Type: Bug >Reporter: runzhou wu >Assignee: runzhou wu >Priority: Trivial > Labels: Newbie > Attachments: YARN-9688.001.patch > > > In the StateMachineFactory class, the javadoc parameter names *state* and *cause* do not match the actual parameters oldState and event of doTransition: > /** > * Effect a transition due to the effecting stimulus. > * @param *state* current state > * @param eventType trigger to initiate the transition > * @param *cause* causal eventType context > * @return transitioned state > */ > private STATE doTransition > (OPERAND operand, STATE oldState, EVENTTYPE eventType, EVENT event) > throws InvalidStateTransitionException { -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
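For reference, a javadoc consistent with the quoted signature could read as follows. This is only a sketch, not necessarily what YARN-9688.001.patch contains; the wrapper class, the dropped throws clause, and the wording for the operand parameter are illustrative.
{code:java}
// Self-contained illustration (not the Hadoop class itself): the javadoc
// parameter names now match the declared parameters.
public class DoTransitionJavadocExample<OPERAND, STATE, EVENTTYPE, EVENT> {
  /**
   * Effect a transition due to the effecting stimulus.
   * @param operand the operand carried through the transition (illustrative wording)
   * @param oldState current state
   * @param eventType trigger to initiate the transition
   * @param event causal eventType context
   * @return transitioned state
   */
  private STATE doTransition(OPERAND operand, STATE oldState,
      EVENTTYPE eventType, EVENT event) {
    return oldState; // placeholder body for the illustration
  }
}
{code}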
[jira] [Updated] (YARN-9692) ContainerAllocationExpirer is misspelled
[ https://issues.apache.org/jira/browse/YARN-9692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhou wu updated YARN-9692: - Labels: newbie (was: ) > ContainerAllocationExpirer is misspelled > - > > Key: YARN-9692 > URL: https://issues.apache.org/jira/browse/YARN-9692 > Project: Hadoop YARN > Issue Type: Bug >Reporter: runzhou wu >Assignee: runzhou wu >Priority: Trivial > Labels: newbie > Attachments: YARN-9692.001.patch > > > The class ContainerAllocationExpirer is misspelled. > I think it should be changed to ContainerAllocationExpired -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
[ https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901665#comment-16901665 ] Eric Badger commented on YARN-9527: --- [~billie.rina...@gmail.com], [~djp], [~eyang], [~bibinchundatt], you've all committed changes to the ResourceLocalizationService recently. Could one of you give an additional review on this change? > Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file > - > > Key: YARN-9527 > URL: https://issues.apache.org/jira/browse/YARN-9527 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.5, 3.1.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-9527.001.patch, YARN-9527.002.patch, > YARN-9527.003.patch, YARN-9527.004.patch > > > A rogue ContainerLocalizer can get stuck in a loop continuously downloading > the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" > exception on each iteration. Sometimes this continues long enough that it > fills up a disk or depletes available inodes for the filesystem. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901656#comment-16901656 ] Zhankun Tang edited comment on YARN-9721 at 8/7/19 3:20 AM: [~yuan_zac], Thanks for raising this issue! This is very helpful in a hybrid elastic environment. I'm checking this story to get a clearer understanding. BTW, which solution do you prefer? was (Author: tangzhankun): [~yuan_zac], Thanks for raising this issue! This is very helpful in a hybrid environment. I'm checking this story to get a clearer understanding. BTW, which solution do you prefer? > An easy method to exclude a nodemanager from the yarn cluster cleanly > - > > Key: YARN-9721 > URL: https://issues.apache.org/jira/browse/YARN-9721 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Priority: Major > Attachments: decommission nodes.png > > > If we want to take a nodemanager server offline, nodes.exclude-path > and the "rmadmin -refreshNodes" command are used to decommission the server. > But this method cannot clean up the node cleanly. Nodemanager servers are > still in Decommissioned Nodes, as the attachment shows. > !decommission nodes.png! > YARN-4311 enabled a removalTimer to clean up untracked nodes. > But the logic of the isUntrackedNode method is too restrictive. If include-path is > not used, no servers can meet the criteria. Using an include file would create > a potential maintenance risk. > If the yarn cluster is installed on a cloud, nodemanager servers are created and > deleted frequently. We need a way to exclude a nodemanager from the yarn > cluster cleanly. Otherwise, the map of rmContext.getInactiveRMNodes() would > keep growing, which would cause a memory issue in the RM. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901656#comment-16901656 ] Zhankun Tang commented on YARN-9721: [~yuan_zac], Thanks for raising this issue! This is very helpful in a hybrid environment. I'm checking this story to get a clearer understanding. BTW, which solution do you prefer? > An easy method to exclude a nodemanager from the yarn cluster cleanly > - > > Key: YARN-9721 > URL: https://issues.apache.org/jira/browse/YARN-9721 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Priority: Major > Attachments: decommission nodes.png > > > If we want to take a nodemanager server offline, nodes.exclude-path > and the "rmadmin -refreshNodes" command are used to decommission the server. > But this method cannot clean up the node cleanly. Nodemanager servers are > still in Decommissioned Nodes, as the attachment shows. > !decommission nodes.png! > YARN-4311 enabled a removalTimer to clean up untracked nodes. > But the logic of the isUntrackedNode method is too restrictive. If include-path is > not used, no servers can meet the criteria. Using an include file would create > a potential maintenance risk. > If the yarn cluster is installed on a cloud, nodemanager servers are created and > deleted frequently. We need a way to exclude a nodemanager from the yarn > cluster cleanly. Otherwise, the map of rmContext.getInactiveRMNodes() would > keep growing, which would cause a memory issue in the RM. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
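For context, the decommission flow being discussed looks like this today. The hostname and file path below are made up for illustration; the exclude file is whatever yarn.resourcemanager.nodes.exclude-path points to.
{code:bash}
# Add the host to the RM's exclude file and ask the RM to re-read it.
echo "nm-host-01.example.com" >> /etc/hadoop/conf/yarn.exclude
yarn rmadmin -refreshNodes

# The NodeManager then shows up under "Decommissioned Nodes". With no include
# file configured, it is never considered untracked by isUntrackedNode, so it
# remains in rmContext.getInactiveRMNodes().
{code}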
[jira] [Commented] (YARN-9442) container working directory has group read permissions
[ https://issues.apache.org/jira/browse/YARN-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901647#comment-16901647 ] Eric Badger commented on YARN-9442: --- +1 lgtm. I'll commit in a day or two if there are no objections. > container working directory has group read permissions > -- > > Key: YARN-9442 > URL: https://issues.apache.org/jira/browse/YARN-9442 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: YARN-9442.001.patch, YARN-9442.002.patch > > > Container working directories are currently created with permissions 0750, > owned by the user and with the group set to the node manager group. > Is there any reason why these directories need group read permissions? > I have been testing with group read permissions removed and so far I haven't > encountered any problems. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901638#comment-16901638 ] Xie YiFan commented on YARN-6539: - [~subru] Could you review this patch for me? > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901604#comment-16901604 ] Hadoop QA commented on YARN-9564: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange} 0m 6s{color} | {color:orange} The patch generated 113 new + 0 unchanged - 0 fixed = 113 total (was 0) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 14s{color} | {color:green} hadoop-assemblies in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}203m 15s{color} | {color:green} hadoop-yarn in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}296m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9564 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12976847/YARN-9564.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml pylint | | uname | Linux b5a644c5e0af 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b77761b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | pylint | v1.9.2 | | pylint | https://builds.apache.org/job/PreCommit-YARN-Build/24483/artifact/out/diff-patch-pylint.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24483/testReport/ | | Max. process+thread count | 896 (vs. ulimi
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901554#comment-16901554 ] kevin su commented on YARN-9678: [~jojochuang] Thank you so much > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9559: Comment: was deleted (was: Attached a branch-3.2 version. It's the same as trunk modulo trivial conflicts.) > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9559: Attachment: (was: YARN-9559-branch-3.2.001.patch) > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901501#comment-16901501 ] Hudson commented on YARN-9559: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17050 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17050/]) YARN-9559. Create AbstractContainersLauncher for pluggable (haibochen: rev f51702d5398531835b24d812f6f95094a0e0493e) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/AbstractContainersLauncher.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/package-info.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559-branch-3.2.001.patch, YARN-9559.001.patch, > YARN-9559.002.patch, YARN-9559.003.patch, YARN-9559.004.patch, > YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901500#comment-16901500 ] Eric Yang commented on YARN-9564: - [~ebadger] {quote}Yes, you'll need to run this script as sudo due to a few of the commands in the script. It's probably easiest to run the whole script as root, but I like to run as little as possible as root. I could require the script run as root and then drop privileges when they aren't needed.{quote} My first choice is to route this operation via the YARN daemon to perform the privilege escalation operations, similar to container launch. This would make the usage of the command similar to "docker build", which requires a trusted daemon to validate security and then perform the image build operation. The second choice is to check whether the current user is a privileged user and run accordingly. {quote}I believe that Craig Condit has a java version of something similar to this tool. I don't think that I'm going to have time to rewrite this in Java, but we might be able to leverage his tool if you think that approach is better.{quote} [~ccondit]'s tool provides basic flattening of a docker image to squashfs. The python implementation provides more metadata management of layers. Unless there is an effort to add metadata management to Craig's version, they are not equal in functionality. I cannot recommend changing direction unless someone is willing to put in the effort to convert the python work to Java. I am shy about committing the python version at this time because external dependencies keep this script from functioning as a standalone unit. I think it is too hard to replicate for normal users. The script could be improved to detect prerequisite dependencies ahead of time instead of erroring out halfway through execution. > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9720) MR job submitted to a queue with default partition accessing the non-exclusive label resources
[ https://issues.apache.org/jira/browse/YARN-9720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901499#comment-16901499 ] Eric Payne commented on YARN-9720: -- [~gb.ana...@gmail.com], can you please attach a copy of your capacity-scheduler.xml to show the queue and label configuration properties? > MR job submitted to a queue with default partition accessing the > non-exclusive label resources > -- > > Key: YARN-9720 > URL: https://issues.apache.org/jira/browse/YARN-9720 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager >Affects Versions: 3.1.1, 3.1.2 >Reporter: ANANDA G B >Assignee: ANANDA G B >Priority: Major > Attachments: Issue.png > > > When an MR job is submitted to queue1 with the default partition, it > accesses non-exclusive partition resources. Please find the attachments. > MR Job command: > ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.0201.jar > pi -Dmapreduce.job.queuename=queue1 -Dmapreduce.job.node-label-expression= 10 > 10 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
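For context, the kind of configuration being asked for looks roughly like the sketch below; the queue name, capacity value, and label name are illustrative assumptions, not the reporter's actual settings.
{code:xml}
<!-- capacity-scheduler.xml (illustrative excerpt) -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,queue1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queue1.capacity</name>
    <value>50</value>
  </property>
  <property>
    <!-- which partitions queue1 is allowed to use; this value is part of
         what decides whether label1 resources are reachable from queue1 -->
    <name>yarn.scheduler.capacity.root.queue1.accessible-node-labels</name>
    <value>label1</value>
  </property>
</configuration>
{code}
A non-exclusive label itself would have been registered with something like {{yarn rmadmin -addToClusterNodeLabels "label1(exclusive=false)"}}.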
[jira] [Commented] (YARN-9442) container working directory has group read permissions
[ https://issues.apache.org/jira/browse/YARN-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901490#comment-16901490 ] Hadoop QA commented on YARN-9442: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 14s{color} | {color:red} YARN-9442 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9442 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965018/YARN-9442.002.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24485/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > container working directory has group read permissions > -- > > Key: YARN-9442 > URL: https://issues.apache.org/jira/browse/YARN-9442 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: YARN-9442.001.patch, YARN-9442.002.patch > > > Container working directories are currently created with permissions 0750, > owned by the user and with the group set to the node manager group. > Is there any reason why these directories need group read permissions? > I have been testing with group read permissions removed and so far I haven't > encountered any problems. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files
[ https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901483#comment-16901483 ] Hadoop QA commented on YARN-6315: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 11s{color} | {color:red} YARN-6315 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6315 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901695/YARN-6315.006.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24484/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Improve LocalResourcesTrackerImpl#isResourcePresent to return false for > corrupted files > --- > > Key: YARN-6315 > URL: https://issues.apache.org/jira/browse/YARN-6315 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3, 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: YARN-6315.001.patch, YARN-6315.002.patch, > YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, > YARN-6315.006.patch > > > We currently check if a resource is present by making sure that the file > exists locally. There can be a case where the LocalizationTracker thinks that > it has the resource if the file exists but with size 0 or less than the > "expected" size of the LocalResource. This JIRA tracks the change to harden > the isResourcePresent call to address that case. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
[ https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901473#comment-16901473 ] Jim Brennan commented on YARN-9527: --- We have been running with this patch on one of our large research clusters for about a month. I scanned for this issue again today and there were no instances of it. That is not definitive, but it is a good sign. We also have not had any new problems reported as a result of this change. I will continue to monitor our clusters for this. [~ebadger], did you want to see if we can get some other reviewers for this patch? > Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file > - > > Key: YARN-9527 > URL: https://issues.apache.org/jira/browse/YARN-9527 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.5, 3.1.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-9527.001.patch, YARN-9527.002.patch, > YARN-9527.003.patch, YARN-9527.004.patch > > > A rogue ContainerLocalizer can get stuck in a loop continuously downloading > the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" > exception on each iteration. Sometimes this continues long enough that it > fills up a disk or depletes available inodes for the filesystem. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901467#comment-16901467 ] Jonathan Hung commented on YARN-9559: - Attached a branch-3.2 version. It's the same as trunk modulo trivial conflicts. > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559-branch-3.2.001.patch, YARN-9559.001.patch, > YARN-9559.002.patch, YARN-9559.003.patch, YARN-9559.004.patch, > YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9559: Attachment: YARN-9559-branch-3.2.001.patch > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559-branch-3.2.001.patch, YARN-9559.001.patch, > YARN-9559.002.patch, YARN-9559.003.patch, YARN-9559.004.patch, > YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-9559: - Fix Version/s: 3.3.0 > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901450#comment-16901450 ] Haibo Chen commented on YARN-9559: -- The unit test failure is reported at YARN-5857, independent of the change here. Committing to trunk soon. > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9442) container working directory has group read permissions
[ https://issues.apache.org/jira/browse/YARN-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901443#comment-16901443 ] Jim Brennan commented on YARN-9442: --- [~eyang], [~ebadger], [~shaneku...@gmail.com], [~jeagles], any further comments on this? > container working directory has group read permissions > -- > > Key: YARN-9442 > URL: https://issues.apache.org/jira/browse/YARN-9442 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: YARN-9442.001.patch, YARN-9442.002.patch > > > Container working directories are currently created with permissions 0750, > owned by the user and with the group set to the node manager group. > Is there any reason why these directories need group read permissions? > I have been testing with group read permissions removed and so far I haven't > encountered any problems. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901433#comment-16901433 ] Hadoop QA commented on YARN-9559: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 318 unchanged - 2 fixed = 318 total (was 320) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 56s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 25s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 50s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 57s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}117m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12976
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901432#comment-16901432 ] Jim Brennan commented on YARN-8045: --- The patch for 2.8 looks good to me. +1 (non-binding) > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.4, 2.8.6, 2.9.3, 3.1.3 > > Attachments: YARN-8045.001-branch-2.8.patch, YARN-8045.001.patch > > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files
[ https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901428#comment-16901428 ] Kuhu Shukla commented on YARN-6315: --- Thanks for the ping Eric and sorry about the delay on this. This is not a trivial change when it comes to archives and directories and I would have difficulty making time for this patch rework. I apologize and please feel free to reassign and use the existing patch if it is any good. :( > Improve LocalResourcesTrackerImpl#isResourcePresent to return false for > corrupted files > --- > > Key: YARN-6315 > URL: https://issues.apache.org/jira/browse/YARN-6315 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3, 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: YARN-6315.001.patch, YARN-6315.002.patch, > YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, > YARN-6315.006.patch > > > We currently check if a resource is present by making sure that the file > exists locally. There can be a case where the LocalizationTracker thinks that > it has the resource if the file exists but with size 0 or less than the > "expected" size of the LocalResource. This JIRA tracks the change to harden > the isResourcePresent call to address that case. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files
[ https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901424#comment-16901424 ] Eric Payne commented on YARN-6315: -- [~kshukla], this came up internally for us recently. Do you plan on addressing the comments from [~jlowe] above? > Improve LocalResourcesTrackerImpl#isResourcePresent to return false for > corrupted files > --- > > Key: YARN-6315 > URL: https://issues.apache.org/jira/browse/YARN-6315 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3, 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: YARN-6315.001.patch, YARN-6315.002.patch, > YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, > YARN-6315.006.patch > > > We currently check if a resource is present by making sure that the file > exists locally. There can be a case where the LocalizationTracker thinks that > it has the resource if the file exists but with size 0 or less than the > "expected" size of the LocalResource. This JIRA tracks the change to harden > the isResourcePresent call to address that case. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
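A minimal sketch of the file-size check being discussed, assuming a plain (non-archive) local file; as the comments above note, archives and directories need more care because their size on disk is not the LocalResource size. The class and method names here are illustrative, not the actual LocalResourcesTrackerImpl change.
{code:java}
import java.io.File;

public final class ResourcePresenceCheck {

  /**
   * Returns true only if the localized file exists and is at least as large
   * as the size recorded for the resource. A plain exists() check would
   * treat a truncated or zero-byte file as present.
   */
  public static boolean isResourcePresent(String localPath, long expectedSize) {
    File file = new File(localPath);
    if (!file.exists()) {
      return false;
    }
    // For simple files the localized copy should be at least the expected size;
    // archives and directories are deliberately out of scope for this sketch.
    return file.length() >= expectedSize;
  }
}
{code}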
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901413#comment-16901413 ] Eric Badger commented on YARN-9564: --- Patch 002 addresses the rest of your review comments, [~eyang]. I still have open questions from your comments #5 and #7 from my previous comment. > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9564: -- Attachment: YARN-9564.002.patch > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901373#comment-16901373 ] Haibo Chen commented on YARN-9559: -- +1 on the latest patch pending Jenkins. > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901360#comment-16901360 ] Steve Loughran commented on YARN-9724: -- looking @ the spark side of things, it's coming from the line {code} logInfo("Requesting a new application from cluster with %d NodeManagers" .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers)) {code} That is, it's not actually doing much and could probably be reworked to be less brittle. For now, set the logger org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend to log at WARN, not INFO and that line should be skipped. > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at 
com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builde
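A concrete way to apply the workaround Steve describes, assuming Spark's standard log4j 1.x conf/log4j.properties. The second logger is an addition based on the stack trace above, which shows the logInfo call coming from org.apache.spark.deploy.yarn.Client.
{code}
# Raise these loggers to WARN so the INFO message (and the
# getYarnClusterMetrics() call inside it) should be skipped. log4j 1.x syntax.
log4j.logger.org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend=WARN
log4j.logger.org.apache.spark.deploy.yarn.Client=WARN
{code}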
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901352#comment-16901352 ] Jonathan Hung commented on YARN-9559: - Thanks Adam! Attached 005 to fix checkstyle > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9559: Attachment: YARN-9559.005.patch > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901312#comment-16901312 ] Hudson commented on YARN-9678: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17048 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17048/]) YARN-9678. Addendum: TestGpuResourceHandler / TestFpgaResourceHandler (weichiu: rev 7c2042a44d1cd7e60b911cb40642cdd9c443b076) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/TestFpgaResourceHandlerImpl.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandlerImpl.java > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901308#comment-16901308 ] Wei-Chiu Chuang commented on YARN-9678: --- Added an addendum patch since I forgot to add the new files into the commit. It is now updated. > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901299#comment-16901299 ] Íñigo Goiri commented on YARN-9694: --- +1 on [^YARN-9694.004.patch]. > UI always show default-rack for all the nodes while running SLS. > > > Key: YARN-9694 > URL: https://issues.apache.org/jira/browse/YARN-9694 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9694.001.patch, YARN-9694.002.patch, > YARN-9694.003.patch, YARN-9694.004.patch > > > Currently, independent of the specification of the nodes in SLS.json or > nodes.json, UI always shows that rack of the node is default-rack. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated YARN-9724: -- Issue Type: Bug (was: New Feature) > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) 
> at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apac
[jira] [Resolved] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved YARN-9724. --- Resolution: Duplicate Release Note: (was: has solved) > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at 
org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method
[jira] [Reopened] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang reopened YARN-9724: --- > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.st
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901220#comment-16901220 ] Hudson commented on YARN-9678: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17047 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17047/]) YARN-9678. TestGpuResourceHandler / TestFpgaResourceHandler should be (weichiu: rev b8bf09ba3d2514ccfa3c6beb4a7530cd2f3555c7) * (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java * (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/TestFpgaResourceHandler.java > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901179#comment-16901179 ] Eric Badger commented on YARN-8045: --- [~shaneku...@gmail.com], can you review the 2.8 patch? > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.4, 2.8.6, 2.9.3, 3.1.3 > > Attachments: YARN-8045.001-branch-2.8.patch, YARN-8045.001.patch > > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901166#comment-16901166 ] Adam Antal commented on YARN-9559: -- (aside from the checkstyle error) > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901165#comment-16901165 ] Adam Antal commented on YARN-9559: -- Perfect. +1 (non-binding). > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9438) launchTime not written to state store for running applications
[ https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901156#comment-16901156 ] Adam Antal commented on YARN-9438: -- Trunk patch LGTM (non-binding). > launchTime not written to state store for running applications > -- > > Key: YARN-9438 > URL: https://issues.apache.org/jira/browse/YARN-9438 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9438-branch-2.001.patch, > YARN-9438-branch-2.002.patch, YARN-9438.001.patch, YARN-9438.002.patch, > YARN-9438.003.patch > > > launchTime is only saved to state store after application finishes, so if > restart happens, any running applications will have launchTime set as -1 > (since this is the default timestamp of the recovery event). -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901105#comment-16901105 ] Abhishek Modi commented on YARN-9694: - Tested the latest patch at scale with a 4500-node json and everything worked fine. [~elgoiri], could you please review it? Thanks. > UI always show default-rack for all the nodes while running SLS. > > > Key: YARN-9694 > URL: https://issues.apache.org/jira/browse/YARN-9694 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9694.001.patch, YARN-9694.002.patch, > YARN-9694.003.patch, YARN-9694.004.patch > > > Currently, independent of the specification of the nodes in SLS.json or > nodes.json, UI always shows that rack of the node is default-rack. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9410) Typo in documentation: Using FPGA On YARN
[ https://issues.apache.org/jira/browse/YARN-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901075#comment-16901075 ] Hudson commented on YARN-9410: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17045 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17045/]) YARN-9410. Typo in documentation: Using FPGA On YARN (#1220) Contributed (weichiu: rev 1c53ce0cda9b21ec7b56c76a65c7b742491b1a67) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/UsingFPGA.md > Typo in documentation: Using FPGA On YARN > -- > > Key: YARN-9410 > URL: https://issues.apache.org/jira/browse/YARN-9410 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > > fpag.major-device-number should be changed to fpga... -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901066#comment-16901066 ] Wei-Chiu Chuang commented on YARN-9678: --- LGTM +1 > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9571) RM state store purging should be configurable if log aggregation does not terminate
[ https://issues.apache.org/jira/browse/YARN-9571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YARN-9571. Resolution: Won't Fix Closing as won't fix since it appears we will revert YARN-4946 instead. > RM state store purging should be configurable if log aggregation does not > terminate > --- > > Key: YARN-9571 > URL: https://issues.apache.org/jira/browse/YARN-9571 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Attachments: YARN-9571.001.patch > > > YARN-4946 introduced logic which prevents applications from being removed > from the RM State Store unless log aggregation has reached a terminal state. > However, there are cases where log aggregation may not always complete or > fail cleanly (we've seen instances where it shows as NOT_STARTED on > production clusters), and in this case the state store will continue to build > up large numbers of applications. > We should make this behavior configurable. I propose to add a new > configuration: > {code:java} > yarn.resourcemanager.completed-application-ttl-secs > {code} > This can be used to force removal of an application if it would otherwise be > purged from the state store if not for log aggregation status. If this > configuration is set to a positive value, it will be used in conjunction with > the application finish time to determine whether or not to purge the app. The > app would be removed if the max # of completed applications has been reached > AND either the app has completed more than ttl-secs seconds ago OR the app > has completed log aggregation. For backwards compatibility we can set the > default value of this parameter to zero, which would disable the new logic. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
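For illustration only, the purge rule described above (never merged, since YARN-4946 is being reverted instead) could be sketched roughly as follows; the method and parameter names are hypothetical, not the actual RMStateStore/RMAppManager API:
{code:java}
// Sketch of the proposed rule: once the completed-app cap is exceeded, an
// app may be purged if log aggregation reached a terminal state OR it
// finished more than ttlSecs ago. ttlSecs <= 0 keeps the YARN-4946 behavior.
boolean shouldPurge(long nowMs, long appFinishTimeMs, boolean logAggregationTerminal,
    int completedApps, int maxCompletedApps, long ttlSecs) {
  if (completedApps <= maxCompletedApps) {
    return false; // still under the cap, keep the app in the state store
  }
  if (ttlSecs <= 0) {
    return logAggregationTerminal; // new logic disabled by default
  }
  boolean olderThanTtl = (nowMs - appFinishTimeMs) / 1000 >= ttlSecs;
  return logAggregationTerminal || olderThanTtl;
}
{code}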
[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901040#comment-16901040 ] Craig Condit commented on YARN-4946: I'd be in favor of reverting, but I will defer to [~wangda]. > RM should not consider an application as COMPLETED when log aggregation is > not in a terminal state > -- > > Key: YARN-4946 > URL: https://issues.apache.org/jira/browse/YARN-4946 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-4946.001.patch, YARN-4946.002.patch, > YARN-4946.003.patch, YARN-4946.004.patch > > > MAPREDUCE-6415 added a tool that combines the aggregated log files for each > Yarn App into a HAR file. When run, it seeds the list by looking at the > aggregated logs directory, and then filters out ineligible apps. One of the > criteria involves checking with the RM that an Application's log aggregation > status is not still running and has not failed. When the RM "forgets" about > an older completed Application (e.g. RM failover, enough time has passed, > etc), the tool won't find the Application in the RM and will just assume that > its log aggregation succeeded, even if it actually failed or is still running. > We can solve this problem by doing the following: > The RM should not consider an app to be fully completed (and thus removed > from its history) until the aggregation status has reached a terminal state > (e.g. SUCCEEDED, FAILED, TIME_OUT). -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panlijie resolved YARN-9724. Resolution: Fixed Release Note: has solved Target Version/s: 3.2.0 > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at 
org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Met
[jira] [Updated] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panlijie updated YARN-9724: --- Fix Version/s: 3.2.0 > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaM
[jira] [Commented] (YARN-9723) ApplicationPlacementContext is not required for terminated jobs during recovery
[ https://issues.apache.org/jira/browse/YARN-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900900#comment-16900900 ] Prabhu Joseph commented on YARN-9723: - [~eyang] Can you review this Jira when you get time? This fixes RMAppManager recovery to not call placeApplication for terminated jobs, as ApplicationPlacementContext is not required for those jobs. The failing testcase is not related; it is reported in YARN-9333. > ApplicationPlacementContext is not required for terminated jobs during > recovery > --- > > Key: YARN-9723 > URL: https://issues.apache.org/jira/browse/YARN-9723 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9723-001.patch > > >Placement of application (RMAppManager.placeApplication) is called for all > the jobs during recovery. This can be ignored for the terminated jobs. > {code} > at > org.apache.hadoop.yarn.server.resourcemanager.placement.AppNameMappingPlacementRule.getPlacementForApp(AppNameMappingPlacementRule.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:867) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:421) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:410) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:637) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1536) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
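A minimal sketch of the change being described, for orientation only (the variable and helper names below are hypothetical; the real change is in YARN-9723-001.patch):
{code:java}
// During recovery, skip queue placement for apps that already reached a
// terminal state. isTerminal() is a hypothetical stand-in for checking the
// recovered RMAppState (FINISHED / FAILED / KILLED).
ApplicationPlacementContext placementContext = null;
if (!isRecovery || !isTerminal(recoveredAppState)) {
  placementContext = placeApplication(placementManager, submissionContext, user);
}
// Terminated apps are recovered with a null placement context, avoiding the
// AppNameMappingPlacementRule call seen in the stack trace above.
{code}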
[jira] [Commented] (YARN-9723) ApplicationPlacementContext is not required for terminated jobs during recovery
[ https://issues.apache.org/jira/browse/YARN-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900875#comment-16900875 ] Hadoop QA commented on YARN-9723: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 50s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}127m 25s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9723 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12976795/YARN-9723-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b2271482e27f 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1127215 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24481/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24481/testReport/ | | Max. process+thread count | 888 (vs. ulimit of 1) |
[jira] [Commented] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900836#comment-16900836 ] panlijie commented on YARN-9724: [~Prabhu Joseph] Thank you , I'll keep care for this update > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at 
org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.
[jira] [Commented] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900821#comment-16900821 ] Prabhu Joseph commented on YARN-9724: - [~panlijie] This will be fixed by YARN-8699 which has implemented the getClusterMetrics in FederationClientInterceptor. > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at 
org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invok
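The fix referenced in Prabhu Joseph's comment above (YARN-8699) implements getClusterMetrics in FederationClientInterceptor so the Router no longer throws NotImplementedException. As a rough, method-level sketch of the idea only, and not the actual patch, the Router can query each active subcluster's ResourceManager and merge the per-subcluster metrics; the helpers federationFacade and getClientRMProxyForSubCluster below are assumed names, and a complete implementation would merge every metric field and tolerate unreachable subclusters.

{code}
// Hedged sketch only -- not the YARN-8699 patch. It illustrates answering
// getClusterMetrics at the Router by merging per-subcluster results instead of
// throwing NotImplementedException. federationFacade and
// getClientRMProxyForSubCluster are assumed helpers of the interceptor.
@Override
public GetClusterMetricsResponse getClusterMetrics(GetClusterMetricsRequest request)
    throws YarnException, IOException {
  int totalNodeManagers = 0;
  // Ask every active subcluster's RM for its metrics and sum them up.
  for (SubClusterInfo subCluster : federationFacade.getSubClusters(true).values()) {
    ApplicationClientProtocol rm =
        getClientRMProxyForSubCluster(subCluster.getSubClusterId());
    YarnClusterMetrics metrics = rm.getClusterMetrics(request).getClusterMetrics();
    totalNodeManagers += metrics.getNumNodeManagers();
  }
  // A complete implementation would merge all metric fields, not only the
  // NodeManager count, and handle per-subcluster failures.
  return GetClusterMetricsResponse.newInstance(
      YarnClusterMetrics.newInstance(totalNodeManagers));
}
{code}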
[jira] [Updated] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panlijie updated YARN-9724: --- Description: we have some problems with hadoop-yarn-federation when we use Spark on yarn-federation. The following error is found: org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) at org.apache.spark.SparkContext.(SparkContext.scala:500) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) was: In our HDFS and YARN Federation deploy. we run sparkDemo , The flowing Error find : org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(Feder
[jira] [Created] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
panlijie created YARN-9724: -- Summary: ERROR SparkContext: Error initializing SparkContext. Key: YARN-9724 URL: https://issues.apache.org/jira/browse/YARN-9724 Project: Hadoop YARN Issue Type: New Feature Components: federation, router, yarn Affects Versions: 3.1.0, 3.0.0 Environment: Hadoop:3.1.0 Spark:2.3.3 Reporter: panlijie Attachments: spark.log In our HDFS and YARN Federation deployment, we run a Spark demo and the following error is found: org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) at org.apache.spark.SparkContext.(SparkContext.scala:500) at 
org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -
[jira] [Updated] (YARN-9723) ApplicationPlacementContext is not required for terminated jobs during recovery
[ https://issues.apache.org/jira/browse/YARN-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9723: Attachment: YARN-9723-001.patch > ApplicationPlacementContext is not required for terminated jobs during > recovery > --- > > Key: YARN-9723 > URL: https://issues.apache.org/jira/browse/YARN-9723 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9723-001.patch > > >Placement of application (RMAppManager.placeApplication) is called for all > the jobs during recovery. This can be ignored for the terminated jobs. > {code} > at > org.apache.hadoop.yarn.server.resourcemanager.placement.AppNameMappingPlacementRule.getPlacementForApp(AppNameMappingPlacementRule.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:867) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:421) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:410) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:637) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1536) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
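The idea behind the change can be pictured with the hedged sketch below; it is not the attached YARN-9723-001.patch, and the helper name placeIfStillActive as well as the exact signatures are assumptions. Applications recovered in a terminal state (FINISHED, FAILED, KILLED) simply skip the placement-rule evaluation that live applications still go through.

{code}
// Hedged illustration of the YARN-9723 idea, not the attached patch.
// Helper name and signatures are assumptions for readability.
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext;
import org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager;
import org.apache.hadoop.yarn.server.resourcemanager.recovery.records.ApplicationStateData;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;

final class RecoveryPlacementSketch {

  private RecoveryPlacementSketch() {
  }

  /**
   * Runs the placement rules only for applications that are still live when
   * recovered; terminated jobs return null and skip the placeApplication call
   * shown in the stack trace above.
   */
  static ApplicationPlacementContext placeIfStillActive(
      PlacementManager placementManager, ApplicationStateData appState,
      ApplicationSubmissionContext submissionContext, String user)
      throws YarnException {
    RMAppState recoveredState = appState.getState();
    if (recoveredState == RMAppState.FINISHED
        || recoveredState == RMAppState.FAILED
        || recoveredState == RMAppState.KILLED) {
      // The queue placement of a finished job is never used again.
      return null;
    }
    // Live job: evaluate the configured placement rules as usual.
    return placementManager.placeApplication(submissionContext, user);
  }
}
{code}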
[jira] [Created] (YARN-9723) ApplicationPlacementContext is not required for terminated jobs during recovery
Prabhu Joseph created YARN-9723: --- Summary: ApplicationPlacementContext is not required for terminated jobs during recovery Key: YARN-9723 URL: https://issues.apache.org/jira/browse/YARN-9723 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Placement of application (RMAppManager.placeApplication) is called for all the jobs during recovery. This can be ignored for the terminated jobs. {code} at org.apache.hadoop.yarn.server.resourcemanager.placement.AppNameMappingPlacementRule.getPlacementForApp(AppNameMappingPlacementRule.java:193) at org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:66) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:867) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:421) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:410) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:637) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1536) {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900727#comment-16900727 ] Zac Zhou commented on YARN-9721: [~sunilg] Thanks a lot for your comments~ Maybe one of the following methods could be used to clean up the inactive list (a rough configuration sketch of option 2 follows this message). # Add a parameter like "--prune-nodes" to the command "rmadmin -refreshNodes". A flag named something like "prunable" can be added to RMNodes; when "rmadmin -refreshNodes --prune-nodes" is executed, the flag is set to true and the flagged RMNodes are deleted by removalTimer. # Add a time-period parameter to the yarn configuration. If an RMNode stays in the inactive list longer than that period, delete it. # Add a boolean parameter to the yarn configuration. If the parameter is true, delete the RMNodes from the inactive list directly. [~sunilg], [~leftnoteasy], [~cheersyang], [~tangzhankun] Any Ideas~ > An easy method to exclude a nodemanager from the yarn cluster cleanly > - > > Key: YARN-9721 > URL: https://issues.apache.org/jira/browse/YARN-9721 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Priority: Major > Attachments: decommission nodes.png > > > If we want to take a nodemanager server offline, the nodes.exclude-path > and the "rmadmin -refreshNodes" command are used to decommission the server. > But this method cannot clean up the node completely. Nodemanager servers are > still listed under Decommissioned Nodes, as the attachment shows. > !decommission nodes.png! > YARN-4311 enabled a removalTimer to clean up untracked nodes. > But the logic of the isUntrackedNode method is too restrictive. If include-path is > not used, no servers can meet the criteria, and using an include file would introduce > a maintenance risk. > If the yarn cluster is installed on a cloud, nodemanager servers are created and > deleted frequently. We need a way to exclude a nodemanager from the yarn > cluster cleanly. Otherwise, the map of rmContext.getInactiveRMNodes() would > keep growing, which would cause a memory issue in the RM. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
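To make option 2 in the comment above concrete, the following self-contained sketch shows time-based pruning of an inactive-node map. It is illustrative only: the class, the hypothetical expiry property it mentions, and the map are stand-ins for rmContext.getInactiveRMNodes() and a new YARN configuration setting, not ResourceManager code.

{code}
// Rough sketch of option 2 above, assuming a hypothetical property
// (e.g. yarn.resourcemanager.inactive-node.expiry-ms) and a simplified node map.
// It is not RM code; it only illustrates time-based pruning of inactive nodes.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class InactiveNodePruner {

  /** How long an inactive (e.g. decommissioned) node entry is kept, in ms. */
  private final long expiryMs;
  /** nodeId -> time the node became inactive; stands in for rmContext.getInactiveRMNodes(). */
  private final Map<String, Long> inactiveSince = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public InactiveNodePruner(long expiryMs) {
    this.expiryMs = expiryMs;
    // Periodically drop entries older than the configured expiry, so the
    // inactive map cannot grow without bound on an elastic cluster.
    scheduler.scheduleWithFixedDelay(this::prune, expiryMs, expiryMs, TimeUnit.MILLISECONDS);
  }

  public void nodeBecameInactive(String nodeId) {
    inactiveSince.put(nodeId, System.currentTimeMillis());
  }

  private void prune() {
    long cutoff = System.currentTimeMillis() - expiryMs;
    inactiveSince.entrySet().removeIf(e -> e.getValue() < cutoff);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}

With an approach along these lines, a cloud cluster that creates and deletes NodeManagers frequently would keep the inactive map bounded without requiring an include file.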
[jira] [Assigned] (YARN-9715) YARN UI2 - yarn-container-log support for https Knox Gateway url
[ https://issues.apache.org/jira/browse/YARN-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-9715: --- Assignee: Akhil PB > YARN UI2 - yarn-container-log support for https Knox Gateway url > > > Key: YARN-9715 > URL: https://issues.apache.org/jira/browse/YARN-9715 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Prabhu Joseph >Assignee: Akhil PB >Priority: Major > > Currently yarn-container-log (UI2 - Nodes - List of Containers - log file) > creates url with node scheme (http) and nodeHttpAddress. This does not work > with Knox Gateway https url. The logic to construct url can be improved to > accept both normal and knox case. The similar way is used in Applications -> > Logs Section. > And also UI2 - Nodes - List of Containers - log file does not have pagination > support for log file. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org