[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247564#comment-17247564 ]
Ahmed Hussein commented on YARN-10040:
--------------------------------------

On iOS, {{TestDistributedShell}} does not run. However, I am posting the error here because the NPE could hint at what is broken in the implementation.

{code:bash}
2020-12-10 17:29:22,129 INFO [IPC Server listener on 8048] ipc.Server (Server.java:run(1344)) - IPC Server listener on 8048: starting
2020-12-10 17:29:22,131 INFO [Listener at localhost/8048] collectormanager.NMCollectorService (NMCollectorService.java:serviceStart(101)) - NMCollectorService started at localhost/127.0.0.1:8048
2020-12-10 17:29:22,131 INFO [Listener at localhost/8048] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:serviceStart(267)) - Node ID assigned is : localhost:54943
2020-12-10 17:29:22,207 INFO [Listener at localhost/8048] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(617)) - NodeManager from node localhost(cmPort: 54943 httpPort: 54946) registered with capability: <memory:4096, vCores:8>, assigned nodeId localhost:54943
2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] security.NMContainerTokenSecretManager (NMContainerTokenSecretManager.java:setMasterKey(143)) - Rolling master-key for container-tokens, got key with id -210390460
2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] security.NMTokenSecretManagerInNM (NMTokenSecretManagerInNM.java:setMasterKey(143)) - Rolling master-key for container-tokens, got key with id -1432443197
2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(486)) - Registered with ResourceManager as localhost:54943 with total resource of <memory:4096, vCores:8>
2020-12-10 17:29:22,212 INFO [Listener at localhost/8048] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating the current master key for generating delegation tokens
2020-12-10 17:29:22,212 INFO [Thread[Thread-282,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(701)) - Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2020-12-10 17:29:22,212 INFO [Thread[Thread-282,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating the current master key for generating delegation tokens
2020-12-10 17:29:22,212 INFO [RM Event dispatcher] rmnode.RMNodeImpl (RMNodeImpl.java:handle(774)) - localhost:54943 Node Transitioned from NEW to UNHEALTHY
2020-12-10 17:29:22,214 INFO [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] distributed.NodeQueueLoadMonitor (NodeQueueLoadMonitor.java:removeNode(202)) - Node delete event for: localhost
2020-12-10 17:29:22,215 ERROR [SchedulerEventDispatcher:Event Processor] capacity.CapacityScheduler (CapacityScheduler.java:removeNode(2127)) - Attempting to remove non-existent node localhost:54943
2020-12-10 17:29:22,215 ERROR [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type NODE_REMOVED to the Event Dispatcher
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeFromNodeIdsByRack(NodeQueueLoadMonitor.java:405)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeNode(NodeQueueLoadMonitor.java:204)
	at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:399)
	at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:94)
	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:71)
	at java.lang.Thread.run(Thread.java:748)
2020-12-10 17:29:22,216 INFO [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] event.EventDispatcher (EventDispatcher.java:run(84)) - Exiting, bbye..
2020-12-10 17:29:22,217 INFO [Listener at localhost/8048] ipc.CallQueueManager (CallQueueManager.java:<init>(93)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 1000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2020-12-10 17:29:22,218 INFO [Socket Reader #1 for port 0] ipc.Server (Server.java:run(1265)) - Starting Socket Reader #1 for port 0
2020-12-10 17:29:22,222 INFO [Listener at localhost/54947] pb.RpcServerFactoryPBImpl (RpcServerFactoryPBImpl.java:createServer(174)) - Adding protocol org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB to the server
{code}

{quote}Abhishek Modi any pointers about this? Is the code only broken or just the test. If the functionality itself has some issue we should consider reverting YARN-9697, else if this is only a test issue, we should wrap this up, if there isn't a fix available we can disable this test for time being. Let me know what is the actual situation. I can try help in whichever way possible.{quote}

[~abmodi], would you mind taking a look at the failures?
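In the log, the node goes straight from NEW to UNHEALTHY, the CapacityScheduler then reports "Attempting to remove non-existent node", and the NPE is thrown inside {{NodeQueueLoadMonitor.removeFromNodeIdsByRack}}. That pattern suggests a NODE_REMOVED event for a node that was never added to the per-rack index. The following is a minimal, hypothetical Java sketch of that failure mode, not the actual Hadoop code: it assumes the monitor keeps a map from rack name to node IDs, and the class name {{RackIndexSketch}}, its methods, and the use of plain strings for node IDs are illustrative assumptions only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified per-rack node index (NOT the real NodeQueueLoadMonitor).
class RackIndexSketch {
  // Assumed structure: rack name -> IDs of nodes registered on that rack.
  private final Map<String, Set<String>> nodeIdsByRack = new HashMap<>();

  void addNode(String rack, String nodeId) {
    nodeIdsByRack.computeIfAbsent(rack, r -> new HashSet<>()).add(nodeId);
  }

  // Fragile variant: assumes the rack entry exists. If the node was never
  // added (e.g. NEW -> UNHEALTHY, then removed), get() returns null and the
  // chained remove() throws NullPointerException.
  boolean removeNodeUnsafe(String rack, String nodeId) {
    return nodeIdsByRack.get(rack).remove(nodeId);
  }

  // Defensive variant: removing an unknown node is a no-op, and the rack
  // entry is dropped once its last node is gone.
  boolean removeNodeSafe(String rack, String nodeId) {
    Set<String> nodes = nodeIdsByRack.get(rack);
    if (nodes == null) {
      return false; // node was never indexed under this rack
    }
    boolean removed = nodes.remove(nodeId);
    if (nodes.isEmpty()) {
      nodeIdsByRack.remove(rack);
    }
    return removed;
  }

  public static void main(String[] args) {
    RackIndexSketch index = new RackIndexSketch();
    // "/default-rack" and "localhost:54943" are example values only.
    System.out.println(index.removeNodeSafe("/default-rack", "localhost:54943"));
    try {
      index.removeNodeUnsafe("/default-rack", "localhost:54943");
    } catch (NullPointerException e) {
      System.out.println("NullPointerException on removal of an unknown node");
    }
  }
}
```

If the real code follows this shape, a null guard would only hide the symptom; whether the node should have been indexed before the NEW to UNHEALTHY transition is still the open question for YARN-9697.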
> DistributedShell test failure on X86 and ARM
> --------------------------------------------
>
>                 Key: YARN-10040
>                 URL: https://issues.apache.org/jira/browse/YARN-10040
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications/distributed-shell
>         Environment: X86/ARM
> OS: ubuntu1804
> Java 8
>            Reporter: zhao bo
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-10040.001.patch
>
>
> * org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
> * org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
> Please see the Apache Jenkins Test result:
> [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/]
>
> These 2 tests are failed on both X86 and ARM platform.