[ 
https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247564#comment-17247564
 ] 

Ahmed Hussein commented on YARN-10040:
--------------------------------------

On iOS, the {{TestDistributedShell}} does not run. I am dumping the error here 
anyway because the NPE could be a hint to what is broken in the implementation.


{code:bash}
2020-12-10 17:29:22,129 INFO  [IPC Server listener on 8048] ipc.Server 
(Server.java:run(1344)) - IPC Server listener on 8048: starting
2020-12-10 17:29:22,131 INFO  [Listener at localhost/8048] 
collectormanager.NMCollectorService (NMCollectorService.java:serviceStart(101)) 
- NMCollectorService started at localhost/127.0.0.1:8048
2020-12-10 17:29:22,131 INFO  [Listener at localhost/8048] 
nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:serviceStart(267)) - Node ID assigned is : 
localhost:54943
2020-12-10 17:29:22,207 INFO  [Listener at localhost/8048] 
resourcemanager.ResourceTrackerService 
(ResourceTrackerService.java:registerNodeManager(617)) - NodeManager from node 
localhost(cmPort: 54943 httpPort: 54946) registered with capability: 
<memory:4096, vCores:8>, assigned nodeId localhost:54943
2020-12-10 17:29:22,210 INFO  [Listener at localhost/8048] 
security.NMContainerTokenSecretManager 
(NMContainerTokenSecretManager.java:setMasterKey(143)) - Rolling master-key for 
container-tokens, got key with id -210390460
2020-12-10 17:29:22,210 INFO  [Listener at localhost/8048] 
security.NMTokenSecretManagerInNM 
(NMTokenSecretManagerInNM.java:setMasterKey(143)) - Rolling master-key for 
container-tokens, got key with id -1432443197
2020-12-10 17:29:22,210 INFO  [Listener at localhost/8048] 
nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:registerWithRM(486)) - Registered with 
ResourceManager as localhost:54943 with total resource of <memory:4096, 
vCores:8>
2020-12-10 17:29:22,212 INFO  [Listener at localhost/8048] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating 
the current master key for generating delegation tokens
2020-12-10 17:29:22,212 INFO  [Thread[Thread-282,5,FailOnTimeoutGroup]] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:run(701)) - Starting expired 
delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2020-12-10 17:29:22,212 INFO  [Thread[Thread-282,5,FailOnTimeoutGroup]] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating 
the current master key for generating delegation tokens
2020-12-10 17:29:22,212 INFO  [RM Event dispatcher] rmnode.RMNodeImpl 
(RMNodeImpl.java:handle(774)) - localhost:54943 Node Transitioned from NEW to 
UNHEALTHY
2020-12-10 17:29:22,214 INFO  
[org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event
 Processor] distributed.NodeQueueLoadMonitor 
(NodeQueueLoadMonitor.java:removeNode(202)) - Node delete event for: localhost
2020-12-10 17:29:22,215 ERROR [SchedulerEventDispatcher:Event Processor] 
capacity.CapacityScheduler (CapacityScheduler.java:removeNode(2127)) - 
Attempting to remove non-existent node localhost:54943
2020-12-10 17:29:22,215 ERROR 
[org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event
 Processor] event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error 
in handling event type NODE_REMOVED to the Event Dispatcher
java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeFromNodeIdsByRack(NodeQueueLoadMonitor.java:405)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeNode(NodeQueueLoadMonitor.java:204)
        at 
org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:399)
        at 
org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:94)
        at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:71)
        at java.lang.Thread.run(Thread.java:748)
2020-12-10 17:29:22,216 INFO  
[org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event
 Processor] event.EventDispatcher (EventDispatcher.java:run(84)) - Exiting, 
bbye..
2020-12-10 17:29:22,217 INFO  [Listener at localhost/8048] ipc.CallQueueManager 
(CallQueueManager.java:<init>(93)) - Using callQueue: class 
java.util.concurrent.LinkedBlockingQueue, queueCapacity: 1000, scheduler: class 
org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2020-12-10 17:29:22,218 INFO  [Socket Reader #1 for port 0] ipc.Server 
(Server.java:run(1265)) - Starting Socket Reader #1 for port 0
2020-12-10 17:29:22,222 INFO  [Listener at localhost/54947] 
pb.RpcServerFactoryPBImpl (RpcServerFactoryPBImpl.java:createServer(174)) - 
Adding protocol org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB to the 
server

{code}
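
One possible reading of the trace above, offered only as a sketch: the node goes from NEW to UNHEALTHY, so it may never have been added to the per-rack index, and the later NODE_REMOVED event then dereferences a missing entry. The class below is a hypothetical stand-in, not the actual {{NodeQueueLoadMonitor}} code; the class name, method names, and the rack string are all assumptions made for illustration, and what line 405 really does is not shown in the log.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a rack -> node-ID index; NOT the Hadoop class.
public class RackIndexSketch {
    private final Map<String, Set<String>> nodeIdsByRack = new HashMap<>();

    // Record a node under its rack, creating the rack entry on first use.
    public void addNode(String rack, String nodeId) {
        nodeIdsByRack.computeIfAbsent(rack, r -> new HashSet<>()).add(nodeId);
    }

    // Unguarded removal: throws NullPointerException if the rack was never added.
    public void removeNodeUnguarded(String rack, String nodeId) {
        nodeIdsByRack.get(rack).remove(nodeId);
    }

    // Guarded removal: returns false for unknown racks, drops a rack once empty.
    public boolean removeNodeGuarded(String rack, String nodeId) {
        Set<String> ids = nodeIdsByRack.get(rack);
        if (ids == null) {
            return false;
        }
        boolean removed = ids.remove(nodeId);
        if (ids.isEmpty()) {
            nodeIdsByRack.remove(rack);
        }
        return removed;
    }

    // Number of racks currently tracked.
    public int rackCount() {
        return nodeIdsByRack.size();
    }

    public static void main(String[] args) {
        RackIndexSketch index = new RackIndexSketch();
        // The node was never added, then a removal event arrives for it.
        try {
            index.removeNodeUnguarded("/default-rack", "localhost:54943");
        } catch (NullPointerException e) {
            System.out.println("unguarded removal threw NPE");
        }
        System.out.println("guarded removal returned "
            + index.removeNodeGuarded("/default-rack", "localhost:54943"));
    }
}
```

If the real cause is similar, a null check of this kind would only hide the symptom in the test; whether a removal for a never-added node should be possible at all is a separate question for the implementation.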

{quote}Abhishek Modi any pointers about this? Is the code only broken or just 
the test. If the functionality itself has some issue we should consider 
reverting YARN-9697, else if this is only a test issue, we should wrap this up, 
if there isn't a fix available we can disable this test for time being. Let me 
know what is the actual situation. I can try help in whichever way 
possible.{quote}

[~abmodi] Would you mind please taking a look at the failures?



> DistributedShell test failure on X86 and ARM
> --------------------------------------------
>
>                 Key: YARN-10040
>                 URL: https://issues.apache.org/jira/browse/YARN-10040
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications/distributed-shell
>         Environment: X86/ARM
> OS: ubuntu1804
> Java 8
>            Reporter: zhao bo
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-10040.001.patch
>
>
> * org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
> * org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
> Please see the Apache Jenkins Test result:
> [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/]
>  
> These 2 tests fail on both the X86 and ARM platforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
