[jira] [Updated] (YARN-7176) After kill command is send, the job hangs

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submit a job, but i need to kill it immediately due to some reason. Then I 
found the RM killed,
I check the log and found ArrayIndexOutOfBoundsException and 
NullPointerException in RMLog:

{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
at 

[jira] [Updated] (YARN-7176) After kill command is send, the job hangs

2017-09-07 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Attachment: logs.rar

> After kill command is send, the job hangs 
> --
>
> Key: YARN-7176
> URL: https://issues.apache.org/jira/browse/YARN-7176
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 2.6.0
>Reporter: lujie
>Priority: Critical
> Attachments: logs.rar
>
>
> I submit a job, but i need to kill it immediately due to some reason. Then I 
> found the job is hang,
> I check the log and found ArrayIndexOutOfBoundsException and 
> NullPointerException in RMLog:
> {code:java}
> 2017-09-08 02:34:37,967 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
> launching appattempt_1504809243340_0001_01. Got exception: 
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at java.util.ArrayList.add(ArrayList.java:441)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,968 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating app: application_1504809243340_0001
> java.lang.NullPointerException
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> 

[jira] [Updated] (YARN-7176) After kill command is send, the job hangs

2017-09-07 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:


[^C:\Users\Administrator\Desktop\logs.zip]

> After kill command is send, the job hangs 
> --
>
> Key: YARN-7176
> URL: https://issues.apache.org/jira/browse/YARN-7176
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 2.6.0
>Reporter: lujie
>Priority: Critical
>
> I submit a job, but i need to kill it immediately due to some reason. Then I 
> found the job is hang,
> I check the log and found ArrayIndexOutOfBoundsException and 
> NullPointerException in RMLog:
> {code:java}
> 2017-09-08 02:34:37,967 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
> launching appattempt_1504809243340_0001_01. Got exception: 
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at java.util.ArrayList.add(ArrayList.java:441)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,968 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating app: application_1504809243340_0001
> java.lang.NullPointerException
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
>