Elek, Marton created HDFS-13309:
-----------------------------------

             Summary: Ozone: Improve error message in case of missing nodes
                 Key: HDFS-13309
                 URL: https://issues.apache.org/jira/browse/HDFS-13309
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: HDFS-7240
    Affects Versions: HDFS-7240
            Reporter: Elek, Marton
            Assignee: Elek, Marton


While testing ozonefs with Spark, I found multiple error messages in the log:

{code}
scm_1              | java.lang.NullPointerException
scm_1              |    at org.apache.hadoop.ozone.scm.container.ContainerStates.ContainerStateMap.addContainer(ContainerStateMap.java:129)
scm_1              |    at org.apache.hadoop.ozone.scm.container.ContainerStateManager.allocateContainer(ContainerStateManager.java:308)
scm_1              |    at org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:244)
scm_1              |    at org.apache.hadoop.ozone.scm.block.BlockManagerImpl.preAllocateContainers(BlockManagerImpl.java:189)
scm_1              |    at org.apache.hadoop.ozone.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:291)
scm_1              |    at org.apache.hadoop.ozone.scm.StorageContainerManager.allocateBlock(StorageContainerManager.java:1131)
scm_1              |    at org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:109)
scm_1              |    at org.apache.hadoop.hdsl.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:8038)
scm_1              |    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
scm_1              |    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
scm_1              |    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
scm_1              |    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
scm_1              |    at java.security.AccessController.doPrivileged(Native Method)
scm_1              |    at javax.security.auth.Subject.doAs(Subject.java:422)
scm_1              |    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
scm_1              |    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
{code}

The problem is that PipelineManager.getPipeline() may return null if the 
pipeline couldn't be found or established (for example, if I do not have enough 
nodes for a Ratis ring).

ContainerStateMap.addContainer expects this pipeline to be non-null.

I suggest adding a check in ContainerStateManager.allocateContainer that 
fails with a more meaningful error message.
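
The suggested check could look roughly like the sketch below. It is only an 
illustration of the idea, not the actual Ozone code: the Pipeline and 
PipelineSource types, the allocateContainer signature, and the wording of the 
message are all hypothetical stand-ins for the real classes named above.

```java
import java.io.IOException;

public class AllocateContainerSketch {

    /** Hypothetical stand-in for the real pipeline type. */
    static final class Pipeline {
        final String name;

        Pipeline(String name) {
            this.name = name;
        }
    }

    /**
     * Hypothetical stand-in for the pipeline lookup. Like the behavior
     * described above, it may return null when no pipeline can be built.
     */
    interface PipelineSource {
        Pipeline getPipeline(String containerName);
    }

    /**
     * Fails early with a descriptive IOException instead of letting a null
     * pipeline travel further down and surface as a NullPointerException.
     */
    static Pipeline allocateContainer(PipelineSource source, String containerName)
            throws IOException {
        Pipeline pipeline = source.getPipeline(containerName);
        if (pipeline == null) {
            // Assumed wording; the real message is up to the patch.
            throw new IOException("Unable to allocate container " + containerName
                + ": no pipeline could be found or established."
                + " Are enough datanodes available?");
        }
        return pipeline;
    }

    public static void main(String[] args) {
        try {
            // Simulate a cluster where no pipeline can be created.
            allocateContainer(name -> null, "container-1");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, the caller sees an exception that names the container 
and the likely cause, rather than a bare NullPointerException from 
ContainerStateMap.addContainer.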


