[ https://issues.apache.org/jira/browse/HDFS-12142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087565#comment-16087565 ]
Kihwal Lee commented on HDFS-12142:
-----------------------------------

The following appears after the file is successfully closed. It seems the DataStreamer is sometimes left running, and the regular pipeline shutdown is somehow being recognized as a failure.

{noformat}
2017-07-10 20:19:11,870 [IPC Server handler 72 on 8020] INFO ipc.Server: IPC Server handler 72 on 8020, call Call#99 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.updateBlockForPipeline from x.x.x.x:50972
java.io.IOException: Unexpected BlockUCState: BP-yyy:blk_12300000_10000 is COMPLETE but not UNDER_CONSTRUCTION
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:5509)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:5576)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:918)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:971)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:999)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:881)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:810)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2523)
{noformat}

The blocks were all finalized normally with no data loss, but until we know the actual cause of this, I can't be sure whether it could ever cause data loss.
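The race described above can be sketched with a toy model: a background "streamer" thread that is still running when close() completes the file may then act on a block that is already COMPLETE, which looks like a pipeline failure and triggers a bogus recovery attempt. This is a hypothetical, simplified illustration, not the actual DFSOutputStream/DataStreamer code; the class, field, and method names below are invented.

```java
// Hypothetical sketch of the race: if close() returns before the streamer
// thread is joined, the streamer can observe a file that is already closed
// and attempt "recovery" on a COMPLETE block. Joining the streamer before
// completing the file removes the race. Names here are illustrative only.
public class StreamerJoinSketch {
    static class Streamer extends Thread {
        volatile boolean fileClosed = false;
        volatile boolean actedAfterClose = false;

        @Override
        public void run() {
            // Simulated in-flight streaming work.
            try { Thread.sleep(50); } catch (InterruptedException e) { return; }
            if (fileClosed) {
                // The bug: the streamer runs on after completeFile and treats
                // normal shutdown of a COMPLETE block as a pipeline failure.
                actedAfterClose = true;
            }
        }
    }

    // Buggy shape of close(): completeFile succeeds while the streamer is
    // still running; we only join afterwards, to observe the flag.
    static boolean closeWithoutJoin() throws InterruptedException {
        Streamer s = new Streamer();
        s.start();
        s.fileClosed = true;   // completeFile succeeds immediately
        s.join();
        return s.actedAfterClose;
    }

    // Fixed shape of close(): wait for the streamer to finish before the
    // file is marked complete, so it can never act on a closed file.
    static boolean closeWithJoin() throws InterruptedException {
        Streamer s = new Streamer();
        s.start();
        s.join();              // streamer fully done first
        s.fileClosed = true;   // then complete the file
        return s.actedAfterClose;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("no join: streamer acted after close = " + closeWithoutJoin());
        System.out.println("with join: streamer acted after close = " + closeWithJoin());
    }
}
```

In the "no join" variant the streamer almost always observes the closed file (the model's stand-in for the spurious updateBlockForPipeline call); with the join, it cannot, which matches the suggestion that the streamer should be joined before close() returns.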
> Files may be closed before streamer is done
> -------------------------------------------
>
>                 Key: HDFS-12142
>                 URL: https://issues.apache.org/jira/browse/HDFS-12142
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
>
> We're encountering multiple cases of clients calling updateBlockForPipeline
> on completed blocks. Initial analysis is that the client closes a file,
> completeFile succeeds, and then the client immediately attempts recovery.
> The exception is swallowed on the client and only logged on the NN by
> checkUCBlock.
>
> The problem "appears" to be benign (no data loss), but it is unproven whether
> the issue always occurs for successfully closed files. There appears to be
> very poor coordination between the DFS output stream's threads, which leads
> to races that confuse the streamer thread, which probably should have been
> joined before returning from close.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)