Hello,

We encountered an error on Accumulo 1.7.2. It looks like an HDFS replication issue, but HDFS is not full.
The actual log is below:

2017-05-15 06:18:40,751 [log.TabletServerLogger] ERROR: Unexpected error writing to log, retrying attempt 43
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.accumulo.tserver.log.DfsLogger$LoggerOperation.await(DfsLogger.java:235)
    at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:330)
    at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:270)
    at org.apache.accumulo.tserver.log.TabletServerLogger.log(TabletServerLogger.java:405)
    at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.update(TabletServer.java:1043)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
    at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
    at com.sun.proxy.$Proxy20.update(Unknown Source)
    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2470)
    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2454)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
    at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.accumulo.tserver.log.DfsLogger$LogSyncingTask.run(DfsLogger.java:181)
    ... 2 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
    at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

2017-05-15 06:18:40,852 [log.DfsLogger] WARN : Exception syncing java.lang.reflect.InvocationTargetException
2017-05-15 06:18:40,852 [log.DfsLogger] ERROR: Failed to close log file
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
    at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
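To check whether the failure is independent of Accumulo, a plain HDFS write test like the sketch below could be run against the same cluster (the class name and the test path are placeholders of mine; it assumes core-site.xml and hdfs-site.xml are on the classpath). The hsync() forces a block allocation through the same NameNode addBlock() call that is failing in the trace above, so it should reproduce the RemoteException if HDFS itself is the problem:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalWriteCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder test file next to the real WAL directory.
        Path path = new Path("/accumulo/wal/write-check");

        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("test".getBytes("UTF-8"));
            // Forces the DataStreamer to allocate a block via addBlock(),
            // the same call that fails in the stack trace above.
            out.hsync();
        }
        System.out.println("write + hsync OK, deleting test file");
        fs.delete(path, false);
    }
}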
The HDFS web UI info is below:

Security is off.
Safemode is off.
17461 files and directories, 14873 blocks = 32334 total filesystem object(s).
Heap Memory used 62.81 MB of 91 MB Heap Memory. Max Heap Memory is 1.6 GB.
Non Heap Memory used 67.14 MB of 69.06 MB Committed Non Heap Memory. Max Non Heap Memory is -1 B.
Configured Capacity: 132.43 GB
DFS Used: 12.44 GB (9.39%)
Non DFS Used: 58.07 GB
DFS Remaining: 61.92 GB (46.76%)
Block Pool Used: 12.44 GB (9.39%)
DataNodes usages% (Min/Median/Max/stdDev): 5.74% / 9.94% / 11.01% / 1.91%
Live Nodes: 5 (Decommissioned: 0)
Dead Nodes: 0 (Decommissioned: 0)
Decommissioning Nodes: 0
Total Datanode Volume Failures: 0 (0 B)
Number of Under-Replicated Blocks: 0
Number of Blocks Pending Deletion: 0
Block Deletion Start Time: 2017/4/19 11:16:31
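The capacity figures are consistent (DFS Used 12.44 GB + Non DFS Used 58.07 GB + DFS Remaining 61.92 GB = 132.43 GB Configured Capacity), so the filesystem really does not look full. As a cross-check of the web UI numbers, the same values can be read through the client API; this is only a quick sketch (class name is mine, same classpath assumption as above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class DfsCapacityCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FsStatus status = fs.getStatus();
        // Same figures the NameNode web UI reports, in bytes.
        System.out.println("capacity : " + status.getCapacity());
        System.out.println("used     : " + status.getUsed());
        System.out.println("remaining: " + status.getRemaining());
    }
}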
The Accumulo configuration is below (a Java-API equivalent is sketched in the P.S.):

config -s table.cache.block.enable=true
config -s tserver.memory.maps.native.enabled=true
config -s tserver.cache.data.size=1G
config -s tserver.cache.index.size=2G
config -s tserver.memory.maps.max=2G
config -s tserver.client.timeout=5s
config -s table.durability=flush
config -t accumulo.metadata -d table.durability
config -t accumulo.root -d table.durability

The Accumulo Monitor web UI info is below:

Accumulo Overview
Disk Used: 904.26M
% of Used DFS: 100.00%
Tables: 57
Tablet Servers: 5
Dead Tablet Servers: 0
Tablets: 1.86K
Entries: 22.60M
Lookups: 35.62M
Uptime: 28d 3h

If there has been a similar error in the past, could you tell me how to fix it?

Thanks,
Takashi
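P.S. In case it helps to see exactly what is set where: the shell commands above have a Java-API equivalent in the 1.7 client. This is only a sketch (the instance name, ZooKeeper quorum, and credentials are placeholders); the config -s entries become system-wide defaults, and the config -t ... -d entries remove the per-table overrides:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class ApplyConfig {
    public static void main(String[] args) throws Exception {
        // Placeholder instance name, ZooKeeper quorum, and credentials.
        Instance instance = new ZooKeeperInstance("instance", "zk1:2181");
        Connector conn = instance.getConnector("root", new PasswordToken("secret"));

        // System-wide defaults, same as "config -s ..." in the shell.
        conn.instanceOperations().setProperty("table.durability", "flush");
        conn.instanceOperations().setProperty("tserver.client.timeout", "5s");

        // Remove per-table overrides, same as "config -t <table> -d <prop>".
        conn.tableOperations().removeProperty("accumulo.metadata", "table.durability");
        conn.tableOperations().removeProperty("accumulo.root", "table.durability");
    }
}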
