Hi Takashi,

Accumulo TabletServers, by default, create WALs sized at ~1GB (think of it as pre-allocating the file). The error you're seeing often occurs because a DataNode cannot actually allocate that much space once its reserved-space threshold is taken into account; the NameNode will not place a block on a DataNode whose remaining space (capacity minus reserved minus used) is smaller than the block. See dfs.datanode.du.reserved in hdfs-site.xml.
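
For example, in hdfs-site.xml (a minimal sketch; the 10GB value is only an illustration, not a recommendation, and the property is bytes reserved per volume):

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- bytes per volume kept free for non-DFS use; 10GB shown purely as an example -->
    <value>10737418240</value>
  </property>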

To help confirm the problem, you can temporarily reduce tserver.walog.max.size from 1G to 128M (or similar); an example is below.
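
In the Accumulo shell, using the same config -s style as your configuration below (newly created WALs should pick the value up):

  config -s tserver.walog.max.size=128M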

I'd also recommend you take a look at the DataNode logs. You might get a clue; a couple of starting points are sketched below.
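
For instance (the DataNode log path below is a guess; it varies by install):

  # how the NameNode sees each DataNode's capacity and remaining space
  hdfs dfsadmin -report
  # scan a DataNode's log for space/allocation complaints
  grep -iE 'space|reserved' /var/log/hadoop/hdfs/*datanode*.log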

- Josh

Takashi Sasaki wrote:
Hello,

We encountered an error on Accumulo 1.7.2.
It looks like an HDFS replication issue, but HDFS is not full.

The actual log is below:
2017-05-15 06:18:40,751 [log.TabletServerLogger] ERROR: Unexpected error writing to log, retrying attempt 43
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
   at org.apache.accumulo.tserver.log.DfsLogger$LoggerOperation.await(DfsLogger.java:235)
   at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:330)
   at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:270)
   at org.apache.accumulo.tserver.log.TabletServerLogger.log(TabletServerLogger.java:405)
   at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.update(TabletServer.java:1043)
   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
   at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
   at com.sun.proxy.$Proxy20.update(Unknown Source)
   at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2470)
   at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2454)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
   at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
   at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
   at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.accumulo.tserver.log.DfsLogger$LogSyncingTask.run(DfsLogger.java:181)
   ... 2 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation.
   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
   at org.apache.hadoop.ipc.Client.call(Client.java:1475)
   at org.apache.hadoop.ipc.Client.call(Client.java:1412)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
   at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
   at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
2017-05-15 06:18:40,852 [log.DfsLogger] WARN : Exception syncing
java.lang.reflect.InvocationTargetException
2017-05-15 06:18:40,852 [log.DfsLogger] ERROR: Failed to close log file
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation.
   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
   at org.apache.hadoop.ipc.Client.call(Client.java:1475)
   at org.apache.hadoop.ipc.Client.call(Client.java:1412)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
   at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
   at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

HDFS web UI info is below:
  Security is off.
  Safemode is off.

  17461 files and directories, 14873 blocks = 32334 total filesystem object(s).
  Heap Memory used 62.81 MB of 91 MB Heap Memory. Max Heap Memory is 1.6 GB.
  Non Heap Memory used 67.14 MB of 69.06 MB Committed Non Heap Memory. Max Non Heap Memory is -1 B.

  Configured Capacity: 132.43 GB
  DFS Used: 12.44 GB (9.39%)
  Non DFS Used: 58.07 GB
  DFS Remaining: 61.92 GB (46.76%)
  Block Pool Used: 12.44 GB (9.39%)
  DataNodes usages% (Min/Median/Max/stdDev):  5.74% / 9.94% / 11.01% / 1.91%
  Live Nodes 5 (Decommissioned: 0)
  Dead Nodes 0 (Decommissioned: 0)
  Decommissioning Nodes 0
  Total Datanode Volume Failures 0 (0 B)
  Number of Under-Replicated Blocks 0
  Number of Blocks Pending Deletion 0
  Block Deletion Start Time 2017/4/19 11:16:31

The Accumulo configuration is below:
  config -s table.cache.block.enable=true
  config -s tserver.memory.maps.native.enabled=true
  config -s tserver.cache.data.size=1G
  config -s tserver.cache.index.size=2G
  config -s tserver.memory.maps.max=2G
  config -s tserver.client.timeout=5s
  config -s table.durability=flush
  config -t accumulo.metadata -d table.durability
  config -t accumulo.root -d table.durability

Accumulo Monitor web UI info is below:
  Accumulo Overview
  Disk Used 904.26M
  % of Used DFS 100.00%
  Tables 57
  Tablet Servers 5
  Dead Tablet Servers 0
  Tablets 1.86K
  Entries 22.60M
  Lookups 35.62M
  Uptime 28d 3h

If you have seen a similar error in the past, could you tell me how to fix it?

Thanks,
Takashi
