[ https://issues.apache.org/jira/browse/ACCUMULO-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Elser updated ACCUMULO-3853: --------------------------------- Status: Reopened (was: Reopened) > Contention around ConcurrentLinkedQueue.size() in AsyncSpanReceiver > ------------------------------------------------------------------- > > Key: ACCUMULO-3853 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3853 > Project: Accumulo > Issue Type: Bug > Components: trace > Affects Versions: 1.7.0 > Reporter: Josh Elser > Assignee: Josh Elser > Fix For: 1.8.0, 1.7.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Trying to debug slowness in the replication merkle test, I saw a lot of stack > traces sitting here: > {noformat} > "replication task 1" #297 daemon prio=5 os_prio=0 tid=0x0000000004624000 > nid=0x6b46 runnable [0x00007ff8867b6000] > java.lang.Thread.State: RUNNABLE > at > java.util.concurrent.ConcurrentLinkedQueue.size(ConcurrentLinkedQueue.java:450) > at > org.apache.accumulo.tracer.AsyncSpanReceiver.receiveSpan(AsyncSpanReceiver.java:171) > at org.apache.htrace.Tracer.deliver(Tracer.java:81) > at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177) > - locked <0x00000000a7fe3560> (a org.apache.htrace.impl.MilliSpan) > at org.apache.htrace.TraceScope.close(TraceScope.java:78) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898) > - locked <0x00000000b0086118> (a > org.apache.hadoop.hdfs.DFSInputStream) > at java.io.DataInputStream.readFully(DataInputStream.java:195) > at java.io.DataInputStream.readFully(DataInputStream.java:169) > at > org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:965) > at > org.apache.accumulo.server.data.ServerMutation.readFields(ServerMutation.java:52) > at > org.apache.accumulo.tserver.logger.LogFileValue.readFields(LogFileValue.java:45) > at > org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.getWalEdits(AccumuloReplicaSystem.java:702) > at > org.apache.accumulo.tserver.replication.AccumuloReplicaSystem$WalClientExecReturn.execute(AccumuloReplicaSystem.java:531) > at > org.apache.accumulo.tserver.replication.AccumuloReplicaSystem$WalClientExecReturn.execute(AccumuloReplicaSystem.java:506) > at > org.apache.accumulo.core.client.impl.ReplicationClient.executeServicerWithReturn(ReplicationClient.java:189) > at > org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.replicateLogs(AccumuloReplicaSystem.java:429) > at > org.apache.accumulo.tserver.replication.AccumuloReplicaSystem._replicate(AccumuloReplicaSystem.java:291) > at > org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.replicate(AccumuloReplicaSystem.java:212) > at > org.apache.accumulo.tserver.replication.ReplicationProcessor.process(ReplicationProcessor.java:134) > at > org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:107) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Locked ownable synchronizers: > - <0x00000000b00dbf48> (a > java.util.concurrent.ThreadPoolExecutor$Worker) > {noformat} > Seems like this might still be related to HDFS-8069, but one odd thing is > that we were sitting in the size() method. > For those who don't remember, size() is a big o N operation on > ConcurrentLinkedQueue. Calling the size() method every time we receive a new > span is a little nasty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)