Hi,

We are starting to investigate an issue where one tserver stayed up but became
slow/unresponsive for several hours, and during that time all writes to our 20+
servers began to fail. Leading up to the failure, we could see that the writes
were distributed among all of the tablet servers, so it wasn't a hotspot.
Whenever we receive a MutationsRejectedException, we recreate the BatchWriter
(ACCUMULO-2990). I'm digging into the TabletServerBatchWriter code, but does
anyone have ideas about what could cause this? Is there some sort of
initialization or health checking that the client does where a single slow
server could impact writes to all of the others?
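
The recreate-on-failure handling described above can be sketched roughly as
follows. This is a minimal, self-contained illustration and not the actual
application code: Writer and RejectedException are hypothetical stand-ins for
Accumulo's BatchWriter and MutationsRejectedException, and writeWithRecreate
and its maxAttempts limit are assumptions made for the example.

```java
import java.util.List;
import java.util.function.Supplier;

class RecreateOnReject {

    /** Hypothetical stand-in for Accumulo's MutationsRejectedException. */
    static class RejectedException extends Exception {
        RejectedException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the parts of BatchWriter used here. */
    interface Writer {
        void add(String mutation) throws RejectedException;
        void flush() throws RejectedException;
        void close() throws RejectedException;
    }

    /**
     * Writes the batch with a fresh writer per attempt. After a rejection the
     * writer is closed and discarded, and the whole batch is replayed on a
     * new one. Returns the number of attempts used, or rethrows the last
     * rejection once maxAttempts is exhausted.
     */
    static int writeWithRecreate(Supplier<Writer> factory, List<String> batch,
                                 int maxAttempts) throws RejectedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RejectedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Writer writer = factory.get(); // never reuse a writer that failed
            try {
                for (String mutation : batch) {
                    writer.add(mutation);
                }
                writer.flush();
                return attempt;
            } catch (RejectedException e) {
                last = e; // remember the failure and retry with a new writer
            } finally {
                try {
                    writer.close();
                } catch (RejectedException ignored) {
                    // the writer is being discarded either way
                }
            }
        }
        throw last;
    }
}
```

Note that in this sketch every new writer still sends to whichever servers
host the tablets in the batch, so recreating the writer does not by itself
route around a server that keeps timing out.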

Thanks.

-Mike

Caused by: org.apache.accumulo.core.client.TimedOutException: Servers timed out [pnj-bvlt-r4n03.abc.com:31113]
    at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$TimeoutTracker.wroteNothing(TabletServerBatchWriter.java:177) ~[stormjar.jar:1.0]
    at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$TimeoutTracker.errorOccured(TabletServerBatchWriter.java:182) ~[stormjar.jar:1.0]
    at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:933) ~[stormjar.jar:1.0]
    at
