Hello Accumulo folks,

I'm sending this to the dev list to see if anyone else has any thoughts. I created this PR to fix a bad situation with merging minor compactions in version 1.10: https://github.com/apache/accumulo/pull/2708
Here is the situation in which a Tablet couldn't flush. The tserver hosting the hot-spot Tablet was hitting its max WAL limit (TABLE_MINC_LOGS_MAX), so it was forcing a flush of the Tablet. The client (TabletServerBatchWriter) would try to flush its data by calling applyUpdates() on the current commit session, but the user was seeing HoldTimeoutException on the client side. The flush would time out on the tserver, presumably because it hit the max number of write threads and/or the connection pools filled up. The WALs kept growing because the Tablet wasn't flushing. Major compactions would complete, but the hot-spot Tablet would get stuck trying to flush.

The quick fix is to restart the tablet server hosting the hot-spot Tablet, but that won't prevent the situation from happening again.

FYI, these troublesome flushes (M files) have been removed in version 2.1.
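To illustrate why the WALs keep growing when the forced flush never finishes, here is a deliberately simplified toy model. It is not Accumulo code: the class `StuckFlushModel`, the `simulate` method, and the threshold value 3 are hypothetical stand-ins (the threshold plays the role of TABLE_MINC_LOGS_MAX). It only shows the feedback loop. Once the tablet references more WALs than the threshold, a flush is requested. If that flush completes, older WAL references are released. If it stays stuck, every rollover adds another referenced WAL.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a tablet whose forced flush never completes (hypothetical, not Accumulo code). */
public class StuckFlushModel {
    // Hypothetical stand-in for the table.minc.logs.max threshold.
    static final int MAX_LOGS = 3;

    /**
     * Simulates `rolls` WAL rollovers. A flush is requested once the tablet
     * references more than MAX_LOGS WALs. If flushSucceeds is false, the flush
     * never finishes, so no WAL references are ever released.
     * Returns the number of WALs referenced after each rollover.
     */
    static List<Integer> simulate(int rolls, boolean flushSucceeds) {
        List<Integer> history = new ArrayList<>();
        int referencedLogs = 0;
        for (int i = 0; i < rolls; i++) {
            referencedLogs++; // a new WAL is referenced while un-flushed data remains
            if (referencedLogs > MAX_LOGS && flushSucceeds) {
                referencedLogs = 1; // completed flush: only the current WAL stays referenced
            }
            history.add(referencedLogs);
        }
        return history;
    }

    public static void main(String[] args) {
        System.out.println("healthy: " + simulate(6, true));
        System.out.println("stuck:   " + simulate(6, false));
    }
}
```

In the healthy run the referenced count cycles back down after each flush. In the stuck run it climbs with every rollover, which matches the growing WALs described above.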