UNOFFICIAL

A correction to my description:

Looking at the Accumulo GUI, the 'Table Problems' section shows 8K errors 
stating:

Table:         Table
Problem Type:  TABLET_LOAD
Server:        host
Time:          datetime
Resource:      resourceUUID
Exception:     java.io.IOException: ....FileNotFoundException: 
               File does not exist: 
               hdfs://system/accumulo/recovery/434adfsdf124312f/failed/data

These seem to correspond to records in the accumulo.metadata table:

scan -t accumulo.metadata -b ~err

~err_zxn  TABLET_LOAD: ............

From: Dickson, Matt MR <[email protected]>
Sent: Tuesday, 17 December 2019 12:29 PM
To: [email protected]
Subject: ASSIGNED_TO_DEAD_SERVER #walogs:2 [SEC=UNOFFICIAL]

UNOFFICIAL

Hi,

I'm trying to recover from an issue caused by table.split.threshold being set 
to a very low size. This generated a massive load on ZooKeeper, and cluster 
nodes timed out communicating with ZooKeeper while Accumulo was splitting 
tablets.  This was noticed when tablet servers were being declared dead.

I've corrected the threshold and Accumulo is back online; however, there are 8K 
unhosted tablets that are not coming online.

Running the checkTablets script produces exactly as many errors as there are 
unhosted tablets, each with a message like:

4d4;blah::words::4gfv43@(host:9997[23423442344234f23fd],null,null) is 
ASSIGNED_TO_DEAD_SERVER #walogs:2
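
To see which servers the stuck tablets are assigned to, the saved script output could be tallied by state and server. The following is only a rough sketch and not part of the checkTablets tooling: the regular expression and the `tally` function are my own assumptions, modelled on the single sample message above, and they assume the tuple after `@` closes with a parenthesis and that a record may be wrapped across lines.

```python
import re
from collections import Counter

# Assumed record shape, inferred from the one sample message above:
#   <extent>@(<host:port>[<session>],<field>,<field>) is <STATE> #walogs:<n>
RECORD = re.compile(
    r"(?P<extent>\S+)@\((?P<server>[^\[,]+)\[(?P<session>[^\]]*)\],[^)]*\)"
    r"\s+is\s+(?P<state>[A-Z_]+)\s+#walogs:(?P<walogs>\d+)"
)


def tally(text):
    """Count (state, server) pairs found in saved checkTablets output."""
    counts = Counter()
    for match in RECORD.finditer(text):
        counts[(match.group("state"), match.group("server"))] += 1
    return counts


# The sample message, wrapped across two lines as it appears in this email.
sample = (
    "4d4;blah::words::4gfv43@(host:9997[23423442344234f23fd],null,null) is\n"
    "ASSIGNED_TO_DEAD_SERVER #walogs:2\n"
)

for (state, server), count in sorted(tally(sample).items()):
    print(f"{count}\t{state}\t{server}")
```

If all 8K messages point at a small set of servers, that would at least confirm the unhosted tablets all trace back to the tablet servers that were declared dead.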

I'm not concerned if data in these tablets is lost in returning the system to 
a healthy state, because I suspect other Accumulo operations can't proceed 
while tablets are unhosted, so I just need to remove these issues.

Any advice would be great.

Thanks in advance,
Matt
