Are you using the importdirectory command? The importdirectory command examines the rfiles and assigns the files to tablets according to the row keys in the files.
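
For reference, the bulk import in the 1.x shell looks roughly like the sketch below; the table name and directories are placeholders, and the failure directory should already exist in HDFS and be empty:

    table my_table
    importdirectory /bulk/batch1 /bulk/failures1 false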

 

If you want the splits from the original table, you can use the getsplits command and then set splits on the “new” table using those values (or a subset). There is a -m option if you want fewer splits (say, to merge tablets if you have many small splits). If you don’t have the splits (or don’t care), you can let Accumulo determine new splits on the import; Accumulo will create new splits based on the table.split.threshold property for the table (config -t xxxx). The number of splits (tablets) at the start of an import determines how many tservers can participate in the import. If you elect to let Accumulo determine new splits, start with smaller batches and then ramp up as more splits are created.
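
A minimal sketch of carrying the splits over, assuming placeholder table names and that getsplits writes to a local file which addsplits can read back with -sf:

    # on the original table: capture at most 100 evenly spaced splits
    getsplits -t old_table -m 100 -o /tmp/old_table.splits

    # on the new cluster: pre-split the new table from that file
    createtable new_table
    addsplits -t new_table -sf /tmp/old_table.splits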

 

One strategy that could help eliminate distcp-ing the files again, if you already have the files on the “target” cluster, would be to copy or move the files to separate directories, say hdfs://path//batch1, … batch2, … with some fraction of the original files in each batch. Then use the importdirectory command for each batch. Import a batch, wait for the ingest to complete and things to settle down (watch the master log for splitting, or the UI for a stable number of tablets, compactions, …), and then import batch2, …, batchn (or you could distcp again, especially if the files have been changed by what you just did).
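
A sketch of that batching, with hypothetical paths and file names (the failure directory given to importdirectory should be empty for each batch):

    # HDFS: stage some fraction of the rfiles into a batch directory
    hdfs dfs -mkdir -p /bulk/batch1 /bulk/fail1
    hdfs dfs -mv /bulk/staged/F00001*.rf /bulk/batch1/

    # Accumulo shell: import one batch at a time into the new table
    table new_table
    importdirectory /bulk/batch1 /bulk/fail1 false
    # wait for splitting/compactions to settle, then repeat for batch2, batch3, ...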

 

You may also want to consider increasing the split threshold on the “new” table while doing this, and then setting it back to your desired value once the imports are done. This temporarily limits the splitting, and the result may be more evenly split at the end; it really depends on how “even” your splits are going in.
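
For example (10G is just an illustrative value; 1G is the default threshold):

    # before the bulk imports: discourage splitting
    config -t new_table -s table.split.threshold=10G
    # after the last batch: restore the threshold you actually want
    config -t new_table -s table.split.threshold=1G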

 

You want to balance the number of files that you import in a batch with the number of tablets that you have. That is, if you have a new table and no splits, don’t just throw all of the files at it with importdirectory; it will be a lot of work for that one tserver. If you start with a limited number of splits and then give it enough files to create some splits, you can then increase the batch size accordingly.

 

If you have the files on the “new” system and they are not shared by other tables, you can (a rough command sketch follows this list):

 

- Take the current table offline.

- Use hdfs move to relocate the files from under hdfs://accumulo_ns/tables/[table_id]/[dir]/*.rf to the batch directories. If you do this, you probably want to batch the number of files you give hdfs at a time (500…750) to be friendly to the namenode. You can place multiple groups of the 500…750 into a directory, say 10,000 files (again assuming that you have enough splits already, otherwise start with a smaller number). If you have “cloned” a table, multiple tables can reference the same file; if this is your situation, do NOT move the files. Copy them instead, and then compact the other tables; otherwise you risk corrupting the metadata and having it point to non-existent files.

- Use the importdirectory command for each batch directory.

- Once the files have been moved, you can delete the original, source table. The delete command will examine the metadata and, as long as no other tables have references to the files, it just deletes the directory.
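
Roughly, with a hypothetical table id (3a) and batch paths, and assuming the files are not shared by a cloned table:

    # Accumulo shell: take the source table offline before touching its files
    offline old_table

    # HDFS: move one batch of the rfiles into a staging directory
    hdfs dfs -mkdir -p /bulk/batch1 /bulk/fail1
    hdfs dfs -mv /accumulo/tables/3a/default_tablet/*.rf /bulk/batch1/

    # Accumulo shell: bulk import the batch into the new table
    table new_table
    importdirectory /bulk/batch1 /bulk/fail1 false

    # once every batch has been moved and imported
    deletetable old_table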

 

Another approach could be the importtable command. Importtable uses the files from an exporttable command to set the metadata and configuration and recreates the same file assignments. This could be faster, but there is less validation done on the import (basically none); it trusts that the exporttable files represent absolute truth. Since you are coming from a system with metadata issues, I would not recommend this command for this situation.
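
For completeness, the mechanics look roughly like this (paths are placeholders; exporttable requires the source table to be offline and writes a file list that can be fed to distcp):

    offline old_table
    exporttable -t old_table /export/old_table
    # distcp the export directory and the listed rfiles to the new cluster, then:
    importtable new_table /export/old_table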

 

Ed Coleman

 

From: pranav.puri [mailto:pranav.p...@orkash.com] 
Sent: Wednesday, February 27, 2019 11:27 PM
To: user@accumulo.apache.org
Subject: Re: All tablets are down

 

Hi

The accumulo version is 1.8.1. I have tried copying the rf files. 

Here are the steps that I followed :

1. Create a table of the same name in the new accumulo cluster (v1.7).

2. Then distcp the folder corresponding to the table to the new accumulo instance (1.7). The folder association with the table was checked through the tables -l command on the old accumulo cluster.

3. The files are replaced for the table in the new accumulo cluster.

4. When I did this for a small table it worked (when the associated folder had just the default folder and no splits). But when it was done for a large table whose folder had many splits in addition to the default folder, I was unable to import that data.

I have gone through the link that you've sent. And the above steps are pretty 
similar.

Can you please explain this further:

- If you have data you need to keep, you can bulk import the existing RFiles 
into a new instance if you can associate the files to a table.  Having the old 
splits would be helpful but is not necessary.

Here I can associate rfiles with the table, but how can we import the table with splits? Also, we have the metadata table intact; can we use that?

thanks

Pranav

On 27/02/19 7:29 PM, Michael Wall wrote:

Pranav, 

 

Having a corrupt root table and replacing it with an empty file means Accumulo does not know anything about the metadata table. Without the metadata table, Accumulo knows nothing about the other tables.

 

If you can do so, I suggest starting over.  If you have data you need to keep, 
you can bulk import the existing RFiles into a new instance if you can 
associate the files to a table.  Having the old splits would be helpful but is 
not necessary.

 

What version of Accumulo are you using?  Take a look at 
https://accumulo.apache.org/1.9/accumulo_user_manual.html#_advanced_system_recovery.

 

It would be good to understand what led to the corruption in the accumulo.root 
table.

 

Mike

 

On Wed, Feb 27, 2019 at 7:27 AM pranav.puri <pranav.p...@orkash.com 
<mailto:pranav.p...@orkash.com> > wrote:

Hi,

I have set up accumulo on a two node hadoop HA cluster. In this setup, the rf file kept in +r has been corrupted (the rf file related to the accumulo.root table). This was checked through the fsck command, and because of this accumulo was not starting up.

As a troubleshooting step, I removed this rf file and replaced it with an empty file of the same name. After this, accumulo starts up and all the tables are online, but there are no entries present for these tables. No tablets are there but one. Also, I am able to scan just the root table, probably because I have replaced it with the new file. If I try to scan any other table, the shell hangs.

Please let me know how to handle this. Also, please mention if any other 
details are required.

Regards
Pranav
