Are you using the importdirectory command? The importdirectory command examines the RFiles and assigns the files to tablets according to the row keys in the files.
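A minimal sketch of that bulk import from the Accumulo shell (1.x syntax); the table name and HDFS paths here are hypothetical, and the failure directory must exist and be empty before the import:

```shell
# Create an empty failure directory for any files that cannot be imported
hdfs dfs -mkdir -p /data/mytable_fail

# In the Accumulo shell: select the target table, then bulk-import the
# directory of RFiles. The trailing "true" tells Accumulo to set the
# entry timestamps at import time.
accumulo shell -u root
root@instance> table mytable
root@instance mytable> importdirectory /data/mytable_batch1 /data/mytable_fail true
```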
If you want the splits from the original table, you can use the getsplits command and then set splits on the "new" table using those values (or a subset). There is a -m option if you want fewer splits (say, to merge tablets if you have many small ones). If you don't have the splits (or don't care), you can let Accumulo determine new splits on the import; Accumulo will create new splits based on the table.split.threshold parameter for the table (config -t xxxx). The number of splits (tablets) at the start of an import determines how many tservers can participate in the import. If you elect to let Accumulo determine new splits, start with smaller batches and then ramp up as more splits are created.

One strategy that could help you avoid distcp-ing the files again, if you already have the files on the "target" cluster, is to copy or move the files into separate directories, say hdfs://path/batch1, … batch2, …, with some fraction of the original files in each batch. Then use the importdirectory command for each batch: import a batch, wait for the ingest to complete and things to settle down (watch the master log for splitting, or the UI for a stable number of tablets, compactions, …), and then import batch2, … batchN. (Or you could distcp again, especially if the files have been changed by what you just did.)

You may also want to consider increasing the split threshold on the "new" table while doing this, and then setting it to your desired size once the imports are done. This will temporarily limit splitting, and the result may be more evenly split at the end; it really depends on how "even" your splits are going in. You want to balance the number of files you import in a batch against the number of tablets you have. That is, if you have a new table with no splits, don't just throw all of the files at it with importdirectory; that will be a lot of work for that one tserver.
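The split handling above might look like this in the shell; the table names, file paths, and threshold values are made up for illustration (getsplits -o writes, and addsplits -sf reads, a file local to the machine running the shell):

```shell
# On the source cluster: capture the existing splits (-m caps the count,
# useful if many small tablets should effectively be merged on the way over)
root@old> getsplits -t mytable -m 1000 -o /tmp/mytable.splits

# On the target cluster: recreate the table with those splits, and raise
# the split threshold temporarily to limit churn during the batched imports
root@new> createtable mytable
root@new> addsplits -t mytable -sf /tmp/mytable.splits
root@new> config -t mytable -s table.split.threshold=10G
# ... importdirectory for batch1, wait for things to settle, then batch2, ...
root@new> config -t mytable -s table.split.threshold=1G
```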
If you start with a limited number of splits and then give it enough files to create some splits, you can increase the batch size accordingly.

If you have the files on the "new" system and they are not shared by other tables, you can:

- Take the current table offline.
- Use hdfs move to relocate the files from under hdfs://accumulo_ns/tables/[table_id]/[dir]/*.rf to the batch directories. If you do this, you probably want to batch the number of files you give hdfs at a time (500-750) to be friendly to the namenode. You can place multiple groups of 500-750 into a directory, say 10,000 files (again assuming that you already have enough splits; otherwise start with a smaller number).
- If you have "cloned" a table, multiple tables can reference the same file. If this is your situation, Do Not move the files: copy them, and then compact the other tables; otherwise you risk corrupting the metadata and having it point to non-existent files.
- Use the importdirectory command for each batch directory.
- Once the files have been moved, you can delete the original source table. The delete command will examine the metadata and, as long as no other tables have references to the files, it just deletes the directory.

Another approach could be the importtable command. Importtable uses the files from an exporttable command to set the metadata and configuration and assigns the files to the same tablets. This could be faster, but there is less validation done on the import (basically none); it trusts that the exporttable files represent absolute truth. Since you are coming from a system with metadata issues, I would not recommend this command for your situation.

Ed Coleman

From: pranav.puri [mailto:pranav.p...@orkash.com]
Sent: Wednesday, February 27, 2019 11:27 PM
To: user@accumulo.apache.org
Subject: Re: All tablets are down

Hi

The Accumulo version is 1.8.1. I have tried copying the RFiles. Here are the steps that I followed:

1. Create a table of the same name in the new Accumulo cluster (v1.7).
2. Distcp the folder corresponding to the table to the new Accumulo instance (1.7). The folder's association with the table was checked through the tables -l command on the old Accumulo cluster.
3. The files are replaced for the table in the new Accumulo cluster.
4. When I did this for a small table it worked (when the associated folder had just the default folder and no splits). But when it was done for a large table which had many splits other than the default folder, I was unable to import that data.

I have gone through the link that you've sent, and the above steps are pretty similar. Can you please explain this further:

- If you have data you need to keep, you can bulk import the existing RFiles into a new instance if you can associate the files to a table. Having the old splits would be helpful but is not necessary.

Here I can associate RFiles with the table, but how can we import a table with splits? Also, we have the metadata table intact; can we use that?

thanks
Pranav

On 27/02/19 7:29 PM, Michael Wall wrote:

Pranav,

Having a corrupt root table and replacing it with an empty file means Accumulo does not know anything about the metadata table. Without the metadata table, Accumulo knows nothing about the other tables. If you can do so, I suggest starting over. If you have data you need to keep, you can bulk import the existing RFiles into a new instance if you can associate the files to a table. Having the old splits would be helpful but is not necessary. What version of Accumulo are you using?

Take a look at https://accumulo.apache.org/1.9/accumulo_user_manual.html#_advanced_system_recovery. It would be good to understand what led to the corruption in the accumulo.root table.

Mike

On Wed, Feb 27, 2019 at 7:27 AM pranav.puri <pranav.p...@orkash.com <mailto:pranav.p...@orkash.com> > wrote:

Hi,

I have set up Accumulo on a two node Hadoop HA cluster.
In this cluster, the RFile kept in +r has been corrupted (the RFile belonging to the accumulo.root table). This was checked through the fsck command, and because of it Accumulo was not starting up. As a troubleshooting step, I removed this RFile and replaced it with an empty file of the same name. After this, Accumulo starts up and all the tables are online, but there are no entries present for these tables, and no tablets exist but one. Also, I am able to scan just the root table, probably because I have replaced it with the new file. If I try to scan any other table, the shell hangs.

Please let me know how to handle this. Also, please mention if any other details are required.

Regards
Pranav