I have a large amount of data that I am batch loading into Accumulo. I'm using MapReduce to read in chunks of data and write out RFiles, which are then loaded with importDirectory. I've noticed that the import blocks for longer and longer as more data is added. For instance, for one table that currently has ~2500 tablets, the importDirectory call now takes around 2 hours to return.
Poking around in the source for TableOperationsImpl (1.5.0), I see that some operations (like compact) have an option not to wait for completion. Would it be dangerous to (optionally) return immediately from importDirectory and instead check the failure directory to detect errors in the import? I know this will eventually cause files to back up in the staging directories, but is there any potential to corrupt the tables?

Thanks,
Ed
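
For reference, here is a minimal sketch of a client-side variant of the idea: the blocking import call is handed to a background thread, and the failure directory is inspected only after that call has returned. It does not change anything inside TableOperationsImpl and says nothing about whether skipping the wait there is safe. The class and method names (AsyncBulkImport, submitImport, listFailures) are made up for illustration, the import itself is passed in as a generic Callable that would wrap `connector.tableOperations().importDirectory(table, dir, failDir, false)`, and the failure directory is read through java.nio as a stand-in for the equivalent HDFS listing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncBulkImport {

    /**
     * Hands the blocking import call to a background thread and returns at once.
     * In real use the Callable would wrap the importDirectory call (assumption).
     */
    public static Future<Void> submitImport(ExecutorService pool, Callable<Void> blockingImport) {
        return pool.submit(blockingImport);
    }

    /**
     * Lists the names of the files found in the failure directory.
     * A local path is used here; on HDFS this would be a FileSystem listing (assumption).
     */
    public static List<String> listFailures(Path failDir) throws IOException {
        List<String> failed = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(failDir)) {
            for (Path entry : entries) {
                failed.add(entry.getFileName().toString());
            }
        }
        return failed;
    }

    public static void main(String[] args) throws Exception {
        Path failDir = Files.createTempDirectory("bulk-fail");
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Placeholder for the real, long-running import call.
        Future<Void> pending = submitImport(pool, () -> null);

        // ... the caller is free to prepare the next batch here ...

        pending.get();   // rethrows any exception raised by the import
        pool.shutdown();

        System.out.println("Files left in failure directory: " + listFailures(failDir));
    }
}
```

The failure directory is checked only after `pending.get()` returns, so the sketch never relies on the directory's contents while the import is still in flight.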
