Hi,

I have a question about using bulk ingestion for a rather special case. Say I have two locality groups, A and B. The values of each locality group are written to Accumulo at different times: first we ingest all the cells of group A, and then those of group B. We use Spark to ingest these records. Right now we write all the values with a custom writer, but we would like to create the RFiles directly with Spark. In the case above, we would have two jobs, each creating the RFiles for one of the two locality groups.

Is Accumulo able to import these files, given that they belong to two different locality groups, without triggering a huge major compaction? If not, what strategy would you suggest for this use case?

Thanks,
Mario

--
Mario Pastorelli | software engineer, Teralytics AG
