Does the data get distributed if you compact the table?
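If you have not tried it yet, a full major compaction can be kicked off from
the shell with "compact -t <tablename> -w", or from the Java API. A minimal
sketch, assuming an existing Connector named conn (the names here are
illustrative, not from your setup):

    import org.apache.accumulo.core.client.AccumuloException;
    import org.apache.accumulo.core.client.AccumuloSecurityException;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.TableNotFoundException;

    public class CompactTable {
        // Major-compact the whole table and block until it finishes.
        static void compactAll(Connector conn, String table)
                throws AccumuloException, AccumuloSecurityException,
                       TableNotFoundException {
            // null start/end rows cover the entire table; flush=true writes
            // out in-memory data first; wait=true blocks until completion.
            conn.tableOperations().compact(table, null, null, true, true);
        }
    }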

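For reference, the pre-split plus batch writer setup described below would
look roughly like this with the 1.8 Java API. This is only a sketch; the
split points and row keys are made-up placeholders, not the actual sharding
scheme:

    import java.util.TreeSet;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.hadoop.io.Text;

    public class PreSplitIngest {
        static void ingest(Connector conn, String table) throws Exception {
            // 3 initial split points -> 4 tablets to spread the load.
            TreeSet<Text> splits = new TreeSet<>();
            splits.add(new Text("shard1"));
            splits.add(new Text("shard2"));
            splits.add(new Text("shard3"));
            conn.tableOperations().addSplits(table, splits);

            // Stream mutations through the batch writer API.
            try (BatchWriter bw =
                    conn.createBatchWriter(table, new BatchWriterConfig())) {
                Mutation m = new Mutation("shard1_row00001");
                m.put("cf", "cq", "value");
                bw.addMutation(m);
            }
        }
    }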
> On May 23, 2017 at 5:04 AM Massimilian Mattetti <[email protected]> wrote:
> 
>     Hi all,
> 
>     I created a table with 3 initial split points (I used a sharding 
> mechanism to evenly distribute the data among them) and started ingesting 
> data using the batch writer API. At the end of the ingestion process I had 
> around 1.01K tablets (the split threshold was set to 1GB) for a total of 
> 600GB of space on HDFS (measured by running hadoop fs -du -h on the table 
> directory). Digging into the table directory on HDFS, I noticed that 
> around 700 tablets (directories starting with t-) are empty, another 300 
> tablets hold around 1GB or less of data each, and 3 tablets 
> (default_tablet included) contain 130GB of data each. Is this normal 
> behavior? (I am working with a cluster of 3 servers running Accumulo 
> 1.8.1.)
> 
>     I also ran another experiment, importing the same data into a different 
> table configured in the same way as the previous one, but this time using 
> bulk import. This table ended up with no empty tablets, although most of 
> them contain only a few MB, and the final space on HDFS was around 450GB. 
> What could be the reason for such a big difference in disk usage between 
> the batch writer API and bulk import?
>     Thanks.
> 
>     Best Regards,
>     Max
> 

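For completeness, the bulk import path from the second experiment above
corresponds to roughly the following. Again just a sketch: it assumes the
RFiles were already produced (e.g. by a MapReduce job writing through
AccumuloFileOutputFormat), and the HDFS paths are illustrative:

    import org.apache.accumulo.core.client.Connector;

    public class BulkImport {
        static void bulkImport(Connector conn, String table) throws Exception {
            String dir = "/tmp/bulk/files";       // directory of RFiles on HDFS
            String failureDir = "/tmp/bulk/fail"; // must exist and be empty
            // setTime=false keeps the timestamps already written in the RFiles.
            conn.tableOperations().importDirectory(table, dir, failureDir, false);
        }
    }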