Does the data get distributed if you compact the table?
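
For reference, a full compaction can be forced from the Accumulo shell with
"compact -t <table> -w", or through the Java API. A minimal sketch, assuming
placeholder instance name, ZooKeeper quorum, credentials, and table name:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class CompactTable {
  public static void main(String[] args) throws Exception {
    // Placeholder instance name, ZooKeeper quorum, and credentials.
    ZooKeeperInstance inst = new ZooKeeperInstance("myInstance", "zk1:2181");
    Connector conn = inst.getConnector("root", new PasswordToken("secret"));

    // Compact the entire table (null start/end rows), flushing in-memory
    // data first and blocking until the compaction finishes.
    conn.tableOperations().compact("mytable", null, null, true, true);
  }
}
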
> On May 23, 2017 at 5:04 AM Massimilian Mattetti <[email protected]> wrote:
>
> Hi all,
>
> I created a table with 3 initial split points (I used a sharding mechanism
> to evenly distribute the data between them) and started ingesting data
> using the batch writer API. At the end of the ingestion process I had
> around 1.01K tablets (the split threshold was set to 1GB) for a total of
> 600GB on HDFS (measured with hadoop fs -du -h on the table directory).
> Digging into the table directory on HDFS, I noticed that around 700 tablet
> directories (names starting with t-) are empty, another 300 tablets hold
> around 1GB or less of data, and 3 tablets (default_tablet included) hold
> 130GB each. Is this normal behavior? (I am working with a cluster of
> 3 servers running Accumulo 1.8.1.)
>
> I also ran another experiment, importing the same data into a different
> table configured the same way as the previous one, but this time using
> bulk import. That table ended up with no empty tablets, although most of
> them contain only a few MB, and the final space on HDFS was around 450GB.
> What could cause such a big difference in disk usage between the batch
> writer API and bulk import?
> Thanks.
>
> Best Regards,
> Max
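
For anyone reproducing the setup described above, here is a minimal sketch of
the pre-split table plus batch writer ingestion, with the bulk import call as
the alternative path at the end. The instance name, credentials, table name,
split points, row data, and HDFS paths are all placeholders:

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class PresplitIngest {
  public static void main(String[] args) throws Exception {
    // Placeholder instance name, ZooKeeper quorum, and credentials.
    ZooKeeperInstance inst = new ZooKeeperInstance("myInstance", "zk1:2181");
    Connector conn = inst.getConnector("root", new PasswordToken("secret"));

    // Create the table with 3 initial split points; the shard prefix on
    // each row key decides which of the 4 resulting tablets it lands in.
    conn.tableOperations().create("mytable");
    SortedSet<Text> splits = new TreeSet<>();
    splits.add(new Text("1"));
    splits.add(new Text("2"));
    splits.add(new Text("3"));
    conn.tableOperations().addSplits("mytable", splits);

    // Path 1: live ingest through the batch writer API.
    BatchWriter bw = conn.createBatchWriter("mytable", new BatchWriterConfig());
    try {
      Mutation m = new Mutation("1_row0001"); // shard prefix + row id
      m.put("cf", "cq", new Value("data".getBytes()));
      bw.addMutation(m);
    } finally {
      bw.close();
    }

    // Path 2: bulk import of pre-generated RFiles (HDFS paths are
    // placeholders; the failure directory must exist and be empty).
    conn.tableOperations().importDirectory("mytable", "/tmp/bulk/files",
        "/tmp/bulk/failures", false);
  }
}
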
