I'm talking about the store file size, and about the ratio between the store
file size on disk and the byte count as computed in PutSortReducer.
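
For reference, here is roughly how I count the bytes per Put in the reducer.
This is only a sketch against the 0.94-style client API (where getFamilyMap()
returns lists of KeyValues); the class and method names are just for
illustration:

    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Put;

    public class PutSizeCounter {
      // Pre-load estimate: sum the serialized lengths of every KeyValue in
      // a Put. getLength() covers the whole cell (row key, family,
      // qualifier, timestamp, type and value), so the per-cell key overhead
      // is counted for every cell.
      public static long putBytes(Put put) {
        long size = 0;
        for (List<KeyValue> kvs : put.getFamilyMap().values()) {
          for (KeyValue kv : kvs) {
            size += kv.getLength();
          }
        }
        return size;
      }
    }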


On Wed, Jan 15, 2014 at 5:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> See previous discussion: http://search-hadoop.com/m/85S3A1DgZHP1
>
>
> On Wed, Jan 15, 2014 at 5:44 AM, Amit Sela <am...@infolinks.com> wrote:
>
> > Hi all,
> > I'm trying to measure the size (in bytes) of the data I'm about to load
> > into HBase.
> > I'm using bulk load with PutSortReducer, and all bulk-loaded data goes
> > into new regions rather than into existing ones.
> >
> > To count the size of all KeyValues in each Put, I iterate over the Put's
> > familyMap.values() and sum the KeyValue lengths.
> > After loading the data, I check the region sizes by summing
> > RegionLoad.getStorefileSizeMB() over the regions.
> > Counting the Put sizes this way predicted ~500MB per region, but in
> > practice I got ~32MB per region.
> > The table uses GZ compression, but that alone should not account for
> > such a difference.
> >
> > Is counting the Put's KeyValues the correct way to measure a row's size?
> > Is it comparable to the store file size?
> >
> > Thanks,
> > Amit.
> >
>
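
For completeness, this is roughly how I sum the store file sizes after the
load. Again a sketch against the 0.94-style API (newer versions moved
RegionLoad out of HServerLoad); the class name is just for illustration:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.ClusterStatus;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HServerLoad;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class StorefileSizeCounter {
      // Post-load measurement: sum getStorefileSizeMB() over every region
      // reported in the cluster status (filter by table name if needed).
      public static long storefileMB(HBaseAdmin admin) throws IOException {
        long totalMB = 0;
        ClusterStatus status = admin.getClusterStatus();
        for (ServerName server : status.getServers()) {
          HServerLoad load = status.getLoad(server);
          for (HServerLoad.RegionLoad rl : load.getRegionsLoad().values()) {
            totalMB += rl.getStorefileSizeMB();
          }
        }
        return totalMB;
      }

      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          System.out.println("Total store file MB: " + storefileMB(admin));
        } finally {
          admin.close();
        }
      }
    }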
