Re: Question on the number of column families

Ted Yu Wed, 06 Aug 2014 20:39:25 -0700

bq. While scanning, an entire row will be read even for a rowkey filtering

If your filter designates an essential column family, the above is not
true: only the essential column family is loaded into memory first. The
other families are loaded only for rows that pass the filter.
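
To make the two-phase load concrete, here is a toy model in plain Java. It
is only a sketch of the idea, not HBase code: the class JoinedScanSketch,
the family names "meta" and "data", and the in-memory table are all
hypothetical. In a real scan the behaviour is switched on with
setLoadColumnFamiliesOnDemand(), as discussed in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model of "essential column family" scanning. NOT HBase classes:
// every name here is hypothetical and exists only for illustration.
class JoinedScanSketch {
    // rowkey -> (column family -> value); TreeMap keeps iteration order stable.
    final Map<String, Map<String, String>> table;
    // How many times each family was read, to show what lazy loading saves.
    final Map<String, Integer> familyReads = new HashMap<>();

    JoinedScanSketch(Map<String, Map<String, String>> table) {
        this.table = table;
    }

    // Simulates reading one family of one row and records the read.
    private String read(String row, String family) {
        familyReads.merge(family, 1, Integer::sum);
        return table.get(row).get(family);
    }

    // Phase 1: read only the essential family and run the filter on it.
    // Phase 2: for rows that pass, read the remaining families.
    List<String> scan(String essentialFamily, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String row : table.keySet()) {
            String essentialValue = read(row, essentialFamily);
            if (!filter.test(essentialValue)) {
                continue; // the other families of this row are never read
            }
            StringBuilder sb = new StringBuilder(row);
            for (String family : table.get(row).keySet()) {
                String value = family.equals(essentialFamily)
                        ? essentialValue
                        : read(row, family);
                sb.append(' ').append(family).append('=').append(value);
            }
            out.add(sb.toString());
        }
        return out;
    }

    // Three rows: a small "meta" family and a (pretend) large "data" family.
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> t = new TreeMap<>();
        String[][] rows = {
            {"r1", "keep", "big1"}, {"r2", "drop", "big2"}, {"r3", "keep", "big3"}
        };
        for (String[] r : rows) {
            Map<String, String> families = new TreeMap<>();
            families.put("meta", r[1]);
            families.put("data", r[2]);
            t.put(r[0], families);
        }
        return t;
    }

    public static void main(String[] args) {
        JoinedScanSketch s = new JoinedScanSketch(sampleTable());
        List<String> rows = s.scan("meta", v -> v.equals("keep"));
        rows.forEach(System.out::println);
        System.out.println("reads per family: " + s.familyReads);
    }
}
```

In this model "meta" is read for all three rows, while "data" is read only
for the two rows that pass the filter. The saving grows with the size of the
non-essential family and with the fraction of rows the filter rejects.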


Cheers


On Wed, Aug 6, 2014 at 4:00 AM, innowireless TaeYun Kim <
[email protected]> wrote:

> Hi Ted,
>
> I have now finished reading the filtering section and the source code of
> TestJoinedScanners (0.94).
>
> Facts learned:
>
> - While scanning, an entire row will be read even for a rowkey filtering.
> (This seems natural, since a rowkey is not a physically separate entity
> but is stored inside each KeyValue object. Am I right?)
> - The key API for the essential column family support is
> setLoadColumnFamiliesOnDemand().
>
> So, now I have questions:
>
> For rowkey filtering, which column family's KeyValue object is read?
> If HBase just reads a KeyValue from a randomly selected (or just the
> first) column family, how does that interact with
> setLoadColumnFamiliesOnDemand()? Can HBase intelligently select a smaller
> column family?
>
> If setLoadColumnFamiliesOnDemand() can be applied to rowkey filtering, a
> 'dummy' column family could be used to minimize the scan cost.
>
> Thank you.
>
>
> -----Original Message-----
> From: innowireless TaeYun Kim [mailto:[email protected]]
> Sent: Wednesday, August 06, 2014 1:48 PM
> To: [email protected]
> Subject: RE: Question on the number of column families
>
> Thank you.
>
> The 'dummy' column will always hold the value '1' (or even an empty
> string), which only signifies that this row exists. (The real value is
> in the other 'big' column family.) The value is irrelevant, since with the
> current schema the filtering will be done by rowkey components alone. No
> column value is needed. (I will begin reading the filtering section
> shortly - it is only 6 pages ahead. Sorry for my premature thoughts.)
>
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Wednesday, August 06, 2014 1:38 PM
> To: [email protected]
> Subject: Re: Question on the number of column families
>
> bq. add a 'dummy' column family and apply HBASE-5416 technique
>
> Adding a dummy column family is not the way to use essential column
> family support. What would this dummy column family hold?
>
> bq. since I have not read the filtering section of the book I'm reading yet
>
> Once you finish reading, you can look at the unit test
> (TestJoinedScanners) from HBASE-5416. You would understand this feature
> better.
>
> Cheers
>
>
> On Tue, Aug 5, 2014 at 9:21 PM, innowireless TaeYun Kim <
> [email protected]> wrote:
>
> > Thank you all.
> >
> > Facts learned:
> >
> > - Having 130 column families is too many. Don't do that.
> > - While scanning, an entire row will be read for filtering, unless the
> > HBASE-5416 technique is applied, which causes only the relevant column
> > family to be loaded. (But it seems that one still can't load just a
> > single needed column while scanning.)
> > - A big row size may not be good.
> >
> > Currently it seems appropriate to follow the one-column solution that
> > Alok Singh suggested, in part since currently there is no reasonable
> > grouping of the fields.
> >
> > Here is my current thinking:
> >
> > - One column family, one column. The field name will be included in the
> > rowkey.
> > - Eliminate filtering altogether (in most cases) by properly ordering
> > rowkey components.
> > - If filtering is absolutely needed, add a 'dummy' column family and
> > apply HBASE-5416 technique to minimize disk reads, since the field
> > value can be large (~5MB). (This dummy column thing may not be right,
> > I'm not sure, since I have not read the filtering section of the book
> > I'm reading yet)
> >
> > Hope that I am not missing or misunderstanding something...
> > (I'm a total newbie. I started reading an HBase book only last
> > week...)
> >
> >
> >
> >
> >
> >
>
>
