Re: Question on the number of column families

Qiang Tian Wed, 06 Aug 2014 22:37:21 -0700

Hi TaeYun,
thanks for the explanation.




On Thu, Aug 7, 2014 at 12:50 PM, innowireless TaeYun Kim <
taeyun....@innowireless.co.kr> wrote:

> Hi Qiang,
> thank you for your help.
>
> 1. Regarding HBASE-5416, I think its purpose is simple:
>
> "Avoid loading column families that are irrelevant to filtering while
> scanning."
> So, it can be applied to my 'dummy CF' case.
> That is, a dummy CF can act as a 'relevant' CF for filtering, provided
> that HBase can select it while applying a rowkey filter, since a dummy CF
> has the rowkey data in its 'dummy' KeyValue object.
>
> 2. About rowkey.
>
> What I meant is, I would include the field name as a component when the
> byte array for a rowkey is constructed.
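
A minimal sketch of that kind of rowkey construction, in plain Java with no HBase dependency. The layout (an entity ID, a zero-byte separator, then the field name), the class name RowKeySketch, and its method names are all assumptions made for illustration; none of them come from this thread or from the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: builds composite rowkeys of the
// assumed form <entityId> 0x00 <fieldName>.
final class RowKeySketch {
    // Assumed separator; only safe if the entity ID never contains a zero byte.
    private static final byte SEPARATOR = 0x00;

    // Full rowkey: entity prefix followed by the UTF-8 bytes of the field name.
    static byte[] buildRowKey(String entityId, String fieldName) {
        return concat(entityPrefix(entityId),
                      fieldName.getBytes(StandardCharsets.UTF_8));
    }

    // Prefix shared by every field of one entity: <entityId> 0x00.
    static byte[] entityPrefix(String entityId) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        byte[] prefix = Arrays.copyOf(id, id.length + 1);
        prefix[id.length] = SEPARATOR;
        return prefix;
    }

    // True if key begins with prefix, compared byte by byte.
    static boolean startsWith(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    private static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey("entity42", "temperature");
        System.out.println(Arrays.toString(key));
        System.out.println(startsWith(key, entityPrefix("entity42")));
    }
}
```

With the entity ID first, all fields of one entity sort next to each other, so a scan bounded by the entity prefix could stand in for a filter. Putting the field name first would instead group rows by field. Which order fits depends on the dominant read pattern.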
>
> 3. About read-only-ness and the number of CF.
>
> Thank you for your suggestion.
> But since the MemStore and BlockCache are managed separately for each column
> family, I'm a little concerned about the memory footprint.
>
> Thank you.
>
> -----Original Message-----
> From: Qiang Tian [mailto:tian...@gmail.com]
> Sent: Thursday, August 07, 2014 11:43 AM
> To: user@hbase.apache.org
> Subject: Re: Question on the number of column families
>
> Hi,
> The description of HBASE-5416 states why it was introduced. If you only
> have 1 CF, a dummy CF does not help. It is helpful for the multi-CF case, e.g.
> "putting them in one column family. And "Non frequently" ones in another. "
>
> bq. "Field name will be included in rowkey."
> Please read chapter 9, "Advanced Usage", in the book "HBase: The Definitive
> Guide" about how HBase stores data on disk and how to design a rowkey for a
> specific scenario. (The rowkey is the only index you can use, so take care.)
>
> bq. "The table is read-only. It is bulk-loaded once. When a new data is
> ready, A new table is created and the old table is deleted."
> That scenario is quite different, as HBase is designed for random
> read/write. The limitation described at
> http://hbase.apache.org/book/number.of.cfs.html concerns the write
> case (flush and compaction). Perhaps you could try 140 CFs, as long as you
> can presplit your regions well? After that, since there are no writes, there
> will be no flush/compaction. Anyway, any idea is best tested with your real
> data.
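
As a rough illustration of presplitting, the plain-Java sketch below computes evenly spaced one-byte split keys. It assumes rowkeys are spread uniformly over their first byte (for example through a hashed prefix), which may well not hold for your data. SplitKeySketch and evenSplitKeys are hypothetical names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: derives region split points
// under the assumption that the first rowkey byte is uniformly distributed.
final class SplitKeySketch {
    // Returns numRegions - 1 one-byte split keys spaced evenly over 0x00..0xFF.
    static List<byte[]> evenSplitKeys(int numRegions) {
        if (numRegions < 1 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in 1..256");
        }
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Integer boundary of the i-th region within the 256 byte values.
            int boundary = (i * 256) / numRegions;
            splits.add(new byte[] {(byte) boundary});
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : evenSplitKeys(4)) {
            // Mask to print the byte as an unsigned hex value.
            System.out.printf("0x%02X%n", split[0] & 0xFF);
        }
    }
}
```

The resulting keys could be copied into a byte[][] and passed to HBaseAdmin.createTable(descriptor, splitKeys) when the table is created. Whether the regions end up balanced still has to be checked against the real key distribution.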
>
> On Wed, Aug 6, 2014 at 7:00 PM, innowireless TaeYun Kim <
> taeyun....@innowireless.co.kr> wrote:
>
> > Hi Ted,
> >
> > I have now finished reading the filtering section and the source code of
> > TestJoinedScanners (0.94).
> >
> > Facts learned:
> >
> > - While scanning, an entire row will be read even for rowkey filtering.
> > (That seems natural, since a rowkey is not a physically separate entity
> > but is stored in each KeyValue object. Am I right?)
> > - The key API for the essential column family support is
> > setLoadColumnFamiliesOnDemand().
> >
> > So, now I have questions:
> >
> > On rowkey filtering, which column family's KeyValue object is read?
> > If HBase just reads a KeyValue from a randomly selected (or just the
> > first) column family, how is setLoadColumnFamiliesOnDemand() affected?
> > Can HBase select a smaller column family intelligently?
> >
> > If setLoadColumnFamiliesOnDemand() can be applied to rowkey
> > filtering, a 'dummy' column family can be used to minimize the scan cost.
> >
> > Thank you.
> >
> >
> > -----Original Message-----
> > From: innowireless TaeYun Kim [mailto:taeyun....@innowireless.co.kr]
> > Sent: Wednesday, August 06, 2014 1:48 PM
> > To: user@hbase.apache.org
> > Subject: RE: Question on the number of column families
> >
> > Thank you.
> >
> > The 'dummy' column will always hold the value '1' (or even an empty
> > string), which only signifies that the row exists. (The real value
> > is in the other, 'big' column family.) The value is irrelevant since,
> > with the current schema, filtering will be done by rowkey components
> > alone. No column value is needed. (I will begin reading the filtering
> > section shortly; it is only 6 pages ahead. Sorry for my premature
> > thoughts.)
> >
> >
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Wednesday, August 06, 2014 1:38 PM
> > To: user@hbase.apache.org
> > Subject: Re: Question on the number of column families
> >
> > bq. add a 'dummy' column family and apply HBASE-5416 technique
> >
> > Adding a dummy column family is not the way to utilize essential column
> > family support - what would this dummy column family hold?
> >
> > bq. since I have not read the filtering section of the book I'm
> > reading yet
> >
> > Once you finish reading, you can look at the unit test
> > (TestJoinedScanners) from HBASE-5416. You would understand this
> > feature better.
> >
> > Cheers
> >
> >
> > On Tue, Aug 5, 2014 at 9:21 PM, innowireless TaeYun Kim <
> > taeyun....@innowireless.co.kr> wrote:
> >
> > > Thank you all.
> > >
> > > Facts learned:
> > >
> > > - Having 130 column families is too many. Don't do that.
> > > - While scanning, an entire row will be read for filtering, unless
> > > the HBASE-5416 technique is applied, which causes only the relevant
> > > column family to be loaded. (But it seems that one still can't load
> > > just the needed column while scanning.)
> > > - A big row size may not be good.
> > >
> > > Currently it seems appropriate to follow the one-column solution
> > > that Alok Singh suggested, in part since currently there is no
> > > reasonable grouping of the fields.
> > >
> > > Here is my current thinking:
> > >
> > > - One column family, one column. Field name will be included in rowkey.
> > > - Eliminate filtering altogether (in most cases) by properly ordering
> > > rowkey components.
> > > - If filtering is absolutely needed, add a 'dummy' column family
> > > and apply HBASE-5416 technique to minimize disk read, since the
> > > field value can be large (~5 MB). (This dummy column thing may not be
> > > right, I'm not sure, since I have not read the filtering section of
> > > the book I'm reading yet)
> > >
> > > Hope that I am not missing or misunderstanding something...
> > > (I'm a total newbie. I started reading an HBase book only last
> > > week...)
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
>
>
