Wide area :-) I agree with Michael; perhaps the best explanation would be to spell out *WHEN* adding extra CFs makes perfect sense.
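For example, Lars's point further down the thread gives one concrete criterion: all CFs in a region are flushed together when one of them needs a flush. Below is a toy Python model of that behaviour. It is not HBase code. The function name, the flush threshold and the write mixes are hypothetical numbers chosen only for illustration. Both runs ingest the same total amount of data per write; only the split between CFs differs.

```python
from collections import defaultdict


def simulate_region_flushes(bytes_per_write, num_writes, flush_threshold):
    """Toy model of one region's memstores; hypothetical, not HBase code.

    bytes_per_write[i] is how many bytes each logical write adds to CF i.
    Assumption (from Lars's description): as soon as ANY CF's memstore
    reaches flush_threshold, ALL CFs in the region are flushed together.
    Returns {cf_index: [size of each store file that CF produced]}.
    """
    memstores = [0] * len(bytes_per_write)
    store_files = defaultdict(list)
    for _ in range(num_writes):
        for cf, size in enumerate(bytes_per_write):
            memstores[cf] += size
        if max(memstores) >= flush_threshold:
            # Region-wide flush: every non-empty memstore becomes a store
            # file, however little data it holds.
            for cf, buffered in enumerate(memstores):
                if buffered:
                    store_files[cf].append(buffered)
                memstores[cf] = 0
    return dict(store_files)


# Same total ingest (102 bytes per write) in both cases; only the split differs.
balanced = simulate_region_flushes([34, 34, 34], num_writes=1000, flush_threshold=1280)
skewed = simulate_region_flushes([100, 1, 1], num_writes=1000, flush_threshold=1280)

for label, files in (("balanced", balanced), ("skewed", skewed)):
    for cf, sizes in sorted(files.items()):
        print(f"{label} cf{cf}: {len(sizes)} store files of ~{sum(sizes) // len(sizes)} bytes")
```

With these made-up numbers, each CF in the balanced layout ends up with 26 store files of about 1292 bytes. In the skewed layout, each of the two small CFs accumulates 76 store files of about 13 bytes. That is the "lot of unnecessary and small store files" Lars describes. Roughly even data volume per CF is therefore one case where extra CFs look harmless.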
On Tue, Apr 16, 2013 at 4:35 PM, Michael Segel <michael_se...@hotmail.com> wrote:

> I think the important thing about Column Families is understanding how
> to use them properly in a design.
>
> Sparse data may make sense. It depends on the use case and an
> understanding of the trade-offs.
>
> It all depends on how the data breaks down into specific use cases.
>
> Keeping CFs to a minimum makes sense. However, what that minimum is
> remains to be seen.
>
> It depends....
>
>
> On Apr 16, 2013, at 9:08 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > bq. Maybe we can explain why there are some impacts, or what to consider?
> >
> > The above would be covered in the JIRA.
> >
> > Thanks
> >
> > On Tue, Apr 16, 2013 at 7:04 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> >
> >> Can we add more details than just changing the maximum CF number? Maybe
> >> we can explain why there are some impacts, or what to consider?
> >>
> >> JM
> >>
> >> 2013/4/16 Ted Yu <yuzhih...@gmail.com>
> >>
> >>> If there is no objection, I will create a JIRA to increase the maximum
> >>> number of column families described here:
> >>>
> >>> http://hbase.apache.org/book.html#number.of.cfs
> >>>
> >>> Cheers
> >>>
> >>> On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:
> >>>
> >>>> For the record, the refGuide mentions potential issues of CF lumpiness
> >>>> that you mentioned:
> >>>>
> >>>> http://hbase.apache.org/book.html#number.of.cfs
> >>>>
> >>>> 6.2.1. Cardinality of ColumnFamilies
> >>>>
> >>>> Where multiple ColumnFamilies exist in a single table, be aware of the
> >>>> cardinality (i.e., number of rows).
> >>>> If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion
> >>>> rows, ColumnFamilyA's data will likely be spread
> >>>> across many, many regions (and RegionServers). This makes mass
> >>>> scans for ColumnFamilyA less efficient.
> >>>>
> >>>> Anything that needs to be updated/added for this?
> >>>>
> >>>>
> >>>> On 4/8/13 12:39 AM, "lars hofhansl" <la...@apache.org> wrote:
> >>>>
> >>>>> I think the main problem is that all CFs have to be flushed if one gets
> >>>>> large enough to require a flush.
> >>>>> (Does anyone remember why exactly that is? And do we still need that now
> >>>>> that the memstoreTS is stored in the HFiles?)
> >>>>>
> >>>>> So things are fine as long as all CFs have roughly the same size. But if
> >>>>> you have one that gets a lot of data and many others that are smaller,
> >>>>> we'd end up with a lot of unnecessary and small store files from the
> >>>>> smaller CFs.
> >>>>>
> >>>>> Anything else known that is bad about many column families?
> >>>>>
> >>>>> -- Lars
> >>>>>
> >>>>> ________________________________
> >>>>> From: Andrew Purtell <apurt...@apache.org>
> >>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
> >>>>> Sent: Sunday, April 7, 2013 3:52 PM
> >>>>> Subject: Re: schema design: rows vs wide columns
> >>>>>
> >>>>> Is there a pointer to evidence/experiment-backed analysis of this
> >>>>> question?
> >>>>> I'm sure there is some basis for this text in the book, but I recommend
> >>>>> we strike it. We could replace it with YCSB- or LoadTestTool-driven
> >>>>> latency graphs for different workloads, maybe. Although that would also
> >>>>> be a big simplification of 'schema design' considerations, it would not
> >>>>> be so starkly lacking background.
> >>>>>
> >>>>> On Sunday, April 7, 2013, Ted Yu wrote:
> >>>>>
> >>>>>> From http://hbase.apache.org/book.html#number.of.cfs :
> >>>>>>
> >>>>>> HBase currently does not do well with anything above two or three column
> >>>>>> families so keep the number of column families in your schema low.
> >>>>>>
> >>>>>> Cheers
> >>>>>>
> >>>>>> On Sun, Apr 7, 2013 at 3:04 PM, Stack <st...@duboce.net> wrote:
> >>>>>>
> >>>>>>> On Sun, Apr 7, 2013 at 11:58 AM, Ted <yuzhih...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> With regard to number of column families, 3 is the recommended
> >>>>>>>> maximum.
> >>>>>>>
> >>>>>>> How did you come up w/ the number '3'? Is it a 'hard' 3? Or does it
> >>>>>>> depend? If the latter, on what does it depend?
> >>>>>>> Thanks,
> >>>>>>> St.Ack
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>>
> >>>>> - Andy
> >>>>>
> >>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>>>> (via Tom White)

--
Adrien Mogenet
http://www.borntosegfault.com
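The refGuide's cardinality example quoted in the thread (1 million rows in ColumnFamilyA, 1 billion in ColumnFamilyB) can be made concrete with a back-of-envelope calculation. Below is a minimal Python sketch built on stated assumptions:

- every CF-row is the same hypothetical size (1 KB);
- regions split purely on total size, using a hypothetical 10 GB (decimal) limit;
- CF A's row keys are spread evenly over the whole key space.

The function and its parameters are illustrative only and are not part of any HBase API.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large byte counts."""
    return -(-a // b)


def regions_spanned_by_small_cf(rows_a, rows_b, bytes_per_row, max_region_bytes):
    """Hypothetical estimate for the refGuide's cardinality example.

    Assumptions (illustration only):
      * every CF-row holds `bytes_per_row` bytes;
      * regions split purely on total size, so the region count is driven
        by the combined data, i.e. mostly by the much larger CF B;
      * CF A's row keys are spread evenly over the whole key space, so
        every region (up to one per CF A row) holds a slice of CF A.
    Returns (regions CF A spans when sharing a table with CF B,
             regions CF A would need in a table of its own).
    """
    total_bytes = (rows_a + rows_b) * bytes_per_row
    shared_regions = ceil_div(total_bytes, max_region_bytes)
    spanned = min(shared_regions, rows_a)
    alone = ceil_div(rows_a * bytes_per_row, max_region_bytes)
    return spanned, alone


spanned, alone = regions_spanned_by_small_cf(
    rows_a=1_000_000, rows_b=1_000_000_000,
    bytes_per_row=1_000, max_region_bytes=10 * 10**9)
print(f"CF A spans {spanned} regions when sharing a table; {alone} on its own")
```

Under these made-up numbers, a full scan of CF A has to visit 101 regions, each holding a thin slice of it. On its own, CF A's roughly 1 GB would fit in a single region. That gap is the "less efficient" mass scan the refGuide warns about.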