Asaf, I am using the Genre/Author stuff as an example, but yes, at the moment I only have 5 column families. However, over time I may have more (no upper limit decided at this point). See below for more responses.
On Wed, Jul 3, 2013 at 3:42 PM, Asaf Mesika <[email protected]> wrote:

> Do you have only 5 static author names?
> Keep in mind the column family name is defined when creating the table.
>
> Regarding tall vs wide debate:
> HBase is first and foremost a Key Value database thus reads and writes in
> the column-value level. So it doesn't really care about rows.
> But it's not entirely true. Rows come into play in the following
> situations:
> Splitting a region is per row and not per column, thus a row will be saved
> as a whole on a region. If you have a really large row, the region size
> granularity is dependent on it. It doesn't seem to be the case here.
> Put/Delete creates a lock until finished. If you are intensive on inserts
> to the same row at the same time, this might be bad for you, keeping your
> rows slimmer can reduce contention, but again, only if you make a lot of
> concurrent modifications to the same row.
>

I expect batches of Put/Delete to the same row to come from at most one thread at a time, based on current user behavior, so locking shouldn't be an issue. However, I am not sure whether the point about a row being saved as a whole on one region is something I need to worry about (probably because I just don't know much about it yet).

> Filtering - if you need a filter which need all the row (there is a method
> you override in Filter to mark that) than a far row will be more memory
> intensive. If you needed only 1/5 of your row, then maybe splitting it to 5
> rows to begin with would have made a better schema design in terms of
> memory and I/O.
>

Currently, my access pattern is to get all data for a given row. It's possible that in the future we may want to apply (family/qualifier) filters. There is a lot of uncertainty about the client-side use cases at this point, which is why I am not entirely sure how things will look months from now.

I am not sure I follow this statement: "if you need a filter which need all the row (there is a method you override in Filter to mark that) than a far row will be more memory intensive." Can you please explain?

Thank you for these suggestions btw, good food for thought!

>
> On Wednesday, July 3, 2013, Aji Janis wrote:
>
> > I have a major typo in the question so I apologize. I meant to say 5
> > families with 1000+ qualifiers each.
> >
> > Let's work with an example (not the greatest example here but still).
> > Let's say we have a Genre class like this:
> >
> > Class HistoryBooks{
> >
> > ArrayList<Books> author1;
> > ArrayList<Books> author2;
> > ArrayList<Books> author3;
> > ArrayList<Books> author4;
> > ArrayList<Books> author5;
> >
> > ...}
> >
> > Each author is a column family (let's say we only allow 5 authors per
> > <T>Book class). Book per author ends up being the qualifier. In this
> > case, I know I have a max family count but my qualifiers have no upper
> > limit. So is this scenario a case for tall or wide table? Why? Thank you.
> >
> >
> > On Tue, Jul 2, 2013 at 9:56 AM, Bryan Beaudreault
> > <[email protected]> wrote:
> >
> > > If they are accessed mostly together they should all be a single column
> > > family. The key with tall or wide is based on the total byte size of
> > > each KeyValue. Your cells would need to be quite large for 50 to become
> > > a problem. I still would recommend using a single CF though.
> > > --
> > > Sent from iPhone
>
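
To make the tall vs wide question in the thread concrete, here is a minimal sketch in plain Java (no HBase client involved) of how the same Genre/Author/Book data might map onto each layout. The class name TallVsWide, the "|" key separator, and the sample genre, author, and book names are all assumptions made for illustration; none of them come from the thread or from the HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models the two layouts as plain strings.
class TallVsWide {

    // Wide layout: a single row per genre. Each author acts as a column
    // family and each book as a qualifier, so one row carries every
    // "family:qualifier" column listed here.
    static List<String> wideColumns(Map<String, List<String>> booksByAuthor) {
        List<String> columns = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : booksByAuthor.entrySet()) {
            for (String book : entry.getValue()) {
                columns.add(entry.getKey() + ":" + book);
            }
        }
        return columns;
    }

    // Tall layout: one row per (genre, author, book). The composite row key
    // uses "|" as an assumed separator; a single column family would hold
    // the book data for each of these rows.
    static List<String> tallRowKeys(String genre, Map<String, List<String>> booksByAuthor) {
        List<String> rowKeys = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : booksByAuthor.entrySet()) {
            for (String book : entry.getValue()) {
                rowKeys.add(genre + "|" + entry.getKey() + "|" + book);
            }
        }
        return rowKeys;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the output is predictable.
        Map<String, List<String>> books = new LinkedHashMap<>();
        books.put("author1", Arrays.asList("bookA", "bookB"));
        books.put("author2", Arrays.asList("bookC"));

        System.out.println("wide row 'history' columns: " + wideColumns(books));
        System.out.println("tall row keys: " + tallRowKeys("history", books));
    }
}
```

In the wide form, "get all data for a given row" stays a single-row read, but the row grows with every book added. In the tall form each row stays small, and reading a whole genre would become a scan over the rows sharing the genre prefix rather than one read.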
