> That's an entirely fair question.  I'm new to this.  I figured if the data
> was related to the same thing and could have the same key, then it ought to
> go into various CFs on that key in a single table.  I got the feeling from
> reading the BigTable paper that the typical design approach was to dump lots
> of CFs into a table.  It seems like that's not the HBase-way, though.

My interpretation is that they try to keep that number low. From page 2 of the paper:

"It is our intent that the number of distinct column families in a
table be small (in the hundreds at most)"

>
> For the most part it's not a big deal to store the data in separate tables.
>  However, I'm curious what you'd recommend for one particular part of it.
>  Specifically I'd like to store actions within a web visit.  I've been
> planning to store individual actions as columns in their own column family,
> keyed by something like [timestamp, action details, session ID].  In another
> column family I'd been planning on storing statistics about the actions,
> such as first time, end time, count, etc.  When writing to the actions CF,
> I'd need to read from and possibly update the stats CF.  Would your
> recommendation be to store that kind of data in the same CF, two CFs in the
> same table, or in two separate tables?

Could you just store that in the same family?

>
> My thought was that I could use row locking to avoid races to update the
> stats after inserting into actions if I took the two CF approach.

Row locking is rarely a good idea: it doesn't scale, and the locks currently
aren't persisted anywhere except in the region server's (RS) memory (so if it
dies...). Using a single family might be better for you.
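
To make the single-family idea concrete, here is a rough sketch of one way the
layout could look. It is plain Java that models a single row's family as a
sorted map rather than calling the HBase client, and every name in it
(SingleFamilyRow, the "a:" and "s:" qualifier prefixes, the stat names) is made
up for illustration. It only shows how actions and stats could share a family
by qualifier prefix; it does not address concurrent updates.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Models the cells of ONE row in ONE column family as qualifier -> value.
// Hypothetical layout: "a:<timestamp>:<details>:<sessionId>" for actions,
// "s:<name>" for statistics. Not an HBase API.
public class SingleFamilyRow {
    static final String ACTION = "a:";
    static final String STATS = "s:";

    // Sorted by qualifier, as columns within a family are.
    private final TreeMap<String, String> cells = new TreeMap<>();

    // Adds the action cell and refreshes the stats cells of the same row.
    public void recordAction(long ts, String sessionId, String details) {
        // Zero-pad the timestamp so lexicographic order matches time order.
        String qualifier =
            ACTION + String.format("%013d", ts) + ":" + details + ":" + sessionId;
        cells.put(qualifier, "");

        long first = Long.parseLong(cells.getOrDefault(STATS + "first", Long.toString(ts)));
        long last = Long.parseLong(cells.getOrDefault(STATS + "last", Long.toString(ts)));
        long count = Long.parseLong(cells.getOrDefault(STATS + "count", "0"));

        cells.put(STATS + "first", Long.toString(Math.min(first, ts)));
        cells.put(STATS + "last", Long.toString(Math.max(last, ts)));
        cells.put(STATS + "count", Long.toString(count + 1));
    }

    // All action cells, in time order, selected by qualifier prefix.
    public SortedMap<String, String> actions() {
        return cells.subMap(ACTION, ACTION + Character.MAX_VALUE);
    }

    // A single statistic, or null if no action has been recorded yet.
    public String stat(String name) {
        return cells.get(STATS + name);
    }

    public static void main(String[] args) {
        SingleFamilyRow row = new SingleFamilyRow();
        row.recordAction(1300000000000L, "sess42", "view:/home");
        row.recordAction(1300000005000L, "sess42", "click:/buy");
        System.out.println(row.actions().keySet());
        System.out.println("count=" + row.stat("count"));
    }
}
```

With a real table, the action cell and the stats cells would presumably go
into the same write against that row, but treat the above as a picture of the
qualifier layout only.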

J-D
