About a million rows per day per table.

Is there a disadvantage to creating more tables?

On 8/21/11 10:49 AM, Sonal Goyal wrote:
If your data size is big enough to warrant 3 tables, go for it. This would
be the case when there are a very large number of entries per user#type.

Best Regards,
Sonal
Crux: Reporting for HBase<https://github.com/sonalgoyal/crux>
Nube Technologies<http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Sun, Aug 21, 2011 at 11:09 PM, Mark <static.void....@gmail.com> wrote:

Almost all use cases require the type, i.e.:

Retrieve all searches performed by user 'foo':
  scan "history", {STARTROW => "search/foo"}
Retrieve all product views performed by user 'foo':
  scan "history", {STARTROW => "view/foo"}
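
A note on the scans above: STARTROW only sets where a scan begins, so without a stop row it keeps reading past the last "search/foo" key into whatever sorts after it. In the HBase shell this is usually bounded by adding a STOPROW. The sketch below simulates that behaviour in plain Ruby over a sorted list of keys, with no HBase involved. The sample keys and the helper names `scan` and `prefix_stop` are hypothetical and only illustrate the idea.

```ruby
# Simulate an HBase-style scan over lexicographically sorted row keys.
# The keys and helper names are hypothetical illustrations, not HBase APIs.
ROWS = [
  "click/foo/1313945000",
  "search/bar/1313945100",
  "search/foo/1313945200",
  "search/foo/1313945300",
  "search/zed/1313945400",
  "view/foo/1313945500",
].sort.freeze

# Return keys >= start_row and, when a stop row is given, < stop_row
# (the stop row is exclusive).
def scan(rows, start_row:, stop_row: nil)
  rows.select { |k| k >= start_row && (stop_row.nil? || k < stop_row) }
end

# Smallest string that sorts after every key beginning with `prefix`:
# increment the last byte by one (assumes that byte is not 0xFF).
def prefix_stop(prefix)
  bytes = prefix.bytes
  bytes[-1] += 1
  bytes.pack("C*").force_encoding(prefix.encoding)
end

# Start row only: runs on through "search/zed/..." and "view/foo/...".
unbounded = scan(ROWS, start_row: "search/foo")

# Start row plus stop row: only the searches by user 'foo'.
bounded = scan(ROWS, start_row: "search/foo/",
                     stop_row: prefix_stop("search/foo/"))
```

Including the trailing "/" in the prefix also keeps a user such as 'foobar' out of the results for 'foo'.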


On 8/21/11 10:25 AM, Sonal Goyal wrote:

Hi Mark,

When you say that your use case does not require searching across multiple
types, what do you mean? Do you have cases where you search by type?

Best Regards,
Sonal

On Sun, Aug 21, 2011 at 9:29 PM, Mark <static.void....@gmail.com> wrote:

We are logging all user actions into HBase. These actions include searches,
product views, and clicks.

We are currently storing them in one table with row keys like so:
"#{type}/#{user}/#{time}", where type is one of click, search, or view and
user is the currently logged-in user. Obviously, this method leads to region
hot spotting, as the start of each key is fairly static. This got me thinking
about what alternative ways I could model this type of data, and I was hoping
I could get some suggestions from the community.

Which would be more advisable?

1) Keep the current pattern described above, where all logs go to one table.
2) Keep the current single-table pattern described above, but switch the type
and user fields, which would lead to more randomized keys and thus reduce hot
spots.
3) Create separate tables for each type of log we are saving, i.e. have a
search table, a click table, and a view table.
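
To make the three options concrete, here is a minimal Ruby sketch of how the same events would be keyed under each layout. It is only an illustration: the `Event` struct, the helper names, the "_history" table names, and the sample data are all assumptions and do not come from HBase or from the setup described above.

```ruby
# Hypothetical helpers illustrating the three layouts; none are HBase APIs.
Event = Struct.new(:type, :user, :time)

# Option 1: one table, key "type/user/time" (the current layout).
def key_type_first(e)
  "#{e.type}/#{e.user}/#{e.time}"
end

# Option 2: one table, key "user/type/time" (type and user swapped).
def key_user_first(e)
  "#{e.user}/#{e.type}/#{e.time}"
end

# Option 3: one table per type (table names are assumed), key "user/time".
def table_and_key(e)
  ["#{e.type}_history", "#{e.user}/#{e.time}"]
end

events = [
  Event.new("search", "foo", 1313945200),
  Event.new("view",   "bar", 1313945300),
  Event.new("search", "bar", 1313945400),
  Event.new("click",  "foo", 1313945500),
]

# Sorting mimics the lexicographic row order within a table.
opt1 = events.map { |e| key_type_first(e) }.sort
opt2 = events.map { |e| key_user_first(e) }.sort
opt3 = events.map { |e| table_and_key(e) }
             .group_by(&:first)
             .transform_values { |pairs| pairs.map(&:last).sort }
```

Under option 1 every key starts with one of three type strings, while under option 2 the leading part varies with the user. A per-user, per-type query is a prefix scan in all three layouts: "search/foo/" for option 1, "foo/search/" for option 2, and "foo/" in the search table for option 3.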

Our use case does not require us to search across multiple types, so I'm
leaning towards #3 now, but I was wondering if there are any cons to using
this method. Is it worse to have more tables than fewer?

Thanks for the help

-M