If the row key is just the customer ID, then a simple MD5 or SHA-1 hash of the ID would suffice as the key. Hashing scatters the sequential IDs across the key space, which clears up any risk of hotspotting once you've done your initial load of data.
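To make that concrete, here is a minimal sketch in Python. The function name `hashed_row_key` and the sample IDs are made up for illustration, and I'm assuming the customer ID is available as a string; the same idea applies in whatever client language you use.

```python
import hashlib


def hashed_row_key(customer_id: str) -> bytes:
    """Return the 16-byte MD5 digest of the customer ID, to be used as the row key.

    Hypothetical helper for illustration only; it is not part of any HBase API.
    """
    return hashlib.md5(customer_id.encode("utf-8")).digest()


if __name__ == "__main__":
    # Sequential IDs map to digests that are scattered across the key space.
    for cid in ("1000001", "1000002", "1000003"):
        print(cid, hashed_row_key(cid).hex())
```

A point lookup still works because you can recompute the hash from the customer ID. What you give up is scanning customers in ID order, since the hashed keys no longer sort that way.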
And that's probably a key point... hotspotting when you're first loading a very large table is really a moot point. It may be painful, but the pain lasts for less than an hour.

On Nov 26, 2012, at 4:28 AM, Mohammad Tariq <donta...@gmail.com> wrote:

> Hello sir,
>
> You might become a victim of RS hotspotting, since the customerIDs will
> be sequential (I assume). To keep things simple Hbase puts all the rows with
> similar keys to the same RS. But, it becomes a bottleneck in the long run
> as all the data keeps on going to the same region.
>
> HTH
>
> Regards,
> Mohammad Tariq
>
>
>
> On Mon, Nov 26, 2012 at 3:53 PM, Ramasubramanian Narayanan <
> ramasubramanian.naraya...@gmail.com> wrote:
>
>> Hi,
>> Thanks! Can we have the customer number as the RowKey for the customer
>> (client) master table? Please help in educating me on the advantage and
>> disadvantage of having customer number as the Row key...
>>
>> Also SCD2 we may need to implement in that table.. will it work if I have
>> like that?
>>
>> Or
>>
>> SCD2 is not needed instead we can achieve the same by increasing the
>> version number that it will hold?
>>
>> pls suggest...
>>
>> regards,
>> Rams
>>
>> On Mon, Nov 26, 2012 at 1:10 PM, Li, Min <m...@microstrategy.com> wrote:
>>
>>> When 1 cf need to do split, other 599 cfs will split at the same time. So
>>> many fragments will be produced when you use so many column families.
>>> Actually, many cfs can be merge to only one cf with specific tags in
>>> rowkey. For example, rowkey of customer address can be uid+'AD', and
>>> customer profile can be uid+'PR'.
>>>
>>> Min
>>> -----Original Message-----
>>> From: Ramasubramanian Narayanan [mailto:
>>> ramasubramanian.naraya...@gmail.com]
>>> Sent: Monday, November 26, 2012 3:05 PM
>>> To: user@hbase.apache.org
>>> Subject: Expert suggestion needed to create table in Hbase - Banking
>>>
>>> Hi,
>>>
>>> I have a requirement of physicalising the logical model... I have a
>>> client model which has 600+ entities...
>>>
>>> Need suggestion how to go about physicalising it...
>>>
>>> I have few other doubts :
>>> 1) Whether is it good to create a single table for all the 600+ columns?
>>> 2) To have different column families for different groups or can it be
>>> under a single column family? For example, customer address can we have as
>>> a different column family?
>>>
>>> Please help on this..
>>>
>>>
>>> regards,
>>> Rams
>>>
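
For what it's worth, Min's tagged-rowkey idea quoted above (uid+'AD', uid+'PR') can be sketched like this. It's a rough Python illustration, not anything from the HBase API: `entity_row_key` and the sample uids are hypothetical, and I'm assuming the uids are fixed-width so that one uid is never a prefix of another.

```python
def entity_row_key(uid: str, tag: str) -> bytes:
    """Build a row key of the form uid + tag, e.g. '1001' + 'AD' for an address row.

    Hypothetical helper; assumes fixed-width uids and two-letter entity tags.
    """
    return (uid + tag).encode("utf-8")


# Rows are stored sorted by key, so sorting here mimics the on-disk order.
keys = sorted(
    entity_row_key(uid, tag)
    for uid in ("1001", "1002")
    for tag in ("PR", "AD")
)

if __name__ == "__main__":
    for key in keys:
        print(key.decode("utf-8"))
```

Because the tag comes after the uid, all the rows for one customer sit next to each other in a single column family, so you can read them with one short scan on the uid prefix instead of maintaining hundreds of column families.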