Sequential writes are also an argument for pre-splitting and using hash
prefixing. In other words, presplit your table into N regions instead of
the default of 1 and transform your keys into:
new_key = md5(old_key) + old_key
Using this method your sequential writes under the old_key are now spread
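
A minimal Python sketch of the hash-prefix idea above. The function names and the assumption that the table is pre-split evenly on the first byte of the key are illustrative only; they are not taken from HBase itself.

```python
import hashlib


def hashed_key(old_key: bytes) -> bytes:
    """Build new_key = md5(old_key) + old_key (16-byte digest as prefix)."""
    return hashlib.md5(old_key).digest() + old_key


def region_for(key: bytes, num_regions: int) -> int:
    """Map a key to a region index.

    Assumption: the table was pre-split into num_regions regions that
    divide the range of the first key byte (0-255) evenly.
    """
    return key[0] * num_regions // 256


def spread(keys, num_regions: int):
    """Count how many hash-prefixed keys land in each region."""
    counts = [0] * num_regions
    for old_key in keys:
        counts[region_for(hashed_key(old_key), num_regions)] += 1
    return counts


if __name__ == "__main__":
    # Sequential keys, e.g. zero-padded counters or timestamps.
    keys = [b"%010d" % i for i in range(10000)]
    print("regions hit by raw keys:", {region_for(k, 8) for k in keys})
    print("writes per region with hash prefix:", spread(keys, 8))
```

Because old_key is kept as a suffix, it can still be read back from the stored key. The rows are no longer stored in old_key order, though, so a range scan over old_key no longer maps to one contiguous key range.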
Mark,
Yes, your understanding is correct. If your keys are sequential (timestamps,
etc.), you will always be writing to the end of the table and "older"
regions will not get any writes. This is one of the arguments against using
sequential keys.
-ak
On Sun, Nov 20, 2011 at 11:33 AM, Mark wrote:
I run HRegionServer from Eclipse with "start" as the program argument.
2011-11-21 09:35:12,384 WARN [main]
regionserver.HRegionServerCommandLine(56): Not starting a distinct region
server because hbase.cluster.distributed is false
but the following contents are in $HBASE_HOME/conf/hbase-site.xml:
Mark,
This is an interesting discussion and like Michel said - the answer to your
question depends on what you are trying to achieve. However, here are the
points that I would think about:
What are the access patterns of the various buckets of data that you want to
put in HBase? For instance, woul
Thanks for the info.
On 11/20/11 11:30 AM, lars hofhansl wrote:
There are many considerations here, but one is that separate tables provide a
completely separate namespace.
If you use one table, design of the key space is more involved, as you need to
separate the namespace with key prefixes.
I had the same issue.
The problem for me turned out to be that the hbase.zookeeper.quorum was
not set in hbase-site.xml in the server that submitted the mapreduce
job. Ironically, this is also the same server that was running hbase
master. This defaulted to 127.0.0.1 which was where the task
Say we have a use case that has sequential row keys and we have rows
0-100. Let's assume that 100 rows = the split size. Now when there is a
split it will split at the halfway mark so there will be two regions as
follows:
Region1 [START-49]
Region2 [50-END]
So now at this point all inserts wi
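
A rough Python simulation of the scenario described above, under simplifying assumptions: a region splits as soon as it holds split_size rows, and it splits at its middle row. Real HBase splits on store size rather than row count, so treat this only as a sketch of the write pattern. The function name is made up for illustration.

```python
import bisect


def simulate_sequential_writes(num_rows: int, split_size: int):
    """Insert keys 0..num_rows-1 in order and split full regions in half.

    Returns (starts, regions, hit_last) where starts holds the first key
    of each region, regions holds the keys stored in each region, and
    hit_last records whether each write landed in the last region.
    """
    starts = [0]    # first key of each region, kept sorted
    regions = [[]]  # keys currently stored in each region
    hit_last = []
    for key in range(num_rows):
        # Find the region whose start key is the largest one <= key.
        idx = bisect.bisect_right(starts, key) - 1
        regions[idx].append(key)
        hit_last.append(idx == len(regions) - 1)
        if len(regions[idx]) >= split_size:
            # Split at the halfway mark into a lower and an upper region.
            mid = len(regions[idx]) // 2
            upper = regions[idx][mid:]
            regions[idx] = regions[idx][:mid]
            starts.insert(idx + 1, upper[0])
            regions.insert(idx + 1, upper)
    return starts, regions, hit_last


if __name__ == "__main__":
    starts, regions, hit_last = simulate_sequential_writes(300, 100)
    print("region start keys:", starts)
    print("rows per region:", [len(r) for r in regions])
    print("every write hit the last region:", all(hit_last))
```

In this model every write goes to the last region, and each region that has been split off stays half full and receives no further writes.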
There are many considerations here, but one is that separate tables provide a
completely separate namespace.
If you use one table, design of the key space is more involved, as you need to
separate the namespace with key prefixes.
So if you never have to access data from separate "key space" in a
I'm more interested in how and why it would depend rather than the
actual answer.
In evenly distributed systems you should do x/y because . If your
data is not evenly distributed then you should...
Thanks
On 11/20/11 12:57 AM, Michel Segel wrote:
Mark,
Simple answer ... it depends... ;
Hi,
OK...
First a caveat... I haven't seen your initial normalized schema, so take what I
say with a grain of salt...
The problem you are trying to solve is one which can be solved better on an
RDBMS platform and does not fit well in a NoSQL space.
Your scalability issue would probably be bet
Mark,
Simple answer ... it depends... ;-)
Longer answer...
What's your use case? What's your access pattern? Is the type of data, in this
case evenly distributed in terms of size?
Sent from a remote device. Please excuse any typos...
Mike Segel
On Nov 18, 2011, at 3:29 PM, Mark wrote:
>
11 matches