Re: Region Splits

2011-11-20 Thread Nicolas Spiegelberg
Sequential writes are also an argument for pre-splitting and using hash prefixing. In other words, pre-split your table into N regions instead of the default of 1, and transform your keys into: new_key = md5(old_key) + old_key. Using this method, your sequential writes under the old_key are now spread
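
A rough way to see what hash prefixing buys you is to simulate it outside HBase. The sketch below is an illustration under stated assumptions, not HBase code: it uses the raw 16-byte MD5 digest as the prefix, and it pre-splits on evenly spaced single-byte boundaries of that digest (a hypothetical split scheme). The helper names salted_key, split_points and region_for are made up for this example.

```python
import bisect
import hashlib
from collections import Counter


def salted_key(old_key: bytes) -> bytes:
    # new_key = md5(old_key) + old_key; using the raw 16-byte digest
    # (rather than its hex form) is an assumption of this sketch.
    return hashlib.md5(old_key).digest() + old_key


def split_points(num_regions: int) -> list:
    # N-1 single-byte split keys spread evenly over the first digest byte
    # (hypothetical pre-split scheme for illustration).
    return [bytes([i * 256 // num_regions]) for i in range(1, num_regions)]


def region_for(key: bytes, splits: list) -> int:
    # Region i holds keys in [splits[i-1], splits[i]); region 0 holds keys below splits[0].
    return bisect.bisect_right(splits, key)


splits = split_points(8)
keys = [b"%010d" % i for i in range(10_000)]  # sequential, zero-padded keys

plain = Counter(region_for(k, splits) for k in keys)
salted = Counter(region_for(salted_key(k), splits) for k in keys)
print("regions hit without salting:", len(plain))
print("regions hit with md5 prefix:", len(salted))
```

Without the prefix every sequential key shares the same leading byte, so all writes land in one region; with the MD5 prefix they spread across all eight. The trade-off is that rows are no longer stored in old_key order, so a range scan over old_key has to visit every region, while a point lookup still works because the prefix can be recomputed from old_key.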

Re: Region Splits

2011-11-20 Thread Amandeep Khurana
Mark, yes, your understanding is correct. If your keys are sequential (timestamps, etc.), you will always be writing to the end of the table, and "older" regions will not get any writes. This is one of the arguments against using sequential keys. -ak On Sun, Nov 20, 2011 at 11:33 AM, Mark wrote:

How to debug/run HBase in Eclipse

2011-11-20 Thread 陈加俊
I run HRegionServer in Eclipse with the program argument "start" and get: 2011-11-21 09:35:12,384 WARN [main] regionserver.HRegionServerCommandLine(56): Not starting a distinct region server because hbase.cluster.distributed is false. But I have the following contents in $HBASE_HOME/conf/hbase-site.xml:

Re: Multiple tables vs big fat table

2011-11-20 Thread Amandeep Khurana
Mark, this is an interesting discussion and, as Michel said, the answer to your question depends on what you are trying to achieve. However, here are the points that I would think about: What are the access patterns of the various buckets of data that you want to put in HBase? For instance, woul

Re: Multiple tables vs big fat table

2011-11-20 Thread Mark
Thanks for the info. On 11/20/11 11:30 AM, lars hofhansl wrote: There are many considerations here, but one is that separate tables provide a completely separate namespace. If you use one table, the design of the key space is more involved, as you need to separate the namespaces with key prefixes.

Re: HBase & MapReduce & Zookeeper

2011-11-20 Thread Randy D. Wallace Jr.
I had the same issue. The problem for me turned out to be that hbase.zookeeper.quorum was not set in hbase-site.xml on the server that submitted the MapReduce job. Ironically, this was also the same server that was running the HBase master. It defaulted to 127.0.0.1, which was where the task
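
One way to catch this before submitting a job is to check that the hbase-site.xml on the submitting machine actually sets hbase.zookeeper.quorum. The sketch below is a hypothetical pre-flight check, not an HBase tool: zookeeper_quorum is a made-up helper, and the XML strings and host names (zk1, zk2, zk3, namenode) are placeholders. It only relies on the standard Hadoop-style configuration layout of property elements with name and value children.

```python
import xml.etree.ElementTree as ET
from typing import Optional

QUORUM_PROPERTY = "hbase.zookeeper.quorum"


def zookeeper_quorum(site_xml: str) -> Optional[str]:
    """Return the quorum from an hbase-site.xml document, or None if unset."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == QUORUM_PROPERTY:
            value = (prop.findtext("value") or "").strip()
            return value or None
    return None


# Hypothetical file contents; host names are placeholders.
missing = """<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://namenode:8020/hbase</value></property>
</configuration>"""
present = """<configuration>
  <property><name>hbase.zookeeper.quorum</name><value>zk1,zk2,zk3</value></property>
</configuration>"""

for label, xml_text in (("missing", missing), ("present", present)):
    quorum = zookeeper_quorum(xml_text)
    if quorum is None:
        print(f"{label}: {QUORUM_PROPERTY} not set; the job may fall back to a local address")
    else:
        print(f"{label}: quorum = {quorum}")
```

In practice you would feed it the contents of the hbase-site.xml that the submitting process really loads, since that file, not the one on the cluster nodes, is what the post identifies as the culprit.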

Region Splits

2011-11-20 Thread Mark
Say we have a use case that has sequential row keys, and we have rows 0-100. Let's assume that 100 rows = the split size. Now, when there is a split, it will split at the halfway mark, so there will be two regions as follows: Region1 [START-49], Region2 [50-END]. So now at this point all inserts wi
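
The scenario above can be modeled with a toy range-partitioned table that splits a region in half once it holds a fixed number of rows. This is a simplification made for illustration: ToyTable is hypothetical, and it counts rows, whereas HBase decides splits based on region size in bytes. It uses the numbers from the question (split size 100, split at the halfway mark) and records which region each sequential put lands in.

```python
from bisect import bisect_right, insort

SPLIT_SIZE = 100  # rows per region that trigger a split (the post's assumption)


class ToyTable:
    """Toy model of range-partitioned regions; not HBase's real split policy."""

    def __init__(self, split_size: int = SPLIT_SIZE):
        self.split_size = split_size
        self.start_keys = []      # start_keys[i] is the start key of regions[i + 1]
        self.regions = [[]]       # sorted row keys held by each region
        self.write_targets = []   # (region index, region count) for every put

    def put(self, key: int) -> None:
        idx = bisect_right(self.start_keys, key)  # region whose range holds key
        self.write_targets.append((idx, len(self.regions)))
        insort(self.regions[idx], key)
        region = self.regions[idx]
        if len(region) >= self.split_size:
            mid = len(region) // 2                # split at the halfway mark
            left, right = region[:mid], region[mid:]
            self.regions[idx:idx + 1] = [left, right]
            self.start_keys.insert(idx, right[0])


table = ToyTable()
for row in range(1000):  # sequential row keys 0..999
    table.put(row)

last_region_writes = sum(1 for idx, count in table.write_targets if idx == count - 1)
print(len(table.regions), [len(r) for r in table.regions], last_region_writes)
```

The first split produces exactly Region1 [START-49] and Region2 [50-END]. After that, every put goes to the newest (last) region, and each older region is left frozen at half of the split size, which is the hotspotting behavior described in the replies.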

Re: Multiple tables vs big fat table

2011-11-20 Thread lars hofhansl
There are many considerations here, but one is that separate tables provide a completely separate namespace. If you use one table, the design of the key space is more involved, as you need to separate the namespaces with key prefixes. So if you never have to access data from separate "key space" in a
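
With a single table, each "namespace" becomes a row-key prefix, and reading one namespace becomes a prefix scan: start at the prefix and stop just before the smallest key that sorts after every key with that prefix. The sketch below models the table as a sorted list of byte keys; make_key, prefix_stop_row and prefix_scan are hypothetical helpers, and the "|" separator is an assumption. It also shows why the separator matters: without it, the prefix "users" also matches "usersettings".

```python
from bisect import bisect_left


def make_key(namespace: str, row_id: str) -> bytes:
    # Hypothetical layout "<namespace>|<row_id>"; the separator is an assumption.
    return f"{namespace}|{row_id}".encode()


def prefix_stop_row(prefix: bytes) -> bytes:
    """Smallest key greater than every key starting with prefix (exclusive stop)."""
    p = bytearray(prefix)
    while p and p[-1] == 0xFF:  # 0xFF cannot be incremented; drop it and carry
        p.pop()
    if not p:
        return b""              # no upper bound: scan to the end of the table
    p[-1] += 1
    return bytes(p)


def prefix_scan(sorted_keys: list, prefix: bytes) -> list:
    start = bisect_left(sorted_keys, prefix)
    stop_row = prefix_stop_row(prefix)
    end = bisect_left(sorted_keys, stop_row) if stop_row else len(sorted_keys)
    return sorted_keys[start:end]


keys = sorted([
    make_key("users", "alice"), make_key("users", "bob"),
    make_key("orders", "1001"), make_key("orders", "1002"),
    make_key("usersettings", "alice"),
])
print(prefix_scan(keys, b"users|"))  # only the "users" namespace
print(prefix_scan(keys, b"users"))   # also picks up "usersettings"
```

Separate tables avoid this bookkeeping entirely, which is the point lars makes; the prefix approach only pays off if you sometimes need to read across namespaces in one scan.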

Re: Multiple tables vs big fat table

2011-11-20 Thread Mark
I'm more interested in how and why it would depend rather than the actual answer. In evenly distributed systems you should do x/y because . If your data is not evenly distributed then you should... Thanks On 11/20/11 12:57 AM, Michel Segel wrote: Mark, Simple answer ... it depends... ;

Re: Schema design question - Hot Key concerns

2011-11-20 Thread Michel Segel
Hi, OK... First a caveat... I haven't seen your initial normalized schema, so take what I say with a grain of salt... The problem you are trying to solve is one which can be solved better on an RDBMS platform and does not fit well in a NoSQL space. Your scalability issue would probably be bet

Re: Multiple tables vs big fat table

2011-11-20 Thread Michel Segel
Mark, Simple answer ... it depends... ;-) Longer answer... What's your use case? What's your access pattern? Is the type of data, in this case evenly distributed in terms of size? Sent from a remote device. Please excuse any typos... Mike Segel On Nov 18, 2011, at 3:29 PM, Mark wrote: >