I went through this discussion a month or so ago and came away with the
opinion that you can either have an efficient load with a random key but
then an inefficient scan (not using start and end rows), or an
inefficient import with a sequential key but an efficient scan using
start and end rows.
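To make the scan-side cost of a random (salted) key concrete, here is a minimal sketch of how one logical date-range scan turns into one physical scan per salt bucket. The `BUCKETS` count and two-digit prefix format are assumptions for illustration, not anything from HBase itself; the pairs produced would feed the start/stop rows of that many parallel HBase `Scan` objects.

```java
public class SaltedScanRanges {
    // Hypothetical bucket count; it must match whatever salt was used
    // at write time.
    static final int BUCKETS = 10;

    // With salted keys, a single logical range scan becomes BUCKETS
    // physical scans, one per salt prefix. This computes the
    // (startRow, stopRow) pair for each bucket.
    static String[][] ranges(String startDate, String stopDate) {
        String[][] out = new String[BUCKETS][2];
        for (int b = 0; b < BUCKETS; b++) {
            String prefix = String.format("%02d_", b);
            out[b][0] = prefix + startDate;
            out[b][1] = prefix + stopDate;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] r : ranges("20110301", "20110320")) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

The point is the multiplier: every read now pays a fan-out of `BUCKETS`, which is exactly the inefficiency of "not using start and end rows" on a single scan.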
-Pete
On Sun, 20 Mar 2011 12:52:24 -0700, Oleg Ruchovets <oruchov...@gmail.com>
wrote:
Actually, the discussion started from this post:
http://search-hadoop.com/m/XX3nW68JsY1/hbase+insertion+optimisation&subj=hbase+insertion+optimisation+
Simply inserting data with a row key of <date>_<somedata>, I noticed
that only one node works (the region the data were being written to). In
case we have 10-15 nodes, I think it is inefficient to write data to
only one region. I want the data to be inserted into as many nodes as
possible simultaneously. Correct me guys, but in this case the writing
job will take less time, am I right?
Oleg.
On Sun, Mar 20, 2011 at 8:57 PM, Chris Tarnas <c...@email.com> wrote:
There is none - HBase uses a total order partitioner. The straight key
value itself determines which region a row is put into. This allows for
very rapid scans of sequential data, among other things, but does mean
it is easier to hotspot regions. Key design is very important.
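Since key design is the lever here, one common way to break up a hot region is to "salt" the sequential key with a deterministic prefix. Below is a minimal sketch, not an HBase API: the `BUCKETS` value and the two-digit prefix format are assumptions for illustration, with the bucket count typically chosen near the number of region servers.

```java
public class SaltedKey {
    // Hypothetical number of salt buckets; commonly chosen to roughly
    // match the number of region servers so writes spread across them.
    static final int BUCKETS = 10;

    // Deterministically prefix a sequential <date>_<somedata> key with
    // a salt bucket so consecutive keys land in different regions.
    static String salt(String rowKey) {
        int bucket = Math.floorMod(rowKey.hashCode(), BUCKETS);
        return String.format("%02d_%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        // Sequential date-prefixed keys like the ones in this thread
        // now scatter across up to BUCKETS regions instead of one.
        for (String k : new String[]{"20110320_a", "20110320_b", "20110320_c"}) {
            System.out.println(salt(k));
        }
    }
}
```

Because the salt is a pure function of the key, reads of a single known key can recompute the prefix; range scans, however, must fan out over all buckets, which is the tradeoff Pete describes above.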
-chris
On Mar 20, 2011, at 11:41 AM, Lior Schachter wrote:
> the hash function that distributes the rows between the regions.
>
> On Sun, Mar 20, 2011 at 8:36 PM, Stack <st...@duboce.net> wrote:
>
>> Hash? Which hash are you referring to sir?
>> St.Ack
>>
>> On Sun, Mar 20, 2011 at 10:06 AM, Lior Schachter <li...@infolinks.com>
>> wrote:
>>> Hi,
>>> What is the API or configuration for changing the default hash
>>> function for a specific HTable?
>>>
>>> thanks,
>>> Lior
>>>
>>