By default, HBase sets up a MapReduce job with one map task per region 
(partition).  So, if all of your months are in one region, you will only get 
one map task.  To do something different, you would have to change the way 
splits are determined for your job rather than using the default.

One nice thing about having random keys is that you can use the defaults and 
just set a filter for your date range.  That way you get maximum parallelism on 
the map side.  But then the subsequent step may become the bottleneck if it 
can't be parallelized as well.
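
To make the trade-off concrete, here is a rough sketch of the counting 
argument.  It is a toy model in plain Java, not HBase code: the class name, 
the method name, and the region boundaries are all made up for illustration, 
and it only assumes the default behaviour described above (one map task for 
each region that the scan's key range touches).

```java
import java.util.Arrays;
import java.util.List;

// Toy model, NOT the HBase API: a table is a sorted list of region start keys,
// and a scan gets one map task for every region its key range overlaps.
class RegionSplitSketch {

    // Counts the regions overlapping the scan range [startRow, stopRow).
    // regionStarts must be sorted and begin with "" (the first region);
    // region i covers [start_i, start_{i+1}), and the last region is open-ended.
    // An empty stopRow means "scan to the end of the table".
    static int countMapTasks(List<String> regionStarts, String startRow, String stopRow) {
        int tasks = 0;
        for (int i = 0; i < regionStarts.size(); i++) {
            String regionStart = regionStarts.get(i);
            String regionEnd = (i + 1 < regionStarts.size()) ? regionStarts.get(i + 1) : null;
            boolean startsBeforeScanEnds = stopRow.isEmpty() || regionStart.compareTo(stopRow) < 0;
            boolean endsAfterScanStarts = regionEnd == null || regionEnd.compareTo(startRow) > 0;
            if (startsBeforeScanEnds && endsAfterScanStarts) {
                tasks++;
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Date-ordered keys: region boundaries fall on (hypothetical) dates.
        List<String> byDate = Arrays.asList("", "2010-07-01", "2011-01-01", "2011-02-01");
        // January 2011 sits entirely inside one region, so there is one map task.
        System.out.println(countMapTasks(byDate, "2011-01-01", "2011-02-01"));

        // Random (hashed) keys: boundaries are hash prefixes and a month's rows are
        // spread over every region, so the job scans the whole table ("" to "")
        // and filters on the date inside each map task.
        List<String> byHash = Arrays.asList("", "4", "8", "c");
        System.out.println(countMapTasks(byHash, "", ""));
    }
}
```

With date-ordered keys the month scan touches a single region and so gets a 
single map task; with random keys the full-table scan touches every region, so 
the filter runs in parallel across all of them.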

Dave

-----Original Message-----
From: Weishung Chung [mailto:weish...@gmail.com] 
Sent: Monday, January 10, 2011 11:02 AM
To: user@hbase.apache.org
Subject: Re: customize partitioning of regionserver

Thanks for the reply.
In my use case, I have to retrieve a range of data, usually by month, and
operate on it before reinserting it, so it would be nice if I could
partition by month, but I don't know how the partitioning would affect the
MapReduce job.

On Mon, Jan 10, 2011 at 12:48 PM, Buttler, David <buttl...@llnl.gov> wrote:

> Not to my knowledge.  Partitions are dynamically determined. As your table
> grows, regions become too large and are split roughly in half.  This
> prevents unbalanced regions.  Any predetermined partitioning will ultimately
> fail because you don't know your data as well as you think you do.
>
> Dave
>
>
> -----Original Message-----
> From: Weishung Chung [mailto:weish...@gmail.com]
> Sent: Monday, January 10, 2011 10:14 AM
> To: user@hbase.apache.org
> Subject: customize partitioning of regionserver
>
> Does HBase have the capability to partition a dataset by range, like MySQL
> partitioning, e.g. partitioning a datetime row key by month?
> Thank you.
>
