Regarding your question about whether you can write a mapper that sends only
the columns that you need:

Yes, of course you can.
See the example in Import.java (its Importer class is the mapper). It shows
how a simple CopyTable can be implemented. Take a similar approach, but before
creating the Put for the new table, check the KeyValues and decide which ones
to keep. Hope this helps.
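
For example, a rough, untested sketch of such a mapper (the family and
qualifier names below are just placeholders for whichever columns you want to
keep):

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public class SelectiveCopyMapper extends TableMapper<ImmutableBytesWritable, Put> {

  private static final byte[] FAMILY = Bytes.toBytes("f");     // placeholder
  private static final byte[] QUALIFIER = Bytes.toBytes("m");  // placeholder

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    Put put = new Put(row.get());
    boolean hasWantedColumn = false;
    // Walk the KVs of the row and keep only the columns we care about.
    for (KeyValue kv : value.raw()) {
      if (Bytes.equals(kv.getFamily(), FAMILY)
          && Bytes.equals(kv.getQualifier(), QUALIFIER)) {
        put.add(kv);
        hasWantedColumn = true;
      }
    }
    if (hasWantedColumn) {
      // TableOutputFormat (set up via TableMapReduceUtil.initTableReducerJob)
      // writes this Put to the new table.
      context.write(row, put);
    }
  }
}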

Regards
Ram

On Sat, Feb 9, 2013 at 7:52 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> See the following javadoc in Scan.java:
>
>  * To only retrieve columns within a specific range of version timestamps,
>
>  * execute {@link #setTimeRange(long, long) setTimeRange}.
> You can search for the above method in unit tests.
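>
> For example, a minimal sketch (the family name and the timestamps you pass
> in are placeholders):
>
> import java.io.IOException;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class TimeRangeScanExample {
>   static Scan buildScan(long minStamp, long maxStamp) throws IOException {
>     Scan scan = new Scan();
>     scan.addFamily(Bytes.toBytes("f"));
>     // only cells with timestamps in [minStamp, maxStamp) are returned
>     scan.setTimeRange(minStamp, maxStamp);
>     return scan;
>   }
> }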
>
> In your use case, is family f the only family?
> If not, take a look at HBASE-5416, which is coming in 0.94.5;
> family f would be the essential column family.
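>
> And for selecting rows by value, a sketch (untested; it assumes column m
> stores an int written with Bytes.toBytes(5), so adjust the comparison value
> to match how the column is actually encoded):
>
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class ValueFilterExample {
>   static Scan buildScan() {
>     Scan scan = new Scan();
>     SingleColumnValueFilter filter = new SingleColumnValueFilter(
>         Bytes.toBytes("f"), Bytes.toBytes("m"),
>         CompareOp.EQUAL, Bytes.toBytes(5));
>     filter.setFilterIfMissing(true);   // skip rows that have no f:m at all
>     scan.setFilter(filter);
>     // On 0.94.5+ (HBASE-5416) the other families can be loaded on demand:
>     // scan.setLoadColumnFamiliesOnDemand(true);
>     return scan;
>   }
> }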
>
> Cheers
>
> On Fri, Feb 8, 2013 at 5:47 PM, <alx...@aim.com> wrote:
>
> > Hi,
> >
> > Thanks for the suggestions. How can a time-range scan be implemented in
> > Java code? Is there any sample code or a tutorial?
> > Also, is it possible to select by the value of a column? Let's say I know
> > that records have family f and column m, and new records have m=5. I need
> > to instruct HBase to send only these records to the mapper of mapred jobs.
> >
> > Thanks.
> > Alex.
> >
> > -----Original Message-----
> > From: Ted Yu <yuzhih...@gmail.com>
> > To: user <user@hbase.apache.org>
> > Sent: Fri, Feb 8, 2013 11:05 am
> > Subject: Re: split table data into two or more tables
> >
> >
> > bq. in a cluster of 2 nodes +1 master
> > I assume you're limited by hardware in that regard.
> >
> > bq. job selects these new records
> > Have you used a time-range scan?
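> >
> > Roughly, the Scan handed to the MR job can be restricted like this (a
> > sketch; the table name and the lastRunTs parameter are placeholders, and
> > IdentityTableMapper just stands in for your real mapper):
> >
> > import java.io.IOException;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
> > import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> > import org.apache.hadoop.mapreduce.Job;
> >
> > public class SelectNewRecordsJob {
> >   static Job createJob(Configuration conf, long lastRunTs) throws IOException {
> >     Job job = new Job(conf, "select-new-records");
> >     job.setJarByClass(SelectNewRecordsJob.class);
> >     Scan scan = new Scan();
> >     scan.setCaching(500);
> >     scan.setCacheBlocks(false);   // usually advisable for MR scans
> >     // only rows written since the previous run
> >     scan.setTimeRange(lastRunTs, System.currentTimeMillis());
> >     TableMapReduceUtil.initTableMapperJob("yourTable", scan,
> >         IdentityTableMapper.class, ImmutableBytesWritable.class,
> >         Result.class, job);
> >     return job;
> >   }
> > }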
> >
> > Cheers
> >
> > On Fri, Feb 8, 2013 at 10:59 AM, <alx...@aim.com> wrote:
> >
> > > Hi,
> > >
> > > The rationale is that I have a mapred job that constantly adds new
> > > records to an HBase table.
> > > The next mapred job selects these new records, but it must iterate over
> > > all records and check whether each one is a candidate for selection.
> > > Since there are too many old records, iterating through them in a
> > > cluster of 2 nodes + 1 master takes about 2 days. So I thought splitting
> > > them into two tables should reduce this time, and as soon as I figure
> > > out that there are no more new records left in one of the new tables, I
> > > will stop running the mapred job on it.
> > >
> > > Currently, we have 7 regions including ROOT and META.
> > >
> > >
> > > Thanks.
> > > Alex.
> > >
> > > -----Original Message-----
> > > From: Ted Yu <yuzhih...@gmail.com>
> > > To: user <user@hbase.apache.org>
> > > Sent: Fri, Feb 8, 2013 10:40 am
> > > Subject: Re: split table data into two or more tables
> > >
> > >
> > > May I ask the rationale behind this?
> > > Were you aiming for higher write throughput?
> > >
> > > Please also tell us how many regions you have in the current table.
> > >
> > > Thanks
> > >
> > > BTW please consider upgrading to 0.94.4
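> > >
> > > For bulk-copying rows from one table into another, the bundled CopyTable
> > > job may help (a sketch; the cutoff timestamp and table names are
> > > placeholders):
> > >
> > > $ hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
> > >     --starttime=0 --endtime=1360000000000 --new.name=tableB tableA
> > >
> > > and similarly for the complementary time range into table C. Note that
> > > this splits by write timestamp rather than by exact row counts; after
> > > verifying the copies, the old table can be disabled and dropped from the
> > > shell.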
> > >
> > > On Fri, Feb 8, 2013 at 10:36 AM, <alx...@aim.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I wondered if there is a way of splitting data from one table into two
> > > > or more tables in HBase with identical schemas, i.e., if table A has
> > > > 100M records, put 50M into table B and 50M into table C, then delete
> > > > table A.
> > > > Currently, I use hbase-0.92.1 and hadoop-1.4.0.
> > > >
> > > > Thanks.
> > > > Alex.
> > > >
> > >
> > >
> > >
> >
> >
> >
>
