Hi
You could implement a custom filter similar to
FirstKeyOnlyFilter.
Implement its filterKeyValue method so that it matches your KeyValue
(the specific qualifier that you are looking for); a rough sketch is below.

Deploy it in your cluster.  It should work.
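
For illustration, here is a rough, untested sketch of such a filter against the
0.94-style filter API.  The class name FirstMatchingQualifierFilter is made up,
and the hard-coded info:diagnosis column is simply taken from your example
below; adjust both to your case.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

// Like FirstKeyOnlyFilter, keep at most one KeyValue per row, but only
// when it is the specific column we care about.
public class FirstMatchingQualifierFilter extends FilterBase {
  private static final byte[] FAMILY = Bytes.toBytes("info");
  private static final byte[] QUALIFIER = Bytes.toBytes("diagnosis");
  private boolean foundKV = false;

  @Override
  public void reset() {
    // Called before each new row is evaluated.
    foundKV = false;
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    if (foundKV) {
      return ReturnCode.NEXT_ROW;   // already kept one KeyValue for this row
    }
    if (kv.matchingColumn(FAMILY, QUALIFIER)) {
      // A value check (e.g. for the substring "cardiac") could be added here too.
      foundKV = true;
      return ReturnCode.INCLUDE;    // this is the qualifier we are looking for
    }
    return ReturnCode.NEXT_COL;     // not our column, move to the next one
  }

  // Filter extends Writable in 0.94; this filter has no state to serialize.
  @Override
  public void write(DataOutput out) throws IOException {}

  @Override
  public void readFields(DataInput in) throws IOException {}
}

Once the jar containing this class is on the region servers' classpath (and the
client's), it can be used like any built-in filter, e.g.
scan.setFilter(new FirstMatchingQualifierFilter()), and the scan passed to your
counting logic.  Again, this is only a sketch of the idea, not tested code.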

Regards
Ram

On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy <dalia.mohso...@hotmail.com> wrote:

>
> So do you have a suggestion on how to get the filter to work?
>
> > Date: Mon, 24 Dec 2012 22:22:49 +0530
> > Subject: Re: Hbase Count Aggregate Function
> > From: ramkrishna.s.vasude...@gmail.com
> > To: user@hbase.apache.org
> >
> > Okay, looking at the shell script and the code, I think that when you use this
> > counter, the user's filter is not taken into account.
> > It adds a FirstKeyOnlyFilter and proceeds with the scan. :(.
> >
> > Regards
> > Ram
> >
> > On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <dalia.mohso...@hotmail.com> wrote:
> >
> > >
> > > Yeah, scan gives the correct number of rows, while count returns the total
> > > number of rows.
> > >
> > > Both are using the same filter. I even tried it using the Java API, with the
> > > row count method.
> > >
> > > rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
> > >
> > > I get the total number of rows, not the number of filtered rows.
> > >
> > > So any idea ??
> > >
> > > Thanks Ram :)
> > >
> > > > Date: Mon, 24 Dec 2012 21:57:54 +0530
> > > > Subject: Re: Hbase Count Aggregate Function
> > > > From: ramkrishna.s.vasude...@gmail.com
> > > > To: user@hbase.apache.org
> > > >
> > > > So you find that scan with a filter and count with the same filter are
> > > > giving you different results?
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <dalia.mohso...@hotmail.com> wrote:
> > > >
> > > > >
> > > > > Dear all,
> > > > >
> > > > > I have 50,000 rows with diagnosis qualifier = "cardiac", and another
> > > > > 50,000 rows with "renal".
> > > > >
> > > > > When I type this in the HBase shell,
> > > > >
> > > > > import org.apache.hadoop.hbase.filter.CompareFilter
> > > > > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > > > > import org.apache.hadoop.hbase.filter.SubstringComparator
> > > > > import org.apache.hadoop.hbase.util.Bytes
> > > > >
> > > > > scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > > > >     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > > > >          Bytes.toBytes('diagnosis'),
> > > > >          CompareFilter::CompareOp.valueOf('EQUAL'),
> > > > >          SubstringComparator.new('cardiac'))}
> > > > >
> > > > > Output = 50,000 rows
> > > > >
> > > > > import org.apache.hadoop.hbase.filter.CompareFilter
> > > > > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > > > > import org.apache.hadoop.hbase.filter.SubstringComparator
> > > > > import org.apache.hadoop.hbase.util.Bytes
> > > > >
> > > > > count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > > > >     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > > > >          Bytes.toBytes('diagnosis'),
> > > > >          CompareFilter::CompareOp.valueOf('EQUAL'),
> > > > >          SubstringComparator.new('cardiac'))}
> > > > > Output = 100,000 rows
> > > > >
> > > > > I even tried it using the HBase Java API with an AggregationClient
> > > > > instance, and I enabled the coprocessor aggregation for the table.
> > > > > rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
> > > > >
> > > > > Also, when measuring the performance improvement from adding more
> > > > > nodes, the operation takes the same time.
> > > > >
> > > > > So any advice please?
> > > > >
> > > > > I have been going through all this mess for a couple of weeks.
> > > > >
> > > > > Thanks,
> > >
> > >
>
>
