Yes, scan returns the correct number of rows, while count returns the total number of rows.
Both use the same filter. I even tried it with the Java API, using the rowCount method:

    rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);

I get the total number of rows, not the number of rows that pass the filter.

Any ideas?

Thanks, Ram :)

> Date: Mon, 24 Dec 2012 21:57:54 +0530
> Subject: Re: Hbase Count Aggregate Function
> From: ramkrishna.s.vasude...@gmail.com
> To: user@hbase.apache.org
>
> So you find that scan with a filter and count with the same filter is
> giving you different results?
>
> Regards
> Ram
>
> On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy
> <dalia.mohso...@hotmail.com> wrote:
>
> >
> > Dear all,
> >
> > I have 50,000 row with diagnosis qualifier = "cardiac", and another 50,000
> > rows with "renal".
> >
> > When I type this in Hbase shell,
> >
> > import org.apache.hadoop.hbase.filter.CompareFilter
> > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > import org.apache.hadoop.hbase.filter.SubstringComparator
> > import org.apache.hadoop.hbase.util.Bytes
> >
> > scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > Bytes.toBytes('diagnosis'),
> > CompareFilter::CompareOp.valueOf('EQUAL'),
> > SubstringComparator.new('cardiac'))}
> >
> > Output = 50,000 row
> >
> > import org.apache.hadoop.hbase.filter.CompareFilter
> > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > import org.apache.hadoop.hbase.filter.SubstringComparator
> > import org.apache.hadoop.hbase.util.Bytes
> >
> > count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > Bytes.toBytes('diagnosis'),
> > CompareFilter::CompareOp.valueOf('EQUAL'),
> > SubstringComparator.new('cardiac'))}
> > Output = 100,000 row
> >
> > Even though I tried it using Hbase Java API, Aggregation Client Instance,
> > and I enabled the Coprocessor aggregation for the table.
> > rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
> >
> > Also when measuring the improved performance on case of adding more nodes
> > the operation takes the same time.
> >
> > So any advice please?
> >
> > I have been throughout all this mess from a couple of weeks
> >
> > Thanks,
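
For reference, the two numbers reported in this thread can be reproduced in miniature with plain Java. This is only a sketch under stated assumptions: it makes no HBase calls, the class and method names (FilteredCountSketch, countIgnoringFilter, countWithFilter) are hypothetical, and the table is modeled as an in-memory list of diagnosis strings. It does not explain why count behaves this way; it only shows what "filter applied" and "filter ignored" look like on 50,000 "cardiac" rows and 50,000 "renal" rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FilteredCountSketch {

    // Hypothetical stand-in for the 'patient' table: one diagnosis string per row,
    // with the same number of "cardiac" and "renal" rows.
    public static List<String> buildDiagnoses(int rowsPerDiagnosis) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < rowsPerDiagnosis; i++) {
            rows.add("cardiac");
            rows.add("renal");
        }
        return rows;
    }

    // Models the behavior seen with count: the filter is accepted but never applied,
    // so every row is counted.
    public static long countIgnoringFilter(List<String> rows, Predicate<String> filter) {
        return rows.size();
    }

    // Models the behavior seen with scan: only rows that pass the filter are counted.
    public static long countWithFilter(List<String> rows, Predicate<String> filter) {
        return rows.stream().filter(filter).count();
    }

    public static void main(String[] args) {
        List<String> rows = buildDiagnoses(50_000);

        // Substring match on the diagnosis value, analogous in spirit to the
        // SubstringComparator('cardiac') used in the shell commands.
        Predicate<String> isCardiac = diagnosis -> diagnosis.contains("cardiac");

        System.out.println(countIgnoringFilter(rows, isCardiac)); // prints 100000
        System.out.println(countWithFilter(rows, isCardiac));     // prints 50000
    }
}
```

If the second number is the one wanted, the count has to come from a code path that actually evaluates the filter on each row, which is what the scan in the original message appears to do.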