I was aware of the following change. Can you log a JIRA and attach the patch to it?
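For reference, here is roughly what the whole method looks like with your change folded in (a sketch based on the 0.92 source linked below; the method framing is assumed from that file, and I haven't compiled it):

    private void validateParameters(Scan scan) throws IOException {
      // Reject a null scan, a degenerate range (start == stop but not the
      // empty row), or an inverted range; the new last clause exempts the
      // case where the stop row is simply unset (EMPTY_END_ROW).
      if (scan == null
          || (Bytes.equals(scan.getStartRow(), scan.getStopRow())
              && !Bytes.equals(scan.getStartRow(), HConstants.EMPTY_START_ROW))
          || (Bytes.compareTo(scan.getStartRow(), scan.getStopRow()) > 0
              && !Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW))) {
        throw new IOException(
            "Agg client Exception: Startrow should be smaller than Stoprow");
      }
    }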
Thanks for trying out and improving the aggregation client.

On Mon, May 14, 2012 at 3:31 PM, anil gupta <anilgupt...@gmail.com> wrote:

> Hi Ted,
>
> If we change the if-statement condition in the validateParameters method
> in AggregationClient.java to:
>
>     if (scan == null || (Bytes.equals(scan.getStartRow(), scan.getStopRow()) &&
>         !Bytes.equals(scan.getStartRow(), HConstants.EMPTY_START_ROW)) ||
>         (Bytes.compareTo(scan.getStartRow(), scan.getStopRow()) > 0 &&
>         *!Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW)*))
>
> then the clause marked with asterisks handles the case where the stopRow
> is not specified. IMHO, it's not an error if we don't specify the stopRow.
> This is what I was looking for, because in my case I didn't want to set
> the stop row since I am using a prefix filter. I have tested the code
> above and it works fine when I specify only the startRow. Is this
> desirable functionality? If yes, should it be added to trunk?
>
> Here is the link to the source of AggregationClient:
>
> http://grepcode.com/file_/repo1.maven.org/maven2/org.apache.hbase/hbase/0.92.0/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java/?v=source
>
> Thanks,
> Anil Gupta
>
>
> On Mon, May 14, 2012 at 1:58 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Anil:
> > As code #3 shows, having a stopRow helps narrow the range of rows
> > participating in the aggregation.
> >
> > Do you have a suggestion on how this process can be made more
> > user-friendly?
> >
> > Thanks
> >
> > On Mon, May 14, 2012 at 1:47 PM, anil gupta <anilgupt...@gmail.com> wrote:
> >
> > > Hi Ted,
> > >
> > > My bad, I missed a big difference between the Scan object I am using
> > > in my filter and the Scan object used in the coprocessor. So the scan
> > > objects are not the same. Basically, I am filtering on the basis of a
> > > prefix of the RowKey.
> > >
> > > So, in my filter I do this to build the scanner:
> > >
> > > Code 1:
> > >     Filter filter = new PrefixFilter(Bytes.toBytes(strPrefix));
> > >     Scan scan = new Scan();
> > >     scan.setFilter(filter);
> > >     scan.setStartRow(Bytes.toBytes(strPrefix)); // I don't set any stopRow in this scanner.
> > >
> > > In the coprocessor, I do the following for the scanner:
> > >
> > > Code 2:
> > >     Scan scan = new Scan();
> > >     scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
> > >
> > > I don't set a startRow in the code above, because if I use only the
> > > startRow in the coprocessor scanner, I get the following exception
> > > (because of this I removed the startRow from the CP scan object code):
> > >
> > >     java.io.IOException: Agg client Exception: Startrow should be smaller than Stoprow
> > >         at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.validateParameters(AggregationClient.java:116)
> > >         at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.max(AggregationClient.java:85)
> > >         at com.intuit.ihub.hbase.poc.DummyClass.doAggregation(DummyClass.java:81)
> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >
> > > I modified code #2 to add the stopRow as well:
> > >
> > > Code 3:
> > >     Scan scan = new Scan();
> > >     scan.setStartRow(Bytes.toBytes(prefix));
> > >     scan.setStopRow(Bytes.toBytes(String.valueOf(Long.parseLong(prefix) + 1)));
> > >     scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
> > >
> > > When I run the coprocessor with code #3, it's blazing fast. It gives
> > > the result in around 200 milliseconds. :)
> > > Since this was just a test of coprocessors, I added the logic for the
> > > stopRow manually. What is the reason that the Scan object in the
> > > coprocessor always requires a stopRow along with the startRow? (Code
> > > #1 works fine even when I don't use a stopRow.) Can this restriction
> > > be relaxed?
> > >
> > > Thanks,
> > > Anil Gupta
> > >
> > >
> > > On Mon, May 14, 2012 at 12:55 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > Anil:
> > > > I think the performance was related to your custom filter.
> > > >
> > > > Please tell us more about the filter next time.
> > > >
> > > > Thanks
> > > >
> > > > On Mon, May 14, 2012 at 12:31 PM, anil gupta <anilgupt...@gmail.com> wrote:
> > > >
> > > > > Hi Stack,
> > > > >
> > > > > I'll look into Gary Helmling's post, try to profile the
> > > > > coprocessor, and share the results.
> > > > >
> > > > > Thanks,
> > > > > Anil Gupta
> > > > >
> > > > > On Mon, May 14, 2012 at 12:08 PM, Stack <st...@duboce.net> wrote:
> > > > >
> > > > > > On Mon, May 14, 2012 at 12:02 PM, anil gupta <anilgupt...@gmail.com> wrote:
> > > > > > > I loaded around 70 thousand 1-2 KB records into HBase. For
> > > > > > > scans with my custom filter, I am able to get 97 rows in 500
> > > > > > > milliseconds, and for doing sum, max, min (the built-in
> > > > > > > aggregations of HBase) with the same custom filter it takes
> > > > > > > 11000 milliseconds. Does this mean that coprocessor
> > > > > > > aggregation is supposed to be around ~20x slower than scans?
> > > > > > > Am I missing any trick here?
> > > > > > >
> > > > > >
> > > > > > That seems like a high tax to pay for running CPs. Can you dig
> > > > > > in on where the time is being spent? (See another recent note on
> > > > > > this list or on dev where Gary Helmling talks about how he did
> > > > > > basic profiling of CPs.)
> > > > > > St.Ack
> > > > >
> > > > > --
> > > > > Thanks & Regards,
> > > > > Anil Gupta
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
>
>
> --
> Thanks & Regards,
> Anil Gupta
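One more note on code #3: deriving the stop row with Long.parseLong(prefix) + 1 only works while the row-key prefix is numeric, and even then the width can change (a prefix of "999" yields the stop row "1000", which sorts *before* "999" as bytes). A more general approach is to increment the last byte of the prefix to form an exclusive stop row. A rough, untested sketch follows; prefixStopRow is a hypothetical helper name, and only Bytes, HConstants, Scan, and PrefixFilter from the 0.92 client API are assumed:

    import java.util.Arrays;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical helper: compute the exclusive stop row that covers every
    // key starting with the given prefix, whatever bytes the prefix contains.
    static byte[] prefixStopRow(byte[] prefix) {
      int i = prefix.length - 1;
      // Walk left past any 0xFF bytes, which cannot be incremented.
      while (i >= 0 && prefix[i] == (byte) 0xFF) {
        i--;
      }
      if (i < 0) {
        // The prefix is all 0xFF: there is no finite upper bound, so scan
        // to the end of the table.
        return HConstants.EMPTY_END_ROW;
      }
      // Keep the bytes up to position i, bump the last one, and drop any
      // trailing 0xFF bytes. E.g. "123" -> "124", "999" -> "99:" (0x3A).
      byte[] stop = Arrays.copyOf(prefix, i + 1);
      stop[i]++;
      return stop;
    }

The coprocessor scan from code #3 could then be built without assuming a numeric prefix:

    byte[] prefixBytes = Bytes.toBytes(prefix);
    Scan scan = new Scan();
    scan.setStartRow(prefixBytes);
    scan.setStopRow(prefixStopRow(prefixBytes));
    scan.setFilter(new PrefixFilter(prefixBytes)); // now mostly redundant, but harmless

With the start/stop pair bounding the scan, the region servers only touch rows inside the prefix range instead of filtering the whole table, which is likely why code #3 answers in ~200 milliseconds while code #2 did not.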