Anil: I am having trouble accessing JIRA. Ted Yu and Zhihong Yu are the same person :-)
I think it would be good to remind users of the aggregation client to
narrow the range of their scan; that's why I proposed adding a check of
hasFilter(). (A rough sketch of such a check appears at the end of this
thread.)

Cheers

On Tue, May 15, 2012 at 10:47 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Take your time.
> Once you complete your first submission, subsequent contributions
> would be easier.
>
> On Tue, May 15, 2012 at 10:34 AM, anil gupta <anilgupt...@gmail.com> wrote:
>
>> Hi Ted,
>>
>> I created the JIRA https://issues.apache.org/jira/browse/HBASE-5999
>> for fixing this.
>>
>> Creating the patch might take me some time (due to the learning
>> curve), as this is the first time I will be creating a patch.
>>
>> Thanks,
>> Anil Gupta
>>
>> On Mon, May 14, 2012 at 4:00 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> I was aware of the following change.
>>>
>>> Can you log a JIRA and attach the patch to it?
>>>
>>> Thanks for trying out and improving the aggregation client.
>>>
>>> On Mon, May 14, 2012 at 3:31 PM, anil gupta <anilgupt...@gmail.com> wrote:
>>>
>>>> Hi Ted,
>>>>
>>>> If we change the condition of the if statement in the
>>>> validateParameters method of AggregationClient.java to:
>>>>
>>>>     if (scan == null
>>>>         || (Bytes.equals(scan.getStartRow(), scan.getStopRow())
>>>>             && !Bytes.equals(scan.getStartRow(), HConstants.EMPTY_START_ROW))
>>>>         || (Bytes.compareTo(scan.getStartRow(), scan.getStopRow()) > 0
>>>>             && *!Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW)*))
>>>>
>>>> the condition marked in bold and italic will handle the case where
>>>> the stopRow is not specified. IMHO, it is not an error if we do not
>>>> specify the stopRow. This is what I was looking for, because in my
>>>> case I did not want to set the stop row since I am using a prefix
>>>> filter. I have tested the code above and it works fine when I only
>>>> specify the startRow. Is this desirable functionality? If yes,
>>>> should it be added to trunk?
>>>>
>>>> Here is the link to the source of AggregationClient:
>>>> http://grepcode.com/file_/repo1.maven.org/maven2/org.apache.hbase/hbase/0.92.0/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java/?v=source
>>>>
>>>> Thanks,
>>>> Anil Gupta
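For reference, the relaxed condition written out in full might look
roughly like the sketch below. The class wrapper and method name are
only for illustration; the real check lives in
AggregationClient#validateParameters and also verifies that exactly one
column family is set on the scan, which is omitted here.

    import java.io.IOException;

    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RelaxedRangeCheckSketch {

      // Relaxed range check: an empty stop row is read as "scan to the
      // end of the table" instead of being rejected as an error.
      static void validateRange(Scan scan) throws IOException {
        if (scan == null
            || (Bytes.equals(scan.getStartRow(), scan.getStopRow())
                && !Bytes.equals(scan.getStartRow(), HConstants.EMPTY_START_ROW))
            || (Bytes.compareTo(scan.getStartRow(), scan.getStopRow()) > 0
                && !Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW))) {
          throw new IOException(
              "Agg client Exception: Startrow should be smaller than Stoprow");
        }
      }
    }

With this version, a scan that sets only a startRow (stop row left as
HConstants.EMPTY_END_ROW) passes validation, while a genuinely inverted
range is still rejected.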
>>>>> Anil:
>>>>> As code #3 shows, having a stopRow helps narrow the range of rows
>>>>> participating in the aggregation.
>>>>>
>>>>> Do you have a suggestion on how this process can be made more
>>>>> user-friendly?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, May 14, 2012 at 1:47 PM, anil gupta <anilgupt...@gmail.com> wrote:
>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> My bad, I missed a big difference between the Scan object I am
>>>>>> using in my filter and the Scan object used in the coprocessor,
>>>>>> so the scan objects are not the same. Basically, I am filtering on
>>>>>> a prefix of the row key.
>>>>>>
>>>>>> So, in my filter I do this to build the scanner:
>>>>>>
>>>>>> Code 1:
>>>>>>     Filter filter = new PrefixFilter(Bytes.toBytes(strPrefix));
>>>>>>     Scan scan = new Scan();
>>>>>>     scan.setFilter(filter);
>>>>>>     scan.setStartRow(Bytes.toBytes(strPrefix)); // I don't set any stopRow in this scanner.
>>>>>>
>>>>>> In the coprocessor, I do the following for the scanner:
>>>>>>
>>>>>> Code 2:
>>>>>>     Scan scan = new Scan();
>>>>>>     scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
>>>>>>
>>>>>> I do not set a startRow in the code above, because if I use only
>>>>>> the startRow in the coprocessor scanner I get the following
>>>>>> exception (which is why I removed the startRow from the CP scan
>>>>>> code):
>>>>>>
>>>>>>     java.io.IOException: Agg client Exception: Startrow should be smaller than Stoprow
>>>>>>         at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.validateParameters(AggregationClient.java:116)
>>>>>>         at org.apache.hadoop.hbase.client.coprocessor.AggregationClient.max(AggregationClient.java:85)
>>>>>>         at com.intuit.ihub.hbase.poc.DummyClass.doAggregation(DummyClass.java:81)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>
>>>>>> I then modified code #2 to add the stopRow as well:
>>>>>>
>>>>>> Code 3:
>>>>>>     Scan scan = new Scan();
>>>>>>     scan.setStartRow(Bytes.toBytes(prefix));
>>>>>>     scan.setStopRow(Bytes.toBytes(String.valueOf(Long.parseLong(prefix) + 1)));
>>>>>>     scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
>>>>>>
>>>>>> When I run the coprocessor with code #3 it is blazing fast: it
>>>>>> gives the result in around 200 milliseconds. :)
>>>>>>
>>>>>> Since this was just a test of the coprocessor, I added the stopRow
>>>>>> logic manually. What is the reason that the Scan object in the
>>>>>> coprocessor always requires a stopRow along with the startRow?
>>>>>> (Code #1 works fine even when I do not use a stopRow.) Can this
>>>>>> restriction be relaxed?
>>>>>>
>>>>>> Thanks,
>>>>>> Anil Gupta
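The stop-row trick in code #3 above (parsing the prefix as a long and
adding one) only works for purely numeric row keys. A more general way
to derive an exclusive stop row from an arbitrary prefix is to drop any
trailing 0xFF bytes and increment the last remaining byte; a minimal
sketch, with illustrative class and method names that are not part of
HBase:

    import java.util.Arrays;

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PrefixScanSketch {

      // Smallest row key that sorts after every key starting with
      // "prefix": drop trailing 0xFF bytes and increment the last
      // remaining byte. An empty array means "no upper bound" (the
      // prefix was all 0xFF bytes).
      static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
          if (prefix[i] != (byte) 0xFF) {
            byte[] stop = Arrays.copyOf(prefix, i + 1);
            stop[i]++;
            return stop;
          }
        }
        return new byte[0];
      }

      // Scan bounded on both ends, keeping the PrefixFilter as a safety net.
      static Scan prefixScan(String strPrefix) {
        byte[] prefix = Bytes.toBytes(strPrefix);
        Scan scan = new Scan();
        scan.setStartRow(prefix);
        scan.setStopRow(stopRowForPrefix(prefix));
        scan.setFilter(new PrefixFilter(prefix));
        return scan;
      }
    }

Note that the all-0xFF case yields an empty stop row, which the current
validateParameters would still reject; that corner case is exactly what
the relaxation discussed above is about.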
>>>>>>> Anil:
>>>>>>> I think the performance issue was related to your custom filter.
>>>>>>>
>>>>>>> Please tell us more about the filter next time.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Mon, May 14, 2012 at 12:31 PM, anil gupta <anilgupt...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Stack,
>>>>>>>>
>>>>>>>> I'll look into Gary Helmling's post, try to profile the
>>>>>>>> coprocessor, and share the results.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anil Gupta
>>>>>>>>
>>>>>>>> On Mon, May 14, 2012 at 12:08 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>
>>>>>>>>> On Mon, May 14, 2012 at 12:02 PM, anil gupta <anilgupt...@gmail.com> wrote:
>>>>>>>>>> I loaded around 70 thousand 1-2 KB records into HBase. For
>>>>>>>>>> scans with my custom filter I am able to get 97 rows in 500
>>>>>>>>>> milliseconds, while doing sum, max, min (the built-in
>>>>>>>>>> aggregations of HBase) with the same custom filter takes
>>>>>>>>>> 11000 milliseconds. Does this mean that coprocessor
>>>>>>>>>> aggregation is supposed to be around ~20x slower than scans?
>>>>>>>>>> Am I missing any trick here?
>>>>>>>>>
>>>>>>>>> That seems like a high tax to pay for running CPs. Can you dig
>>>>>>>>> in on where the time is being spent? (See another recent note
>>>>>>>>> on this list, or on dev, where Gary Helmling talks about how he
>>>>>>>>> did basic profiling of CPs.)
>>>>>>>>>
>>>>>>>>> St.Ack
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks & Regards,
>>>>>>>> Anil Gupta
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Anil Gupta
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Anil Gupta
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
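To close the loop on the hasFilter() idea at the top of this thread:
one lightweight way to remind callers would be to log a warning, rather
than fail, whenever the scan carries a filter but no explicit stop row.
A rough sketch, assuming the check would sit next to validateParameters
(the class name and the use of commons-logging below are only
illustrative):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanRangeReminderSketch {

      private static final Log LOG =
          LogFactory.getLog(ScanRangeReminderSketch.class);

      // If the caller relies on a filter alone, with no stop row, the
      // aggregation still has to visit every region of the table; log a
      // reminder to narrow the scan range instead of failing the call.
      static void warnIfUnbounded(Scan scan) {
        if (scan.hasFilter()
            && Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW)) {
          LOG.warn("Scan has a filter but no stop row; the aggregation "
              + "will touch every region. Consider setting start and "
              + "stop rows to narrow the range.");
        }
      }
    }

Whether this should stay a warning or become a hard error is exactly
the user-friendliness question raised earlier in the thread.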