Hi, I am using Drill to query HBase, but it is very slow. I think one reason is that when a table has multiple qualifiers, the query plan scans all of the data instead of only the columns specified in the SQL. I debugged the process and found that the main logic is in Calcite and is hard to change.
Can I get the physical plan and update it on the client side? I did some research; the class DrillWorker can get the plan, but I would have to initialize the context, etc. Is there an easier way to get the physical plan, so that I can update it based on my query conditions?

2016-05-24 13:30 GMT+08:00 qiang li <tiredqi...@gmail.com>:

I upgraded to the latest CDH version, which is HBase 1.2.0-cdh5.7.0, and tested again. The results are correct now. Thanks for the help.

Even so, I think the query plan can still be optimized. Here is what I think can be improved:
a. specify the columns only when needed
b. remove the RowFilter when the scan's startRow and stopRow already cover it.
For example: select row_key from hbase.browser_action a where a.row_key > '0' and a.row_key < '1'. I think SQL like this will run faster without the RowFilter.

2016-05-24 9:29 GMT+08:00 qiang li <tiredqi...@gmail.com>:

Yes, it seems like the same issue. I will upgrade and test again. But do you think we can update the physical plan too? If we only want to query one qualifier, then the columns of the plan should contain only that qualifier instead of "*". Maybe that plan would run faster. Am I right?

2016-05-23 23:38 GMT+08:00 Krystal Nguyen <kngu...@maprtech.com>:

Hi Qiang,

Looks like you might be encountering this issue:
https://issues.apache.org/jira/browse/DRILL-4271

Thanks

On Sun, May 22, 2016 at 8:38 PM, qiang li <tiredqi...@gmail.com> wrote:

I tested it step by step again, and finally found out that the issue happens only if the number of qualifiers is more than 3.

It's weird, but this is the result of my tests.

I tested about 10,000 rows of data.
The length of the event array is 6; the code I used to test is like below:

String[] earr = action.getEvent().geteArr();
for (int i = 0; i < 6; i++) {
    put.addColumn(family, Bytes.toBytes("e" + i), Bytes.toBytes(earr[i]));
}

Then I tested step by step like below:

put.addColumn(family, Bytes.toBytes("e0"), Bytes.toBytes("e0"));
put.addColumn(family, Bytes.toBytes("e1"), Bytes.toBytes("e1"));
put.addColumn(family, Bytes.toBytes("e2"), Bytes.toBytes("e2"));
put.addColumn(family, Bytes.toBytes("e3"), Bytes.toBytes("e3"));

0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0' group by a.`v`.`e0`;
+-----+-------+
|  k  |   p   |
+-----+-------+
| e0  | 3856  |
+-----+-------+

0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
+-----+-------+
|  k  |   p   |
+-----+-------+
| e0  | 9094  |
+-----+-------+

If I put the qualifier "e3" into the table (that is, more than 3 qualifiers), then the issue is reproduced.

2016-05-23 9:23 GMT+08:00 qiang li <tiredqi...@gmail.com>:

Sorry for the late reply.

Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.

2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kngu...@maprtech.com>:

Qiang, can you please let us know the HBase version and Hadoop distribution version that you are using.

On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <kngu...@maprtech.com> wrote:

Can you please let us know the HBase version and Hadoop distribution version that you are using.
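[Editor's note] Suggestion (b) earlier in the thread (dropping the RowFilter when startRow/stopRow already enforce the predicate) rests on HBase comparing row keys as unsigned bytes in lexicographic order. A minimal pure-Java sketch of that reasoning; the class and helper names are illustrative, not part of Drill or the HBase API:

```java
import java.nio.charset.StandardCharsets;

// Sketch: for a predicate row_key > '0' AND row_key < '1', the scan range
// alone already enforces the condition, so an extra RowFilter only adds
// per-row comparison cost.
public class ScanRangeSketch {

    // HBase orders row keys lexicographically by unsigned byte value.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // row_key > '0' -> startRow = "0" + 0x00 (smallest key strictly after "0")
        byte[] startRow = {'0', 0x00};
        // row_key < '1' -> stopRow = "1" (the scan stop row is already exclusive)
        byte[] stopRow = {'1'};

        byte[] inside  = "0abc".getBytes(StandardCharsets.UTF_8);
        byte[] tooLow  = "0".getBytes(StandardCharsets.UTF_8);
        byte[] tooHigh = "1xyz".getBytes(StandardCharsets.UTF_8);

        // Every key in [startRow, stopRow) satisfies the SQL predicate,
        // so the range is exactly the predicate and no RowFilter is needed.
        System.out.println(compareUnsigned(inside, startRow) >= 0
                        && compareUnsigned(inside, stopRow) < 0);   // true
        System.out.println(compareUnsigned(tooLow, startRow) >= 0); // false
        System.out.println(compareUnsigned(tooHigh, stopRow) < 0);  // false
    }
}
```

Because the stop row is exclusive and appending a 0x00 byte yields the smallest key strictly greater than the start value, the half-open range [startRow, stopRow) matches the predicate exactly.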
On Fri, May 20, 2016 at 1:35 AM, qiang li <tiredqi...@gmail.com> wrote:

Khurram, I am sending the mail again; the last mail forgot to cc user@drill.apache.org.

The main process is the same, but my rowkey is more complicated. Here is the detail of what I tested.
The rowkey is like this: [salt 1-byte string] + [day 8-byte string] + [event] + [uid long] + [ts long].
I also have other qualifiers; only qualifier v:v is an integer, the others are strings.

Example (one cell per line; the rowkey is the same for every cell):

hbase(main):004:0> scan 'browser_action2', { LIMIT => 1 }
ROW: 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
  column=v:e0, timestamp=1461839343076, value=pay
  column=v:e1, timestamp=1461839343076, value=bijia
  column=v:e2, timestamp=1461839343076, value=browser
  column=v:e3, timestamp=1461839343076, value=*
  column=v:e4, timestamp=1461839343076, value=*
  column=v:e5, timestamp=1461839343076, value=*
  column=v:h, timestamp=1459771200000, value=20
  column=v:m, timestamp=1459771200000, value=0
  column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
1 row(s) in 0.0410 seconds

Here is what the issue looks like:

hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW => '0'}
........
ROW: 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00
  column=v:e0, timestamp=1463723029448, value=visit
ROW: 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00
  column=v:e0, timestamp=1463723029217, value=visit
ROW: 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00
  column=v:e0, timestamp=1463723029295, value=visit
9994 row(s) in 123.8650 seconds

The Drill result:

0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
+--------+-------+
|   k    |   p   |
+--------+-------+
| visit  | 1216  |
+--------+-------+

I found that the issue appears if the row count is larger than 10,000; the result is right with fewer than 1,000 rows, but not always. What I can confirm is that if I update the columns in the physical plan and run the query through the web UI, the result is correct.

Thanks

2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

Qiang, can you please take a look at DRILL-4686 and confirm if the data set used in my repro is the same as the one you have used.
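[Editor's note] The composite rowkey layout described above can be sketched in plain Java; this is an illustrative reconstruction based only on the layout stated in the thread ([salt 1-byte string][day 8-byte string][event][uid long][ts long]), not code from the original poster:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative builder for the composite rowkey described in the thread.
public class RowKeySketch {

    static byte[] buildRowKey(char salt, String day, String event, long uid, long ts) {
        byte[] eventBytes = event.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + eventBytes.length + 8 + 8);
        buf.put((byte) salt);                          // 1-byte salt, '0'..'9'
        buf.put(day.getBytes(StandardCharsets.UTF_8)); // "yyyyMMdd", 8 bytes
        buf.put(eventBytes);                           // variable-length event name
        buf.putLong(uid);                              // 8-byte big-endian uid
        buf.putLong(ts);                               // 8-byte big-endian timestamp
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey('0', "20160404", "pay.bijia.browser",
                                 0x714B3E49D077L, 1459771200000L);
        System.out.println(key.length); // 1 + 8 + 17 + 8 + 8 = 42
    }
}
```

Because the salt is the leading byte, a scan restricted to one salt bucket can be expressed purely with startRow/stopRow bounds on that first character.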
If the data set is different, please let us know the type of data that you have used in your table.

Aman - I will try to repro the problem on Drill 1.6.0 and share results.

Thanks,
Khurram

On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org> wrote:

Khurram, DRILL-4686 seems like a different issue...it is reporting an error, whereas the original problem from qiang was an incorrect result. Can you use the same version (1.6) that he was using? Also, is the data set similar? If you are unable to repro the exact same issue, perhaps qiang should file a JIRA with a smaller repro if possible.

On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com> wrote:

Hello Qiang,

DRILL-4686 is reported to track this problem.

Thanks,
Khurram

On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com> wrote:

Ok, thanks very much.

2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

Hello Qiang,

Someone from our Drill team (in San Jose) will get back to you soon. I work from the India lab and am in a different time zone compared to the San Jose office; someone from MapR San Jose will get back to you as soon as possible.

Thanks,
Khurram

On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com> wrote:

Hi Khurram, thanks very much for reproducing it. So what's the conclusion?

Any idea how to solve it?

2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

So I tried to create the table using the HBase API (with no data inserted into the table) and I got the query plan for Drill 1.7.0.
Drill 1.7.0-SNAPSHOT commit ID: 09b26277

0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| row_key      | ANY        | NO           |
| v            | MAP        | NO           |
+--------------+------------+--------------+
2 rows selected (1.665 seconds)

Table creation Java program

{noformat}
public class PutIntDataToHBase {
    public static void main(String args[]) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.property.clientPort", "5181");
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (admin.tableExists("browser_action2")) {
            admin.disableTable("browser_action2");
            admin.deleteTable("browser_action2");
        }

        byte[][] SPLIT_KEYS =
            {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
        HTableDescriptor tableDesc = new
            HTableDescriptor(TableName.valueOf("browser_action2"));

        tableDesc.addFamily(new HColumnDescriptor("v"));
        admin.createTable(tableDesc, SPLIT_KEYS);
    }
}
{noformat}

Query plan for the query that was reported as returning wrong results.
{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(k=[$0], p=[$1])
00-02        UnionExchange
01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
01-03              Project($f0=[$0], p=[$1])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
03-03                        Project($f0=[ITEM($1, 'e0')])
03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
{noformat}

and the query plan for the other problem query mentioned in the first email.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
. . . . . . . . . . . . . . > count(a.row_key) p from hbase.browser_action2 a group by
. . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1, 9);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(k=[$0], p=[$1])
00-02        UnionExchange
01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
01-03              Project($f0=[$0], p=[$1])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)], row_key=[$0])
03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
{noformat}

Thanks,
Khurram

On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote:

Yes.
I used the HBase API to create it.
The main code is:

byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'}, {'8'}, {'9'} };
TableName tableName = TableName.valueOf("browser_action2");

HTableDescriptor tableDesc = new HTableDescriptor(tableName);
HColumnDescriptor columnDesc = new HColumnDescriptor("v");
tableDesc.addFamily(columnDesc);

columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);

admin.createTable(tableDesc, SPLIT_KEYS);

2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:

Can you provide the CREATE TABLE statement you used to reproduce this problem so we can try to reproduce it on our end.

Thanks.

-- Zelaine

On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote:

Hi,

I recently met an issue where I cannot query the correct data from HBase with SQL via Drill. Can anybody help me?

I tested with Drill 1.6.
My HBase schema:
rowkey: salt + day + event + uid + ts, e.g.: 120160411visituidts
cf: v
qualifier: v, e0, e1

The wrong result only happens when I use a group by clause.

This SQL will not return the correct result:
select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;

Part of the explain output of this SQL is:

0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(k=[$0], p=[$1])
00-02        UnionExchange
01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
01-03              Project($f0=[$0], p=[$1])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
03-03                        Project($f0=[ITEM($1, 'e0')])
03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])

The data returns very quickly; the result of this SQL is:
+------+--------+
|  k   |   p    |
+------+--------+
| pay  | 12180  |
+------+--------+

But I have millions of rows in the table.

I tried to change the physical plan. If I change the JSON plan's "columns" : [ "`*`" ] to "columns" : [ "`v`.`e0`" ], it returns the correct result.

It seems the physical plan is not correct. I also tried to debug the SQL parser to find the reason, but it's too complicated. Can anyone help me?

Also, this SQL has the same issue:
select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k, count(a.row_key) p from hbase.browser_action2 a group by BYTE_SUBSTR(a.row_key, 1, 9);
I change the JSON plan's "columns" : [ "`*`" ] to "columns" : [ "`row_key`" ], and it returns the correct result.
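[Editor's note] The workaround described in this thread is to take the JSON tab of the EXPLAIN output, edit the HBase group scan's columns entry by hand, and resubmit the edited plan through the Drill web UI. Only the single line actually quoted in the thread is shown here; the surrounding plan JSON contains many more fields and is omitted:

```
"columns" : [ "`v`.`e0`" ]
```

replacing the original

```
"columns" : [ "`*`" ]
```

so the scan projects only the qualifier the query needs instead of every column.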