Ok, thanks very much.

2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
> Hello Qiang,
>
> Someone from our Drill team (in San Jose) will get back to you soon. I
> work from the India lab and I am in a different time zone compared to the
> San Jose office; someone from MapR San Jose will get back to you as soon
> as possible.
>
> Thanks,
> Khurram
>
> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com> wrote:
>
>> Hi Khurram,
>>
>> Thanks very much for reproducing it. So what's the conclusion?
>>
>> Any idea how to solve it?
>>
>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>>
>>> So I tried to create the table using the HBase API (with no data
>>> inserted into the table) and I got the query plan on Drill 1.7.0.
>>> Drill 1.7.0-SNAPSHOT commit ID: 09b26277
>>>
>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>>> +--------------+------------+--------------+
>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>>> +--------------+------------+--------------+
>>> | row_key      | ANY        | NO           |
>>> | v            | MAP        | NO           |
>>> +--------------+------------+--------------+
>>> 2 rows selected (1.665 seconds)
>>>
>>> Table creation Java program:
>>>
>>> {noformat}
>>> import java.io.IOException;
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> import org.apache.hadoop.hbase.HColumnDescriptor;
>>> import org.apache.hadoop.hbase.HTableDescriptor;
>>> import org.apache.hadoop.hbase.TableName;
>>> import org.apache.hadoop.hbase.client.HBaseAdmin;
>>>
>>> public class PutIntDataToHBase {
>>>   public static void main(String args[]) throws IOException {
>>>     Configuration conf = HBaseConfiguration.create();
>>>     conf.set("hbase.zookeeper.property.clientPort", "5181");
>>>     HBaseAdmin admin = new HBaseAdmin(conf);
>>>     if (admin.tableExists("browser_action2")) {
>>>       admin.disableTable("browser_action2");
>>>       admin.deleteTable("browser_action2");
>>>     }
>>>
>>>     byte[][] SPLIT_KEYS =
>>>         {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>>     HTableDescriptor tableDesc =
>>>         new HTableDescriptor(TableName.valueOf("browser_action2"));
>>>
>>>     tableDesc.addFamily(new HColumnDescriptor("v"));
>>>     admin.createTable(tableDesc, SPLIT_KEYS);
>>>   }
>>> }
>>> {noformat}
>>>
>>> Query plan for the query that was reported as returning wrong results:
>>>
>>> {noformat}
>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>>> +------+------+
>>> | text | json |
>>> +------+------+
>>> | 00-00    Screen
>>> 00-01      Project(k=[$0], p=[$1])
>>> 00-02        UnionExchange
>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> 01-03              Project($f0=[$0], p=[$1])
>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> 02-01                  UnorderedMuxExchange
>>> 03-01                    Project($f0=[$0], p=[$1],
>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
>>> stopRow=, filter=null], columns=[`*`]]])
>>> {noformat}
>>>
>>> and the query plan for the other problem query mentioned in the first
>>> email:
>>>
>>> {noformat}
>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
>>> . . . . . . . . . . . . . . > count(a.row_key) p from hbase.browser_action2 a group by
>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1, 9);
>>> +------+------+
>>> | text | json |
>>> +------+------+
>>> | 00-00    Screen
>>> 00-01      Project(k=[$0], p=[$1])
>>> 00-02        UnionExchange
>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> 01-03              Project($f0=[$0], p=[$1])
>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> 02-01                  UnorderedMuxExchange
>>> 03-01                    Project($f0=[$0], p=[$1],
>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>>> row_key=[$0])
>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
>>> stopRow=null, filter=null], columns=[`*`]]])
>>> {noformat}
>>>
>>> Thanks,
>>> Khurram
>>>
>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote:
>>>
>>>> Yes, I used the HBase API to create it.
>>>>
>>>> The main code is:
>>>>
>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'},
>>>> {'7'}, {'8'}, {'9'} };
>>>> TableName tableName = TableName.valueOf("browser_action2");
>>>>
>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>>> tableDesc.addFamily(columnDesc);
>>>>
>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>>>
>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>>>
>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>>>
>>>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>>>> problem so we can try to reproduce it on our end?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> -- Zelaine
>>>>>
>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I recently ran into an issue where I cannot query the correct data
>>>>> > from HBase with SQL through Drill. Can anybody help me?
>>>>> >
>>>>> > I tested with Drill 1.6.
>>>>> > My HBase schema:
>>>>> > rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
>>>>> > cf: v
>>>>> > qualifiers: v, e0, e1
>>>>> >
>>>>> > The wrong result only happens when I use a GROUP BY clause.
>>>>> >
>>>>> > This SQL will not return the correct result:
>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>>>>> > from hbase.browser_action2 a where a.row_key > '0' group by
>>>>> > a.`v`.`e0`;
>>>>> > Part of the explain output for this SQL is:
>>>>> >
>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`,
>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>>>>> > a.row_key > '0' group by a.`v`.`e0`;
>>>>> > +------+------+
>>>>> > | text | json |
>>>>> > +------+------+
>>>>> > | 00-00    Screen
>>>>> > 00-01      Project(k=[$0], p=[$1])
>>>>> > 00-02        UnionExchange
>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>> > 01-03              Project($f0=[$0], p=[$1])
>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>>>> > 02-01                  UnorderedMuxExchange
>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>>> > startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>>>>> >
>>>>> > The data returns very quickly; the result of this SQL is:
>>>>> > +------+--------+
>>>>> > |  k   |   p    |
>>>>> > +------+--------+
>>>>> > | pay  | 12180  |
>>>>> > +------+--------+
>>>>> >
>>>>> > But I have millions of rows in the table.
>>>>> >
>>>>> > I tried to change the physical plan: if I change the JSON explain's
>>>>> > "columns" : [ "`*`" ] to "columns" : [ "`v`.`e0`" ], it returns the
>>>>> > correct result.
>>>>> >
>>>>> > It seems the physical plan is not correct.
>>>>> > I also tried to debug the SQL parser to find out the reason, but it's
>>>>> > too complicated. Can anyone help me?
>>>>> >
>>>>> > This SQL has the same issue:
>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>>>> > BYTE_SUBSTR(a.row_key, 1, 9);
>>>>> > If I change the JSON explain's "columns" : [ "`*`" ] to "columns" :
>>>>> > [ "`row_key`" ], it returns the correct result.
>>>>> >
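The workaround described in this thread (hand-editing the JSON physical plan and resubmitting it) comes down to one change in the HBaseGroupScan node. The fragments below show only the columns attribute, using the column names from the plans above; the surrounding plan JSON is elided, and the exact attribute layout may differ between Drill versions:

{noformat}
"columns" : [ "`*`" ]
{noformat}

becomes

{noformat}
"columns" : [ "`v`.`e0`" ]
{noformat}

(or "`row_key`" for the BYTE_SUBSTR query). With an explicit projection the scan reads the named column directly instead of going through the star-column path, and both GROUP BY queries then return the expected counts, which matches the reporter's observation that the plan, not the data, was at fault.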