Hi Khurram, Thanks very much to reproduce it, so what's the conclusion? Any idea how to sovle it?
2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>: > So I tried to create the table using HBase API (with no data inserted into > table) and I got the query plan for drill 1.7.0 > Drill 1.7.0-SNAPSHOT commit ID : 09b26277 > > 0: jdbc:drill:schema=dfs.tmp> describe browser_action2; > +--------------+------------+--------------+ > | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | > +--------------+------------+--------------+ > | row_key | ANY | NO | > | v | MAP | NO | > +--------------+------------+--------------+ > 2 rows selected (1.665 seconds) > > Table creation Java program > > {noformat} > public class PutIntDataToHBase { > public static void main(String args[]) throws IOException { > Configuration conf = HBaseConfiguration.create(); > conf.set("hbase.zookeeper.property.clientPort","5181"); > HBaseAdmin admin = new HBaseAdmin(conf); > if (admin.tableExists("browser_action2")) { > admin.disableTable("browser_action2"); > admin.deleteTable("browser_action2"); > } > > byte[][] SPLIT_KEYS = > {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}}; > HTableDescriptor tableDesc = new > HTableDescriptor(TableName.valueOf("browser_action2")); > > tableDesc.addFamily(new HColumnDescriptor("v")); > admin.createTable(tableDesc,SPLIT_KEYS); > > } > } > {noformat} > > Query plan for the query that was reported as returning wrong results. > > {noformat} > 0: jdbc:drill:schema=dfs.tmp> explain plan for select > CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from > hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`; > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(k=[$0], p=[$1]) > 00-02 UnionExchange > 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1]) > 01-02 HashAgg(group=[{0}], p=[$SUM0($1)]) > 01-03 Project($f0=[$0], p=[$1]) > 01-04 HashToRandomExchange(dist0=[[$0]]) > 02-01 UnorderedMuxExchange > 03-01 Project($f0=[$0], p=[$1], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) > 03-02 HashAgg(group=[{0}], p=[COUNT($0)]) > 03-03 Project($f0=[ITEM($1, 'e0')]) > 03-04 Scan(groupscan=[HBaseGroupScan > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, > stopRow=, filter=null], columns=[`*`]]]) > {noformat} > > and the query plan for the other problem query mentioned in the first > email. > > {noformat} > 0: jdbc:drill:schema=dfs.tmp> explain plan for select > CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k, > . . . . . . . . . . . . . . > count(a.row_key) p from > hbase.browser_action2 a group by > . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9); > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(k=[$0], p=[$1]) > 00-02 UnionExchange > 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1]) > 01-02 HashAgg(group=[{0}], p=[$SUM0($1)]) > 01-03 Project($f0=[$0], p=[$1]) > 01-04 HashToRandomExchange(dist0=[[$0]]) > 02-01 UnorderedMuxExchange > 03-01 Project($f0=[$0], p=[$1], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) > 03-02 HashAgg(group=[{0}], p=[COUNT($1)]) > 03-03 Project($f0=[BYTE_SUBSTR($0, 1, 9)], > row_key=[$0]) > 03-04 Scan(groupscan=[HBaseGroupScan > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null, > stopRow=null, filter=null], columns=[`*`]]]) > {noformat} > > Thanks, > Khurram > > On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote: > >> Yes. >> I use hbase API to create it. >> >> The main code is: >> >> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, >> {'7'},{'8'}, {'9'} }; >> TableName tableName = TableName.valueOf("browser_action2"); >> >> HTableDescriptor tableDesc = new HTableDescriptor(tableName); >> HColumnDescriptor columnDesc = new HColumnDescriptor("v"); >> tableDesc.addFamily(columnDesc); >> >> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY); >> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF); >> >> admin.createTable(tableDesc, SPLIT_KEYS); >> >> >> >> >> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>: >> >>> Can you provide the CREATE TABLE statement you used to reproduce this >>> problem so we can try to reproduce it on our end. >>> >>> Thanks. >>> >>> -- Zelaine >>> >>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote: >>> >>> > Hi , >>> > >>> > I recently meet a issue that can not query the correct data from hbase >>> with >>> > sql by drill, can anybody help me. >>> > >>> > I test with the drill 1.6. >>> > My hbase scheme: >>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts >>> > cf : v >>> > qualifier: v, e0, e1 >>> > >>> > The wrong result only happened when I use group by clause. >>> > >>> > This sql will not return correct result: >>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from >>> > hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`; >>> > Part of explain of this sql is: >>> > >>> > 0: jdbc:drill:zk=rfdc5> explain plan for select >>> CONVERT_FROM(a.`v`.`e0`, >>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where >>> > a.row_key > '0' group by a.`v`.`e0`; >>> > +------+------+ >>> > | text | json | >>> > +------+------+ >>> > | 00-00 Screen >>> > 00-01 Project(k=[$0], p=[$1]) >>> > 00-02 UnionExchange >>> > 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1]) >>> > 01-02 HashAgg(group=[{0}], p=[$SUM0($1)]) >>> > 01-03 Project($f0=[$0], p=[$1]) >>> > 01-04 HashToRandomExchange(dist0=[[$0]]) >>> > 02-01 UnorderedMuxExchange >>> > 03-01 Project($f0=[$0], p=[$1], >>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) >>> > 03-02 HashAgg(group=[{0}], p=[COUNT($0)]) >>> > 03-03 Project($f0=[ITEM($1, 'e0')]) >>> > 03-04 Scan(groupscan=[HBaseGroupScan >>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, >>> startRow=0\x00, >>> > stopRow=, filter=null], columns=[`*`]]]) >>> > >>> > The data return very quickly , the result of this sql is : >>> > +------+--------+ >>> > | k | p | >>> > +------+--------+ >>> > | pay | 12180 | >>> > +------+-------- >>> > >>> > But I have millons of data in the table. >>> > >>> > I tried to change the physical plan. if I change the json explain >>> > *"columns" >>> > : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ] *, it will return the >>> correct >>> > result. >>> > >>> > It seems the physical plan is not correct. >>> > I also try to debug the sql parser to find out the reason, but its too >>> > complicate. Can anyone help me. >>> > >>> > Also this sql have the same issue. >>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k, >>> > count(a.row_key) p from hbase.browser_action2 a group by >>> > BYTE_SUBSTR(a.row_key, 1 , 9); >>> > I change the json explain *"columns" : [ "`*`" ]* to *"columns" : [ >>> > "`row_key`" ] *, it will return the correct result. >>> > >>> >> >> >