Yes. I used the HBase API to create it. The main code is:
byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
                        {'5'}, {'6'}, {'7'}, {'8'}, {'9'} };
TableName tableName = TableName.valueOf("browser_action2");
HTableDescriptor tableDesc = new HTableDescriptor(tableName);
HColumnDescriptor columnDesc = new HColumnDescriptor("v");
// Configure the column family before adding it to the table descriptor.
columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
tableDesc.addFamily(columnDesc);
admin.createTable(tableDesc, SPLIT_KEYS);

2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:

> Can you provide the CREATE TABLE statement you used to reproduce this
> problem so we can try to reproduce it on our end.
>
> Thanks.
>
> -- Zelaine
>
> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote:
>
> > Hi,
> >
> > I recently hit an issue where I cannot query the correct data from
> > HBase via SQL with Drill. Can anybody help me?
> >
> > I am testing with Drill 1.6.
> > My HBase schema:
> >   rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
> >   cf: v
> >   qualifiers: v, e0, e1
> >
> > The wrong result only happens when I use a GROUP BY clause.
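[Editor's note: for context, the salted rowkey layout described above (salt + day + event + uid + ts) could be assembled roughly as follows. This is a minimal sketch only; the class and method names, the salt derivation from the uid hash, and the field formats are illustrative assumptions, not taken from the original code.]

```java
import java.nio.charset.StandardCharsets;

public class RowKeyBuilder {
    // Sketch: rowkey = salt + day + event + uid + ts,
    // e.g. '1' + "20160411" + "visit" + uid + ts.
    // The one-character salt in '0'..'9' matches the ten SPLIT_KEYS,
    // so writes spread evenly across the pre-split regions.
    static byte[] buildRowKey(String day, String event, String uid, long ts) {
        // Derive the salt from the uid hash (illustrative choice);
        // hashCode() % 10 is in -9..9, so abs() yields a digit 0..9.
        char salt = (char) ('0' + Math.abs(uid.hashCode() % 10));
        String key = salt + day + event + uid + ts;
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey("20160411", "visit", "uid", 1460390400L);
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```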
> >
> > This SQL does not return the correct result:
> >
> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> > hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
> >
> > Part of the explain plan for this SQL is:
> >
> > 0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`,
> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
> > a.row_key > '0' group by a.`v`.`e0`;
> > +------+------+
> > | text | json |
> > +------+------+
> > | 00-00    Screen
> > 00-01      Project(k=[$0], p=[$1])
> > 00-02        UnionExchange
> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> > 01-03              Project($f0=[$0], p=[$1])
> > 01-04                HashToRandomExchange(dist0=[[$0]])
> > 02-01                  UnorderedMuxExchange
> > 03-01                    Project($f0=[$0], p=[$1],
> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> > 03-03                        Project($f0=[ITEM($1, 'e0')])
> > 03-04                          Scan(groupscan=[HBaseGroupScan
> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
> > stopRow=, filter=null], columns=[`*`]]])
> >
> > The data returns very quickly; the result of this SQL is:
> >
> > +------+--------+
> > |  k   |   p    |
> > +------+--------+
> > | pay  | 12180  |
> > +------+--------+
> >
> > But I have millions of rows in the table.
> >
> > I tried changing the physical plan. If I change "columns" : [ "`*`" ]
> > to "columns" : [ "`v`.`e0`" ] in the JSON plan, it returns the correct
> > result.
> >
> > It seems the physical plan is not correct. I also tried to debug the
> > SQL parser to find out the reason, but it is too complicated. Can
> > anyone help me?
> >
> > This SQL has the same issue as well:
> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
> > count(a.row_key) p from hbase.browser_action2 a group by
> > BYTE_SUBSTR(a.row_key, 1, 9);
> >
> > If I change "columns" : [ "`*`" ] to "columns" : [ "`row_key`" ] in the
> > JSON plan, it returns the correct result.
> >
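[Editor's note: as an aside on that GROUP BY key: under the rowkey layout described earlier, BYTE_SUBSTR(row_key, 1, 9) should take the first 9 bytes of the rowkey, i.e. the one-character salt plus the 8-character day. A minimal plain-Java sketch of that extraction, assuming Drill's 1-based offset convention; the class and method names here are illustrative, not part of Drill.]

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyPrefix {
    // Mirrors BYTE_SUBSTR(row_key, offset, length) with a 1-based offset:
    // copies `length` bytes starting at byte position `offset`.
    static byte[] byteSubstr(byte[] key, int offset, int length) {
        return Arrays.copyOfRange(key, offset - 1, offset - 1 + length);
    }

    public static void main(String[] args) {
        // Example rowkey from the thread: salt '1' + day "20160411" + event + uid + ts.
        byte[] key = "120160411visituidts".getBytes(StandardCharsets.UTF_8);
        byte[] prefix = byteSubstr(key, 1, 9);
        // The salt + day prefix, which is what the query groups on.
        System.out.println(new String(prefix, StandardCharsets.UTF_8)); // 120160411
    }
}
```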