Hi Khurram, Thanks very much to reproduce it, so what's the conclusion?

Any idea how to sovle it?

2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

> So I tried to create the table using HBase API (with no data inserted into
> table) and I got the query plan for drill 1.7.0
> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>
> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | row_key      | ANY        | NO           |
> | v            | MAP        | NO           |
> +--------------+------------+--------------+
> 2 rows selected (1.665 seconds)
>
> Table creation Java program
>
> {noformat}
> public class PutIntDataToHBase {
>     public static void main(String args[]) throws IOException {
>         Configuration conf = HBaseConfiguration.create();
>         conf.set("hbase.zookeeper.property.clientPort","5181");
>         HBaseAdmin admin = new HBaseAdmin(conf);
>         if (admin.tableExists("browser_action2")) {
>             admin.disableTable("browser_action2");
>             admin.deleteTable("browser_action2");
>         }
>
>         byte[][] SPLIT_KEYS =
> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>         HTableDescriptor tableDesc = new
>             HTableDescriptor(TableName.valueOf("browser_action2"));
>
>         tableDesc.addFamily(new HColumnDescriptor("v"));
>         admin.createTable(tableDesc,SPLIT_KEYS);
>
>     }
> }
> {noformat}
>
> Query plan for the query that was reported as returning wrong results.
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> 03-03                        Project($f0=[ITEM($1, 'e0')])
> 03-04                          Scan(groupscan=[HBaseGroupScan
> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
> stopRow=, filter=null], columns=[`*`]]])
> {noformat}
>
> and the query plan for the other problem query mentioned in the first
> email.
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
> . . . . . . . . . . . . . . > count(a.row_key) p from
> hbase.browser_action2 a group by
> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
> row_key=[$0])
> 03-04                          Scan(groupscan=[HBaseGroupScan
> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
> stopRow=null, filter=null], columns=[`*`]]])
> {noformat}
>
> Thanks,
> Khurram
>
> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote:
>
>> Yes.
>> I use hbase API to create it.
>>
>> The main code is:
>>
>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, 
>> {'7'},{'8'}, {'9'} };
>> TableName tableName = TableName.valueOf("browser_action2");
>>
>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> tableDesc.addFamily(columnDesc);
>>
>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>
>> admin.createTable(tableDesc, SPLIT_KEYS);
>>
>>
>>
>>
>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>
>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>> problem so we can try to reproduce it on our end.
>>>
>>> Thanks.
>>>
>>> -- Zelaine
>>>
>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote:
>>>
>>> > Hi ,
>>> >
>>> > I recently meet a issue that can not query the correct data from hbase
>>> with
>>> > sql by drill, can anybody help me.
>>> >
>>> > I test with the drill 1.6.
>>> > My hbase scheme:
>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>>> > cf : v
>>> > qualifier: v, e0, e1
>>> >
>>> > The wrong result only happened when I use group by clause.
>>> >
>>> > This sql will not return correct result:
>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>> > hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>> > Part of explain of this sql is:
>>> >
>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>>> CONVERT_FROM(a.`v`.`e0`,
>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>>> > a.row_key > '0'  group by a.`v`.`e0`;
>>> > +------+------+
>>> > | text | json |
>>> > +------+------+
>>> > | 00-00    Screen
>>> > 00-01      Project(k=[$0], p=[$1])
>>> > 00-02        UnionExchange
>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> > 01-03              Project($f0=[$0], p=[$1])
>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > 02-01                  UnorderedMuxExchange
>>> > 03-01                    Project($f0=[$0], p=[$1],
>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>> startRow=0\x00,
>>> > stopRow=, filter=null], columns=[`*`]]])
>>> >
>>> > The data return very quickly , the result of this sql is :
>>> > +------+--------+
>>> > |  k   |   p    |
>>> > +------+--------+
>>> > | pay  | 12180  |
>>> > +------+--------
>>> >
>>> > But I have millons of data in the table.
>>> >
>>> > I tried to change the physical plan.  if I change the json explain
>>> > *"columns"
>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return the
>>> correct
>>> > result.
>>> >
>>> > It seems the physical plan is not correct.
>>> > I also try to debug the sql parser to find out the reason, but its too
>>> > complicate. Can anyone help me.
>>> >
>>> > Also this sql have the same issue.
>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns" : [
>>> > "`row_key`" ] *, it will return the correct result.
>>> >
>>>
>>
>>
>

Reply via email to