Ok, thanks very much.

2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
> Hello Qiang,
>
> Someone from our Drill team (in San Jose) will get back to you soon. I
> work from the India lab and I am in a different time zone compared to the
> San Jose office; someone from MapR San Jose will get back to you as soon
> as possible.
>
> Thanks,
> Khurram
>
> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com> wrote:
>
>> Hi Khurram,
>>
>> Thanks very much for reproducing it. So what's the conclusion?
>>
>> Any idea how to solve it?
>>
>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>>
>>> So I tried to create the table using the HBase API (with no data
>>> inserted into the table) and I got the query plan on Drill 1.7.0.
>>> Drill 1.7.0-SNAPSHOT commit ID: 09b26277
>>>
>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>>> +--------------+------------+--------------+
>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>>> +--------------+------------+--------------+
>>> | row_key      | ANY        | NO           |
>>> | v            | MAP        | NO           |
>>> +--------------+------------+--------------+
>>> 2 rows selected (1.665 seconds)
>>>
>>> Table creation Java program:
>>>
>>> {noformat}
>>> import java.io.IOException;
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> import org.apache.hadoop.hbase.HColumnDescriptor;
>>> import org.apache.hadoop.hbase.HTableDescriptor;
>>> import org.apache.hadoop.hbase.TableName;
>>> import org.apache.hadoop.hbase.client.HBaseAdmin;
>>>
>>> public class PutIntDataToHBase {
>>>   public static void main(String args[]) throws IOException {
>>>     Configuration conf = HBaseConfiguration.create();
>>>     conf.set("hbase.zookeeper.property.clientPort", "5181");
>>>     HBaseAdmin admin = new HBaseAdmin(conf);
>>>     if (admin.tableExists("browser_action2")) {
>>>       admin.disableTable("browser_action2");
>>>       admin.deleteTable("browser_action2");
>>>     }
>>>
>>>     byte[][] SPLIT_KEYS =
>>>         {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>>     HTableDescriptor tableDesc =
>>>         new HTableDescriptor(TableName.valueOf("browser_action2"));
>>>
>>>     tableDesc.addFamily(new HColumnDescriptor("v"));
>>>     admin.createTable(tableDesc, SPLIT_KEYS);
>>>   }
>>> }
>>> {noformat}
>>>
>>> Query plan for the query that was reported as returning wrong results:
>>>
>>> {noformat}
>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>>> +------+------+
>>> | text | json |
>>> +------+------+
>>> | 00-00    Screen
>>> 00-01      Project(k=[$0], p=[$1])
>>> 00-02        UnionExchange
>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> 01-03              Project($f0=[$0], p=[$1])
>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> 02-01                  UnorderedMuxExchange
>>> 03-01                    Project($f0=[$0], p=[$1],
>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
>>> stopRow=, filter=null], columns=[`*`]]])
>>> {noformat}
>>>
>>> and the query plan for the other problem query mentioned in the first
>>> email:
>>>
>>> {noformat}
>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
>>> . . . . . . . . . . . . . . > count(a.row_key) p from hbase.browser_action2 a group by
>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1, 9);
>>> +------+------+
>>> | text | json |
>>> +------+------+
>>> | 00-00    Screen
>>> 00-01      Project(k=[$0], p=[$1])
>>> 00-02        UnionExchange
>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> 01-03              Project($f0=[$0], p=[$1])
>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> 02-01                  UnorderedMuxExchange
>>> 03-01                    Project($f0=[$0], p=[$1],
>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>>> row_key=[$0])
>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
>>> stopRow=null, filter=null], columns=[`*`]]])
>>> {noformat}
>>>
>>> Thanks,
>>> Khurram
>>>
>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote:
>>>
>>>> Yes, I used the HBase API to create it.
>>>>
>>>> The main code is:
>>>>
>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'},
>>>> {'7'}, {'8'}, {'9'} };
>>>> TableName tableName = TableName.valueOf("browser_action2");
>>>>
>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>>> tableDesc.addFamily(columnDesc);
>>>>
>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>>>
>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>>>
>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>>>
>>>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>>>> problem so we can try to reproduce it on our end?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> -- Zelaine
>>>>>
>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I recently ran into an issue where I cannot query the correct data
>>>>> > from HBase with SQL through Drill. Can anybody help me?
>>>>> >
>>>>> > I tested with Drill 1.6.
>>>>> > My HBase schema:
>>>>> > rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
>>>>> > cf: v
>>>>> > qualifiers: v, e0, e1
>>>>> >
>>>>> > The wrong result only happens when I use a GROUP BY clause.
>>>>> >
>>>>> > This SQL will not return the correct result:
>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>>>>> > from hbase.browser_action2 a where a.row_key > '0' group by
>>>>> > a.`v`.`e0`;
>>>>> > Part of the explain output for this SQL is:
>>>>> >
>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`,
>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>>>>> > a.row_key > '0' group by a.`v`.`e0`;
>>>>> > +------+------+
>>>>> > | text | json |
>>>>> > +------+------+
>>>>> > | 00-00    Screen
>>>>> > 00-01      Project(k=[$0], p=[$1])
>>>>> > 00-02        UnionExchange
>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>> > 01-03              Project($f0=[$0], p=[$1])
>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>>>> > 02-01                  UnorderedMuxExchange
>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>>> > startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>>>>> >
>>>>> > The data returns very quickly; the result of this SQL is:
>>>>> > +------+--------+
>>>>> > |  k   |   p    |
>>>>> > +------+--------+
>>>>> > | pay  | 12180  |
>>>>> > +------+--------+
>>>>> >
>>>>> > But I have millions of rows in the table.
>>>>> >
>>>>> > I tried to change the physical plan: if I change the JSON explain's
>>>>> > "columns" : [ "`*`" ] to "columns" : [ "`v`.`e0`" ], it returns the
>>>>> > correct result.
>>>>> >
>>>>> > It seems the physical plan is not correct.
>>>>> > I also tried to debug the SQL parser to find out the reason, but it's
>>>>> > too complicated. Can anyone help me?
>>>>> >
>>>>> > This SQL has the same issue:
>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>>>> > BYTE_SUBSTR(a.row_key, 1, 9);
>>>>> > If I change the JSON explain's "columns" : [ "`*`" ] to "columns" :
>>>>> > [ "`row_key`" ], it returns the correct result.
>>>>> >
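The workaround described in this thread (hand-editing the JSON physical plan and resubmitting it) comes down to one change in the HBaseGroupScan node. The fragments below show only the columns attribute, using the column names from the plans above; the surrounding plan JSON is elided, and the exact attribute layout may differ between Drill versions:

{noformat}
"columns" : [ "`*`" ]
{noformat}

becomes

{noformat}
"columns" : [ "`v`.`e0`" ]
{noformat}

(or "`row_key`" for the BYTE_SUBSTR query). With an explicit projection the scan reads the named column directly instead of going through the star-column path, and both GROUP BY queries then return the expected counts, which matches the reporter's observation that the plan, not the data, was at fault.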