Hello Qiang,

Someone from our Drill team (in San Jose) will get back to you soon. I work
from the India lab and I am in a different time zone as compared to San
Jose office, some one from MapR San Jose will get back to you as soon as
possible.

Thanks,
Khurram

On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com> wrote:

> Hi Khurram, Thanks very much to reproduce it, so what's the conclusion?
>
> Any idea how to sovle it?
>
> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>
>> So I tried to create the table using HBase API (with no data inserted
>> into table) and I got the query plan for drill 1.7.0
>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>>
>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> +--------------+------------+--------------+
>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> +--------------+------------+--------------+
>> | row_key      | ANY        | NO           |
>> | v            | MAP        | NO           |
>> +--------------+------------+--------------+
>> 2 rows selected (1.665 seconds)
>>
>> Table creation Java program
>>
>> {noformat}
>> public class PutIntDataToHBase {
>>     public static void main(String args[]) throws IOException {
>>         Configuration conf = HBaseConfiguration.create();
>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>         if (admin.tableExists("browser_action2")) {
>>             admin.disableTable("browser_action2");
>>             admin.deleteTable("browser_action2");
>>         }
>>
>>         byte[][] SPLIT_KEYS =
>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>         HTableDescriptor tableDesc = new
>>             HTableDescriptor(TableName.valueOf("browser_action2"));
>>
>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>>         admin.createTable(tableDesc,SPLIT_KEYS);
>>
>>     }
>> }
>> {noformat}
>>
>> Query plan for the query that was reported as returning wrong results.
>>
>> {noformat}
>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>> +------+------+
>> | text | json |
>> +------+------+
>> | 00-00    Screen
>> 00-01      Project(k=[$0], p=[$1])
>> 00-02        UnionExchange
>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> 01-03              Project($f0=[$0], p=[$1])
>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> 02-01                  UnorderedMuxExchange
>> 03-01                    Project($f0=[$0], p=[$1],
>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
>> stopRow=, filter=null], columns=[`*`]]])
>> {noformat}
>>
>> and the query plan for the other problem query mentioned in the first
>> email.
>>
>> {noformat}
>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> hbase.browser_action2 a group by
>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>> +------+------+
>> | text | json |
>> +------+------+
>> | 00-00    Screen
>> 00-01      Project(k=[$0], p=[$1])
>> 00-02        UnionExchange
>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> 01-03              Project($f0=[$0], p=[$1])
>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> 02-01                  UnorderedMuxExchange
>> 03-01                    Project($f0=[$0], p=[$1],
>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>> row_key=[$0])
>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null,
>> stopRow=null, filter=null], columns=[`*`]]])
>> {noformat}
>>
>> Thanks,
>> Khurram
>>
>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote:
>>
>>> Yes.
>>> I use hbase API to create it.
>>>
>>> The main code is:
>>>
>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, 
>>> {'7'},{'8'}, {'9'} };
>>> TableName tableName = TableName.valueOf("browser_action2");
>>>
>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>> tableDesc.addFamily(columnDesc);
>>>
>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>>
>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>>
>>>
>>>
>>>
>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>>
>>>> Can you provide the CREATE TABLE statement you used to reproduce this
>>>> problem so we can try to reproduce it on our end.
>>>>
>>>> Thanks.
>>>>
>>>> -- Zelaine
>>>>
>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote:
>>>>
>>>> > Hi ,
>>>> >
>>>> > I recently meet a issue that can not query the correct data from
>>>> hbase with
>>>> > sql by drill, can anybody help me.
>>>> >
>>>> > I test with the drill 1.6.
>>>> > My hbase scheme:
>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>>>> > cf : v
>>>> > qualifier: v, e0, e1
>>>> >
>>>> > The wrong result only happened when I use group by clause.
>>>> >
>>>> > This sql will not return correct result:
>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>>> > hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>>>> > Part of explain of this sql is:
>>>> >
>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>>>> CONVERT_FROM(a.`v`.`e0`,
>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>>>> > +------+------+
>>>> > | text | json |
>>>> > +------+------+
>>>> > | 00-00    Screen
>>>> > 00-01      Project(k=[$0], p=[$1])
>>>> > 00-02        UnionExchange
>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>>> > 01-03              Project($f0=[$0], p=[$1])
>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>>> > 02-01                  UnorderedMuxExchange
>>>> > 03-01                    Project($f0=[$0], p=[$1],
>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>> startRow=0\x00,
>>>> > stopRow=, filter=null], columns=[`*`]]])
>>>> >
>>>> > The data return very quickly , the result of this sql is :
>>>> > +------+--------+
>>>> > |  k   |   p    |
>>>> > +------+--------+
>>>> > | pay  | 12180  |
>>>> > +------+--------
>>>> >
>>>> > But I have millons of data in the table.
>>>> >
>>>> > I tried to change the physical plan.  if I change the json explain
>>>> > *"columns"
>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return the
>>>> correct
>>>> > result.
>>>> >
>>>> > It seems the physical plan is not correct.
>>>> > I also try to debug the sql parser to find out the reason, but its too
>>>> > complicate. Can anyone help me.
>>>> >
>>>> > Also this sql have the same issue.
>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>>>> > I change the json explain *"columns" : [ "`*`" ]*  to *"columns" : [
>>>> > "`row_key`" ] *, it will return the correct result.
>>>> >
>>>>
>>>
>>>
>>
>

Reply via email to