Khurram , I send the mail again, the last mail forget to cc to
user@drill.apache.org

The main process is the same, but my rowkey is more complicate,
Here is the detail I tested.
rowkey is like this : [salt 1byte string] + [day 8byte string] + [event] +
[uid long] + [ts long]
also I have other qualifiers, only qualifier v:v is integer, the others are
string.

example:
hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
ROW                                                          COLUMN+CELL



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e0,
timestamp=1461839343076, value=pay


 1$\xD2\x00



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e1,
timestamp=1461839343076, value=bijia


 1$\xD2\x00



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e2,
timestamp=1461839343076, value=browser


 1$\xD2\x00



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e3,
timestamp=1461839343076, value=*


 1$\xD2\x00



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e4,
timestamp=1461839343076, value=*


 1$\xD2\x00



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e5,
timestamp=1461839343076, value=*


 1$\xD2\x00



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:h,
timestamp=1459771200000, value=20


 1$\xD2\x00



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:m,
timestamp=1459771200000, value=0


 1$\xD2\x00



 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:v,
timestamp=1459771200000, value=\x00\x00\x00\x17


 1$\xD2\x00



1 row(s) in 0.0410 seconds


Here is the example I how the issue look like:

hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
STARTROW=> '0'}
........
 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\ column=v:e0,
timestamp=1463723029448, value=visit


 x01T\x00\x0A\xFA\x00



 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\ column=v:e0,
timestamp=1463723029217, value=visit


 x01T\x00\x0A\xFA\x00



 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\ column=v:e0,
timestamp=1463723029295, value=visit


 x01T\x00\x0A\xFA\x00



9994 row(s) in 123.8650 seconds

the drill result:
0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
 group by a.`v`.`e0`;
+--------+-------+
|   k    |   p   |
+--------+-------+
| visit  | 1216  |
+--------+-------+


I find out that if the row size larger than 10000  will have the issue. The
result is right if less that 1000 rows. But not always that way.
What I can make sure is if I updated the columns in the physical plan and
query by web UI , the result will be correct.


Thanks

2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

> Qiang, can you please take a look at DRILL-4686 and confirm if the data
> set used in my repro is the same as the one you have used. If the data set
> is different please let us know the type of data that you have used in your
> table.
>
> Aman - I will try to repro the problem on Drill 1.6.0 and share results.
>
> Thanks,
> Khurram
>
> On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org> wrote:
>
>> Khurram,  DRILL-4686 seems like a different issue...it is reporting an
>> error whereas the original problem from qiang was an incorrect result.
>> Can
>> you use the same version (1.6) that he was using.  Also, is the data set
>> similar ? If you are unable to repro the exact same issue,  perhaps qiang
>> should file a JIRA with a smaller repro if possible.
>>
>>
>>
>> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com>
>> wrote:
>>
>> > Hello Qiang,
>> >
>> > DRILL-4686 is reported to track this problem.
>> >
>> > Thanks,
>> > Khurram
>> >
>> > On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com> wrote:
>> >
>> >> Ok, Thanks very much.
>> >>
>> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>> >>
>> >>> Hello Qiang,
>> >>>
>> >>> Someone from our Drill team (in San Jose) will get back to you soon. I
>> >>> work from the India lab and I am in a different time zone as compared
>> to
>> >>> San Jose office, some one from MapR San Jose will get back to you as
>> soon
>> >>> as possible.
>> >>>
>> >>> Thanks,
>> >>> Khurram
>> >>>
>> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com>
>> wrote:
>> >>>
>> >>>> Hi Khurram, Thanks very much to reproduce it, so what's the
>> >>>> conclusion?
>> >>>>
>> >>>> Any idea how to sovle it?
>> >>>>
>> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>> >>>>
>> >>>>> So I tried to create the table using HBase API (with no data
>> inserted
>> >>>>> into table) and I got the query plan for drill 1.7.0
>> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>> >>>>>
>> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> >>>>> +--------------+------------+--------------+
>> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> >>>>> +--------------+------------+--------------+
>> >>>>> | row_key      | ANY        | NO           |
>> >>>>> | v            | MAP        | NO           |
>> >>>>> +--------------+------------+--------------+
>> >>>>> 2 rows selected (1.665 seconds)
>> >>>>>
>> >>>>> Table creation Java program
>> >>>>>
>> >>>>> {noformat}
>> >>>>> public class PutIntDataToHBase {
>> >>>>>     public static void main(String args[]) throws IOException {
>> >>>>>         Configuration conf = HBaseConfiguration.create();
>> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>> >>>>>         if (admin.tableExists("browser_action2")) {
>> >>>>>             admin.disableTable("browser_action2");
>> >>>>>             admin.deleteTable("browser_action2");
>> >>>>>         }
>> >>>>>
>> >>>>>         byte[][] SPLIT_KEYS =
>> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>> >>>>>         HTableDescriptor tableDesc = new
>> >>>>>             HTableDescriptor(TableName.valueOf("browser_action2"));
>> >>>>>
>> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>> >>>>>
>> >>>>>     }
>> >>>>> }
>> >>>>> {noformat}
>> >>>>>
>> >>>>> Query plan for the query that was reported as returning wrong
>> results.
>> >>>>>
>> >>>>> {noformat}
>> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>> >>>>> +------+------+
>> >>>>> | text | json |
>> >>>>> +------+------+
>> >>>>> | 00-00    Screen
>> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >>>>> 00-02        UnionExchange
>> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >>>>> 02-01                  UnorderedMuxExchange
>> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> startRow=0\x00,
>> >>>>> stopRow=, filter=null], columns=[`*`]]])
>> >>>>> {noformat}
>> >>>>>
>> >>>>> and the query plan for the other problem query mentioned in the
>> first
>> >>>>> email.
>> >>>>>
>> >>>>> {noformat}
>> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> >>>>> hbase.browser_action2 a group by
>> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >>>>> +------+------+
>> >>>>> | text | json |
>> >>>>> +------+------+
>> >>>>> | 00-00    Screen
>> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >>>>> 00-02        UnionExchange
>> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >>>>> 02-01                  UnorderedMuxExchange
>> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>> >>>>> row_key=[$0])
>> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> startRow=null,
>> >>>>> stopRow=null, filter=null], columns=[`*`]]])
>> >>>>> {noformat}
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Khurram
>> >>>>>
>> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Yes.
>> >>>>>> I use hbase API to create it.
>> >>>>>>
>> >>>>>> The main code is:
>> >>>>>>
>> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
>> {'6'}, {'7'},{'8'}, {'9'} };
>> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
>> >>>>>>
>> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> >>>>>> tableDesc.addFamily(columnDesc);
>> >>>>>>
>> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>> >>>>>>
>> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>> >>>>>>
>> >>>>>>> Can you provide the CREATE TABLE statement you used to reproduce
>> this
>> >>>>>>> problem so we can try to reproduce it on our end.
>> >>>>>>>
>> >>>>>>> Thanks.
>> >>>>>>>
>> >>>>>>> -- Zelaine
>> >>>>>>>
>> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>> > Hi ,
>> >>>>>>> >
>> >>>>>>> > I recently meet a issue that can not query the correct data from
>> >>>>>>> hbase with
>> >>>>>>> > sql by drill, can anybody help me.
>> >>>>>>> >
>> >>>>>>> > I test with the drill 1.6.
>> >>>>>>> > My hbase scheme:
>> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>> >>>>>>> > cf : v
>> >>>>>>> > qualifier: v, e0, e1
>> >>>>>>> >
>> >>>>>>> > The wrong result only happened when I use group by clause.
>> >>>>>>> >
>> >>>>>>> > This sql will not return correct result:
>> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`)
>> p
>> >>>>>>> from
>> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
>> a.`v`.`e0`;
>> >>>>>>> > Part of explain of this sql is:
>> >>>>>>> >
>> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
>> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a
>> >>>>>>> where
>> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>> >>>>>>> > +------+------+
>> >>>>>>> > | text | json |
>> >>>>>>> > +------+------+
>> >>>>>>> > | 00-00    Screen
>> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
>> >>>>>>> > 00-02        UnionExchange
>> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>> >>>>>>> > 02-01                  UnorderedMuxExchange
>> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >>>>>>> startRow=0\x00,
>> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
>> >>>>>>> >
>> >>>>>>> > The data return very quickly , the result of this sql is :
>> >>>>>>> > +------+--------+
>> >>>>>>> > |  k   |   p    |
>> >>>>>>> > +------+--------+
>> >>>>>>> > | pay  | 12180  |
>> >>>>>>> > +------+--------
>> >>>>>>> >
>> >>>>>>> > But I have millons of data in the table.
>> >>>>>>> >
>> >>>>>>> > I tried to change the physical plan.  if I change the json
>> explain
>> >>>>>>> > *"columns"
>> >>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will return
>> the
>> >>>>>>> correct
>> >>>>>>> > result.
>> >>>>>>> >
>> >>>>>>> > It seems the physical plan is not correct.
>> >>>>>>> > I also try to debug the sql parser to find out the reason, but
>> its
>> >>>>>>> too
>> >>>>>>> > complicate. Can anyone help me.
>> >>>>>>> >
>> >>>>>>> > Also this sql have the same issue.
>> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as
>> k,
>> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to
>> *"columns" :
>> >>>>>>> [
>> >>>>>>> > "`row_key`" ] *, it will return the correct result.
>> >>>>>>> >
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>

Reply via email to