Qiang, Can you please let us know the hbase version and hadoop distribution
version that you are using.

On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <kngu...@maprtech.com>
wrote:

> Can you please let us know the hbase version and hadoop distribution
> version that you are using.
>
>
> On Fri, May 20, 2016 at 1:35 AM, qiang li <tiredqi...@gmail.com> wrote:
>
>> Khurram , I send the mail again, the last mail forget to cc to
>> user@drill.apache.org
>>
>> The main process is the same, but my rowkey is more complicate,
>> Here is the detail I tested.
>> rowkey is like this : [salt 1byte string] + [day 8byte string] + [event] +
>> [uid long] + [ts long]
>> also I have other qualifiers, only qualifier v:v is integer, the others
>> are
>> string.
>>
>> example:
>> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
>> ROW                                                          COLUMN+CELL
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e0,
>> timestamp=1461839343076, value=pay
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e1,
>> timestamp=1461839343076, value=bijia
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e2,
>> timestamp=1461839343076, value=browser
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e3,
>> timestamp=1461839343076, value=*
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e4,
>> timestamp=1461839343076, value=*
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e5,
>> timestamp=1461839343076, value=*
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:h,
>> timestamp=1459771200000, value=20
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:m,
>> timestamp=1459771200000, value=0
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:v,
>> timestamp=1459771200000, value=\x00\x00\x00\x17
>>
>>
>>  1$\xD2\x00
>>
>>
>>
>> 1 row(s) in 0.0410 seconds
>>
>>
>> Here is the example I how the issue look like:
>>
>> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
>> STARTROW=> '0'}
>> ........
>>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\ column=v:e0,
>> timestamp=1463723029448, value=visit
>>
>>
>>  x01T\x00\x0A\xFA\x00
>>
>>
>>
>>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\ column=v:e0,
>> timestamp=1463723029217, value=visit
>>
>>
>>  x01T\x00\x0A\xFA\x00
>>
>>
>>
>>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\ column=v:e0,
>> timestamp=1463723029295, value=visit
>>
>>
>>  x01T\x00\x0A\xFA\x00
>>
>>
>>
>> 9994 row(s) in 123.8650 seconds
>>
>> the drill result:
>> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>>  group by a.`v`.`e0`;
>> +--------+-------+
>> |   k    |   p   |
>> +--------+-------+
>> | visit  | 1216  |
>> +--------+-------+
>>
>>
>> I find out that if the row size larger than 10000  will have the issue.
>> The
>> result is right if less that 1000 rows. But not always that way.
>> What I can make sure is if I updated the columns in the physical plan and
>> query by web UI , the result will be correct.
>>
>>
>> Thanks
>>
>> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>>
>> > Qiang, can you please take a look at DRILL-4686 and confirm if the data
>> > set used in my repro is the same as the one you have used. If the data
>> set
>> > is different please let us know the type of data that you have used in
>> your
>> > table.
>> >
>> > Aman - I will try to repro the problem on Drill 1.6.0 and share results.
>> >
>> > Thanks,
>> > Khurram
>> >
>> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org>
>> wrote:
>> >
>> >> Khurram,  DRILL-4686 seems like a different issue...it is reporting an
>> >> error whereas the original problem from qiang was an incorrect result.
>> >> Can
>> >> you use the same version (1.6) that he was using.  Also, is the data
>> set
>> >> similar ? If you are unable to repro the exact same issue,  perhaps
>> qiang
>> >> should file a JIRA with a smaller repro if possible.
>> >>
>> >>
>> >>
>> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com>
>> >> wrote:
>> >>
>> >> > Hello Qiang,
>> >> >
>> >> > DRILL-4686 is reported to track this problem.
>> >> >
>> >> > Thanks,
>> >> > Khurram
>> >> >
>> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com>
>> wrote:
>> >> >
>> >> >> Ok, Thanks very much.
>> >> >>
>> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>> >> >>
>> >> >>> Hello Qiang,
>> >> >>>
>> >> >>> Someone from our Drill team (in San Jose) will get back to you
>> soon. I
>> >> >>> work from the India lab and I am in a different time zone as
>> compared
>> >> to
>> >> >>> San Jose office, some one from MapR San Jose will get back to you
>> as
>> >> soon
>> >> >>> as possible.
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Khurram
>> >> >>>
>> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com>
>> >> wrote:
>> >> >>>
>> >> >>>> Hi Khurram, Thanks very much to reproduce it, so what's the
>> >> >>>> conclusion?
>> >> >>>>
>> >> >>>> Any idea how to sovle it?
>> >> >>>>
>> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>> >> >>>>
>> >> >>>>> So I tried to create the table using HBase API (with no data
>> >> inserted
>> >> >>>>> into table) and I got the query plan for drill 1.7.0
>> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID :  09b26277
>> >> >>>>>
>> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> >> >>>>> +--------------+------------+--------------+
>> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> >> >>>>> +--------------+------------+--------------+
>> >> >>>>> | row_key      | ANY        | NO           |
>> >> >>>>> | v            | MAP        | NO           |
>> >> >>>>> +--------------+------------+--------------+
>> >> >>>>> 2 rows selected (1.665 seconds)
>> >> >>>>>
>> >> >>>>> Table creation Java program
>> >> >>>>>
>> >> >>>>> {noformat}
>> >> >>>>> public class PutIntDataToHBase {
>> >> >>>>>     public static void main(String args[]) throws IOException {
>> >> >>>>>         Configuration conf = HBaseConfiguration.create();
>> >> >>>>>         conf.set("hbase.zookeeper.property.clientPort","5181");
>> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>> >> >>>>>         if (admin.tableExists("browser_action2")) {
>> >> >>>>>             admin.disableTable("browser_action2");
>> >> >>>>>             admin.deleteTable("browser_action2");
>> >> >>>>>         }
>> >> >>>>>
>> >> >>>>>         byte[][] SPLIT_KEYS =
>> >> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>> >> >>>>>         HTableDescriptor tableDesc = new
>> >> >>>>>
>>  HTableDescriptor(TableName.valueOf("browser_action2"));
>> >> >>>>>
>> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>> >> >>>>>
>> >> >>>>>     }
>> >> >>>>> }
>> >> >>>>> {noformat}
>> >> >>>>>
>> >> >>>>> Query plan for the query that was reported as returning wrong
>> >> results.
>> >> >>>>>
>> >> >>>>> {noformat}
>> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by
>> a.`v`.`e0`;
>> >> >>>>> +------+------+
>> >> >>>>> | text | json |
>> >> >>>>> +------+------+
>> >> >>>>> | 00-00    Screen
>> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >> >>>>> 00-02        UnionExchange
>> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >>>>> 02-01                  UnorderedMuxExchange
>> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> startRow=0\x00,
>> >> >>>>> stopRow=, filter=null], columns=[`*`]]])
>> >> >>>>> {noformat}
>> >> >>>>>
>> >> >>>>> and the query plan for the other problem query mentioned in the
>> >> first
>> >> >>>>> email.
>> >> >>>>>
>> >> >>>>> {noformat}
>> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> >> >>>>> hbase.browser_action2 a group by
>> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >> >>>>> +------+------+
>> >> >>>>> | text | json |
>> >> >>>>> +------+------+
>> >> >>>>> | 00-00    Screen
>> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> >> >>>>> 00-02        UnionExchange
>> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >>>>> 02-01                  UnorderedMuxExchange
>> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>> >> >>>>> row_key=[$0])
>> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> startRow=null,
>> >> >>>>> stopRow=null, filter=null], columns=[`*`]]])
>> >> >>>>> {noformat}
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Khurram
>> >> >>>>>
>> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com>
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>>> Yes.
>> >> >>>>>> I use hbase API to create it.
>> >> >>>>>>
>> >> >>>>>> The main code is:
>> >> >>>>>>
>> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
>> {'5'},
>> >> {'6'}, {'7'},{'8'}, {'9'} };
>> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
>> >> >>>>>>
>> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> >> >>>>>> tableDesc.addFamily(columnDesc);
>> >> >>>>>>
>> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>> >> >>>>>>
>> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>> >> >>>>>>
>> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
>> reproduce
>> >> this
>> >> >>>>>>> problem so we can try to reproduce it on our end.
>> >> >>>>>>>
>> >> >>>>>>> Thanks.
>> >> >>>>>>>
>> >> >>>>>>> -- Zelaine
>> >> >>>>>>>
>> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
>> tiredqi...@gmail.com>
>> >> >>>>>>> wrote:
>> >> >>>>>>>
>> >> >>>>>>> > Hi ,
>> >> >>>>>>> >
>> >> >>>>>>> > I recently meet a issue that can not query the correct data
>> from
>> >> >>>>>>> hbase with
>> >> >>>>>>> > sql by drill, can anybody help me.
>> >> >>>>>>> >
>> >> >>>>>>> > I test with the drill 1.6.
>> >> >>>>>>> > My hbase scheme:
>> >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts
>> >> >>>>>>> > cf : v
>> >> >>>>>>> > qualifier: v, e0, e1
>> >> >>>>>>> >
>> >> >>>>>>> > The wrong result only happened when I use group by clause.
>> >> >>>>>>> >
>> >> >>>>>>> > This sql will not return correct result:
>> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> count(a.`v`.`e0`)
>> >> p
>> >> >>>>>>> from
>> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0'  group by
>> >> a.`v`.`e0`;
>> >> >>>>>>> > Part of explain of this sql is:
>> >> >>>>>>> >
>> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
>> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2
>> a
>> >> >>>>>>> where
>> >> >>>>>>> > a.row_key > '0'  group by a.`v`.`e0`;
>> >> >>>>>>> > +------+------+
>> >> >>>>>>> > | text | json |
>> >> >>>>>>> > +------+------+
>> >> >>>>>>> > | 00-00    Screen
>> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
>> >> >>>>>>> > 00-02        UnionExchange
>> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>> >> >>>>>>> > 02-01                  UnorderedMuxExchange
>> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> >> >>>>>>> > 03-02                      HashAgg(group=[{0}],
>> p=[COUNT($0)])
>> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>> >> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> >> >>>>>>> startRow=0\x00,
>> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]])
>> >> >>>>>>> >
>> >> >>>>>>> > The data return very quickly , the result of this sql is :
>> >> >>>>>>> > +------+--------+
>> >> >>>>>>> > |  k   |   p    |
>> >> >>>>>>> > +------+--------+
>> >> >>>>>>> > | pay  | 12180  |
>> >> >>>>>>> > +------+--------
>> >> >>>>>>> >
>> >> >>>>>>> > But I have millons of data in the table.
>> >> >>>>>>> >
>> >> >>>>>>> > I tried to change the physical plan.  if I change the json
>> >> explain
>> >> >>>>>>> > *"columns"
>> >> >>>>>>> > : [ "`*`" ]*  to *"columns" : [ "`v`.`e0`" ] *, it will
>> return
>> >> the
>> >> >>>>>>> correct
>> >> >>>>>>> > result.
>> >> >>>>>>> >
>> >> >>>>>>> > It seems the physical plan is not correct.
>> >> >>>>>>> > I also try to debug the sql parser to find out the reason,
>> but
>> >> its
>> >> >>>>>>> too
>> >> >>>>>>> > complicate. Can anyone help me.
>> >> >>>>>>> >
>> >> >>>>>>> > Also this sql have the same issue.
>> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')
>> as
>> >> k,
>> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>> >> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]*  to
>> >> *"columns" :
>> >> >>>>>>> [
>> >> >>>>>>> > "`row_key`" ] *, it will return the correct result.
>> >> >>>>>>> >
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >
>> >
>>
>
>

Reply via email to