I upgraded to the latest CDH version, which is HBase 1.2.0-cdh5.7.0, and tested it
again. The results are correct now. Thanks for the help.

Even so, I think the query plan can still be optimized.

Here is what I think can be improved:
a. specify only the columns that are needed
b. remove the RowFilter when the scan's startRow and stopRow already cover it.
    For example: select row_key from hbase.browser_action a where
a.row_key > '0' and a.row_key < '1'. I think SQL like this would run
faster without the RowFilter.
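Suggestion (b) rests on the fact that HBase already returns only rows inside the scan's [startRow, stopRow) bounds, ordered by unsigned byte-wise comparison, so a RowFilter re-checking the same bounds is pure overhead. A minimal sketch of that ordering in plain Java (no HBase dependency; the class name and sample keys are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RangeScanSketch {
    // Unsigned lexicographic comparison: the same ordering HBase uses
    // for row keys, so any row a scan bounded by start/stop returns
    // already satisfies row_key > start and row_key < stop.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(key, start) > 0
            && Arrays.compareUnsigned(key, stop) < 0;
    }

    public static void main(String[] args) {
        byte[] start = "0".getBytes(StandardCharsets.UTF_8);
        byte[] stop  = "1".getBytes(StandardCharsets.UTF_8);

        // A key in salt bucket '0' is inside ('0', '1'); bucket '1' is not.
        byte[] inside  = "020160404pay".getBytes(StandardCharsets.UTF_8);
        byte[] outside = "120160404pay".getBytes(StandardCharsets.UTF_8);

        System.out.println(inRange(inside, start, stop));   // true
        System.out.println(inRange(outside, start, stop));  // false
    }
}
```

So a scan with startRow = '0' (exclusive, e.g. via a '0\x00' start key) and stopRow = '1' returns exactly the rows the filter would have passed, without per-row filter evaluation.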



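Suggestion (a) corresponds to the manual workaround discussed further down the thread: editing the columns attribute of the HBase scan in the plan JSON before resubmitting it via the web UI. A rough sketch of the relevant fragment (attribute names and layout vary across Drill versions, so treat this as illustrative; the point is only the columns line):

```json
{
  "pop" : "hbase-scan",
  "hbaseScanSpec" : {
    "tableName" : "browser_action2",
    "startRow" : "0\\x00",
    "stopRow" : "",
    "filter" : null
  },
  "columns" : [ "`v`.`e0`" ]
}
```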
2016-05-24 9:29 GMT+08:00 qiang li <tiredqi...@gmail.com>:

> Yes, it seems like the same issue. I will upgrade it and test again. But
> do you think we can update the physical plan too?  If we only want to query
> one qualifier, then the columns of the plan should contain only that
> qualifier instead of "*". Maybe that plan would query faster. Am I right?
>
> 2016-05-23 23:38 GMT+08:00 Krystal Nguyen <kngu...@maprtech.com>:
>
>> Hi Qiang,
>>
>> Looks like you might be encountering this issue:
>> https://issues.apache.org/jira/browse/DRILL-4271
>>
>> Thanks
>>
>> On Sun, May 22, 2016 at 8:38 PM, qiang li <tiredqi...@gmail.com> wrote:
>>
>> > I tested it step by step again, and I finally found out that the issue
>> > happens only if the number of qualifiers is more than 3.
>> >
>> > It's weird, but this is the result I got.
>> >
>> > I tested with about 10 thousand rows of data. The length of the event is
>> > 6; the code I used to test is like below:
>> >
>> > String[] earr = action.getEvent().geteArr();
>> > for (int i = 0; i < 6; i++) {
>> >     put.addColumn(family, Bytes.toBytes("e" + i), Bytes.toBytes(earr[i]));
>> > }
>> >
>> > Then I tested step by step like below:
>> >
>> > put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
>> > put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
>> > put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
>> > put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
>> >
>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>> >  group by a.`v`.`e0`;
>> > +-----+-------+
>> > |  k  |   p   |
>> > +-----+-------+
>> > | e0  | 3856  |
>> > +-----+-------+
>> >
>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
>> > +-----+-------+
>> > |  k  |   p   |
>> > +-----+-------+
>> > | e0  | 9094  |
>> > +-----+-------+
>> >
>> >
>> > If I put the qualifier "e3" into the table (that is, more than 3
>> > qualifiers), then the issue is reproduced.
>> >
>> >
>> > 2016-05-23 9:23 GMT+08:00 qiang li <tiredqi...@gmail.com>:
>> >
>> > > Sorry for the late reply.
>> > >
>> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
>> > >
>> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kngu...@maprtech.com>:
>> > >
>> > >> Qiang, can you please let us know the HBase version and Hadoop
>> > >> distribution version that you are using.
>> > >>
>> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <
>> kngu...@maprtech.com>
>> > >> wrote:
>> > >>
>> > >> > Can you please let us know the hbase version and hadoop
>> distribution
>> > >> > version that you are using.
>> > >> >
>> > >> >
>> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li <tiredqi...@gmail.com>
>> > wrote:
>> > >> >
>> > >> >> Khurram, I am sending the mail again; the last mail forgot to cc
>> > >> >> user@drill.apache.org
>> > >> >>
>> > >> >> The main process is the same, but my rowkey is more complicated.
>> > >> >> Here are the details of what I tested.
>> > >> >> The rowkey is like this: [salt 1-byte string] + [day 8-byte string] +
>> > >> >> [event] + [uid long] + [ts long]
>> > >> >> I also have other qualifiers; only qualifier v:v is an integer, the
>> > >> >> others are strings.
>> > >> >>
>> > >> >> example:
>> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
>> > >> >> ROW                                                                      COLUMN+CELL
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e0, timestamp=1461839343076, value=pay
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e1, timestamp=1461839343076, value=bijia
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e2, timestamp=1461839343076, value=browser
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e3, timestamp=1461839343076, value=*
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e4, timestamp=1461839343076, value=*
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e5, timestamp=1461839343076, value=*
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:h, timestamp=1459771200000, value=20
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:m, timestamp=1459771200000, value=0
>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
>> > >> >> 1 row(s) in 0.0410 seconds
>> > >> >>
>> > >> >>
>> > >> >> Here is an example of how the issue looks:
>> > >> >>
>> > >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'],
>> > >> >> STARTROW => '0'}
>> > >> >> ........
>> > >> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029448, value=visit
>> > >> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029217, value=visit
>> > >> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00  column=v:e0, timestamp=1463723029295, value=visit
>> > >> >>
>> > >> >> 9994 row(s) in 123.8650 seconds
>> > >> >>
>> > >> >> the drill result:
>> > >> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> > >> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>> > >> >>  group by a.`v`.`e0`;
>> > >> >> +--------+-------+
>> > >> >> |   k    |   p   |
>> > >> >> +--------+-------+
>> > >> >> | visit  | 1216  |
>> > >> >> +--------+-------+
>> > >> >>
>> > >> >>
>> > >> >> I found out that the issue appears if the row count is larger than
>> > >> >> 10000. The result is right if there are fewer than 1000 rows, but not
>> > >> >> always. What I can confirm is that if I update the columns in the
>> > >> >> physical plan and query via the web UI, the result is correct.
>> > >> >>
>> > >> >>
>> > >> >> Thanks
>> > >> >>
>> > >> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>> > >> >>
>> > >> >> > Qiang, can you please take a look at DRILL-4686 and confirm if the
>> > >> >> > data set used in my repro is the same as the one you have used. If
>> > >> >> > the data set is different, please let us know the type of data that
>> > >> >> > you have used in your table.
>> > >> >> >
>> > >> >> > Aman - I will try to repro the problem on Drill 1.6.0 and share
>> > >> >> > results.
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> > Khurram
>> > >> >> >
>> > >> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <
>> amansi...@apache.org
>> > >
>> > >> >> wrote:
>> > >> >> >
>> > >> >> >> Khurram, DRILL-4686 seems like a different issue... it is
>> > >> >> >> reporting an error, whereas the original problem from qiang was an
>> > >> >> >> incorrect result. Can you use the same version (1.6) that he was
>> > >> >> >> using? Also, is the data set similar? If you are unable to repro
>> > >> >> >> the exact same issue, perhaps qiang should file a JIRA with a
>> > >> >> >> smaller repro if possible.
>> > >> >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <
>> > >> kfar...@maprtech.com>
>> > >> >> >> wrote:
>> > >> >> >>
>> > >> >> >> > Hello Qiang,
>> > >> >> >> >
>> > >> >> >> > DRILL-4686 is reported to track this problem.
>> > >> >> >> >
>> > >> >> >> > Thanks,
>> > >> >> >> > Khurram
>> > >> >> >> >
>> > >> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <
>> tiredqi...@gmail.com
>> > >
>> > >> >> wrote:
>> > >> >> >> >
>> > >> >> >> >> Ok, Thanks very much.
>> > >> >> >> >>
>> > >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <
>> > kfar...@maprtech.com
>> > >> >:
>> > >> >> >> >>
>> > >> >> >> >>> Hello Qiang,
>> > >> >> >> >>>
>> > >> >> >> >>> Someone from our Drill team (in San Jose) will get back to you
>> > >> >> >> >>> soon. I work from the India lab and am in a different time zone
>> > >> >> >> >>> compared to the San Jose office; someone from MapR San Jose
>> > >> >> >> >>> will get back to you as soon as possible.
>> > >> >> >> >>>
>> > >> >> >> >>> Thanks,
>> > >> >> >> >>> Khurram
>> > >> >> >> >>>
>> > >> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <
>> > tiredqi...@gmail.com
>> > >> >
>> > >> >> >> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>>> Hi Khurram, thanks very much for reproducing it. So what's
>> > >> >> >> >>>> the conclusion?
>> > >> >> >> >>>>
>> > >> >> >> >>>> Any idea how to solve it?
>> > >> >> >> >>>>
>> > >> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <
>> > >> kfar...@maprtech.com>:
>> > >> >> >> >>>>
>> > >> >> >> >>>>> So I tried to create the table using the HBase API (with no
>> > >> >> >> >>>>> data inserted into the table), and I got the query plan for
>> > >> >> >> >>>>> Drill 1.7.0.
>> > >> >> >> >>>>> Drill 1.7.0-SNAPSHOT  commit ID: 09b26277
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> | row_key      | ANY        | NO           |
>> > >> >> >> >>>>> | v            | MAP        | NO           |
>> > >> >> >> >>>>> +--------------+------------+--------------+
>> > >> >> >> >>>>> 2 rows selected (1.665 seconds)
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Table creation Java program
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> public class PutIntDataToHBase {
>> > >> >> >> >>>>>     public static void main(String args[]) throws IOException {
>> > >> >> >> >>>>>         Configuration conf = HBaseConfiguration.create();
>> > >> >> >> >>>>>         conf.set("hbase.zookeeper.property.clientPort", "5181");
>> > >> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>> > >> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
>> > >> >> >> >>>>>             admin.disableTable("browser_action2");
>> > >> >> >> >>>>>             admin.deleteTable("browser_action2");
>> > >> >> >> >>>>>         }
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>         byte[][] SPLIT_KEYS =
>> > >> >> >> >>>>>             {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>> > >> >> >> >>>>>         HTableDescriptor tableDesc = new
>> > >> >> >> >>>>>             HTableDescriptor(TableName.valueOf("browser_action2"));
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>> > >> >> >> >>>>>         admin.createTable(tableDesc, SPLIT_KEYS);
>> > >> >> >> >>>>>     }
>> > >> >> >> >>>>> }
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Query plan for the query that was reported as returning
>> > >> >> >> >>>>> wrong results.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> > >> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>> > >> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0'  group by a.`v`.`e0`;
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | text | json |
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | 00-00    Screen
>> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> > >> >> >> >>>>> 00-02        UnionExchange
>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> > >> >> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>> > >> >> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> > >> >> >> >>>>> startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> and the query plan for the other problem query mentioned in
>> > >> >> >> >>>>> the first email.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>> > >> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8')  as k,
>> > >> >> >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from
>> > >> >> >> >>>>> hbase.browser_action2 a group by
>> > >> >> >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9);
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | text | json |
>> > >> >> >> >>>>> +------+------+
>> > >> >> >> >>>>> | 00-00    Screen
>> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>> > >> >> >> >>>>> 00-02        UnionExchange
>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1],
>> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>> > >> >> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)],
>> > >> >> >> >>>>> row_key=[$0])
>> > >> >> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan
>> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> > >> >> >> >>>>> startRow=null, stopRow=null, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>> {noformat}
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Thanks,
>> > >> >> >> >>>>> Khurram
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <
>> > >> tiredqi...@gmail.com>
>> > >> >> >> >>>>> wrote:
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>> Yes.
>> > >> >> >> >>>>>> I use hbase API to create it.
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> The main code is:
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
>> > >> >> >> >>>>>>     {'5'}, {'6'}, {'7'}, {'8'}, {'9'} };
>> > >> >> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>> > >> >> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>> > >> >> >> >>>>>> tableDesc.addFamily(columnDesc);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>> > >> >> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <
>> zf...@maprtech.com
>> > >:
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
>> > >> >> reproduce
>> > >> >> >> this
>> > >> >> >> >>>>>>> problem so we can try to reproduce it on our end.
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> Thanks.
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> -- Zelaine
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <
>> > >> >> tiredqi...@gmail.com>
>> > >> >> >> >>>>>>> wrote:
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>> > Hi,
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I recently met an issue where I cannot query the correct
>> > >> >> >> >>>>>>> > data from HBase with SQL via Drill; can anybody help me?
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I tested with Drill 1.6.
>> > >> >> >> >>>>>>> > My HBase schema:
>> > >> >> >> >>>>>>> > rowkey: salt + day + event + uid + ts, e.g.: 120160411visituidts
>> > >> >> >> >>>>>>> > cf: v
>> > >> >> >> >>>>>>> > qualifier: v, e0, e1
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > The wrong result only happens when I use a group by clause.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > This SQL will not return the correct result:
>> > >> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>> > >> >> >> >>>>>>> > from hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>> > >> >> >> >>>>>>> > Part of the explain output of this SQL is:
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>> > >> >> >> >>>>>>> > CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>> > >> >> >> >>>>>>> > from hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>> > >> >> >> >>>>>>> > +------+------+
>> > >> >> >> >>>>>>> > | text | json |
>> > >> >> >> >>>>>>> > +------+------+
>> > >> >> >> >>>>>>> > | 00-00    Screen
>> > >> >> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
>> > >> >> >> >>>>>>> > 00-02        UnionExchange
>> > >> >> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>> > >> >> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>> > >> >> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>> > >> >> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>> > >> >> >> >>>>>>> > 02-01                  UnorderedMuxExchange
>> > >> >> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1],
>> > >> >> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>> > >> >> >> >>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>> > >> >> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>> > >> >> >> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan
>> > >> >> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>> > >> >> >> >>>>>>> > startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > The data returns very quickly; the result of this SQL is:
>> > >> >> >> >>>>>>> > +------+--------+
>> > >> >> >> >>>>>>> > |  k   |   p    |
>> > >> >> >> >>>>>>> > +------+--------+
>> > >> >> >> >>>>>>> > | pay  | 12180  |
>> > >> >> >> >>>>>>> > +------+--------+
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > But I have millions of rows in the table.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > I tried to change the physical plan. If I change the JSON plan's
>> > >> >> >> >>>>>>> > *"columns" : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ]*, it will
>> > >> >> >> >>>>>>> > return the correct result.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > It seems the physical plan is not correct.
>> > >> >> >> >>>>>>> > I also tried to debug the SQL parser to find out the reason, but
>> > >> >> >> >>>>>>> > it is too complicated. Can anyone help me?
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>> > Also, this SQL has the same issue:
>> > >> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
>> > >> >> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>> > >> >> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1, 9);
>> > >> >> >> >>>>>>> > If I change the JSON plan's *"columns" : [ "`*`" ]* to *"columns" :
>> > >> >> >> >>>>>>> > [ "`row_key`" ]*, it will return the correct result.
>> > >> >> >> >>>>>>> >
>> > >> >> >> >>>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>
>> > >> >> >> >>>>
>> > >> >> >> >>>
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>
