Sorry for the late reply. Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kngu...@maprtech.com>:

> Qiang, can you please let us know the HBase version and Hadoop
> distribution version that you are using.

On Fri, May 20, 2016 at 1:35 AM, qiang li <tiredqi...@gmail.com> wrote:

> Khurram, I am sending this mail again; the last one forgot to cc
> user@drill.apache.org.
>
> The main process is the same, but my rowkey is more complicated.
> Here are the details of what I tested.
> The rowkey looks like this: [salt 1-byte string] + [day 8-byte string] +
> [event] + [uid long] + [ts long].
> I also have other qualifiers; only qualifier v:v is an integer, the
> others are strings.
>
> Example:
>
> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> ROW                                                                    COLUMN+CELL
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e0, timestamp=1461839343076, value=pay
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e1, timestamp=1461839343076, value=bijia
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e2, timestamp=1461839343076, value=browser
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e3, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e4, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e5, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:h, timestamp=1459771200000, value=20
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:m, timestamp=1459771200000, value=0
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
> 1 row(s) in 0.0410 seconds
>
> Here is an example of what the issue looks like:
>
> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW => '0'}
> ........
> 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00     column=v:e0, timestamp=1463723029448, value=visit
> 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00     column=v:e0, timestamp=1463723029217, value=visit
> 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00     column=v:e0, timestamp=1463723029295, value=visit
> 9994 row(s) in 123.8650 seconds
>
> The Drill result:
>
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
> group by a.`v`.`e0`;
> +--------+-------+
> |   k    |   p   |
> +--------+-------+
> | visit  | 1216  |
> +--------+-------+
>
> I have found that the issue shows up once the scan covers more than
> 10000 rows; the result is right with fewer than 1000 rows, though not
> always. What I can say for sure is that if I update the columns in the
> physical plan and run the query through the web UI, the result is
> correct.
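As a reading aid for the scan output above: the composite rowkey layout qiang describes (1-byte salt string, 8-byte day string, variable-length event, 8-byte uid, 8-byte ts) can be sketched in plain Java. This is an illustration of the byte layout only, not the poster's actual ingestion code; the helper name `buildRowKey` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {
    // Hypothetical helper illustrating the layout:
    // [salt 1-byte string] + [day 8-byte string] + [event] + [uid long] + [ts long]
    static byte[] buildRowKey(char salt, String day, String event, long uid, long ts) {
        byte[] eventBytes = event.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + eventBytes.length + 8 + 8);
        buf.put((byte) salt);                           // 1-byte salt, e.g. '0'..'9'
        buf.put(day.getBytes(StandardCharsets.UTF_8));  // 8-byte day string, e.g. "20160404"
        buf.put(eventBytes);                            // variable-length event name
        buf.putLong(uid);                               // 8-byte big-endian uid
        buf.putLong(ts);                                // 8-byte big-endian timestamp
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey('0', "20160404", "pay.bijia.browser",
                124015L, 1459771200000L);
        // Total length: 1 + 8 + 17 + 8 + 8 = 42 bytes
        System.out.println(key.length);
        // The textual prefix (salt + day) is human-readable, as in the shell output
        System.out.println(new String(key, 0, 9, StandardCharsets.UTF_8));
    }
}
```

This matches why the scan output begins with readable text ("020160404pay.bijia.browser") followed by binary uid/ts bytes.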
> Thanks

2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

> Qiang, can you please take a look at DRILL-4686 and confirm whether the
> data set used in my repro is the same as the one you have used. If the
> data set is different, please let us know the type of data you used in
> your table.
>
> Aman - I will try to repro the problem on Drill 1.6.0 and share results.
>
> Thanks,
> Khurram

On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org> wrote:

> Khurram, DRILL-4686 seems like a different issue... it is reporting an
> error, whereas the original problem from qiang was an incorrect result.
> Can you use the same version (1.6) that he was using? Also, is the data
> set similar? If you are unable to repro the exact same issue, perhaps
> qiang should file a JIRA with a smaller repro if possible.

On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com> wrote:

> Hello Qiang,
>
> DRILL-4686 has been filed to track this problem.
>
> Thanks,
> Khurram

On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com> wrote:

> OK, thanks very much.

2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

> Hello Qiang,
>
> Someone from our Drill team (in San Jose) will get back to you soon. I
> work from the India lab and I am in a different time zone compared to
> the San Jose office; someone from MapR San Jose will get back to you as
> soon as possible.
> Thanks,
> Khurram

On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com> wrote:

> Hi Khurram, thanks very much for reproducing it. So what is the
> conclusion? Any idea how to solve it?

2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

> So I created the table using the HBase API (with no data inserted into
> the table) and got the query plans on Drill 1.7.0.
> Drill 1.7.0-SNAPSHOT commit ID: 09b26277
>
> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | row_key      | ANY        | NO           |
> | v            | MAP        | NO           |
> +--------------+------------+--------------+
> 2 rows selected (1.665 seconds)
>
> Table creation Java program (imports added for completeness):
>
> {noformat}
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
>
> public class PutIntDataToHBase {
>     public static void main(String[] args) throws IOException {
>         Configuration conf = HBaseConfiguration.create();
>         conf.set("hbase.zookeeper.property.clientPort", "5181");
>         HBaseAdmin admin = new HBaseAdmin(conf);
>         if (admin.tableExists("browser_action2")) {
>             admin.disableTable("browser_action2");
>             admin.deleteTable("browser_action2");
>         }
>
>         byte[][] SPLIT_KEYS =
>             {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>         HTableDescriptor tableDesc =
>             new HTableDescriptor(TableName.valueOf("browser_action2"));
>         tableDesc.addFamily(new HColumnDescriptor("v"));
>         admin.createTable(tableDesc, SPLIT_KEYS);
>         admin.close();
>     }
> }
> {noformat}
>
> Query plan for the query that was reported as returning wrong results:
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> 03-03                        Project($f0=[ITEM($1, 'e0')])
> 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
> {noformat}
>
> and the query plan for the other problem query mentioned in the first
> email:
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
> count(a.row_key) p from hbase.browser_action2 a group by
> BYTE_SUBSTR(a.row_key, 1, 9);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)], row_key=[$0])
> 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
> {noformat}
>
> Thanks,
> Khurram

On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote:

> Yes, I used the HBase API to create it.
> The main code is:
>
> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
>                         {'6'}, {'7'}, {'8'}, {'9'} };
> TableName tableName = TableName.valueOf("browser_action2");
>
> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> tableDesc.addFamily(columnDesc);
>
> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>
> admin.createTable(tableDesc, SPLIT_KEYS);

2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:

> Can you provide the CREATE TABLE statement you used to reproduce this
> problem, so we can try to reproduce it on our end.
>
> Thanks.
>
> -- Zelaine

On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote:

> Hi,
>
> I recently ran into an issue where I cannot query the correct data from
> HBase via SQL in Drill; can anybody help me?
>
> I tested with Drill 1.6.
> My HBase schema:
> rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
> cf: v
> qualifiers: v, e0, e1
>
> The wrong result only happens when I use a group by clause.
> This SQL does not return the correct result:
>
> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>
> Part of the explain output for this SQL:
>
> 0: jdbc:drill:zk=rfdc5> explain plan for select
> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> 03-03                        Project($f0=[ITEM($1, 'e0')])
> 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>
> The data returns very quickly; the result of this SQL is:
>
> +------+--------+
> |  k   |   p    |
> +------+--------+
> | pay  | 12180  |
> +------+--------+
>
> But I have millions of rows in the table.
> I tried changing the physical plan: if I change the JSON plan from
> "columns" : [ "`*`" ] to "columns" : [ "`v`.`e0`" ], it returns the
> correct result.
>
> It seems the physical plan is not correct. I also tried to debug the
> SQL parser to find the reason, but it is too complicated. Can anyone
> help me?
>
> This SQL has the same issue:
>
> select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
> count(a.row_key) p from hbase.browser_action2 a group by
> BYTE_SUBSTR(a.row_key, 1, 9);
>
> If I change the JSON plan from "columns" : [ "`*`" ] to
> "columns" : [ "`row_key`" ], it returns the correct result.
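For readers puzzling over the second query: BYTE_SUBSTR uses 1-based offsets, so taking 9 bytes from position 1 of the rowkey covers exactly the 1-byte salt plus the 8-byte day string, i.e. the group-by key is effectively salt+day. A minimal plain-Java sketch of that slicing (this mimics the semantics, not Drill's internals; the class and helper names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteSubstrSketch {
    // Mimics 1-based BYTE_SUBSTR(key, offset, length) on a byte[] rowkey.
    static byte[] byteSubstr(byte[] key, int offset, int length) {
        return Arrays.copyOfRange(key, offset - 1, offset - 1 + length);
    }

    public static void main(String[] args) {
        // A rowkey beginning with salt '9' and day "20160410", as in the scan output above.
        byte[] rowKey = "920160410visit.bijia.test".getBytes(StandardCharsets.UTF_8);
        byte[] groupKey = byteSubstr(rowKey, 1, 9);
        // The 9-byte prefix is the salt plus the day: "920160410"
        System.out.println(new String(groupKey, StandardCharsets.UTF_8));
    }
}
```

Because the same day appears under every salt bucket, a per-day total still requires summing the per-(salt+day) groups, which is what the count/group-by above does.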