Yes. I used the HBase API to create it. The main code is:
byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
                        {'5'}, {'6'}, {'7'}, {'8'}, {'9'} };
TableName tableName = TableName.valueOf("browser_action2");
HTableDescriptor tableDesc = new HTableDescriptor(tableName);
HColumnDescriptor columnDesc = new HColumnDescriptor("v");
// Configure the column family before adding it to the table descriptor.
columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
tableDesc.addFamily(columnDesc);
admin.createTable(tableDesc, SPLIT_KEYS);

2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:

> Can you provide the CREATE TABLE statement you used to reproduce this
> problem so we can try to reproduce it on our end.
>
> Thanks.
>
> -- Zelaine
>
> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote:
>
> > Hi,
> >
> > I recently hit an issue where I cannot query the correct data from
> > HBase via SQL with Drill. Can anybody help me?
> >
> > I am testing with Drill 1.6.
> > My HBase schema:
> >   rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
> >   cf: v
> >   qualifiers: v, e0, e1
> >
> > The wrong result only happens when I use a GROUP BY clause.
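[Editor's note: for context, the salted rowkey layout described above (salt + day + event + uid + ts) could be assembled roughly as follows. This is a minimal sketch only; the class and method names, the salt derivation from the uid hash, and the field formats are illustrative assumptions, not taken from the original code.]

```java
import java.nio.charset.StandardCharsets;

public class RowKeyBuilder {
    // Sketch: rowkey = salt + day + event + uid + ts,
    // e.g. '1' + "20160411" + "visit" + uid + ts.
    // The one-character salt in '0'..'9' matches the ten SPLIT_KEYS,
    // so writes spread evenly across the pre-split regions.
    static byte[] buildRowKey(String day, String event, String uid, long ts) {
        // Derive the salt from the uid hash (illustrative choice);
        // hashCode() % 10 is in -9..9, so abs() yields a digit 0..9.
        char salt = (char) ('0' + Math.abs(uid.hashCode() % 10));
        String key = salt + day + event + uid + ts;
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey("20160411", "visit", "uid", 1460390400L);
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```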
> >
> > This SQL does not return the correct result:
> >
> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> > hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
> >
> > Part of the explain plan for this SQL is:
> >
> > 0: jdbc:drill:zk=rfdc5> explain plan for select CONVERT_FROM(a.`v`.`e0`,
> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where
> > a.row_key > '0' group by a.`v`.`e0`;
> > +------+------+
> > | text | json |
> > +------+------+
> > | 00-00    Screen
> > 00-01      Project(k=[$0], p=[$1])
> > 00-02        UnionExchange
> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> > 01-03              Project($f0=[$0], p=[$1])
> > 01-04                HashToRandomExchange(dist0=[[$0]])
> > 02-01                  UnorderedMuxExchange
> > 03-01                    Project($f0=[$0], p=[$1],
> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> > 03-03                        Project($f0=[ITEM($1, 'e0')])
> > 03-04                          Scan(groupscan=[HBaseGroupScan
> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00,
> > stopRow=, filter=null], columns=[`*`]]])
> >
> > The data returns very quickly; the result of this SQL is:
> >
> > +------+--------+
> > |  k   |   p    |
> > +------+--------+
> > | pay  | 12180  |
> > +------+--------+
> >
> > But I have millions of rows in the table.
> >
> > I tried changing the physical plan. If I change "columns" : [ "`*`" ]
> > to "columns" : [ "`v`.`e0`" ] in the JSON plan, it returns the correct
> > result.
> >
> > It seems the physical plan is not correct. I also tried to debug the
> > SQL parser to find out the reason, but it is too complicated. Can
> > anyone help me?
> >
> > This SQL has the same issue as well:
> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
> > count(a.row_key) p from hbase.browser_action2 a group by
> > BYTE_SUBSTR(a.row_key, 1, 9);
> >
> > If I change "columns" : [ "`*`" ] to "columns" : [ "`row_key`" ] in the
> > JSON plan, it returns the correct result.
> >
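[Editor's note: as an aside on that GROUP BY key: under the rowkey layout described earlier, BYTE_SUBSTR(row_key, 1, 9) should take the first 9 bytes of the rowkey, i.e. the one-character salt plus the 8-character day. A minimal plain-Java sketch of that extraction, assuming Drill's 1-based offset convention; the class and method names here are illustrative, not part of Drill.]

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyPrefix {
    // Mirrors BYTE_SUBSTR(row_key, offset, length) with a 1-based offset:
    // copies `length` bytes starting at byte position `offset`.
    static byte[] byteSubstr(byte[] key, int offset, int length) {
        return Arrays.copyOfRange(key, offset - 1, offset - 1 + length);
    }

    public static void main(String[] args) {
        // Example rowkey from the thread: salt '1' + day "20160411" + event + uid + ts.
        byte[] key = "120160411visituidts".getBytes(StandardCharsets.UTF_8);
        byte[] prefix = byteSubstr(key, 1, 9);
        // The salt + day prefix, which is what the query groups on.
        System.out.println(new String(prefix, StandardCharsets.UTF_8)); // 120160411
    }
}
```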