Hi Aman,

I did not fully test it with the old version.

Could you please help me create the JIRA issue? I think my account does not
have the privilege; my account is griffinli and I cannot find the place to
create a new issue. Below are the EXPLAIN plans for the same SQL on
clusters with different numbers of nodes.


This is the correct plan, generated on a cluster with only two nodes:
0: jdbc:drill:zk=xxx:> explain plan for select
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
`nation` join hbase.offers_ref0 as `ref0` on
 BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key
 > '0br' and `nation`.row_key  < '0bs' limit 10;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(uid=[$0], v=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            UnionExchange
01-01              SelectionVectorRemover
01-02                Limit(fetch=[10])
01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3,
-8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5],
row_key0=[$0], v0=[$1], $f2=[$2])
01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
01-07                        Project(row_key=[$0], v=[$1],
$f2=[BYTE_SUBSTR($0, -8, 8)])
01-09                          Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null,
stopRow=null, filter=null], columns=[`*`]]])
01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
01-08                          *BroadcastExchange*
02-01                            Project(row_key=[$0], v=[$1],
ITEM=[ITEM($1, 'v')])
02-02                              Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx,
startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter
(GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`,
`v`.`v`]]])


This is the failing plan, generated on a cluster with more than 5 nodes:
0: jdbc:drill:zk=xxx:> explain plan for select
CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx` as
`nation` join hbase.offers_ref0 as `ref0` on
 BYTE_SUBSTR(`ref0`.row_key,-8,8) = nation.`v`.`v` where `nation`.row_key
 > '0br' and `nation`.row_key  < '0bs' limit 10;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(uid=[$0], v=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            UnionExchange
01-01              SelectionVectorRemover
01-02                Limit(fetch=[10])
01-03                  Project(uid=[CONVERT_FROMBIGINT_BE(BYTE_SUBSTR($3,
-8, 8))], v=[CONVERT_FROMUTF8(ITEM($4, 'v'))])
01-04                    Project(row_key=[$3], v=[$4], ITEM=[$5],
row_key0=[$0], v0=[$1], $f2=[$2])
01-05                      HashJoin(condition=[=($2, $5)], joinType=[inner])
01-07                        Project(row_key=[$0], v=[$1], $f2=[$2])
01-09                          *HashToRandomExchange*(dist0=[[$2]])
02-01                            UnorderedMuxExchange
04-01                              Project(row_key=[$0], v=[$1], $f2=[$2],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
04-02                                Project(row_key=[$0], v=[$1],
$f2=[BYTE_SUBSTR($0, -8, 8)])
04-03                                  Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=offers_ref0, startRow=null,
stopRow=null, filter=null], columns=[`*`]]])
01-06                        Project(row_key0=[$0], v0=[$1], ITEM=[$2])
01-08                          Project(row_key=[$0], v=[$1], ITEM=[$2])
01-10                            *HashToRandomExchange*(dist0=[[$2]])
03-01                              UnorderedMuxExchange
05-01                                Project(row_key=[$0], v=[$1],
ITEM=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2)])
05-02                                  Project(row_key=[$0], v=[$1],
ITEM=[ITEM($1, 'v')])
05-03                                    Scan(groupscan=[HBaseGroupScan
[HBaseScanSpec=HBaseScanSpec [tableName=offers_nation_idx,
startRow=0br\x00, stopRow=0bs, filter=FilterList AND (2/2): [RowFilter
(GREATER, 0br), RowFilter (LESS, 0bs)]], columns=[`row_key`, `v`,
`v`.`v`]]])

The difference is the use of *BroadcastExchange* versus *HashToRandomExchange*.
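For reference, a possible workaround while this is being debugged might be to steer the planner back toward the broadcast plan on the larger cluster. This is only a sketch based on Drill's stock planner session options; the threshold value shown is illustrative, not a recommendation:

```sql
-- Allow the planner to consider broadcast joins (this is the default).
ALTER SESSION SET `planner.enable_broadcast_join` = true;

-- Raise the row-count threshold under which the build side of a join may
-- be broadcast, so the small offers_nation_idx side is broadcast instead
-- of hash-distributed. The value here is just an example.
ALTER SESSION SET `planner.broadcast_threshold` = 100000000;
```

I have not verified that this avoids the SchemaChangeException; it only forces the plan shape that works on the two-node cluster.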

Please create the JIRA and send me the link.

Thanks.


2016-06-20 23:44 GMT+08:00 Aman Sinha <amansi...@apache.org>:

> Hi Qiang,
> were you seeing this same issue with the prior HBase version also ?  (I
> would think this is not a regression).  It would be best to create a new
> JIRA and attach the EXPLAIN plans for the successful and failed runs.  With
> more nodes some minor fragments of the hash join may be getting empty input
> batches and I am guessing that has something to do with the
> SchemaChangeException.   Someone would need to debug once you create the
> JIRA with relevant details.
>
> -Aman
>
> On Mon, Jun 20, 2016 at 5:13 AM, qiang li <tiredqi...@gmail.com> wrote:
>
> > Thanks Aditya.
> >
> > By the way, I found another issue.
> >
> > Let's say I have two tables:
> >
> > offers_ref0: row key = salt (1 byte) + long uid (8 bytes), family: v,
> > qualifier: v (string)
> > offers_nation_idx: row key = salt (1 byte) + string, family: v,
> > qualifier: v (long, 8 bytes)
> >
> > Here is the SQL:
> >
> > select CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') as uid,
> > convert_from(`ref0`.`v`.`v`,'UTF8') as v  from hbase.`offers_nation_idx`
> as
> > `nation` join hbase.offers_ref0 as `ref0` on
> > CONVERT_FROM(BYTE_SUBSTR(`ref0`.row_key,-8,8),'BIGINT_BE') =
> > CONVERT_FROM(nation.`v`.`v`,'BIGINT_BE') where `nation`.row_key  > '0br'
> > and `nation`.row_key  < '0bs' limit 10
> >
> > When I execute the query on a single node, or on fewer than 5 nodes, it
> > works fine. But when I execute it on a cluster with about 14 nodes, it
> > throws an exception:
> >
> > The first time, it throws this exception:
> > *Caused by: java.sql.SQLException: SYSTEM ERROR: SchemaChangeException:
> > Hash join does not support schema changes*
> >
> > Then if I query again, it always throws the exception below:
> > *Query Failed: An Error Occurred*
> > *org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > IllegalStateException: Failure while reading vector. Expected vector
> class
> > of org.apache.drill.exec.vector.NullableIntVector but was holding vector
> > class org.apache.drill.exec.vector.complex.MapVector, field=
> > v(MAP:REQUIRED)[v(VARBINARY:OPTIONAL)[$bits$(UINT1:REQUIRED),
> > v(VARBINARY:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]] Fragment 12:4 [Error
> Id:
> > 06c6eae4-0822-4714-b0bf-a6e04ebfec79 on xxx:31010]*
> >
> > It's very strange, and I do not know how to solve it.
> > I tried adding nodes to the cluster one by one; the issue reproduces
> > once I have added 5 nodes. Can anyone help me solve this issue?
> >
> >
> >
> >
> > 2016-06-17 4:39 GMT+08:00 Aditya <adityakish...@gmail.com>:
> >
> > > https://issues.apache.org/jira/browse/DRILL-4727
> > >
> > > On Thu, Jun 16, 2016 at 11:39 AM, Aman Sinha <amansi...@apache.org>
> > wrote:
> > >
> > >> Qiang/Aditya can you create a JIRA for this and mark it for 1.7.
> > thanks.
> > >>
> > >> On Thu, Jun 16, 2016 at 11:25 AM, Aditya <adityakish...@gmail.com>
> > wrote:
> > >>
> > >> > Thanks for reporting, I'm looking into it and will post a patch
> soon.
> > >> >
> > >> > On Wed, Jun 15, 2016 at 7:27 PM, qiang li <tiredqi...@gmail.com>
> > wrote:
> > >> >
> > >> > > Hi Aditya,
> > >> > >
> > >> > > I tested the latest version and got this exception; the drillbit
> > >> > > fails to start up.
> > >> > >
> > >> > > Exception in thread "main" java.lang.NoSuchMethodError:
> > >> > > io.netty.util.UniqueName.<init>(Ljava/lang/String;)V
> > >> > >         at
> > >> io.netty.channel.ChannelOption.<init>(ChannelOption.java:136)
> > >> > >         at
> > >> io.netty.channel.ChannelOption.valueOf(ChannelOption.java:99)
> > >> > >         at
> > >> io.netty.channel.ChannelOption.<clinit>(ChannelOption.java:42)
> > >> > >         at
> > >> > > org.apache.drill.exec.rpc.BasicServer.<init>(BasicServer.java:63)
> > >> > >         at
> > >> > >
> org.apache.drill.exec.rpc.user.UserServer.<init>(UserServer.java:74)
> > >> > >         at
> > >> > >
> > >>
> > org.apache.drill.exec.service.ServiceEngine.<init>(ServiceEngine.java:78)
> > >> > >         at
> > >> > org.apache.drill.exec.server.Drillbit.<init>(Drillbit.java:108)
> > >> > >         at
> > >> org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
> > >> > >         at
> > >> org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271)
> > >> > >         at
> > >> org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267)
> > >> > >
> > >> > > If I remove jars/3rdparty/netty-all-4.0.23.Final.jar, Drill can
> > >> > > start up. I think there is a package dependency version issue;
> > >> > > do you think so?
> > >> > >
> > >> > >
> > >> > >
> > >> > > 2016-06-15 8:14 GMT+08:00 Aditya <adityakish...@gmail.com>:
> > >> > >
> > >> > >> HBase 1.x support has been merged and is available in latest
> > >> > >> 1.7.0-SNAPSHOT
> > >> > >> builds.
> > >> > >>
> > >> > >> On Wed, Jun 1, 2016 at 1:23 PM, Aditya <adityakish...@gmail.com>
> > >> wrote:
> > >> > >>
> > >> > >> > Thanks Jacques for promptly reviewing my long series of
> patches!
> > >> > >> >
> > >> > >> > I'm planning to merge the HBase 1.x support some time in next
> 48
> > >> > hours.
> > >> > >> >
> > >> > >> > If anyone else is interested and willing, please review the
> > latest
> > >> > patch
> > >> > >> > here[1].
> > >> > >> >
> > >> > >> > aditya...
> > >> > >> >
> > >> > >> > [1] https://github.com/apache/drill/pull/443/files
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
