Sorry for the late reply. Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
2016-05-20 23:06 GMT+08:00 Krystal Nguyen <kngu...@maprtech.com>:

> Qiang, can you please let us know the HBase version and Hadoop
> distribution version that you are using.

On Fri, May 20, 2016 at 1:35 AM, qiang li <tiredqi...@gmail.com> wrote:

> Khurram, I am sending this mail again; the last one forgot to cc
> user@drill.apache.org.
>
> The main process is the same, but my rowkey is more complicated.
> Here are the details of what I tested.
> The rowkey looks like this: [salt 1-byte string] + [day 8-byte string] +
> [event] + [uid long] + [ts long].
> I also have other qualifiers; only qualifier v:v is an integer, the
> others are strings.
>
> Example:
>
> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1}
> ROW                                                                    COLUMN+CELL
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e0, timestamp=1461839343076, value=pay
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e1, timestamp=1461839343076, value=bijia
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e2, timestamp=1461839343076, value=browser
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e3, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e4, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e5, timestamp=1461839343076, value=*
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:h, timestamp=1459771200000, value=20
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:m, timestamp=1459771200000, value=0
> 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
> 1 row(s) in 0.0410 seconds
>
> Here is an example of what the issue looks like:
>
> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW => '0'}
> ........
> 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00     column=v:e0, timestamp=1463723029448, value=visit
> 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00     column=v:e0, timestamp=1463723029217, value=visit
> 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00     column=v:e0, timestamp=1463723029295, value=visit
> 9994 row(s) in 123.8650 seconds
>
> The Drill result:
>
> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
> group by a.`v`.`e0`;
> +--------+-------+
> |   k    |   p   |
> +--------+-------+
> | visit  | 1216  |
> +--------+-------+
>
> I have found that the issue shows up once the scan covers more than
> 10000 rows; the result is right with fewer than 1000 rows, though not
> always. What I can say for sure is that if I update the columns in the
> physical plan and run the query through the web UI, the result is
> correct.
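As a reading aid for the scan output above: the composite rowkey layout qiang describes (1-byte salt string, 8-byte day string, variable-length event, 8-byte uid, 8-byte ts) can be sketched in plain Java. This is an illustration of the byte layout only, not the poster's actual ingestion code; the helper name `buildRowKey` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {
    // Hypothetical helper illustrating the layout:
    // [salt 1-byte string] + [day 8-byte string] + [event] + [uid long] + [ts long]
    static byte[] buildRowKey(char salt, String day, String event, long uid, long ts) {
        byte[] eventBytes = event.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + eventBytes.length + 8 + 8);
        buf.put((byte) salt);                           // 1-byte salt, e.g. '0'..'9'
        buf.put(day.getBytes(StandardCharsets.UTF_8));  // 8-byte day string, e.g. "20160404"
        buf.put(eventBytes);                            // variable-length event name
        buf.putLong(uid);                               // 8-byte big-endian uid
        buf.putLong(ts);                                // 8-byte big-endian timestamp
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey('0', "20160404", "pay.bijia.browser",
                124015L, 1459771200000L);
        // Total length: 1 + 8 + 17 + 8 + 8 = 42 bytes
        System.out.println(key.length);
        // The textual prefix (salt + day) is human-readable, as in the shell output
        System.out.println(new String(key, 0, 9, StandardCharsets.UTF_8));
    }
}
```

This matches why the scan output begins with readable text ("020160404pay.bijia.browser") followed by binary uid/ts bytes.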
> Thanks

2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

> Qiang, can you please take a look at DRILL-4686 and confirm whether the
> data set used in my repro is the same as the one you have used. If the
> data set is different, please let us know the type of data you used in
> your table.
>
> Aman - I will try to repro the problem on Drill 1.6.0 and share results.
>
> Thanks,
> Khurram

On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org> wrote:

> Khurram, DRILL-4686 seems like a different issue... it is reporting an
> error, whereas the original problem from qiang was an incorrect result.
> Can you use the same version (1.6) that he was using? Also, is the data
> set similar? If you are unable to repro the exact same issue, perhaps
> qiang should file a JIRA with a smaller repro if possible.

On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com> wrote:

> Hello Qiang,
>
> DRILL-4686 has been filed to track this problem.
>
> Thanks,
> Khurram

On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com> wrote:

> OK, thanks very much.

2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

> Hello Qiang,
>
> Someone from our Drill team (in San Jose) will get back to you soon. I
> work from the India lab and I am in a different time zone compared to
> the San Jose office; someone from MapR San Jose will get back to you as
> soon as possible.
> Thanks,
> Khurram

On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com> wrote:

> Hi Khurram, thanks very much for reproducing it. So what is the
> conclusion? Any idea how to solve it?

2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:

> So I created the table using the HBase API (with no data inserted into
> the table) and got the query plans on Drill 1.7.0.
> Drill 1.7.0-SNAPSHOT commit ID: 09b26277
>
> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | row_key      | ANY        | NO           |
> | v            | MAP        | NO           |
> +--------------+------------+--------------+
> 2 rows selected (1.665 seconds)
>
> Table creation Java program (imports added for completeness):
>
> {noformat}
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
>
> public class PutIntDataToHBase {
>     public static void main(String[] args) throws IOException {
>         Configuration conf = HBaseConfiguration.create();
>         conf.set("hbase.zookeeper.property.clientPort", "5181");
>         HBaseAdmin admin = new HBaseAdmin(conf);
>         if (admin.tableExists("browser_action2")) {
>             admin.disableTable("browser_action2");
>             admin.deleteTable("browser_action2");
>         }
>
>         byte[][] SPLIT_KEYS =
>             {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>         HTableDescriptor tableDesc =
>             new HTableDescriptor(TableName.valueOf("browser_action2"));
>         tableDesc.addFamily(new HColumnDescriptor("v"));
>         admin.createTable(tableDesc, SPLIT_KEYS);
>         admin.close();
>     }
> }
> {noformat}
>
> Query plan for the query that was reported as returning wrong results:
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> 03-03                        Project($f0=[ITEM($1, 'e0')])
> 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
> {noformat}
>
> and the query plan for the other problem query mentioned in the first
> email:
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
> count(a.row_key) p from hbase.browser_action2 a group by
> BYTE_SUBSTR(a.row_key, 1, 9);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)], row_key=[$0])
> 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
> {noformat}
>
> Thanks,
> Khurram

On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote:

> Yes, I used the HBase API to create it.
> The main code is:
>
> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
>                         {'6'}, {'7'}, {'8'}, {'9'} };
> TableName tableName = TableName.valueOf("browser_action2");
>
> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
> tableDesc.addFamily(columnDesc);
>
> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>
> admin.createTable(tableDesc, SPLIT_KEYS);

2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:

> Can you provide the CREATE TABLE statement you used to reproduce this
> problem, so we can try to reproduce it on our end.
>
> Thanks.
>
> -- Zelaine

On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote:

> Hi,
>
> I recently ran into an issue where I cannot query the correct data from
> HBase via SQL in Drill; can anybody help me?
>
> I tested with Drill 1.6.
> My HBase schema:
> rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
> cf: v
> qualifiers: v, e0, e1
>
> The wrong result only happens when I use a group by clause.
> This SQL does not return the correct result:
>
> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>
> Part of the explain output for this SQL:
>
> 0: jdbc:drill:zk=rfdc5> explain plan for select
> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(k=[$0], p=[$1])
> 00-02        UnionExchange
> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
> 01-03              Project($f0=[$0], p=[$1])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 02-01                  UnorderedMuxExchange
> 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
> 03-03                        Project($f0=[ITEM($1, 'e0')])
> 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>
> The data returns very quickly; the result of this SQL is:
>
> +------+--------+
> |  k   |   p    |
> +------+--------+
> | pay  | 12180  |
> +------+--------+
>
> But I have millions of rows in the table.
> I tried changing the physical plan: if I change the JSON plan from
> "columns" : [ "`*`" ] to "columns" : [ "`v`.`e0`" ], it returns the
> correct result.
>
> It seems the physical plan is not correct. I also tried to debug the
> SQL parser to find the reason, but it is too complicated. Can anyone
> help me?
>
> This SQL has the same issue:
>
> select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
> count(a.row_key) p from hbase.browser_action2 a group by
> BYTE_SUBSTR(a.row_key, 1, 9);
>
> If I change the JSON plan from "columns" : [ "`*`" ] to
> "columns" : [ "`row_key`" ], it returns the correct result.
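For readers puzzling over the second query: BYTE_SUBSTR uses 1-based offsets, so taking 9 bytes from position 1 of the rowkey covers exactly the 1-byte salt plus the 8-byte day string, i.e. the group-by key is effectively salt+day. A minimal plain-Java sketch of that slicing (this mimics the semantics, not Drill's internals; the class and helper names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteSubstrSketch {
    // Mimics 1-based BYTE_SUBSTR(key, offset, length) on a byte[] rowkey.
    static byte[] byteSubstr(byte[] key, int offset, int length) {
        return Arrays.copyOfRange(key, offset - 1, offset - 1 + length);
    }

    public static void main(String[] args) {
        // A rowkey beginning with salt '9' and day "20160410", as in the scan output above.
        byte[] rowKey = "920160410visit.bijia.test".getBytes(StandardCharsets.UTF_8);
        byte[] groupKey = byteSubstr(rowKey, 1, 9);
        // The 9-byte prefix is the salt plus the day: "920160410"
        System.out.println(new String(groupKey, StandardCharsets.UTF_8));
    }
}
```

Because the same day appears under every salt bucket, a per-day total still requires summing the per-(salt+day) groups, which is what the count/group-by above does.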