Qiang, can you please let us know the HBase version and Hadoop distribution version that you are using?
On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <kngu...@maprtech.com> wrote:
> Can you please let us know the HBase version and Hadoop distribution
> version that you are using.
>
> On Fri, May 20, 2016 at 1:35 AM, qiang li <tiredqi...@gmail.com> wrote:
>> Khurram, I am sending the mail again; the last mail forgot to cc
>> user@drill.apache.org.
>>
>> The main process is the same, but my rowkey is more complicated.
>> Here is the detail of what I tested.
>> The rowkey looks like this: [salt, 1-byte string] + [day, 8-byte string] +
>> [event] + [uid, long] + [ts, long].
>> I also have other qualifiers; only qualifier v:v is an integer, the
>> others are strings.
>>
>> Example (all cells below belong to the same row; the rowkey was wrapped
>> across lines in the original output):
>>
>> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1 }
>> ROW: 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00
>>   column=v:e0, timestamp=1461839343076, value=pay
>>   column=v:e1, timestamp=1461839343076, value=bijia
>>   column=v:e2, timestamp=1461839343076, value=browser
>>   column=v:e3, timestamp=1461839343076, value=*
>>   column=v:e4, timestamp=1461839343076, value=*
>>   column=v:e5, timestamp=1461839343076, value=*
>>   column=v:h, timestamp=1459771200000, value=20
>>   column=v:m, timestamp=1459771200000, value=0
>>   column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
>> 1 row(s) in 0.0410 seconds
>>
>> Here is an example of how the issue looks. HBase itself returns 9994 rows:
>>
>> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW => '0'}
>> ........
>> 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00
>>   column=v:e0, timestamp=1463723029448, value=visit
>> 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00
>>   column=v:e0, timestamp=1463723029217, value=visit
>> 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00
>>   column=v:e0, timestamp=1463723029295, value=visit
>> 9994 row(s) in 123.8650 seconds
>>
>> The Drill result for the same data:
>>
>> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>> group by a.`v`.`e0`;
>> +--------+-------+
>> |   k    |   p   |
>> +--------+-------+
>> | visit  | 1216  |
>> +--------+-------+
>>
>> I find that the issue shows up when the table has more than 10000 rows;
>> the result is correct with fewer than 1000 rows, but not always.
>> What I can confirm is that if I update the columns in the physical plan
>> and run the query through the web UI, the result is correct.
>>
>> Thanks
>>
>> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>>> Qiang, can you please take a look at DRILL-4686 and confirm whether the
>>> data set used in my repro is the same as the one you have used. If the
>>> data set is different, please let us know the type of data that you
>>> used in your table.
>>>
>>> Aman - I will try to repro the problem on Drill 1.6.0 and share results.
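The rowkey layout qiang describes (1-byte salt, 8-byte day string, event name, 8-byte uid, 8-byte ts) can be sketched in plain Java. This is an illustration only: the field widths are inferred from the scan output above, and the class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {
    // Layout inferred from the scan output:
    // [salt: 1 byte][day: 8-byte string][event: variable string][uid: 8-byte long][ts: 8-byte long]
    static byte[] buildRowKey(char salt, String day, String event, long uid, long ts) {
        byte[] eventBytes = event.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + eventBytes.length + 8 + 8);
        buf.put((byte) salt);
        buf.put(day.getBytes(StandardCharsets.UTF_8)); // assumed exactly 8 bytes, e.g. "20160404"
        buf.put(eventBytes);
        buf.putLong(uid); // ByteBuffer is big-endian by default, like HBase's Bytes.toBytes(long)
        buf.putLong(ts);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = buildRowKey('0', "20160404", "pay.bijia.browser", 1L, 1459771200000L);
        System.out.println(key.length); // 1 + 8 + 17 + 8 + 8 = 42
    }
}
```

A binary uid/ts in the key is why the rowkeys in the scan output contain unprintable escapes after the event name.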
>>> Thanks,
>>> Khurram
>>>
>>> On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org> wrote:
>>>> Khurram, DRILL-4686 seems like a different issue... it is reporting an
>>>> error, whereas the original problem from qiang was an incorrect result.
>>>> Can you use the same version (1.6) that he was using? Also, is the data
>>>> set similar? If you are unable to repro the exact same issue, perhaps
>>>> qiang should file a JIRA with a smaller repro if possible.
>>>>
>>>> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com> wrote:
>>>>> Hello Qiang,
>>>>>
>>>>> DRILL-4686 is reported to track this problem.
>>>>>
>>>>> Thanks,
>>>>> Khurram
>>>>>
>>>>> On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com> wrote:
>>>>>> OK, thanks very much.
>>>>>>
>>>>>> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>>>>>>> Hello Qiang,
>>>>>>>
>>>>>>> Someone from our Drill team (in San Jose) will get back to you soon.
>>>>>>> I work from the India lab and am in a different time zone than the
>>>>>>> San Jose office; someone from MapR San Jose will get back to you as
>>>>>>> soon as possible.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Khurram
>>>>>>>
>>>>>>> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com> wrote:
>>>>>>>> Hi Khurram, thanks very much for reproducing it. So what is the
>>>>>>>> conclusion?
>>>>>>>>
>>>>>>>> Any idea how to solve it?
>>>>>>>>
>>>>>>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>:
>>>>>>>>> I created the table using the HBase API (with no data inserted
>>>>>>>>> into the table) and got the query plans on Drill 1.7.0.
>>>>>>>>> Drill 1.7.0-SNAPSHOT commit ID: 09b26277
>>>>>>>>>
>>>>>>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>>>>>>>>> +--------------+------------+--------------+
>>>>>>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>>>>>>>>> +--------------+------------+--------------+
>>>>>>>>> | row_key      | ANY        | NO           |
>>>>>>>>> | v            | MAP        | NO           |
>>>>>>>>> +--------------+------------+--------------+
>>>>>>>>> 2 rows selected (1.665 seconds)
>>>>>>>>>
>>>>>>>>> Table creation Java program:
>>>>>>>>>
>>>>>>>>> {noformat}
>>>>>>>>> import java.io.IOException;
>>>>>>>>> import org.apache.hadoop.conf.Configuration;
>>>>>>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>>>>>> import org.apache.hadoop.hbase.HColumnDescriptor;
>>>>>>>>> import org.apache.hadoop.hbase.HTableDescriptor;
>>>>>>>>> import org.apache.hadoop.hbase.TableName;
>>>>>>>>> import org.apache.hadoop.hbase.client.HBaseAdmin;
>>>>>>>>>
>>>>>>>>> public class PutIntDataToHBase {
>>>>>>>>>     public static void main(String[] args) throws IOException {
>>>>>>>>>         Configuration conf = HBaseConfiguration.create();
>>>>>>>>>         conf.set("hbase.zookeeper.property.clientPort", "5181");
>>>>>>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>>>>>>>>         // Drop and recreate the table so runs are repeatable
>>>>>>>>>         if (admin.tableExists("browser_action2")) {
>>>>>>>>>             admin.disableTable("browser_action2");
>>>>>>>>>             admin.deleteTable("browser_action2");
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         // Pre-split on the salt prefixes '0'..'9'
>>>>>>>>>         byte[][] SPLIT_KEYS =
>>>>>>>>>             {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>>>>>>>>         HTableDescriptor tableDesc =
>>>>>>>>>             new HTableDescriptor(TableName.valueOf("browser_action2"));
>>>>>>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>>>>>>>>>         admin.createTable(tableDesc, SPLIT_KEYS);
>>>>>>>>>         admin.close();
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>> {noformat}
>>>>>>>>>
>>>>>>>>> Query plan for the query that was reported as returning wrong results:
>>>>>>>>>
>>>>>>>>> {noformat}
>>>>>>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>>>>>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>>>>>>>> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>>>>>>>>> +------+------+
>>>>>>>>> | text | json |
>>>>>>>>> +------+------+
>>>>>>>>> | 00-00 Screen
>>>>>>>>> 00-01   Project(k=[$0], p=[$1])
>>>>>>>>> 00-02     UnionExchange
>>>>>>>>> 01-01       Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>>>>>> 01-02         HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>>>>>> 01-03           Project($f0=[$0], p=[$1])
>>>>>>>>> 01-04             HashToRandomExchange(dist0=[[$0]])
>>>>>>>>> 02-01               UnorderedMuxExchange
>>>>>>>>> 03-01                 Project($f0=[$0], p=[$1],
>>>>>>>>>                           E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>>>>>> 03-02                   HashAgg(group=[{0}], p=[COUNT($0)])
>>>>>>>>> 03-03                     Project($f0=[ITEM($1, 'e0')])
>>>>>>>>> 03-04                       Scan(groupscan=[HBaseGroupScan
>>>>>>>>>                           [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>>>>>>>                           startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>>>>>>>>> {noformat}
>>>>>>>>>
>>>>>>>>> and the query plan for the other problem query mentioned in the
>>>>>>>>> first email:
>>>>>>>>>
>>>>>>>>> {noformat}
>>>>>>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>>>>>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1, 9), 'UTF8') as k,
>>>>>>>>> count(a.row_key) p from hbase.browser_action2 a group by
>>>>>>>>> BYTE_SUBSTR(a.row_key, 1, 9);
>>>>>>>>> +------+------+
>>>>>>>>> | text | json |
>>>>>>>>> +------+------+
>>>>>>>>> | 00-00 Screen
>>>>>>>>> 00-01   Project(k=[$0], p=[$1])
>>>>>>>>> 00-02     UnionExchange
>>>>>>>>> 01-01       Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>>>>>>>> 01-02         HashAgg(group=[{0}], p=[$SUM0($1)])
>>>>>>>>> 01-03           Project($f0=[$0], p=[$1])
>>>>>>>>> 01-04             HashToRandomExchange(dist0=[[$0]])
>>>>>>>>> 02-01               UnorderedMuxExchange
>>>>>>>>> 03-01                 Project($f0=[$0], p=[$1],
>>>>>>>>>                           E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>>>>>>>> 03-02                   HashAgg(group=[{0}], p=[COUNT($1)])
>>>>>>>>> 03-03                     Project($f0=[BYTE_SUBSTR($0, 1, 9)], row_key=[$0])
>>>>>>>>> 03-04                       Scan(groupscan=[HBaseGroupScan
>>>>>>>>>                           [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2,
>>>>>>>>>                           startRow=null, stopRow=null, filter=null], columns=[`*`]]])
>>>>>>>>> {noformat}
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Khurram
>>>>>>>>>
>>>>>>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> wrote:
>>>>>>>>>> Yes.
>>>>>>>>>> I use the HBase API to create it.
>>>>>>>>>>
>>>>>>>>>> The main code is:
>>>>>>>>>>
>>>>>>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'},
>>>>>>>>>>     {'6'}, {'7'}, {'8'}, {'9'} };
>>>>>>>>>> TableName tableName = TableName.valueOf("browser_action2");
>>>>>>>>>>
>>>>>>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>>>>>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>>>>>>>>> tableDesc.addFamily(columnDesc);
>>>>>>>>>>
>>>>>>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>>>>>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>>>>>>>>>
>>>>>>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>>>>>>>>>
>>>>>>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>:
>>>>>>>>>>> Can you provide the CREATE TABLE statement you used to reproduce
>>>>>>>>>>> this problem so we can try to reproduce it on our end?
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> -- Zelaine
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I recently ran into an issue where Drill does not return the
>>>>>>>>>>>> correct data from HBase via SQL. Can anybody help me?
>>>>>>>>>>>>
>>>>>>>>>>>> I tested with Drill 1.6.
>>>>>>>>>>>> My HBase schema:
>>>>>>>>>>>> rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
>>>>>>>>>>>> cf: v
>>>>>>>>>>>> qualifiers: v, e0, e1
>>>>>>>>>>>>
>>>>>>>>>>>> The wrong result only happens when I use a group by clause.
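A note on the schema above: v:v is stored as a binary integer (the earlier scan shows it as \x00\x00\x00\x17, i.e. 23), while e0, e1, ... are raw UTF-8 strings. That distinction matters for CONVERT_FROM. A minimal plain-Java sketch of the two encodings, assuming HBase's usual big-endian int layout; the class and method names are illustrative only:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ValueEncodingSketch {
    // HBase's Bytes.toBytes(int) writes a 4-byte big-endian integer.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // String qualifiers are stored as raw UTF-8 bytes.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] b = encodeInt(23); // matches the v:v cell value \x00\x00\x00\x17
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("\\x%02X", x));
        System.out.println(sb); // \x00\x00\x00\x17
    }
}
```

This is why the queries in this thread read e0 with CONVERT_FROM(..., 'UTF8'); a binary int column like v:v would need a big-endian integer convert type instead.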
>> >> >>>>>>> > >> >> >>>>>>> > This sql will not return correct result: >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, >> count(a.`v`.`e0`) >> >> p >> >> >>>>>>> from >> >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0' group by >> >> a.`v`.`e0`; >> >> >>>>>>> > Part of explain of this sql is: >> >> >>>>>>> > >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select >> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`, >> >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 >> a >> >> >>>>>>> where >> >> >>>>>>> > a.row_key > '0' group by a.`v`.`e0`; >> >> >>>>>>> > +------+------+ >> >> >>>>>>> > | text | json | >> >> >>>>>>> > +------+------+ >> >> >>>>>>> > | 00-00 Screen >> >> >>>>>>> > 00-01 Project(k=[$0], p=[$1]) >> >> >>>>>>> > 00-02 UnionExchange >> >> >>>>>>> > 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1]) >> >> >>>>>>> > 01-02 HashAgg(group=[{0}], p=[$SUM0($1)]) >> >> >>>>>>> > 01-03 Project($f0=[$0], p=[$1]) >> >> >>>>>>> > 01-04 HashToRandomExchange(dist0=[[$0]]) >> >> >>>>>>> > 02-01 UnorderedMuxExchange >> >> >>>>>>> > 03-01 Project($f0=[$0], p=[$1], >> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) >> >> >>>>>>> > 03-02 HashAgg(group=[{0}], >> p=[COUNT($0)]) >> >> >>>>>>> > 03-03 Project($f0=[ITEM($1, 'e0')]) >> >> >>>>>>> > 03-04 Scan(groupscan=[HBaseGroupScan >> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, >> >> >>>>>>> startRow=0\x00, >> >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]]) >> >> >>>>>>> > >> >> >>>>>>> > The data return very quickly , the result of this sql is : >> >> >>>>>>> > +------+--------+ >> >> >>>>>>> > | k | p | >> >> >>>>>>> > +------+--------+ >> >> >>>>>>> > | pay | 12180 | >> >> >>>>>>> > +------+-------- >> >> >>>>>>> > >> >> >>>>>>> > But I have millons of data in the table. >> >> >>>>>>> > >> >> >>>>>>> > I tried to change the physical plan. 
if I change the json >> >> explain >> >> >>>>>>> > *"columns" >> >> >>>>>>> > : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ] *, it will >> return >> >> the >> >> >>>>>>> correct >> >> >>>>>>> > result. >> >> >>>>>>> > >> >> >>>>>>> > It seems the physical plan is not correct. >> >> >>>>>>> > I also try to debug the sql parser to find out the reason, >> but >> >> its >> >> >>>>>>> too >> >> >>>>>>> > complicate. Can anyone help me. >> >> >>>>>>> > >> >> >>>>>>> > Also this sql have the same issue. >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') >> as >> >> k, >> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by >> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9); >> >> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]* to >> >> *"columns" : >> >> >>>>>>> [ >> >> >>>>>>> > "`row_key`" ] *, it will return the correct result. >> >> >>>>>>> > >> >> >>>>>>> >> >> >>>>>> >> >> >>>>>> >> >> >>>>> >> >> >>>> >> >> >>> >> >> >> >> >> > >> >> >> > >> > >> > >
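The BYTE_SUBSTR(a.row_key, 1, 9) grouping key in the last query takes the first nine bytes of the rowkey, i.e. the salt byte plus the 8-byte day string, so the query counts rows per salt+day prefix. A plain-Java sketch of that slice; the helper is hypothetical and simply mirrors Drill's 1-based offset convention for BYTE_SUBSTR:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteSubstrSketch {
    // Mirrors BYTE_SUBSTR(key, offset, length) with a 1-based offset.
    static byte[] byteSubstr(byte[] key, int offset, int length) {
        return Arrays.copyOfRange(key, offset - 1, offset - 1 + length);
    }

    public static void main(String[] args) {
        byte[] key = "020160404pay.bijia.browser".getBytes(StandardCharsets.UTF_8);
        // First 9 bytes: salt '0' plus day "20160404"
        System.out.println(new String(byteSubstr(key, 1, 9), StandardCharsets.UTF_8)); // 020160404
    }
}
```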