Khurram , I send the mail again, the last mail forget to cc to user@drill.apache.org
The main process is the same, but my rowkey is more complicate, Here is the detail I tested. rowkey is like this : [salt 1byte string] + [day 8byte string] + [event] + [uid long] + [ts long] also I have other qualifiers, only qualifier v:v is integer, the others are string. example: hbase(main):004:0> scan 'browser_action2', { LIMIT => 1} ROW COLUMN+CELL 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e0, timestamp=1461839343076, value=pay 1$\xD2\x00 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e1, timestamp=1461839343076, value=bijia 1$\xD2\x00 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e2, timestamp=1461839343076, value=browser 1$\xD2\x00 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e3, timestamp=1461839343076, value=* 1$\xD2\x00 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e4, timestamp=1461839343076, value=* 1$\xD2\x00 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:e5, timestamp=1461839343076, value=* 1$\xD2\x00 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:h, timestamp=1459771200000, value=20 1$\xD2\x00 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:m, timestamp=1459771200000, value=0 1$\xD2\x00 020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17 1$\xD2\x00 1 row(s) in 0.0410 seconds Here is the example I how the issue look like: hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW=> '0'} ........ 920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\ column=v:e0, timestamp=1463723029448, value=visit x01T\x00\x0A\xFA\x00 920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\ column=v:e0, timestamp=1463723029217, value=visit x01T\x00\x0A\xFA\x00 920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\ column=v:e0, timestamp=1463723029295, value=visit x01T\x00\x0A\xFA\x00 9994 row(s) in 123.8650 seconds the drill result: 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`; +--------+-------+ | k | p | +--------+-------+ | visit | 1216 | +--------+-------+ I find out that if the row size larger than 10000 will have the issue. The result is right if less that 1000 rows. But not always that way. What I can make sure is if I updated the columns in the physical plan and query by web UI , the result will be correct. Thanks 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>: > Qiang, can you please take a look at DRILL-4686 and confirm if the data > set used in my repro is the same as the one you have used. If the data set > is different please let us know the type of data that you have used in your > table. > > Aman - I will try to repro the problem on Drill 1.6.0 and share results. > > Thanks, > Khurram > > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansi...@apache.org> wrote: > >> Khurram, DRILL-4686 seems like a different issue...it is reporting an >> error whereas the original problem from qiang was an incorrect result. >> Can >> you use the same version (1.6) that he was using. Also, is the data set >> similar ? If you are unable to repro the exact same issue, perhaps qiang >> should file a JIRA with a smaller repro if possible. >> >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfar...@maprtech.com> >> wrote: >> >> > Hello Qiang, >> > >> > DRILL-4686 is reported to track this problem. >> > >> > Thanks, >> > Khurram >> > >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqi...@gmail.com> wrote: >> > >> >> Ok, Thanks very much. >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>: >> >> >> >>> Hello Qiang, >> >>> >> >>> Someone from our Drill team (in San Jose) will get back to you soon. I >> >>> work from the India lab and I am in a different time zone as compared >> to >> >>> San Jose office, some one from MapR San Jose will get back to you as >> soon >> >>> as possible. >> >>> >> >>> Thanks, >> >>> Khurram >> >>> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqi...@gmail.com> >> wrote: >> >>> >> >>>> Hi Khurram, Thanks very much to reproduce it, so what's the >> >>>> conclusion? >> >>>> >> >>>> Any idea how to sovle it? >> >>>> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfar...@maprtech.com>: >> >>>> >> >>>>> So I tried to create the table using HBase API (with no data >> inserted >> >>>>> into table) and I got the query plan for drill 1.7.0 >> >>>>> Drill 1.7.0-SNAPSHOT commit ID : 09b26277 >> >>>>> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2; >> >>>>> +--------------+------------+--------------+ >> >>>>> | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | >> >>>>> +--------------+------------+--------------+ >> >>>>> | row_key | ANY | NO | >> >>>>> | v | MAP | NO | >> >>>>> +--------------+------------+--------------+ >> >>>>> 2 rows selected (1.665 seconds) >> >>>>> >> >>>>> Table creation Java program >> >>>>> >> >>>>> {noformat} >> >>>>> public class PutIntDataToHBase { >> >>>>> public static void main(String args[]) throws IOException { >> >>>>> Configuration conf = HBaseConfiguration.create(); >> >>>>> conf.set("hbase.zookeeper.property.clientPort","5181"); >> >>>>> HBaseAdmin admin = new HBaseAdmin(conf); >> >>>>> if (admin.tableExists("browser_action2")) { >> >>>>> admin.disableTable("browser_action2"); >> >>>>> admin.deleteTable("browser_action2"); >> >>>>> } >> >>>>> >> >>>>> byte[][] SPLIT_KEYS = >> >>>>> {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}}; >> >>>>> HTableDescriptor tableDesc = new >> >>>>> HTableDescriptor(TableName.valueOf("browser_action2")); >> >>>>> >> >>>>> tableDesc.addFamily(new HColumnDescriptor("v")); >> >>>>> admin.createTable(tableDesc,SPLIT_KEYS); >> >>>>> >> >>>>> } >> >>>>> } >> >>>>> {noformat} >> >>>>> >> >>>>> Query plan for the query that was reported as returning wrong >> results. >> >>>>> >> >>>>> {noformat} >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from >> >>>>> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`; >> >>>>> +------+------+ >> >>>>> | text | json | >> >>>>> +------+------+ >> >>>>> | 00-00 Screen >> >>>>> 00-01 Project(k=[$0], p=[$1]) >> >>>>> 00-02 UnionExchange >> >>>>> 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1]) >> >>>>> 01-02 HashAgg(group=[{0}], p=[$SUM0($1)]) >> >>>>> 01-03 Project($f0=[$0], p=[$1]) >> >>>>> 01-04 HashToRandomExchange(dist0=[[$0]]) >> >>>>> 02-01 UnorderedMuxExchange >> >>>>> 03-01 Project($f0=[$0], p=[$1], >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) >> >>>>> 03-02 HashAgg(group=[{0}], p=[COUNT($0)]) >> >>>>> 03-03 Project($f0=[ITEM($1, 'e0')]) >> >>>>> 03-04 Scan(groupscan=[HBaseGroupScan >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, >> startRow=0\x00, >> >>>>> stopRow=, filter=null], columns=[`*`]]]) >> >>>>> {noformat} >> >>>>> >> >>>>> and the query plan for the other problem query mentioned in the >> first >> >>>>> email. >> >>>>> >> >>>>> {noformat} >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k, >> >>>>> . . . . . . . . . . . . . . > count(a.row_key) p from >> >>>>> hbase.browser_action2 a group by >> >>>>> . . . . . . . . . . . . . . > BYTE_SUBSTR(a.row_key, 1 , 9); >> >>>>> +------+------+ >> >>>>> | text | json | >> >>>>> +------+------+ >> >>>>> | 00-00 Screen >> >>>>> 00-01 Project(k=[$0], p=[$1]) >> >>>>> 00-02 UnionExchange >> >>>>> 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1]) >> >>>>> 01-02 HashAgg(group=[{0}], p=[$SUM0($1)]) >> >>>>> 01-03 Project($f0=[$0], p=[$1]) >> >>>>> 01-04 HashToRandomExchange(dist0=[[$0]]) >> >>>>> 02-01 UnorderedMuxExchange >> >>>>> 03-01 Project($f0=[$0], p=[$1], >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) >> >>>>> 03-02 HashAgg(group=[{0}], p=[COUNT($1)]) >> >>>>> 03-03 Project($f0=[BYTE_SUBSTR($0, 1, 9)], >> >>>>> row_key=[$0]) >> >>>>> 03-04 Scan(groupscan=[HBaseGroupScan >> >>>>> [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, >> startRow=null, >> >>>>> stopRow=null, filter=null], columns=[`*`]]]) >> >>>>> {noformat} >> >>>>> >> >>>>> Thanks, >> >>>>> Khurram >> >>>>> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqi...@gmail.com> >> >>>>> wrote: >> >>>>> >> >>>>>> Yes. >> >>>>>> I use hbase API to create it. >> >>>>>> >> >>>>>> The main code is: >> >>>>>> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, >> {'6'}, {'7'},{'8'}, {'9'} }; >> >>>>>> TableName tableName = TableName.valueOf("browser_action2"); >> >>>>>> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName); >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v"); >> >>>>>> tableDesc.addFamily(columnDesc); >> >>>>>> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY); >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF); >> >>>>>> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS); >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zf...@maprtech.com>: >> >>>>>> >> >>>>>>> Can you provide the CREATE TABLE statement you used to reproduce >> this >> >>>>>>> problem so we can try to reproduce it on our end. >> >>>>>>> >> >>>>>>> Thanks. >> >>>>>>> >> >>>>>>> -- Zelaine >> >>>>>>> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqi...@gmail.com> >> >>>>>>> wrote: >> >>>>>>> >> >>>>>>> > Hi , >> >>>>>>> > >> >>>>>>> > I recently meet a issue that can not query the correct data from >> >>>>>>> hbase with >> >>>>>>> > sql by drill, can anybody help me. >> >>>>>>> > >> >>>>>>> > I test with the drill 1.6. >> >>>>>>> > My hbase scheme: >> >>>>>>> > rowkey: salt+day+event+uid + ts , eg: 120160411visituidts >> >>>>>>> > cf : v >> >>>>>>> > qualifier: v, e0, e1 >> >>>>>>> > >> >>>>>>> > The wrong result only happened when I use group by clause. >> >>>>>>> > >> >>>>>>> > This sql will not return correct result: >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) >> p >> >>>>>>> from >> >>>>>>> > hbase.browser_action2 a where a.row_key > '0' group by >> a.`v`.`e0`; >> >>>>>>> > Part of explain of this sql is: >> >>>>>>> > >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select >> >>>>>>> CONVERT_FROM(a.`v`.`e0`, >> >>>>>>> > 'UTF8') as k, count(a.`v`.`e0`) p from hbase.browser_action2 a >> >>>>>>> where >> >>>>>>> > a.row_key > '0' group by a.`v`.`e0`; >> >>>>>>> > +------+------+ >> >>>>>>> > | text | json | >> >>>>>>> > +------+------+ >> >>>>>>> > | 00-00 Screen >> >>>>>>> > 00-01 Project(k=[$0], p=[$1]) >> >>>>>>> > 00-02 UnionExchange >> >>>>>>> > 01-01 Project(k=[CONVERT_FROMUTF8($0)], p=[$1]) >> >>>>>>> > 01-02 HashAgg(group=[{0}], p=[$SUM0($1)]) >> >>>>>>> > 01-03 Project($f0=[$0], p=[$1]) >> >>>>>>> > 01-04 HashToRandomExchange(dist0=[[$0]]) >> >>>>>>> > 02-01 UnorderedMuxExchange >> >>>>>>> > 03-01 Project($f0=[$0], p=[$1], >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) >> >>>>>>> > 03-02 HashAgg(group=[{0}], p=[COUNT($0)]) >> >>>>>>> > 03-03 Project($f0=[ITEM($1, 'e0')]) >> >>>>>>> > 03-04 Scan(groupscan=[HBaseGroupScan >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, >> >>>>>>> startRow=0\x00, >> >>>>>>> > stopRow=, filter=null], columns=[`*`]]]) >> >>>>>>> > >> >>>>>>> > The data return very quickly , the result of this sql is : >> >>>>>>> > +------+--------+ >> >>>>>>> > | k | p | >> >>>>>>> > +------+--------+ >> >>>>>>> > | pay | 12180 | >> >>>>>>> > +------+-------- >> >>>>>>> > >> >>>>>>> > But I have millons of data in the table. >> >>>>>>> > >> >>>>>>> > I tried to change the physical plan. if I change the json >> explain >> >>>>>>> > *"columns" >> >>>>>>> > : [ "`*`" ]* to *"columns" : [ "`v`.`e0`" ] *, it will return >> the >> >>>>>>> correct >> >>>>>>> > result. >> >>>>>>> > >> >>>>>>> > It seems the physical plan is not correct. >> >>>>>>> > I also try to debug the sql parser to find out the reason, but >> its >> >>>>>>> too >> >>>>>>> > complicate. Can anyone help me. >> >>>>>>> > >> >>>>>>> > Also this sql have the same issue. >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as >> k, >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9); >> >>>>>>> > I change the json explain *"columns" : [ "`*`" ]* to >> *"columns" : >> >>>>>>> [ >> >>>>>>> > "`row_key`" ] *, it will return the correct result. >> >>>>>>> > >> >>>>>>> >> >>>>>> >> >>>>>> >> >>>>> >> >>>> >> >>> >> >> >> > >> > >