[ https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinfeng Ni resolved DRILL-4704. ------------------------------- Resolution: Fixed Fix Version/s: (was: Future) 1.8.0 Fixed in commit: aaf220ffdddaa329e38237d69b438f4e09d16a87 > select statement behavior is inconsistent for decimal values in parquet > ----------------------------------------------------------------------- > > Key: DRILL-4704 > URL: https://issues.apache.org/jira/browse/DRILL-4704 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill > Affects Versions: 1.6.0 > Environment: Windows 7 Pro, Java 1.8.0_91 > Reporter: Dave Oshinsky > Fix For: 1.8.0 > > > A select statement that searches a parquet file for a decimal value matching > a specific value behaves inconsistently. The query expressed most simply > finds nothing: > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id = 100; > +--------------+-------------+------------+--------+---------------+-----------+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIRE_DATE | > +--------------+-------------+------------+--------+---------------+-----------+ > +--------------+-------------+------------+--------+---------------+-----------+ > No rows selected (0.348 seconds) > The query can be modified to find the matching row in a few ways, such as the > following (using between instead of '=', changing 100 to 100.0, or casting as > decimal: > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id between 100 and 100; > +--------------+-------------+------------+--------+---------------+-----------+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIR | > +--------------+-------------+------------+--------+---------------+-----------+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--------------+-------------+------------+--------+---------------+-----------+ > 1 row selected (0.226 seconds) > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id = 100.0; > +--------------+-------------+------------+--------+---------------+-----------+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIR | > +--------------+-------------+------------+--------+---------------+-----------+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--------------+-------------+------------+--------+---------------+-----------+ > 1 row selected (0.259 seconds) > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > cast(employee_id AS DECIMAL) = 100; > +--------------+-------------+------------+--------+---------------+-----------+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIR | > +--------------+-------------+------------+--------+---------------+-----------+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--------------+-------------+------------+--------+---------------+-----------+ > 1 row selected (0.232 seconds) > 0: jdbc:drill:zk=local> > The schema of the parquet data that is being searched is as follows: > $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet > file: file:/c:/archiveHR/HR.EMPLOYEES/1.parquet > creator: parquet-mr version 1.8.1 (build > 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf) > ..... > file schema: HR.EMPLOYEES > -------------------------------------------------------------------------------- > EMPLOYEE_ID: REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0 > FIRST_NAME: OPTIONAL BINARY O:UTF8 R:0 D:1 > LAST_NAME: REQUIRED BINARY O:UTF8 R:0 D:0 > EMAIL: REQUIRED BINARY O:UTF8 R:0 D:0 > PHONE_NUMBER: OPTIONAL BINARY O:UTF8 R:0 D:1 > HIRE_DATE: REQUIRED BINARY O:UTF8 R:0 D:0 > JOB_ID: REQUIRED BINARY O:UTF8 R:0 D:0 > SALARY: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > MANAGER_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > DEPARTMENT_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > row group 1: RC:107 TS:9943 OFFSET:4 > -------------------------------------------------------------------------------- > EMPLOYEE_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 > VC:107 ENC:PLAIN,BIT_PACKED > FIRST_NAME: BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 > ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED > LAST_NAME: BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 > ENC:PLAIN,BIT_PACKED > EMAIL: BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 > ENC:PLAIN,BIT_PACKED > PHONE_NUMBER: BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 > ENC:PLAIN,RLE,BIT_PACKED > HIRE_DATE: BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 > ENC:PLAIN_DICTIONARY,BIT_PACKED > JOB_ID: BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 > ENC:PLAIN_DICTIONARY,BIT_PACKED > SALARY: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 > VC:107 ENC:PLAIN,RLE,BIT_PACKED > COMMISSION_PCT: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 > VC:107 ENC:PLAIN,RLE,BIT_PACKED > MANAGER_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 > VC:107 ENC:PLAIN,RLE,BIT_PACKED > DEPARTMENT_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 > VC:107 ENC:PLAIN,RLE,BIT_PACKED -- This message was sent by Atlassian JIRA (v6.3.4#6332)