Victoria Markman created DRILL-3895: ---------------------------------------
Summary: Can not join on int96 column coming from two different sources: hive and impala Key: DRILL-3895 URL: https://issues.apache.org/jira/browse/DRILL-3895 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 1.2.0 Reporter: Victoria Markman I would think that cross source join on a column of int96 type should work in drill, since we convert it to varbinary internally. It is very possible that I'm hitting a bug related to varbinary data type ... {code} Hive generated parquet: dfs.`test/type_test` [Mon Oct 05 09:37:37] # ~/parquet-tools/parquet-schema 000000_0 message hive_schema { optional int32 num; optional binary word (UTF8); optional int96 dtg; optional double dollar; } Implala generated parquet: dfs.`drill/testdata/subqueries/imp_t1` [Mon Oct 05 09:38:40 ] # ~/parquet-tools/parquet-schema 243293260064ba0-808af32a4ab4e487_393209663_data.0.parq message schema { optional binary c_varchar (UTF8); optional int32 c_integer; optional int64 c_bigint; optional float c_float; optional double c_double; optional binary c_date (UTF8); optional binary c_time (UTF8); optional int96 c_timestamp; optional boolean c_boolean; optional double d9; optional double d18; optional double d28; optional double d38; } 0: jdbc:drill:schema=dfs> select count(*) from `test/type_test`; +---------+ | EXPR$0 | +---------+ | 2 | +---------+ 1 row selected (0.3 seconds) 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1`; +---------+ | EXPR$0 | +---------+ | 10000 | +---------+ 1 row selected (0.259 seconds) {code} *Join 'AS IS' on int96 column results in an error* -- IN clause {code} 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1` where c_timestamp IN ( select c_timestamp from `test/type_test`); Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type: INT. Add explicit casts to avoid this error Fragment 0:0 [Error Id: 8bce65e1-0e8f-45fe-9990-eb980aeae53e on atsqa4-133.qa.lab:31010] (state=,code=0) {code} -- NOT IN clause {code} 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1` where c_timestamp NOT IN ( select c_timestamp from `test/type_test`); Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type: INT. Add explicit casts to avoid this error Fragment 0:0 [Error Id: 4307937f-fbc0-40c7-b2d4-8e4835e79ae8 on atsqa4-133.qa.lab:31010] (state=,code=0) {code} -- JOIN {code} 0: jdbc:drill:schema=dfs> select count(*) from dfs.`test/type_test` a, dfs.`drill/testdata/subqueries/imp_t1` b where a.c_timestamp = b.c_timestamp; Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type: INT. Add explicit casts to avoid this error Fragment 0:0 [Error Id: e80225a3-eeb6-4c5b-bda1-a1f0d13d7edf on atsqa4-133.qa.lab:31010] (state=,code=0) {code} *Attempt to explicitly cast to varbinary type*(one of these queries should have returned non zero row count) {code} 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1` where cast(c_timestamp as varbinary(10)) NOT IN ( select cast(c_timestamp as varbinary(10)) from dfs.`test/type_test`); +---------+ | EXPR$0 | +---------+ | 0 | +---------+ 1 row selected (0.422 seconds) 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1` where cast(c_timestamp as varbinary(10)) IN ( select cast(c_timestamp as varbinary(10)) from dfs.`test/type_test`); +---------+ | EXPR$0 | +---------+ | 0 | +---------+ {code} *Use CONVERT_FROM function* (one of these queries should have returned non zero row count) {code} 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1` where CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') IN ( select CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') from dfs.`test/type_test`); +---------+ | EXPR$0 | +---------+ | 0 | +---------+ 1 row selected (0.683 seconds) 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1` where CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') NOT IN ( select CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') from dfs.`test/type_test`); +---------+ | EXPR$0 | +---------+ | 0 | +---------+ 1 row selected (0.858 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)