[ https://issues.apache.org/jira/browse/DRILL-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victoria Markman resolved DRILL-3895. ------------------------------------- Resolution: Invalid > Can not join on int96 column coming from two different sources: hive and > impala > ------------------------------------------------------------------------------- > > Key: DRILL-3895 > URL: https://issues.apache.org/jira/browse/DRILL-3895 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types > Affects Versions: 1.2.0 > Reporter: Victoria Markman > > I would think that cross source join on a column of int96 type should work in > drill, since we convert it to varbinary internally. It is very possible that > I'm hitting a bug related to varbinary data type ... > {code} > Hive generated parquet: dfs.`test/type_test` > [Mon Oct 05 09:37:37] # ~/parquet-tools/parquet-schema 000000_0 > message hive_schema { > optional int32 num; > optional binary word (UTF8); > optional int96 dtg; > optional double dollar; > } > Implala generated parquet: dfs.`drill/testdata/subqueries/imp_t1` > [Mon Oct 05 09:38:40 ] # ~/parquet-tools/parquet-schema > 243293260064ba0-808af32a4ab4e487_393209663_data.0.parq > message schema { > optional binary c_varchar (UTF8); > optional int32 c_integer; > optional int64 c_bigint; > optional float c_float; > optional double c_double; > optional binary c_date (UTF8); > optional binary c_time (UTF8); > optional int96 c_timestamp; > optional boolean c_boolean; > optional double d9; > optional double d18; > optional double d28; > optional double d38; > } > 0: jdbc:drill:schema=dfs> select count(*) from `test/type_test`; > +---------+ > | EXPR$0 | > +---------+ > | 2 | > +---------+ > 1 row selected (0.3 seconds) > 0: jdbc:drill:schema=dfs> select count(*) from > `drill/testdata/subqueries/imp_t1`; > +---------+ > | EXPR$0 | > +---------+ > | 10000 | > +---------+ > 1 row selected (0.259 seconds) > {code} > *Join 'AS IS' on int96 column results in an error* > -- IN clause > {code} > 0: jdbc:drill:schema=dfs> select count(*) from > `drill/testdata/subqueries/imp_t1` where c_timestamp IN ( select c_timestamp > from `test/type_test`); > Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts > between 1. Numeric data > 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, > Right type: INT. Add explicit casts to avoid this error > Fragment 0:0 > [Error Id: 8bce65e1-0e8f-45fe-9990-eb980aeae53e on atsqa4-133.qa.lab:31010] > (state=,code=0) > {code} > -- NOT IN clause > {code} > 0: jdbc:drill:schema=dfs> select count(*) from > `drill/testdata/subqueries/imp_t1` where c_timestamp NOT IN ( select > c_timestamp from `test/type_test`); > Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts > between 1. Numeric data > 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, > Right type: INT. Add explicit casts to avoid this error > Fragment 0:0 > [Error Id: 4307937f-fbc0-40c7-b2d4-8e4835e79ae8 on atsqa4-133.qa.lab:31010] > (state=,code=0) > {code} > -- JOIN > {code} > 0: jdbc:drill:schema=dfs> select count(*) from dfs.`test/type_test` a, > dfs.`drill/testdata/subqueries/imp_t1` b where a.c_timestamp = b.c_timestamp; > Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts > between 1. Numeric data > 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, > Right type: INT. Add explicit casts to avoid this error > Fragment 0:0 > [Error Id: e80225a3-eeb6-4c5b-bda1-a1f0d13d7edf on atsqa4-133.qa.lab:31010] > (state=,code=0) > {code} > *Attempt to explicitly cast to varbinary type*(one of these queries should > have returned non zero row count) > {code} > 0: jdbc:drill:schema=dfs> select count(*) from > dfs.`drill/testdata/subqueries/imp_t1` where cast(c_timestamp as > varbinary(10)) NOT IN ( select cast(c_timestamp as varbinary(10)) from > dfs.`test/type_test`); > +---------+ > | EXPR$0 | > +---------+ > | 0 | > +---------+ > 1 row selected (0.422 seconds) > 0: jdbc:drill:schema=dfs> select count(*) from > dfs.`drill/testdata/subqueries/imp_t1` where cast(c_timestamp as > varbinary(10)) IN ( select cast(c_timestamp as varbinary(10)) from > dfs.`test/type_test`); > +---------+ > | EXPR$0 | > +---------+ > | 0 | > +---------+ > {code} > *Use CONVERT_FROM function* (one of these queries should have returned non > zero row count) > {code} > 0: jdbc:drill:schema=dfs> select count(*) from > dfs.`drill/testdata/subqueries/imp_t1` where CONVERT_FROM(c_timestamp, > 'TIMESTAMP_IMPALA') IN ( select CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') > from dfs.`test/type_test`); > +---------+ > | EXPR$0 | > +---------+ > | 0 | > +---------+ > 1 row selected (0.683 seconds) > 0: jdbc:drill:schema=dfs> select count(*) from > dfs.`drill/testdata/subqueries/imp_t1` where CONVERT_FROM(c_timestamp, > 'TIMESTAMP_IMPALA') NOT IN ( select CONVERT_FROM(c_timestamp, > 'TIMESTAMP_IMPALA') from dfs.`test/type_test`); > +---------+ > | EXPR$0 | > +---------+ > | 0 | > +---------+ > 1 row selected (0.858 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)