[jira] [Resolved] (DRILL-3895) Can not join on int96 column coming from two different sources: hive and impala

Victoria Markman (JIRA) Mon, 05 Oct 2015 17:42:07 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Victoria Markman resolved DRILL-3895.
-------------------------------------
    Resolution: Invalid

> Can not join on int96 column coming from two different sources: hive and 
> impala
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-3895
>                 URL: https://issues.apache.org/jira/browse/DRILL-3895
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>
> I would think that cross source join on a column of int96 type should work in 
> drill, since we convert it to varbinary internally. It is very possible that 
> I'm hitting a bug related to varbinary data type ...
> {code}
> Hive generated parquet: dfs.`test/type_test`
> [Mon Oct 05 09:37:37] # ~/parquet-tools/parquet-schema 000000_0
> message hive_schema {
>   optional int32 num;
>   optional binary word (UTF8);
>   optional int96 dtg;
>   optional double dollar;
> }
> Implala generated parquet: dfs.`drill/testdata/subqueries/imp_t1`
> [Mon Oct 05 09:38:40 ] # ~/parquet-tools/parquet-schema 
> 243293260064ba0-808af32a4ab4e487_393209663_data.0.parq
> message schema {
>   optional binary c_varchar (UTF8);
>   optional int32 c_integer;
>   optional int64 c_bigint;
>   optional float c_float;
>   optional double c_double;
>   optional binary c_date (UTF8);
>   optional binary c_time (UTF8);
>   optional int96 c_timestamp;
>   optional boolean c_boolean;
>   optional double d9;
>   optional double d18;
>   optional double d28;
>   optional double d38;
> }
> 0: jdbc:drill:schema=dfs> select count(*) from `test/type_test`;
> +---------+
> | EXPR$0  |
> +---------+
> | 2       |
> +---------+
> 1 row selected (0.3 seconds)
> 0: jdbc:drill:schema=dfs> select count(*) from 
> `drill/testdata/subqueries/imp_t1`;
> +---------+
> | EXPR$0  |
> +---------+
> | 10000   |
> +---------+
> 1 row selected (0.259 seconds)
> {code}
> *Join 'AS IS' on int96 column results in an error*
> -- IN clause
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from 
> `drill/testdata/subqueries/imp_t1` where c_timestamp IN ( select c_timestamp 
> from `test/type_test`);
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
> between 1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, 
> Right type: INT. Add explicit casts to avoid this error
> Fragment 0:0
> [Error Id: 8bce65e1-0e8f-45fe-9990-eb980aeae53e on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> -- NOT IN clause
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from 
> `drill/testdata/subqueries/imp_t1` where c_timestamp NOT IN ( select 
> c_timestamp from `test/type_test`);
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
> between 1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, 
> Right type: INT. Add explicit casts to avoid this error
> Fragment 0:0
> [Error Id: 4307937f-fbc0-40c7-b2d4-8e4835e79ae8 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> -- JOIN
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from dfs.`test/type_test` a, 
> dfs.`drill/testdata/subqueries/imp_t1` b where a.c_timestamp = b.c_timestamp;
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
> between 1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, 
> Right type: INT. Add explicit casts to avoid this error
> Fragment 0:0
> [Error Id: e80225a3-eeb6-4c5b-bda1-a1f0d13d7edf on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> *Attempt to explicitly cast to varbinary type*(one of these queries should 
> have returned non zero row count)
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from 
> dfs.`drill/testdata/subqueries/imp_t1` where cast(c_timestamp as 
> varbinary(10)) NOT IN ( select cast(c_timestamp as varbinary(10)) from 
> dfs.`test/type_test`);
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (0.422 seconds)
> 0: jdbc:drill:schema=dfs> select count(*) from 
> dfs.`drill/testdata/subqueries/imp_t1` where cast(c_timestamp as 
> varbinary(10)) IN ( select cast(c_timestamp as varbinary(10)) from 
> dfs.`test/type_test`);
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> {code}
> *Use CONVERT_FROM function* (one of these queries should have returned non 
> zero row count)
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from 
> dfs.`drill/testdata/subqueries/imp_t1` where CONVERT_FROM(c_timestamp, 
> 'TIMESTAMP_IMPALA') IN ( select CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') 
> from dfs.`test/type_test`);
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (0.683 seconds)
> 0: jdbc:drill:schema=dfs> select count(*) from 
> dfs.`drill/testdata/subqueries/imp_t1` where CONVERT_FROM(c_timestamp, 
> 'TIMESTAMP_IMPALA') NOT IN ( select CONVERT_FROM(c_timestamp, 
> 'TIMESTAMP_IMPALA') from dfs.`test/type_test`);
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (0.858 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (DRILL-3895) Can not join on int96 column coming from two different sources: hive and impala

Reply via email to