[ https://issues.apache.org/jira/browse/DRILL-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victoria Markman closed DRILL-2431. ----------------------------------- > Document behavior of floating point types > ----------------------------------------- > > Key: DRILL-2431 > URL: https://issues.apache.org/jira/browse/DRILL-2431 > Project: Apache Drill > Issue Type: Bug > Components: Documentation > Affects Versions: 0.8.0 > Reporter: Victoria Markman > Assignee: Kristine Hahn > > Joining on columns of float and double data type produces confusing > result. Drill returns the same result as postgres. Part of me feels that we > should not follow postgres blindly in this case, removing implicit cast > between float and double would be better choice. > At a minimum we should have a section in our documentation that discusses > floating point types and this is a good example on how things can go wrong if > user does not understand the behavior. > Example of such a discussion in Postgres docs: > http://www.postgresql.org/docs/9.1/static/datatype-numeric.html > t1.csv > {code} > 997322.0399,997322.0399 > 982209.1438,982209.1438 > 997322,997322 > 982209,982209 > 963548,963548 > 959310,959310 > {code} > t2.csv > {code} > 997322.0399,997322.0399 > 982209.1438,982209.1438 > 997322,997322 > 982209,982209 > 963548,963548 > 959310,959310 > {code} > {code} > create table t1(c_float, c_double) as > select > case when columns[0] = '' then cast(null as float) else > cast(columns[0] as float) end, > case when columns[1] = '' then cast(null as double) else > cast(columns[1] as double) end > from `t1.csv`; > create table t2(c_float, c_double) as > select > case when columns[0] = '' then cast(null as float) else > cast(columns[0] as float) end, > case when columns[1] = '' then cast(null as double) else > cast(columns[1] as double) end > from `t2.csv`; > 0: jdbc:drill:schema=dfs> select * from t1; > +------------+------------+ > | c_float | c_double | > +------------+------------+ > | 997322.06 | 997322.0399 | > | 982209.1 | 982209.1438 | > | 997322.0 | 997322.0 | > | 982209.0 | 982209.0 | > | 963548.0 | 963548.0 | > | 959310.0 | 959310.0 | > +------------+------------+ > 6 rows selected (0.05 seconds) > 0: jdbc:drill:schema=dfs> select * from t2; > +------------+------------+ > | c_float | c_double | > +------------+------------+ > | 997322.06 | 997322.0399 | > | 982209.1 | 982209.1438 | > | 997322.0 | 997322.0 | > | 982209.0 | 982209.0 | > | 963548.0 | 963548.0 | > | 959310.0 | 959310.0 | > +------------+------------+ > 6 rows selected (0.044 seconds) > {code} > Implicit cast: looks incorrect, but in fact we can't expect this to work. > {code} > 0: jdbc:drill:schema=dfs> select * from t1, t2 where t1.c_float = t2.c_double; > +------------+------------+------------+------------+ > | c_float | c_double | c_float0 | c_double0 | > +------------+------------+------------+------------+ > | 959310.0 | 959310.0 | 959310.0 | 959310.0 | > | 963548.0 | 963548.0 | 963548.0 | 963548.0 | > | 982209.0 | 982209.0 | 982209.0 | 982209.0 | > | 997322.0 | 997322.0 | 997322.0 | 997322.0 | > +------------+------------+------------+------------+ > 4 rows selected (0.127 seconds) > {code} > Explicit cast: same > {code} > 0: jdbc:drill:schema=dfs> select * from t1, t2 where cast(t1.c_float as > double) = t2.c_double; > +------------+------------+------------+------------+ > | c_float | c_double | c_float0 | c_double0 | > +------------+------------+------------+------------+ > | 959310.0 | 959310.0 | 959310.0 | 959310.0 | > | 963548.0 | 963548.0 | 963548.0 | 963548.0 | > | 982209.0 | 982209.0 | 982209.0 | 982209.0 | > | 997322.0 | 997322.0 | 997322.0 | 997322.0 | > +------------+------------+------------+------------+ > 4 rows selected (0.136 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)