[ https://issues.apache.org/jira/browse/DRILL-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
benj updated DRILL-7595:
------------------------
Description:

As in DRILL-7104, there is a bug that changes the type from BIGINT to INT when a Parquet table is written in multiple fragments.

With a file containing few rows, all is fine: we store a BIGINT and really have a BIGINT in the Parquet:
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 AS BIGINT) AS d FROM dfs.tmp.`fewrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_0      | 1500                      |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| BIGINT |
+--------+
{code}

With a file containing "enough" rows, there is a problem: we store a BIGINT but unfortunately get an INT in the Parquet:
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 AS BIGINT) AS d FROM dfs.tmp.`manyrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_1      | 934111                    |
| 1_0      | 1488743                   |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| INT    |
+--------+
{code}

It's not really satisfactory, but note that there is a trick to avoid this problem: use CAST('0' AS BIGINT) instead of CAST(0 AS BIGINT):
{code:sql}
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST('0' AS BIGINT) AS d FROM dfs.tmp.`manyrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_1      | 934111                    |
| 1_0      | 1488743                   |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| BIGINT |
+--------+
{code}

> Change of data type from bigint to int when parquet with multiple fragment
> --------------------------------------------------------------------------
>
>                 Key: DRILL-7595
>                 URL: https://issues.apache.org/jira/browse/DRILL-7595
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.17.0
>            Reporter: benj
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
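The string-literal workaround is consistent with one possible (unverified) mechanism: a CAST over a bare integer literal can be constant-folded at plan time, letting the literal's natural INT type leak through, while a string literal forces a genuine runtime cast whose BIGINT target survives. A toy Python model of that hypothesis follows; `planned_type` and its logic are purely illustrative assumptions, not Drill source code:

```python
# Toy model of the hypothesised type resolution; NOT Drill's actual code.
def planned_type(expr: str) -> str:
    """Resolve the output type of a CAST(<literal> AS <type>) expression.

    Hypothesis: CAST over a bare integer literal gets constant-folded, so
    the literal's natural type (INT) leaks through; a quoted string literal
    needs a real runtime cast, so the declared target type is kept.
    """
    inner, _, target = expr.partition(" AS ")
    literal = inner[len("CAST("):]        # text between "CAST(" and " AS"
    target = target.rstrip(")")           # text between "AS " and ")"
    if literal.startswith("'"):           # string literal: runtime cast wins
        return target
    return "INT"                          # folded int literal: INT leaks out

print(planned_type("CAST(0 AS BIGINT)"))    # INT    -> the reported bug
print(planned_type("CAST('0' AS BIGINT)"))  # BIGINT -> the workaround
```

This does not explain why only multi-fragment writes are affected, so it is at best a partial model of the behaviour reported above.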