Victoria Markman created DRILL-2602:
---------------------------------------

Summary: Throw an error on schema change during streaming aggregation
Key: DRILL-2602
URL: https://issues.apache.org/jira/browse/DRILL-2602
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Relational Operators
Reporter: Victoria Markman
Assignee: Chris Westin

We don't recognize a schema change during streaming aggregation when a column is a mix of required and optional types. Hash aggregation does throw the correct error message.

I have a table 'mix' where:
{code}
[Fri Mar 27 09:46:07 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] # ls -ltr
total 753
-rwxr-xr-x 1 root root 759879 Mar 27 09:41 optional.parquet
-rwxr-xr-x 1 root root 9867 Mar 27 09:41 required.parquet

[Fri Mar 27 09:46:09 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] # ~/parquet-tools-1.5.1-SNAPSHOT/parquet-schema optional.parquet
message root {
  optional binary c_varchar (UTF8);
  optional int32 c_integer;
  optional int64 c_bigint;
  optional float c_float;
  optional double c_double;
  optional int32 c_date (DATE);
  optional int32 c_time (TIME);
  optional int64 c_timestamp (TIMESTAMP);
  optional boolean c_boolean;
  optional double d9;
  optional double d18;
  optional double d28;
  optional double d38;
}

[Fri Mar 27 09:46:41 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] # ~/parquet-tools-1.5.1-SNAPSHOT/parquet-schema required.parquet
message root {
  required binary c_varchar (UTF8);
  required int32 c_integer;
  required int64 c_bigint;
  required float c_float;
  required double c_double;
  required int32 c_date (DATE);
  required int32 c_time (TIME);
  required int64 c_timestamp (TIMESTAMP);
  required boolean c_boolean;
  required double d9;
  required double d18;
  required double d28;
  required double d38;
}
{code}

Nice error message on hash aggregation:
{code}
0: jdbc:drill:schema=dfs> select count(*) from mix group by c_integer;
+------------+
| EXPR$0 |
+------------+
Query failed: Query stopped., Hash aggregate does not support schema changes [ 2bc255ce-c7f9-47bf-80b0-a5c87cfa67be on atsqa4-134.qa.lab:31010 ]

java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
        at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
        at sqlline.SqlLine.print(SqlLine.java:1809)
        at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
        at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
        at sqlline.SqlLine.dispatch(SqlLine.java:889)
        at sqlline.SqlLine.begin(SqlLine.java:763)
        at sqlline.SqlLine.start(SqlLine.java:498)
        at sqlline.SqlLine.main(SqlLine.java:460)
{code}

On streaming aggregation, the exception is hard for the end user to understand:
{code}
0: jdbc:drill:schema=dfs> alter session set `planner.enable_hashagg` = false;
+------------+------------+
| ok | summary |
+------------+------------+
| true | planner.enable_hashagg updated. |
+------------+------------+
1 row selected (0.067 seconds)
0: jdbc:drill:schema=dfs> select count(*) from mix group by c_integer;
+------------+
| EXPR$0 |
+------------+
Query failed: RemoteRpcException: Failure while running fragment., Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.IntVector but was holding vector class org.apache.drill.exec.vector.NullableIntVector. [ 5610e589-38e0-4dc5-a560-649516180ba4 on atsqa4-134.qa.lab:31010 ]
[ 5610e589-38e0-4dc5-a560-649516180ba4 on atsqa4-134.qa.lab:31010 ]

java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
        at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
        at sqlline.SqlLine.print(SqlLine.java:1809)
        at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
        at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
        at sqlline.SqlLine.dispatch(SqlLine.java:889)
        at sqlline.SqlLine.begin(SqlLine.java:763)
        at sqlline.SqlLine.start(SqlLine.java:498)
        at sqlline.SqlLine.main(SqlLine.java:460)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
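
For illustration only, the behavior requested above could be sketched roughly as follows: the operator remembers the schema of the first batch and fails with a readable message as soon as a later batch differs, including a change from required to optional. Every name here (SchemaChangeCheck, SchemaGuard, ColumnType, Mode, SchemaChangeException) is hypothetical and is not Drill's actual API; the message wording simply mirrors the hash aggregate error quoted in this report.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model. None of these types are Drill's real classes.
class SchemaChangeCheck {

    /** Column mode, as in the Parquet schemas above: required or optional. */
    enum Mode { REQUIRED, OPTIONAL }

    /** A column's type together with its mode, e.g. INT(REQUIRED) vs INT(OPTIONAL). */
    static final class ColumnType {
        final String type;
        final Mode mode;

        ColumnType(String type, Mode mode) {
            this.type = type;
            this.mode = mode;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ColumnType)) {
                return false;
            }
            ColumnType other = (ColumnType) o;
            return type.equals(other.type) && mode == other.mode;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, mode);
        }

        @Override
        public String toString() {
            return type + "(" + mode + ")";
        }
    }

    /** Raised when a batch arrives with a schema different from the first one. */
    static final class SchemaChangeException extends RuntimeException {
        SchemaChangeException(String message) {
            super(message);
        }
    }

    /** Records the first batch's schema and rejects any later batch that differs. */
    static final class SchemaGuard {
        private final String operatorName;
        private Map<String, ColumnType> expected;

        SchemaGuard(String operatorName) {
            this.operatorName = operatorName;
        }

        void accept(Map<String, ColumnType> incoming) {
            if (expected == null) {
                // First batch: remember its schema.
                expected = new LinkedHashMap<>(incoming);
                return;
            }
            if (!expected.equals(incoming)) {
                throw new SchemaChangeException(operatorName
                        + " does not support schema changes: expected " + expected
                        + " but got " + incoming);
            }
        }
    }

    /** Builds a one-column schema; enough for the c_integer case in this report. */
    static Map<String, ColumnType> schema(String column, String type, Mode mode) {
        Map<String, ColumnType> s = new LinkedHashMap<>();
        s.put(column, new ColumnType(type, mode));
        return s;
    }

    public static void main(String[] args) {
        SchemaGuard guard = new SchemaGuard("Streaming aggregate");
        // Batch modeled on required.parquet.
        guard.accept(schema("c_integer", "INT", Mode.REQUIRED));
        try {
            // Batch modeled on optional.parquet: same type, different mode.
            guard.accept(schema("c_integer", "INT", Mode.OPTIONAL));
        } catch (SchemaChangeException e) {
            System.out.println("Query failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the mode (required vs optional) is compared as part of the schema, so the mismatch is reported up front rather than surfacing later as a vector class mismatch.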