Victoria Markman created DRILL-2602:
---------------------------------------

             Summary: Throw an error on schema change during streaming aggregation
                 Key: DRILL-2602
                 URL: https://issues.apache.org/jira/browse/DRILL-2602
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Relational Operators
            Reporter: Victoria Markman
            Assignee: Chris Westin


We don't recognize a schema change during streaming aggregation when a column 
is a mix of required and optional types.
Hash aggregation does throw the correct error message.

I have a table 'mix' where:

{code}
[Fri Mar 27 09:46:07 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] 
# ls -ltr
total 753
-rwxr-xr-x 1 root root 759879 Mar 27 09:41 optional.parquet
-rwxr-xr-x 1 root root   9867 Mar 27 09:41 required.parquet

[Fri Mar 27 09:46:09 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] 
# ~/parquet-tools-1.5.1-SNAPSHOT/parquet-schema optional.parquet
message root {
  optional binary c_varchar (UTF8);
  optional int32 c_integer;
  optional int64 c_bigint;
  optional float c_float;
  optional double c_double;
  optional int32 c_date (DATE);
  optional int32 c_time (TIME);
  optional int64 c_timestamp (TIMESTAMP);
  optional boolean c_boolean;
  optional double d9;
  optional double d18;
  optional double d28;
  optional double d38;
}

[Fri Mar 27 09:46:41 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] 
# ~/parquet-tools-1.5.1-SNAPSHOT/parquet-schema required.parquet
message root {
  required binary c_varchar (UTF8);
  required int32 c_integer;
  required int64 c_bigint;
  required float c_float;
  required double c_double;
  required int32 c_date (DATE);
  required int32 c_time (TIME);
  required int64 c_timestamp (TIMESTAMP);
  required boolean c_boolean;
  required double d9;
  required double d18;
  required double d28;
  required double d38;
}
{code}
Hash aggregation fails with a clear error message:
{code}
0: jdbc:drill:schema=dfs> select count(*) from mix group by c_integer;
+------------+
|   EXPR$0   |
+------------+
Query failed: Query stopped., Hash aggregate does not support schema changes [ 
2bc255ce-c7f9-47bf-80b0-a5c87cfa67be on atsqa4-134.qa.lab:31010 ]
java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
query.
        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
        at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
        at sqlline.SqlLine.print(SqlLine.java:1809)
        at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
        at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
        at sqlline.SqlLine.dispatch(SqlLine.java:889)
        at sqlline.SqlLine.begin(SqlLine.java:763)
        at sqlline.SqlLine.start(SqlLine.java:498)
        at sqlline.SqlLine.main(SqlLine.java:460)
{code}

Streaming aggregation fails with an exception that is hard for the end user to understand:
{code}
0: jdbc:drill:schema=dfs> alter session set `planner.enable_hashagg` = false;
+------------+------------+
|     ok     |  summary   |
+------------+------------+
| true       | planner.enable_hashagg updated. |
+------------+------------+
1 row selected (0.067 seconds)

0: jdbc:drill:schema=dfs> select count(*) from mix group by c_integer;
+------------+
|   EXPR$0   |
+------------+
Query failed: RemoteRpcException: Failure while running fragment., Failure 
while reading vector.  Expected vector class of 
org.apache.drill.exec.vector.IntVector but was holding vector class 
org.apache.drill.exec.vector.NullableIntVector. [ 
5610e589-38e0-4dc5-a560-649516180ba4 on atsqa4-134.qa.lab:31010 ]
[ 5610e589-38e0-4dc5-a560-649516180ba4 on atsqa4-134.qa.lab:31010 ]
java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
query.
        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
        at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
        at sqlline.SqlLine.print(SqlLine.java:1809)
        at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
        at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
        at sqlline.SqlLine.dispatch(SqlLine.java:889)
        at sqlline.SqlLine.begin(SqlLine.java:763)
        at sqlline.SqlLine.start(SqlLine.java:498)
        at sqlline.SqlLine.main(SqlLine.java:460)
{code}
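
For illustration only, the requested behavior could look roughly like the sketch below: compare the schema of each incoming batch with the schema the operator was set up for, and fail with a readable message instead of a vector class mismatch. This is plain Python, not Drill code. The names Field, SchemaChangeError and check_schema are hypothetical, and the message wording is only modeled on the hash aggregation error shown above.

```python
from dataclasses import dataclass


class SchemaChangeError(Exception):
    """Hypothetical error raised when a later batch has a different schema."""


@dataclass(frozen=True)
class Field:
    # Hypothetical column descriptor; not a Drill class.
    name: str
    type: str  # e.g. "int32"
    mode: str  # "required" or "optional"


def check_schema(operator, expected, incoming):
    """Raise a readable error on the first difference between two schemas."""
    if len(expected) != len(incoming):
        raise SchemaChangeError(
            f"{operator} does not support schema changes: "
            f"column count changed from {len(expected)} to {len(incoming)}"
        )
    for old, new in zip(expected, incoming):
        if old != new:
            raise SchemaChangeError(
                f"{operator} does not support schema changes: "
                f"column '{old.name}' was {old.mode} {old.type}, "
                f"now {new.mode} {new.type}"
            )


# Same column as in required.parquet and optional.parquet above.
required = [Field("c_integer", "int32", "required")]
optional = [Field("c_integer", "int32", "optional")]

try:
    check_schema("Streaming aggregate", required, optional)
    message = None
except SchemaChangeError as err:
    message = str(err)

print(message)
```

With a check of this kind, the streaming aggregation case would report the required/optional mismatch on c_integer directly, which is what an end user needs to see.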




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
