> I'm getting an error in Hive when executing a query on a table in ORC
>format.

This is not an ORC bug; it looks like a vectorization issue.

Can you try comparing both query plans ("explain <query>") for the
"Execution mode: vectorized" markers?
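
A rough sketch of that comparison, assuming a text-format copy of the table
exists (the name students_text is hypothetical; only students_orc appears in
the report):

```sql
-- Plan against the ORC table; look for "Execution mode: vectorized"
-- under the map/reduce stages in the output.
EXPLAIN
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa)
FROM students_orc
GROUP BY CONCAT(TO_DATE(datetime), '-');

-- Same query against a TextFile table (students_text is an assumed name).
-- The marker should be absent here if the query is not vectorized.
EXPLAIN
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa)
FROM students_text
GROUP BY CONCAT(TO_DATE(datetime), '-');
```

If only the ORC plan carries the marker, that would point at the vectorized
code path rather than the file format.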

TextFile queries are not vectorized today, since you cannot tell whether
any column should be marked as isRepeating=true in a row-major format.

> SELECT CONCAT(TO_DATE(datetime), '-'),   SUM(gpa)  FROM students_orc
>GROUP BY CONCAT(TO_DATE(datetime), '-');

...
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported
>vector output type: StringGroup
>        at 
>org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorCol
>umnSetInfo.java:139)
>        at 
>org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKey
>WrapperBatch(VectorHashKeyWrapperBatch.java:521)
>        at 
>org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(V
>ectorGroupByOperator.java:786)

The correct fix would be to handle this query pattern for vectorization
(or automatically disable vectorization, like it has to do for Unions).
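
Until such a fix exists, a possible stopgap is to turn vectorization off by
hand for the session before running the query. This is only a sketch and has
not been verified against the Hive version in question:

```sql
-- Disable vectorized execution for this session only.
SET hive.vectorized.execution.enabled=false;

-- Re-run the failing query; it should now take the row-mode GROUP BY path.
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa)
FROM students_orc
GROUP BY CONCAT(TO_DATE(datetime), '-');
```

This trades the vectorization speedup for a query that completes, so it is
worth re-enabling the setting afterwards for other queries.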

Can you log a bug on Apache JIRA against the version of Hive that threw
this error?

Cheers,
Gopal
