Matt McCline created HIVE-8197:
----------------------------------

             Summary: Tez and Vectorization Insert into ORC Table with timestamp column erroneously repeats the last row's column value
                 Key: HIVE-8197
                 URL: https://issues.apache.org/jira/browse/HIVE-8197
             Project: Hive
          Issue Type: Bug
         Environment: Tez and Vectorization.
            Reporter: Matt McCline
            Assignee: Matt McCline
            Priority: Critical

In diagnosing why only(?) a Tez and vectorized query with min and max aggregates was always returning the column value of the last row read, I discovered that the problem was in creating the test table:

{code}
CREATE TABLE alltypesorc_string STORED AS ORC AS
SELECT
  ctinyint AS ctinyint,
  to_utc_timestamp(ctimestamp1, 'America/Los_Angeles') AS ctimestamp1,
  CAST(to_utc_timestamp(ctimestamp1, 'America/Los_Angeles') AS STRING) AS stimestamp1
FROM alltypesorc
WHERE ctinyint > 0
LIMIT 40;
{code}

I think it is related to what Prasanth mentioned as a possibility: saving a Timestamp as a Writable object that then gets overwritten. One suspect is the Writable[] records array in the processOp method of VectorFileSinkOperator. Or perhaps it is in VectorReduceSinkOperator.
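To illustrate the suspected failure mode, here is a minimal, self-contained sketch of how buffering a reference to a reused mutable object makes every buffered row show the last value written. It is not Hive code: MutableTimestamp is a hypothetical stand-in for a Writable-style timestamp holder, and bufferByReference and bufferByCopy are made-up names for this example only. Whether VectorFileSinkOperator or VectorReduceSinkOperator actually does this is still unconfirmed.

```java
import java.util.ArrayList;
import java.util.List;

class WritableReuseDemo {

    // Hypothetical stand-in for a mutable, Writable-style timestamp holder.
    static final class MutableTimestamp {
        private long millis;

        MutableTimestamp() {}

        // Copy constructor: takes a snapshot of the other object's current value.
        MutableTimestamp(MutableTimestamp other) {
            this.millis = other.millis;
        }

        void set(long millis) { this.millis = millis; }

        long get() { return millis; }
    }

    // Suspected buggy pattern: one object is reused for every row, and the
    // buffer stores a reference to it instead of a copy.
    static List<Long> bufferByReference(long[] input) {
        MutableTimestamp reused = new MutableTimestamp();
        List<MutableTimestamp> buffered = new ArrayList<>();
        for (long value : input) {
            reused.set(value);     // overwrites the value seen by all earlier entries
            buffered.add(reused);  // same object added again
        }
        return values(buffered);
    }

    // Safe pattern: copy the value before buffering it.
    static List<Long> bufferByCopy(long[] input) {
        MutableTimestamp reused = new MutableTimestamp();
        List<MutableTimestamp> buffered = new ArrayList<>();
        for (long value : input) {
            reused.set(value);
            buffered.add(new MutableTimestamp(reused)); // independent snapshot
        }
        return values(buffered);
    }

    // Reads the buffered objects only after all rows have been processed.
    private static List<Long> values(List<MutableTimestamp> rows) {
        List<Long> out = new ArrayList<>();
        for (MutableTimestamp row : rows) {
            out.add(row.get());
        }
        return out;
    }

    public static void main(String[] args) {
        long[] input = {1000L, 2000L, 3000L};
        System.out.println("by reference: " + bufferByReference(input)); // [3000, 3000, 3000]
        System.out.println("by copy:      " + bufferByCopy(input));      // [1000, 2000, 3000]
    }
}
```

If the sink operator holds references like this between rows, the fix would presumably be to copy the timestamp value before storing it, as bufferByCopy does.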