Matt McCline created HIVE-8197:
----------------------------------

             Summary: Tez and Vectorization Insert into ORC Table with timestamp column erroneously repeats the last row's column value
                 Key: HIVE-8197
                 URL: https://issues.apache.org/jira/browse/HIVE-8197
             Project: Hive
          Issue Type: Bug
         Environment: Tez and Vectorization.
            Reporter: Matt McCline
            Assignee: Matt McCline
            Priority: Critical


While diagnosing why a Tez and vectorized query (possibly only that 
combination) with min and max aggregates always returned the column value of 
the last row read, I discovered that the problem occurs when creating the 
test table:

{code}
CREATE TABLE alltypesorc_string STORED AS ORC AS SELECT
  ctinyint as ctinyint,
  to_utc_timestamp(ctimestamp1, 'America/Los_Angeles') as ctimestamp1,
  CAST(to_utc_timestamp(ctimestamp1, 'America/Los_Angeles') AS STRING) as stimestamp1
FROM alltypesorc WHERE ctinyint > 0
LIMIT 40;
{code}

I think it is related to what Prasanth mentioned as a possibility: saving a 
Timestamp as a Writable object that later gets overwritten.  One suspect is 
the Writable[] records array in the ProcessOp method of 
VectorFileSinkOperator.  Or perhaps it is in VectorReduceSinkOperator.
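
To illustrate the suspected failure mode, here is a minimal standalone sketch. It is not Hive code: MutableTimestamp is a hypothetical stand-in for a mutable Writable, and the records array only mimics the buffering pattern suspected above. If a single reused object is stored by reference for every row, all buffered slots end up showing the last row's value; copying the value per row avoids that.

```java
import java.util.Arrays;

public class WritableReuseSketch {

    // Hypothetical stand-in for a mutable Writable holding a timestamp.
    static final class MutableTimestamp {
        private long millis;

        void set(long millis) { this.millis = millis; }

        long get() { return millis; }

        MutableTimestamp copy() {
            MutableTimestamp c = new MutableTimestamp();
            c.set(millis);
            return c;
        }
    }

    // Suspected buggy pattern: every slot references the same reused object.
    static long[] bufferByReference(long[] rows) {
        MutableTimestamp reused = new MutableTimestamp();
        MutableTimestamp[] records = new MutableTimestamp[rows.length];
        for (int i = 0; i < rows.length; i++) {
            reused.set(rows[i]);   // overwrites the value seen by earlier slots
            records[i] = reused;   // stores a reference, not a snapshot
        }
        return read(records);
    }

    // Safe pattern: snapshot the value before the object is reused.
    static long[] bufferByCopy(long[] rows) {
        MutableTimestamp reused = new MutableTimestamp();
        MutableTimestamp[] records = new MutableTimestamp[rows.length];
        for (int i = 0; i < rows.length; i++) {
            reused.set(rows[i]);
            records[i] = reused.copy();
        }
        return read(records);
    }

    // Reads the buffered values back out, as a writer would at flush time.
    private static long[] read(MutableTimestamp[] records) {
        long[] out = new long[records.length];
        for (int i = 0; i < records.length; i++) {
            out[i] = records[i].get();
        }
        return out;
    }

    public static void main(String[] args) {
        long[] rows = {100L, 200L, 300L};
        System.out.println(Arrays.toString(bufferByReference(rows))); // [300, 300, 300]
        System.out.println(Arrays.toString(bufferByCopy(rows)));      // [100, 200, 300]
    }
}
```

The first output line matches the symptom in the summary (the last row's value repeated for every row). Whether the actual defect sits in VectorFileSinkOperator or VectorReduceSinkOperator remains to be confirmed.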



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
