[ 
https://issues.apache.org/jira/browse/HIVE-21002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919389#comment-16919389
 ] 

Piotr Findeisen commented on HIVE-21002:
----------------------------------------

[~klcopp] [~zi] This issue explicitly mentions only Avro and Parquet, but 
the same problem also applies to "RCBinary" ({{ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS 
RCFILE;}}).
Has this been addressed as well, or should I create a new issue?

> TIMESTAMP - Backwards incompatible change: Hive 3.1 reads back Avro and 
> Parquet timestamps written by Hive 2.x incorrectly
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21002
>                 URL: https://issues.apache.org/jira/browse/HIVE-21002
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.1.1
>            Reporter: Zoltan Ivanfi
>            Priority: Major
>
> Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x 
> incorrectly. As an example session to demonstrate this problem, create a 
> dataset using Hive version 2.x in America/Los_Angeles:
> {code:sql}
> hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
> hive> insert into ts_‹format› values ('2018-01-01 00:00:00.000');
> {code}
> Querying this table by issuing
> {code:sql}
> hive> select * from ts_‹format›;
> {code}
> from different time zones using different versions of Hive and different 
> storage formats gives the following results:
> |‹format›|Writer time zone (in Hive 2.x)|Reader time zone|Result in Hive 2.x reader|Result in Hive 3.1 reader|
> |Avro and Parquet|America/Los_Angeles|America/Los_Angeles|2018-01-01 *00*:00:00.0|2018-01-01 *08*:00:00.0|
> |Avro and Parquet|America/Los_Angeles|Europe/Paris|2018-01-01 *09*:00:00.0|2018-01-01 *08*:00:00.0|
> |Textfile and ORC|America/Los_Angeles|America/Los_Angeles|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> |Textfile and ORC|America/Los_Angeles|Europe/Paris|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> *Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored 
> in Avro and Parquet formats.* Apache ORC behaviour has not changed because it 
> was modified to adjust timestamps to retain backwards compatibility. Textfile 
> behaviour has not changed, because its processing involves parsing and 
> formatting instead of proper serializing and deserializing, so it 
> inherently had LocalDateTime semantics even in Hive 2.x.
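
The Avro and Parquet rows of the table in the quoted description can be reproduced with plain time zone arithmetic. The sketch below is not Hive code: it assumes that the Hive 2.x writer normalizes the session-local wall clock to UTC before storing it, that the Hive 2.x reader converts the stored instant back to the reader's zone, and that the Hive 3.1 reader shows the stored UTC wall clock unchanged. The function names are hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs system tz data

FMT = "%Y-%m-%d %H:%M:%S"


def stored_instant(wall_clock: str, writer_zone: str) -> datetime:
    """Assumed Hive 2.x write path: interpret the wall clock in the
    writer's zone and normalize it to UTC before storing."""
    local = datetime.strptime(wall_clock, FMT).replace(tzinfo=ZoneInfo(writer_zone))
    return local.astimezone(timezone.utc)


def read_with_instant_semantics(stored: datetime, reader_zone: str) -> str:
    """Assumed Hive 2.x read path: convert the stored instant to the
    reader's zone, so the displayed value depends on the reader."""
    return stored.astimezone(ZoneInfo(reader_zone)).strftime(FMT)


def read_with_local_semantics(stored: datetime) -> str:
    """Assumed Hive 3.1 read path: show the stored UTC wall clock as-is,
    so the displayed value is the same in every reader zone."""
    return stored.strftime(FMT)


stored = stored_instant("2018-01-01 00:00:00", "America/Los_Angeles")

for zone in ("America/Los_Angeles", "Europe/Paris"):
    print(zone,
          read_with_instant_semantics(stored, zone),
          read_with_local_semantics(stored))
```

Under these assumptions the instant-semantics reader yields 00:00:00 in America/Los_Angeles and 09:00:00 in Europe/Paris, while the local-semantics reader yields 08:00:00 in both, which matches the Avro and Parquet rows of the table.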



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
