Hi @Stamatis Zampetakis, @David,
Our current implementation using DateTimeFormatter is not backward compatible,
and this leads to migration issues.
One of our customers has the following use case, for which we don't have a
better migration option.
Hive 1.2/Spark 2.4 (Shared metastore):
Set VM time zone to Asia/Bangkok.
INSERT INTO parquet_table VALUES ("1400-01-01 00:00:00"); // Here, the Parquet
writer converts the value to UTC (-07:00:00) and stores it.
Migrate to Hive 3.x/Spark 3.x (Shared metastore):
Set VM time zone to Asia/Bangkok.
SELECT ts from parquet_table; // Hive returns a different value, whereas Spark
(spark.sql.legacy.timeParserPolicy=LEGACY) returns 1400-01-01 00:00:00
It is not easy to change thousands of Hive scripts to handle this difference,
and doing so adds to the migration cost.
I think it is necessary to enable backward compatibility for a smooth migration.
Please share your thoughts.
Thanks,
Sankar
From: Ashish Sharma <[email protected]>
Sent: 29 September 2021 19:11
To: [email protected]; [email protected]
Cc: [email protected]
Subject: [EXTERNAL] Raise exception instead of silent change for new
DateTimeformatter
History
Hive 1.2 -
VM time zone set to Asia/Bangkok
Query - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00
UTC','yyyy-MM-dd HH:mm:ss z'));
Result - 1800-01-01 07:00:00
Implementation details -
SimpleDateFormat formatter = new SimpleDateFormat(pattern);
Long unixtime = formatter.parse(textval).getTime() / 1000;
Date date = new Date(unixtime * 1000L);
https://docs.oracle.com/javase/8/docs/api/java/util/Date.html
The official documentation says "Unfortunately, the API for these functions was
not amenable to internationalization" and "The corresponding methods in Date
are deprecated". Because of this, the old implementation produces a wrong
result.
Latest Hive -
set hive.local.time.zone=Asia/Bangkok;
Query - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00
UTC','yyyy-MM-dd HH:mm:ss z'));
Result - 1800-01-01 06:42:04
Implementation details -
DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .appendPattern(pattern)
    .toFormatter();
ZonedDateTime zonedDateTime =
    ZonedDateTime.parse(textval, dtformatter).withZoneSameInstant(ZoneId.of(timezone));
Long dttime = zonedDateTime.toInstant().getEpochSecond();
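The two code paths quoted above can be put side by side outside Hive. The
following is only a minimal sketch: the class and method names
(TimeParserComparison, legacyResult, modernResult) are made up for this
illustration and are not Hive code, and Locale.US is pinned as an assumption so
that "UTC" parses the same way on any machine. The java.time path renders the
instant with the historical offset that the time-zone data records for
Asia/Bangkok in 1800, which is where the 06:42:04 in the thread comes from.
The legacy path is expected to give the 07:00:00 reported for Hive 1.2, but
that may depend on the JDK in use, so no output is claimed for it here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not part of Hive.
final class TimeParserComparison {
    private static final String OUTPUT_PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Hive 1.2 style: SimpleDateFormat for parsing, java.util.Date for output.
    static String legacyResult(String text, String pattern, String zone) {
        try {
            SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.US);
            long unixtime = parser.parse(text).getTime() / 1000;
            SimpleDateFormat printer = new SimpleDateFormat(OUTPUT_PATTERN, Locale.US);
            printer.setTimeZone(TimeZone.getTimeZone(zone));
            return printer.format(new Date(unixtime * 1000L));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    // Latest Hive style: DateTimeFormatter and ZonedDateTime.
    static String modernResult(String text, String pattern, String zone) {
        DateTimeFormatter parser = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern(pattern)
            .toFormatter(Locale.US);
        ZonedDateTime zoned =
            ZonedDateTime.parse(text, parser).withZoneSameInstant(ZoneId.of(zone));
        return zoned.format(DateTimeFormatter.ofPattern(OUTPUT_PATTERN, Locale.US));
    }

    public static void main(String[] args) {
        String text = "1800-01-01 00:00:00 UTC";
        String pattern = "yyyy-MM-dd HH:mm:ss z";
        System.out.println("legacy: " + legacyResult(text, pattern, "Asia/Bangkok"));
        System.out.println("modern: " + modernResult(text, pattern, "Asia/Bangkok"));
    }
}
```

Running main prints both renderings of the same instant, which makes it easy to
check whether a given JDK reproduces the difference described in this thread.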
Problem -
SimpleDateFormat has now been replaced with DateTimeFormatter, which is not
backward compatible. This sometimes causes issues when migrating to the new
version, because older data written using Hive 1.x or 2.x is not compatible
with DateTimeFormatter.
Solution -
Introduce a config "hive.legacy.timeParserPolicy" with the following values -
1. EXCEPTION - compare the results of SimpleDateFormat and DateTimeFormatter and
raise an exception if they don't match
2. LEGACY - use SimpleDateFormat
3. CORRECTED - use DateTimeFormatter
This will help Hive users in the following ways -
1. Migrate to the new version using LEGACY
2. Find values which are not compatible with the new version - EXCEPTION
3. Use the latest date APIs - CORRECTED
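A rough sketch of how the three policy values could be dispatched is given
below. Everything in it is an assumption made for illustration: the class
TimeParserPolicySketch, the Policy enum, and the method names are hypothetical,
no such code exists in Hive today, and reading the actual
"hive.legacy.timeParserPolicy" setting is left out. The sketch also compares
only the parsed epoch seconds; a real implementation would probably need to
compare the formatted output as well.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Hypothetical illustration of the proposed policy, not Hive code.
final class TimeParserPolicySketch {
    enum Policy { EXCEPTION, LEGACY, CORRECTED }

    // Old behaviour: SimpleDateFormat (hybrid Julian/Gregorian calendar).
    static long legacyEpochSeconds(String text, String pattern) {
        try {
            return new SimpleDateFormat(pattern, Locale.US).parse(text).getTime() / 1000;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    // New behaviour: DateTimeFormatter (proleptic Gregorian calendar).
    static long correctedEpochSeconds(String text, String pattern) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern(pattern)
            .toFormatter(Locale.US);
        return ZonedDateTime.parse(text, formatter).toInstant().getEpochSecond();
    }

    static long unixTimestamp(String text, String pattern, Policy policy) {
        switch (policy) {
            case LEGACY:
                return legacyEpochSeconds(text, pattern);
            case CORRECTED:
                return correctedEpochSeconds(text, pattern);
            default: // EXCEPTION: run both parsers and fail loudly on a mismatch
                long legacy = legacyEpochSeconds(text, pattern);
                long corrected = correctedEpochSeconds(text, pattern);
                if (legacy != corrected) {
                    throw new IllegalStateException("Parsers disagree for '" + text
                        + "': legacy=" + legacy + ", corrected=" + corrected);
                }
                return corrected;
        }
    }
}
```

With EXCEPTION, a recent value such as "2021-09-29 19:11:00 UTC" passes through
unchanged, while a value such as "1400-01-01 00:00:00 UTC" raises an error
because the two parsers place it on different calendars.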
Note: Apache Spark also faces the same issue
https://issues.apache.org/jira/browse/SPARK-30668
Hive JIRA -
https://issues.apache.org/jira/browse/HIVE-25576
Thanks
Ashish Sharma