Hello,

I am trying to understand the reasons behind this decision by Impala devs.

From the Impala docs:
http://impala.apache.org/docs/build/html/topics/impala_timestamp.html

By default, Impala stores and interprets TIMESTAMP values in UTC time zone
when writing to data files, reading from data files, or converting to and
from system time values through functions.

And there are two switches to change this behavior:

use_local_tz_for_unix_timestamp_conversions
convert_legacy_hive_parquet_utc_timestamps (a performance killer that has
just been fixed in the latest Impala release, which has not made it to CDH yet)

My question is: what was the thought process, and what were the reasons, for
doing this conversion from UTC in the first place and having Impala "assume"
that a timestamp is always UTC?

This is not how Hive, Spark, or anything else I've seen before does it.
This is really unusual and causes tons of confusion if you try to use the
same data set from Hive, Spark, and Impala, i.e. whenever Impala is not the
only engine on the cluster.

And second, why is there no option to NOT convert the time at all and just
use the value as it was stored? If I stored 2015-01-01 12:12:00, whatever
time zone that is, I still want to see that exact time in Impala, Hive, and
Spark; I do not need Impala converting it to my local cluster time.
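To make the discrepancy concrete, here is a small sketch outside of Impala (not Impala code; the UTC-8 reader zone is a hypothetical example) showing how one stored instant renders as two different wall-clock times depending on whether the reader interprets it as UTC or converts it to a local zone:

```python
from datetime import datetime, timezone, timedelta

# Suppose the writer stored "2015-01-01 12:12:00" and the value ends up
# in the data file as this UTC instant.
stored_utc = datetime(2015, 1, 1, 12, 12, 0, tzinfo=timezone.utc)

# A reader that treats the value as UTC shows the wall-clock time unchanged.
as_utc = stored_utc.strftime("%Y-%m-%d %H:%M:%S")

# A reader that converts to a local zone, e.g. UTC-8 (hypothetical cluster
# zone), shows a different wall-clock time for the same stored bytes.
local_zone = timezone(timedelta(hours=-8))
as_local = stored_utc.astimezone(local_zone).strftime("%Y-%m-%d %H:%M:%S")

print(as_utc)    # 2015-01-01 12:12:00
print(as_local)  # 2015-01-01 04:12:00
```

Same data, two different answers depending on which engine reads it; that is exactly the confusion I am describing.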

I am sure there is a reason for this; I am just struggling to understand it.

Thanks,
Boris
