Hi,
I would like to hear the community's opinion about the short-term solution
I described in my previous mail. Do you agree with it or are there any
objections? If it sounds good to everyone, I will prepare a more detailed
design doc describing the suggested behavior.
Thanks,
Zoltan
On Tue, J
Hi Michael,
To answer this I think we should distinguish between the long-term fix and
the short-term fix.
If I understand the replies correctly, everyone agrees that the desired
long-term fix is to have two separate SQL types (TIMESTAMP [WITH|WITHOUT]
TIME ZONE). Because of having separate types,
Hi Zoltan,
I don't fully understand your proposal for table-specific timestamp type
semantics. I think it will be helpful to everyone in this conversation if you
can identify the expected behavior for a few concrete scenarios.
Suppose we have a Hive metastore table hivelogs with a column named
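A minimal sketch of the kind of scenario being asked about, assuming a
timestamp column named ts (the column name and session setup are illustrative
assumptions; only the table name hivelogs comes from the mail):

    import org.apache.spark.sql.SparkSession

    // Hive-enabled session; the open question in this thread is which
    // wall-clock value Spark should show for a timestamp that Hive wrote.
    val spark = SparkSession.builder()
      .appName("timestamp-semantics-scenario")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SELECT ts FROM hivelogs LIMIT 1").show(truncate = false)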
Hi,
We would like to solve the problem of interoperability of existing data,
and that is the main use case for having table-level control. Spark should
be able to read timestamps written by Impala or Hive and at the same time
read back its own data. These have different semantics, so having a sing
Yea I don't see why this needs to be a per-table config. If the user wants to
configure it per table, can't they just declare the data type on a per-table
basis, once we have separate types for timestamp w/ tz and w/o tz?
On Thu, Jun 1, 2017 at 4:14 PM, Michael Allman wrote:
> I would suggest that
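A sketch of what that per-table declaration could look like once the two
standard types exist; the WITH/WITHOUT TIME ZONE column syntax below is
hypothetical, since Spark SQL had no such types at the time of this thread
(spark is a SparkSession, as in the shell):

    // Hypothetical DDL, assuming separate SQL-standard timestamp types land.
    spark.sql("""
      CREATE TABLE impala_events (
        id BIGINT,
        event_time TIMESTAMP WITHOUT TIME ZONE  -- wall-clock semantics
      ) STORED AS PARQUET
    """)

    spark.sql("""
      CREATE TABLE spark_events (
        id BIGINT,
        event_time TIMESTAMP WITH TIME ZONE     -- instant semantics
      ) STORED AS PARQUET
    """)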
I would suggest that making timestamp type behavior configurable and persisted
per-table could introduce some real confusion, e.g. in queries involving tables
with different timestamp type semantics.
I suggest starting with the assumption that timestamp type behavior is a
per-session flag that
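To illustrate the per-session alternative: spark.sql.session.timeZone is a
real Spark setting, while a session-level semantics switch like the second
key below did not exist and is invented here purely for illustration:

    // Real setting: the time zone used when rendering and parsing timestamps.
    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

    // Hypothetical key sketching the per-session idea; this is NOT an
    // actual Spark configuration option.
    spark.conf.set("spark.sql.timestampSemantics", "wallclock") // or "instant"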
Hi,
If I remember correctly, the TIMESTAMP type had UTC-normalized local time
semantics even before Spark 2, so I can understand that Spark considers it
to be the "established" behavior that must not be broken. Unfortunately,
this behavior does not provide interoperability with other SQL engines o
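To make the "UTC-normalized local time" semantics concrete: the stored value
is one instant on the UTC timeline, and its wall-clock rendering shifts with
the zone, which is what breaks interoperability with engines that preserve
the wall-clock value. A minimal sketch with plain java.time:

    import java.time.{Instant, ZoneId}

    // One stored instant...
    val instant = Instant.parse("2017-06-01T12:00:00Z")

    // ...renders as different wall-clock values depending on the zone:
    println(instant.atZone(ZoneId.of("UTC")))                 // 2017-06-01T12:00Z[UTC]
    println(instant.atZone(ZoneId.of("America/Los_Angeles"))) // 2017-06-01T05:00-07:00[America/Los_Angeles]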
I had asked Zoltan to bring this discussion to the dev list because I think
it's a question that extends beyond a single jira (we can't figure out the
semantics of timestamp in parquet if we don't know the overall goal of the
timestamp type) and since it's a design question the entire community shou
That's just my point 4, isn't it?
On Fri, May 26, 2017 at 1:07 AM, Ofir Manor wrote:
> Reynold,
> my point is that Spark should aim to follow the SQL standard instead of
> rolling its own type system.
> If I understand correctly, the existing implementation is similar to
> TIMESTAMP WITH LOCAL
Reynold,
my point is that Spark should aim to follow the SQL standard instead of
rolling its own type system.
If I understand correctly, the existing implementation is similar to
TIMESTAMP WITH LOCAL TIME ZONE data type in Oracle.
In addition, there are the standard TIMESTAMP and TIMESTAMP WITH TIM
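If it helps to map the type families mentioned here onto familiar
representations, a rough (assumed) correspondence in java.time terms:

    import java.time.{Instant, LocalDateTime}

    // TIMESTAMP (without time zone): a wall-clock value, no zone involved;
    // LocalDateTime models this.
    val wallClock = LocalDateTime.parse("2017-06-01T12:00:00")

    // TIMESTAMP WITH LOCAL TIME ZONE (Oracle): stored normalized, rendered
    // in the session zone; an Instant models the stored value.
    val instant = Instant.parse("2017-06-01T12:00:00Z")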
Hi,
Ofir, thanks for your support. My understanding is that many users have the
same problem as you do.
Reynold, thanks for your reply and sorry for the confusion. My personal
e-mail was specifically about your concerns regarding SPARK-12297 and I
started this separate thread because this is abou
Zoltan,
Thanks for raising this again, although I'm a bit confused since I've
communicated with you a few times on JIRA and in private emails to explain
that you have some misunderstanding of the timestamp type in Spark and some
of your statements are wrong (e.g. the except text file part). Not su
Hi Zoltan,
thanks for bringing this up, this is really important to me!
Personally, as a user developing apps on top of Spark and other tools, the
current timestamp semantics has been a source of some pain - needing to
undo Spark's "auto-correcting" of timestamps.
It would be really great if we cou