Re: proposal for expanded & consistent timestamp types

2019-01-08 Thread Zoltan Ivanfi
Hi,
> ORC has long had a timestamp format. If extra attributes are needed on a
> timestamp, as long as the default "no metadata" value isn't changed, then at
> the file level things should be OK.
>
> more problematic is: what would happen to an existing app reading in
> timestamps and ignoring

Re: proposal for expanded & consistent timestamp types

2019-01-02 Thread Steve Loughran
OK, I've seen the document now. Probably the best summary of timestamps out there I've ever seen. Irrespective of what historical stuff has done, the goal should be "make everything consistent enough that cut and paste SQL queries over the same data works" and "you shouldn't have to care about

Re: proposal for expanded & consistent timestamp types

2019-01-02 Thread Steve Loughran
On 17 Dec 2018, at 17:44, Zoltan Ivanfi <z...@cloudera.com.INVALID> wrote: Hi, On Sun, Dec 16, 2018 at 4:43 AM Wenchen Fan <cloud0...@gmail.com> wrote: Shall we include Parquet and ORC? If they don't support it, it's hard for general query engines like Spark to support it. Fo

Re: proposal for expanded & consistent timestamp types

2018-12-17 Thread Zoltan Ivanfi
Hi,
On Sun, Dec 16, 2018 at 4:43 AM Wenchen Fan wrote:
> Shall we include Parquet and ORC? If they don't support it, it's hard for
> general query engines like Spark to support it.
For each of the more explicit timestamp types we propose a single semantics regardless of the file format. Query

Re: proposal for expanded & consistent timestamp types

2018-12-15 Thread Wenchen Fan
I like this proposal.
> We'll get agreement across Spark, Hive, and Impala.
Shall we include Parquet and ORC? If they don't support it, it's hard for general query engines like Spark to support it.
On Wed, Dec 12, 2018 at 3:36 AM Li Jin wrote:
> Of course. I added some comments in the doc.
>
>

Re: proposal for expanded & consistent timestamp types

2018-12-11 Thread Li Jin
Of course. I added some comments in the doc.
On Tue, Dec 11, 2018 at 12:01 PM Imran Rashid wrote:
> Hi Li,
>
> thanks for the comments! I admit I had not thought very much about Python
> support, it's a good point. But I'd actually like to clarify one thing
> about the doc -- though it discusse

Re: proposal for expanded & consistent timestamp types

2018-12-11 Thread Imran Rashid
Hi Li, thanks for the comments! I admit I had not thought very much about Python support, it's a good point. But I'd actually like to clarify one thing about the doc -- though it discusses Java types, the point is actually about having support for these logical types at the SQL level. The doc us

Re: proposal for expanded & consistent timestamp types

2018-12-07 Thread Maxim Gekk
Hello Imran, Thank you for bringing this problem up. I faced the issue of handling timestamps and dates when I implemented date/timestamp parsing in the CSV/JSON datasource: https://github.com/apache/spark/pull/23150 https://github.com/apache/spark/pull/23196 Maxim Gekk Technical Solutions L

Re: proposal for expanded & consistent timestamp types

2018-12-07 Thread Li Jin
Imran, Thanks for sharing this. When working on interop between Spark and Pandas/Arrow in the past, we also faced some issues due to the different definitions of timestamp in Spark and Pandas/Arrow, because Spark timestamp has Instant semantics and Pandas/Arrow timestamp has either LocalDateTime o
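To make the Instant vs. LocalDateTime distinction mentioned above concrete, here is a minimal Python sketch that is not taken from the thread. It uses only the standard library; the `render` helper and the fixed UTC offsets standing in for session time zones are illustrative assumptions, not Spark or Pandas/Arrow APIs.

```python
from datetime import datetime, timedelta, timezone

# Instant semantics: the value identifies one point on the UTC timeline.
instant = datetime(2018, 12, 7, 12, 0, tzinfo=timezone.utc)

# LocalDateTime semantics: wall-clock fields only, no zone attached.
local = datetime(2018, 12, 7, 12, 0)


def render(ts, session_offset):
    """Return the text a reader with the given UTC offset would see.

    `session_offset` is a hypothetical stand-in for a session time zone.
    """
    if ts.tzinfo is None:
        # Zone-less value: the displayed fields never change.
        return ts.strftime("%Y-%m-%d %H:%M")
    # Zone-aware value: same instant, shifted to the reader's offset.
    return ts.astimezone(timezone(session_offset)).strftime("%Y-%m-%d %H:%M")


for hours in (-8, 1):
    offset = timedelta(hours=hours)
    print(hours, render(instant, offset), render(local, offset))
```

Under these assumptions the instant is shown with different wall-clock fields in each session, while the local value is shown identically everywhere. That difference is the kind of mismatch that can surface when the same data is exchanged between systems with different timestamp definitions.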

proposal for expanded & consistent timestamp types

2018-12-06 Thread Imran Rashid
Hi, I'd like to discuss the future of timestamp support in Spark, in particular with respect to handling timezones in different SQL types. In a nutshell:
* There are at least 3 different ways of handling the timestamp type across timezone changes
* We'd like Spark to clearly distinguish the 3 t