Re: Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-08-13 Micah Kornfield
OK, I think I have completed the initial changes for the new interval type in https://github.com/apache/arrow/pull/10177. The code changes still need to be reviewed, but I don't think that should stop a vote. I'll start a vote on Monday unless there are more comments on the format changes. Thanks.

Re: Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-08-11 Micah Kornfield
As an update, I've gotten basic integration testing working in Java and C++, along with the format proposal updates [1]. I have a little bit more work to do on the initial implementations (making CI happy, adding unit tests in Java), but I think this is getting close to the point that we can vote on it.

Re: Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-06 Wes McKinney
Ah, that makes sense to wait then.
On Thu, May 6, 2021 at 10:55 AM Micah Kornfield wrote:
> I'll address the feedback. I think in the past we've waited for implementations in Java and C++ with integration tests before formally voting. If there is no more feedback I can start looking at

Re: Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-06 Micah Kornfield
I'll address the feedback. I think in the past we've waited for implementations in Java and C++ with integration tests before formally voting. If there is no more feedback, I can start looking at implementations (happy to have help).
On Thursday, May 6, 2021, Wes McKinney wrote:
> The PR looks g

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-06 Wes McKinney
The PR looks good. I just left some comments about typos. I would say it's probably about time to call a vote. Is there anywhere else we should be soliciting feedback?
On Mon, May 3, 2021 at 2:17 PM Jacek Pliszka wrote:
> Good idea, I've created a JIRA issue:
> https://issues.apache.org/jira/brows

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-03 Jacek Pliszka
Good idea, I've created a JIRA issue: https://issues.apache.org/jira/browse/ARROW-12637 and named it "range" to avoid confusion with intervals... though the confusion will remain, since it is called an interval in pandas and in logic (Allen's interval algebra).
BR, Jacek
On Mon, 3 May 2021 at 18:05, Micah Kornfield

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-03 Micah Kornfield
Hi Jacek, this seems like reasonable functionality. I think this probably comes in two parts: 1. This might be a good candidate for a "Well Known"/officially supported extension type. I can think of a few different representations, but I would guess something like Struct[start: T, end: T] wit

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-03 Jacek Pliszka
Sorry, my mistake. You are right: I meant anchored intervals as in pandas (ones with a defined start and end), and I think many future users will make the same mistake. I would love to be able to do fast overlap joins at the Arrow level.
Best Regards, Jacek
On Sun, 2 May 2021 at 23:06, Wes McKinn
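For the overlap joins Jacek mentions, one common technique is a sweep over ranges sorted by their start. The Python sketch below is illustrative only: it assumes non-empty half-open [start, end) integer ranges with an explicit start and end, and `overlap_join` is a hypothetical helper, not an Arrow or pandas API.

```python
import heapq
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end), assumed non-empty


def overlap_join(left: List[Range], right: List[Range]) -> List[Tuple[int, int]]:
    """Return sorted (i, j) pairs where left[i] and right[j] overlap.

    Sweeps all ranges in start order, keeping the still-open ranges of each
    side in a min-heap keyed by end.
    """
    events = sorted(
        [(s, e, 0, i) for i, (s, e) in enumerate(left)]
        + [(s, e, 1, j) for j, (s, e) in enumerate(right)]
    )
    active = ([], [])  # per side: heap of (end, index)
    pairs = []
    for start, end, side, idx in events:
        other = active[1 - side]
        # Ranges on the other side that ended at or before this start can
        # never overlap this or any later range, so drop them.
        while other and other[0][0] <= start:
            heapq.heappop(other)
        # Everything left started no later than us and ends after our start.
        for _, other_idx in other:
            pairs.append((idx, other_idx) if side == 0 else (other_idx, idx))
        heapq.heappush(active[side], (end, idx))
    return sorted(pairs)
```

For example, `overlap_join([(1, 5), (10, 12)], [(4, 6), (6, 11), (20, 30)])` pairs each left range only with the right ranges it actually intersects. The cost is dominated by the initial sort plus the size of the output, rather than by comparing every pair.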

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-02 Wes McKinney
I also don't understand the comment about closed / open / semi-open intervals. Perhaps there is some confusion, since "interval" as we mean it here is called a "time delta" in some other projects. An interval here does not refer to a time span with a distinct start and end point (I understand this mig

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-02 Micah Kornfield
Hi Jacek, I'm not sure I fully understand the proposal; could you elaborate with more examples/details? For instance, DAY_TIME isn't just a UINT64; it actually contains 2 separate fields (days and milliseconds). In terms of closed vs. half-open, in my limited understanding, that is more a concern o
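Micah's point that DAY_TIME is not a plain UINT64 can be checked with Python's `struct` module. This sketch assumes the layout is two contiguous little-endian int32 values with days first; reinterpreting those 8 bytes as one unsigned integer makes "1 millisecond" compare greater than "1 day", so comparisons and arithmetic have to work on the two fields separately.

```python
import struct

# Assumed DAY_TIME layout: days (int32) then milliseconds (int32), little-endian.
DAY_TIME = struct.Struct("<ii")
AS_UINT64 = struct.Struct("<Q")


def day_time_as_uint64(days: int, millis: int) -> int:
    """Reinterpret a packed DAY_TIME value as one unsigned 64-bit integer."""
    return AS_UINT64.unpack(DAY_TIME.pack(days, millis))[0]


one_day = day_time_as_uint64(1, 0)    # days land in the low 32 bits
one_milli = day_time_as_uint64(0, 1)  # milliseconds land in the high 32 bits

# Treated as a scalar, 1 ms looks larger than 1 day.
print(one_milli > one_day)  # True
```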

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-02 Jacek Pliszka
Hi! I wonder if it would be possible to have a generic interval with integers of a specified size, just to have a common base for interval arithmetic. Then users can convert their periods to ordinals and use the arithmetic (joining, deoverlapping, common parts, explosion, etc.). So YEAR_MONTH and DAY_TIME wou
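A minimal sketch of the ordinal-based arithmetic Jacek describes, assuming half-open [start, end) ranges over integers. The helper names are hypothetical, and nothing here is part of Arrow.

```python
from typing import List, Optional, Tuple

# Hypothetical anchored range over integer ordinals (e.g. day numbers).
Range = Tuple[int, int]  # half-open [start, end)


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Common part of two ranges, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def deoverlap(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or touching ranges into a sorted, disjoint list."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def explode(r: Range) -> List[int]:
    """Expand a range into the individual ordinals it covers."""
    return list(range(r[0], r[1]))
```

For example, `deoverlap([(5, 8), (1, 3), (2, 4)])` gives `[(1, 4), (5, 8)]`, and `intersect((1, 4), (3, 6))` gives `(3, 4)`. With half-open ranges, adjacent ranges such as [1, 3) and [3, 5) share no ordinal; closed ranges would need adjusted comparisons.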

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-05-01 Micah Kornfield
I believe I've addressed all outstanding feedback on the PR. Are there other thoughts on this that should be discussed before we move forward towards an implementation / voting plan?
On Tue, Apr 27, 2021 at 6:00 PM Wes McKinney wrote:
> Thanks, Micah. I commented in the PR. Once we've settled on

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-27 Wes McKinney
Thanks, Micah. I commented in the PR. Once we've settled on the details, we can come up with an implementation / vote plan.
On Tue, Apr 27, 2021 at 1:12 PM Micah Kornfield wrote:
> To nudge this along I opened up https://github.com/apache/arrow/pull/10177
> Comments welcome.
> On Sun, Apr 11, 2

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-27 Micah Kornfield
To nudge this along I opened up https://github.com/apache/arrow/pull/10177. Comments welcome.
On Sun, Apr 11, 2021 at 9:38 PM Micah Kornfield wrote:
> If there are no more comments on this maybe we should update the original RFC PR and ensure we are OK with it in principle (Dmitry do you want t

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-11 Micah Kornfield
If there are no more comments on this, maybe we should update the original RFC PR and ensure we are OK with it in principle (Dmitry, do you want to do this, or should we start a new PR?). I can try to work on the C++/Python and Java code in the next few weeks.
On Sun, Apr 4, 2021 at 1:35 PM Micah Ko

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-04 Micah Kornfield
> Looking more at the Postgres spec and storage details, I'd be supportive of having a COMPLEX interval type which could be a packed type (possibly using the same 16-byte storage layout as Postgres -- depending on whether this complex interval needs granularity smaller than seconds, more

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-04 Wes McKinney
Looking more at the Postgres spec and storage details, I'd be supportive of having a COMPLEX interval type which could be a packed type (possibly using the same 16-byte storage layout as Postgres -- depending on whether this complex interval needs granularity smaller than seconds, more analysis nee
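To make the Postgres storage comparison concrete: as far as we know, PostgreSQL's in-memory `Interval` struct holds an int64 count of microseconds followed by an int32 day count and an int32 month count, 16 bytes in total. The Python sketch below packs a value in that field order. The little-endian byte order and the helper names are assumptions for illustration. It encodes the "1 week and 1 hour" example discussed in the thread as 0 months, 7 days, and 3600 seconds.

```python
import struct

# Assumed Postgres-style layout: time (int64 microseconds), day (int32),
# month (int32). Little-endian byte order is an assumption of this sketch.
PG_INTERVAL = struct.Struct("<qii")
US_PER_SECOND = 1_000_000


def pack_pg_interval(months: int, days: int, micros: int) -> bytes:
    """Pack (months, days, micros) in the time/day/month field order."""
    return PG_INTERVAL.pack(micros, days, months)


def unpack_pg_interval(buf: bytes):
    """Return (months, days, micros) from a packed 16-byte value."""
    micros, days, months = PG_INTERVAL.unpack(buf)
    return months, days, micros


# "1 week and 1 hour" = 0 months, 7 days, 3600 seconds
one_week_one_hour = pack_pg_interval(0, 7, 3600 * US_PER_SECOND)
```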

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-02 Micah Kornfield
> However, it seems a little unfortunate that there is no way to represent a "common" interval like "1 week and 1 hour" with native Arrow types
I might have misunderstood, but at least in Postgres, I thought this boils down to "0 months, 7 days, 3600 seconds". Since months is 0, this seems li

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-02 Andrew Lamb
I think it is plausible that we use Arrow structs to create a synthetic interval type for DataFusion (I don't have a compelling use case to store the intervals themselves, or to expose them outside of DataFusion). However, it seems a little unfortunate that there is no way to represent a "common" i

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-02 Micah Kornfield
> The real use case I have is "Postgres compatibility"
Yeah, I'm a little conflicted on this. A broader analysis might be necessary and I'd welcome others' thoughts, but at what point should we mostly consider the type system closed? Should we be aiming for full parity with ANSI SQL/Postgres SQ

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-02 Andrew Lamb
The real use case I have is "Postgres compatibility", in the sense that we can write SQL queries / expressions that use the Postgres interval type [1] and corresponding expressions with the full Postgres interval range. I have no known need for the actual Postgres timestamp internal representation. A

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-04-02 Micah Kornfield
Andrew, is the use case you have simply Postgres compatibility, or is it more extensive? One potential problem with combining Month and Day fields is that the type no longer has a defined sort order (the existing Day-Millisecond type without assumptions, in particular because I don't think today th
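The sort-order problem shows up once a (months, days) pair is resolved against a start date. This Python sketch uses one common calendar convention, clamping to the last day of a shorter month, which is an assumption here. Under it, "1 month" is 28 days when anchored at 2021-02-01 but 31 days when anchored at 2021-01-01, so it is neither always shorter nor always longer than "30 days".

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    last_day = (next_first - date(year, month, 1)).days
    return date(year, month, min(d.day, last_day))


def elapsed_days(anchor: date, months: int, days: int) -> int:
    """Days covered by (months, days) when applied to a specific start date."""
    return (add_months(anchor, months) - anchor).days + days


# "1 month" versus "30 days" depends on where the interval starts.
print(elapsed_days(date(2021, 2, 1), 1, 0))  # 28
print(elapsed_days(date(2021, 1, 1), 1, 0))  # 31
```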

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-31 Antoine Pitrou
On 31/03/2021 at 17:55, Micah Kornfield wrote:
> Thanks for the feedback. A couple of points here and some responses below.
> * One other question is whether the Nanoseconds should actually be configurable (i.e. use milliseconds or microseconds). I would lean towards no.
Same for me.
> * I'm

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-31 Micah Kornfield
Thanks for the feedback. A couple of points here and some responses below.
* One other question is whether the Nanoseconds should actually be configurable (i.e. use milliseconds or microseconds). I would lean towards no.
* I'm also still not 100% convinced we need this as a first-class type in

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-31 Antoine Pitrou
I would favour the following characteristics:
- support for nanoseconds (especially as other Arrow temporal types support it)
- easy to handle (which excludes the ZetaSQL representation IMHO)
OTOH I don't really understand the point of supporting "the most reasonable ranges for Year, Month

Re: Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-31 Andrew Lamb
I agree with you that having fixed precision (e.g. Postgres / ZetaSQL) is reasonable. Variable precision fields (à la SQL Server/Oracle) seem less valuable to me. I think support for nanosecond precision for intervals is important, as there are nanosecond precision timestamps. I don't think the post
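Andrew's precision argument can be made concrete with a small Python sketch. It assumes a months/days/nanoseconds triple packed as int32, int32, int64 (16 bytes, little-endian), which matches the MONTH_DAY_NANO layout Arrow eventually adopted; the helper names are hypothetical. A 1,500 ns gap between two nanosecond timestamps fits exactly in such a field, but truncating it to DAY_TIME's millisecond resolution drops all of it.

```python
import struct

# Assumed layout: months (int32), days (int32), nanoseconds (int64), little-endian.
MONTH_DAY_NANO = struct.Struct("<iiq")
NS_PER_MS = 1_000_000


def pack_month_day_nano(months: int, days: int, nanos: int) -> bytes:
    return MONTH_DAY_NANO.pack(months, days, nanos)


def ms_truncation_loss(nanos: int) -> int:
    """Nanoseconds dropped when a non-negative duration is truncated to ms."""
    if nanos < 0:
        raise ValueError("this sketch only handles non-negative durations")
    return nanos % NS_PER_MS


# Gap between two nanosecond timestamps that are 1.5 microseconds apart.
delta_ns = 1_500
packed = pack_month_day_nano(0, 0, delta_ns)  # exact in the nanosecond field
lost = ms_truncation_loss(delta_ns)           # everything lost at ms resolution
```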

Re: Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-30 Micah Kornfield
To follow up on this conversation, I did some analysis on interval types: https://docs.google.com/document/d/1i1E_fdQ_xODZcAhsV11Pfq27O50k679OYHXFJpm9NS0/edit Please feel free to add more details/systems I missed. Given the disparate requirements of different systems, I think the following might

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-02-17 Micah Kornfield
> I didn’t find any page/documentation on how to do RFC in Arrow protocol, so can anyone point me to it or PR with email will be enough?
That is enough to start discussion. Before formal acceptance and merging of the PR, there need to be Java and C++ implementations for the type that pass i

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-02-17 Andrew Lamb
That is a great suggestion, Wes; thank you. I wonder if we could get away with a 128-bit representation that is the concatenation of the two existing interval types (YearMonth)(DayTime), or maybe even define a `struct` type with those fields that is used by DataFusion. Basically, given our reading
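A sketch of the concatenation idea in Python, assuming little-endian int32 fields in the order months, days, milliseconds. `ConcatInterval` is a hypothetical name for illustration; its three fields also mirror what a `struct` type with those fields would hold.

```python
import struct
from dataclasses import dataclass

# Existing layouts (assumed little-endian) and their back-to-back concatenation.
YEAR_MONTH = struct.Struct("<i")   # months
DAY_TIME = struct.Struct("<ii")    # days, milliseconds
CONCAT = struct.Struct("<iii")     # months | days | milliseconds


@dataclass(frozen=True)
class ConcatInterval:
    months: int
    days: int
    millis: int

    def to_bytes(self) -> bytes:
        return CONCAT.pack(self.months, self.days, self.millis)

    @classmethod
    def from_bytes(cls, buf: bytes) -> "ConcatInterval":
        return cls(*CONCAT.unpack(buf))


# 1 MONTH 1 DAY 1 HOUR
value = ConcatInterval(months=1, days=1, millis=60 * 60 * 1000)
```

Packed back to back, the two existing layouts occupy 12 bytes (96 bits), so reaching a 128-bit representation would require padding or widening one of the fields.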

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-02-17 Wes McKinney
On Wed, Feb 17, 2021 at 3:46 PM wrote:
> > It's unclear to me that this needs to be introduced into the top-level
> Similar thing to columnar format. How do we store an interval like 1 month 1 day 1 hour? It’s not possible to do it without converting 1 month to 30 days, which is a bad approach.

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-02-17 talk
> It's unclear to me that this needs to be introduced into the top-level
Similar thing to columnar format. How do we store an interval like 1 month 1 day 1 hour? It’s not possible to do it without converting 1 month to 30 days, which is a bad approach.
> On 17 Feb 2021, at 21:02, Wes McKinney wrote:

Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-02-17 Wes McKinney
It's unclear to me that this needs to be introduced into the top-level columnar format without more analysis. Have you considered implementing this for DataFusion as an extension type for the time being?
On Wed, Feb 17, 2021 at 11:59 AM t...@dmtry.me wrote:
> Hi,
> For now, there are only tw

[Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-02-17 t...@dmtry.me
Hi,
For now, there are only two types of IntervalUnit inside Arrow:
- YearMonth: months stored as int32
- DayTime: days as int32 and time in milliseconds as int32 (64 bits in total)
Since DF (DataFusion) is using Arrow, it’s not possible to store “Complex” intervals such as 1 MONTH 1 DAY 1 HOUR. I think, the b
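The limitation described above can be sketched in Python with the standard `struct` module. The little-endian byte order is an assumption, and `existing_unit_for` is a hypothetical helper: without converting months to days, a value such as 1 MONTH 1 DAY 1 HOUR fits neither existing layout.

```python
import struct

# The two existing IntervalUnit layouts described above (little-endian assumed).
YEAR_MONTH = struct.Struct("<i")   # months: int32 -> 4 bytes
DAY_TIME = struct.Struct("<ii")    # days: int32, milliseconds: int32 -> 8 bytes

MS_PER_HOUR = 60 * 60 * 1000


def existing_unit_for(months: int, days: int, millis: int):
    """Name the existing unit that can hold the value exactly, or None.

    Assumes no month-to-day conversion (1 month is not treated as 30 days).
    """
    if days == 0 and millis == 0:
        return "YEAR_MONTH"
    if months == 0:
        return "DAY_TIME"
    return None


print(existing_unit_for(1, 1, MS_PER_HOUR))  # None: needs months and days at once
```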