OK, I think I have completed the initial changes for the new interval type
in https://github.com/apache/arrow/pull/10177
The code changes still need to be reviewed, but I don't think that should
stop a vote. I'll start a vote on Monday unless there are more comments on
the format changes.
Thanks
As an update, I've gotten basic integration testing working in Java and C++
along with the format proposal updates [1].
I have a little bit more work to do on the initial implementations (make CI
happy, add unit tests in Java) but I think this is getting close to the
point that we can vote on it.
Ah, that makes sense to wait then.
On Thu, May 6, 2021 at 10:55 AM Micah Kornfield wrote:
I'll address the feedback. I think in the past we've waited for
implementations in java and c++ with integration tests before formally
voting. If there is no more feedback I can start looking at
implementations (happy to have help)
On Thursday, May 6, 2021, Wes McKinney wrote:
The PR looks good. I just left some comments about typos. I would say
it's probably about time to call a vote. Anywhere else where we should
be soliciting feedback?
On Mon, May 3, 2021 at 2:17 PM Jacek Pliszka wrote:
Good idea, I've created JIRA issue:
https://issues.apache.org/jira/browse/ARROW-12637
I named it "range" to avoid confusion with intervals, though some
confusion will remain because it is called an interval in pandas and in
logic (Allen's interval algebra).
BR,
Jacek
Mon, 3 May 2021 at 18:05 Micah Kornfield
Hi Jacek,
This seems like reasonable functionality. I think the problem comes in
two parts:
1. This might be a good candidate for a "Well Known"/Officially supported
Extension type. I can think of a few different representations but I would
guess something like Struct[start: T, struct: end]] wit
Sorry, my mistake.
You are right - I meant anchored intervals as in pandas - ones with
defined start and end - and I think many future users will make the
same mistake.
I would love to be able to do fast overlap joins on arrow level.
Best Regards,
Jacek
Sun, 2 May 2021 at 23:06 Wes McKinn
I also don't understand the comment about closed / open / semi-open
intervals. Perhaps there is a confusion, since "interval" as we mean
it here is called a "time delta" in some other projects. An interval
here does not refer to a time span with a distinct start and end point
(I understand this mig
Hi Jacek,
I'm not sure I fully understand the proposal, could you elaborate with more
examples/details? For instance DAY_TIME isn't just a UINT64, it actually
contains 2 separate fields (days and milliseconds).
In terms of closed vs half-open, in my limited understanding, that is more
a concern o
Hi!
I wonder if it would be possible to have a generic interval with integers
of a specified size, just to have a common base for interval arithmetic.
Then users can convert their periods to ordinals and use the arithmetic
(joining, deoverlapping, common parts, explosion etc.).
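As a rough illustration of the kind of arithmetic meant here, the following is a minimal sketch on integer ordinals. It assumes anchored, half-open ranges [start, end); the names Range, overlaps, intersection and merge_overlapping are hypothetical and not part of Arrow or of any proposal in this thread.

```python
from typing import List, Optional, Tuple

# Assumption: a range is a half-open pair of integer ordinals [start, end).
Range = Tuple[int, int]


def overlaps(a: Range, b: Range) -> bool:
    """True if the two half-open ranges share at least one ordinal."""
    return a[0] < b[1] and b[0] < a[1]


def intersection(a: Range, b: Range) -> Optional[Range]:
    """Common part of two ranges, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def merge_overlapping(ranges: List[Range]) -> List[Range]:
    """'Deoverlap' a list by merging overlapping or touching ranges."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_overlapping([(5, 8), (1, 3), (2, 6), (10, 12)]))
```

Whether the bounds are closed, open or half-open changes the comparison operators, which is why the choice is stated as an assumption above.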
So YEAR_MONTH and DAY_TIME wou
I believe I've addressed all outstanding feedback on the PR. Are there
other thoughts on this that should be discussed before we move forward
towards an implementation / voting plan?
On Tue, Apr 27, 2021 at 6:00 PM Wes McKinney wrote:
Thanks Micah — I commented in the PR. Once we've settled on the details we
can come up with an implementation / vote plan
On Tue, Apr 27, 2021 at 1:12 PM Micah Kornfield
wrote:
To nudge this along I opened up https://github.com/apache/arrow/pull/10177
Comments welcome.
On Sun, Apr 11, 2021 at 9:38 PM Micah Kornfield
wrote:
If there are no more comments on this maybe we should update the original
RFC PR and ensure we are OK with it in principle (Dmitry do you want to do
this or should we start a new PR)? I can try to work on the C++/Python and
Java code in the next few weeks.
On Sun, Apr 4, 2021 at 1:35 PM Micah Ko
Looking more at the Postgres spec and storage details, I'd be
supportive of having a COMPLEX interval type which could be a packed
type (possibly using the same 16-byte storage layout as Postgres --
depending on whether this complex interval needs granularity smaller
than seconds, more analysis nee
>
> However it seems a little unfortunate that there is no way to represent a
> "common" interval like "1 week and 1 hour" with native arrow types
I might have misunderstood, but at least in Postgres, I thought this boils
down to "0 months, 7 days, 3600 seconds". Since months is 0, this seems
li
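The decomposition described in this reply can be sketched as follows. This is only an illustration of the "months, days, seconds" triple mentioned above; the class MonthDaySeconds and the helper from_weeks_hours are hypothetical names, and the sketch does not claim to reproduce how Postgres stores intervals internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthDaySeconds:
    """Hypothetical triple: calendar months, calendar days, elapsed seconds."""
    months: int
    days: int
    seconds: int


def from_weeks_hours(weeks: int, hours: int) -> MonthDaySeconds:
    """Weeks fold into the days field, hours into the seconds field."""
    return MonthDaySeconds(months=0, days=weeks * 7, seconds=hours * 3600)


# "1 week and 1 hour" becomes 0 months, 7 days, 3600 seconds.
print(from_weeks_hours(1, 1))
```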
I think it is plausible that we use Arrow structs to create a synthetic
interval type for DataFusion (I don't have a compelling use case to store
the intervals themselves, or to expose them outside of DataFusion).
However it seems a little unfortunate that there is no way to represent a
"common" i
>
> The real use case I have is "postgres compatibility"
Yeah, I'm a little conflicted on this. A broader analysis might be
necessary and I'd welcome others' thoughts, but at what point should we
mostly consider the type system closed? Should we be aiming for full
parity with ANSI SQL/Postgres SQ
The real use case I have is "postgres compatibility" - in the sense that we
can write SQL queries / expressions that use postgres interval type [1] and
corresponding expressions with the full postgres interval range. I have no
known need for the actual postgres timestamp internal representation.
A
Andrew, is the use case you have simply Postgres compatibility, or is it
more extensive?
One potential problem with combining Month and Day fields is that the type
no longer has a defined sort order (the existing Day-Millisecond type
without assumptions, in particular because I don't think today th
On 31/03/2021 at 17:55, Micah Kornfield wrote:
Thanks for the feedback. A couple of points here and some responses below.
* One other question is whether the Nanoseconds should actually be
configurable (i.e. use milliseconds or microseconds). I would lean towards
no.
Same for me.
* I'm
Thanks for the feedback. A couple of points here and some responses below.
* One other question is whether the Nanoseconds should actually be
configurable (i.e. use milliseconds or microseconds). I would lean towards
no.
* I'm also still not 100% convinced we need this as a first class type in
I would favour the following characteristics :
- support for nanoseconds (especially as other Arrow temporal types
support it)
- easy to handle (which excludes the ZetaSQL representation IMHO)
OTOH I don't really understand the point of supporting "the most
reasonable ranges for Year, Month
I agree with you that having fixed precision (e.g. postgres / zetasql) is
reasonable. Variable precision fields (à la SQL Server/Oracle) seem less
valuable to me.
I think support for nanosecond precision for intervals is important as
there are nanosecond precision timestamps
I don't think the post
To follow-up on this conversation I did some analysis on interval types:
https://docs.google.com/document/d/1i1E_fdQ_xODZcAhsV11Pfq27O50k679OYHXFJpm9NS0/edit
Please feel free to add more details/systems I missed.
Given the disparate requirements of different systems I think the following
might
>
> I didn’t find any page/documentation on how to do an RFC for the Arrow
> protocol, so can anyone point me to it, or will a PR with an email be enough?
That is enough to start discussion. Before formal acceptance and merging
of the PR there need to be Java and C++ implementations for the type
that pass i
That is a great suggestion Wes, thank you.
I wonder if we could get away with a 128 bit representation that is the
concatenation of the two existing interval types (YearMonth)(DayTime). Or
maybe even define a `struct` type with those fields that is used by
DataFusion.
Basically, given our reading
On Wed, Feb 17, 2021 at 3:46 PM wrote:
> It's unclear to me that this needs to be introduced into the top-level
The same applies to the columnar format: how would you store an interval
like 1 month 1 day 1 hour? It's not possible without converting 1 month to
30 days, which is a poor approach.
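A small sketch of why treating 1 month as 30 days loses information: the two give different results depending on the date they are applied to. The helper add_months is hypothetical (the Python standard library has no month arithmetic), and clamping the day to the end of the month is an assumption made for this example.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Hypothetical calendar-aware month addition (day clamped to month end)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start = date(2021, 2, 1)
print(add_months(start, 1))          # calendar month: 2021-03-01
print(start + timedelta(days=30))    # fixed 30 days: 2021-03-03
```

February 2021 has 28 days, so the fixed 30-day version overshoots by two days; from a 31-day month it would fall short by one.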
> On 17 Feb 2021, at 21:02, Wes McKinney wrote:
It's unclear to me that this needs to be introduced into the top-level
columnar format without more analysis — have you considered
implementing this for DataFusion as an extension type for the time
being?
On Wed, Feb 17, 2021 at 11:59 AM t...@dmtry.me wrote:
Hi,
For now, there are only two types of IntervalUnit inside Arrow:
- YearMonth - months stored as int32
- DayTime - days as int32 and time in milliseconds as int32 (64 bits total)
Since DF is using Arrow, it’s not possible to store “complex” intervals such
as 1 MONTH 1 DAY 1 HOUR.
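To make the two layouts concrete, here is a minimal sketch using Python's struct module. The little-endian byte order and the names YEAR_MONTH, DAY_TIME, pack_year_month and pack_day_time are assumptions for illustration only; this is not the Arrow library API.

```python
import struct

# Assumed little-endian packing that mirrors the field widths listed above.
YEAR_MONTH = struct.Struct("<i")   # months as int32
DAY_TIME = struct.Struct("<ii")    # days as int32, milliseconds as int32


def pack_year_month(months: int) -> bytes:
    return YEAR_MONTH.pack(months)


def pack_day_time(days: int, millis: int) -> bytes:
    return DAY_TIME.pack(days, millis)


one_month = pack_year_month(1)
one_day_one_hour = pack_day_time(1, 60 * 60 * 1000)

# Sizes in bits: 32 for YearMonth, 64 for DayTime.
print(len(one_month) * 8, len(one_day_one_hour) * 8)
```

Neither layout has room for both a month count and a day/time part, so a value like 1 MONTH 1 DAY 1 HOUR would have to be split across two values or converted.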
I think, the b