Re: Complex Number support in Arrow

2021-06-21 Thread Simon Perkins
On Mon, Jun 21, 2021 at 4:58 PM Antoine Pitrou wrote: > > I certainly don't think we should have extension types with a different > type id. IMHO, it's a recipe for confusion. > Thanks, I think I got confused by the different perspectives in the thread. I'll do some more exploratory coding with

Re: Complex Number support in Arrow

2021-06-21 Thread Antoine Pitrou
I certainly don't think we should have extension types with a different type id. IMHO, it's a recipe for confusion. Regards Antoine. On 21/06/2021 at 15:54, Simon Perkins wrote: To put it another way, an Extension Type technically has Type::EXTENSION, but now there's Type::COMPLEX_FLOA

Re: Complex Number support in Arrow

2021-06-21 Thread Simon Perkins
To put it another way, an Extension Type technically has Type::EXTENSION, but now there's Type::COMPLEX_FLOAT and Type::COMPLEX_DOUBLE. When checking enums, the code sees a Type::COMPLEX_FLOAT and seems to mismatch on ComplexFloatType::Type::type_id, as the latter is Type::EXTENSION? On Mon, Jun
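The mismatch described in this message can be illustrated with a toy model. This is only a sketch in plain Python, not Arrow's C++ code: the TypeId enum, the ExtensionType class, and the matches_expected_id helper are hypothetical stand-ins for the type-id enum and type classes discussed in the thread.

```python
from enum import Enum, auto


class TypeId(Enum):
    """Hypothetical stand-in for the C++ Type enum discussed in the thread."""
    EXTENSION = auto()
    COMPLEX_FLOAT = auto()
    COMPLEX_DOUBLE = auto()


class ExtensionType:
    """Toy extension type: every instance reports the generic EXTENSION id."""
    type_id = TypeId.EXTENSION

    def __init__(self, name):
        self.name = name


def matches_expected_id(expected, dtype):
    """Enum-based check: compares the expected id with the id the type reports."""
    return dtype.type_id is expected


# A complex type implemented as an extension type still reports EXTENSION,
# so a check written against a dedicated COMPLEX_FLOAT id does not match it.
complex_float = ExtensionType("complex_float")
dedicated_check = matches_expected_id(TypeId.COMPLEX_FLOAT, complex_float)
generic_check = matches_expected_id(TypeId.EXTENSION, complex_float)
```

In this model dedicated_check is False and generic_check is True, which is the kind of disagreement the message is asking about.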

Re: Complex Number support in Arrow

2021-06-21 Thread Simon Perkins
I did some exploratory coding adding Complex Numbers as ExtensionTypes in this PR: https://github.com/apache/arrow/pull/10565 > My understanding is that it means having COMPLEX as an entry in the arrow/type_fwd.h Type enum. I agree this would make implementation work in the C++ library much more s

Re: Complex Number support in Arrow

2021-06-14 Thread Micah Kornfield
> > It would still be desirable to maintain the memory layout of C/C++/NumPy to > maintain zero-copy. > FixedList[2] maintains this layout, while a Struct[re, im] does not. I noted this before but there are some gaps in Parquet support for FixedSizeList around null handling. Just something to be

Re: Complex Number support in Arrow

2021-06-14 Thread Antoine Pitrou
On 14/06/2021 at 10:54, Simon Perkins wrote: > The reason why I am being nit-picky here is I think that having a first class type indicates that it should eventually be supported by all reference implementations. A "well known" extension type I think offers fewer guarantees which makes it

Re: Complex Number support in Arrow

2021-06-14 Thread Simon Perkins
> The reason why I am being nit-picky here is I think that having a first class type indicates that it should eventually be supported by all reference implementations. A "well known" extension type I think offers fewer guarantees which makes it seem more suitable for niche types. What are the re

Re: Complex Number support in Arrow

2021-06-14 Thread Simon Perkins
> What I conclude is that this does not seem to be a problem about a base > in-memory representation, but rather on whether we agree on a > representation that justifies adding associated metadata to the spec. > It would still be desirable to maintain the memory layout of C/C++/NumPy to maintain z

Re: Complex Number support in Arrow

2021-06-11 Thread Neal Richardson
Thanks Micah. Those criteria seem reasonable (and that discussion was recent enough that my memory of it should have been sharper). I've created https://issues.apache.org/jira/browse/ARROW-13055 so that we can document this decision. IMO we don't need a vote on these criteria--seems like there was

Re: Complex Number support in Arrow

2021-06-10 Thread Micah Kornfield
> > It might help this discussion and future discussions like it if we could > define how it is determined whether a type should be part of the Arrow > format, an extension type (and what does it mean to say there is a > "canonical" extension type), or just something that a language > implementati

Re: Complex Number support in Arrow

2021-06-10 Thread Jorge Cardoso Leitão
Isn't an array of complexes represented by what arrow already supports? In particular, I see at least two valid in-memory representations to use, that depend on what we are going to do with it: * Struct[re, im] * FixedList[2] In the first case, we have two buffers, [x0, x1, ...] and [y0, y1, ...]
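The two candidate layouts named in this message can be sketched with Python's standard library alone. This is an illustration of the buffer shapes only, not Arrow's API; the variable and function names are made up for the example.

```python
from array import array

values = [complex(1.0, 2.0), complex(3.0, 4.0), complex(5.0, 6.0)]

# Struct[re, im]-style layout: one contiguous buffer per field,
# [x0, x1, ...] for the real parts and [y0, y1, ...] for the imaginary parts.
re_buf = array("d", (z.real for z in values))
im_buf = array("d", (z.imag for z in values))

# FixedList[2]-style layout: a single buffer with the two parts interleaved,
# [x0, y0, x1, y1, ...].
interleaved = array("d")
for z in values:
    interleaved.extend((z.real, z.imag))


def complex_from_split(i):
    """Read element i from the two per-field buffers."""
    return complex(re_buf[i], im_buf[i])


def complex_from_interleaved(i):
    """Read element i from the single interleaved buffer."""
    return complex(interleaved[2 * i], interleaved[2 * i + 1])
```

Both accessors return the same value for a given index; the difference is whether the real and imaginary parts of one element sit next to each other in memory, which is what matters for the zero-copy concern raised later in the thread.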

Re: Complex Number support in Arrow

2021-06-10 Thread Neal Richardson
It might help this discussion and future discussions like it if we could define how it is determined whether a type should be part of the Arrow format, an extension type (and what does it mean to say there is a "canonical" extension type), or just something that a language implementation or downst

Re: Complex Number support in Arrow

2021-06-10 Thread Micah Kornfield
> > My understanding is that it means having COMPLEX as an entry in the > arrow/type_fwd.h Type enum. I agree this would make implementation > work in the C++ library much more straightforward. One idea I proposed would be to do that, and implement the > serialization of the complex metadata using

Re: Complex Number support in Arrow

2021-06-10 Thread Wes McKinney
My understanding is that it means having COMPLEX as an entry in the arrow/type_fwd.h Type enum. I agree this would make implementation work in the C++ library much more straightforward. One idea I proposed would be to do that, and implement the serialization of the complex metadata using Extension

Re: Complex Number support in Arrow

2021-06-10 Thread Weston Pace
> While dedicated types are not strictly required, compute functions would > be much easier to add for a first-class dedicated complex datatype > rather than for an extension type. @pitrou This is perhaps a naive question (and admittedly, I'm not up to speed on my compute kernels) but why is this

Re: Complex Number support in Arrow

2021-06-10 Thread Wes McKinney
I'd be supportive of starting with this as a "canonical" extension type so that all implementations are not expected to support complex types — this would encourage us to build sufficient integration e.g. with NumPy to get things working end-to-end with the on-wire representation being an extension

Re: Complex Number support in Arrow

2021-06-10 Thread Micah Kornfield
> > I'm convinced now that first-class types seem to be the way to go and I'm > happy to take this approach. I agree from an implementation effort it is simpler, but I'm still not convinced that we should be adding this as a first class type. As noted in the survey below it appears Complex numbe

Re: Complex Number support in Arrow

2021-06-10 Thread Simon Perkins
On Wed, Jun 9, 2021 at 7:56 PM Antoine Pitrou wrote: > > On 09/06/2021 at 17:52, Micah Kornfield wrote: > > > > Adding a new first-class type in Arrow requires working integration tests > > between C++ and Java libraries (once the idea is informally agreed upon) > > and then a final vote for ap

Re: Complex Number support in Arrow

2021-06-10 Thread Simon Perkins
On Wed, Jun 9, 2021 at 11:25 PM Wes McKinney wrote: > I think that having a top-level type for complex numbers would be > nicer than extension types Agreed. As Micah mentioned, adding these types doesn't seem to interfere with any existing protocol, so I'd like to take this approach going forward.

Re: Complex Number support in Arrow

2021-06-10 Thread Antoine Pitrou
On 10/06/2021 at 09:20, Simon Perkins wrote: Ah so Arrow Structs are represented as a Struct of Arrays (SoA) vs an Array of Structs (AoS)? If you are not familiar with the Arrow format, I would suggest you start by reading https://arrow.apache.org/docs/format/Columnar.html (see "Struct

Re: Complex Number support in Arrow

2021-06-10 Thread Simon Perkins
Hi Micah Please see a recent discussion on adding new types [1] > Thanks, this is useful. > My understanding is that feather.fbs is for V1 feather files and probably > shouldn't be touched. Only updating schema.fbs should be required and the > type should be doable in a backwards/forwards com

Re: Complex Number support in Arrow

2021-06-09 Thread Antoine Pitrou
On Wed, 9 Jun 2021 15:34:41 -0700 Micah Kornfield wrote: > Hi Antoine, > In regards to conceptual simplicity, I might have misinterpreted when you > wrote: > > Since complex numbers are quite common in some domains, and since they > > are conceptually simple, > > > It seemed like a justificat

Re: Complex Number support in Arrow

2021-06-09 Thread Micah Kornfield
Hi Antoine, In regards to conceptual simplicity, I might have misinterpreted when you wrote: > Since complex numbers are quite common in some domains, and since they > are conceptually simple, It seemed like a justification for adding them as a first class type. Thanks, Micah On Wed, Jun 9, 202

Re: Complex Number support in Arrow

2021-06-09 Thread Antoine Pitrou
On 10/06/2021 at 00:05, Micah Kornfield wrote: While dedicated types are not strictly required, compute functions would be much easier to add for a first-class dedicated complex datatype rather than for an extension type. It seems like maybe this is an area to focus on? I'm not sure conce

Re: Complex Number support in Arrow

2021-06-09 Thread Micah Kornfield
> > While dedicated types are not strictly required, compute functions would > be much easier to add for a first-class dedicated complex datatype > rather than for an extension type. It seems like maybe this is an area to focus on? I'm not sure conceptually simple is the right criterion to apply

Re: Complex Number support in Arrow

2021-06-09 Thread Wes McKinney
I think that having a top-level type for complex numbers would be nicer than extension types, so it would look like table Complex { precision: Precision; } and the representation is a packed tuple of two floating point numbers of the indicated precision (I think this is the standard way that pe
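The "packed tuple of two floating point numbers of the indicated precision" can be sketched as follows. This is a plain-Python illustration using the standard struct module, not Arrow code; the precision names SINGLE and DOUBLE, the little-endian byte order, and the helper functions are assumptions made for the example.

```python
import struct

# Assumed mapping from a precision name to the struct format of one packed
# (real, imag) pair: two little-endian 32-bit or 64-bit floats.
PACKED_FORMAT = {"SINGLE": "<2f", "DOUBLE": "<2d"}


def pack_complex(values, precision):
    """Serialize complex values as consecutive (real, imag) pairs."""
    fmt = PACKED_FORMAT[precision]
    return b"".join(struct.pack(fmt, z.real, z.imag) for z in values)


def unpack_complex(buf, precision):
    """Rebuild complex values from a buffer of packed (real, imag) pairs."""
    fmt = PACKED_FORMAT[precision]
    return [complex(re, im) for re, im in struct.iter_unpack(fmt, buf)]


# Values chosen to be exactly representable in both precisions.
samples = [complex(1.0, 2.0), complex(3.5, -4.25)]
double_buf = pack_complex(samples, "DOUBLE")
single_buf = pack_complex(samples, "SINGLE")
```

Each element occupies 16 bytes at double precision and 8 bytes at single precision, with the real part immediately followed by the imaginary part.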

Re: Complex Number support in Arrow

2021-06-09 Thread Antoine Pitrou
On 09/06/2021 at 17:52, Micah Kornfield wrote: Adding a new first-class type in Arrow requires working integration tests between C++ and Java libraries (once the idea is informally agreed upon) and then a final vote for approval. We haven't formalized extension types but I imagine a similar

Re: Complex Number support in Arrow

2021-06-09 Thread Micah Kornfield
Hi Simon, Please see a recent discussion on adding new types [1] > - Adding first class complex types seems to involve modifying > cpp/src/arrow/ipc/feather.fbs which may change the protocol and > introduce breaking changes. I'm not sure about this and seek advice on how > invasive t

Complex Number support in Arrow

2021-06-08 Thread Simon Perkins
Greetings Apache Dev Mailing List I'm interested in adding complex number support to Arrow. The use case is Radio Astronomy data, which is represented by complex values. xref https://issues.apache.org/jira/browse/ARROW-638 xref https://github.com/apache/arrow/pull/10452 It's fairly easy to suppo