Re: Complex Number support in Arrow

2021-06-21 Thread Simon Perkins
On Mon, Jun 21, 2021 at 4:58 PM Antoine Pitrou wrote: > > I certainly don't think we should have extension types with a different > type id. IMHO, it's a recipe for confusion. > Thanks, I think I got confused by the different perspectives in the thread. I'll do some more exploratory coding with

Re: Complex Number support in Arrow

2021-06-21 Thread Antoine Pitrou
I certainly don't think we should have extension types with a different type id. IMHO, it's a recipe for confusion. Regards Antoine. On 21/06/2021 at 15:54, Simon Perkins wrote: To put it another way, an Extension Type technically has Type::EXTENSION, but now there's Type::COMPLEX_FLOA

Re: Complex Number support in Arrow

2021-06-21 Thread Simon Perkins
To put it another way, an Extension Type technically has Type::EXTENSION, but now there's Type::COMPLEX_FLOAT and Type::COMPLEX_DOUBLE. When checking enums, the code sees a Type::COMPLEX_FLOAT and seems to mismatch on ComplexFloatType::Type::type_id, as the latter is Type::EXTENSION? On Mon, Jun
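
The mismatch described in this message can be illustrated with a toy model. This is only a sketch in plain Python, not Arrow's C++ code: `TypeId`, `ToyComplexFloatExtension`, and `id_matches` are hypothetical names invented for illustration, and the assumption is simply that an extension type reports the generic extension id while calling code checks for a dedicated complex id.

```python
from enum import Enum, auto


class TypeId(Enum):
    """Toy stand-in for a type-id enum; not the real Arrow definition."""
    EXTENSION = auto()
    COMPLEX_FLOAT = auto()
    COMPLEX_DOUBLE = auto()


class ToyComplexFloatExtension:
    """Hypothetical extension type: its id is EXTENSION, not COMPLEX_FLOAT."""
    type_id = TypeId.EXTENSION


def id_matches(expected: TypeId, data_type) -> bool:
    # An enum-based check compares ids only; it knows nothing about
    # what the extension type logically represents.
    return data_type.type_id is expected
```

Under these assumptions, a check for `TypeId.COMPLEX_FLOAT` fails for the extension type even though it logically holds complex floats, which is the kind of confusion a second, distinct id introduces.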

Re: Complex Number support in Arrow

2021-06-21 Thread Simon Perkins
I did some exploratory coding adding Complex Numbers as ExtensionTypes in this PR: https://github.com/apache/arrow/pull/10565 > My understanding is that it means having COMPLEX as an entry in the arrow/type_fwd.h Type enum. I agree this would make implementation work in the C++ library much more s

Re: Complex Number support in Arrow

2021-06-14 Thread Micah Kornfield
> > It would still be desirable to maintain the memory layout of C/C++/NumPy to > maintain zero-copy. > FixedList[2] maintains this layout, while a Struct[re, im] does not. I noted this before but there are some gaps in Parquet support for FixedSizeList around null handling. Just something to be

Re: Complex Number support in Arrow

2021-06-14 Thread Antoine Pitrou
On 14/06/2021 at 10:54, Simon Perkins wrote: > The reason why I am being nit-picky here is I think that having a first class type indicates that it should eventually be supported by all reference implementations. A "well known" extension type I think offers fewer guarantees which makes it

Re: Complex Number support in Arrow

2021-06-14 Thread Simon Perkins
> The reason why I am being nit-picky here is I think that having a first class type indicates that it should eventually be supported by all reference implementations. A "well known" extension type I think offers fewer guarantees which makes it seem more suitable for niche types. What are the re

Re: Complex Number support in Arrow

2021-06-14 Thread Simon Perkins
> What I conclude is that this does not seem to be a problem about a base > in-memory representation, but rather on whether we agree on a > representation that justifies adding associated metadata to the spec. > It would still be desirable to maintain the memory layout of C/C++/NumPy to maintain z

Re: Complex Number support in Arrow

2021-06-11 Thread Neal Richardson
Thanks Micah. Those criteria seem reasonable (and that discussion was recent enough that my memory of it should have been sharper). I've created https://issues.apache.org/jira/browse/ARROW-13055 so that we can document this decision. IMO we don't need a vote on these criteria--seems like there was

Re: Complex Number support in Arrow

2021-06-10 Thread Micah Kornfield
> > It might help this discussion and future discussions like it if we could > define how it is determined whether a type should be part of the Arrow > format, an extension type (and what does it mean to say there is a > "canonical" extension type), or just something that a language > implementati

Re: Complex Number support in Arrow

2021-06-10 Thread Jorge Cardoso Leitão
Isn't an array of complexes represented by what arrow already supports? In particular, I see at least two valid in-memory representations to use, that depend on what we are going to do with it: * Struct[re, im] * FixedList[2] In the first case, we have two buffers, [x0, x1, ...] and [y0, y1, ...]
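
The difference between the two candidate layouts can be sketched with nothing but the Python standard library. This is an illustration of the buffer shapes only, not the Arrow API: the buffer names and the helper `complex_at` are hypothetical, and validity bitmaps and other Arrow metadata are deliberately left out.

```python
from array import array

# Three complex values to store.
values = [1 + 2j, 3 + 4j, 5 + 6j]

# Struct[re, im]-style layout: one contiguous buffer per field.
re_buffer = array("d", (z.real for z in values))
im_buffer = array("d", (z.imag for z in values))

# FixedList[2]-style layout: a single buffer with (re, im) pairs
# interleaved, which is also how C, C++ and NumPy store complex arrays.
interleaved = array("d")
for z in values:
    interleaved.extend((z.real, z.imag))


def complex_at(buf, i):
    """Read the i-th complex value from an interleaved buffer."""
    return complex(buf[2 * i], buf[2 * i + 1])
```

Here `re_buffer` holds `[1.0, 3.0, 5.0]` and `im_buffer` holds `[2.0, 4.0, 6.0]`, while `interleaved` holds `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`. Only the interleaved form can be handed to code expecting a C-style complex array without rearranging the data.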

Re: Complex Number support in Arrow

2021-06-10 Thread Neal Richardson
It might help this discussion and future discussions like it if we could define how it is determined whether a type should be part of the Arrow format, an extension type (and what does it mean to say there is a "canonical" extension type), or just something that a language implementation or downst

Re: Complex Number support in Arrow

2021-06-10 Thread Micah Kornfield
> > My understanding is that it means having COMPLEX as an entry in the > arrow/type_fwd.h Type enum. I agree this would make implementation > work in the C++ library much more straightforward. One idea I proposed would be to do that, and implement the > serialization of the complex metadata using

Re: Complex Number support in Arrow

2021-06-10 Thread Wes McKinney
My understanding is that it means having COMPLEX as an entry in the arrow/type_fwd.h Type enum. I agree this would make implementation work in the C++ library much more straightforward. One idea I proposed would be to do that, and implement the serialization of the complex metadata using Extension

Re: Complex Number support in Arrow

2021-06-10 Thread Weston Pace
> While dedicated types are not strictly required, compute functions would > be much easier to add for a first-class dedicated complex datatype > rather than for an extension type. @pitrou This is perhaps a naive question (and admittedly, I'm not up to speed on my compute kernels) but why is this

Re: Complex Number support in Arrow

2021-06-10 Thread Wes McKinney
I'd be supportive of starting with this as a "canonical" extension type so that all implementations are not expected to support complex types — this would encourage us to build sufficient integration e.g. with NumPy to get things working end-to-end with the on-wire representation being an extension

Re: Complex Number support in Arrow

2021-06-10 Thread Micah Kornfield
> > I'm convinced now that first-class types seem to be the way to go and I'm > happy to take this approach. I agree from an implementation effort it is simpler, but I'm still not convinced that we should be adding this as a first class type. As noted in the survey below it appears Complex numbe

Re: Complex Number support in Arrow

2021-06-10 Thread Simon Perkins
On Wed, Jun 9, 2021 at 7:56 PM Antoine Pitrou wrote: > > On 09/06/2021 at 17:52, Micah Kornfield wrote: > > > > Adding a new first-class type in Arrow requires working integration tests > > between C++ and Java libraries (once the idea is informally agreed upon) > > and then a final vote for ap

Re: Complex Number support in Arrow

2021-06-10 Thread Simon Perkins
On Wed, Jun 9, 2021 at 11:25 PM Wes McKinney wrote: > I think that having a top-level type for complex numbers would be > nicer than extension types Agreed. As Micah mentioned, adding these types doesn't seem to interfere with any existing protocol, so I'd like to take this approach going forward.

Re: Complex Number support in Arrow

2021-06-10 Thread Antoine Pitrou
On 10/06/2021 at 09:20, Simon Perkins wrote: Ah so Arrow Structs are represented as a Struct of Arrays (SoA) vs an Array of Structs (AoS)? If you are not familiar with the Arrow format, I would suggest you start by reading https://arrow.apache.org/docs/format/Columnar.html (see "Struct

Re: Complex Number support in Arrow

2021-06-10 Thread Simon Perkins
Hi Micah Please see a recent discussion on adding new types [1] > Thanks, this is useful. > My understanding is that feather.fbs is for V1 feather files and probably > shouldn't be touched. Only updating schema.fbs should be required and the > type should be doable in a backwards/forwards com

Re: Complex Number support in Arrow

2021-06-09 Thread Antoine Pitrou
On Wed, 9 Jun 2021 15:34:41 -0700 Micah Kornfield wrote: > Hi Antoine, > In regards to conceptual simplicity, I might have misinterpreted when you > wrote: > > Since complex numbers are quite common in some domains, and since they > > are conceptually simple, > > > It seemed like a justificat

Re: Complex Number support in Arrow

2021-06-09 Thread Micah Kornfield
Hi Antoine, In regards to conceptual simplicity, I might have misinterpreted when you wrote: Since complex numbers are quite common in some domains, and since they > are conceptually simple, It seemed like a justification for adding them as a first class type. Thanks, Micah On Wed, Jun 9, 202

Re: Complex Number support in Arrow

2021-06-09 Thread Antoine Pitrou
On 10/06/2021 at 00:05, Micah Kornfield wrote: While dedicated types are not strictly required, compute functions would be much easier to add for a first-class dedicated complex datatype rather than for an extension type. It seems like maybe this is an area to focus on? I'm not sure conce

Re: Complex Number support in Arrow

2021-06-09 Thread Micah Kornfield
> > While dedicated types are not strictly required, compute functions would > be much easier to add for a first-class dedicated complex datatype > rather than for an extension type. It seems like maybe this is an area to focus on? I'm not sure conceptually simple is the right criterion to apply

Re: Complex Number support in Arrow

2021-06-09 Thread Wes McKinney
I think that having a top-level type for complex numbers would be nicer than extension types, so it would look like table Complex { precision: Precision; } and the representation is a packed tuple of two floating point numbers of the indicated precision (I think this is the standard way that pe
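
As a rough illustration of "a packed tuple of two floating point numbers of the indicated precision", the following standard-library sketch packs one complex value at two precisions. The names `FORMATS`, `pack_complex`, and `unpack_complex` are hypothetical, the `"SINGLE"` and `"DOUBLE"` keys are chosen only to echo the `Precision` field in the proposal, and little-endian byte order is an assumption of this example.

```python
import struct

# Hypothetical mapping from a precision name to a struct format:
# two little-endian floats (re, im) packed back to back.
FORMATS = {"SINGLE": "<2f", "DOUBLE": "<2d"}


def pack_complex(z: complex, precision: str) -> bytes:
    """Pack a complex number as (re, im) at the given precision."""
    return struct.pack(FORMATS[precision], z.real, z.imag)


def unpack_complex(data: bytes, precision: str) -> complex:
    """Inverse of pack_complex."""
    re, im = struct.unpack(FORMATS[precision], data)
    return complex(re, im)
```

With this sketch a single-precision value occupies 8 bytes and a double-precision value 16 bytes, with the real part first and the imaginary part immediately after it.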

Re: Complex Number support in Arrow

2021-06-09 Thread Antoine Pitrou
On 09/06/2021 at 17:52, Micah Kornfield wrote: Adding a new first-class type in Arrow requires working integration tests between C++ and Java libraries (once the idea is informally agreed upon) and then a final vote for approval. We haven't formalized extension types but I imagine a similar

Re: Complex Number support in Arrow

2021-06-09 Thread Micah Kornfield
Hi Simon, Please see a recent discussion on adding new types [1] - Adding first class complex types seems to involve modifying >cpp/src/arrow/ipc/feather.fbs which may change the protocol and > introduce >breaking changes. I'm not sure about this and seek advice on how > invasive >t