On Mon, Jun 21, 2021 at 4:58 PM Antoine Pitrou wrote:
>
> I certainly don't think we should have extension types with a different
> type id. IMHO, it's a recipe for confusion.
>
Thanks, I think I got confused by the different perspectives in the thread.
I'll do some more exploratory coding with
I certainly don't think we should have extension types with a different
type id. IMHO, it's a recipe for confusion.
Regards
Antoine.
On 21/06/2021 at 15:54, Simon Perkins wrote:
To put it another way, an Extension Type technically has Type::EXTENSION,
but now there's Type::COMPLEX_FLOA
To put it another way, an Extension Type technically has Type::EXTENSION,
but now there's Type::COMPLEX_FLOAT and Type::COMPLEX_DOUBLE.
When checking enums, the code sees a Type::COMPLEX_FLOAT and seems to
mismatch on ComplexFloatType::Type::type_id, as the latter is
Type::EXTENSION?
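
To make the mismatch described above concrete, here is a toy Python model. It is only a sketch: TypeId, ExtensionType, ComplexFloatType and is_complex_float are hypothetical names for illustration, not Arrow's C++ API. It assumes, as the message suggests, that an extension type reports the generic EXTENSION id while the enum also carries dedicated complex entries.

```python
from enum import Enum, auto


class TypeId(Enum):
    # Hypothetical stand-in for the C++ Type enum discussed in the thread.
    EXTENSION = auto()
    COMPLEX_FLOAT = auto()
    COMPLEX_DOUBLE = auto()


class ExtensionType:
    # Assumption: every extension type reports the generic EXTENSION id.
    type_id = TypeId.EXTENSION

    def id(self):
        return self.type_id


class ComplexFloatType(ExtensionType):
    """Complex float modelled as an extension type; inherits EXTENSION."""


def is_complex_float(data_type):
    # Code that dispatches on the enum looks for the dedicated entry...
    return data_type.id() is TypeId.COMPLEX_FLOAT


# ...but the extension type never reports it, so the check fails.
complex_type = ComplexFloatType()
print(complex_type.id(), is_complex_float(complex_type))
```

In this model the dedicated enum entry and the extension type's reported id disagree, which seems to be the confusion being described.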
On Mon, Jun
I did some exploratory coding adding Complex Numbers as ExtensionTypes in
this PR: https://github.com/apache/arrow/pull/10565
> My understanding is that it means having COMPLEX as an entry in the
arrow/type_fwd.h Type enum. I agree this would make implementation
work in the C++ library much more s
>
> It would still be desirable to maintain the memory layout of C/C++/NumPy to
> maintain zero-copy.
> FixedList[2] maintains this layout, while a Struct[re, im] does not.
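
As a rough illustration of the zero-copy point, the sketch below uses only the Python standard library. It assumes the usual C/NumPy convention of interleaved (re, im) pairs of doubles; the variable names are made up for the example and no Arrow API is involved.

```python
import struct

# Two complex numbers, 1+2j and 3+4j, stored as interleaved native doubles.
raw = bytearray(struct.pack("4d", 1.0, 2.0, 3.0, 4.0))

# A FixedList[2]-style flat values buffer is the same bytes reinterpreted:
# element i occupies flat[2*i : 2*i + 2]. No copy is made here.
flat = memoryview(raw).cast("d")

# Writing through the view changes the original buffer, which shows that
# the memory is shared rather than copied.
flat[0] = 10.0
first_real = struct.unpack_from("d", raw, 0)[0]

# The real and imaginary parts are only reachable as strided views.
# A Struct[re, im]-style layout wants one contiguous buffer per field,
# so producing it from this memory would require copying.
real_view = flat[0::2]
imag_view = flat[1::2]
print(first_real, real_view.tolist(), imag_view.tolist(), real_view.contiguous)
```

The strided views are not contiguous, which is why the interleaved buffer cannot be handed over as two separate field buffers without a copy.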
I noted this before but there are some gaps in Parquet support for
FixedSizeList around null handling. Just something to be
On 14/06/2021 at 10:54, Simon Perkins wrote:
> The reason why I am being nit-picky here is I think that having a first
class type indicates that it should eventually be supported by all
reference implementations. A "well known" extension type I think offers
less guarantees which makes it
> The reason why I am being nit-picky here is I think that having a first
class type indicates that it should eventually be supported by all
reference implementations. A "well known" extension type I think offers
less guarantees which makes it seem more suitable for niche types.
What are the re
> What I conclude is that this does not seem to be a problem about a base
> in-memory representation, but rather on whether we agree on a
> representation that justifies adding associated metadata to the spec.
>
It would still be desirable to maintain the memory layout of C/C++/NumPy to
maintain z
Thanks Micah. Those criteria seem reasonable (and that discussion was
recent enough that my memory of it should have been sharper). I've created
https://issues.apache.org/jira/browse/ARROW-13055 so that we can document
this decision. IMO we don't need a vote on these criteria--seems like there
was
>
> It might help this discussion and future discussions like it if we could
> define how it is determined whether a type should be part of the Arrow
> format, an extension type (and what does it mean to say there is a
> "canonical" extension type), or just something that a language
> implementati
Isn't an array of complexes represented by what arrow already supports? In
particular, I see at least two valid in-memory representations to use, that
depend on what we are going to do with it:
* Struct[re, im]
* FixedList[2]
In the first case, we have two buffers, [x0, x1, ...] and [y0, y1, ...]
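
The two candidate representations can be sketched with plain Python arrays. This is a minimal sketch: struct_re, struct_im, fixed_list and fixed_list_get are names invented for the example, and the arrays only stand in for Arrow buffers.

```python
from array import array

values = [complex(1.0, 2.0), complex(3.0, 4.0), complex(5.0, 6.0)]

# Struct[re, im]-style: one buffer per field, [x0, x1, ...] and [y0, y1, ...].
struct_re = array("d", (z.real for z in values))
struct_im = array("d", (z.imag for z in values))

# FixedList[2]-style: a single flat buffer [x0, y0, x1, y1, ...].
fixed_list = array("d")
for z in values:
    fixed_list.extend((z.real, z.imag))


def fixed_list_get(buf, i):
    """Rebuild element i from its interleaved (re, im) pair."""
    return complex(buf[2 * i], buf[2 * i + 1])


print(list(struct_re), list(struct_im))
print(list(fixed_list), fixed_list_get(fixed_list, 1))
```

Which layout is preferable depends on the access pattern: the split buffers suit operations on one component at a time, while the interleaved buffer keeps each complex value adjacent in memory.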
It might help this discussion and future discussions like it if we could
define how it is determined whether a type should be part of the Arrow
format, an extension type (and what does it mean to say there is a
"canonical" extension type), or just something that a language
implementation or downst
>
> My understanding is that it means having COMPLEX as an entry in the
> arrow/type_fwd.h Type enum. I agree this would make implementation
> work in the C++ library much more straightforward.
One idea I proposed would be to do that, and implement the
> serialization of the complex metadata using
My understanding is that it means having COMPLEX as an entry in the
arrow/type_fwd.h Type enum. I agree this would make implementation
work in the C++ library much more straightforward.
One idea I proposed would be to do that, and implement the
serialization of the complex metadata using Extension
> While dedicated types are not strictly required, compute functions would
> be much easier to add for a first-class dedicated complex datatype
> rather than for an extension type.
@pitrou
This is perhaps a naive question (and admittedly, I'm not up to speed
on my compute kernels) but why is this
I'd be supportive of starting with this as a "canonical" extension
type so that all implementations are not expected to support complex
types — this would encourage us to build sufficient integration e.g.
with NumPy to get things working end-to-end with the on-wire
representation being an extension
>
> I'm convinced now that first-class types seem to be the way to go and I'm
> happy to take this approach.
I agree from an implementation effort it is simpler, but I'm still not
convinced that we should be adding this as a first class type. As noted in
the survey below it appears Complex numbe
On Wed, Jun 9, 2021 at 7:56 PM Antoine Pitrou wrote:
>
> On 09/06/2021 at 17:52, Micah Kornfield wrote:
> >
> > Adding a new first-class type in Arrow requires working integration tests
> > between C++ and Java libraries (once the idea is informally agreed upon)
> > and then a final vote for ap
On Wed, Jun 9, 2021 at 11:25 PM Wes McKinney wrote:
> I think that having a top-level type for complex numbers would be
> nicer than extension types
Agreed. As Micah mentioned, adding these types doesn't seem to interfere with
any existing protocol, so I'd like to take this approach going forward.
On 10/06/2021 at 09:20, Simon Perkins wrote:
Ah so Arrow Structs are represented as a Struct of Arrays (SoA) vs an Array
of Structs (AoS)?
If you are not familiar with the Arrow format, I would suggest you start
by reading https://arrow.apache.org/docs/format/Columnar.html
(see "Struct
Hi Micah
Please see a recent discussion on adding new types [1]
>
Thanks, this is useful.
> My understanding is that feather.fbs is for V1 feather files and probably
> shouldn't be touched. Only updating schema.fbs should be required and the
> type should be doable in a backwards/forwards com
On Wed, 9 Jun 2021 15:34:41 -0700
Micah Kornfield wrote:
> Hi Antoine,
> In regards to conceptual simplicity, I might have misinterpreted when you
> wrote:
>
> Since complex numbers are quite common in some domains, and since they
> > are conceptually simple,
>
>
> It seemed like a justificat
Hi Antoine,
In regards to conceptual simplicity, I might have misinterpreted when you
wrote:
Since complex numbers are quite common in some domains, and since they
> are conceptually simple,
It seemed like a justification for adding them as a first class type.
Thanks,
Micah
On Wed, Jun 9, 202
On 10/06/2021 at 00:05, Micah Kornfield wrote:
While dedicated types are not strictly required, compute functions would
be much easier to add for a first-class dedicated complex datatype
rather than for an extension type.
It seems like maybe this is an area to focus on? I'm not sure conce
>
> While dedicated types are not strictly required, compute functions would
> be much easier to add for a first-class dedicated complex datatype
> rather than for an extension type.
It seems like maybe this is an area to focus on? I'm not sure conceptually
simple is the right criterion to apply
I think that having a top-level type for complex numbers would be
nicer than extension types, so it would look like
table Complex {
precision: Precision;
}
and the representation is a packed tuple of two floating point numbers
of the indicated precision (I think this is the standard way that
pe
On 09/06/2021 at 17:52, Micah Kornfield wrote:
Adding a new first-class type in Arrow requires working integration tests
between C++ and Java libraries (once the idea is informally agreed upon)
and then a final vote for approval. We haven't formalized extension types
but I imagine a similar
Hi Simon,
Please see a recent discussion on adding new types [1]
> - Adding first class complex types seems to involve modifying
> cpp/src/arrow/ipc/feather.fbs which may change the protocol and introduce
> breaking changes. I'm not sure about this and seek advice on how invasive
> t
Greetings Apache Dev Mailing List
I'm interested in adding complex number support to Arrow. The use case is
Radio Astronomy data, which is represented by complex values.
xref https://issues.apache.org/jira/browse/ARROW-638
xref https://github.com/apache/arrow/pull/10452
It's fairly easy to suppo