Re: [DISCUSS] Canonical alternative layout proposal

2023-08-05 Thread Felipe Oliveira Carvalho
> I think this is similar to the proposal with the exception that your
> suggestion would require amending existing types that happen to be
> alternatives to each other.
I want to avoid electing one canonical layout for a kind (AKA "logical type"). And the existence of "alternative layouts"

Re: [DISCUSS] Canonical alternative layout proposal

2023-08-02 Thread Weston Pace
> I would welcome a draft PR showcasing the changes necessary in the IPC
> format definition, and in the C Data Interface specification (no need to
> actually implement them for now :-)).
I've proposed something at [1].
> One sketch of an idea: define sets of types that we can call “kinds”**
>

Re: [DISCUSS] Canonical alternative layout proposal

2023-08-01 Thread Felipe Oliveira Carvalho
A major difficulty in making the Arrow array types open for extension [1] is that as soon as we define (a) a universal representation* or (b) an abstract interface, we close the door on vectorization. (a) prevents having new vectorization-friendly formats and (b) limits the implementation of new

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-30 Thread Gang Wu
I am also in favor of the idea of an alternative layout. IIRC, a new alternative layout still goes through a standardization process, though each implementation chooses whether to support it now or later. I'd like to ask if we can provide the flexibility for implementations or downstream

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-18 Thread Antoine Pitrou
Hello, I'm trying to reason about the advantages and drawbacks of this proposal, but it seems to me that it lacks definition. I would welcome a draft PR showcasing the changes necessary in the IPC format definition, and in the C Data Interface specification (no need to actually implement

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-14 Thread Andrew Lamb
Thank you Neal for writing this summary and everyone whose thoughts went into the discussions -- I think the proposal, as summarized, offers a great path forward by allowing the various Arrow communities to specialize when advantageous but remain compatible.
On Thu, Jul 13, 2023 at 11:59 AM Ian

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-13 Thread Raphael Taylor-Davies
> clarify what constitutes support for a canonical alternative layout
I had envisaged, perhaps naively, that we would just add a new DataType containing a string layout name, perhaps DataType::Raw(String). This would have no restrictions on the number of buffers, children, etc. and would
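The `DataType::Raw(String)` idea in this message could be sketched roughly as follows. This is only an illustration under stated assumptions: the `DataType` and `ArrayData` types below are simplified stand-ins rather than the arrow-rs API, and the helper functions and the layout name `"utf8_view"` are hypothetical.

```rust
// Hypothetical sketch: these types are simplified stand-ins, NOT the arrow-rs API.

/// A tiny subset of logical types plus the proposed opaque variant.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    Int64,
    /// A layout identified only by its name (the `Raw(String)` idea from the thread).
    Raw(String),
}

/// Stand-in for array data: a type tag plus untyped buffers and children.
/// For `Raw`, nothing constrains how many buffers or children are present.
#[derive(Debug, Clone, PartialEq)]
pub struct ArrayData {
    pub data_type: DataType,
    pub buffers: Vec<Vec<u8>>,
    pub children: Vec<ArrayData>,
}

/// Returns the layout name when the array uses a raw layout (hypothetical helper).
pub fn raw_layout_name(array: &ArrayData) -> Option<&str> {
    match &array.data_type {
        DataType::Raw(name) => Some(name.as_str()),
        _ => None,
    }
}

/// A consumer accepts every non-raw array, and a raw array only when it
/// recognizes the layout name (hypothetical helper).
pub fn can_process(array: &ArrayData, known_layouts: &[&str]) -> bool {
    match raw_layout_name(array) {
        Some(name) => known_layouts.iter().any(|known| *known == name),
        None => true,
    }
}
```

In this sketch a consumer that does not recognize a layout name can still carry the buffers along untouched, since the type places no restrictions on them; it simply declines to interpret them.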

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-13 Thread Benjamin Kietzman
Canonical alternative layouts sounds like a workable path forward. Perhaps understandably, my immediate thought is how I could rephrase Utf8View as a canonical alternative layout for Utf8. In light of that, I have a few questions to clarify what constitutes support for a canonical alternative

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-13 Thread Aldrin
Thanks Neal and Weston! I prepared a diagram to solidify my own understanding of the context, which can be found at [1]. I think alternative layouts sounds like a nice first approach to allowing new layouts that can be supported lazily (implemented when it is beneficial) by various

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-13 Thread Dane Pitkin
I am in favor of this proposal. IMO the Arrow project is the right place to standardize both the interoperability *and operability* of columnar data layouts. Data engines are a core component of the Arrow ecosystem and the project should be able to grow with these data engines as they converge on

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-13 Thread Ian Cook
Thank you Weston for proposing this solution and Neal for describing its context and implications. I agree with the other replies here—this seems like an elegant solution to a growing need that could, if left unaddressed, increase the fragmentation of the ecosystem and reduce the centrality of the

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-13 Thread Raphael Taylor-Davies
I like this proposal; I think it strikes a pragmatic balance between preserving interoperability and allowing new ideas to be incorporated into the standard. Thank you for writing this up.
On 13/07/2023 10:22, Matt Topol wrote:
> I don't have much to add but I do want to second Jacob's

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-13 Thread Matt Topol
I don't have much to add but I do want to second Jacob's comments. I agree that this is a good way to avoid the fragmentation while keeping Arrow relevant, and likely something we need to do so that we can ensure Arrow remains the way to do this data integration and interoperability.
On Wed, Jul

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-12 Thread Jacob Wujciak-Jens
Hello Everyone, Thanks for this comprehensive but concise write-up, Neal! I think this proposal is a good way to avoid both fragmentation of the Arrow ecosystem and its obsolescence. In my opinion, of these two problems obsolescence is the bigger issue, as (as mentioned in the proposal)

[DISCUSS] Canonical alternative layout proposal

2023-07-12 Thread Neal Richardson
Hi all, As was previously raised in [1] and surfaced again in [2], there is a proposal for representing alternative layouts. The intent, as I understand it, is to be able to support memory layouts that some (but perhaps not all) applications of Arrow find valuable, so that these nearly Arrow