Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2022-04-14 Thread Micah Kornfield

> 1. Is there any reason to expect these will need to be batched into one new version of the Arrow format? Or would we have no problem adding (as a theoretical example) RLE arrays in format version 2, and then later string views in version 3?

These shouldn't require a major version bump i

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2022-04-14 Thread Will Jones

Hi all, I have a few questions to understand expectations for how work on these could proceed: 1. Is there any reason to expect these will need to be batched into one new version of the Arrow format? Or would we have no problem adding (as a theoretical example) RLE arrays in format version 2, an

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2022-01-19 Thread Jorge Cardoso Leitão

I have prototyped the sequence views in Rust [1], and it seems a pretty straightforward addition with a trivial representation in both IPC and FFI. I did observe a performance difference between using signed (int64) and unsigned (uint64) offsets/lengths:

take/sequence/20 time: [20.49

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2022-01-12 Thread Andrew Lamb

I also agree that splitting the StringView proposal into its own thing would be beneficial for discussion clarity.

On Wed, Jan 12, 2022 at 5:34 AM Antoine Pitrou wrote:
> On 12/01/2022 at 01:49, Wes McKinney wrote:
> > hi all,
> >
> > Thank you for all the comments on this mailing list thread

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2022-01-12 Thread Antoine Pitrou

On 12/01/2022 at 01:49, Wes McKinney wrote: hi all, Thank you for all the comments on this mailing list thread and in the Google document. There is definitely a lot of work to take some next steps from here, so I think it would make sense to fork off each of the proposed additions into dedic

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2022-01-11 Thread Wes McKinney

hi all, Thank you for all the comments on this mailing list thread and in the Google document. There is definitely a lot of work to take some next steps from here, so I think it would make sense to fork off each of the proposed additions into dedicated discussions. The most contentious issue, it s

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2022-01-08 Thread Jorge Cardoso Leitão

Fair enough (wrt deprecation). I think that the sequence view is a replacement for our existing (that allows O(N) selections), but I agree with the sentiment that preserving compatibility is more important than a single way of doing it. Thanks for that angle! Imo the Arrow format is already compo

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-26 Thread Antoine Pitrou

On 23/12/2021 at 17:59, Neal Richardson wrote: I think in this particular case, we should consider the C ABI / in-memory representation and IPC format as separate beasts. If an implementation of Arrow does not want to use this string-view array type at all (for example, if it created memory

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-23 Thread Andrew Lamb

> If we go forward with these changes, it would be a good opportunity for us to clarify in our docs/website that the "Arrow format" is not a single thing.

The idea of using Arrow as a common memory format for interchange between C/C++ implementations makes lots of sense to me. What if we took a m

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-23 Thread Neal Richardson

> I think in this particular case, we should consider the C ABI / in-memory representation and IPC format as separate beasts. If an implementation of Arrow does not want to use this string-view array type at all (for example, if it created memory safety issues in Rust), then it can choose t

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-22 Thread Wes McKinney

hi Andrew,

On Thu, Dec 16, 2021 at 2:40 PM Andrew Lamb wrote:
> > DuckDB and Velox are two projects which have designed themselves to be very nearly Arrow-compatible but have implemented alternative memory layouts to achieve O(# records) selections on all data types. I am proposing

RE: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-16 Thread Yang, Binwei

> DuckDB and Velox are two projects which have designed themselves to be very nearly Arrow-compatible but have implemented alternative memory layouts to achieve O(# records) selections on al

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-16 Thread Andrew Lamb

> DuckDB and Velox are two projects which have designed themselves to be very nearly Arrow-compatible but have implemented alternative memory layouts to achieve O(# records) selections on all data types. I am proposing to adopt these innovations as additional memory layouts in Arrow with a

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-15 Thread Wes McKinney

On Wed, Dec 15, 2021 at 6:22 PM Micah Kornfield wrote:
> > In any case, having memory layouts that support O(# records) selections on strings and nested data will greatly benefit some data processing systems built on Arrow.
>
> Wes, something that still isn't clear to me, are we proposin

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-15 Thread Micah Kornfield

> In any case, having memory layouts that support O(# records) selections on strings and nested data will greatly benefit some data processing systems built on Arrow.

Wes, something that still isn't clear to me, are we proposing these new encodings for ONLY the C-ABI or do we want to plumb t

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-15 Thread Wes McKinney

On Wed, Dec 15, 2021 at 3:56 PM Micah Kornfield wrote:
> > Big +1 in replacing our current representation of variable-sized arrays by the "sequence view". atm I am -0.5 in adding it without removing the [Large]Utf8Array / Binary / List, as I see the advantages as sufficiently lar

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-15 Thread Micah Kornfield

> Big +1 in replacing our current representation of variable-sized arrays by the "sequence view". atm I am -0.5 in adding it without removing the [Large]Utf8Array / Binary / List, as I see the advantages as sufficiently large to break compatibility and deprecate the previous representations

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-15 Thread Weston Pace

> I am -0.5 in adding it without removing the [Large]Utf8Array / Binary / List

I'm not sure about dropping List. Is SequenceView semantically equivalent to List / FixedSizeList? In other words, is SequenceView a nested type? The document seems to suggest it is but the use case you described d

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-15 Thread Jorge Cardoso Leitão

Hi, Thanks a lot for this initiative and the write-up. I did a small bench for the sequence view and added a graph to the document for evidence of what Wes is writing wrt performance of "selection / take / filter". Big +1 in replacing our current representation of variable-sized arrays by the

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-14 Thread Wes McKinney

Ultimately, the problem comes down to providing a means of O(# records) selection (take, filter) performance and memory use for non-numeric data (strings, arrays, maps, etc.). DuckDB and Velox are two projects which have designed themselves to be very nearly Arrow-compatible but have implemented a

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-14 Thread Weston Pace

Would it be simpler to change the spec so that child arrays can be chunked? This might reduce the data type growth and make the intent more clear. This will add another dimension to performance analysis. We pretty regularly get issues/tickets from users that have unknowingly created parquet file

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-14 Thread Wes McKinney

hi folks, A few things in the general discussion, before certain things will have to be split off into their own dedicated discussions. It seems that I didn't do a very good job of motivating the "sequence view" type. Let me take a step back and discuss one of the problems these new memory layout

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-14 Thread Antoine Pitrou

Hello, I think my main concern is how we can prevent the community from fragmenting too much over supported encodings. The more complex the encodings, the less likely they are to be supported by all main implementations. We see this in Parquet where the efficient "delta" encodings have ju

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-13 Thread Micah Kornfield

Hi Wes, I'm also in favor of most of this, I need to think more about the new list layout, and I think the RLE encoding as proposed contains redundancies with dictionary encoding data we might not want. A further question on this, do you expect all of this to be packaged up as a RecordBatch for IP

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-13 Thread Andrew Lamb

Thank you for writing this down, Wes. I think my project is very interested in the RLE encoding and constant view. The StringView, as written, seems fairly tightly tied to C/C++, though I may be mistaken. I think allowing Rust to consume such StringViews would be possible but it seems very unlikely

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-10 Thread Jacques Nadeau

I'm strongly in support of much of this. Thanks for bringing this up. It is long overdue. On initial read, my thoughts would be:

Strongly inclined:
- String view
- constant view

Weakly inclined
- All null
- rle

Somewhat disinclined
- Sequence change

With dictionary and string view, I feel like
