Re: data-source UDFs

2022-06-04 Thread Weston Pace
> The former is about facilities (like extension points) for implementing custom data sources in Arrow whereas the latter is about facilities for integrating in PyArrow (existing or future) data sources written/wrapped in Python
Any C++ extension point could become a pyarrow extension poi

Re: [C++] Adding Run-Length Encoding to Arrow

2022-06-04 Thread Andrew Lamb
I think the biggest benefit of RLE is not on-the-wire compression, as that can be done via more general-purpose compression schemes, as Antoine mentions. The biggest benefit of RLE is that it allows operating directly and very efficiently on the "encoded" form -- for example, you can apply filters
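
To illustrate the point about operating on the encoded form, here is a minimal pure-Python sketch, not Arrow's implementation or API. The names `rle_encode`, `rle_filter`, and `rle_decode` are hypothetical helpers invented for this example, and the representation of runs as `(value, run_length)` pairs is an assumption. The filter evaluates the predicate once per run rather than once per element.

```python
from typing import Callable, List, Tuple

# Assumed representation for this sketch: one (value, run_length) pair per run.
Run = Tuple[int, int]


def rle_encode(values: List[int]) -> List[Run]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_filter(runs: List[Run], predicate: Callable[[int], bool]) -> List[Run]:
    """Keep or drop whole runs; the predicate runs once per run, not per element.

    Note: neighbouring runs that end up adjacent with the same value are not
    merged here, so the result is valid but not necessarily canonical.
    """
    return [(v, n) for v, n in runs if predicate(v)]


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand runs back into a flat list of values."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


values = [7, 7, 7, 7, 2, 2, 9, 9, 9, 7]
runs = rle_encode(values)
kept = rle_filter(runs, lambda v: v > 5)
```

With this input the predicate is evaluated 4 times (once per run) instead of 10 times (once per element), and decoding `kept` gives the same result as filtering the flat list.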

Re: data-source UDFs

2022-06-04 Thread Yaron Gvili
Thanks for the detailed overview, Weston. I agree with David that this would be very useful to have in a public doc. Weston and David's discussion is a good one; however, I see it as separate from the discussion I brought up. The former is about facilities (like extension points) for implementing cu