Thank you both for the input!

//Hinko

On 17 May 2023, at 19:03, Aldrin <[email protected]> wrote:

ooh, this is cool and a great point. I wasn't thinking of the development experience with my initial response. I have used the approach I mentioned before, and since I was not using the same toolchain I had to rebuild pyarrow from source.

I'll hold off on an example of that since I think Weston's suggestion is a great one (and probably something I'll try in the near future).

# ------------------------------
# Aldrin

https://github.com/drin/
https://gitlab.com/octalene

------- Original Message -------
On Wednesday, May 17th, 2023 at 06:59, Weston Pace <[email protected]> wrote:

The approach on the page that Aldrin linked is possible, but it requires that you use the same toolchain and version as pyarrow. I would probably advise using the C data API first. By using the C data API you don't have to couple yourself so tightly with the pyarrow build. For example, your C++ extension can pin itself to Arrow version 5 and people using pyarrow 11 will still be able to use your extension without problems.

Since this question comes up fairly often I decided to create a quick minimal example of what this might look like. The example creates a C++ python module using pybind11. The C++ code relies on Arrow-C++ and interoperates with pyarrow. You would not need to use Arrow-C++; you could use nanoarrow instead, or copy the C data API headers directly into your project.

The example can be found at [1].

[1]: https://github.com/westonpace/arrow-cdata-example

On Tue, May 16, 2023 at 9:07 AM Aldrin <[email protected]> wrote:

You can definitely use C++! I will see if I can find an example, but in the meantime there's also this page in the docs [1].

[1]: https://arrow.apache.org/docs/python/integration/extending.html

On Tue, May 16, 2023 at 06:32, Hinko Kocevar <[email protected]> wrote:

Hi,

I'm trying to understand whether it is possible to integrate C/C++ code (homebrew code) with arrow such that a pyArrow user would be able to call the homebrew functions from a Python script. The idea is to pass an arrow array/table (or numpy array?) to the external code, let it work on the input(s) to produce an arrow output array, and return it to the user. Again, the user's programming language of choice is Python.

I've noticed the C data interface and C stream interface, as well as user compute functions, in the docs. It is not clear to me whether any of those support my use case and, furthermore, how I would use them from Python once implemented in C++.

For example, something like https://numpy.org/doc/stable/user/c-info.html is what I would be after.

Can this be done in (py)arrow, or should I just do it in numpy?

Thank you,
Hinko
