Hi Abeykoon,

Thank you for your reply. It has really given me some new ideas about Arrow.
The changes I made to pyarrow.jvm do indeed work only in my case, since I was using pemja instead of jpype, and the two treat data differently. I'm currently thinking of creating C Data Interface functions that do the same thing, so that pyarrow.jvm doesn't need to change. But I'm finding it hard to see how C code could directly turn Java Arrow data (a root, record batches, or other structures) into Python Arrow data the way pyarrow.jvm can. I have read a lot about the Java C Data Interface and how PyArrow can integrate with Java, but I haven't found a solution to the question above. Do you have any ideas or suggestions?

Thanks again for your time and reading.

________________________________
From: Vibhatha Abeykoon <[email protected]>
Sent: June 20, 2024, 8:46
To: [email protected] <[email protected]>
Subject: Re: [Java][Python] How to pass arrow data from Java to Python using C data Interface

Hi Zhang,

I think you're on the right track, but I wouldn't recommend a change to pyarrow.jvm without discussing it on the dev ML. One point: rather than passing a `VectorSchemaRoot` object directly to Python, wouldn't it be better to stick to record batches? The Java C Data interface already has functions for that. `VectorSchemaRoot` is not a concept in Python, so it would be better to reconstruct the Table or Dictionary the PyArrow way. Just a thought.

On Fri, Mar 22, 2024 at 9:14 AM Zhang Manwei <[email protected]> wrote:

An update from me: I have been experimenting and found two methods to achieve this goal, which is to call Python from Java and transfer Arrow data to Python. I use pemja because it lets Java call Python methods in-process and lets Python call back into Java. The code is here: https://github.com/shinyano/arrow-java-python-example (the test code is in src/test). These are the two methods I use:

1. **Use ArrowArray**: basically, I use `_import_from_c` and `_export_to_c` in pyarrow, just like the official examples in my original mail.
But here it is Java calling Python, not Python calling Java.
2. **Use record_batch()**: with the help of pemja, Python can call a Java object's methods just as Java would. So I'm able to pass a Java `VectorSchemaRoot` object directly to Python and do a simple `jvm.record_batch(root)` to get a record batch from it. However, as pemja does automatic type casting when Python calls back into Java, I had to make some minor code changes in pyarrow.jvm.

________________________________
From: Zhang Manwei <[email protected]>
Sent: March 18, 2024, 11:21
To: [email protected] <[email protected]>
Subject: [Java][Python] How to pass arrow data from Java to Python using C data Interface

Hi,

I'm trying to find a way to transfer Arrow data between Java and Python without memory copying, writing files to disk, or sockets. As plasma has been removed, I'm looking for a solution based on the C Data Interface. I went through the examples in the Arrow docs (https://arrow.apache.org/docs/python/integration/python_java.html#java-to-python-communication-using-the-c-data-interface), but I can't figure out how to create the schema and data on the Java side and then provide them to Python. I was thinking of letting Python provide Java with a pointer to a writable stream or memory buffer, or of writing data into a buffer in Java and then passing the address to Python. But I don't know whether that is possible. Please let me know your opinions. Many thanks!
