Hi Abeykoon,

Thank you for your reply. It has given me some new ideas about Arrow.

The changes I made to pyarrow.jvm do indeed only work in my case, since I was 
using pemja instead of jpype, and the two treat data differently. I'm currently 
thinking of writing C data interface functions that do the same thing, to 
avoid changing pyarrow.jvm. But it's hard for me to find a way to convert Java 
Arrow data (a root, record batches, or other structures) directly into Python 
Arrow data with C code, the way pyarrow.jvm can.

I have read a lot about the Java C Data interface and how PyArrow can integrate 
with Java, but I haven't found a solution to the question above. Do you have any 
ideas or suggestions?

Thanks again for your time and reading.
________________________________
From: Vibhatha Abeykoon <[email protected]>
Sent: June 20, 2024 8:46
To: [email protected] <[email protected]>
Subject: Re: [Java][Python] How to pass arrow data from Java to Python using C data 
Interface

Hi Zhang,

I think you're on the right track, but I wouldn't recommend changing 
pyarrow.jvm without discussing it on the dev ML first.
One point: rather than passing a `VectorSchemaRoot` object directly to Python, 
wouldn't it be better to stick to record batches?

The Java C Data interface already has functions for that. `VectorSchemaRoot` 
is not a concept in Python, so it would be better to reconstruct the Table or 
Dictionary on the PyArrow side. Just a thought.


On Fri, Mar 22, 2024 at 9:14 AM Zhang Manwei <[email protected]> wrote:
An update from me:

I have been experimenting and found two ways to achieve this goal, which is to 
call Python from Java and transfer Arrow data to Python. I use pemja, as it lets 
Java call Python methods in-process and lets Python call back into Java.

The code is here: https://github.com/shinyano/arrow-java-python-example; the 
test code is in src/test. These are the two methods I use:


  1. **Use ArrowArray**: basically, I use `_import_from_c` and `_export_to_c` in 
     pyarrow, just like the official examples in my original mail, except that 
     Java calls Python rather than Python calling Java.
  2. **Use record_batch()**: with the help of pemja, Python can call a Java 
     object's methods just as in Java. So I'm able to pass a Java 
     VectorSchemaRoot object directly to Python and do a simple 
     `jvm.record_batch(root)` to get a record batch from it.

However, as pemja does automatic type casting when Python calls back into Java, 
I had to make some minor code changes in pyarrow.jvm.
________________________________
From: Zhang Manwei <[email protected]>
Sent: March 18, 2024 11:21
To: [email protected] <[email protected]>
Subject: [Java][Python] How to pass arrow data from Java to Python using C data 
Interface

Hi, I'm trying to find a way to transfer Arrow data between Java and Python 
without memory copying, writing files to disk, or sockets. As Plasma has been 
removed, I'm looking for a solution in the C data interface.

I went through the examples here 
(https://arrow.apache.org/docs/python/integration/python_java.html#java-to-python-communication-using-the-c-data-interface)
in the Arrow docs, but I can't figure out how to create the schema and data on 
the Java side and then provide them to Python.

I was thinking of letting Python provide a pointer to a writable stream/memory 
buffer to Java, or writing data into a buffer in Java and then passing the 
address to Python. But I don't know whether that's possible.
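
The first idea (Python hands Java a pointer to fill) roughly matches how the C data interface is meant to be used: the consumer allocates two small empty structs and passes their addresses to the producer. A minimal sketch of the Python half follows, using only `ctypes`. The struct layouts follow the published C data interface definitions; the helper `allocate_c_structs` is hypothetical, and the Java call in the comment is an assumption about how the Java side might be wired up, not something verified here.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """Mirrors `struct ArrowSchema` from the Arrow C data interface."""


ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_void_p),  # binary blob, not a C string
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.c_void_p),  # NULL means empty or already released
    ("private_data", ctypes.c_void_p),
]


class ArrowArray(ctypes.Structure):
    """Mirrors `struct ArrowArray` from the Arrow C data interface."""


ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.c_void_p),  # NULL means empty or already released
    ("private_data", ctypes.c_void_p),
]


def allocate_c_structs():
    """Hypothetical helper: allocate zeroed structs and return them together
    with their addresses. The struct objects must be kept alive by the caller
    for as long as the addresses are in use."""
    schema = ArrowSchema()
    array = ArrowArray()
    return schema, array, ctypes.addressof(schema), ctypes.addressof(array)


schema, array, schema_addr, array_addr = allocate_c_structs()

# Assumed next step (not run here): pass the two integers to Java, which would
# export a VectorSchemaRoot into them, e.g. via the Java C Data module's
# Data.exportVectorSchemaRoot with ArrowArray.wrap(array_addr) and
# ArrowSchema.wrap(schema_addr). Once filled, `release` becomes non-NULL and
# pyarrow.RecordBatch._import_from_c(array_addr, schema_addr) can consume them.
```

Only the two small structs cross the language boundary; the data buffers they point to stay where the producer allocated them, which is what avoids the copy. This works only when Java and Python share one process.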

Please let me know your opinions, many thanks!
