Arrow Java already has integration tests that cover this [1]. They use `jpype` to drive the Java C Data interface from Python. As long as we stick to the data structures defined by Apache Arrow, these interfaces can move Arrow data from Python to Java or from Java to Python.
[1] https://github.com/apache/arrow/blob/main/java/c/src/test/python/integration_tests.py

With Regards,
Vibhatha Abeykoon

On Fri, Jun 21, 2024 at 8:39 AM Zhang Manwei <[email protected]> wrote:

> Hi Abeykoon,
>
> Thank you for your reply. It really gives me some new thoughts about
> Arrow.
>
> The changes I made to pyarrow.jvm indeed only work in my case, since I
> was using pemja instead of jpype, and the two treat data differently. I'm
> currently thinking of writing C Data interface functions that do the same
> thing, to avoid changing pyarrow.jvm. But it is hard for me to find a way
> to turn Java Arrow data (a root, record batches, or other structures)
> directly into Python Arrow data with C code, the way pyarrow.jvm can.
>
> I have read a lot about the Java C Data interface and how PyArrow can
> integrate with Java, but I haven't found a solution to the question above.
> Do you have any ideas or suggestions?
>
> Thanks again for your time and reading.
> ------------------------------
> *From:* Vibhatha Abeykoon <[email protected]>
> *Sent:* June 20, 2024 8:46
> *To:* [email protected] <[email protected]>
> *Subject:* Re: [Java][Python] How to pass arrow data from Java to Python using
> C data Interface
>
> Hi Zhang,
>
> I think you're on the correct track, but I wouldn't recommend a change to
> pyarrow.jvm without discussing it on the dev ML.
> One point: rather than passing a `VectorSchemaRoot` object directly to
> Python, wouldn't it be better to stick to record batches?
>
> The Java C Data interface already has functions for that. The
> VectorSchemaRoot is not a concept in Python, so it would be better
> to reconstruct the Table or Dictionary the PyArrow way. Just a
> thought.
>
>
> On Fri, Mar 22, 2024 at 9:14 AM Zhang Manwei <[email protected]>
> wrote:
>
> An update from me:
>
> I have been experimenting and found two methods to achieve this goal:
> calling Python from Java and transferring Arrow data to Python. I use
> pemja, as it lets Java call Python methods in-process and lets Python
> call back into Java.
>
> Here is the code: https://github.com/shinyano/arrow-java-python-example
> (the test code is in src/test). These are the two methods I use:
>
>
> 1. **use ArrowArray**: basically I use `_import_from_c` and
> `_export_to_c` in pyarrow, just like the official examples in my original
> mail, except that Java calls Python rather than Python calling Java.
> 2. **use record_batch()**: with the help of pemja, Python can call a Java
> object's methods just as Java would. So I'm able to pass a Java
> VectorSchemaRoot object directly to Python and do a simple
> `jvm.record_batch(root)` to get a record batch from it.
>
>
> However, as pemja performs automatic type casting when Python calls back
> into Java, I had to make some minor code changes in pyarrow.jvm.
> ------------------------------
> *From:* Zhang Manwei <[email protected]>
> *Sent:* March 18, 2024 11:21
> *To:* [email protected] <[email protected]>
> *Subject:* [Java][Python] How to pass arrow data from Java to Python using C
> data Interface
>
> Hi, I'm trying to find a way to transfer Arrow data between Java and Python
> without copying memory, writing files to disk, or using sockets. As Plasma
> has been removed, I'm looking for a solution based on the C Data interface.
>
> I went through the examples in the Arrow docs (
> https://arrow.apache.org/docs/python/integration/python_java.html#java-to-python-communication-using-the-c-data-interface),
> but I can't figure out how to create the schema and data on the Java side
> and then provide them to Python.
>
> I was thinking of letting Python provide a pointer to a writable
> stream/memory buffer to Java, or of writing data into a buffer in Java and
> then passing the address to Python. But I don't know whether that is
> possible.
>
> Please let me know your opinions, many thanks!
