Re: [Python/C++] The ABI compatibility of pyarrow cython/c++ api

2021-09-01 Thread Shawn Yang
ear future). I would recommend pinning the pyarrow version you depend on and bumping the pin when new major versions are released. On Wed, Sep 1, 2021 at 5:50 AM Shawn Yang wrote: I built a multi-language library based on arrow and the python implemen
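
A minimal sketch of the pinning recommendation above, assuming a setuptools-based project; the package name and the exact version bounds are illustrative only:

    # setup.py -- hypothetical project layout
    from setuptools import setup

    setup(
        name="my_arrow_library",                     # illustrative name
        version="0.1.0",
        # pin to one pyarrow major series and bump the pin when a new
        # major version ships, since the Cython/C++ ABI may change
        install_requires=["pyarrow>=5.0.0,<6.0.0"],
    )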

[Python/C++] The ABI compatibility of pyarrow cython/c++ api

2021-09-01 Thread Shawn Yang
I built a multi-language library based on Arrow, and the Python implementation uses the Arrow Cython and C++ APIs. I want my library to be compatible with multiple pyarrow versions, but the dynamic library in my Python implementation depends on `libarrow.xxx.dylib`, and for every pyarrow version the `lib
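
For building a Cython/C++ extension against whichever pyarrow is installed at build time, pyarrow exposes helpers such as get_include(), get_libraries() and get_library_dirs(). A sketch, with the module name and source file purely illustrative:

    # setup.py fragment -- linking a Cython extension against pyarrow
    import numpy as np
    import pyarrow as pa
    from setuptools import setup
    from setuptools.extension import Extension
    from Cython.Build import cythonize

    ext = Extension(
        "my_ext",                                    # illustrative module name
        sources=["my_ext.pyx"],
        include_dirs=[np.get_include(), pa.get_include()],
        libraries=pa.get_libraries(),                # e.g. arrow, arrow_python
        library_dirs=pa.get_library_dirs(),
    )
    setup(ext_modules=cythonize([ext]))

pyarrow.create_library_symlinks() can also be called so the un-suffixed libarrow names resolve at link time, but the resulting binary is still only ABI-compatible with the pyarrow it was built against, which is why the pinning advice above applies.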

[jira] [Created] (ARROW-7848) Add doc for MapType

2020-02-13 Thread Shawn Yang (Jira)
Shawn Yang created ARROW-7848: - Summary: Add doc for MapType Key: ARROW-7848 URL: https://issues.apache.org/jira/browse/ARROW-7848 Project: Apache Arrow Issue Type: Bug Components

Re: Arrow doesn't have a MapType

2020-02-12 Thread Shawn Yang
https://github.com/apache/arrow/commit/e0c1ffe9c38d1759f1b5311f95864b0e2a406c51 On Wed, Feb 12, 2020 at 5:12 AM Shawn Yang wrote: Thanks François, I didn't find it in pyarrow. I'll check again. On Fri, Feb 7, 2020 at 9:18 PM Francois
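
For reference, a map type can be constructed in recent pyarrow releases roughly like this (the exact minimum version is not checked here):

    import pyarrow as pa

    # map<string, int64> type and an array holding two maps
    map_type = pa.map_(pa.string(), pa.int64())
    arr = pa.array([[("a", 1), ("b", 2)], [("c", 3)]], type=map_type)
    print(arr.type)   # map<string, int64>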

Re: Arrow doesn't have a MapType

2020-02-12 Thread Shawn Yang
4952a0af6/java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java#L36-L47 On Fri, Feb 7, 2020 at 3:55 AM Shawn Yang wrote: Hi guys, I'm writing a cross-language row-oriented serialization framework mainly for java/python for

Arrow doesn't have a MapType

2020-02-07 Thread Shawn Yang
Hi guys, I'm writing a cross-language row-oriented serialization framework, mainly for Java/Python for now. I defined many data types plus schema and field abstractions, such as byte, short, int, long, double, float, map, array, struct. But then I found that using an Arrow schema is a better choice. Since my framework nee
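
A small sketch of what replacing hand-rolled type definitions with an Arrow schema might look like; the field names are illustrative:

    import pyarrow as pa

    # an Arrow schema covering the primitive and nested types listed above
    schema = pa.schema([
        pa.field("flag", pa.int8()),
        pa.field("count", pa.int64()),
        pa.field("score", pa.float64()),
        pa.field("tags", pa.list_(pa.string())),
        pa.field("attrs", pa.map_(pa.string(), pa.int32())),
        pa.field("point", pa.struct([("x", pa.float32()), ("y", pa.float32())])),
    ])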

Re: Use arrow as a general data serialization framework in distributed stream data processing

2019-05-09 Thread Shawn Yang
in-memory record format" that does not require a schema compilation > step (like Thrift and Protocol Buffers do). So it might first be > worthwhile to analyze whether Avro is a solution to the problem, and > if not why exactly not. > > - Wes > > On Tue, Apr 30, 2019 at 1:36 AM S

Re: Use arrow as a general data serialization framework in distributed stream data processing

2019-04-29 Thread Shawn Yang
profiling of your benchmark script so we see where the time is spent. This can be done conveniently with yep (https://github.com/fabianp/yep). Running it through the profiler and posting the image here would be a good starting point,
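
yep wraps the gperftools CPU profiler so that C extension frames show up in the profile. A sketch of typical usage, assuming google-perftools is installed; the output file name and the run_benchmark() function are placeholders:

    # command-line form:
    #   python -m yep -- benchmark_script.py
    # programmatic form:
    import yep

    yep.start("benchmark.prof")   # start gperftools CPU profiling
    run_benchmark()               # hypothetical function under test
    yep.stop()
    # inspect with: google-pprof --svg $(which python) benchmark.prof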

Re: Use arrow as a general data serialization framework in distributed stream data processing

2019-04-25 Thread Shawn Yang
way that Ray is using UnionArray now) in a more efficient way. I would still like to understand in these particular benchmarks where the performance issue is, whether in a flamegraph or something else. Is data being copied that should not be? On Thu, Apr 25, 2019 at 6
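
For context, a dense UnionArray (the structure mentioned above for heterogeneous data) can be assembled in pyarrow roughly like this; the child layout here is only illustrative:

    import pyarrow as pa

    # type id per element (int8) and offset into the matching child (int32)
    type_ids = pa.array([0, 1, 0], type=pa.int8())
    offsets = pa.array([0, 0, 1], type=pa.int32())
    ints = pa.array([1, 2], type=pa.int64())
    strs = pa.array(["a"], type=pa.string())

    union = pa.UnionArray.from_dense(type_ids, offsets, [ints, strs])
    # logical values: 1, "a", 2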

Re: Use arrow as a general data serialization framework in distributed stream data processing

2019-04-25 Thread Shawn Yang
y case, I don't think it's reasonable to expect that there exists a framework to magically transfer and (most importantly) convert arbitrary data from Python to Java. Regards, Antoine. On 25/04/2019 at 13:56, Shawn Yang wrote: Hi Anto

Re: Use arrow as a general data serialization framework in distributed stream data processing

2019-04-25 Thread Shawn Yang
5 with out-of-band buffers in your benchmark. See https://pypi.org/project/pickle5/ Regards, Antoine. On 25/04/2019 at 11:23, Shawn Yang wrote: Hi Antoine, Here are the images: 1. use UnionArray benchmark: h
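
A minimal sketch of what "protocol 5 with out-of-band buffers" looks like with the pickle5 backport (on Python 3.8+ the standard pickle module accepts the same arguments); the array contents are arbitrary example data:

    import numpy as np
    import pickle5 as pickle   # backport of pickle protocol 5 for older Pythons

    data = np.arange(1_000_000, dtype=np.int64)

    buffers = []
    payload = pickle.dumps(data, protocol=5, buffer_callback=buffers.append)
    # `buffers` holds PickleBuffer objects that can be transferred without
    # copying; the receiver passes them back in the same order:
    restored = pickle.loads(payload, buffers=buffers)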

Re: Use arrow as a general data serialization framework in distributed stream data processing

2019-04-25 Thread Shawn Yang
and Numpy support it. You may get different pickle performance using it, especially on large data. (*) https://www.python.org/dev/peps/pep-0574/ Regards, Antoine. On 25/04/2019 at 05:19, Shawn Yang wrote: Motivate

Use arrow as a general data serialization framework in distributed stream data processing

2019-04-24 Thread Shawn Yang
Motivate: We want to use Arrow as a general data serialization framework in distributed stream data processing. We are working on Ray, written in C++ at the low level and Java/Python at the high level. We want to transfer streaming data between Java/Python/C++ efficient
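
The usual cross-language path in Arrow is the IPC stream format: record batches written in one language can be read by the stream readers in the others. A minimal Python-side sketch, with the column names and values purely illustrative:

    import pyarrow as pa

    batch = pa.RecordBatch.from_arrays(
        [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])],
        names=["id", "payload"],
    )

    # write the batch to the Arrow IPC stream format
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, batch.schema) as writer:
        writer.write_batch(batch)
    buf = sink.getvalue()

    # the same bytes can be consumed from Java or C++ stream readers
    reader = pa.ipc.open_stream(buf)
    for b in reader:
        print(b.num_rows)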