Re: [RESULT][VOTE] Release Apache Arrow 8.0.1 - RC0

2022-07-27 Thread Sutou Kouhei
Hi, I didn't upload release notes and documentation for 6.0.2, 7.0.1 and 8.0.1 because they are irregular releases for us: * We don't recommend that users other than Go users use 6.0.2, 7.0.1 and 8.0.1. Reasons: * There are no differences between 6.0.1 and 6.0.2, 7.0.0 and 7.0.1, or 8.0.0 and 8.0.1 except the

Re: [DISCUSS] Disable dependabot automated PRs

2022-07-27 Thread Sutou Kouhei
Hi, Why do we ignore PRs from dependabot? Generally, dependabot is useful for avoiding security vulnerabilities. Thanks, -- kou In "[DISCUSS] Disable dependabot automated PRs" on Thu, 21 Jul 2022 15:35:57 +0200, Raul Cumplido Dominguez wrote: > Hi, > > There was a discussion on Zulip dev

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-27 Thread Jorge Cardoso Leitão
Hi Laurent, I agree that there is a common pattern in converting row-based formats to Arrow. Imho the difficult part is not to map the storage format to Arrow specifically - it is to map the storage format to any in-memory (row- or columnar- based) format, since it requires in-depth knowledge

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-27 Thread Laurent Quérel
Let me clarify the proposal a bit before replying to the various pieces of feedback so far. It seems to me that the process of converting a row-oriented data source (row = set of fields or something more hierarchical) into an Arrow record repeatedly raises the same challenges. A developer who must
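The recurring row-to-column pattern described above can be illustrated, very roughly, in plain Python. This is not the proposed Arrow Intermediate Representation and not an Arrow API: `rows_to_columns` is a hypothetical helper, and a real Arrow builder would also handle type inference, nested fields and validity bitmaps. It only shows the core step of transposing records into per-field columns, with missing fields becoming nulls.

```python
from typing import Any, Dict, Iterable, List, Optional


def rows_to_columns(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[Optional[Any]]]:
    """Transpose row-oriented records (dicts) into per-field column lists.

    Hypothetical helper for illustration only. A field missing from a row
    becomes None, similar to how a columnar builder appends a null slot.
    """
    columns: Dict[str, List[Optional[Any]]] = {}
    n_rows = 0
    for row in rows:
        # A field seen for the first time is back-filled with nulls
        # for every row that was already processed.
        for name in row:
            if name not in columns:
                columns[name] = [None] * n_rows
        # Append this row's value (or a null) to every known column.
        for name, values in columns.items():
            values.append(row.get(name))
        n_rows += 1
    return columns


rows = [
    {"id": 1, "name": "a"},
    {"id": 2},
    {"id": 3, "name": "c", "score": 0.5},
]
print(rows_to_columns(rows))
# {'id': [1, 2, 3], 'name': ['a', None, 'c'], 'score': [None, None, 0.5]}
```

Even this toy version shows why a shared layer is attractive: schema discovery (new fields appearing mid-stream) and null handling are the same in every row-oriented source.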

Re: Arrow Flight usage with graph databases

2022-07-27 Thread Matthew Topol
Yea, the drawback you'll find there is that you can't effectively stream record batches as they are available with that setup as you wait for all of the results before converting to an Arrow table. The result is higher memory usage necessary for larger result sets and your time to the first byte
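The contrast drawn here (waiting for the full result versus streaming batches as they become available) can be sketched without Arrow at all. In this minimal illustration, `batched` and `query_results` are hypothetical names; the assumption is that the query engine yields rows lazily. Each fixed-size batch can then be sent as soon as it is full, so memory is bounded by the batch size rather than the whole result set, and the first batch goes out before the query has finished.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(results: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items as soon as they are available."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(results)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def query_results() -> Iterator[dict]:
    # Hypothetical stand-in for rows produced incrementally by a query engine.
    for i in range(7):
        yield {"node_id": i}


for batch in batched(query_results(), batch_size=3):
    # In a Flight server, each batch would be converted to a record batch
    # and written to the stream here instead of being printed.
    print(len(batch))
```

With seven rows and a batch size of 3 this prints 3, 3 and 1.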

RE: Arrow Flight usage with graph databases

2022-07-27 Thread Lee, David
More or less correct. It is native Arrow Flight end to end. The GraphQL query is a string (saved as a Flight Ticket) that is sent from a client using Arrow Flight RPC. The GraphQL query is executed on the GraphQL Flight server, which produces Python record objects (JSON-structured records).

Re: Arrow Flight usage with graph databases

2022-07-27 Thread Matthew Topol
So this is slightly different from what I was doing and spoke about. As far as I can tell from your links, you are evaluating the GraphQL query using that GraphQL server and then converting the JSON response into Arrow format (correct me if I'm wrong please). What I did was to hook into a graphql parser

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-27 Thread Wes McKinney
We had an e-mail thread about this in 2018 https://lists.apache.org/thread/35pn7s8yzxozqmgx53ympxg63vjvggvm I still think having a canonical in-memory row format (and libraries to transform to and from Arrow columnar format) is a good idea — but there is the risk of ending up in the tar pit of

RE: Arrow Flight usage with graph databases

2022-07-27 Thread Lee, David
I'm working on something similar for Ariadne which is a python graphql server package. https://github.com/davlee1972/ariadne_arrow/blob/arrow_flight/benchmark/test_arrow_flight_server.py https://github.com/davlee1972/ariadne_arrow/blob/arrow_flight/benchmark/test_asgi_arrow_client.py I'm

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-27 Thread Micah Kornfield
Are there more details on what exactly an "Arrow Intermediate Representation (AIR)" is? We've talked about in the past maybe having a memory layout specification for row-based data as well as column based data. There was also a recent attempt at least in C++ to try to build utilities to do these

RE: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-27 Thread Lee, David
I think this has been addressed for both Parquet and Python to handle records including nested structures. Not sure about Rust and Go.. [C++][Parquet] Read and write nested Parquet data with a mix of struct and list nesting levels https://issues.apache.org/jira/browse/ARROW-1644 [Python] Add

Re: Proposal: renaming the 'master' branch to 'main'

2022-07-27 Thread Fiona La
Hi all, @Neal, I have also already assigned myself to ARROW-15691, for updating archery to work with a default branch named ‘master’ or ‘main’, and will prioritize this task. Thank you, Fiona From: Andrew Lamb Date: Tuesday, July 26, 2022 at 10:19 AM To: dev Subject: Re: Proposal: renaming

Is there any plan to develop 'utf8_slice_byteunits' like the existing 'utf8_slice_codeunits'?

2022-07-27 Thread 陈青
Dear all, In some situations there is the following requirement: slicing a binary array in byte units, e.g. start at 0, stop at 10, step 1, which would take 10 bytes. So far I have only found slicing in code units. Is there any plan to develop it? Thanks
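To make the requested distinction concrete, here is a plain-Python sketch (not an Arrow API) contrasting the two kinds of slicing. It assumes, as the question implies, that `utf8_slice_codeunits` counts characters rather than bytes; Python `str` indices count code points, while slicing the encoded `bytes` counts bytes. The helper names are hypothetical.

```python
def slice_chars(s: str, start: int, stop: int, step: int = 1) -> str:
    # Character-based slicing: Python str indices count Unicode code points.
    return s[start:stop:step]


def slice_bytes(s: str, start: int, stop: int, step: int = 1) -> bytes:
    # Byte-based slicing of the UTF-8 encoding. The result may cut a
    # multi-byte character in half, so it is not guaranteed to be valid UTF-8.
    return s.encode("utf-8")[start:stop:step]


text = "héllo wörld"
print(slice_chars(text, 0, 10))  # first 10 characters
print(slice_bytes(text, 0, 10))  # first 10 bytes
```

Because "é" and "ö" each take two bytes in UTF-8, the first 10 characters are "héllo wörl", while the first 10 bytes decode only to "héllo wö". A byte-unit kernel would therefore naturally return binary rather than string output.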