Re: [Rust] DataFusion + Substrait

2022-03-07 QP Hou
I am also very excited for this, especially the possibility of leveraging it in Ballista. Great work Andy! On Mon, Mar 7, 2022 at 8:31 AM Andy Grove wrote: > > I created a new repo in the datafusion-contrib GitHub org over the weekend > with a starting point for supporting DataFusion as both a

Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1

2022-03-07 Yijie Shen
+1 (non-binding) verified on Windows Subsystem for Linux. Thanks, Andrew! On Tue, Mar 8, 2022 at 10:43 AM QP Hou wrote: > +1 (binding). Thanks Andrew. > > On Mon, Mar 7, 2022 at 9:17 AM Chao Sun wrote: > > > > +1 (non-binding) verified on Mac. Thanks Andrew! > > > > On Mon, Mar 7, 2022 at 7:47

Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1

2022-03-07 QP Hou
+1 (binding). Thanks Andrew. On Mon, Mar 7, 2022 at 9:17 AM Chao Sun wrote: > > +1 (non-binding) verified on Mac. Thanks Andrew! > > On Mon, Mar 7, 2022 at 7:47 AM Matthew Turner > wrote: > > > > +1 (non-binding) after running release verification script on M1 Mac. > > > > Thanks, Andrew. > > >

Re: [RESULT][VOTE][Julia] Release Apache Arrow Julia 2.2.1 RC1

2022-03-07 Sutou Kouhei
Hi, I noticed that we use "apache-arrow-XXX" in https://dist.apache.org/repos/dist/dev/arrow/ but we use "arrow-XXX" (no "apache-" prefix) in https://dist.apache.org/repos/dist/release/arrow/ . Does anyone know why we use different naming conventions for them? Thanks, -- kou In

[RESULT][VOTE][Julia] Release Apache Arrow Julia 2.2.1 RC1

2022-03-07 Sutou Kouhei
Hi, The vote carries with 4 +1 binding votes. I'll publish this release to https://dist.apache.org/repos/dist/release/arrow/ . Thanks, -- kou In <20220306.212600.307671827583509226@clear-code.com> "[VOTE][Julia] Release Apache Arrow Julia 2.2.1 RC1" on Sun, 06 Mar 2022 21:26:00 +0900

Re: [VOTE][Julia] Release Apache Arrow Julia 2.2.1 RC1

2022-03-07 Sutou Kouhei
Hi Neal, Thanks for verifying this RC. Julia's macOS arm64 support is still experimental: https://julialang.org/downloads/ > macOS ARM (M-series Processor) [help] 64-bit (experimental) ... > macOS 11.4+ ARMv8 (64-bit) Tier 3 ... > Tier 3: Julia may or may not build. If it does, it is >

Re: [Rust] DataFusion + Substrait

2022-03-07 Will Jones
Actually, I think I described it backwards. This would be to convert a DataFusion pushdown filter into an Arrow dataset expression, using Substrait as the intermediate representation. On Mon, Mar 7, 2022 at 11:52 Weston Pace wrote: > > but will likely also need a method on PyArrow compute
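Routing a filter through an intermediate representation splits the work into two independent translations: producer to IR, and IR to consumer. A toy sketch of that shape follows. The classes, the dict-based IR, and the function names (`equal`, `gt`, and so on) are hypothetical stand-ins, not DataFusion's, Substrait's, or PyArrow's real types. Columns are referenced by position in the IR, mirroring Substrait's index-based field references.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical producer-side filter nodes (stand-ins for a DataFusion filter).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class BinaryExpr:
    op: str  # "=", "<", ">", or "AND"
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, BinaryExpr]

# Toy IR function names; illustrative only, not real Substrait extension names.
OP_TO_FUNC = {"=": "equal", "<": "lt", ">": "gt", "AND": "and"}
FUNC_TO_OP = {func: op for op, func in OP_TO_FUNC.items()}


def to_ir(expr: Expr, schema: list[str]) -> dict:
    """Producer side: encode a filter as a schema-relative IR tree."""
    if isinstance(expr, Column):
        # Reference the column by position rather than by name.
        return {"field": schema.index(expr.name)}
    if isinstance(expr, Literal):
        return {"literal": expr.value}
    return {
        "call": OP_TO_FUNC[expr.op],
        "args": [to_ir(expr.left, schema), to_ir(expr.right, schema)],
    }


def ir_to_predicate(ir: dict, schema: list[str]) -> str:
    """Consumer side: decode the IR into this engine's own form (a string here)."""
    if "field" in ir:
        return schema[ir["field"]]
    if "literal" in ir:
        return repr(ir["literal"])
    left, right = (ir_to_predicate(arg, schema) for arg in ir["args"])
    return f"({left} {FUNC_TO_OP[ir['call']]} {right})"


schema = ["a", "b"]
flt = BinaryExpr("AND", BinaryExpr(">", Column("a"), Literal(1)),
                 BinaryExpr("=", Column("b"), Literal("x")))
print(ir_to_predicate(to_ir(flt, schema), schema))  # ((a > 1) AND (b = 'x'))
```

With a shared IR in the middle, each engine needs one converter in each direction instead of one per pair of engines.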

Re: [Rust] DataFusion + Substrait

2022-03-07 Weston Pace
> but will likely also need a method on PyArrow compute expressions to convert > to a Substrait expression. There is a C++ method to do this (one of the arrow::engine::ToProto overloads takes in arrow::compute::Expression and returns substrait::Expression) but at the moment the method is internal

Re: [C++] [csv] Why do I keep getting the error - "CVS parser got out of sync with chunker"

2022-03-07 HK Verma
Thanks, Antoine. Yes, I have newlines_in_values set to false. Other configs also look OK. However, I do have rows with fewer columns than the number specified in the column types of the convert options. I have my own invalid_row_handler where I currently skip these rows. It looks like the parser is

Re: [C++] Why do I keep getting the error - "CVS parser got out of sync with chunker"

2022-03-07 Antoine Pitrou
Hi HK, On Mon, 7 Mar 2022 10:16:07 -0800 HK Verma wrote: > I am integrating Arrow with another C++ library. For this, I wrote an input > stream which feeds CSV data into the streaming reader. It fails for very > large files with the error messages like - "CSV parser got out of sync with >

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-07 Micah Kornfield
> > Relaxing from {128,256} to {32,64,128,256} seems a low risk > from an integration perspective, as implementations already need to read > the bitwidth to select the appropriate physical representation (if they > support it). I think there are two reasons for having implementations first. 1.

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-07 Jorge Cardoso Leitão
+1 adding 32 and 64 bit decimals. +0 to release it without integration tests - both IPC and the C data interface use a variable bit width to declare the appropriate size for decimal types. Relaxing from {128,256} to {32,64,128,256} seems a low risk from an integration perspective, as
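Both replies rest on the format already carrying the decimal bit width, so a reader dispatches on it. A minimal sketch of that dispatch, assuming a reader that accepts the four widths under discussion (the function name is hypothetical): from the width it derives the largest precision whose every value fits in a signed two's-complement integer of that width. The 128- and 256-bit results match Arrow's existing decimal128 and decimal256 maximum precisions.

```python
SUPPORTED_BIT_WIDTHS = (32, 64, 128, 256)


def max_decimal_precision(bit_width: int) -> int:
    """Most decimal digits whose every value fits in a signed integer of bit_width bits."""
    if bit_width not in SUPPORTED_BIT_WIDTHS:
        # A reader that does not support this width should reject the type.
        raise ValueError(f"unsupported decimal bit width: {bit_width}")
    max_unscaled = 2 ** (bit_width - 1) - 1  # largest two's-complement value
    digits = 0
    while 10 ** (digits + 1) - 1 <= max_unscaled:
        digits += 1
    return digits


for width in SUPPORTED_BIT_WIDTHS:
    print(width, max_decimal_precision(width))
# 32 9
# 64 18
# 128 38
# 256 76
```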

[C++] Why do I keep getting the error - "CVS parser got out of sync with chunker"

2022-03-07 HK Verma
I am integrating Arrow with another C++ library. For this, I wrote an input stream which feeds CSV data into the streaming reader. It fails for very large files with the error messages like - "CSV parser got out of sync with chunker". I have tried various things like - * Look at the stream to see
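One general way a chunker and a parser can disagree about where rows end is a quoted value that contains a newline while chunking assumes values cannot (which is what `newlines_in_values = false` tells the reader to assume). The sketch below is a simplified model, not Arrow's chunker code, and it does not diagnose the specific failure in this thread: a newline-only boundary search and a quote-aware scan report different row boundaries for the same block. The helper names are hypothetical.

```python
def naive_row_boundary(block: bytes) -> int:
    """Offset just past the last newline, assuming values never contain newlines."""
    return block.rfind(b"\n") + 1


def quote_aware_row_ends(block: bytes) -> list[int]:
    """Offsets just past each newline that ends a row, ignoring newlines inside quotes."""
    ends, in_quotes = [], False
    for i, byte in enumerate(block):
        if byte == ord('"'):
            # An escaped "" toggles twice, leaving the state unchanged.
            in_quotes = not in_quotes
        elif byte == ord("\n") and not in_quotes:
            ends.append(i + 1)
    return ends


# A block that ends partway through a quoted value containing a newline.
block = b'id,text\n1,"first\nsec'
print(naive_row_boundary(block))    # 17: splits inside the quoted value
print(quote_aware_row_ends(block))  # [8]: only the header row is complete
```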

Re: [Rust] DataFusion + Substrait

2022-03-07 Will Jones
Thanks for starting that, Andy! > I also think it could be helpful with in-memory language interoperability, > such as passing query plans between Python and Rust. Yes! I prototyped a datafusion-python and pyarrow datasets integration[1] a few weeks ago that could really benefit from this. I'll

Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1

2022-03-07 Chao Sun
+1 (non-binding) verified on Mac. Thanks Andrew! On Mon, Mar 7, 2022 at 7:47 AM Matthew Turner wrote: > > +1 (non-binding) after running release verification script on M1 Mac. > > Thanks, Andrew. > > From: Andy Grove > Date: Monday, March 7, 2022 at 10:00 AM > To: dev > Subject: Re:

Re: [Rust] DataFusion + Substrait

2022-03-07 Wang Xudong
Thank you! This is a great idea; I'll try to contribute some code when I have time! --- xudong On Tue, Mar 8, 2022 at 00:36 Gavin Ray wrote: > Incredibly exciting! Following along eagerly =) > > On Mon, Mar 7, 2022 at 11:31 AM Andy Grove wrote: > > > I created a new repo in the datafusion-contrib GitHub

Re: [Rust] DataFusion + Substrait

2022-03-07 Gavin Ray
Incredibly exciting! Following along eagerly =) On Mon, Mar 7, 2022 at 11:31 AM Andy Grove wrote: > I created a new repo in the datafusion-contrib GitHub org over the weekend > with a starting point for supporting DataFusion as both a producer and > consumer of Substrait plans. > >

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Gavin Ray
Ahh got it, perfectly clear now -- thank you! On Mon, Mar 7, 2022 at 11:20 AM David Li wrote: > So "Flight" and "Flight SQL" are distinct projects. Flight defines RPC > methods, and "Flight SQL" defines higher-level methods on top of the Flight > methods. The optimization proposed is for

[Rust] DataFusion + Substrait

2022-03-07 Andy Grove
I created a new repo in the datafusion-contrib GitHub org over the weekend with a starting point for supporting DataFusion as both a producer and consumer of Substrait plans. https://github.com/datafusion-contrib/datafusion-substrait I am hopeful that we can eventually use Substrait in Ballista

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 David Li
So "Flight" and "Flight SQL" are distinct projects. Flight defines RPC methods, and "Flight SQL" defines higher-level methods on top of the Flight methods. The optimization proposed is for Flight. Once/if that gets accepted and implemented, Flight SQL servers could then use it to optimize

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Gavin Ray
Sure, will use that JIRA issue for whatever thoughts/feedback =) On that note, filed the above bug here: https://issues.apache.org/jira/browse/ARROW-15861 About the "two-step" thing, I guess what I mean is code like this where you make the initial op, then get the stream: val catalogs:
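The "two-step" pattern mirrors Flight's own RPCs: a client first calls GetFlightInfo to obtain a FlightInfo whose endpoints carry tickets, then calls DoGet with each ticket to stream the data. Below is a toy in-memory model of that flow; the class and method names are hypothetical, rows stand in for record batches, and this is not the pyarrow.flight or Java client API.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Ticket:
    token: bytes


@dataclass(frozen=True)
class Endpoint:
    ticket: Ticket
    location: str  # where to redeem the ticket; a placeholder string here


@dataclass(frozen=True)
class FlightInfo:
    endpoints: tuple[Endpoint, ...]


class ToyFlightService:
    """In-memory stand-in for a Flight service; rows stand in for record batches."""

    def __init__(self, results: dict[str, list]):
        self._results = results

    def get_flight_info(self, command: str) -> FlightInfo:
        # Step 1: describe where the results live, without sending any data.
        return FlightInfo(endpoints=(Endpoint(Ticket(command.encode()), "local"),))

    def do_get(self, ticket: Ticket) -> Iterator:
        # Step 2: redeem a ticket for a stream of results.
        yield from self._results[ticket.token.decode()]


def fetch(service: ToyFlightService, command: str) -> list:
    info = service.get_flight_info(command)
    rows = []
    for endpoint in info.endpoints:  # results may be spread over several endpoints
        rows.extend(service.do_get(endpoint.ticket))
    return rows


service = ToyFlightService({"catalogs": ["main", "system"]})
print(fetch(service, "catalogs"))  # ['main', 'system']
```

Separating planning from data transfer is what lets a FlightInfo point at several endpoints, possibly on different servers, which a client can read independently.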

Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1

2022-03-07 Matthew Turner
+1 (non-binding) after running release verification script on M1 Mac. Thanks, Andrew. From: Andy Grove Date: Monday, March 7, 2022 at 10:00 AM To: dev Subject: Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1 +1 (binding) Verified on Ubuntu 20.04.3 LTS On Mon, Mar 7, 2022 at 6:52 AM Kun

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 David Li
(responses inline) On Mon, Mar 7, 2022, at 10:37, Gavin Ray wrote: >> >> Another contributor is currently working on some Java >> tutorials/documentation so any feedback would be helpful. > > > Ah, yeah this would be incredibly useful. Will compile some thoughts, where > should I share them? >

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Gavin Ray
> > Another contributor is currently working on some Java > tutorials/documentation so any feedback would be helpful. Ah, yeah this would be incredibly useful. Will compile some thoughts, where should I share them? Didn't know about the Cookbook, definitely going to be tonight's reading! Ah, I

Re: [FlightSQL] "flightsql-kotlin" submodule for Kotlin protobuf/gRPC codegen?

2022-03-07 Gavin Ray
> Is there a problem with generating those inside your own project? No no, not at all -- just wasn't sure if it was something that would be useful enough to be upstream. Sounds like probably not, I will just add the gRPC/Protobuf plugin to my gradle build On Mon, Mar 7, 2022 at 9:57 AM David Li

Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1

2022-03-07 Andy Grove
+1 (binding) Verified on Ubuntu 20.04.3 LTS On Mon, Mar 7, 2022 at 6:52 AM Kun Liu wrote: > I have tested it on a Mac and got the "Release candidate looks good!" > message. > The unit tests passed on my Mac. > > +1 non-binding. > > Thanks, > Kun > > R > > On Sat, Mar 5, 2022 at 22:00 Wang Xudong wrote: > > > +1

Re: [FlightSQL] "flightsql-kotlin" submodule for Kotlin protobuf/gRPC codegen?

2022-03-07 David Li
This would be just the generated Protobuf sources but with a Kotlin API? Is there a problem with generating those inside your own project? (At least in C++ we also try to hide the Protobuf messages, I suppose we can't quite do that in Java easily.) On Mon, Mar 7, 2022, at 09:13, Gavin Ray

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 David Li
Cool - if you have API questions, feel free to send them here or to u...@arrow.apache.org. Another contributor is currently working on some Java tutorials/documentation, so any feedback would be helpful. There are also some basic recipes here: https://github.com/apache/arrow-cookbook/ Ah, I suppose

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Gavin Ray
Ah brilliant! Yeah, Websockets (or anything that's a basic transport and doesn't require a language-specific SDK) would be fantastic. In my case, streaming wouldn't be a requirement, at least not for some time (more of a nice-to-have). It'd be mostly OLTP-style workloads, with small response

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 David Li
No worries about questions, it's always good to see how people are using Arrow. For tunneling Flight/gRPC over HTTP: this has been a long-standing question. I believe some people have had success with one of the various gRPC-HTTP proxies. In particular, I recall Deephaven has done this

Re: [VOTE][Julia] Release Apache Arrow Julia 2.2.1 RC1

2022-03-07 Neal Richardson
I failed to verify this on macOS 11.6 arm64. 3 integration tests crashed with "ArgumentError: unsafe_wrap: pointer 0x13b640ef8 is not properly aligned to 16 bytes". I don't know enough to know whether this is a problem with the integration test setup (and thus probably not release blocking) or
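The error message itself is plain address arithmetic: a pointer is 16-byte aligned only if its address is a multiple of 16. The reported address 0x13b640ef8 ends in hex digit 8, so it is 8-byte aligned but not 16-byte aligned. A quick check with hypothetical helper names (this says nothing about whether the test setup or the library produced the pointer):

```python
def is_aligned(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment (which must be a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (address & (alignment - 1)) == 0


def natural_alignment(address: int) -> int:
    """Largest power of two dividing address, i.e. its lowest set bit."""
    return address & -address


addr = 0x13B640EF8  # address from the error message
print(is_aligned(addr, 8), is_aligned(addr, 16))  # True False
print(natural_alignment(addr))  # 8
```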

[FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Gavin Ray
Due to the current implementation status of FlightSQL (C++/Rust/JVM only) I am trying to see whether it's possible to allow FlightSQL over something like HTTP/REST so that arbitrary languages can be used. In the codebase, I saw these (and their deserialize counterparts): /// \brief Get the

[FlightSQL] "flightsql-kotlin" submodule for Kotlin protobuf/gRPC codegen?

2022-03-07 Gavin Ray
I'm curious whether folks think it would be reasonable to upstream an optional Kotlin submodule that uses the Kotlin code generator for FlightSQL? Or would this be better off as a personal repository? The Rust FlightSQL API is a fair bit nicer due to the syntax. The Kotlin Protobuf plugin

Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1

2022-03-07 Kun Liu
I have tested it on a Mac and got the "Release candidate looks good!" message. The unit tests passed on my Mac. +1 non-binding. Thanks, Kun R On Sat, Mar 5, 2022 at 22:00 Wang Xudong wrote: > +1 non-binding > > Tested on macOS, "Release candidate looks good!" > Thank you alamb! > > --- > xudong > > > > Andrew Lamb

Re: [VOTE][Julia] Release Apache Arrow Julia 2.2.1 RC1

2022-03-07 Andrew Lamb
+1 (binding) Mac OS Intel 12.0.1 I checked the signatures and shasums manually and ran ./dev/release/verify_rc.sh 2.2.1 1 ``` * Testing* Arrow tests passed + popd /var/folders/s3/h5hgj43j0bv83shtmz_t_w40gn/T/arrow-julia-2.2.1-1.X.PMczTpdj + VERIFY_SUCCESS=yes + echo 'RC looks
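Checking a shasum by hand amounts to hashing the artifact and comparing the result with the published digest. A minimal sketch, assuming a SHA-512 checksum file in `sha512sum` output format (`<hex digest>  <file name>`); the helper names are hypothetical, and signature checking (which needs a GPG tool) is not covered.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's bytes in fixed-size pieces so large tarballs are not loaded whole."""
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def matches_checksum(chunks: Iterable[bytes], checksum_text: str) -> bool:
    """Compare the SHA-512 of the streamed bytes with the digest in checksum_text."""
    expected = checksum_text.split()[0].lower()  # first field is the hex digest
    digest = hashlib.sha512()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest() == expected


def verify_file(artifact: Path, checksum_file: Path) -> bool:
    """Hash the artifact on disk and check it against its checksum file."""
    return matches_checksum(read_chunks(artifact), checksum_file.read_text())
```

For example, `verify_file(Path("artifact.tar.gz"), Path("artifact.tar.gz.sha512"))`, where both file names are placeholders.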

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-07 Antoine Pitrou
On 03/03/2022 at 18:05, Micah Kornfield wrote: I think this makes sense to add these. Typically when adding new types, we've waited on the official vote until there are two reference implementations demonstrating compatibility. You are right, I had forgotten about that. Though in this