Re: [Format] Semantics for dictionary batches in streams

2019-09-09 Thread Micah Kornfield
Yes, I opened a JIRA. I'm going to try to make a proposal that consolidates all the recent dictionary discussions. On Mon, Sep 9, 2019 at 12:21 PM Wes McKinney wrote: > hi Micah, > > I think we should formulate changes to format/Columnar.rst and have a > vote, what do you think? > > On Thu, Aug

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-09 Thread Bryan Cutler
Sounds good to me also and I don't think we need a vote either. On Sat, Sep 7, 2019 at 7:36 PM Micah Kornfield wrote: > +1 on this, I also don't think a vote is necessary as long as we make the > change before 0.15.0 > > On Saturday, September 7, 2019, Wes McKinney wrote: > > > I see, thank

[Discuss] [Java] DateMilliVector.getObject() return type (LocalDateTime vs LocalDate)

2019-09-09 Thread Micah Kornfield
Yongbo Zhang opened up a pull request to have DateMilliVector return a LocalDate instead of a LocalDateTime object. Do people have opinions on whether this breaking change is worth the gain in correctness? Thanks, Micah [1] https://github.com/apache/arrow/pull/5315 On Sat, Sep 7, 2019 at 4:14 PM Yongbo Zhang
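A minimal sketch of why the return type matters, not Arrow's Java implementation. It assumes a DateMilli value is a count of milliseconds since the Unix epoch that stands for a calendar date. Python's `date` and `datetime` stand in here for Java's `LocalDate` and `LocalDateTime`, and the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: a DateMilli value counts milliseconds since 1970-01-01 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MILLIS_PER_DAY = 86_400_000


def millis_to_datetime(millis: int) -> datetime:
    """Analogue of returning a LocalDateTime: carries a time-of-day component."""
    return EPOCH + timedelta(milliseconds=millis)


def millis_to_date(millis: int) -> date:
    """Analogue of returning a LocalDate: only the calendar day is exposed."""
    return millis_to_datetime(millis).date()


if __name__ == "__main__":
    one_day = MILLIS_PER_DAY
    # The datetime form exposes a midnight time that the date type does not need.
    print(millis_to_datetime(one_day))
    print(millis_to_date(one_day))
```

A date-only return type makes it impossible for callers to rely on a time-of-day that the column type does not carry, which is the correctness argument raised in the thread. The cost is that callers written against the old return type have to change.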

[jira] [Created] (ARROW-6504) [Python][Packaging] Add mimalloc to Windows conda packages for better performance

2019-09-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6504: --- Summary: [Python][Packaging] Add mimalloc to Windows conda packages for better performance Key: ARROW-6504 URL: https://issues.apache.org/jira/browse/ARROW-6504

Re: Plasma scenarios

2019-09-09 Thread Sutou Kouhei
If we build the GLib-based library with MSVC, it requires neither MSYS nor Cygwin. It just requires MSVC. In "RE: Plasma scenarios" on Mon, 9 Sep 2019 22:05:26 +, Eric Erhardt wrote: > I don't think the C# bindings would use the GLib-based libraries on Windows > if it requires

[jira] [Created] (ARROW-6503) [C++] Add an argument of memory pool object to SparseTensorConverter

2019-09-09 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-6503: --- Summary: [C++] Add an argument of memory pool object to SparseTensorConverter Key: ARROW-6503 URL: https://issues.apache.org/jira/browse/ARROW-6503 Project: Apache

[jira] [Created] (ARROW-6502) [GLib][CI] MinGW failure in CI

2019-09-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6502: --- Summary: [GLib][CI] MinGW failure in CI Key: ARROW-6502 URL: https://issues.apache.org/jira/browse/ARROW-6502 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-6501) [Format][C++] Remove non_zero_length field from SparseIndex

2019-09-09 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-6501: --- Summary: [Format][C++] Remove non_zero_length field from SparseIndex Key: ARROW-6501 URL: https://issues.apache.org/jira/browse/ARROW-6501 Project: Apache Arrow

Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Wes McKinney
I'm referring to the arrow-devel and parquet-devel packages, which are C++ packages. If you built the R library (using install.packages) against version 0.14.0 and then upgraded arrow-devel to 0.14.1 without rebuilding the R library, you could have this issue. I would recommend reinstalling the R

Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Daniel Feenberg
On Mon, 9 Sep 2019, Wes McKinney wrote: I'm a bit confused by the error message " Error in write_parquet_file(to_arrow(table), file) : Arrow error: IOError: Metadata contains Thrift LogicalType that is not recognized. " This error comes from

Re: [Discuss][FlightRPC] Extensions to Flight: middleware and DoPut tickets

2019-09-09 Thread David Li
I'm happy to start a new thread to focus on DoPut specifically. Middleware for Java has been in review. Best, David On 9/9/19, Wes McKinney wrote: > Ah, I think I'm referring to the format change around DoPut, for which > there is not a PR yet. Sorry for my confusion > > Do we want to start a

[jira] [Created] (ARROW-6500) [Java] How to use RootAllocator in a low memory setting?

2019-09-09 Thread Andong Zhan (Jira)
Andong Zhan created ARROW-6500: -- Summary: [Java] How to use RootAllocator in a low memory setting? Key: ARROW-6500 URL: https://issues.apache.org/jira/browse/ARROW-6500 Project: Apache Arrow

RE: Plasma scenarios

2019-09-09 Thread Eric Erhardt
I don't think the C# bindings would use the GLib-based libraries on Windows if it requires installing MSYS2 or Cygwin on the end-user's Windows machine. So don't go through the work of building the GLib-based libraries with MSVC on account of the C# library. -Original Message- From: Sutou

[jira] [Created] (ARROW-6499) [C++] Add support for bundled Boost with MSVC

2019-09-09 Thread Sutou Kouhei (Jira)
Sutou Kouhei created ARROW-6499: --- Summary: [C++] Add support for bundled Boost with MSVC Key: ARROW-6499 URL: https://issues.apache.org/jira/browse/ARROW-6499 Project: Apache Arrow Issue Type:

Re: Plasma scenarios

2019-09-09 Thread Sutou Kouhei
Hi, > In theory you could use the GLib-based library with MSVC, the main > requirement is gobject-introspection > > https://github.com/GNOME/gobject-introspection/blob/master/MSVC.README.rst Generally, we can use the GLib-based library without GObject Introspection if we write bindings by hand.

Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Wes McKinney
I'm a bit confused by the error message " Error in write_parquet_file(to_arrow(table), file) : Arrow error: IOError: Metadata contains Thrift LogicalType that is not recognized. " This error comes from https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.cc#L455 This

Re: Plasma scenarios

2019-09-09 Thread Sutou Kouhei
Hi, > I know Plasma today is not supported on Windows, but I think support could be > added since Windows supports memory mapped files (through a different API > than mmap) and it now supports Unix Domain Sockets [1]. > ... > [1]

Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Daniel Feenberg
On Mon, 9 Sep 2019, Neal Richardson wrote: Hi Daniel, This works on my machine:
library(arrow)
write_parquet(data.frame(y = c("a", "b", "c"), stringsAsFactors=FALSE), file= "string.parquet")
read_parquet("string.parquet")
  y
1 a
2 b
3 c
(The function masking warnings are all from

Re: [Discuss][FlightRPC] Extensions to Flight: middleware and DoPut tickets

2019-09-09 Thread Wes McKinney
Ah, I think I'm referring to the format change around DoPut, for which there is not a PR yet. Sorry for my confusion. Do we want to start a separate discussion thread about that? https://docs.google.com/document/d/1hrwxNwPU1aOD_1ciRUOaGeUCyXYOmu6IxxCfY6Stj6w/edit?usp=sharing On Mon, Sep 9, 2019

Re: [Discuss][FlightRPC] Extensions to Flight: middleware and DoPut tickets

2019-09-09 Thread Antoine Pitrou
Isn't middleware an implementation-specific concern? Does it need a formal vote? Regards Antoine. On 09/09/2019 at 22:49, Wes McKinney wrote: > It seems like there is positive feedback on the PR. Do we want to have > a vote about this? > > On Mon, Aug 12, 2019 at 7:54 AM David Li

Re: [Discuss][FlightRPC] Extensions to Flight: middleware and DoPut tickets

2019-09-09 Thread Wes McKinney
It seems like there is positive feedback on the PR. Do we want to have a vote about this? On Mon, Aug 12, 2019 at 7:54 AM David Li wrote: > > I've (finally) put up a draft implementation of middleware for Java: > https://github.com/apache/arrow/pull/5068 > > Hopefully this helps clarify how the

[jira] [Created] (ARROW-6498) [C++][CI] Download googletest tarball and use for EP build to avoid occasional flakiness

2019-09-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6498: --- Summary: [C++][CI] Download googletest tarball and use for EP build to avoid occasional flakiness Key: ARROW-6498 URL: https://issues.apache.org/jira/browse/ARROW-6498

Re: [Format] Semantics for dictionary batches in streams

2019-09-09 Thread Wes McKinney
hi Micah, I think we should formulate changes to format/Columnar.rst and have a vote, what do you think? On Thu, Aug 29, 2019 at 2:23 AM Micah Kornfield wrote: >> >> >> > I was thinking the file format must satisfy one of two conditions: >> > 1. Exactly one dictionarybatch per encoded column

[jira] [Created] (ARROW-6497) [Website] On change to master branch, automatically make PR to asf-site

2019-09-09 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6497: -- Summary: [Website] On change to master branch, automatically make PR to asf-site Key: ARROW-6497 URL: https://issues.apache.org/jira/browse/ARROW-6497 Project:

[jira] [Created] (ARROW-6496) [Python] Fix ARROW_ORC=ON build in Python wheels on macOS

2019-09-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6496: --- Summary: [Python] Fix ARROW_ORC=ON build in Python wheels on macOS Key: ARROW-6496 URL: https://issues.apache.org/jira/browse/ARROW-6496 Project: Apache Arrow

Re: Plasma scenarios

2019-09-09 Thread Wes McKinney
hi Eric, On Fri, Sep 6, 2019 at 5:09 PM Eric Erhardt wrote: > > I was looking for the high level scenarios for the Plasma In-Memory Object > Store. A colleague of mine suggested we could use it to pass data between a > C# process and a Python process. > > I've read the intro blog [0] on

Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Neal Richardson
Hi Daniel, This works on my machine:
> library(arrow)
> write_parquet(data.frame(y = c("a", "b", "c"), stringsAsFactors=FALSE), file= "string.parquet")
> read_parquet("string.parquet")
  y
1 a
2 b
3 c
(The function masking warnings are all from library(tidyverse) and aren't relevant here.)

[jira] [Created] (ARROW-6495) [Plasma] Use xxh3 for object hashing

2019-09-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6495: - Summary: [Plasma] Use xxh3 for object hashing Key: ARROW-6495 URL: https://issues.apache.org/jira/browse/ARROW-6495 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-6494) [C++] Implement basic PartitionScheme

2019-09-09 Thread Benjamin Kietzman (Jira)
Benjamin Kietzman created ARROW-6494: Summary: [C++] Implement basic PartitionScheme Key: ARROW-6494 URL: https://issues.apache.org/jira/browse/ARROW-6494 Project: Apache Arrow Issue

[jira] [Created] (ARROW-6492) [Python] file written with latest fastparquet cannot be read with latest pyarrow

2019-09-09 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6492: Summary: [Python] file written with latest fastparquet cannot be read with latest pyarrow Key: ARROW-6492 URL: https://issues.apache.org/jira/browse/ARROW-6492

[jira] [Created] (ARROW-6491) [Java] fix master build failure caused by ErrorProne

2019-09-09 Thread Pindikura Ravindra (Jira)
Pindikura Ravindra created ARROW-6491: - Summary: [Java] fix master build failure caused by ErrorProne Key: ARROW-6491 URL: https://issues.apache.org/jira/browse/ARROW-6491 Project: Apache Arrow

[jira] [Created] (ARROW-6490) [Java] log error for leak in allocator close

2019-09-09 Thread Pindikura Ravindra (Jira)
Pindikura Ravindra created ARROW-6490: - Summary: [Java] log error for leak in allocator close Key: ARROW-6490 URL: https://issues.apache.org/jira/browse/ARROW-6490 Project: Apache Arrow

Re: [ANNOUNCE] New committers: Ben Kietzman, Kenta Murata, and Neal Richardson

2019-09-09 Thread Joris Van den Bossche
Congratulations! On Sat, 7 Sep 2019 at 20:54, Rok Mihevc wrote: > Congrats all! > > On Sat, Sep 7, 2019 at 5:02 AM Bryan Cutler wrote: > > > Congrats Ben, Kenta and Neal! > > > > On Fri, Sep 6, 2019, 12:15 PM Krisztián Szűcs > > > wrote: > > > > > Congratulations! > > > > > > On Fri, Sep 6,