[jira] [Created] (ARROW-8372) [C++] Add Result to table / record batch APIs

2020-04-08 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8372: - Summary: [C++] Add Result to table / record batch APIs Key: ARROW-8372 URL: https://issues.apache.org/jira/browse/ARROW-8372 Project: Apache Arrow Issue

Re: [DRAFT] Arrow Board Report April 2020

2020-04-08 Thread Antoine Pitrou
Parquet should be mentioned here? is the Parquet C++ implementation formally part of the Apache Arrow project) On 08/04/2020 at 15:22, Wes McKinney wrote: > Yes, definitely, can you propose a paragraph for the Project Activity section? > > On Wed, Apr 8, 2020 at 8:10 AM Antoine Pitr

Re: [DRAFT] Arrow Board Report April 2020

2020-04-08 Thread Antoine Pitrou
Is it worth mentioning the OSS-Fuzz integration (and "success story")? On 08/04/2020 at 15:05, Wes McKinney wrote: > The report is due today. Are there any more comments? > > On Sat, Apr 4, 2020 at 4:08 PM Wes McKinney wrote: >> >> ## Description: >> >> The mission of Apache Arrow is the cre

[jira] [Created] (ARROW-8370) [C++] Add Result to type / schema APIs

2020-04-08 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8370: - Summary: [C++] Add Result to type / schema APIs Key: ARROW-8370 URL: https://issues.apache.org/jira/browse/ARROW-8370 Project: Apache Arrow Issue Type

Re: C interface clarifications

2020-04-07 Thread Antoine Pitrou
On 07/04/2020 at 19:39, Wes McKinney wrote: > > Re-orienting the discussion on something more concrete, suppose that an > ArrowArray is used to convey a result set from a database query, and > suppose that the resources associated with each column in the result set > are independent of the oth

Re: C interface clarifications

2020-04-07 Thread Antoine Pitrou
On 07/04/2020 at 18:49, Todd Lipcon wrote: >> >> Hmm, the spec may not be clear enough on this, but if you move a child >> and release the parent, then the other children are not usable anymore. >> >> In your case, you don't call release() on every child. You just call >> release() on the pare
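
A minimal sketch of the move-then-release pattern described above, written in C++ against the `ArrowArray` struct layout from the C data interface specification. The export helper, the null-typed leaf children and the `g_live_arrays` counter are hypothetical and exist only to make resource ownership observable. The key assumption, following the example producer in the specification, is that the parent's release callback skips any child whose `release` pointer is already null, which is exactly what a moved-out child looks like.

```cpp
#include <cassert>
#include <cstdint>

// Struct layout from the Arrow C data interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Demo-only bookkeeping: number of exported arrays whose resources are alive.
static int g_live_arrays = 0;

// Struct arrays have a single (validity) buffer; null here means "no nulls".
static const void* kNoValidity[1] = {nullptr};

// Release callback of a leaf child: frees its own resources (here, only the
// counter) and marks the struct as released. It never frees the struct itself.
static void ReleaseLeaf(ArrowArray* array) {
  assert(array->release != nullptr);
  --g_live_arrays;
  array->release = nullptr;
}

// Release callback of the parent: releases the children that were not moved
// out (a moved-out child has release == nullptr), frees the child structs'
// storage, which the parent allocated, then marks itself as released.
static void ReleaseParent(ArrowArray* array) {
  assert(array->release != nullptr);
  for (int64_t i = 0; i < array->n_children; ++i) {
    ArrowArray* child = array->children[i];
    if (child->release != nullptr) {
      child->release(child);
    }
    delete child;
  }
  delete[] array->children;
  --g_live_arrays;
  array->release = nullptr;
}

// Hypothetical producer: exports a struct array with `n_children` children of
// the null type (which has no buffers), each of the given length.
static void ExportParent(ArrowArray* out, int64_t n_children, int64_t length) {
  ArrowArray** children = new ArrowArray*[n_children];
  for (int64_t i = 0; i < n_children; ++i) {
    children[i] = new ArrowArray{length, length, 0, 0, 0, nullptr, nullptr,
                                 nullptr, &ReleaseLeaf, nullptr};
    ++g_live_arrays;
  }
  *out = ArrowArray{length, 0, 0, 1, n_children, kNoValidity, children,
                    nullptr, &ReleaseParent, nullptr};
  ++g_live_arrays;
}

// Consumer-side move: copy the struct, then mark the source as released.
static void MoveArray(ArrowArray* src, ArrowArray* dst) {
  *dst = *src;
  src->release = nullptr;
}
```

After `MoveArray(parent.children[1], &kept)`, the consumer releases `parent` straight away; that single call frees the two other children, while `kept` stays valid until its own `release` runs.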

Re: C interface clarifications

2020-04-07 Thread Antoine Pitrou
On 06/04/2020 at 19:22, Todd Lipcon wrote: > > The spec should also probably cover thread-safety: if the consumer gets an > ArrowArray, is it safe to pass off the children to multiple threads and > have them call release() concurrently? In other words, do I need to use a > thread-safe referenc

[jira] [Created] (ARROW-8361) [C++] Add Result APIs to Buffer methods and functions

2020-04-07 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8361: - Summary: [C++] Add Result APIs to Buffer methods and functions Key: ARROW-8361 URL: https://issues.apache.org/jira/browse/ARROW-8361 Project: Apache Arrow

Re: C interface clarifications

2020-04-06 Thread Antoine Pitrou
Hello Todd, On 06/04/2020 at 18:18, Todd Lipcon wrote: > > I had a couple questions / items that should be clarified in the spec. Wes > suggested I raise them here on dev@: > > *1) Should producers expect callers to zero-init structs?* IMO, they shouldn't. They should fill the structure ex
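
The reply above argues that producers should not rely on callers zero-initializing the struct. A small C++ sketch of that rule, assuming the `ArrowArray` layout from the specification: the hypothetical `ExportEmptyNullArray` producer assigns every field, so whatever bytes the caller's memory held beforehand cannot leak through.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Struct layout from the Arrow C data interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Release callback for an array that owns no resources.
static void ReleaseNothing(ArrowArray* array) {
  assert(array->release != nullptr);
  array->release = nullptr;
}

// Hypothetical producer exporting an empty array of the null type. Assigning
// a fully specified aggregate sets every field, whatever `*out` held before.
static void ExportEmptyNullArray(ArrowArray* out) {
  *out = ArrowArray{/*length=*/0, /*null_count=*/0, /*offset=*/0,
                    /*n_buffers=*/0, /*n_children=*/0, /*buffers=*/nullptr,
                    /*children=*/nullptr, /*dictionary=*/nullptr,
                    /*release=*/&ReleaseNothing, /*private_data=*/nullptr};
}
```

A consumer can then pass stack memory that it never initialized, for example a struct filled with arbitrary bytes, and still read well-defined values after the export call.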

[jira] [Created] (ARROW-8347) [C++] Add Result to Array methods

2020-04-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8347: - Summary: [C++] Add Result to Array methods Key: ARROW-8347 URL: https://issues.apache.org/jira/browse/ARROW-8347 Project: Apache Arrow Issue Type: Sub

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Antoine Pitrou
Also nice to have perhaps (PR available and several back-and-forths already): * ARROW-7610: [Java] Finish support for 64 bit int allocations Needs a Java committer to decide... Regards Antoine. On 06/04/2020 at 00:24, Wes McKinney wrote: > We are getting close to the 0.17.0 endgame. > >

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Antoine Pitrou
Hi, I added the following issue to the cpp-1.6.0 milestone: * PARQUET-1835 [C++] Fix crashes on invalid input (OSS-Fuzz) There's a PR up for it and it's simple enough to be validated quickly, IMHO. Regards Antoine. On 06/04/2020 at 00:24, Wes McKinney wrote: > We are getting close to the

Re: Preparing for 0.17.0 Arrow release

2020-04-06 Thread Antoine Pitrou
Hmm, if downstream libraries were expecting a dict, perhaps we'll need to revert that change... Regards Antoine. On 06/04/2020 at 08:50, Joris Van den Bossche wrote: > We also have a recent regression related to the KeyValueMetadata wrapping > python that is causing failures in downstream l

Re: Proposal to use Black for automatic formatting of Python code

2020-04-02 Thread Antoine Pitrou
On 02/04/2020 at 20:58, Joris Van den Bossche wrote: > > Yes, both autopep8 and black can fix up linting issues to ensure your code > passes the PEP8 checks (although autopep8 can not fix all issues > automatically). > But with autopep8 you *still* need to think about how to format your code,

Re: CPP : arrow symbols.map issue

2020-04-02 Thread Antoine Pitrou
Hi, On Thu, 2 Apr 2020 16:56:06 + Brian Bowman wrote: > A new high-performance file system we are working with returns an error while > writing a .parquet file. The following arrow symbol does not resolve > properly and the error is masked. > > libparquet.so: undefined symbol: _ZNK

Re: [Python] black vs. autopep8

2020-04-02 Thread Antoine Pitrou
and "void* ptr" to "void * ptr") Regards Antoine. On 02/04/2020 at 15:30, Antoine Pitrou wrote: > > Hello, > > I've put up two PRs to compare the effect of running black vs. autopep8 > on the Python codebase. > > * black: https://github.com

[Python] black vs. autopep8

2020-04-02 Thread Antoine Pitrou
Hello, I've put up two PRs to compare the effect of running black vs. autopep8 on the Python codebase. * black: https://github.com/apache/arrow/pull/6810 65 files changed, 7855 insertions(+), 5215 deletions(-) * autopep8: https://github.com/apache/arrow/pull/6811 20 files changed, 137 insert

Re: Proposal to use Black for automatic formatting of Python code

2020-04-02 Thread Antoine Pitrou
I have looked at the kind of reformatting used by black and I've become -1 on this. `black` is much too aggressive and actually makes the code less readable. `autopep8` seems much better and less aggressive. Let's use that instead. Regards Antoine. On Thu, 26 Mar 2020 20:37:01 +0100 Joris V

Re: Clarification regarding the `CDataInterface.rst`

2020-04-02 Thread Antoine Pitrou
case persists for six. Can you please > look into it? > > Thanks, > Anish Biswas > > On 2020/03/30 16:15:53, Antoine Pitrou wrote: >> On Mon, 30 Mar 2020 15:17:02 - >> Anish Biswas wrote: >>> Thanks! I'll probably build the Arrow Library from so

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-04-01 Thread Antoine Pitrou
> On Wed, Mar 4, 2020 at 11:39 PM Wes McKinney <wesmck...@gmail.com> wrote:

[jira] [Created] (ARROW-8298) [C++][CI] MinGW builds fail building grpc

2020-03-31 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8298: - Summary: [C++][CI] MinGW builds fail building grpc Key: ARROW-8298 URL: https://issues.apache.org/jira/browse/ARROW-8298 Project: Apache Arrow Issue Type

Re: Clarification regarding the `CDataInterface.rst`

2020-03-30 Thread Antoine Pitrou
On Mon, 30 Mar 2020 15:17:02 - Anish Biswas wrote: > Thanks! I'll probably build the Arrow Library from source. Thanks again! You should be able to get a nightly build using: $ pip install -U --extra-index-url https://pypi.fury.io/arrow-nightlies/ --pre pyarrow Regards Antoine.

[jira] [Created] (ARROW-8272) [CI][Python] Test failure on Ubuntu 16.04

2020-03-30 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8272: - Summary: [CI][Python] Test failure on Ubuntu 16.04 Key: ARROW-8272 URL: https://issues.apache.org/jira/browse/ARROW-8272 Project: Apache Arrow Issue Type

Re: [VOTE] Accept "DoExchange" RPC to Arrow Flight protocol

2020-03-28 Thread Antoine Pitrou
+1 (binding) On 28/03/2020 at 01:44, Wes McKinney wrote: > Hello, > > David M Li has proposed adding a "bidirectional" DoExchange RPC [1] to > the Arrow Flight Protocol [2]. In this client call, datasets (possibly > having different schemas) are sent by both the > client and server in a sing

Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Antoine Pitrou
I don't want to be the small minority opposing this so let's go for it. One question though: will we continue to check Cython files using flake8? Regards Antoine. On Thu, 26 Mar 2020 20:37:01 +0100 Joris Van den Bossche wrote: > Hi all, > > I would like to propose adopting Black as code for

[jira] [Created] (ARROW-8234) [CI] Build timeouts on "AMD64 Windows RTools 35"

2020-03-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8234: - Summary: [CI] Build timeouts on "AMD64 Windows RTools 35" Key: ARROW-8234 URL: https://issues.apache.org/jira/browse/ARROW-8234 Project: Ap

[jira] [Created] (ARROW-8233) [CI] Build timeouts on "AMD64 Windows MinGW 64 GLib & Ruby "

2020-03-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8233: - Summary: [CI] Build timeouts on "AMD64 Windows MinGW 64 GLib & Ruby " Key: ARROW-8233 URL: https://issues.apache.org/jira/browse/ARROW-8233 Project

[jira] [Created] (ARROW-8198) [C++] Diffing should handle null arrays

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8198: - Summary: [C++] Diffing should handle null arrays Key: ARROW-8198 URL: https://issues.apache.org/jira/browse/ARROW-8198 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-8195) [CI] Remove Boost download step in Github Actions

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8195: - Summary: [CI] Remove Boost download step in Github Actions Key: ARROW-8195 URL: https://issues.apache.org/jira/browse/ARROW-8195 Project: Apache Arrow

[jira] [Created] (ARROW-8194) [CI] Github Actions Windows job should run tests in parallel

2020-03-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8194: - Summary: [CI] Github Actions Windows job should run tests in parallel Key: ARROW-8194 URL: https://issues.apache.org/jira/browse/ARROW-8194 Project: Apache Arrow

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-23 Thread Antoine Pitrou
On 24/03/2020 at 00:39, Wes McKinney wrote: > > As far as what Micah said about having a limited number of > compressors: I would be in favor of having just LZ4 and ZSTD. +1, exactly my thought as well. Regards Antoine.

Re: [C++][Compute] RFC: add SIMD support to C++ kernel

2020-03-20 Thread Antoine Pitrou
On Fri, 20 Mar 2020 10:56:51 +0800 Yibo Cai wrote: > I'm revisiting this old thread as I see some avx512 code merged recently[1]. > Code maintenance will be non-trivial if we want to cover more > hardware(sse/avx/avx512/neon/sve/...) and optimize more code in the future. > #ifdef is obviously no

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread Antoine Pitrou
On 18/03/2020 at 18:30, David Li wrote: >> Instead of S3, you can use the Slow streams and Slow filesystem >> implementations. It may better protect against varying external conditions. > > I think we'd want several different benchmarks - we want to ensure we > don't regress local filesystem

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-18 Thread Antoine Pitrou
On 18/03/2020 at 17:36, David Li wrote: > Hi all, > > Thanks to Antoine for implementing the core read coalescing logic. > > We've taken a look at what else needs to be done to get this working, > and it sounds like the following changes would be worthwhile, > independent of the rest of the o

[jira] [Created] (ARROW-8146) [C++] Add per-filesystem facility to sanitize a path

2020-03-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8146: - Summary: [C++] Add per-filesystem facility to sanitize a path Key: ARROW-8146 URL: https://issues.apache.org/jira/browse/ARROW-8146 Project: Apache Arrow

[jira] [Created] (ARROW-8145) [C++] Rename GetTargetInfos

2020-03-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8145: - Summary: [C++] Rename GetTargetInfos Key: ARROW-8145 URL: https://issues.apache.org/jira/browse/ARROW-8145 Project: Apache Arrow Issue Type: Wish

Re: [DISCUSS] Leveraging cloud computing resources for Arrow test workloads

2020-03-15 Thread Antoine Pitrou
On 15/03/2020 at 04:57, Wes McKinney wrote: > On Sat, Mar 14, 2020, 10:52 PM Micah Kornfield > wrote: > >> Hi Antoine, >> Could you clarify what you mean by: >> >>> Given our current resource utilization on Github Actions, it seems that >>> even a non-auto-scaling setup could be useful. >> >>

Re: [DISCUSS] Leveraging cloud computing resources for Arrow test workloads

2020-03-13 Thread Antoine Pitrou
On 13/03/2020 at 01:45, Brian Hulette wrote: > * What kind of devops tooling would be appropriate to provision and > manage the instances, scaling up and down based on need? > * What CI/CD platform would be appropriate to dispatch work to the > cloud nodes (taking into consideration the high co

Re: [DISCUSS] Semantics of custom_metadata

2020-03-11 Thread Antoine Pitrou
On Wed, 11 Mar 2020 12:44:26 -0500 Wes McKinney wrote: > On this note, in Python we should probably re-evaluate the data > structure returned when accessing the "metadata" field. I think it's ok for the convenience API to return a dict, if we also expose e.g. a "metadata_items" that returns an it
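
To illustrate the trade-off in the reply above: in the Arrow format, custom_metadata is an ordered list of key/value pairs, so a dict-style convenience view can hide duplicate keys unless the full list of pairs also stays available. The C++ sketch below is hypothetical (neither name is a real Arrow or PyArrow API) and assumes, arbitrarily, that the last value wins when the dict view collapses duplicates.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Metadata kept as ordered key/value pairs, mirroring a list of KeyValue
// entries; duplicate keys are representable.
using MetadataItems = std::vector<std::pair<std::string, std::string>>;

// Hypothetical dict-style convenience view. Assumption: for duplicate keys
// the last value wins, so earlier values are silently dropped.
std::map<std::string, std::string> MetadataAsDict(const MetadataItems& items) {
  std::map<std::string, std::string> dict;
  for (const auto& kv : items) {
    dict[kv.first] = kv.second;
  }
  return dict;
}
```

With the items `{"origin", "a"}`, `{"origin", "b"}`, `{"unit", "m"}`, the dict view has two entries while the item list still has all three, which is the case for exposing both.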

Re: Summary of RLE and other compression efforts?

2020-03-11 Thread Antoine Pitrou
Hi, On 11/03/2020 at 06:31, Micah Kornfield wrote: > > I still think we should be careful on what is added to the spec, in > particular, we should be focused on encodings that can be used to improve > computational efficiency rather than just smaller size. Also, it is > important to note that

[jira] [Created] (ARROW-8036) [C++] Compilation failure with gtest 1.10.0

2020-03-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8036: - Summary: [C++] Compilation failure with gtest 1.10.0 Key: ARROW-8036 URL: https://issues.apache.org/jira/browse/ARROW-8036 Project: Apache Arrow Issue

[jira] [Created] (ARROW-8023) [Website] Write a blog post about the C data interface

2020-03-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8023: - Summary: [Website] Write a blog post about the C data interface Key: ARROW-8023 URL: https://issues.apache.org/jira/browse/ARROW-8023 Project: Apache Arrow

Re: Making a patch 0.16.1 Arrow release

2020-03-05 Thread Antoine Pitrou
On Thu, 5 Mar 2020 10:06:30 -0600 Wes McKinney wrote: > hi folks, > > There have been a number of critical issues reported (many of them > fixed already) since 0.16.0 was released. Is there interest in > preparing a patch 0.16.1 release (with backported patches onto a > maint-0.16.x branch as wit

[jira] [Created] (ARROW-8013) [Python][Packaging] Fix manylinux wheels

2020-03-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8013: - Summary: [Python][Packaging] Fix manylinux wheels Key: ARROW-8013 URL: https://issues.apache.org/jira/browse/ARROW-8013 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-8011) [C++] Some buffers not resized when reading from Parquet

2020-03-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8011: - Summary: [C++] Some buffers not resized when reading from Parquet Key: ARROW-8011 URL: https://issues.apache.org/jira/browse/ARROW-8011 Project: Apache Arrow

Re: Dedicated hardware for Arrow CI / benchmarking [was Re: CI setup on dedicated arm hardware]

2020-03-05 Thread Antoine Pitrou
It sounds like this would be a good reason to use BuildKite, which AFAIU can automatically provision and operate cloud resources for us? On 04/03/2020 at 16:21, Wes McKinney wrote: > hi folks, > > The tornado the night before last in Nashville, Tennessee temporarily > disabled the physical h

[jira] [Created] (ARROW-7999) [C++] Fix crash on corrupt Map array input (OSS-Fuzz)

2020-03-04 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7999: - Summary: [C++] Fix crash on corrupt Map array input (OSS-Fuzz) Key: ARROW-7999 URL: https://issues.apache.org/jira/browse/ARROW-7999 Project: Apache Arrow

[jira] [Created] (ARROW-7995) [C++] IO: coalescing and caching read ranges

2020-03-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7995: - Summary: [C++] IO: coalescing and caching read ranges Key: ARROW-7995 URL: https://issues.apache.org/jira/browse/ARROW-7995 Project: Apache Arrow Issue

[jira] [Created] (ARROW-7994) [CI][C++] Move AppVeyor MinGW builds to Github Actions

2020-03-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7994: - Summary: [CI][C++] Move AppVeyor MinGW builds to Github Actions Key: ARROW-7994 URL: https://issues.apache.org/jira/browse/ARROW-7994 Project: Apache Arrow

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-03 Thread Antoine Pitrou
> > I hope we can support such highly customized compression strategies. > > Best, > Liya Fan > > > > On Tue, Mar 3, 2020 at 8:15 PM Antoine Pitrou wrote: > >> >> If we want to use an HTTP header, it would be more of an Accept-Encoding >> header,

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-03 Thread Antoine Pitrou
ething like flight it would be good to have a standard mechanism for >> negotiating server/client capabilities (e.g. client doesn't support >> compression or only supports a subset). >> >> >> Thanks, >> Micah >> >> On Sun, Mar 1, 2020 at 1:24 PM Wes
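
As a rough illustration of Accept-Encoding-style capability negotiation, here is a hypothetical C++ sketch (not an existing Arrow or Flight API): the client lists the codecs it can decode in order of preference, and the server picks the first one it can also produce, falling back to no compression. The codec names are placeholders echoing the LZ4/ZSTD shortlist discussed in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical negotiation in the spirit of HTTP Accept-Encoding: return the
// first codec from the client's preference list that the server supports,
// or "uncompressed" when there is no common codec.
std::string NegotiateCodec(const std::vector<std::string>& client_accepts,
                           const std::vector<std::string>& server_supports) {
  for (const std::string& codec : client_accepts) {
    for (const std::string& supported : server_supports) {
      if (codec == supported) {
        return codec;
      }
    }
  }
  return "uncompressed";
}
```

Letting the client's order decide means a client that does not support compression at all simply sends an empty list and always receives uncompressed data.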

[jira] [Created] (ARROW-7982) [C++] Let ArrayDataVisitor accept void-returning functions

2020-03-02 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7982: - Summary: [C++] Let ArrayDataVisitor accept void-returning functions Key: ARROW-7982 URL: https://issues.apache.org/jira/browse/ARROW-7982 Project: Apache Arrow

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-01 Thread Antoine Pitrou
On 01/03/2020 at 22:01, Wes McKinney wrote: > In the context of a "next version of the Feather format" ARROW-5510 > (which is consumed only by Python and R at the moment), I have been > looking at compressing buffers using fast compressors like ZSTD when > writing the RecordBatch bodies. This c

Re: Crash with 0.15.1 when transposing dicts with nulls values

2020-02-29 Thread Antoine Pitrou
Hi Pierre, While the Arrow format doesn't mandate particular values under null slots, the Arrow C++ implementation should not create "undefined" values (for security reasons: failing to initialize data could reveal confidential information that was previously stored at the same memory location)
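
A minimal C++ sketch of the principle in that reply, assuming a simplified int32 layout (a validity bitmap plus a values buffer, without Arrow's padding or memory pools): the values buffer is zero-filled before anything is written, so slots marked null never carry leftover memory contents. `Int32Buffers` and `BuildInt32` are hypothetical names, not Arrow APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified buffers of an int32 array: validity bitmap (1 bit per slot,
// least-significant bit first, 1 = valid) and the values themselves.
struct Int32Buffers {
  std::vector<uint8_t> validity;
  std::vector<int32_t> values;
  int64_t null_count = 0;
};

Int32Buffers BuildInt32(const std::vector<std::optional<int32_t>>& input) {
  Int32Buffers out;
  out.validity.assign((input.size() + 7) / 8, 0);
  // Zero-fill first: null slots keep a defined value instead of whatever
  // happened to be in memory before.
  out.values.assign(input.size(), 0);
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (input[i].has_value()) {
      out.validity[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
      out.values[i] = *input[i];
    } else {
      ++out.null_count;
    }
  }
  return out;
}
```

For the input `{1, null, 3}`, the bitmap byte is `0b101` and the value under the null slot is `0`.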

[jira] [Created] (ARROW-7948) [Go][Integration] Decimal integration failures

2020-02-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7948: - Summary: [Go][Integration] Decimal integration failures Key: ARROW-7948 URL: https://issues.apache.org/jira/browse/ARROW-7948 Project: Apache Arrow Issue

Re: Python 2.7 support removed

2020-02-26 Thread Antoine Pitrou
personally I'd > prefer the former to reduce the maintenance cost for the wheels. > > Opinions? > > Regards, Krisztian > > > On Thu, Feb 20, 2020 at 9:56 AM Antoine Pitrou wrote: >> >> >> Hi Micah, >> >> Unlike 2.7, it's not oner

[jira] [Created] (ARROW-7944) [Python] Test failures without Pandas

2020-02-26 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7944: - Summary: [Python] Test failures without Pandas Key: ARROW-7944 URL: https://issues.apache.org/jira/browse/ARROW-7944 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-7931) [C++] Fix crash on corrupt Map array input (OSS-Fuzz)

2020-02-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7931: - Summary: [C++] Fix crash on corrupt Map array input (OSS-Fuzz) Key: ARROW-7931 URL: https://issues.apache.org/jira/browse/ARROW-7931 Project: Apache Arrow

[jira] [Created] (ARROW-7930) [Python][CI] Test jpype integration in CI

2020-02-24 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7930: - Summary: [Python][CI] Test jpype integration in CI Key: ARROW-7930 URL: https://issues.apache.org/jira/browse/ARROW-7930 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-7915) [CI] [Python] Run tests with Python development mode enabled

2020-02-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7915: - Summary: [CI] [Python] Run tests with Python development mode enabled Key: ARROW-7915 URL: https://issues.apache.org/jira/browse/ARROW-7915 Project: Apache Arrow

[jira] [Created] (ARROW-7911) [C++] Gandiva tests crash when compiled with clang

2020-02-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7911: - Summary: [C++] Gandiva tests crash when compiled with clang Key: ARROW-7911 URL: https://issues.apache.org/jira/browse/ARROW-7911 Project: Apache Arrow

[RESULT] [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-21 Thread Antoine Pitrou
Hello, The vote succeeds with 3 +1 (binding) and 2 +1 (non-binding). I'll soon open a JIRA for the specification and the C++ implementation, so that we can merge those timely. Regards Antoine. On Tue, 11 Feb 2020 20:06:33 +0100 Antoine Pitrou wrote: > Hello, > > We have b

Re: Integration testing

2020-02-21 Thread Antoine Pitrou
Hi, I don't think float16 support is required for 1.0. On the C++ side at least, it will require integrating a dedicated library (probably in other languages as well). Regards Antoine. On 21/02/2020 at 00:33, Neal Richardson wrote: > Hi all, > To help us reach 1.0 with as complete and thor

Re: Python 2.7 support removed

2020-02-20 Thread Antoine Pitrou
it until after the next release or has it become onerous to maintain? > > Thanks, > Micah > > On Wed, Feb 19, 2020 at 1:24 AM Antoine Pitrou wrote: > >> >> Hello, >> >> Following the previous discussions on this mailing-list, we have >> entirely r

Re: Using GitHub Actions to automate style and other fixes

2020-02-19 Thread Antoine Pitrou
Hi, On Wed, 19 Feb 2020 09:59:04 -0800 > > It doesn't have to be this way. With GitHub Actions, we can run workflows > that fix style and other violations and push the fix in a commit back to > the branch. I'm rather opposed to this. Doing automated pushes behind the user's back will feel con

[jira] [Created] (ARROW-7890) [C++] Add Promise / Future implementation

2020-02-19 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7890: - Summary: [C++] Add Promise / Future implementation Key: ARROW-7890 URL: https://issues.apache.org/jira/browse/ARROW-7890 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-7884) [C++][Python] Crash in pq.read_table()

2020-02-19 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7884: - Summary: [C++][Python] Crash in pq.read_table() Key: ARROW-7884 URL: https://issues.apache.org/jira/browse/ARROW-7884 Project: Apache Arrow Issue Type

Python 2.7 support removed

2020-02-19 Thread Antoine Pitrou
Hello, Following the previous discussions on this mailing-list, we have entirely removed Python 2.7 support from the codebase (see ARROW-5757 on JIRA). This deleted a lot of compatibility code that was spread around the C++ and Python codebases. As a reminder, Python 2.7 has stopped being supp

[jira] [Created] (ARROW-7879) [C++][Doc] Add doc for the Device API

2020-02-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7879: - Summary: [C++][Doc] Add doc for the Device API Key: ARROW-7879 URL: https://issues.apache.org/jira/browse/ARROW-7879 Project: Apache Arrow Issue Type

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-18 Thread Antoine Pitrou
loyed here but not much harm in waiting a few more days) > > >> > > >> On Thu, Feb 13, 2020 at 8:15 PM Francois Saint-Jacques > > >> wrote: > > >> > > > >> > +1 > > >> > > > >> &

[jira] [Created] (ARROW-7869) [Python] Boost::system and boost::filesystem not necessary anymore in Python wheels

2020-02-17 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7869: - Summary: [Python] Boost::system and boost::filesystem not necessary anymore in Python wheels Key: ARROW-7869 URL: https://issues.apache.org/jira/browse/ARROW-7869

Re: Schemaless serialization

2020-02-17 Thread Antoine Pitrou
Hi Tewfik, It would be good to step back a bit and explain what your data is, and what the consumer is going to do with it. Regards Antoine. On Fri, 14 Feb 2020 15:08:57 -0800 Tewfik Zeghmi wrote: > Hi Micah, > > The primary language is Python. I'm hoping that the small overhead of >

Re: Arrow doesn't have a MapType

2020-02-13 Thread Antoine Pitrou
On Thu, 13 Feb 2020 13:58:13 +0800 Shawn Yang wrote: > Thanks Wes. I was using 0.14 before. BTW, it seems the doc for data types > wasn't fully updated. I'll submit a PR for this. The PR is integrated. Thank you Shawn! Regards Antoine.

[jira] [Created] (ARROW-7847) [Web] Write a blog post about fuzzing

2020-02-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7847: - Summary: [Web] Write a blog post about fuzzing Key: ARROW-7847 URL: https://issues.apache.org/jira/browse/ARROW-7847 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-7846) [Python][Dev] Remove last dependencies on six

2020-02-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7846: - Summary: [Python][Dev] Remove last dependencies on six Key: ARROW-7846 URL: https://issues.apache.org/jira/browse/ARROW-7846 Project: Apache Arrow Issue

[jira] [Created] (ARROW-7840) [Java] [Integration] Java executables fail

2020-02-12 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7840: - Summary: [Java] [Integration] Java executables fail Key: ARROW-7840 URL: https://issues.apache.org/jira/browse/ARROW-7840 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-7838) [C++] Installed plasma-store-server fails finding Boost

2020-02-12 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7838: - Summary: [C++] Installed plasma-store-server fails finding Boost Key: ARROW-7838 URL: https://issues.apache.org/jira/browse/ARROW-7838 Project: Apache Arrow

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-11 Thread Antoine Pitrou
the right one. Which open PR > contains the specification now? > > On Tue, Feb 11, 2020 at 1:06 PM Antoine Pitrou wrote: >> >> >> Hello, >> >> We have been discussing the creation of a minimalist C-based data >> interface for applications to exchange Arrow

[VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-11 Thread Antoine Pitrou
Hello, We have been discussing the creation of a minimalist C-based data interface for applications to exchange Arrow columnar data structures with each other. Some notable features of this interface include: * A small amount of header-only C code can be copied independently into third-party li

[jira] [Created] (ARROW-7815) [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)

2020-02-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7815: - Summary: [C++] Fix crashes on corrupt IPC input (OSS-Fuzz) Key: ARROW-7815 URL: https://issues.apache.org/jira/browse/ARROW-7815 Project: Apache Arrow

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-02-06 Thread Antoine Pitrou
On 06/02/2020 at 20:20, Wes McKinney wrote: >> Actually, on a more high-level basis, is the goal to prefetch for >> sequential consumption of row groups? >> > > Essentially yes. One "easy" optimization is to prefetch the entire > serialized row group. This is an evolution of that idea where we

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-02-06 Thread Antoine Pitrou
On 06/02/2020 at 19:40, Antoine Pitrou wrote: > > On 06/02/2020 at 19:37, Wes McKinney wrote: >> On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou wrote: >> >>> On 06/02/2020 at 16:26, Wes McKinney wrote: >>>> >>>> This seems useful,

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-02-06 Thread Antoine Pitrou
On 06/02/2020 at 19:37, Wes McKinney wrote: > On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou wrote: > >> On 06/02/2020 at 16:26, Wes McKinney wrote: >>> >>> This seems useful, too. It becomes a question of where do you want to >>> manage the cached m

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-02-06 Thread Antoine Pitrou
On 06/02/2020 at 16:26, Wes McKinney wrote: > > This seems useful, too. It becomes a question of where do you want to > manage the cached memory segments, however you obtain them. I'm > arguing that we should not have much custom code in the Parquet > library to manage the prefetched segments

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-02-06 Thread Antoine Pitrou
a has, but that's my opinion :-) Regards Antoine. > On Thu, Feb 6, 2020 at 9:26 AM Wes McKinney wrote: >> >> On Thu, Feb 6, 2020 at 2:46 AM Antoine Pitrou wrote: >>> >>> On Wed, 5 Feb 2020 15:46:15 -0600 >>> Wes McKinney wrote: >>>> >>&

[jira] [Created] (ARROW-7785) [C++] sparse_tensor.cc is extremely slow to compile

2020-02-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7785: - Summary: [C++] sparse_tensor.cc is extremely slow to compile Key: ARROW-7785 URL: https://issues.apache.org/jira/browse/ARROW-7785 Project: Apache Arrow

[jira] [Created] (ARROW-7784) [C++] diff.cc is extremely slow to compile

2020-02-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7784: - Summary: [C++] diff.cc is extremely slow to compile Key: ARROW-7784 URL: https://issues.apache.org/jira/browse/ARROW-7784 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-7783) [C++] ARROW_DATASET should enable ARROW_COMPUTE

2020-02-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7783: - Summary: [C++] ARROW_DATASET should enable ARROW_COMPUTE Key: ARROW-7783 URL: https://issues.apache.org/jira/browse/ARROW-7783 Project: Apache Arrow Issue

Re: [C++] Arrow added to OSS-Fuzz

2020-02-06 Thread Antoine Pitrou
. Regards Antoine. On Wed, 15 Jan 2020 19:59:24 +0100 Antoine Pitrou wrote: > Hello, > > I would like to announce that Arrow has been accepted on the OSS-Fuzz > infrastructure (a continuous fuzzing infrastructure operated by Google): > https://github.com/google/oss-fuzz/pull/3233 >

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-02-06 Thread Antoine Pitrou
On Wed, 5 Feb 2020 16:37:17 -0500 David Li wrote: > > As a separate step, prefetching/caching should also make use of a > global (or otherwise shared) IO thread pool, so that parallel reads of > different files implicitly coordinate work with each other as well. > Then, you could queue up reads o

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-02-06 Thread Antoine Pitrou
On Wed, 5 Feb 2020 15:46:15 -0600 Wes McKinney wrote: > > I'll comment in more detail on some of the other items in due course, > but I think this should be handled by an implementation of > RandomAccessFile (that wraps a naked RandomAccessFile) with some > additional methods, rather than adding
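
A sketch of the wrapper design suggested in the quoted text, using a hypothetical minimal `RandomAccessSource` interface instead of Arrow's real `RandomAccessFile` so that it stays self-contained. `CachingSource` wraps another source and adds one extra method, `Prefetch`, whose range is fetched once and then used to serve later reads that fall entirely inside it. Eviction, thread safety and overlapping cached ranges are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical minimal interface standing in for a random-access file.
class RandomAccessSource {
 public:
  virtual ~RandomAccessSource() = default;
  virtual std::string ReadAt(int64_t offset, int64_t length) = 0;
};

// In-memory source that counts how many reads reach it (think S3 requests).
class CountingSource : public RandomAccessSource {
 public:
  explicit CountingSource(std::string data) : data_(std::move(data)) {}
  std::string ReadAt(int64_t offset, int64_t length) override {
    ++reads;
    return data_.substr(offset, length);
  }
  int reads = 0;

 private:
  std::string data_;
};

// Wrapper adding a prefetch method on top of the plain read interface.
class CachingSource : public RandomAccessSource {
 public:
  explicit CachingSource(RandomAccessSource* inner) : inner_(inner) {}

  // Fetch a range once from the wrapped source and keep it in memory.
  void Prefetch(int64_t offset, int64_t length) {
    cache_[offset] = inner_->ReadAt(offset, length);
  }

  std::string ReadAt(int64_t offset, int64_t length) override {
    // Look at the last cached range starting at or before `offset`.
    auto it = cache_.upper_bound(offset);
    if (it != cache_.begin()) {
      --it;
      const int64_t start = it->first;
      const std::string& bytes = it->second;
      if (offset + length <= start + static_cast<int64_t>(bytes.size())) {
        return bytes.substr(offset - start, length);  // served from memory
      }
    }
    return inner_->ReadAt(offset, length);  // cache miss: forward the read
  }

 private:
  RandomAccessSource* inner_;
  std::map<int64_t, std::string> cache_;
};
```

Because the cache lives in the wrapper, the file format reader only issues ordinary reads, which matches the argument for keeping little custom prefetching code in the Parquet library itself.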

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-02-05 Thread Antoine Pitrou
Hi David, I think we should discuss this as individual features. > Read Coalescing: from Parquet metadata, we know exactly > which byte ranges of > a file will be read, and can “cheat” in the S3 IO layer by fetching them in advance. It seems there are two things here: coalescing individual reads,
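
The first of those features, coalescing individual reads, can be sketched as follows: sort the requested byte ranges and merge any two whose gap is at most `hole_size_limit` bytes, trading a little over-reading for fewer round trips to a high-latency store such as S3. This is a hypothetical helper for illustration only; the coalescing logic implemented for ARROW-7995 may use different names and additional limits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Merge byte ranges separated by at most `hole_size_limit` bytes, so that
// several small reads become one larger request.
std::vector<ReadRange> CoalesceRanges(std::vector<ReadRange> ranges,
                                      int64_t hole_size_limit) {
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) {
              return a.offset < b.offset;
            });
  std::vector<ReadRange> out;
  for (const ReadRange& r : ranges) {
    if (!out.empty()) {
      ReadRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset <= last_end + hole_size_limit) {
        // Extend the previous range to cover this one (overlaps included).
        last.length = std::max(last_end, r.offset + r.length) - last.offset;
        continue;
      }
    }
    out.push_back(r);
  }
  return out;
}
```

For example, with `hole_size_limit = 8`, the ranges {100, 10}, {0, 10}, {15, 5} and {200, 50} become three requests: {0, 20}, {100, 10} and {200, 50}.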

[jira] [Created] (ARROW-7754) [C++] Result is slow

2020-02-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7754: - Summary: [C++] Result is slow Key: ARROW-7754 URL: https://issues.apache.org/jira/browse/ARROW-7754 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-7749) [C++] Link some tests together

2020-02-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7749: - Summary: [C++] Link some tests together Key: ARROW-7749 URL: https://issues.apache.org/jira/browse/ARROW-7749 Project: Apache Arrow Issue Type

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-03 Thread Antoine Pitrou
On Fri, 31 Jan 2020 04:13:12 +0100 Krisztián Szűcs wrote: > Hi, > > I would like to propose the following release candidate (RC2) of Apache > Arrow version 0.16.0. This is a release consisting of 728 > resolved JIRA issues[1]. > > This release candidate is based on commit: > 729a7689fd87572e6a14

[jira] [Created] (ARROW-7748) [C++] [Cuda] Cache CUDA contexts

2020-02-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7748: - Summary: [C++] [Cuda] Cache CUDA contexts Key: ARROW-7748 URL: https://issues.apache.org/jira/browse/ARROW-7748 Project: Apache Arrow Issue Type

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-01-31 Thread Antoine Pitrou
Fri, Jan 31, 2020 at 6:19 AM Antoine Pitrou wrote: >> >> >> On Ubuntu 18.04, the source verification is successful until the go >> step, which fails: >> https://gist.github.com/pitrou/7e089ac146197b1141585c271cb39866 >> >> Side note: the JS verificat

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-01-31 Thread Antoine Pitrou
On Ubuntu 18.04, the source verification is successful until the go step, which fails: https://gist.github.com/pitrou/7e089ac146197b1141585c271cb39866 Side note: the JS verification step should avoid spamming the terminal with tons of useless information. Regards Antoine. On 31/01/2020 at 04:

Re: [C++] Device and MemoryManager API

2020-01-30 Thread Antoine Pitrou
On 30/01/2020 at 17:53, Wes McKinney wrote: > Hi Antoine, > > This sounds like a reasonable path forward -- thank you for working on > this. I'm at a conference this week, but I will review the PR and give > feedback as soon as I can, probably tomorrow (Friday). > > My main couple of initial

[jira] [Created] (ARROW-7726) [CI] [C++] Use boost binaries on Windows GHA build

2020-01-30 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7726: - Summary: [CI] [C++] Use boost binaries on Windows GHA build Key: ARROW-7726 URL: https://issues.apache.org/jira/browse/ARROW-7726 Project: Apache Arrow
