Re: RE: [Go] expose ability to write arrow.Table to JSON

2021-04-23 Thread Francois Saint-Jacques
You can either use the provided server facility found in flight [1], or use stream directly via ipc [2]. You can look at the tests on how to use both facilities. François [1] https://github.com/apache/arrow/tree/master/go/arrow/flight [2] https://github.com/apache/arrow/tree/master/go/arrow/ipc

Re: [VOTE] Accept donation of Rust Ballista project

2021-03-23 Thread Francois Saint-Jacques
+1 On Mon, Mar 22, 2021 at 8:33 AM Andrew Lamb wrote: > > +1 > > On Sun, Mar 21, 2021 at 7:08 PM paddy horan wrote: > > > +1 (non-binding) > > > > > > > > From: Sutou Kouhei > > Sent: Sunday, March 21, 2021 4:34:43 PM > > To: dev@arrow.apache.org > > Subject: R

Re: [C++] 0x00 in Binary type

2020-11-18 Thread Francois Saint-Jacques
I would say at first sight that it's due to your usage of char[]: builder.Append(d) implicitly does a strlen. François On Wed, Nov 18, 2020 at 2:00 PM Ying Zhou wrote: > > Sure! > > BinaryBuilder builder; > char d[] = "\x00\x01\xbf\x5b"; > (void)(builder.Append(d)); > std::shared_ptr array; >
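The strlen effect described in this reply can be reproduced without Arrow. The sketch below is only an illustration: the helper names are hypothetical and no Arrow API is exercised. It shows that a char array starting with a zero byte has a C-string length of 0, while the array type still knows there are 4 payload bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as seen by any API that treats the pointer as a NUL-terminated
// C string: counting stops at the first zero byte.
std::size_t LengthViaStrlen(const char* data) { return std::strlen(data); }

// Length recovered from the array type itself, minus the terminating NUL
// that the string literal adds. (Hypothetical helper for illustration.)
template <std::size_t N>
std::size_t LengthViaArraySize(const char (&)[N]) {
  return N - 1;
}

// Copying with an explicit length keeps embedded zero bytes intact.
std::string CopyWithLength(const char* data, std::size_t length) {
  return std::string(data, length);
}
```

For `char d[] = "\x00\x01\xbf\x5b"`, `LengthViaStrlen(d)` stops at the first byte, whereas `LengthViaArraySize(d)` reports all four. The usual remedy is to hand the builder a pointer together with an explicit length, or a string type that carries its own size.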

Re: Ursabot Benchmark framework for other languages

2020-08-27 Thread Francois Saint-Jacques
Hello Kazuaki! I recommend you read and take a look at the benchmark sub-library [1] of archery and how it's glued [2]. You will need to implement: - A runner for the framework you intend to use [3] and [4], it also implies capturing the output into a class that implements the "Benchmark" interfa

Re: [VOTE] Permitting unsigned integers for Arrow dictionary indices

2020-06-30 Thread Francois Saint-Jacques
+1 (binding) On Tue, Jun 30, 2020 at 10:55 AM Neal Richardson wrote: > > +1 (binding) > > On Tue, Jun 30, 2020 at 2:52 AM Antoine Pitrou wrote: > > > > > +1 (binding) > > > > Le 29/06/2020 à 23:59, Wes McKinney a écrit : > > > Hi, > > > > > > As discussed on the mailing list [1], it has been pro

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Francois Saint-Jacques
OTOH, how do we handle NullType -> UnionType cast conversion? Do we require some convention like the first child's ArrayData null bitmap to be set and all tags set to 0? François On Wed, Jun 24, 2020 at 1:09 PM Antoine Pitrou wrote: > > > Le 24/06/2020 à 18:34, Wes McKinney a écrit : > > On We

Re: [VOTE] Add Decimal::bitWidth field to Schema.fbs for forward compatibility

2020-06-24 Thread Francois Saint-Jacques
+1 (binding)

Re: Feather v2 random access

2020-06-24 Thread Francois Saint-Jacques
AM Francois Saint-Jacques wrote: > > Hello Yue, > > FeatherV2 is just a facade for the Arrow IPC file format. You can find > the implementation here [1]. I will try to answer your question with > inline comments. On a high level, the file format writes a schema and > then mul

Re: Feather v2 random access

2020-06-24 Thread Francois Saint-Jacques
Hello Yue, FeatherV2 is just a facade for the Arrow IPC file format. You can find the implementation here [1]. I will try to answer your question with inline comments. On a high level, the file format writes a schema and then multiple "chunks" called RecordBatch. Your lowest level of granularity
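Since the file is described as a schema followed by multiple RecordBatch chunks, random access to a row comes down to finding which chunk holds it. The sketch below assumes, as an illustration only, that the reader knows the row count of each batch and fetches whole batches; `RowLocation` and `LocateRow` are hypothetical names, not part of the Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Position of a global row: which batch it lives in and the offset inside it.
struct RowLocation {
  std::size_t batch;
  std::int64_t offset;
};

// Walk the per-batch row counts until the requested row falls inside one.
// Returns std::nullopt when the row is negative or past the end of the file.
std::optional<RowLocation> LocateRow(const std::vector<std::int64_t>& batch_lengths,
                                     std::int64_t row) {
  if (row < 0) return std::nullopt;
  for (std::size_t i = 0; i < batch_lengths.size(); ++i) {
    if (row < batch_lengths[i]) return RowLocation{i, row};
    row -= batch_lengths[i];  // skip this batch entirely
  }
  return std::nullopt;
}
```

With batch lengths {3, 0, 4}, global row 3 resolves to batch 2, offset 0. A reader would then load only that batch and index into it. For many lookups, a prefix-sum array with binary search would replace the linear walk.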

Re: Generate random arrow table

2020-06-23 Thread Francois Saint-Jacques
If you configured CMake to build tests (-DARROW_BUILD_TESTS=ON) and install locally, there should be a `libarrow_testing.so` that you need to link against. What I meant is that this library is _not_ part of pip/conda/dpkg/rpm. François

Re: Generate random arrow table

2020-06-23 Thread Francois Saint-Jacques
> something like RandomTableGenerator before implementing myself one > > > using RandomArrayGenerator. > > > > > > On Mon, Jun 22, 2020 at 4:49 PM Francois Saint-Jacques > > > wrote: > > > > > > > > Hello, > > > > > > > > We

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Francois Saint-Jacques
We should aim to improve the performance of the most widely used *default* packages, which are python pip, python conda and R (all platforms). AFAIK, both pip (manywheel) and conda use gcc on Linux by default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose (haven't checked) that clang is

Re: Generate random arrow table

2020-06-22 Thread Francois Saint-Jacques
Hello, We use this extensively in unit tests, see [1] François [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/testing/random.h On Mon, Jun 22, 2020 at 9:51 AM Kirill Lykov wrote: > > Hi, > > I wonder if there is existing C++ code which allows to generate a > random arrow table by

Re: Using gdb on a test

2020-06-15 Thread Francois Saint-Jacques
As Antoine said, debug mode is probably the most important configuration. You can also try the `relwithdebinfo` if you're trying to debug the optimized code. I'd also add the following: 1. Building out of conda provides a much better integration with gdb and the system's libstdc++ due to the prett

[jira] [Created] (ARROW-9108) [C++][Dataset] Add Parquet Statistics conversion for timestamp columns

2020-06-11 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9108: - Summary: [C++][Dataset] Add Parquet Statistics conversion for timestamp columns Key: ARROW-9108 URL: https://issues.apache.org/jira/browse/ARROW-9108

[jira] [Created] (ARROW-9107) [C++][Dataset] Time-based types support

2020-06-11 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9107: - Summary: [C++][Dataset] Time-based types support Key: ARROW-9107 URL: https://issues.apache.org/jira/browse/ARROW-9107 Project: Apache Arrow

[jira] [Created] (ARROW-9068) [C++][Dataset] Simplify Partitioning interface

2020-06-08 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9068: - Summary: [C++][Dataset] Simplify Partitioning interface Key: ARROW-9068 URL: https://issues.apache.org/jira/browse/ARROW-9068 Project: Apache Arrow

Re: [DISCUSS] Add kernel integer overflow handling

2020-06-04 Thread Francois Saint-Jacques
I documented [1] the behaviors by experimentation or by reading the documentation. My experiments were mostly about checking INT64_MAX + 1. My preference would be to use the platform defined behavior by default and provide a safety option that errors. Feel free to add more databases/systems. Fra
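The two behaviors contrasted here (wrap by default, error on request) can be sketched for the INT64_MAX + 1 case. This is a minimal illustration with hypothetical function names, assuming C++17 for std::optional; it is not how Arrow's kernels are implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Wrapping addition. The sum is computed in unsigned arithmetic, where
// overflow is well defined. Converting the result back to int64_t is
// modular since C++20 and implementation-defined before that; mainstream
// two's-complement compilers wrap.
std::int64_t WrappingAdd(std::int64_t a, std::int64_t b) {
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(a) +
                                   static_cast<std::uint64_t>(b));
}

// Checked addition: returns std::nullopt instead of overflowing.
// The bounds are tested before adding, so no signed overflow ever occurs.
std::optional<std::int64_t> CheckedAdd(std::int64_t a, std::int64_t b) {
  constexpr std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
  constexpr std::int64_t kMin = std::numeric_limits<std::int64_t>::min();
  if (b > 0 && a > kMax - b) return std::nullopt;
  if (b < 0 && a < kMin - b) return std::nullopt;
  return a + b;
}
```

A kernel option could select between the two: the wrapping variant costs nothing extra per element, while the checked variant pays for two comparisons and lets the caller turn an empty result into an error.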

[jira] [Created] (ARROW-9028) [R] Should be able to convert an empty table

2020-06-03 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9028: - Summary: [R] Should be able to convert an empty table Key: ARROW-9028 URL: https://issues.apache.org/jira/browse/ARROW-9028 Project: Apache Arrow

[jira] [Created] (ARROW-8997) [Archery] Benchmark formatter should have friendly units

2020-06-01 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8997: - Summary: [Archery] Benchmark formatter should have friendly units Key: ARROW-8997 URL: https://issues.apache.org/jira/browse/ARROW-8997 Project

[jira] [Created] (ARROW-8986) [Archery][ursabot] Fix benchmark diff checkout of origin/master

2020-05-30 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8986: - Summary: [Archery][ursabot] Fix benchmark diff checkout of origin/master Key: ARROW-8986 URL: https://issues.apache.org/jira/browse/ARROW-8986

[jira] [Created] (ARROW-8890) [R] Fix C++ lint issue

2020-05-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8890: - Summary: [R] Fix C++ lint issue Key: ARROW-8890 URL: https://issues.apache.org/jira/browse/ARROW-8890 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-8884) [C++] Listing files with S3FileSystem is slow

2020-05-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8884: - Summary: [C++] Listing files with S3FileSystem is slow Key: ARROW-8884 URL: https://issues.apache.org/jira/browse/ARROW-8884 Project: Apache Arrow

[jira] [Created] (ARROW-8874) [C++][Dataset] Scanner::ToTable race when ScanTask exit early with an error

2020-05-20 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8874: - Summary: [C++][Dataset] Scanner::ToTable race when ScanTask exit early with an error Key: ARROW-8874 URL: https://issues.apache.org/jira/browse/ARROW-8874

Re: [VOTE] Release Apache Arrow 0.17.1 - RC1

2020-05-15 Thread Francois Saint-Jacques
+1 binding, verified sources and binaries locally (no exclusions). On Fri, May 15, 2020 at 10:38 AM Neal Richardson wrote: > > +1 (binding) > > Verification here: https://github.com/apache/arrow/pull/7170 > > Still haven't worked out the Windows source verification job, but > everything else look

Re: [DISCUSS] Need for Arrow 0.17.1 patch release (binary only?)

2020-05-07 Thread Francois Saint-Jacques
I'll add https://issues.apache.org/jira/browse/ARROW-8726 to the list. On Tue, May 5, 2020 at 6:52 PM Wes McKinney wrote: > > Sorry I haven't had enough coffee today. > > The patches that still need to be resolved AFAICT are ARROW-8684 and > ARROW-8706 (AKA PARQUET-1857), so it will take a little

[jira] [Created] (ARROW-8720) [C++] Fix checked_pointer_cast

2020-05-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8720: - Summary: [C++] Fix checked_pointer_cast Key: ARROW-8720 URL: https://issues.apache.org/jira/browse/ARROW-8720 Project: Apache Arrow Issue

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-04-30 Thread Francois Saint-Jacques
stems since on linux it can call `readahead` and/or `madvise`. François On Thu, Apr 30, 2020 at 8:56 AM Francois Saint-Jacques wrote: > > Hello David, > > I think that what you ask is achievable with the dataset API without > much effort. You'd have to insert the pre-bufferi

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-04-30 Thread Francois Saint-Jacques
Hello David, I think that what you ask is achievable with the dataset API without much effort. You'd have to insert the pre-buffering at ParquetFileFormat::ScanFile [1]. The top-level Scanner::Scan method is essentially a generator that looks like flatmap(Iterator>). It consumes the fragment in-or

[jira] [Created] (ARROW-8604) [R] Windows compilation failure

2020-04-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8604: - Summary: [R] Windows compilation failure Key: ARROW-8604 URL: https://issues.apache.org/jira/browse/ARROW-8604 Project: Apache Arrow Issue

[jira] [Created] (ARROW-8603) [Documentation] Fix Sphinx doxygen comment

2020-04-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8603: - Summary: [Documentation] Fix Sphinx doxygen comment Key: ARROW-8603 URL: https://issues.apache.org/jira/browse/ARROW-8603 Project: Apache Arrow

[jira] [Created] (ARROW-8602) [CMake] Fix ws2_32 link issue when cross-compiling on Linux

2020-04-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8602: - Summary: [CMake] Fix ws2_32 link issue when cross-compiling on Linux Key: ARROW-8602 URL: https://issues.apache.org/jira/browse/ARROW-8602 Project

[jira] [Created] (ARROW-8601) [Go][Flight] Implement Flight Writer interface

2020-04-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8601: - Summary: [Go][Flight] Implement Flight Writer interface Key: ARROW-8601 URL: https://issues.apache.org/jira/browse/ARROW-8601 Project: Apache Arrow

Re: [VOTE] Add "trivial" RecordBatch body compression to Arrow IPC protocol

2020-04-24 Thread Francois Saint-Jacques
+1 (binding) On Fri, Apr 24, 2020 at 5:41 AM Krisztián Szűcs wrote: > > +1 (binding) > > On 2020. Apr 24., Fri at 1:51, Micah Kornfield > wrote: > > > +1 (binding) > > > > On Thu, Apr 23, 2020 at 2:35 PM Sutou Kouhei wrote: > > > > > +1 (binding) > > > > > > In > > > "[VOTE] Add "trivial" Re

Re: [VOTE] Release Apache Arrow 0.17.0 - RC0

2020-04-17 Thread Francois Saint-Jacques
+1 (binding) Verified all sources locally on Ubuntu 18.04 (including Javascript). Verified the binaries, wheels verification matches the one found in https://github.com/apache/arrow/pull/6961 François On Fri, Apr 17, 2020 at 8:12 AM Antoine Pitrou wrote: > > > Hi, > > I tested the sources on Ub

[jira] [Created] (ARROW-8497) [Archery] Add missing component to builds

2020-04-17 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8497: - Summary: [Archery] Add missing component to builds Key: ARROW-8497 URL: https://issues.apache.org/jira/browse/ARROW-8497 Project: Apache Arrow

[jira] [Created] (ARROW-8488) [R] Replace VALUE_OR_STOP with ValueOrStop

2020-04-16 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8488: - Summary: [R] Replace VALUE_OR_STOP with ValueOrStop Key: ARROW-8488 URL: https://issues.apache.org/jira/browse/ARROW-8488 Project: Apache Arrow

[jira] [Created] (ARROW-8448) [Package] Can't build apt packages with ubuntu-focal

2020-04-14 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8448: - Summary: [Package] Can't build apt packages with ubuntu-focal Key: ARROW-8448 URL: https://issues.apache.org/jira/browse/ARROW-8448 Project: A

[jira] [Created] (ARROW-8447) [C++][Dataset] Ensure Scanner::ToTable preserve ordering

2020-04-14 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8447: - Summary: [C++][Dataset] Ensure Scanner::ToTable preserve ordering Key: ARROW-8447 URL: https://issues.apache.org/jira/browse/ARROW-8447 Project

[jira] [Created] (ARROW-8382) [C++][Dataset] Refactor WritePlan to decouple from Fragment/Scan/Partition classes

2020-04-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8382: - Summary: [C++][Dataset] Refactor WritePlan to decouple from Fragment/Scan/Partition classes Key: ARROW-8382 URL: https://issues.apache.org/jira/browse/ARROW

[jira] [Created] (ARROW-8381) [C++][Dataset] Dataset writing should require a writer schema

2020-04-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8381: - Summary: [C++][Dataset] Dataset writing should require a writer schema Key: ARROW-8381 URL: https://issues.apache.org/jira/browse/ARROW-8381

[jira] [Created] (ARROW-8374) [R] Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-04-08 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8374: - Summary: [R] Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array Key: ARROW-8374 URL: https://issues.apach

[jira] [Created] (ARROW-8354) [C++][R] Segfault in test-dataset.r

2020-04-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8354: - Summary: [C++][R] Segfault in test-dataset.r Key: ARROW-8354 URL: https://issues.apache.org/jira/browse/ARROW-8354 Project: Apache Arrow

Re: Attn: Wes, Re: Masked Arrays

2020-04-06 Thread Francois Saint-Jacques
It does make sense, I would go a little further and make this field/property a single value of the same type as the array. This would allow using any arbitrary sentinel value for unknown values (0 in your suggested case). The end result is zero-copy for R bindings (if stars are aligned). I create
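To make the sentinel idea concrete: Arrow marks nulls with a validity bitmap (one bit per slot, least-significant bit first, set bit meaning valid), while a sentinel-based producer marks them with a reserved value in the data itself. The sketch below shows the conversion a consumer would otherwise have to perform; `ValidityFromSentinel` is a hypothetical helper written for this illustration, not an Arrow function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build an Arrow-style validity bitmap from data that uses a sentinel for
// missing values. Bit i (LSB-first within each byte) is set when values[i]
// is NOT the sentinel, i.e. when the slot holds a real value.
std::vector<std::uint8_t> ValidityFromSentinel(const std::vector<std::int32_t>& values,
                                               std::int32_t sentinel) {
  std::vector<std::uint8_t> bitmap((values.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] != sentinel) {
      bitmap[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}
```

For values {1, 0, 3, 0, 5} with sentinel 0, slots 0, 2 and 4 are valid, so the single bitmap byte is 0x15. Carrying the sentinel as array metadata would let a consumer skip this pass and the extra allocation.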

[jira] [Created] (ARROW-8348) [C++] Support optional sentinel values in primitive Array for nulls

2020-04-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8348: - Summary: [C++] Support optional sentinel values in primitive Array for nulls Key: ARROW-8348 URL: https://issues.apache.org/jira/browse/ARROW-8348

[jira] [Created] (ARROW-8318) [C++][Dataset] Dataset should instantiate Fragment

2020-04-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8318: - Summary: [C++][Dataset] Dataset should instantiate Fragment Key: ARROW-8318 URL: https://issues.apache.org/jira/browse/ARROW-8318 Project: Apache

Re: Join operation on attributes from arrow structs

2020-04-02 Thread Francois Saint-Jacques
They're mapped with the StructType/StructArray, which is also columnar representation, e.g. one buffer per field in the sub-object. If you have varying/incompatible types, a field will be promoted to a UnionType. François On Thu, Apr 2, 2020 at 12:54 AM Micah Kornfield wrote: > > Hi Hasara, > Th

[DISCUSS] Field reference ambiguity

2020-03-13 Thread Francois Saint-Jacques
Hello, the recent dataset and compute work has forced us to think about schema projection. One problem that surfaced is referencing fields in nested schemas and/or schemas where duplicate column names exist. We currently have (C++) APIs that either pass a vector or a vector to represent fields su

[jira] [Created] (ARROW-8065) [C++][Dataset] Untangle Dataset, Fragment and ScanOptions

2020-03-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8065: - Summary: [C++][Dataset] Untangle Dataset, Fragment and ScanOptions Key: ARROW-8065 URL: https://issues.apache.org/jira/browse/ARROW-8065 Project

[jira] [Created] (ARROW-7964) [C++] Add short representation string to common classes

2020-02-28 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7964: - Summary: [C++] Add short representation string to common classes Key: ARROW-7964 URL: https://issues.apache.org/jira/browse/ARROW-7964 Project

[jira] [Created] (ARROW-7917) [CMake] FindPythonInterp should check for python3

2020-02-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7917: - Summary: [CMake] FindPythonInterp should check for python3 Key: ARROW-7917 URL: https://issues.apache.org/jira/browse/ARROW-7917 Project: Apache

[jira] [Created] (ARROW-7878) [C++] Implement LogicalPlan and LogicalPlanBuilder

2020-02-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7878: - Summary: [C++] Implement LogicalPlan and LogicalPlanBuilder Key: ARROW-7878 URL: https://issues.apache.org/jira/browse/ARROW-7878 Project: Apache

[jira] [Created] (ARROW-7861) [C++][Parquet] Add fuzz regression corpus for parquet reader

2020-02-14 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7861: - Summary: [C++][Parquet] Add fuzz regression corpus for parquet reader Key: ARROW-7861 URL: https://issues.apache.org/jira/browse/ARROW-7861 Project

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-13 Thread Francois Saint-Jacques
+1 On Thu, Feb 13, 2020 at 9:08 PM Fan Liya wrote: > > +1 (binding) > > On Thu, Feb 13, 2020 at 11:52 AM Wes McKinney wrote: > > > +1 (binding) > > > > On Tue, Feb 11, 2020 at 4:29 PM Antoine Pitrou wrote: > > > > > > > > > Ah, you're right, it's PR 6040: > > > https://github.com/apache/arrow/p

[jira] [Created] (ARROW-7821) [Gandiva] Add support for literal variables

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7821: - Summary: [Gandiva] Add support for literal variables Key: ARROW-7821 URL: https://issues.apache.org/jira/browse/ARROW-7821 Project: Apache Arrow

[jira] [Created] (ARROW-7820) [C++][Gandiva] Add CMake support for compiling LLVM's IR into a library

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7820: - Summary: [C++][Gandiva] Add CMake support for compiling LLVM's IR into a library Key: ARROW-7820 URL: https://issues.apache.org/jira/browse/ARROW

[jira] [Created] (ARROW-7819) [C++][Gandiva] Implement gandiva-dump-ir tool to output llvm IR to a file

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7819: - Summary: [C++][Gandiva] Implement gandiva-dump-ir tool to output llvm IR to a file Key: ARROW-7819 URL: https://issues.apache.org/jira/browse/ARROW-7819

[jira] [Created] (ARROW-7818) [C++][Gandiva] Generate Filter kernels from gandiva code at compile time

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7818: - Summary: [C++][Gandiva] Generate Filter kernels from gandiva code at compile time Key: ARROW-7818 URL: https://issues.apache.org/jira/browse/ARROW-7818

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-10 Thread Francois Saint-Jacques
; > > > > >>>> >> > > > > > > [1] > > > >>>> >> https://bintray.com/apache/arrow/python-rc/0.16.0-rc2#files > > > >>>> >> > > > > > > [2] > > > >>>> >> > > > > > > > > >>>> >>

Re: Arrow Datasets Functionality for Python

2020-02-10 Thread Francois Saint-Jacques
Hello Matthew, The dplyr binding is just syntactic sugar on top of the dataset API. There's no analytics capabilities yet [1], other than the select and the limited projection supported by the dataset API. It looks like it is doing analytics due to properly placed `collect()` calls, which converts

[jira] [Created] (ARROW-7798) [R] Refactor vector to Array conversion

2020-02-07 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7798: - Summary: [R] Refactor vector to Array conversion Key: ARROW-7798 URL: https://issues.apache.org/jira/browse/ARROW-7798 Project: Apache Arrow

Re: Arrow doesn't have a MapType

2020-02-07 Thread Francois Saint-Jacques
Arrow does have a Map type [1][2][3]. It is represented as a list of pairs. François [1] https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/format/Schema.fbs#L60-L87 [2] https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/cpp/src/arrow/type.h
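The "list of pairs" representation can be pictured as one offsets array delimiting each row's entries, plus flattened key and value arrays. The sketch below is a simplified stand-in with hypothetical types (`MapColumn`, `AppendRow`, `Lookup`), assuming C++17; it does not use Arrow's MapType or MapArray classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A map column stored as a list of (key, value) pairs: row i owns the
// entries in the half-open range [offsets[i], offsets[i + 1]).
struct MapColumn {
  std::vector<std::int32_t> offsets{0};
  std::vector<std::string> keys;
  std::vector<std::int64_t> values;
};

// Append one row (one map) by flattening its pairs and closing the range.
void AppendRow(MapColumn& column,
               const std::vector<std::pair<std::string, std::int64_t>>& row) {
  for (const auto& entry : row) {
    column.keys.push_back(entry.first);
    column.values.push_back(entry.second);
  }
  column.offsets.push_back(static_cast<std::int32_t>(column.keys.size()));
}

// Linear scan over one row's pairs; returns std::nullopt for a missing key
// or an out-of-range row.
std::optional<std::int64_t> Lookup(const MapColumn& column, std::size_t row,
                                   const std::string& key) {
  if (row + 1 >= column.offsets.size()) return std::nullopt;
  for (std::int32_t i = column.offsets[row]; i < column.offsets[row + 1]; ++i) {
    if (column.keys[static_cast<std::size_t>(i)] == key) {
      return column.values[static_cast<std::size_t>(i)];
    }
  }
  return std::nullopt;
}
```

Appending the rows {a: 1, b: 2}, {} and {c: 3} yields offsets {0, 2, 2, 3}, keys {a, b, c} and values {1, 2, 3}. Lookups are a scan rather than a hash probe, which is the trade-off of the columnar layout.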

[jira] [Created] (ARROW-7767) [C++] Add a facility to create a Bitmap buffer from an data pointer with a specified sentinel

2020-02-04 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7767: - Summary: [C++] Add a facility to create a Bitmap buffer from an data pointer with a specified sentinel Key: ARROW-7767 URL: https://issues.apache.org/jira

[jira] [Created] (ARROW-7765) [C++] Add Result to the Visitor pattern

2020-02-04 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7765: - Summary: [C++] Add Result to the Visitor pattern Key: ARROW-7765 URL: https://issues.apache.org/jira/browse/ARROW-7765 Project: Apache Arrow

[jira] [Created] (ARROW-7764) [C++] Builders allocate a null bitmap buffer even if there is no nulls

2020-02-04 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7764: - Summary: [C++] Builders allocate a null bitmap buffer even if there is no nulls Key: ARROW-7764 URL: https://issues.apache.org/jira/browse/ARROW-7764

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-04-0

2020-02-04 Thread Francois Saint-Jacques
This is a first! On Tue, Feb 4, 2020 at 8:47 AM Crossbow wrote: > > > Arrow Build Report for Job nightly-2020-02-04-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-04-0 > > Succeeded Tasks: > - centos-6: > URL: > https://github.com/ursa-labs/crossbo

[jira] [Created] (ARROW-7761) [C++] Add S3 support to fs::FileSystemFromUri

2020-02-04 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7761: - Summary: [C++] Add S3 support to fs::FileSystemFromUri Key: ARROW-7761 URL: https://issues.apache.org/jira/browse/ARROW-7761 Project: Apache Arrow

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-03 Thread Francois Saint-Jacques
Tested on ubuntu 18.04 for the source release. On Mon, Feb 3, 2020 at 10:07 PM Francois Saint-Jacques wrote: > > +1 > > Binaries verification didn't have any issues. > Sources verification worked with some local environment hiccups > > François > > On Mon, F

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-03 Thread Francois Saint-Jacques
+1 Binaries verification didn't have any issues. Sources verification worked with some local environment hiccups François On Mon, Feb 3, 2020 at 8:46 PM Andy Grove wrote: > > +1 (binding) based on running the Rust tests > > Thanks. > > On Thu, Jan 30, 2020 at 8:13 PM Krisztián Szűcs > wrote: >

[jira] [Created] (ARROW-7759) [C++][Dataset] Add CsvFileFormat for CSV support

2020-02-03 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7759: - Summary: [C++][Dataset] Add CsvFileFormat for CSV support Key: ARROW-7759 URL: https://issues.apache.org/jira/browse/ARROW-7759 Project: Apache

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-03-0

2020-02-03 Thread Francois Saint-Jacques
The debian buster failure seems to be a network issue with github upload, we'll see tomorrow. The gandiva-jar will be gone in the next nightly (https://github.com/apache/arrow/pull/6342). On Mon, Feb 3, 2020 at 8:48 AM Crossbow wrote: > > > Arrow Build Report for Job nightly-2020-02-03-0 > > All

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-02-0

2020-02-03 Thread Francois Saint-Jacques
Whelp, gmail didn't help with the thread folding. I'll just approve Krisz' patch :). On Mon, Feb 3, 2020 at 8:22 AM Francois Saint-Jacques wrote: > > Opened https://github.com/apache/arrow/pull/6342 to silence the OSX jar issue. > > On Sun, Feb 2, 2020

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-02-0

2020-02-03 Thread Francois Saint-Jacques
Opened https://github.com/apache/arrow/pull/6342 to silence the OSX jar issue. On Sun, Feb 2, 2020 at 8:31 AM Crossbow wrote: > > > Arrow Build Report for Job nightly-2020-02-02-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-02-0 > > Failed Tasks: > -

[jira] [Created] (ARROW-7673) [C++][Dataset] Revisit File discovery failure mode

2020-01-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7673: - Summary: [C++][Dataset] Revisit File discovery failure mode Key: ARROW-7673 URL: https://issues.apache.org/jira/browse/ARROW-7673 Project: Apache

Re: [Format] Array/RowBatch filters

2020-01-24 Thread Francois Saint-Jacques
By filter, you mean a filter expression, or a selection vector/bitmap? On Thu, Jan 23, 2020 at 11:38 PM Micah Kornfield wrote: > > One of the things that I think got overlooked in the conversation on having > a slice offset in the C API was a suggestion from Jacques of perhaps > generalizing the

Re: [DISCUSS] Format additions for encoding/compression

2020-01-23 Thread Francois Saint-Jacques
What's the point of having zero copy if the OS is doing the decompression in kernel (which trumps the zero-copy argument)? You might as well just use parquet without filesystem compression. I prefer to have compression algorithm where the columnar engine can benefit from it [1] than marginally impr

[jira] [Created] (ARROW-7653) [C++][Dataset] Handle DictType index mismatch better

2020-01-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7653: - Summary: [C++][Dataset] Handle DictType index mismatch better Key: ARROW-7653 URL: https://issues.apache.org/jira/browse/ARROW-7653 Project: Apache

[jira] [Created] (ARROW-7602) [Archery] Add more build options

2020-01-17 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7602: - Summary: [Archery] Add more build options Key: ARROW-7602 URL: https://issues.apache.org/jira/browse/ARROW-7602 Project: Apache Arrow

Re: Human-readable version of Arrow Schema?

2020-01-09 Thread Francois Saint-Jacques
The desired goal for this feature is trivial modifications, e.g. within an editor, by data-scientists and researchers. I'd go for the flatbuffer's json representation as it is stable and has native support in almost any language or editor due to the ubiquity of JSON. The C interface schema string

[jira] [Created] (ARROW-7523) [Tools] Ignore modernize-use-trailing-return-type clang-tidy check

2020-01-08 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7523: - Summary: [Tools] Ignore modernize-use-trailing-return-type clang-tidy check Key: ARROW-7523 URL: https://issues.apache.org/jira/browse/ARROW-7523

[jira] [Created] (ARROW-7498) [C++][Dataset] Rename DataFragment/DataSource/PartitionScheme

2020-01-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7498: - Summary: [C++][Dataset] Rename DataFragment/DataSource/PartitionScheme Key: ARROW-7498 URL: https://issues.apache.org/jira/browse/ARROW-7498

Re: [DISCUSS][C++] Pointer name aliasing

2019-12-19 Thread Francois Saint-Jacques
nk we can probably take an incremental approach of: > 1. Eliminate *Ptr in src/arrow code (discuss similar changes in > parquet/gandiva). > 2. Decide on the Iterator/Vector. > > On Fri, Nov 22, 2019 at 10:47 AM Wes McKinney wrote: > > > hi Francois > > > > On Fri, No

[jira] [Created] (ARROW-7441) [C++] Remove compute pointer aliases

2019-12-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7441: - Summary: [C++] Remove compute pointer aliases Key: ARROW-7441 URL: https://issues.apache.org/jira/browse/ARROW-7441 Project: Apache Arrow

[jira] [Created] (ARROW-7439) [C++][Dataset] Remove dataset pointer aliases

2019-12-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7439: - Summary: [C++][Dataset] Remove dataset pointer aliases Key: ARROW-7439 URL: https://issues.apache.org/jira/browse/ARROW-7439 Project: Apache Arrow

[jira] [Created] (ARROW-7440) [C++][Gandiva] Remove gandiva pointer aliases

2019-12-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7440: - Summary: [C++][Gandiva] Remove gandiva pointer aliases Key: ARROW-7440 URL: https://issues.apache.org/jira/browse/ARROW-7440 Project: Apache Arrow

[jira] [Created] (ARROW-7438) [C++] Remove pointer aliases

2019-12-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7438: - Summary: [C++] Remove pointer aliases Key: ARROW-7438 URL: https://issues.apache.org/jira/browse/ARROW-7438 Project: Apache Arrow Issue

[jira] [Created] (ARROW-7436) [Archery] Fix benchmark default configuration

2019-12-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7436: - Summary: [Archery] Fix benchmark default configuration Key: ARROW-7436 URL: https://issues.apache.org/jira/browse/ARROW-7436 Project: Apache Arrow

[jira] [Created] (ARROW-7390) [C++][Dataset] Concurrency race in Projector::Project

2019-12-13 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7390: - Summary: [C++][Dataset] Concurrency race in Projector::Project Key: ARROW-7390 URL: https://issues.apache.org/jira/browse/ARROW-7390 Project

[jira] [Created] (ARROW-7380) [C++][Dataset] Implement DatasetDiscovery

2019-12-12 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7380: - Summary: [C++][Dataset] Implement DatasetDiscovery Key: ARROW-7380 URL: https://issues.apache.org/jira/browse/ARROW-7380 Project: Apache Arrow

[jira] [Created] (ARROW-7379) [C++] Introduce Field::CompatiblesWith and Schema::CompatiblesWith

2019-12-12 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7379: - Summary: [C++] Introduce Field::CompatiblesWith and Schema::CompatiblesWith Key: ARROW-7379 URL: https://issues.apache.org/jira/browse/ARROW-7379

[jira] [Created] (ARROW-7377) [C++][Dataset] Simplify parquet column projection

2019-12-11 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7377: - Summary: [C++][Dataset] Simplify parquet column projection Key: ARROW-7377 URL: https://issues.apache.org/jira/browse/ARROW-7377 Project: Apache

Re: [Gandiva] question about IR optimization

2019-12-11 Thread Francois Saint-Jacques
Missing [1] link. [1] https://godbolt.org/z/S8tixP On Wed, Dec 11, 2019 at 12:58 PM Francois Saint-Jacques wrote: > > So, llvm _can_ auto-vectorize, I was just missing the `-mtriple` > option [1]. That still requires hoisting the buffer juggling. > > François > > On Wed,

Re: [Gandiva] question about IR optimization

2019-12-11 Thread Francois Saint-Jacques
functionality. > > On Wed, Dec 11, 2019 at 10:06 PM Francois Saint-Jacques < > fsaintjacq...@gmail.com> wrote: > > > It seems that LLVM can't auto vectorize. I don't have a debug build, > > so I can't get the `-debug-only` information from llvm-opt/opt ab

Re: Arrow sync call December 11 at 12:00 US/Eastern, 17:00 UTC

2019-12-11 Thread Francois Saint-Jacques
Attendees: - Antoine Pitrou, Ursa Labs/RStudio - Francois Saint-Jacques, Ursa Labs/RStudio - Ravindra Pindikura, Dremio - Neville Dipale - Rok Mihevc Subjects: - Arrow 1.0 release: - Neville has been working on the Rust IPC bindings (https://github.com/apache/arrow/pull/6013) - Antoine is worki

Re: [Gandiva] question about IR optimization

2019-12-11 Thread Francois Saint-Jacques
It seems that LLVM can't auto vectorize. I don't have a debug build, so I can't get the `-debug-only` information from llvm-opt/opt about why it can't vectorize. The buffer address mangling should be hoisted out of the loop (still doesn't enable auto vectorization) [1]. The buffer juggling should b

[jira] [Created] (ARROW-7360) [R] Can't use dataset's filter with non-literal expression

2019-12-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7360: - Summary: [R] Can't use dataset's filter with non-literal expression Key: ARROW-7360 URL: https://issues.apache.org/jira/browse/ARROW-7360

Re: [ANNOUNCE] New Arrow committer: Joris van den Bossche

2019-12-09 Thread Francois Saint-Jacques
Bravo! On Mon, Dec 9, 2019 at 6:55 AM Wes McKinney wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Joris has > accepted an invitation to become a committer on Apache Arrow. > > Welcome, and thank you for your contributions!

[jira] [Created] (ARROW-7339) [CMake] Thrift version not respected in CMake configuration version.txt

2019-12-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7339: - Summary: [CMake] Thrift version not respected in CMake configuration version.txt Key: ARROW-7339 URL: https://issues.apache.org/jira/browse/ARROW-7339

[jira] [Created] (ARROW-7338) [C++] Rename SimpleDataSource to InMemoryDataSource

2019-12-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7338: - Summary: [C++] Rename SimpleDataSource to InMemoryDataSource Key: ARROW-7338 URL: https://issues.apache.org/jira/browse/ARROW-7338 Project: Apache

Re: Datasets and Java

2019-11-27 Thread Francois Saint-Jacques
Hello Hongze, The C++ implementation of dataset, notably Dataset, DataSource, DataSourceDiscovery, and Scanner classes are not ready/designed for distributed computing. They don't serialize and they reference by pointer all around, thus I highly doubt that you can implement parts in Java, and some
