[jira] [Created] (ARROW-8132) [C++] arrow-s3fs-test failing on master

2020-03-16 Thread Hatem Helal (Jira)
Hatem Helal created ARROW-8132: -- Summary: [C++] arrow-s3fs-test failing on master Key: ARROW-8132 URL: https://issues.apache.org/jira/browse/ARROW-8132 Project: Apache Arrow Issue Type

[DISCUSS] Apache Arrow manylinux1 support

2019-08-16 Thread Hatem Helal
Hi all, I ran into a surprising (to me) limitation when working on an issue [1]. To summarize, supporting the manylinux1 standard ties Arrow development to gcc 4.8.x which is technically not C++11 complete. This brought on a few questions for me: * What are the pre-conditions for dropping many

Re: [C++][Python] Direct Arrow DictionaryArray reads from Parquet files

2019-08-05 Thread Hatem Helal
Thanks for sharing this very illustrative benchmark. Really nice to see the huge benefit for languages that have a type for modelling categorical data. I'm interested in whether we can make the parquet/arrow integration automatically handle the round-trip for Arrow DictionaryArrays. We've had

[jira] [Created] (ARROW-6096) [C++] Remove dependency on boost regex library

2019-08-01 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-6096: -- Summary: [C++] Remove dependency on boost regex library Key: ARROW-6096 URL: https://issues.apache.org/jira/browse/ARROW-6096 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-6061) [C++] Cannot build libarrow without rapidjson

2019-07-29 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-6061: -- Summary: [C++] Cannot build libarrow without rapidjson Key: ARROW-6061 URL: https://issues.apache.org/jira/browse/ARROW-6061 Project: Apache Arrow Issue Type

Re: [DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet forward compatibility problems

2019-07-12 Thread Hatem Helal
Thanks François, I closed PARQUET-1623 this morning. It would be nice to include the PR in the patch release: https://github.com/apache/arrow/pull/4857 This bug has been around for a few releases but I think it should be a low risk change to include. Hatem On 7/12/19, 2:27 AM, "Francois Sa

[jira] [Created] (ARROW-5676) [CI] hadolint failing on r/Dockerfile causing Travis "Lint, Release tests" failure

2019-06-21 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-5676: -- Summary: [CI] hadolint failing on r/Dockerfile causing Travis "Lint, Release tests" failure Key: ARROW-5676 URL: https://issues.apache.org/jira/browse/ARROW-5676

[jira] [Created] (ARROW-5675) [Doc] Fix typo in documentation describing compile/debug workflow on macOS with Xcode IDE

2019-06-21 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-5675: -- Summary: [Doc] Fix typo in documentation describing compile/debug workflow on macOS with Xcode IDE Key: ARROW-5675 URL: https://issues.apache.org/jira/browse/ARROW-5675

[jira] [Created] (ARROW-5638) [C++] cmake fails to generate Xcode project when Gandiva JNI bindings are enabled

2019-06-18 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-5638: -- Summary: [C++] cmake fails to generate Xcode project when Gandiva JNI bindings are enabled Key: ARROW-5638 URL: https://issues.apache.org/jira/browse/ARROW-5638 Project

[jira] [Created] (ARROW-5632) [Doc] Add some documentation describing compile/debug workflow on macOS with Xcode IDE

2019-06-17 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-5632: -- Summary: [Doc] Add some documentation describing compile/debug workflow on macOS with Xcode IDE Key: ARROW-5632 URL: https://issues.apache.org/jira/browse/ARROW-5632

[jira] [Created] (ARROW-5608) [C++][parquet] Invalid memory access when using parquet::arrow::ColumnReader

2019-06-14 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-5608: -- Summary: [C++][parquet] Invalid memory access when using parquet::arrow::ColumnReader Key: ARROW-5608 URL: https://issues.apache.org/jira/browse/ARROW-5608 Project

Re: [DISCUSS][C++] Static versus variable Arrow dictionary encoding

2019-05-01 Thread Hatem Helal
Thanks Wes, your proposed additional data type makes more sense to me. > As a first use case for this I would be personally looking to address > reads of encoded data from > Parquet format without an intermediate pass through dense format > (which can be slow and wasteful for heavily

Re: [DISCUSS][C++] Static versus variable Arrow dictionary encoding

2019-04-30 Thread Hatem Helal
Hi Wes, Thanks for the detailed writeup and I think this is an important problem to solve. I spent some time thinking about this when working on ARROW-3769 and came to a similar conclusion that the current dictionary type was limiting when doing partial reads of parquet files. I'm not sure if th

[jira] [Created] (ARROW-5157) [Website] Add MATLAB to powered by Apache Arrow page

2019-04-10 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-5157: -- Summary: [Website] Add MATLAB to powered by Apache Arrow page Key: ARROW-5157 URL: https://issues.apache.org/jira/browse/ARROW-5157 Project: Apache Arrow Issue

Re: MATLAB, Arrow, ABI's and Linux

2019-03-13 Thread Hatem Helal
Hi Joris, Nice to hear from you! I'd like to understand the need to use LD_PRELOAD for picking up a different libstdc++. In case you haven't found this already, MATLAB even vendors a libstdc++ as per: (pwd = matlabroot) % more sys/os/glnxa64/README.libstdc++ The GCC runtime libraries included h

[jira] [Created] (ARROW-4785) [CI] Make Travis CI resilient against GPG errors

2019-03-06 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-4785: -- Summary: [CI] Make Travis CI resilient against GPG errors Key: ARROW-4785 URL: https://issues.apache.org/jira/browse/ARROW-4785 Project: Apache Arrow Issue Type

Re: Parquet Shared Library Versioning

2019-03-04 Thread Hatem Helal
s a C++ release. - Wes On Mon, Feb 25, 2019 at 1:35 PM Hatem Helal wrote: > > Hi all, > > I’d like to discuss the versioning of the parquet shared libs that are built when you use -DARROW_PARQUET=ON. My observation is that back when parquet-cpp was a

Parquet Shared Library Versioning

2019-02-25 Thread Hatem Helal
Hi all, I’d like to discuss the versioning of the parquet shared libs that are built when you use -DARROW_PARQUET=ON. My observation is that back when parquet-cpp was a separate project the shared libs were versioned using the parquet-cpp version number (e.g. 1.4.0). Since moving to a single r

Re: [VOTE] Release Apache Arrow 0.12.1 RC0

2019-02-25 Thread Hatem Helal
+1 (non-binding) Built on macOS 10.13 and ran unittests. On 2/24/19, 1:43 PM, "Wes McKinney" wrote: +1 (binding) Verified release candidate with Windows 10 MSVC 2015 On Fri, Feb 22, 2019 at 4:14 PM Kouhei Sutou wrote: > > +1 (binding) > > I ran the follo

[jira] [Created] (ARROW-4661) [C++] Consolidate random string generators for use in benchmarks and unittests

2019-02-22 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-4661: -- Summary: [C++] Consolidate random string generators for use in benchmarks and unittests Key: ARROW-4661 URL: https://issues.apache.org/jira/browse/ARROW-4661 Project

Re: Flight / gRPC scalability issue

2019-02-21 Thread Hatem Helal
not how to /visualize/ where time is spent waiting, but how to /measure/ it. Normal profiling only tells you where CPU time is spent, not what the process is idly waiting for. Regards Antoine. On Thu, 21 Feb 2019 16:29:15 + Hatem Helal wrote:

Re: Flight / gRPC scalability issue

2019-02-21 Thread Hatem Helal
I like flamegraphs for investigating this sort of problem: https://github.com/brendangregg/FlameGraph There are likely many other techniques for inspecting where time is being spent but that can at least help narrow down the search space. On 2/21/19, 4:03 PM, "Francois Saint-Jacques" wrote:

Re: Flight / gRPC scalability issue

2019-02-21 Thread Hatem Helal
I like flamegraphs for investigating this sort of problem: https://github.com/brendangregg/FlameGraph There are likely many other techniques for inspecting where time is being spent but that can at least help narrow down the search space. On 2/21/19, 4:29 PM, "Wes McKinney" wrote: Hi Fr

Re: Enabling MATLAB testing in Travis CI

2019-02-15 Thread Hatem Helal
rsa Labs) is setting up some physical build infrastructure right now and we would be happy to look at setting up MATLAB testing jobs there and make them available to the community. Thanks Wes On Fri, Feb 15, 2019 at 11:11 AM Hatem Helal wrote: > > H

Enabling MATLAB testing in Travis CI

2019-02-15 Thread Hatem Helal
Hi everyone, As was mentioned in the most recent sync call, MathWorks has an ongoing pilot for running MATLAB jobs in Travis CI. I'd like to get a discussion going on having Arrow participate in the pilot so that we can qualify the MATLAB code already in the Arrow project. The integration is

Re: UTF-8 and Binary logical types

2019-02-06 Thread Hatem Helal
#L52 AFAIK we do respect that if it is set, otherwise we do not guess https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/schema.cc#L65 - Wes On Wed, Feb 6, 2019 at 7:07 AM Hatem Helal wrote: > > Thanks Antoine, that makes good sense.

Re: UTF-8 and Binary logical types

2019-02-06 Thread Hatem Helal
you store ASCII text data, I would recommend using the utf8 type, since the UTF-8 encoding is a superset of ASCII. Regards Antoine. Le 06/02/2019 à 11:34, Hatem Helal a écrit : > Hi all, > > I wanted to make sure I understood the distinction/u
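The point in the reply above, that UTF-8 is a superset of ASCII, can be checked with a minimal sketch. It uses only the Python standard library and does not call any Arrow API; the helper name is_valid_utf8 and the sample values are hypothetical and chosen only for illustration.

```python
# Illustration only: plain Python, not Arrow's utf8/binary types.
# is_valid_utf8 and the sample values below are hypothetical names.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

ascii_text = "Apache Arrow"

# Pure ASCII text encodes to the same bytes under ASCII and UTF-8,
# so ASCII data can be stored in a UTF-8 typed column unchanged.
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Arbitrary bytes are not necessarily valid UTF-8 (0xFF never appears
# in well-formed UTF-8), which is the case a binary type is meant for.
arbitrary = bytes([0xFF, 0xFE, 0x00])

print(same_bytes, is_valid_utf8(ascii_bytes), is_valid_utf8(arbitrary))
```

Under these assumptions, text known to be ASCII fits the utf8 type, while byte strings with no guaranteed encoding fit binary.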

UTF-8 and Binary logical types

2019-02-06 Thread Hatem Helal
Hi all, I wanted to make sure I understood the distinction/use cases for choosing between the utf8 and binary logical types. Based on this doc * Utf8 data is Unicode values with UTF-8 encoding * Binary is any other variable l

Re: Round-trip of categorical data with Arrow and Parquet

2019-01-24 Thread Hatem Helal
rquet-cpp for Arrow users. - Wes On Thu, Jan 24, 2019 at 9:59 AM Hatem Helal wrote: > > Hi everyone, > > I wanted to gauge interest and feasibility for adding support for natively reading an arrow::DictionaryArray from a parquet file. Currently, writing an a

Round-trip of categorical data with Arrow and Parquet

2019-01-24 Thread Hatem Helal
Hi everyone, I wanted to gauge interest and feasibility for adding support for natively reading an arrow::DictionaryArray from a parquet file. Currently, writing an arrow::DictionaryArray is read back as the native index type [0]. I came across a prior discussion for this problem in the conte

[jira] [Created] (ARROW-4260) [Python] test_serialize_deserialize_pandas is failing on OSX with Xcode 6.4

2019-01-14 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-4260: -- Summary: [Python] test_serialize_deserialize_pandas is failing on OSX with Xcode 6.4 Key: ARROW-4260 URL: https://issues.apache.org/jira/browse/ARROW-4260 Project

Re: Reporting Travis CI Failures

2019-01-14 Thread Hatem Helal
c86f95107849cbcc, for example, which was caused by an upstream package (pandas) having a release. - Wes On Mon, Jan 14, 2019 at 10:27 AM Hatem Helal wrote: > > Hi everyone, > > I’ve had some trouble getting a clean CI build for a recent pu

Reporting Travis CI Failures

2019-01-14 Thread Hatem Helal
Hi everyone, I’ve had some trouble getting a clean CI build for a recent pull request and wanted to understand how this project manages CI failures. I don’t have much experience with Travis CI to draw on but my own approach so far has been to try and correlate failures on my own PR against oth

Re: Building arrow using Xcode on Mac OS

2019-01-04 Thread Hatem Helal
lso run into this issue. Uwe On Fri, Jan 4, 2019, at 12:49 PM, Hatem Helal wrote: > Hi all, > > I wonder if anyone on this list has tried building arrow using Xcode on > Mac OS? I've used "cmake -G Xcode" to generate a project but ca

[jira] [Created] (ARROW-4156) [C++] xcodebuild failure for cmake generated project

2019-01-04 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-4156: -- Summary: [C++] xcodebuild failure for cmake generated project Key: ARROW-4156 URL: https://issues.apache.org/jira/browse/ARROW-4156 Project: Apache Arrow Issue

Building arrow using Xcode on Mac OS

2019-01-04 Thread Hatem Helal
Hi all, I wonder if anyone on this list has tried building arrow using Xcode on Mac OS? I've used "cmake -G Xcode" to generate a project but calling xcodebuild fails. I've copied the syndrome below in case anyone has seen this before. Another observation is that the dylibs aren't generated

[jira] [Created] (ARROW-3564) pyarrow: writing version 2.0 parquet format with dictionary encoding enabled

2018-10-19 Thread Hatem Helal (JIRA)
Hatem Helal created ARROW-3564: -- Summary: pyarrow: writing version 2.0 parquet format with dictionary encoding enabled Key: ARROW-3564 URL: https://issues.apache.org/jira/browse/ARROW-3564 Project