[pyarrow] Parquet page header size limit

2019-04-16 Thread shyam narayan singh
Hi, while reading a custom Parquet file that has extra information embedded (some custom stats), pyarrow fails to read it. Traceback (most recent call last): File "/tmp/pytest.py", line 19, in table = dataset.read() File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py"

Re: [VOTE] Proposal accepted: change to Arrow Flight protocol: endpoint URIs

2019-04-16 Thread David Li
Thanks all for the comments! I am on vacation, and will refresh the draft PR as soon as I return. Best, David
On Wed, Apr 17, 2019, 00:49 Antoine Pitrou wrote:
> Hello,
> This vote closes with 4 binding approvals (+1) and zero disapprovals.
> There were also several non-binding approvals.

[jira] [Created] (ARROW-5176) [Python] Automate formatting of python files

2019-04-16 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-5176: Summary: [Python] Automate formatting of python files Key: ARROW-5176 URL: https://issues.apache.org/jira/browse/ARROW-5176 Project: Apache Arrow Iss

Re: [Discuss] Benchmarking infrastructure

2019-04-16 Thread Francois Saint-Jacques
Hello, A small status update: I recently implemented archery [1], a tool for comparing Arrow benchmarks [2]. The documentation ([3] and [4]) is in the pull request. The primary goal is to compare two commits (and/or build directories) for performance regressions. For now, it supports only C++ benchmarks.
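To illustrate the kind of comparison described above, here is a minimal sketch of flagging regressions between two benchmark runs. It is not archery's implementation or interface: the function name `find_regressions`, the input shape (benchmark name mapped to a time, lower is better), and the 5% threshold are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def find_regressions(
    baseline: Dict[str, float],
    contender: Dict[str, float],
    threshold: float = 0.05,
) -> List[Tuple[str, float]]:
    """Return (name, relative_change) for benchmarks that got slower.

    Both inputs map a benchmark name to a measured time (lower is better).
    Hypothetical helper: only benchmarks present in both runs are compared,
    and a benchmark counts as a regression when its time grew by more than
    `threshold` relative to the baseline.
    """
    regressions = []
    for name in sorted(baseline.keys() & contender.keys()):
        old, new = baseline[name], contender[name]
        if old <= 0:
            # A non-positive baseline time cannot be compared meaningfully.
            continue
        change = (new - old) / old
        if change > threshold:
            regressions.append((name, change))
    return regressions


# Made-up timings (in nanoseconds) for two commits.
baseline_run = {"BuildArray": 100.0, "ReadColumn": 200.0, "Removed": 50.0}
contender_run = {"BuildArray": 120.0, "ReadColumn": 202.0, "Added": 10.0}

for name, change in find_regressions(baseline_run, contender_run):
    print(f"{name}: {change:+.1%}")
```

Benchmarks that exist in only one of the two runs are ignored here. A real tool would probably report them separately.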

[jira] [Created] (ARROW-5175) [Benchmarking] Decide which benchmarks are part of regression checks

2019-04-16 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5175: Summary: [Benchmarking] Decide which benchmarks are part of regression checks Key: ARROW-5175 URL: https://issues.apache.org/jira/browse/ARROW-5175

Re: [VOTE] Proposal accepted: change to Arrow Flight protocol: endpoint URIs

2019-04-16 Thread Antoine Pitrou
Hello, This vote closes with 4 binding approvals (+1) and zero disapprovals. There were also several non-binding approvals. The proposal is therefore accepted. Congrats to David Li and everyone who participated in the discussion. Now the corresponding PR should be refreshed and reviewed: http

[jira] [Created] (ARROW-5174) [Go] implement Stringer for DataTypes

2019-04-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5174: Summary: [Go] implement Stringer for DataTypes Key: ARROW-5174 URL: https://issues.apache.org/jira/browse/ARROW-5174 Project: Apache Arrow Issue Type: Bu

[jira] [Created] (ARROW-5173) [Go] handle multiple concatenated streams back-to-back

2019-04-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5173: Summary: [Go] handle multiple concatenated streams back-to-back Key: ARROW-5173 URL: https://issues.apache.org/jira/browse/ARROW-5173 Project: Apache Arrow

[jira] [Created] (ARROW-5172) [Go] implement reading fixed-size binary arrays from Arrow file

2019-04-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5172: Summary: [Go] implement reading fixed-size binary arrays from Arrow file Key: ARROW-5172 URL: https://issues.apache.org/jira/browse/ARROW-5172 Project: Apache Arr

Re: What's the proper procedure to publish a docker image to dockerhub?

2019-04-16 Thread Alberto Ramón
In this scenario, the easiest approach is:
1. Create a repository on Docker Hub (ask Wes about the name; mine is albertozgz).
2. Create a local image with the name of your repository.
3. Upload the image to Docker Hub (this step requires a login and password): docker