Re: [DISCUSS] Passing the torch on Python wheel (binary) maintenance

2019-07-15 Thread Wes McKinney
On Mon, Jul 15, 2019 at 10:52 PM Suvayu Ali wrote: > > Hi Wes, others, > > A few thoughts from a user. Firstly, I completely understand your > frustration. I myself have delved into a bit of packaging for many > scientific computing packages, like ROOT from CERN, although not at the scale > o

Re: [DISCUSS] Passing the torch on Python wheel (binary) maintenance

2019-07-15 Thread Suvayu Ali
Hi Wes, others, A few thoughts from a user. Firstly, I completely understand your frustration. I myself have delved into a bit of packaging for many scientific computing packages, like ROOT from CERN, although not at the scale of users that you face here. AIU, wheels are a Python-first spec,

[DISCUSS] Passing the torch on Python wheel (binary) maintenance

2019-07-15 Thread Wes McKinney
hi folks, TL;DR I can't afford for me or my colleagues to continue spending time maintaining the Python binary wheel builds. They have sucked a completely unreasonable amount of time the last few months for reasons that are difficult to completely articulate in an e-mail, so I'm going to lay out t

[jira] [Created] (ARROW-5956) Undefined symbol GetFieldByName

2019-07-15 Thread Jeffrey Wong (JIRA)
Jeffrey Wong created ARROW-5956: Summary: Undefined symbol GetFieldByName; Key: ARROW-5956; URL: https://issues.apache.org/jira/browse/ARROW-5956; Project: Apache Arrow; Issue Type: Bug

[jira] [Created] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation

2019-07-15 Thread Eric Liang (JIRA)
Eric Liang created ARROW-5955: Summary: [Plasma] Support setting memory quotas per plasma client for better isolation; Key: ARROW-5955; URL: https://issues.apache.org/jira/browse/ARROW-5955; Project: Apache

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-15 Thread Wes McKinney
On Mon, Jul 15, 2019 at 12:01 PM Antoine Pitrou wrote: > > On Mon, 15 Jul 2019 11:49:56 -0500 > Wes McKinney wrote: > > > > For example, suppose we had a thread pool with a limit of 8 concurrent > > tasks. Now 4 of them perform IO calls. Hypothetically this should > > happen: > > > > * Thread poo

[jira] [Created] (ARROW-5954) Organize source and binary dependency licenses into directories

2019-07-15 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5954: Summary: Organize source and binary dependency licenses into directories; Key: ARROW-5954; URL: https://issues.apache.org/jira/browse/ARROW-5954; Project: Apache Arr

[jira] [Created] (ARROW-5953) Thrift download ERRORS with apache-arrow-0.14.0

2019-07-15 Thread Brian (JIRA)
Brian created ARROW-5953: Summary: Thrift download ERRORS with apache-arrow-0.14.0; Key: ARROW-5953; URL: https://issues.apache.org/jira/browse/ARROW-5953; Project: Apache Arrow; Issue Type: Bug

Re: Workaround for Thrift download ERRORs

2019-07-15 Thread Brian Bowman
Wes, Here are the cmake thrift log lines from a build of apache-arrow git clone on 06Jul2019 where cmake successfully downloads thrift. -- Checking for module 'thrift' -- No package 'thrift' found -- Could NOT find Thrift (missing: THRIFT_STATIC_LIB) Building Apache Thrift from source Downloa

[jira] [Created] (ARROW-5952) [Python] Segfault when reading empty table with category as pandas dataframe

2019-07-15 Thread Daniel Nugent (JIRA)
Daniel Nugent created ARROW-5952: Summary: [Python] Segfault when reading empty table with category as pandas dataframe; Key: ARROW-5952; URL: https://issues.apache.org/jira/browse/ARROW-5952; Project: A

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-15 Thread Antoine Pitrou
On Mon, 15 Jul 2019 11:49:56 -0500 Wes McKinney wrote: > > For example, suppose we had a thread pool with a limit of 8 concurrent > tasks. Now 4 of them perform IO calls. Hypothetically this should > happen: > > * Thread pool increments a "soft limit" to allow 4 more tasks to > spawn, so at this
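The soft-limit scheme in Wes's example quoted above (a pool limited to 8 concurrent tasks, 4 of which block on IO, so the pool raises a "soft limit" to admit 4 more) can be sketched as a small admission counter. This is a minimal Python illustration under our own assumptions: the class and method names (`SoftLimitPool`, `try_start`, `enter_io`, `exit_io`) are hypothetical and are not part of Arrow's C++ thread pool.

```python
import threading


class SoftLimitPool:
    """Hypothetical admission counter whose limit grows while tasks wait on IO."""

    def __init__(self, cpu_limit: int) -> None:
        self.cpu_limit = cpu_limit  # base limit on concurrent tasks, e.g. 8
        self.running = 0            # tasks currently admitted
        self.in_io = 0              # admitted tasks currently blocked on IO
        self._lock = threading.Lock()

    @property
    def soft_limit(self) -> int:
        # Each task blocked on IO frees a CPU slot, so raise the limit by one.
        return self.cpu_limit + self.in_io

    def try_start(self) -> bool:
        """Admit a new task if we are under the current soft limit."""
        with self._lock:
            if self.running < self.soft_limit:
                self.running += 1
                return True
            return False

    def finish(self) -> None:
        with self._lock:
            self.running -= 1

    def enter_io(self) -> None:
        with self._lock:
            self.in_io += 1

    def exit_io(self) -> None:
        with self._lock:
            self.in_io -= 1
```

One consequence worth noting: once the IO calls return, `soft_limit` drops back to 8 while 12 tasks are still running, so the pool is briefly oversubscribed and admits nothing new until enough tasks finish.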

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-15 Thread Wes McKinney
On Mon, Jul 15, 2019 at 11:38 AM Antoine Pitrou wrote: > > > Hi Anton, > > Le 12/07/2019 à 23:21, Malakhov, Anton a écrit : > > > > The result is that all these execution nodes scale well enough and run > > under 100 milliseconds on my 2 x Xeon E5-2650 v4 @ 2.20GHz, 128 GB RAM while > > CSV reade

Re: Workaround for Thrift download ERRORs

2019-07-15 Thread Wes McKinney
hi Brian, Can you please open a JIRA issue? Does running the "get_apache_mirror.py" script work for you by itself? $ python cpp/build-support/get_apache_mirror.py https://www-eu.apache.org/dist/ - Wes On Mon, Jul 15, 2019 at 10:54 AM Brian Bowman wrote: > > Is there a workaround for the follo
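For anyone hitting the same error, the mirror lookup Wes asks about can be approximated in a few lines to see which URL the build would try. This sketch assumes the script queries `https://www.apache.org/dyn/closer.cgi?as_json=1` and reads a `"preferred"` key from the JSON reply; the helper names and the fallback to `https://archive.apache.org/dist/` are our own illustration, not the script's actual behavior. It works on the JSON text directly so it runs offline.

```python
import json
from typing import List

# Apache's archive keeps past releases; used here as an assumed fallback.
ARCHIVE_BASE = "https://archive.apache.org/dist/"


def preferred_mirror(closer_json: str) -> str:
    """Extract the preferred mirror from a closer.cgi JSON reply (assumed shape)."""
    base = json.loads(closer_json)["preferred"]
    # Normalize so that paths can be appended directly.
    return base if base.endswith("/") else base + "/"


def thrift_tarball_url(mirror: str, version: str = "0.12.0") -> str:
    # Same path layout as the failing download in the error message.
    return f"{mirror}thrift/{version}/thrift-{version}.tar.gz"


def candidate_urls(closer_json: str, version: str = "0.12.0") -> List[str]:
    """Preferred mirror first, then the archive as a fallback."""
    mirrors = [preferred_mirror(closer_json), ARCHIVE_BASE]
    return [thrift_tarball_url(m, version) for m in mirrors]


if __name__ == "__main__":
    reply = '{"preferred": "https://www-eu.apache.org/dist/"}'
    for url in candidate_urls(reply):
        print(url)
```

Fetching the archive URL by hand is one way to check whether the failure is specific to the mirror that was chosen.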

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-07-15 Thread Antoine Pitrou
Hi Anton, Le 12/07/2019 à 23:21, Malakhov, Anton a écrit : > > The result is that all these execution nodes scale well enough and run under > 100 milliseconds on my 2 x Xeon E5-2650 v4 @ 2.20GHz, 128 GB RAM while CSV > reader takes several seconds to complete even reading from in-memory file

[jira] [Created] (ARROW-5951) [Python][Wheel] Request UCS4 wheels in the packaging tasks

2019-07-15 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5951: Summary: [Python][Wheel] Request UCS4 wheels in the packaging tasks; Key: ARROW-5951; URL: https://issues.apache.org/jira/browse/ARROW-5951; Project: Apache Arrow

Workaround for Thrift download ERRORs

2019-07-15 Thread Brian Bowman
Is there a workaround for the following error? requests.exceptions.SSLError: hostname 'www.apache.org' doesn't match either of '*.openoffice.org', 'openoffice.org'/thrift/0.12.0/thrift-0.12.0.tar.gz I’ve inflated apache-arrow-0.14.0.tar and the thrift-0.12.0.tar.gz is not being found during cma

Re: [DISCUSS][FORMAT] Data Integrity

2019-07-15 Thread Antoine Pitrou
Le 15/07/2019 à 16:15, Wes McKinney a écrit : > If we adopt the position (as we already are in practice, I think) that > the encapsulated IPC message format is the main way that we expose > data from one process to another, then having digests at the message > level seems like the simplest and mo

Re: [Discuss][FlightRPC] Extensions to Flight: middleware and DoPut tickets

2019-07-15 Thread Wes McKinney
We could start a contrib/ folder in the root of the project, for lack of an obvious place to put code that isn't part of any core library. On Mon, Jul 15, 2019 at 9:19 AM David Li wrote: > > Hi all, > > Thanks for the comments so far. Does anyone else have thoughts on > either proposal? For middle

Re: [Discuss][FlightRPC] Extensions to Flight: middleware and DoPut tickets

2019-07-15 Thread David Li
Hi all, Thanks for the comments so far. Does anyone else have thoughts on either proposal? For middleware, I can put up an implementation draft if that's more useful. I could also include an example OpenTracing integration, though I'm not sure how to structure "contrib" modules (presumably we don'
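David's middleware proposal amounts to a set of hooks that run around each RPC. A minimal Python sketch of that general pattern follows; `Middleware`, `TracingMiddleware`, `call_with_middleware`, and the `x-trace-id` header are hypothetical names for illustration and are not the Flight API under discussion. The tracing class only mimics what an OpenTracing integration might do: tag each call with an id and record its outcome.

```python
import uuid
from typing import Callable, Dict, List, Optional

Headers = Dict[str, str]


class Middleware:
    """Hypothetical hook interface invoked around every call."""

    def sending_headers(self) -> Headers:
        return {}

    def call_completed(self, error: Optional[Exception]) -> None:
        pass


class TracingMiddleware(Middleware):
    """Attaches a trace id to outgoing headers and logs the call outcome."""

    def __init__(self, log: List[str]) -> None:
        self.trace_id = uuid.uuid4().hex
        self.log = log

    def sending_headers(self) -> Headers:
        return {"x-trace-id": self.trace_id}

    def call_completed(self, error: Optional[Exception]) -> None:
        self.log.append(f"{self.trace_id} {'error' if error else 'ok'}")


def call_with_middleware(rpc: Callable[[Headers], str],
                         middleware: List[Middleware]) -> str:
    # Collect headers from every middleware before issuing the call.
    headers: Headers = {}
    for m in middleware:
        headers.update(m.sending_headers())
    try:
        result = rpc(headers)
    except Exception as exc:
        # Notify every middleware of the failure, then re-raise.
        for m in middleware:
            m.call_completed(exc)
        raise
    for m in middleware:
        m.call_completed(None)
    return result
```

Keeping the hooks independent of the transport is what would let an integration like OpenTracing live in a separate contrib module rather than in the core library.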

Re: [DISCUSS][FORMAT] Data Integrity

2019-07-15 Thread Wes McKinney
If we adopt the position (as we already are in practice, I think) that the encapsulated IPC message format is the main way that we expose data from one process to another, then having digests at the message level seems like the simplest and most useful thing. FWIW, the Parquet format technically p
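Digests at the message level, as Wes suggests, amount to hashing each encapsulated message body and checking the hash on receipt. The sketch below treats the message as opaque bytes; the framing (an 8-byte little-endian length, the body, then a 32-byte SHA-256 digest) and the `seal`/`unseal` names are assumptions chosen for illustration and are not the Arrow IPC format.

```python
import hashlib
import struct

DIGEST_LEN = 32  # SHA-256 output size in bytes


def seal(message: bytes) -> bytes:
    """Hypothetical framing: 8-byte LE length, body, SHA-256 of the body."""
    header = struct.pack("<Q", len(message))
    return header + message + hashlib.sha256(message).digest()


def unseal(frame: bytes) -> bytes:
    """Return the body, or raise ValueError if it fails the digest check."""
    (length,) = struct.unpack_from("<Q", frame, 0)
    body = frame[8:8 + length]
    digest = frame[8 + length:8 + length + DIGEST_LEN]
    # A truncated frame yields a short body or digest and fails the check too.
    if len(body) != length or hashlib.sha256(body).digest() != digest:
        raise ValueError("message digest mismatch")
    return body
```

Because each digest covers exactly one message, a receiver can reject a single corrupted message without re-reading the rest of the stream.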

[jira] [Created] (ARROW-5950) [Rust] [DataFusion] Add logger dependency

2019-07-15 Thread Andy Grove (JIRA)
Andy Grove created ARROW-5950: Summary: [Rust] [DataFusion] Add logger dependency; Key: ARROW-5950; URL: https://issues.apache.org/jira/browse/ARROW-5950; Project: Apache Arrow; Issue Type: Improvem