Open PR Count Creeping Up

2020-02-17 Thread Micah Kornfield
Hi everyone, Just thought I'd send a ping out to remind people to try to review/merge/close out PRs. We are currently at 87 open PRs. Thanks, Micah

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-17 Thread Micah Kornfield
I reviewed the spec again (not the implementation). I'm +1 on this. I was wondering if we shared/received feedback on this with any other communities? Thanks, Micah On Sun, Feb 16, 2020 at 8:13 PM Micah Kornfield wrote: > I will try to review tomorrow and cast a vote. > > On Fri, Feb 14, 20

RE: Arrow Datasets Functionality for Python

2020-02-17 Thread Matthew Turner
Hi Francois, Thanks for the response - the explanation definitely helped and I will review the provided documents. Hi Wes, I am interested in helping but I have two constraints: - With my current schedule I won't have free time for another 2-3 months - My skillset is more on the

[jira] [Created] (ARROW-7871) [Python] Expose more compute kernels

2020-02-17 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7871: -- Summary: [Python] Expose more compute kernels Key: ARROW-7871 URL: https://issues.apache.org/jira/browse/ARROW-7871 Project: Apache Arrow Issue Type: Imp

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-16-0

2020-02-17 Thread Wes McKinney
On Mon, Feb 17, 2020 at 10:19 AM Neal Richardson wrote: > > Ok, I made https://issues.apache.org/jira/browse/ARROW-7870 to look into > this further. > > If the flaky nightly builds persist, maybe we should suspend the uploading > to GitHub releases until we sort it out? The noise will block out re

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-16-0

2020-02-17 Thread Neal Richardson
Ok, I made https://issues.apache.org/jira/browse/ARROW-7870 to look into this further. If the flaky nightly builds persist, maybe we should suspend the uploading to GitHub releases until we sort it out? The noise will block out real failures in the nightlies that we need to address. Neal On Mon,

[jira] [Created] (ARROW-7870) [CI][Packaging] Host nightly wheels on Apache bintray

2020-02-17 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7870: -- Summary: [CI][Packaging] Host nightly wheels on Apache bintray Key: ARROW-7870 URL: https://issues.apache.org/jira/browse/ARROW-7870 Project: Apache Arrow

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-16-0

2020-02-17 Thread Wes McKinney
On Mon, Feb 17, 2020 at 9:57 AM Neal Richardson wrote: > > Maybe the new nightly wheel GitHub Page is triggering rate limits, as > Krisztián suggested it might: https://github.com/apache/arrow/pull/6366. > The increase in timeouts seems correlated with when that PR merged. > > I wonder if there's

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-16-0

2020-02-17 Thread Neal Richardson
Maybe the new nightly wheel GitHub Page is triggering rate limits, as Krisztián suggested it might: https://github.com/apache/arrow/pull/6366. The increase in timeouts seems correlated with when that PR merged. I wonder if there's a better solution to making nightly wheels available. GitHub releas

[jira] [Created] (ARROW-7869) [Python] Boost::system and boost::filesystem not necessary anymore in Python wheels

2020-02-17 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7869: - Summary: [Python] Boost::system and boost::filesystem not necessary anymore in Python wheels Key: ARROW-7869 URL: https://issues.apache.org/jira/browse/ARROW-7869 P

Re: Basic question on Apache Arrow

2020-02-17 Thread Wes McKinney
hi Subash, I'm only familiar with Question 1. Spark only makes use of Arrow for accelerating Python and R UDF evaluation and sending data to and from those language APIs (see our blog posts for some discussion about this). So I would guess for what you're saying there aren't any speedups unless th

Re: Schemaless serialization

2020-02-17 Thread Antoine Pitrou
Hi Tewfik, It would be good to step back a bit and explain what your data is, and what the consumer is going to do with it. Regards Antoine. On Fri, 14 Feb 2020 15:08:57 -0800 Tewfik Zeghmi wrote: > Hi Micah, > > The primary language is Python. I'm hoping that the small overhead of >

Re: Schemaless serialization

2020-02-17 Thread Wes McKinney
hi Micah and Tewfik, The functionality is exposed in Python, see e.g. https://github.com/apache/arrow/blob/apache-arrow-0.16.0/python/pyarrow/tests/test_ipc.py#L685 As Micah said, very small batches aren't necessarily optimized for compactness (for example buffers are padded to multiples of 8).
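
The padding remark above can be illustrated with a little arithmetic. This is only a sketch of rounding a buffer size up to a multiple of 8, not code taken from Arrow; the helper name `padded_size` is hypothetical.

```python
def padded_size(nbytes, alignment=8):
    """Round nbytes up to the next multiple of alignment (hypothetical helper)."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    # Integer arithmetic: add (alignment - 1), then drop the remainder.
    return (nbytes + alignment - 1) // alignment * alignment


# A 1-byte buffer still occupies 8 bytes once padded, which is why
# very small batches carry proportionally more overhead.
for size in (1, 8, 9, 100):
    print(size, "->", padded_size(size))
```

The relative cost of padding shrinks as buffers grow, so it matters mainly for batches holding only a handful of values.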

[jira] [Created] (ARROW-7868) [Crossbow] Reduce GitHub API query parallelism

2020-02-17 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7868: -- Summary: [Crossbow] Reduce GitHub API query parallelism Key: ARROW-7868 URL: https://issues.apache.org/jira/browse/ARROW-7868 Project: Apache Arrow Issu

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-16-0

2020-02-17 Thread Wes McKinney
It seems like GitHub release uploads continue to be a bit flaky. Is there a JIRA about adding some retry logic with delay (e.g. waiting 30-60 seconds in between tries) to attempt to reduce the flakiness? https://github.com/apache/arrow/blob/master/dev/tasks/crossbow.py#L1517 On Sun, Feb 16, 2020
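
The retry-with-delay idea raised above could look roughly like the following. This is a minimal sketch, not the code in crossbow.py; the function name `retry_with_delay`, its parameters, and the `upload_asset` call in the usage note are all hypothetical, and the 30-second default simply mirrors the delay range mentioned in the message.

```python
import time


def retry_with_delay(action, attempts=3, delay_seconds=30.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts are used up.

    Hypothetical helper: waits delay_seconds between failed tries and
    re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # a real uploader would catch narrower errors
            last_error = exc
            if attempt < attempts:
                # Pause before the next try; no pause after the final failure.
                sleep(delay_seconds)
    raise last_error
```

A caller would wrap the flaky step, for example `retry_with_delay(lambda: upload_asset(path), attempts=3, delay_seconds=30)`, where `upload_asset` stands in for whatever function performs the upload. Passing `sleep` as a parameter keeps the helper testable without real waiting.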