Using ReadAt (hdfsPread) from Python

2019-05-18 Thread Yevgeni Litvin
I'd like to use HadoopFileSystem to open a file and then read a block of data specified by an offset and a length. I see that for the libhdfs driver, Arrow supports the hdfsPread API. It also exposes it as the ReadAt method of the RandomAccessFile interface. I did not find a way to access it from Pyt

[jira] [Created] (ARROW-5368) [C++] Disable jemalloc by default with MinGW

2019-05-18 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5368: --- Summary: [C++] Disable jemalloc by default with MinGW Key: ARROW-5368 URL: https://issues.apache.org/jira/browse/ARROW-5368 Project: Apache Arrow Issue Type: I

[jira] [Created] (ARROW-5369) [C++] Add support for glog on Windows

2019-05-18 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5369: --- Summary: [C++] Add support for glog on Windows Key: ARROW-5369 URL: https://issues.apache.org/jira/browse/ARROW-5369 Project: Apache Arrow Issue Type: Improvem

[jira] [Created] (ARROW-5370) [C++] Detect system uriparser by default

2019-05-18 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5370: --- Summary: [C++] Detect system uriparser by default Key: ARROW-5370 URL: https://issues.apache.org/jira/browse/ARROW-5370 Project: Apache Arrow Issue Type: Impro

[jira] [Created] (ARROW-5371) [Release] Add tests for dev/release/00-prepare.sh

2019-05-18 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5371: --- Summary: [Release] Add tests for dev/release/00-prepare.sh Key: ARROW-5371 URL: https://issues.apache.org/jira/browse/ARROW-5371 Project: Apache Arrow Issue Ty

Re: Using ReadAt (hdfsPread) from Python

2019-05-18 Thread Wes McKinney
Hi Yevgeni, Would you like to add a `read_at` method to `pyarrow.NativeFile`? This shouldn't be a huge patch. Can you open a JIRA issue? Thanks On Sat, May 18, 2019 at 3:18 AM Yevgeni Litvin wrote: > > I'd like to use HadoopFileSystem to open a file and then read a block of > data specified b
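
For illustration only, the offset-and-length read discussed in this thread can be approximated on any seekable file-like object with a seek followed by a read. This is a minimal sketch, not the proposed `read_at` patch: `read_block` is a hypothetical helper name, and an in-memory `io.BytesIO` stands in for a file opened through HadoopFileSystem. Unlike a true positional read such as hdfsPread, this version moves the shared file cursor while it runs, so it is not safe for concurrent readers on the same handle.

```python
import io
import os


def read_block(f, offset, length):
    """Read up to `length` bytes starting at `offset` (hypothetical helper).

    Emulates a positional read with seek + read, then restores the
    original cursor position so the caller's view of the file is unchanged.
    """
    saved = f.tell()
    try:
        f.seek(offset, os.SEEK_SET)
        return f.read(length)
    finally:
        # Put the cursor back even if the read raised.
        f.seek(saved, os.SEEK_SET)


# Stand-in for a remote file: any seekable, readable object works here.
demo = io.BytesIO(b"0123456789abcdef")
block = read_block(demo, offset=4, length=6)
```

A read that runs past the end of the file returns only the bytes that are available, which matches the usual short-read behaviour of file-like objects.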

Re: [DISCUSS] Formalizing "extension type" metadata in the Arrow binary protocol

2019-05-18 Thread Micah Kornfield
Hi Wes, This approach seems reasonable to me. I'm a little concerned we haven't validated many use-cases against the approach (but I don't see any obvious flaws). Thanks, Micah On Fri, May 17, 2019 at 5:16 AM Wes McKinney wrote: > As Micah brought up, as part of this we would like to formalize

Re: [DISCUSS] Formalizing "extension type" metadata in the Arrow binary protocol

2019-05-18 Thread Wes McKinney
Hi Micah, The use cases I'm aware of are mostly coming from proprietary applications. My idea was for the extension metadata to be as unobtrusive as possible. The only alternative as I see it would be to have an Extension value in the Type union which would be more intrusive to applications handli

Re: [DISCUSS] Formalizing "extension type" metadata in the Arrow binary protocol

2019-05-18 Thread Wes McKinney
On Sat, May 18, 2019, 1:58 PM Wes McKinney wrote: > Hi Micah, > > The use cases I'm aware of are mostly coming from proprietary > applications. My idea was for the extension metadata to be as unobtrusive > as possible. The only alternative as I see it would be to have an Extension > value in the

[jira] [Created] (ARROW-5372) [GLib] Add support for null/boolean values CSV read option

2019-05-18 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-5372: --- Summary: [GLib] Add support for null/boolean values CSV read option Key: ARROW-5372 URL: https://issues.apache.org/jira/browse/ARROW-5372 Project: Apache Arrow

[jira] [Created] (ARROW-5373) [Add missing details for Gandiva Java Build

2019-05-18 Thread Daniel Slutsky (JIRA)
Daniel Slutsky created ARROW-5373: - Summary: [Add missing details for Gandiva Java Build Key: ARROW-5373 URL: https://issues.apache.org/jira/browse/ARROW-5373 Project: Apache Arrow Issue Type

Re: [DISCUSS] Formalizing "extension type" metadata in the Arrow binary protocol

2019-05-18 Thread Micah Kornfield
Hi Wes, Like I said, I think this approach looks good. What I'm looking for is a little more documentation/examples on how additional types would be handled. I think Tensor would be a good example. We also had questions about INET addresses previously; maybe this would be another good ill