[jira] [Created] (ARROW-5164) [Gandiva] [C++] Introduce 32bit hash functions

2019-04-12 Thread Praveen Kumar Desabandu (JIRA)
Praveen Kumar Desabandu created ARROW-5164: Summary: [Gandiva] [C++] Introduce 32bit hash functions Key: ARROW-5164 URL: https://issues.apache.org/jira/browse/ARROW-5164 Project: Apache Arrow

[jira] [Created] (ARROW-5165) [Python][Documentation] Build docs don't suggest assigning $ARROW_BUILD_TYPE

2019-04-12 Thread Rok Mihevc (JIRA)
Rok Mihevc created ARROW-5165: Summary: [Python][Documentation] Build docs don't suggest assigning $ARROW_BUILD_TYPE Key: ARROW-5165 URL: https://issues.apache.org/jira/browse/ARROW-5165 Project: Apache

[jira] [Created] (ARROW-5166) [Python] Statistics for uint64 columns may overflow

2019-04-12 Thread Marco Neumann (JIRA)
Marco Neumann created ARROW-5166: Summary: [Python] Statistics for uint64 columns may overflow Key: ARROW-5166 URL: https://issues.apache.org/jira/browse/ARROW-5166 Project: Apache Arrow Issu

Re: [DISCUSS] 64-bit offset variable width types (i.e. Large List, Large String, Large bytes)

2019-04-12 Thread Jacques Nadeau
Definitely prefer option 1. I'm a -0.5 on the change in general. I think that early on users may want to pattern things this way but as you start trying to parallelize work, pipeline work, etc, moving beyond moderate batch sizes is ultimately a different use case and won't be supported well within

[jira] [Created] (ARROW-5167) Upgrade string-view-light to latest

2019-04-12 Thread Lawrence Chan (JIRA)
Lawrence Chan created ARROW-5167: Summary: Upgrade string-view-light to latest Key: ARROW-5167 URL: https://issues.apache.org/jira/browse/ARROW-5167 Project: Apache Arrow Issue Type: Bug

Re: [DISCUSS] 64-bit offset variable width types (i.e. Large List, Large String, Large bytes)

2019-04-12 Thread Wes McKinney
Hi Jacques, I think there are different use cases. What we are seeing now in many places is the desire to use the Arrow format to represent very large on-disk datasets (e.g., memory-mapped). If we don't support this use case it will continue to cause tension in adoption and result in some application

Re: [DISCUSS] 64-bit offset variable width types (i.e. Large List, Large String, Large bytes)

2019-04-12 Thread Jacques Nadeau
Hey Wes, I appreciate your comments and want to be clear that I am not blocking this addition. Memory mapping itself is not in conflict with my comments. However, since Arrow datasets do not exist frequently on disk today, a user can make choices as to whether to use smaller or larger batches. Wh

[jira] [Created] (ARROW-5168) [GLib] Add garrow_array_take()

2019-04-12 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-5168: Summary: [GLib] Add garrow_array_take() Key: ARROW-5168 URL: https://issues.apache.org/jira/browse/ARROW-5168 Project: Apache Arrow Issue Type: New Feature