[GitHub] [arrow] emkornfield commented on pull request #7288: ARROW-8963: [C++][Parquet] optimize LeafReader::NextBatch to save memory

2020-06-01 Thread GitBox
emkornfield commented on pull request #7288: URL: https://github.com/apache/arrow/pull/7288#issuecomment-637320051 @hn5092 thank you for the PR. Could you add some benchmarks on your machine that shows this improves things? I might be looking at the wrong place but it appears memory is r

[GitHub] [arrow] emkornfield commented on pull request #5883: ARROW-7213: [Java] Represent a data element of a vector as a tree of ArrowBufPointer

2020-06-01 Thread GitBox
emkornfield commented on pull request #5883: URL: https://github.com/apache/arrow/pull/5883#issuecomment-637309666 Going to close until the memory PRs are merged and we can figure out if we want this/where we want it. This i

[GitHub] [arrow] emkornfield closed pull request #5883: ARROW-7213: [Java] Represent a data element of a vector as a tree of ArrowBufPointer

2020-06-01 Thread GitBox
emkornfield closed pull request #5883: URL: https://github.com/apache/arrow/pull/5883 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [arrow] emkornfield closed pull request #7313: ARROW-8972: [Java] Support range value comparison for large varchar/varbinary vectors

2020-06-01 Thread GitBox
emkornfield closed pull request #7313: URL: https://github.com/apache/arrow/pull/7313

[GitHub] [arrow] emkornfield commented on pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-06-01 Thread GitBox
emkornfield commented on pull request #7275: URL: https://github.com/apache/arrow/pull/7275#issuecomment-637308504 @BryanCutler would you have time to review?

[GitHub] [arrow] emkornfield commented on pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-06-01 Thread GitBox
emkornfield commented on pull request #7275: URL: https://github.com/apache/arrow/pull/7275#issuecomment-637308552 @rymurr looks like this needs a rebase

[GitHub] [arrow] emkornfield commented on a change in pull request #7289: ARROW-8948: [Java][Integration] enable duplicate field names integration tests

2020-06-01 Thread GitBox
emkornfield commented on a change in pull request #7289: URL: https://github.com/apache/arrow/pull/7289#discussion_r433648119 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/AbstractStructVector.java ## @@ -40,11 +40,64 @@ public abstract class Abstrac

[GitHub] [arrow] emkornfield commented on a change in pull request #7289: ARROW-8948: [Java][Integration] enable duplicate field names integration tests

2020-06-01 Thread GitBox
emkornfield commented on a change in pull request #7289: URL: https://github.com/apache/arrow/pull/7289#discussion_r433647662 ## File path: java/vector/src/main/java/org/apache/arrow/vector/VectorSchemaRoot.java ## @@ -54,7 +54,7 @@ private Schema schema; private int row

[GitHub] [arrow] emkornfield commented on a change in pull request #7289: ARROW-8948: [Java][Integration] enable duplicate field names integration tests

2020-06-01 Thread GitBox
emkornfield commented on a change in pull request #7289: URL: https://github.com/apache/arrow/pull/7289#discussion_r433647053 ## File path: java/README.md ## @@ -80,8 +80,12 @@ variable are set, the system property takes precedence. ## Java Properties -For java 9 or later,

[GitHub] [arrow] emkornfield commented on a change in pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

2020-06-01 Thread GitBox
emkornfield commented on a change in pull request #7231: URL: https://github.com/apache/arrow/pull/7231#discussion_r433645139 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowFooter.java ## @@ -96,17 +126,40 @@ public Schema getSchema() { r

[GitHub] [arrow] emkornfield closed pull request #7317: ARROW-9000: [Java] Update errorprone to 2.4.0

2020-06-01 Thread GitBox
emkornfield closed pull request #7317: URL: https://github.com/apache/arrow/pull/7317

[GitHub] [arrow] pprudhvi commented on pull request #7323: ARROW-9004: [C++][Gandiva] Upgrade to LLVM 10

2020-06-01 Thread GitBox
pprudhvi commented on pull request #7323: URL: https://github.com/apache/arrow/pull/7323#issuecomment-637272293 Didn't we recently upgrade to llvm 9? Why jump to llvm10 so early?

[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637225192 > OK. > I'll work on the GLib part as a separate pull request and add `ARROW_DEPRECATED` in the pull request. Perfect

[GitHub] [arrow] wesm commented on pull request #7310: ARROW-6052: [C++] Split up arrow/array.h/cc into multiple files under arrow/array/, move ArrayData to separate header, make ArrayData::dictionary

2020-06-01 Thread GitBox
wesm commented on pull request #7310: URL: https://github.com/apache/arrow/pull/7310#issuecomment-637224951 Probably be good to merge this soon to limit rebase headaches

[GitHub] [arrow] wesm closed pull request #7300: ARROW-8844: [C++] Transfer bitmap in words

2020-06-01 Thread GitBox
wesm closed pull request #7300: URL: https://github.com/apache/arrow/pull/7300

[GitHub] [arrow] jianxind removed a comment on pull request #7314: ARROW-8996: [C++] SSE runtime support for aggregate sum dense kernel

2020-06-01 Thread GitBox
jianxind removed a comment on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-636677517 Benchmark data: Before: ``` SumKernelFloat/32768/0 2.96 us 2.96 us 236912 bytes_per_second=10.3227G/s null_percent=0 size=32.768k Sum

[GitHub] [arrow] wesm edited a comment on pull request #7314: ARROW-8996: [C++] SSE runtime support for aggregate sum dense kernel

2020-06-01 Thread GitBox
wesm edited a comment on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-637218125 I'll try to review this in the next couple days. @pitrou may be able to help also

[GitHub] [arrow] wesm commented on pull request #7314: ARROW-8996: [C++] SSE runtime support for aggregate sum dense kernel

2020-06-01 Thread GitBox
wesm commented on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-637218125 I'll try to review this in the next couple days

[GitHub] [arrow] ursabot commented on pull request #7314: ARROW-8996: [C++] SSE runtime support for aggregate sum dense kernel

2020-06-01 Thread GitBox
ursabot commented on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-637216678 [AMD64 Ubuntu 18.04 C++ Benchmark (#108899)](https://ci.ursalabs.org/#builders/73/builds/70) builder has been succeeded. Revision: d789801a5e83b9717b8f36d72189198f5527b757

[GitHub] [arrow] wesm commented on pull request #7322: ARROW-8929: [C++] Set the default for compute::Arity::VarArgs to 0

2020-06-01 Thread GitBox
wesm commented on pull request #7322: URL: https://github.com/apache/arrow/pull/7322#issuecomment-637216454 +1

[GitHub] [arrow] wesm closed pull request #7322: ARROW-8929: [C++] Set the default for compute::Arity::VarArgs to 0

2020-06-01 Thread GitBox
wesm closed pull request #7322: URL: https://github.com/apache/arrow/pull/7322

[GitHub] [arrow] houqp edited a comment on pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-01 Thread GitBox
houqp edited a comment on pull request #7324: URL: https://github.com/apache/arrow/pull/7324#issuecomment-637209800 @nevi-me do you prefer to move concat kernel logic into append_data in this set of changes or as a follow up after https://github.com/apache/arrow/pull/7306 gets merged?

[GitHub] [arrow] jianxind commented on pull request #7314: ARROW-8996: [C++] SSE runtime support for aggregate sum dense kernel

2020-06-01 Thread GitBox
jianxind commented on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-637212463 @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark

[GitHub] [arrow] github-actions[bot] commented on pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7324: URL: https://github.com/apache/arrow/pull/7324#issuecomment-637210669 https://issues.apache.org/jira/browse/ARROW-9005

[GitHub] [arrow] houqp commented on pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-01 Thread GitBox
houqp commented on pull request #7324: URL: https://github.com/apache/arrow/pull/7324#issuecomment-637209800 @nevi-me do you prefer to move concat kernel logic into append_data in this set of changes or as a follow up after https://github.com/apache/arrow/pull/7306 gets merged?

[GitHub] [arrow] houqp opened a new pull request #7324: ARROW-9005: [Rust] [Datafusion] support sort expression

2020-06-01 Thread GitBox
houqp opened a new pull request #7324: URL: https://github.com/apache/arrow/pull/7324 The only missing piece is updating sqlparser to parse null ordering expression, which is going to be a rather big change due to major refactoring from upstream sqlparser crate. So I am going to leave that

[GitHub] [arrow] github-actions[bot] commented on pull request #7323: ARROW-9004: [C++][Gandiva] Upgrade to LLVM 10

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7323: URL: https://github.com/apache/arrow/pull/7323#issuecomment-637200610 Revision: 1b4fb8b0c8d18916360d9bc10b85e3762aad790f Submitted crossbow builds: [ursa-labs/crossbow @ actions-279](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] kou commented on pull request #7323: ARROW-9004: [C++][Gandiva] Upgrade to LLVM 10

2020-06-01 Thread GitBox
kou commented on pull request #7323: URL: https://github.com/apache/arrow/pull/7323#issuecomment-637200038 @github-actions crossbow submit -g nightly

[GitHub] [arrow] mrkn commented on a change in pull request #6578: ARROW-7371: WIP: [GLib] Add GLib binding of Dataset

2020-06-01 Thread GitBox
mrkn commented on a change in pull request #6578: URL: https://github.com/apache/arrow/pull/6578#discussion_r433558571 ## File path: c_glib/test/test-in-memory-scan-task.rb ## @@ -0,0 +1,50 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

[GitHub] [arrow] kou commented on a change in pull request #6578: ARROW-7371: WIP: [GLib] Add GLib binding of Dataset

2020-06-01 Thread GitBox
kou commented on a change in pull request #6578: URL: https://github.com/apache/arrow/pull/6578#discussion_r433556782 ## File path: c_glib/test/test-in-memory-scan-task.rb ## @@ -0,0 +1,50 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor l

[GitHub] [arrow] github-actions[bot] commented on pull request #7323: ARROW-9004: [C++][Gandiva] Upgrade to LLVM 10

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7323: URL: https://github.com/apache/arrow/pull/7323#issuecomment-637190482 https://issues.apache.org/jira/browse/ARROW-9004

[GitHub] [arrow] kou opened a new pull request #7323: ARROW-9004: [C++][Gandiva] Upgrade to LLVM 10

2020-06-01 Thread GitBox
kou opened a new pull request #7323: URL: https://github.com/apache/arrow/pull/7323 LLVM 7 and 8 are still supported. Clang Tools still use 8 because Clang Tools 10 reports an error for RapidJSON in sanitizer build. We should work on this as a separate task. e.g.:

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-06-01 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-637187135 As long as @pitrou is happy with my workaround to avoid inflicting pain on AMD processor users. I think I need to fix one signed conversion bug to make CI happy. I shou

[GitHub] [arrow] kou commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
kou commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637182726 OK. I'll work on the GLib part as a separate pull request and add `ARROW_DEPRECATED` in the pull request.

[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-06-01 Thread GitBox
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-637180635 I'll kick the tires on this re: performance. Is this ready to merge otherwise?

[GitHub] [arrow] wesm commented on pull request #6590: optimization debian package manager tweaks

2020-06-01 Thread GitBox
wesm commented on pull request #6590: URL: https://github.com/apache/arrow/pull/6590#issuecomment-637180294 This has grown stale. Per @kou's comment this should be split into smaller tasks and submitted as other PRs

[GitHub] [arrow] wesm closed pull request #6590: optimization debian package manager tweaks

2020-06-01 Thread GitBox
wesm closed pull request #6590: URL: https://github.com/apache/arrow/pull/6590

[GitHub] [arrow] wesm closed pull request #6220: ARROW-7605: [C++] Bundle private jemalloc symbols into static library libarrow.a

2020-06-01 Thread GitBox
wesm closed pull request #6220: URL: https://github.com/apache/arrow/pull/6220

[GitHub] [arrow] wesm commented on pull request #6220: ARROW-7605: [C++] Bundle private jemalloc symbols into static library libarrow.a

2020-06-01 Thread GitBox
wesm commented on pull request #6220: URL: https://github.com/apache/arrow/pull/6220#issuecomment-637180097 Closing this PR for now. Hopefully someone can pick up this project

[GitHub] [arrow] github-actions[bot] commented on pull request #7322: ARROW-8929: [C++] Set the default for compute::Arity::VarArgs to 0

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7322: URL: https://github.com/apache/arrow/pull/7322#issuecomment-637170521 https://issues.apache.org/jira/browse/ARROW-8929

[GitHub] [arrow] wesm opened a new pull request #7322: ARROW-8929: [C++] Set the default for compute::Arity::VarArgs to 0

2020-06-01 Thread GitBox
wesm opened a new pull request #7322: URL: https://github.com/apache/arrow/pull/7322 As Micah pointed out, 0 is a more reasonable default for the minimum number of arguments than 1 is.

[GitHub] [arrow] wesm commented on pull request #7300: ARROW-8844: [C++] Transfer bitmap in words

2020-06-01 Thread GitBox
wesm commented on pull request #7300: URL: https://github.com/apache/arrow/pull/7300#issuecomment-637166114 The RTools 4.0 build was hanging so I triggered a new build

[GitHub] [arrow] github-actions[bot] commented on pull request #7321: ARROW-8985: [Format][DONOTMERGE] RFC Proposed Decimal::byteWidth field for forward compatibility

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7321: URL: https://github.com/apache/arrow/pull/7321#issuecomment-637164899 https://issues.apache.org/jira/browse/ARROW-8985

[GitHub] [arrow] wesm opened a new pull request #7321: ARROW-8985: [Format][DONOTMERGE] RFC Proposed Decimal::byteWidth field for forward compatibility

2020-06-01 Thread GitBox
wesm opened a new pull request #7321: URL: https://github.com/apache/arrow/pull/7321 See mailing list discussion

[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637161793 Done. So GLib can be refactored later.

[GitHub] [arrow] github-actions[bot] commented on pull request #7320: ARROW-8896: [C++] Use Take to implement dictionary to T casts

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7320: URL: https://github.com/apache/arrow/pull/7320#issuecomment-637159969 https://issues.apache.org/jira/browse/ARROW-8896

[GitHub] [arrow] wesm opened a new pull request #7320: ARROW-8896: [C++] Use Take to implement dictionary to T casts

2020-06-01 Thread GitBox
wesm opened a new pull request #7320: URL: https://github.com/apache/arrow/pull/7320 In addition to removing duplicated logic, this decreases the size of -O3 libarrow.so by about 250K on Linux.

[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637156379 @kou thanks. I'll try to add deprecated APIs for the functions that were removed so that merging need not be blocked on the refactoring -

[GitHub] [arrow] kou commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
kou commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637141114 I'll take a look at this in a few days.

[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637130350 Fixed

[GitHub] [arrow] mrkn commented on a change in pull request #6578: ARROW-7371: WIP: [GLib] Add GLib binding of Dataset

2020-06-01 Thread GitBox
mrkn commented on a change in pull request #6578: URL: https://github.com/apache/arrow/pull/6578#discussion_r433506325 ## File path: c_glib/test/test-in-memory-scan-task.rb ## @@ -0,0 +1,50 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

[GitHub] [arrow] wesm edited a comment on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm edited a comment on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637112653 I figured out the R issue, I'm trying to fix. It's this hack here https://github.com/apache/arrow/blob/master/r/src/compute.cpp#L180

[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637112653 I figured out the R issue, I'm trying to fix

[GitHub] [arrow] nevi-me commented on pull request #7297: ARROW-6945: [Rust] [Integration Tests] Try to run rust integration tests

2020-06-01 Thread GitBox
nevi-me commented on pull request #7297: URL: https://github.com/apache/arrow/pull/7297#issuecomment-637103723 I have made progress on large arrays (https://github.com/apache/arrow/compare/master...nevi-me:rust-large-lists), the code is very repetitive so I still need to clean it up.

[GitHub] [arrow] mrkn commented on a change in pull request #6578: ARROW-7371: WIP: [GLib] Add GLib binding of Dataset

2020-06-01 Thread GitBox
mrkn commented on a change in pull request #6578: URL: https://github.com/apache/arrow/pull/6578#discussion_r433490389 ## File path: c_glib/arrow-dataset-glib/scanner.cpp ## @@ -0,0 +1,524 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

[GitHub] [arrow] nevi-me commented on pull request #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-06-01 Thread GitBox
nevi-me commented on pull request #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-637100922 There are some open comments/questions on this. I'll make time to work on them before the `1.0.0` release, so we can merge this.

[GitHub] [arrow] github-actions[bot] commented on pull request #7319: [DRAFT] [Rust] Parquet Arrow writer with nested support

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7319: URL: https://github.com/apache/arrow/pull/7319#issuecomment-637100265 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] github-actions[bot] commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637100263 https://issues.apache.org/jira/browse/ARROW-8917

[GitHub] [arrow] nevi-me commented on pull request #7252: ARROW-8906: [Rust] [DataFusion] support schema inference from multiple CSV files

2020-06-01 Thread GitBox
nevi-me commented on pull request #7252: URL: https://github.com/apache/arrow/pull/7252#issuecomment-637099283 I'll merge this tomorrow evening GMT if there haven't been any further reviews.

[GitHub] [arrow] nevi-me closed pull request #7265: ARROW-8931: [Rust] add lexical sort support to arrow compute kernel

2020-06-01 Thread GitBox
nevi-me closed pull request #7265: URL: https://github.com/apache/arrow/pull/7265

[GitHub] [arrow] wesm edited a comment on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm edited a comment on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637095245 @nealrichardson I will need your help on the one failing R test @kou I removed 6 of the 8 `Take` APIs -- I think it would be better for GLib to use the Datum/Datum or Ca

[GitHub] [arrow] nevi-me opened a new pull request #7319: [DRAFT] [Rust] Parquet Arrow writer with nested support

2020-06-01 Thread GitBox
nevi-me opened a new pull request #7319: URL: https://github.com/apache/arrow/pull/7319 **Note**: I started making changes to #6785, and ended up deviating a lot. ___ This is a draft to implement an arrow writer for parquet. It supports the following (no complete test coverage ye

[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm commented on pull request #7318: URL: https://github.com/apache/arrow/pull/7318#issuecomment-637095245 @nealrichardson I will need your help on the one failing R test @kou I removed 6 of the 8 `Take` APIs -- I think it would be better for GLib to use the Datum/Datum API if possi

[GitHub] [arrow] wesm opened a new pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

2020-06-01 Thread GitBox
wesm opened a new pull request #7318: URL: https://github.com/apache/arrow/pull/7318 A "metafunction" is one that dispatches to other functions based on the argument types. It does not contain any kernels. Other stuff in this PR: * Make "take" and "filter" metafunctions that a
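
The dispatch idea described in this PR summary can be illustrated with a minimal Python sketch. This is not Arrow's C++ API: `take_array`, `take_table`, and `take` are hypothetical names, and plain lists and dicts stand in for arrays and tables. The point is only that the metafunction itself holds no selection logic and merely routes to a concrete function based on the argument type.

```python
# Hypothetical sketch: none of these names are Arrow APIs.

def take_array(values, indices):
    """Concrete function: select elements of a flat list by position."""
    return [values[i] for i in indices]


def take_table(table, indices):
    """Concrete function: apply take_array to each column of a dict-of-lists."""
    return {name: take_array(column, indices) for name, column in table.items()}


def take(values, indices):
    """Metafunction: contains no selection logic, only type-based dispatch."""
    if isinstance(values, dict):
        return take_table(values, indices)
    if isinstance(values, list):
        return take_array(values, indices)
    raise TypeError(f"take: unsupported input type {type(values).__name__}")
```

Under these assumptions, callers use the single `take` entry point for both a list and a dict of lists, and unsupported inputs fail at the dispatch step rather than inside a concrete function.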

[GitHub] [arrow] wesm commented on pull request #7300: ARROW-8844: [C++] Transfer bitmap in words

2020-06-01 Thread GitBox
wesm commented on pull request #7300: URL: https://github.com/apache/arrow/pull/7300#issuecomment-637070516 Needs rebase

[GitHub] [arrow] fsaintjacques closed pull request #7285: ARROW-8843: [C++] Compare bitmaps in words

2020-06-01 Thread GitBox
fsaintjacques closed pull request #7285: URL: https://github.com/apache/arrow/pull/7285

[GitHub] [arrow] houqp commented on pull request #7309: ARROW-8993: [Rust] support reading gzipped json files

2020-06-01 Thread GitBox
houqp commented on pull request #7309: URL: https://github.com/apache/arrow/pull/7309#issuecomment-637059093 Looks like we have to remove Seek constraints in order to support compressed format across the board. If this is the plan here as well, then I recommend just implement Read for json

[GitHub] [arrow] zeapo commented on pull request #7309: ARROW-8993: [Rust] support reading gzipped json files

2020-06-01 Thread GitBox
zeapo commented on pull request #7309: URL: https://github.com/apache/arrow/pull/7309#issuecomment-637046642 Would it make sense to implement for json reader the same interfaces that are on csv reader, i.e. `Read + Seek` for `infer_file_schema`, and a `json::Reader::from_buf_reader()`.

[GitHub] [arrow] houqp commented on pull request #7309: ARROW-8993: [Rust] support reading gzipped json files

2020-06-01 Thread GitBox
houqp commented on pull request #7309: URL: https://github.com/apache/arrow/pull/7309#issuecomment-637040555 > One dramatic alternative would be to always require a schema, and leave inference to the user. We could then consume the buffer reader (reader: mut BufReader instead of reader: &m

[GitHub] [arrow] fsaintjacques edited a comment on pull request #7285: ARROW-8843: [C++] Compare bitmaps in words

2020-06-01 Thread GitBox
fsaintjacques edited a comment on pull request #7285: URL: https://github.com/apache/arrow/pull/7285#issuecomment-637034980

[GitHub] [arrow] fsaintjacques commented on pull request #7285: ARROW-8843: [C++] Compare bitmaps in words

2020-06-01 Thread GitBox
fsaintjacques commented on pull request #7285: URL: https://github.com/apache/arrow/pull/7285#issuecomment-637034980 @cyb70289 Thanks for this. For future reference, if you add a new benchmark, do it first as a separate commit, and then add the improvement in a following (chronologically) s

[GitHub] [arrow] fsaintjacques commented on pull request #7300: ARROW-8844: [C++] Transfer bitmap in words

2020-06-01 Thread GitBox
fsaintjacques commented on pull request #7300: URL: https://github.com/apache/arrow/pull/7300#issuecomment-637030938 once https://github.com/ursa-labs/ursabot/pull/197 is merged, we shouldn't need to specify `origin/master` as a baseline.

[GitHub] [arrow] ursabot commented on pull request #7300: ARROW-8844: [C++] Transfer bitmap in words

2020-06-01 Thread GitBox
ursabot commented on pull request #7300: URL: https://github.com/apache/arrow/pull/7300#issuecomment-637030201 [AMD64 Ubuntu 18.04 C++ Benchmark (#108807)](https://ci.ursalabs.org/#builders/73/builds/69) builder has been succeeded. Revision: be1ecf709e862a84284ed354239859a15a32d702

[GitHub] [arrow] lidavidm commented on pull request #7224: ARROW-8858: [FlightRPC] ensure binary/multi-valued headers are properly exposed

2020-06-01 Thread GitBox
lidavidm commented on pull request #7224: URL: https://github.com/apache/arrow/pull/7224#issuecomment-637026817 @pitrou following up, any other comments here? Thanks!

[GitHub] [arrow] lidavidm closed pull request #7270: ARROW-8485: [Integration][Java] Implement extension types integration

2020-06-01 Thread GitBox
lidavidm closed pull request #7270: URL: https://github.com/apache/arrow/pull/7270

[GitHub] [arrow] github-actions[bot] commented on pull request #7317: ARROW-9000: [Java] Update errorprone to 2.4.0

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7317: URL: https://github.com/apache/arrow/pull/7317#issuecomment-637024588 https://issues.apache.org/jira/browse/ARROW-9000

[GitHub] [arrow] fsaintjacques commented on pull request #7300: ARROW-8844: [C++] Transfer bitmap in words

2020-06-01 Thread GitBox
fsaintjacques commented on pull request #7300: URL: https://github.com/apache/arrow/pull/7300#issuecomment-637024748 @ursabot benchmark --suite-filter=arrow-bit-util-benchmark --benchmark-filter=CopyBitmapWithOffset origin/master

[GitHub] [arrow] fsaintjacques closed pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
fsaintjacques closed pull request #7316: URL: https://github.com/apache/arrow/pull/7316

[GitHub] [arrow] laurentgo opened a new pull request #7317: ARROW-9000: [Java] Update errorprone to 2.4.0

2020-06-01 Thread GitBox
laurentgo opened a new pull request #7317: URL: https://github.com/apache/arrow/pull/7317 Errorprone 2.3.3 is not compatible with JDK14 and crashes while building the project. Update to version 2.4.0, which is compatible with the latest JDK.

[GitHub] [arrow] bkietz closed pull request #7311: ARROW-7784: [C++] Improve compilation time of arrow/array/diff.cc and reduce code size

2020-06-01 Thread GitBox
bkietz closed pull request #7311: URL: https://github.com/apache/arrow/pull/7311

[GitHub] [arrow] ursabot commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
ursabot commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-637005626 [AMD64 Ubuntu 18.04 C++ Benchmark (#108788)](https://ci.ursalabs.org/#builders/73/builds/68) builder has been succeeded. Revision: 82ebbe815f899daece4b7186197b1519f1274d4b

[GitHub] [arrow] kszucs commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
kszucs commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-63653 @ursabot benchmark --suite-filter=arrow-bit-util-benchmark --benchmark-filter=CopyBitmapWithOffset origin/master

[GitHub] [arrow] kszucs commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
kszucs commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636998101 @ursabot build

[GitHub] [arrow] kszucs commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
kszucs commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636997158 @ursabot benchmark --suite-filter=arrow-bit-util-benchmark --benchmark-filter=CopyBitmapWithOffset origin/master

[GitHub] [arrow] github-actions[bot] commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
github-actions[bot] commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636995741 https://issues.apache.org/jira/browse/ARROW-8997

[GitHub] [arrow] fsaintjacques removed a comment on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
fsaintjacques removed a comment on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636991533 @ursabot benchmark --suite-filter=arrow-bit-util-benchmark --benchmark-filter=CopyBitmapWithOffset origin/master

[GitHub] [arrow] fsaintjacques commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
fsaintjacques commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636993138 @ursabot benchmark --suite-filter=arrow-bit-util-benchmark --benchmark-filter=CopyBitmapWithOffset origin/master

[GitHub] [arrow] fsaintjacques removed a comment on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
fsaintjacques removed a comment on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636991119 @ursabot benchmark diff --suite-filter=arrow-bit-util-benchmark --benchmark-filter=CopyBitmapWithOffset origin/master

[GitHub] [arrow] ursabot removed a comment on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
ursabot removed a comment on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636991141 ``` Got unexpected extra argument (origin/master) ```

[GitHub] [arrow] fsaintjacques commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
fsaintjacques commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636991119 @ursabot benchmark diff --suite-filter=arrow-bit-util-benchmark --benchmark-filter=CopyBitmapWithOffset origin/master

[GitHub] [arrow] fsaintjacques commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
fsaintjacques commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636991533 @ursabot benchmark --suite-filter=arrow-bit-util-benchmark --benchmark-filter=CopyBitmapWithOffset origin/master

[GitHub] [arrow] ursabot commented on pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
ursabot commented on pull request #7316: URL: https://github.com/apache/arrow/pull/7316#issuecomment-636991141 ``` Got unexpected extra argument (origin/master) ```

[GitHub] [arrow] fsaintjacques opened a new pull request #7316: ARROW-8997: [Archery] Improve benchmark comparison formatting

2020-06-01 Thread GitBox
fsaintjacques opened a new pull request #7316: URL: https://github.com/apache/arrow/pull/7316 This adds human readable values to the compare values.

[GitHub] [arrow] sonthonaxrk commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-06-01 Thread GitBox
sonthonaxrk commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-636979018 @jorisvandenbossche yes it would affect how `read` works as it changes the reader properties. I'm not too fussed about adding it into read as you can use `set_batch_size

[GitHub] [arrow] wesm commented on a change in pull request #7311: ARROW-7784: [C++] Improve compilation time of arrow/array/diff.cc and reduce code size

2020-06-01 Thread GitBox
wesm commented on a change in pull request #7311: URL: https://github.com/apache/arrow/pull/7311#discussion_r433349705 ## File path: cpp/src/arrow/array/diff.cc ## @@ -342,106 +311,74 @@ class QuadraticSpaceMyersDiff { {field("insert", boolean()), field("run_length", i

[GitHub] [arrow] wesm edited a comment on pull request #7281: WIP Put more things in type_fwds

2020-06-01 Thread GitBox
wesm edited a comment on pull request #7281: URL: https://github.com/apache/arrow/pull/7281#issuecomment-636918597 I can help with this if needed

[GitHub] [arrow] bkietz commented on pull request #7181: ARROW-8799: [C++][Parquet] NestedListReader needs to handle empty item batches

2020-06-01 Thread GitBox
bkietz commented on pull request #7181: URL: https://github.com/apache/arrow/pull/7181#issuecomment-636937045 @emkornfield @wesm In adding a unit test I've become uncertain of the `ColumnReader` contract and whether my solution upholds it - [ColumnReader::NextBatch's doccomment](htt

[GitHub] [arrow] wesm commented on pull request #7308: ARROW-6978: [R] Add bindings for sum and mean compute kernels

2020-06-01 Thread GitBox
wesm commented on pull request #7308: URL: https://github.com/apache/arrow/pull/7308#issuecomment-636919225 OK, we should introduce a `ScalarAggregateOptions` providing null handling behavior, then

[GitHub] [arrow] bkietz commented on a change in pull request #7311: ARROW-7784: [C++] Improve compilation time of arrow/array/diff.cc and reduce code size

2020-06-01 Thread GitBox
bkietz commented on a change in pull request #7311: URL: https://github.com/apache/arrow/pull/7311#discussion_r433297868 ## File path: cpp/src/arrow/array/diff.cc ## @@ -181,17 +146,37 @@ internal::LazyRange> MakeNullOrViewRange( /// representation is minimal in the common ca

[GitHub] [arrow] wesm commented on pull request #7281: WIP Put more things in type_fwds

2020-06-01 Thread GitBox
wesm commented on pull request #7281: URL: https://github.com/apache/arrow/pull/7281#issuecomment-636918597 I can help with this
