[GitHub] [arrow] arthursunbao commented on issue #10885: Does arrow intends to support IDL schema like protobuf?

2021-08-09 Thread GitBox
arthursunbao commented on issue #10885: URL: https://github.com/apache/arrow/issues/10885#issuecomment-895778664 OK. Thanks. That's all I want to ask. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [arrow-datafusion] Dandandan closed issue #843: Incorrect results for joins on hash collisions

2021-08-09 Thread GitBox
Dandandan closed issue #843: URL: https://github.com/apache/arrow-datafusion/issues/843

[GitHub] [arrow-datafusion] Dandandan merged pull request #845: Fix right, full join handling when having multiple non-matching rows at the left side

2021-08-09 Thread GitBox
Dandandan merged pull request #845: URL: https://github.com/apache/arrow-datafusion/pull/845

[GitHub] [arrow] aocsa commented on a change in pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-09 Thread GitBox
aocsa commented on a change in pull request #10802: URL: https://github.com/apache/arrow/pull/10802#discussion_r685732700 ## File path: cpp/src/arrow/compute/kernels/vector_selection.cc ## @@ -2146,6 +2147,203 @@ class TakeMetaFunction : public MetaFunction { } }; +// ---

[GitHub] [arrow] aocsa edited a comment on pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-09 Thread GitBox
aocsa edited a comment on pull request #10802: URL: https://github.com/apache/arrow/pull/10802#issuecomment-895768466 I updated this PR to address the feedback comments. The main change is that the following test cases were added. - zero-length inputs (to test the early termination code paths)

[GitHub] [arrow] aocsa edited a comment on pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-09 Thread GitBox
aocsa edited a comment on pull request #10802: URL: https://github.com/apache/arrow/pull/10802#issuecomment-895768466 I updated the PR to address feedback. The following test cases were added. - zero-length inputs (to test the early termination code paths) - non-zero but all nul

[GitHub] [arrow] aocsa commented on pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-09 Thread GitBox
aocsa commented on pull request #10802: URL: https://github.com/apache/arrow/pull/10802#issuecomment-895768466 I updated the PR to address feedback. The following test cases were added. - zero-length inputs (to test the early termination code paths) - non-zero but all null value

[GitHub] [arrow-datafusion] houqp commented on pull request #688: run ballista integration test in CI

2021-08-09 Thread GitBox
houqp commented on pull request #688: URL: https://github.com/apache/arrow-datafusion/pull/688#issuecomment-895764931 Converting the PR back to draft mode since I noticed buildx just released a native GitHub Actions backend that we can leverage to keep the layer cache size from growing unbounded. i

[GitHub] [arrow-datafusion] sundy-li commented on issue #846: Improve grouping performance by special casing small / fixed size keys

2021-08-09 Thread GitBox
sundy-li commented on issue #846: URL: https://github.com/apache/arrow-datafusion/issues/846#issuecomment-895763475 > With grouping the values in one value I am wondering whether it's good enough for the hashtable? Or would you hash that again? We don't care about the rehash in hash

[GitHub] [arrow] aocsa commented on a change in pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-09 Thread GitBox
aocsa commented on a change in pull request #10802: URL: https://github.com/apache/arrow/pull/10802#discussion_r684369347 ## File path: cpp/src/arrow/compute/kernels/vector_selection.cc ## @@ -2146,6 +2147,203 @@ class TakeMetaFunction : public MetaFunction { } }; +// ---

[GitHub] [arrow] aocsa commented on a change in pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-09 Thread GitBox
aocsa commented on a change in pull request #10802: URL: https://github.com/apache/arrow/pull/10802#discussion_r685724753 ## File path: cpp/src/arrow/compute/kernels/vector_selection.cc ## @@ -2146,6 +2147,203 @@ class TakeMetaFunction : public MetaFunction { } }; +// ---

[GitHub] [arrow] aocsa commented on a change in pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-09 Thread GitBox
aocsa commented on a change in pull request #10802: URL: https://github.com/apache/arrow/pull/10802#discussion_r684373896 ## File path: cpp/src/arrow/compute/kernels/vector_selection.cc ## @@ -2146,6 +2147,203 @@ class TakeMetaFunction : public MetaFunction { } }; +// ---

[GitHub] [arrow] kou closed pull request #10900: ARROW-13585: [GLib] Add support for C ABI interface

2021-08-09 Thread GitBox
kou closed pull request #10900: URL: https://github.com/apache/arrow/pull/10900

[GitHub] [arrow] kou commented on pull request #10900: ARROW-13585: [GLib] Add support for C ABI interface

2021-08-09 Thread GitBox
kou commented on pull request #10900: URL: https://github.com/apache/arrow/pull/10900#issuecomment-895749575 +1

[GitHub] [arrow-datafusion] Dandandan commented on pull request #844: Add ScalarValue::eq_array optimized comparison function

2021-08-09 Thread GitBox
Dandandan commented on pull request #844: URL: https://github.com/apache/arrow-datafusion/pull/844#issuecomment-895743451 I think it's pretty hard as @alamb mentions to vectorize this part, as it also depends on the hashtable data structure (check collision on insert). I think a fully vect

[GitHub] [arrow-datafusion] Dandandan commented on pull request #844: Add ScalarValue::eq_array optimized comparison function

2021-08-09 Thread GitBox
Dandandan commented on pull request #844: URL: https://github.com/apache/arrow-datafusion/pull/844#issuecomment-895739487 I agree vectorizing that part can be hard. I think it means somehow delaying the collision handling and doing it for the full batch instead. That might require impleme

[GitHub] [arrow-datafusion] Dandandan commented on issue #846: Improve grouping performance by special casing small / fixed size keys

2021-08-09 Thread GitBox
Dandandan commented on issue #846: URL: https://github.com/apache/arrow-datafusion/issues/846#issuecomment-895733077 > If a column is nullable, we can use another byte to store the nullable bits. > > If [u8, u8, u16] are all nullable, u64 key can be used. How do you avoid tha

[GitHub] [arrow] emkornfield commented on pull request #10603: ARROW-13191: [Go] allow external schema in ipc readers

2021-08-09 Thread GitBox
emkornfield commented on pull request #10603: URL: https://github.com/apache/arrow/pull/10603#issuecomment-895723052 @shollyman any thoughts on where you want to take this?

[GitHub] [arrow] emkornfield closed pull request #10600: ARROW-13172: [Java] Make TYPE_WIDTH publicly accessible

2021-08-09 Thread GitBox
emkornfield closed pull request #10600: URL: https://github.com/apache/arrow/pull/10600

[GitHub] [arrow] emkornfield commented on pull request #10600: ARROW-13172: [Java] Make TYPE_WIDTH publicly accessible

2021-08-09 Thread GitBox
emkornfield commented on pull request #10600: URL: https://github.com/apache/arrow/pull/10600#issuecomment-895720293 Sorry for the delay. Looks OK to me.

[GitHub] [arrow] emkornfield commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

2021-08-09 Thread GitBox
emkornfield commented on a change in pull request #10789: URL: https://github.com/apache/arrow/pull/10789#discussion_r685679951 ## File path: java/vector/src/main/java/org/apache/arrow/vector/validate/ValidateVectorTypeVisitor.java ## @@ -114,6 +114,25 @@ private void validate

[GitHub] [arrow] emkornfield commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

2021-08-09 Thread GitBox
emkornfield commented on a change in pull request #10789: URL: https://github.com/apache/arrow/pull/10789#discussion_r685679475 ## File path: java/tools/src/test/java/org/apache/arrow/tools/TestIpcFuzz.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [arrow] emkornfield commented on a change in pull request #10789: ARROW-5926: [Java] Test fuzzer inputs

2021-08-09 Thread GitBox
emkornfield commented on a change in pull request #10789: URL: https://github.com/apache/arrow/pull/10789#discussion_r685679286 ## File path: java/tools/src/test/java/org/apache/arrow/tools/TestIpcFuzz.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [arrow] dongjoon-hyun commented on pull request #10838: ARROW-13506: [C++][Java] Upgrade ORC to 1.6.9

2021-08-09 Thread GitBox
dongjoon-hyun commented on pull request #10838: URL: https://github.com/apache/arrow/pull/10838#issuecomment-895716743 Thank you so much, @emkornfield ! :)

[GitHub] [arrow] emkornfield commented on pull request #10838: ARROW-13506: [C++][Java] Upgrade ORC to 1.6.9

2021-08-09 Thread GitBox
emkornfield commented on pull request #10838: URL: https://github.com/apache/arrow/pull/10838#issuecomment-895714092 Sorry for the delay. The JIRA is now assigned to @dongjoon-hyun; thanks for the update!

[GitHub] [arrow] emkornfield commented on pull request #10864: ARROW-13544 [Java]: Remove APIs that have been deprecated for long

2021-08-09 Thread GitBox
emkornfield commented on pull request #10864: URL: https://github.com/apache/arrow/pull/10864#issuecomment-895707729 @liyafan82 thanks for doing this. I think we should split this work as follows: 1. Changes to ArrowBuf 2. Changes to Vectors 3. Rollback removal of "deprecated" m

[GitHub] [arrow-datafusion] andygrove closed issue #658: Aggregate queries produce different results between runs

2021-08-09 Thread GitBox
andygrove closed issue #658: URL: https://github.com/apache/arrow-datafusion/issues/658

[GitHub] [arrow-datafusion] andygrove commented on issue #658: Aggregate queries produce different results between runs

2021-08-09 Thread GitBox
andygrove commented on issue #658: URL: https://github.com/apache/arrow-datafusion/issues/658#issuecomment-895699718 @Dandandan This does seem to be resolved now. I ran it half a dozen times just now and got consistent results: ``` ++-+

[GitHub] [arrow] lilixiang commented on a change in pull request #10893: ARROW-13577: [Python][FlightRPC] pyarrow client do_put close method after write_table did not throw flight error

2021-08-09 Thread GitBox
lilixiang commented on a change in pull request #10893: URL: https://github.com/apache/arrow/pull/10893#discussion_r685647708 ## File path: python/pyarrow/tests/test_flight.py ## @@ -1545,6 +1573,33 @@ def test_roundtrip_errors(): with pytest.raises(flight.FlightIntern

[GitHub] [arrow] jvictorhuguenin commented on pull request #10425: ARROW-12910: [Gandiva][C++]Add support for ADD and SUBTRACT functions receiving time intervals

2021-08-09 Thread GitBox
jvictorhuguenin commented on pull request #10425: URL: https://github.com/apache/arrow/pull/10425#issuecomment-895676708 @anthonylouisbsb, applied changes

[GitHub] [arrow-cookbook] westonpace merged pull request #14: Adding thisisnic and amol- as collaborators

2021-08-09 Thread GitBox
westonpace merged pull request #14: URL: https://github.com/apache/arrow-cookbook/pull/14

[GitHub] [arrow-cookbook] westonpace opened a new pull request #26: Change gh-pages deploy to a single commit

2021-08-09 Thread GitBox
westonpace opened a new pull request #26: URL: https://github.com/apache/arrow-cookbook/pull/26 Right now the deploy operation checks out the current gh-pages branch, does a `git add *` to update it, and then pushes it. However, this has some downsides, most of which could be worked aroun

[GitHub] [arrow-datafusion] houqp edited a comment on issue #824: A global, shared `ExecutionContext`

2021-08-09 Thread GitBox
houqp edited a comment on issue #824: URL: https://github.com/apache/arrow-datafusion/issues/824#issuecomment-895662485 Based on the discussion so far, I recommend closing this issue since it was originally created for global shared ExecutionContext and I think we have already reached a c

[GitHub] [arrow-datafusion] houqp commented on issue #824: A global, shared `ExecutionContext`

2021-08-09 Thread GitBox
houqp commented on issue #824: URL: https://github.com/apache/arrow-datafusion/issues/824#issuecomment-895662485 Based on the discussion so far, I recommend closing this issue since it was originally created for global shared ExecutionContext and I think we have already reached a consensu

[GitHub] [arrow] lidavidm commented on a change in pull request #10890: ARROW-13575: [C++] Add hash_product kernel

2021-08-09 Thread GitBox
lidavidm commented on a change in pull request #10890: URL: https://github.com/apache/arrow/pull/10890#discussion_r685624528 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -133,6 +134,116 @@ Result> MeanInit(KernelContext* ctx, return visitor.Create();

[GitHub] [arrow-rs] houqp commented on a change in pull request #651: rough draft of time arithmetic

2021-08-09 Thread GitBox
houqp commented on a change in pull request #651: URL: https://github.com/apache/arrow-rs/pull/651#discussion_r685619334 ## File path: arrow/src/compute/kernels/temporal.rs ## @@ -166,8 +168,62 @@ where Ok(b.finish()) } +/// Add the given `time_delta` to each time in th

[GitHub] [arrow] kou commented on a change in pull request #10710: ARROW-11460: [R] Use system libraries if present on Linux

2021-08-09 Thread GitBox
kou commented on a change in pull request #10710: URL: https://github.com/apache/arrow/pull/10710#discussion_r685617384 ## File path: r/configure ## @@ -173,6 +186,11 @@ else BUNDLED_LIBS=`echo "$BUNDLED_LIBS" | sed -e "s/\\.a lib/ -l/g" | sed -e "s/\\.a$//" | sed -e

[GitHub] [arrow-rs] sunchao closed issue #660: Parquet fixed length byte array columns write byte array statistics

2021-08-09 Thread GitBox
sunchao closed issue #660: URL: https://github.com/apache/arrow-rs/issues/660

[GitHub] [arrow-rs] sunchao merged pull request #662: Write FixedLenByteArray stats for FixedLenByteArray columns (not ByteArray stats)

2021-08-09 Thread GitBox
sunchao merged pull request #662: URL: https://github.com/apache/arrow-rs/pull/662

[GitHub] [arrow] jvictorhuguenin commented on a change in pull request #10425: ARROW-12910: [Gandiva][C++]Add support for ADD and SUBTRACT functions receiving time intervals

2021-08-09 Thread GitBox
jvictorhuguenin commented on a change in pull request #10425: URL: https://github.com/apache/arrow/pull/10425#discussion_r685614811 ## File path: cpp/src/gandiva/precompiled/time_test.cc ## @@ -315,6 +316,18 @@ TEST(TestTime, TimeStampAdd) { EXPECT_EQ(add_date64_int64(String

[GitHub] [arrow] jvictorhuguenin commented on a change in pull request #10425: ARROW-12910: [Gandiva][C++]Add support for ADD and SUBTRACT functions receiving time intervals

2021-08-09 Thread GitBox
jvictorhuguenin commented on a change in pull request #10425: URL: https://github.com/apache/arrow/pull/10425#discussion_r685610874 ## File path: cpp/src/gandiva/precompiled/timestamp_arithmetic.cc ## @@ -172,6 +172,57 @@ TIMESTAMP_DIFF(timestamp) return millis + TO_MILLIS

[GitHub] [arrow] kou commented on a change in pull request #10900: ARROW-13585: [GLib] Add support for C ABI interface

2021-08-09 Thread GitBox
kou commented on a change in pull request #10900: URL: https://github.com/apache/arrow/pull/10900#discussion_r685603701 ## File path: c_glib/arrow-glib/basic-array.cpp ## @@ -556,6 +558,81 @@ garrow_array_class_init(GArrowArrayClass *klass) g_object_class_install_property(go

[GitHub] [arrow-datafusion] sundy-li commented on issue #846: Improve grouping performance by special casing small / fixed size keys

2021-08-09 Thread GitBox
sundy-li commented on issue #846: URL: https://github.com/apache/arrow-datafusion/issues/846#issuecomment-895631463 If a column is nullable, we can use another byte to store the null bits. If [u8, u8, u16] are all nullable, a u64 key can be used.
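
The packing idea in this comment can be illustrated with a minimal sketch. It is not DataFusion's implementation: the function name `pack_key`, the `WIDTHS` constant, and the bit layout (values packed low-to-high, one byte of null flags above them) are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Assumed layout (hypothetical, not taken from DataFusion): column values are
# packed low-to-high, then one extra byte of null flags sits above them.
WIDTHS = (8, 8, 16)  # bit widths for a [u8, u8, u16] group key


def pack_key(values: Sequence[Optional[int]], widths: Sequence[int] = WIDTHS) -> int:
    """Pack fixed-width, possibly-null column values into one integer key."""
    if len(values) != len(widths) or len(widths) > 8:
        raise ValueError("expected one value per column and at most 8 columns")
    if sum(widths) + 8 > 64:
        raise ValueError("values plus the null byte must fit in 64 bits")
    key = 0
    shift = 0
    null_bits = 0
    for i, (value, width) in enumerate(zip(values, widths)):
        if value is None:
            null_bits |= 1 << i  # flag the column; its value bits stay zero
        elif 0 <= value < (1 << width):
            key |= value << shift
        else:
            raise ValueError(f"value {value} does not fit in {width} bits")
        shift += width
    return key | (null_bits << shift)  # null byte goes above the packed values


print(hex(pack_key([1, 2, 3])))
print(hex(pack_key([None, 2, 3])))
```

With this layout the three values take 32 bits and the null byte takes 8 more, so the whole key fits in a u64, and a null column yields a different key from a column holding 0 because the null flag is set.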

[GitHub] [arrow-cookbook] lidavidm commented on a change in pull request #22: Initial C++ cookbook

2021-08-09 Thread GitBox
lidavidm commented on a change in pull request #22: URL: https://github.com/apache/arrow-cookbook/pull/22#discussion_r685599066 ## File path: cpp/CONTRIBUTING.md ## @@ -0,0 +1,184 @@ +Bulding the C++ Cookbook + + +The C++ cookbook combines output from a

[GitHub] [arrow-datafusion] houqp commented on pull request #801: Create changelog for datafusion and ballista release

2021-08-09 Thread GitBox
houqp commented on pull request #801: URL: https://github.com/apache/arrow-datafusion/pull/801#issuecomment-895625177 I will wait for https://github.com/apache/arrow-datafusion/pull/845 before merging this in to create a release tarball for voting.

[GitHub] [arrow] edponce commented on a change in pull request #10896: ARROW-12959: [C++] Option for is_null(NaN) to evaluate to true

2021-08-09 Thread GitBox
edponce commented on a change in pull request #10896: URL: https://github.com/apache/arrow/pull/10896#discussion_r685586159 ## File path: cpp/src/arrow/compute/kernels/scalar_validity.cc ## @@ -76,11 +79,32 @@ struct IsInfOperator { struct IsNullOperator { static Status C

[GitHub] [arrow] hengaini2055 edited a comment on issue #10899: Does feather file identify with pyarrow.memory_map(file)?

2021-08-09 Thread GitBox
hengaini2055 edited a comment on issue #10899: URL: https://github.com/apache/arrow/issues/10899#issuecomment-895618846 [library datasets](https://github.com/huggingface/datasets/blob/171f2bba9dd8b92006b13cf076a5bf31d67d3e69/src/datasets/table.py#L42), use ```pa.memory_map(filename)``` to

[GitHub] [arrow] hengaini2055 commented on issue #10899: Does feather file identify with pyarrow.memory_map(file)?

2021-08-09 Thread GitBox
hengaini2055 commented on issue #10899: URL: https://github.com/apache/arrow/issues/10899#issuecomment-895618846 [library datasets](url), use ```pa.memory_map(filename)``` to create a memory mapped pa.table. The file may be a parquet file, a CSV file, or a *.arrow (feather file)? As you sa

[GitHub] [arrow] hengaini2055 commented on issue #10899: Does feather file identify with pyarrow.memory_map(file)?

2021-08-09 Thread GitBox
hengaini2055 commented on issue #10899: URL: https://github.com/apache/arrow/issues/10899#issuecomment-895609491 @lidavidm Thanks! How can I memory-map a Parquet file? I want to gain 'zero copy' from a directory database(pyarrow). In Microsoft Power BI, We must read all dataset to memory a

[GitHub] [arrow-datafusion] NGA-TRAN commented on a change in pull request #808: Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
NGA-TRAN commented on a change in pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#discussion_r685543182 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -779,8 +553,47 @@ impl GroupedHashAggregateStream { } type AccumulatorItem

[GitHub] [arrow-cookbook] westonpace opened a new issue #25: Add linting to C++ cookbook

2021-08-09 Thread GitBox
westonpace opened a new issue #25: URL: https://github.com/apache/arrow-cookbook/issues/25 PRs should have a clang-format check

[GitHub] [arrow-cookbook] westonpace commented on a change in pull request #22: Initial C++ cookbook

2021-08-09 Thread GitBox
westonpace commented on a change in pull request #22: URL: https://github.com/apache/arrow-cookbook/pull/22#discussion_r685544291 ## File path: cpp/code/CMakeLists.txt ## @@ -0,0 +1,47 @@ +cmake_minimum_required(VERSION 3.19) +project(arrow-cookbook) + +set(CMAKE_CXX_STANDARD 1

[GitHub] [arrow-cookbook] westonpace commented on a change in pull request #22: Initial C++ cookbook

2021-08-09 Thread GitBox
westonpace commented on a change in pull request #22: URL: https://github.com/apache/arrow-cookbook/pull/22#discussion_r685543686 ## File path: cpp/CONTRIBUTING.md ## @@ -0,0 +1,184 @@ +Bulding the C++ Cookbook + + +The C++ cookbook combines output from

[GitHub] [arrow] edponce commented on a change in pull request #10896: ARROW-12959: [C++] Option for is_null(NaN) to evaluate to true

2021-08-09 Thread GitBox
edponce commented on a change in pull request #10896: URL: https://github.com/apache/arrow/pull/10896#discussion_r685534294 ## File path: cpp/src/arrow/compute/kernels/scalar_validity.cc ## @@ -76,7 +79,26 @@ struct IsInfOperator { struct IsNullOperator { static Status Ca

[GitHub] [arrow] cpcloud commented on a change in pull request #10856: [RFC] Arrow Compute Serialized Intermediate Representation draft for discussion

2021-08-09 Thread GitBox
cpcloud commented on a change in pull request #10856: URL: https://github.com/apache/arrow/pull/10856#discussion_r685523322 ## File path: format/ComputeIR.fbs ## @@ -0,0 +1,521 @@ +/// Licensed to the Apache Software Foundation (ASF) under one +/// or more contributor license a

[GitHub] [arrow] cpcloud commented on a change in pull request #10856: [RFC] Arrow Compute Serialized Intermediate Representation draft for discussion

2021-08-09 Thread GitBox
cpcloud commented on a change in pull request #10856: URL: https://github.com/apache/arrow/pull/10856#discussion_r685522985 ## File path: format/ComputeIR.fbs ## @@ -0,0 +1,510 @@ +/// Licensed to the Apache Software Foundation (ASF) under one +/// or more contributor license a

[GitHub] [arrow] lorenzwalthert commented on pull request #10879: ARROW-13562: [R] Styler followups

2021-08-09 Thread GitBox
lorenzwalthert commented on pull request #10879: URL: https://github.com/apache/arrow/pull/10879#issuecomment-895542436 Thanks @nealrichardson. I referenced your comment in an open issue (https://github.com/REditorSupport/languageserver/issues/462), I hope this gets resolved. Regarding tha

[GitHub] [arrow] westonpace closed pull request #10729: ARROW-12513: [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls

2021-08-09 Thread GitBox
westonpace closed pull request #10729: URL: https://github.com/apache/arrow/pull/10729

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-09 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-895502332 Revision: c84b440e48a9f91513531bfab3ac2f1d8fcd6630 Submitted crossbow builds: [ursacomputing/crossbow @ actions-749](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] nealrichardson commented on pull request #10889: [WIP] Try LTO again

2021-08-09 Thread GitBox
nealrichardson commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-895501771 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest

[GitHub] [arrow-datafusion] alamb edited a comment on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
alamb edited a comment on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895497999 https://github.com/apache/arrow-datafusion/pull/808 is now ready for review by a wider group (no pun intended)

[GitHub] [arrow-datafusion] alamb commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
alamb commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895497999 https://github.com/apache/arrow-datafusion/pull/808 is now ready for review by a wider group

[GitHub] [arrow] pitrou commented on pull request #10877: ARROW-13508: [C++] Support custom retry strategies in S3Options

2021-08-09 Thread GitBox
pitrou commented on pull request #10877: URL: https://github.com/apache/arrow/pull/10877#issuecomment-895495304 Thank you for contributing @neil-b !

[GitHub] [arrow] emkornfield commented on pull request #10729: ARROW-12513: [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls

2021-08-09 Thread GitBox
emkornfield commented on pull request #10729: URL: https://github.com/apache/arrow/pull/10729#issuecomment-895494683 Forgot to comment: these changes look fine. Thanks for tracing down the paths. I don't know why stats for datapagev2 would have been disabled.

[GitHub] [arrow] pitrou closed pull request #10877: ARROW-13508: [C++] Support custom retry strategies in S3Options

2021-08-09 Thread GitBox
pitrou closed pull request #10877: URL: https://github.com/apache/arrow/pull/10877

[GitHub] [arrow] westonpace commented on pull request #10729: ARROW-12513: [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls

2021-08-09 Thread GitBox
westonpace commented on pull request #10729: URL: https://github.com/apache/arrow/pull/10729#issuecomment-895487924 Forgot about this. Rebasing and then merging on green.

[GitHub] [arrow] westonpace commented on issue #10885: Does arrow intends to support IDL schema like protobuf?

2021-08-09 Thread GitBox
westonpace commented on issue #10885: URL: https://github.com/apache/arrow/issues/10885#issuecomment-895485912 > Thanks, so you mean the IPC feather file format is the output of ArvoStreamWriter, which is a binary file, but just with no data in it right? I'm sorry but I don't know ho

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-09 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-895485857 Revision: 039e68ce39788af3eab7683c729f0869cc2f388a Submitted crossbow builds: [ursacomputing/crossbow @ actions-748](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] nealrichardson commented on pull request #10889: [WIP] Try LTO again

2021-08-09 Thread GitBox
nealrichardson commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-895485382 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest

[GitHub] [arrow] jonkeane commented on pull request #10898: ARROW-13345: [C++] Added basic implementation for log to base b

2021-08-09 Thread GitBox
jonkeane commented on pull request #10898: URL: https://github.com/apache/arrow/pull/10898#issuecomment-895477736 That seems pretty uncommon to me. I could contrive a few examples that aren't totally outlandish (say you're scaling/normalizing a value by groups, you might save the log base

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-09 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-895466709 Revision: 2f70cc4c82d9124118be151461b90fcf18f45ad2 Submitted crossbow builds: [ursacomputing/crossbow @ actions-747](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] nealrichardson commented on pull request #10889: [WIP] Try LTO again

2021-08-09 Thread GitBox
nealrichardson commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-895466005 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest

[GitHub] [arrow] bkietz commented on a change in pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-09 Thread GitBox
bkietz commented on a change in pull request #10802: URL: https://github.com/apache/arrow/pull/10802#discussion_r685444859 ## File path: cpp/src/arrow/compute/kernels/vector_selection_test.cc ## @@ -1734,5 +1734,372 @@ TEST(TestTake, RandomFixedSizeBinary) { TakeRandomTest::

[GitHub] [arrow] pitrou closed pull request #10871: ARROW-10373: [C++] Validate null_count in Array::ValidateFull()

2021-08-09 Thread GitBox
pitrou closed pull request #10871: URL: https://github.com/apache/arrow/pull/10871

[GitHub] [arrow] westonpace edited a comment on pull request #10897: ARROW-13580: [C++] quoted_strings_can_be_null only applied to string columns

2021-08-09 Thread GitBox
westonpace edited a comment on pull request #10897: URL: https://github.com/apache/arrow/pull/10897#issuecomment-895450555 Ah, if this is a behavior change then I can update that docstring as well. I'll add, in support of my argument, that we parse quoted non-nulls as integers. For examp

[GitHub] [arrow] westonpace commented on pull request #10897: ARROW-13580: [C++] quoted_strings_can_be_null only applied to string columns

2021-08-09 Thread GitBox
westonpace commented on pull request #10897: URL: https://github.com/apache/arrow/pull/10897#issuecomment-895450555 Ah, if this is a behavior change then I can update that docstring as well. I'll add, in support of my argument, that we parsed quoted non-nulls as integers. For example:

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-09 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-895445042 Revision: f5c007c2b6b6856e29c3f162d77e5173544cebb2 Submitted crossbow builds: [ursacomputing/crossbow @ actions-746](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] nealrichardson commented on pull request #10889: [WIP] Try LTO again

2021-08-09 Thread GitBox
nealrichardson commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-895444350 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest

[GitHub] [arrow-datafusion] Dandandan edited a comment on pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
Dandandan edited a comment on pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#issuecomment-895425959 On the db-benchmark aggregation queries: PR: ``` q1 took 33 ms q2 took 377 ms q3 took 986 ms q4 took 47 ms q5 took 973 ms q7 took 932 m

[GitHub] [arrow-datafusion] Dandandan commented on pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
Dandandan commented on pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#issuecomment-895425959 On the db-benchmark aggregation queries: PR: ``` q1 took 33 ms q2 took 377 ms q3 took 986 ms q4 took 47 ms q5 took 973 ms q7 took 932 ms q1

[GitHub] [arrow] carlosmalt commented on a change in pull request #10431: ARROW-12921: [C++][Dataset] Add RadosParquetFileFormat to Dataset API

2021-08-09 Thread GitBox
carlosmalt commented on a change in pull request #10431: URL: https://github.com/apache/arrow/pull/10431#discussion_r685406286 ## File path: cpp/src/arrow/dataset/file_skyhook.h ## @@ -0,0 +1,275 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

[GitHub] [arrow-datafusion] Dandandan commented on issue #846: Improve grouping performance by special casing small / fixed size keys

2021-08-09 Thread GitBox
Dandandan commented on issue #846: URL: https://github.com/apache/arrow-datafusion/issues/846#issuecomment-895422907 For the direct indexing idea, there is some more context here for the hash join https://github.com/apache/arrow-datafusion/issues/816 where a similar approach could be used

[GitHub] [arrow] nealrichardson closed pull request #10894: ARROW-13587: [R] Handle --use-LTO override

2021-08-09 Thread GitBox
nealrichardson closed pull request #10894: URL: https://github.com/apache/arrow/pull/10894

[GitHub] [arrow] kkraus14 commented on pull request #10897: ARROW-13580: [C++] quoted_strings_can_be_null only applied to string columns

2021-08-09 Thread GitBox
kkraus14 commented on pull request #10897: URL: https://github.com/apache/arrow/pull/10897#issuecomment-895417384 Agreed that it's beneficial to change the behavior and docstring to allow for treating empty quoted strings as null in the case of numeric columns. Bigger picture, CSVs a

[GitHub] [arrow] pitrou commented on pull request #10898: ARROW-13345: [C++] Added basic implementation for log to base b

2021-08-09 Thread GitBox
pitrou commented on pull request #10898: URL: https://github.com/apache/arrow/pull/10898#issuecomment-895414211 cc @ianmcook @jonkeane for opinions about the log base question.

[GitHub] [arrow] rommelDB commented on pull request #10898: ARROW-13345: [C++] Added basic implementation for log to base b

2021-08-09 Thread GitBox
rommelDB commented on pull request #10898: URL: https://github.com/apache/arrow/pull/10898#issuecomment-895413500 > Is it useful for the log base to be given as a `Datum` rather than a function option? Are there use cases where one wants to look up the log base in a column? @pitrou T

[GitHub] [arrow-datafusion] alamb commented on pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
alamb commented on pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#issuecomment-895409906 I am basically done with this PR. All that remains in my mind is to run some benchmarks and I'll mark it as ready for review

[GitHub] [arrow] pitrou commented on a change in pull request #10871: ARROW-10373: [C++] Validate null_count in Array::ValidateFull()

2021-08-09 Thread GitBox
pitrou commented on a change in pull request #10871: URL: https://github.com/apache/arrow/pull/10871#discussion_r685391213 ## File path: cpp/src/arrow/array/validate.cc ## @@ -637,6 +638,23 @@ struct ValidateArrayFullImpl { ARROW_EXPORT Status ValidateArrayFull(const ArrayD

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #840: [ballista] support date_part and date_turnc ser/de, pass tpch 7

2021-08-09 Thread GitBox
Dandandan commented on a change in pull request #840: URL: https://github.com/apache/arrow-datafusion/pull/840#discussion_r685390622 ## File path: ballista/rust/core/proto/ballista.proto ## @@ -144,18 +144,19 @@ enum ScalarFunction { TOTIMESTAMP = 24; ARRAY = 25; NULLI

[GitHub] [arrow-datafusion] Dandandan commented on pull request #840: [ballista] support date_part and date_turnc ser/de, pass tpch 7

2021-08-09 Thread GitBox
Dandandan commented on pull request #840: URL: https://github.com/apache/arrow-datafusion/pull/840#issuecomment-895408746 Thanks @houqp

[GitHub] [arrow-datafusion] alamb commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
alamb commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895408719 @sundy-li the idea of special casing fixed length types into fixed length keys is a great idea, FWIW. I think we would probably get non trivial performance speedup for those p

[GitHub] [arrow-datafusion] Dandandan merged pull request #840: [ballista] support date_part and date_turnc ser/de, pass tpch 7

2021-08-09 Thread GitBox
Dandandan merged pull request #840: URL: https://github.com/apache/arrow-datafusion/pull/840

[GitHub] [arrow-datafusion] alamb opened a new issue #846: Improve grouping performance by special casing small / fixed size keys

2021-08-09 Thread GitBox
alamb opened a new issue #846: URL: https://github.com/apache/arrow-datafusion/issues/846 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The improved grouping algorithm on #790 improves grouping performance in general for DataFu

[GitHub] [arrow-datafusion] Dandandan commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
Dandandan commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895407611 > FWIW I would expect `ScalarValue::iter_to_array` to show up in profiles only for queries that had large numbers of groups where the time spent creating the output was a s

[GitHub] [arrow-datafusion] Dandandan commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-09 Thread GitBox
Dandandan commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-895407042 @sundy-li Yes, I think for some types the hashing method might be further specialized to speed up the hashing or to reduce the amount of memory needed for the hash v

[GitHub] [arrow] pitrou closed pull request #10886: ARROW-5244: [C++] Remove experimental marker from some APIs

2021-08-09 Thread GitBox
pitrou closed pull request #10886: URL: https://github.com/apache/arrow/pull/10886

[GitHub] [arrow] lidavidm commented on a change in pull request #10890: ARROW-13575: [C++] Add hash_product kernel

2021-08-09 Thread GitBox
lidavidm commented on a change in pull request #10890: URL: https://github.com/apache/arrow/pull/10890#discussion_r685385813 ## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -189,6 +189,53 @@ TEST(TestBooleanAggregation, Sum) { ResultWith(Datum

[GitHub] [arrow] lidavidm commented on a change in pull request #10890: ARROW-13575: [C++] Add hash_product kernel

2021-08-09 Thread GitBox
lidavidm commented on a change in pull request #10890: URL: https://github.com/apache/arrow/pull/10890#discussion_r685385480 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -133,6 +134,116 @@ Result> MeanInit(KernelContext* ctx, return visitor.Create();
