[GitHub] [arrow-rs] tustvold commented on issue #1882: Remove `indexmap` dependency

2022-06-17 Thread GitBox
tustvold commented on issue #1882: URL: https://github.com/apache/arrow-rs/issues/1882#issuecomment-1159378665 A new release has been cut which updates some dependencies, including hashbrown https://github.com/bluss/indexmap/pull/231 -- This is an automated message from the Apache Git Ser

[GitHub] [arrow-rs] tustvold commented on issue #1886: how read/write REPEATED

2022-06-17 Thread GitBox
tustvold commented on issue #1886: URL: https://github.com/apache/arrow-rs/issues/1886#issuecomment-1159371207 Hi, I'm not very familiar with parquet-mr which your example appears to be based on, nor am I hugely knowledgeable about the record APIs for reading parquet, but I'll try to help o

[GitHub] [arrow-rs] tustvold commented on pull request #1890: Add validation to `RecordBatch` for non-nullable fields containing null values

2022-06-17 Thread GitBox
tustvold commented on PR #1890: URL: https://github.com/apache/arrow-rs/pull/1890#issuecomment-1159369988 I think a rebase should clear the test failures related to IPC dictionaries

[GitHub] [arrow] save-buffer commented on pull request #13364: ARROW-16756: [C++] Introduce non-owning ArraySpan, ExecSpan data structures and refactor ScalarKernels to use them

2022-06-17 Thread GitBox
save-buffer commented on PR #13364: URL: https://github.com/apache/arrow/pull/13364#issuecomment-1159369450 Yes, that was my experience as well. When looking with Apple's TimeProfiler I saw a ton of 0.1% to 1% stack traces. The execution really is spread out such that it's hard to pinpoint

[GitHub] [arrow-rs] tustvold commented on pull request #1900: Use bit_slice in combine_option_bitmap

2022-06-17 Thread GitBox
tustvold commented on PR #1900: URL: https://github.com/apache/arrow-rs/pull/1900#issuecomment-1159369179 Integration test failures seem unrelated, so getting this one in.

[GitHub] [arrow-rs] tustvold merged pull request #1900: Use bit_slice in combine_option_bitmap

2022-06-17 Thread GitBox
tustvold merged PR #1900: URL: https://github.com/apache/arrow-rs/pull/1900

[GitHub] [arrow-rs] tustvold closed issue #1899: Final slicing in combine_option_bitmap needs to use bit slices

2022-06-17 Thread GitBox
tustvold closed issue #1899: Final slicing in combine_option_bitmap needs to use bit slices URL: https://github.com/apache/arrow-rs/issues/1899
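
The distinction behind #1899 and #1900 (bit slices versus byte slices) can be sketched in plain Rust. This is only an illustration under stated assumptions, not the arrow-rs implementation: `get_bit` and `bit_slice` below are hypothetical helpers written for this note, and the only fact taken from Arrow is that validity bitmaps number bits from the least significant bit of each byte. When a slice starts at a bit offset that is not a multiple of 8, slicing whole bytes keeps the wrong bits, so the bits have to be shifted into a fresh buffer.

```rust
/// Read bit `i` from a bitmap that numbers bits from the least significant bit.
/// Hypothetical helper for illustration only.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    bytes[i / 8] & (1u8 << (i % 8)) != 0
}

/// Copy `len` bits starting at bit `offset` into a new bitmap whose first
/// bit sits at position 0. Hypothetical helper, not the arrow-rs API.
fn bit_slice(bytes: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(bytes, offset + i) {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

fn main() {
    let bitmap = [0b1010_1100u8, 0b0000_0011];
    // Take 7 bits starting at bit 3 (not byte-aligned).
    let sliced = bit_slice(&bitmap, 3, 7);
    println!("{:08b}", sliced[0]);
    // A byte-level slice starting at byte 3 / 8 = 0 would instead begin
    // with the unshifted first byte, which holds different bits.
    println!("{:08b}", bitmap[0]);
}
```

A bit-by-bit copy is the simplest correct form; a production version would typically shift whole words at a time.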

[GitHub] [arrow-rs] tustvold merged pull request #1893: Correct nullable in read_dictionary

2022-06-17 Thread GitBox
tustvold merged PR #1893: URL: https://github.com/apache/arrow-rs/pull/1893

[GitHub] [arrow-rs] tustvold closed issue #1892: Dictionary IPC writer appears to write incorrect schema

2022-06-17 Thread GitBox
tustvold closed issue #1892: Dictionary IPC writer appears to write incorrect schema URL: https://github.com/apache/arrow-rs/issues/1892

[GitHub] [arrow-rs] nevi-me commented on issue #1642: MapArray Requires Values Array

2022-06-17 Thread GitBox
nevi-me commented on issue #1642: URL: https://github.com/apache/arrow-rs/issues/1642#issuecomment-1159366876 @tustvold here's Arrow's spec: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L103-L131 ```rust /// A Map is a logical nested type that is represented as

[GitHub] [arrow-rs] tustvold commented on issue #1642: MapArray Requires Values Array

2022-06-17 Thread GitBox
tustvold commented on issue #1642: URL: https://github.com/apache/arrow-rs/issues/1642#issuecomment-1159362939 That is my understanding, but some spelunking in the C++ or Java implementations may be warranted to confirm how they choose to handle it

[GitHub] [arrow-rs] tustvold commented on issue #1699: MapArrayReader Does Not Understand Nesting

2022-06-17 Thread GitBox
tustvold commented on issue #1699: URL: https://github.com/apache/arrow-rs/issues/1699#issuecomment-1159362165 Hi @frolovdev. I'm afraid I'm away from a computer for the next few days, but taking a cursory look, lines like ``` let entry_len = rep_levels.iter().filter(|level

[GitHub] [arrow] liukun4515 commented on issue #13391: There is no test case using the `2.0.0-compression` test file

2022-06-17 Thread GitBox
liukun4515 commented on issue #13391: URL: https://github.com/apache/arrow/issues/13391#issuecomment-1159354832 > They're tested by the integration suite, e.g. see the output of this run: https://github.com/apache/arrow/runs/6927219399?check_suite_focus=true#step:7:11104 thank you and

[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

2022-06-17 Thread GitBox
marsupialtail commented on code in PR #13385: URL: https://github.com/apache/arrow/pull/13385#discussion_r900691273 ## cpp/src/arrow/filesystem/s3fs.h: ## @@ -102,6 +102,8 @@ struct ARROW_EXPORT S3Options { /// the region (environment variables, configuration profile, EC2 met

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2741: improve: supports user-defined `scale_factor` for dbgen

2022-06-17 Thread GitBox
codecov-commenter commented on PR #2741: URL: https://github.com/apache/arrow-datafusion/pull/2741#issuecomment-1159351402 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2741?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow] westonpace merged pull request #13252: ARROW-16677: [C++] Support nesting of function registries

2022-06-17 Thread GitBox
westonpace merged PR #13252: URL: https://github.com/apache/arrow/pull/13252

[GitHub] [arrow] westonpace commented on pull request #13364: ARROW-16756: [C++] Introduce non-owning ArraySpan, ExecSpan data structures and refactor ScalarKernels to use them

2022-06-17 Thread GitBox
westonpace commented on PR #13364: URL: https://github.com/apache/arrow/pull/13364#issuecomment-1159348973 I spent some time playing around with the `ExecuteScalarExpressionOverhead/complex_expression` benchmark today to see if this PR made much change there. I didn't see too much change (

[GitHub] [arrow-datafusion] xudong963 merged pull request #2745: MINOR: Improve unsupported data type error message

2022-06-17 Thread GitBox
xudong963 merged PR #2745: URL: https://github.com/apache/arrow-datafusion/pull/2745

[GitHub] [arrow-datafusion] xudong963 commented on a diff in pull request #2741: improve: supports user-defined `scale_factor` for dbgen

2022-06-17 Thread GitBox
xudong963 commented on code in PR #2741: URL: https://github.com/apache/arrow-datafusion/pull/2741#discussion_r900687864 ## benchmarks/tpch-gen.sh: ## @@ -21,7 +21,12 @@ pushd .. . ./dev/build-set-env.sh popd -docker build -t datafusion-tpchgen:$DATAFUSION_VERSION -f tpchgen.

[GitHub] [arrow-datafusion] xudong963 commented on a diff in pull request #2741: improve: supports user-defined `scale_factor` for dbgen

2022-06-17 Thread GitBox
xudong963 commented on code in PR #2741: URL: https://github.com/apache/arrow-datafusion/pull/2741#discussion_r900635837 ## benchmarks/tpch-gen.sh: ## @@ -21,7 +21,12 @@ pushd .. . ./dev/build-set-env.sh popd -docker build -t datafusion-tpchgen:$DATAFUSION_VERSION -f tpchgen.

[GitHub] [arrow] wesm commented on a diff in pull request #13398: ARROW-16824: [C++] Migrate VectorKernels to use ExecSpan, split out ChunkedArray execution

2022-06-17 Thread GitBox
wesm commented on code in PR #13398: URL: https://github.com/apache/arrow/pull/13398#discussion_r900630977 ## cpp/src/arrow/compute/row/grouper.cc: ## @@ -119,7 +125,13 @@ struct GrouperImpl : Grouper { } for (int i = 0; i < batch.num_values(); ++i) { - RETURN_N

[GitHub] [arrow] wesm opened a new pull request, #13398: ARROW-16824: [C++] Migrate VectorKernels to use ExecSpan, split out ChunkedArray execution

2022-06-17 Thread GitBox
wesm opened a new pull request, #13398: URL: https://github.com/apache/arrow/pull/13398 This is mostly mechanical refactoring. Since many VectorKernels support being passed in a ChunkedArray, I separated the `ExecSpan` code path (which does not support chunked arrays) from a separate `Vecto

[GitHub] [arrow] github-actions[bot] commented on pull request #13398: ARROW-16824: [C++] Migrate VectorKernels to use ExecSpan, split out ChunkedArray execution

2022-06-17 Thread GitBox
github-actions[bot] commented on PR #13398: URL: https://github.com/apache/arrow/pull/13398#issuecomment-1159325954 https://issues.apache.org/jira/browse/ARROW-16824

[GitHub] [arrow] wesm commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
wesm commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159285230 Yes, definitely. I'm just referring to the _implementation_ of e.g. `f(scalar) -> scalar` or `g(scalar, scalar) -> scalar` — the ability to perform these operations will remain but the impl

[GitHub] [arrow] westonpace commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
westonpace commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159283611 I think I see your point. Is the code path for `array x scalar` remaining (e.g. `add(field_ref("x"), 7)`)?

[GitHub] [arrow] dongjoon-hyun commented on pull request #13392: ARROW-16848: [C++][Java] Update ORC to 1.7.5

2022-06-17 Thread GitBox
dongjoon-hyun commented on PR #13392: URL: https://github.com/apache/arrow/pull/13392#issuecomment-1159281615 Apache sites are recovered. Could you rebase this PR once more, @williamhyun ?

[GitHub] [arrow-rs] frolovdev opened a new pull request, #1904: not panic

2022-06-17 Thread GitBox
frolovdev opened a new pull request, #1904: URL: https://github.com/apache/arrow-rs/pull/1904 # Which issue does this PR close? Closes #1390

[GitHub] [arrow] rtpsw commented on pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-17 Thread GitBox
rtpsw commented on PR #13375: URL: https://github.com/apache/arrow/pull/13375#issuecomment-1159218051 > I skipped over the changes to nested function registries since I already reviewed those (I think) in #13252 . I think there are a few additions here, so I'll try to rebase to make t

[GitHub] [arrow] rtpsw commented on pull request #13252: ARROW-16677: [C++] Support nesting of function registries

2022-06-17 Thread GitBox
rtpsw commented on PR #13252: URL: https://github.com/apache/arrow/pull/13252#issuecomment-1159215617 This can get pushed and I'll handle the merge.

[GitHub] [arrow] github-actions[bot] commented on pull request #13397: ARROW-16444: [R] Implement user-defined scalar functions in R bindings

2022-06-17 Thread GitBox
github-actions[bot] commented on PR #13397: URL: https://github.com/apache/arrow/pull/13397#issuecomment-1159193573 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #13397: ARROW-16444: [R] Implement user-defined scalar functions in R bindings

2022-06-17 Thread GitBox
github-actions[bot] commented on PR #13397: URL: https://github.com/apache/arrow/pull/13397#issuecomment-1159193559 https://issues.apache.org/jira/browse/ARROW-16444

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2746: Support dates in hash join

2022-06-17 Thread GitBox
codecov-commenter commented on PR #2746: URL: https://github.com/apache/arrow-datafusion/pull/2746#issuecomment-1159190627 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2746?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow-rs] alamb commented on pull request #1879: Split up arrow::array::builder module (#1843)

2022-06-17 Thread GitBox
alamb commented on PR #1879: URL: https://github.com/apache/arrow-rs/pull/1879#issuecomment-1159185058 Thanks again @DaltonModlin

[GitHub] [arrow-rs] alamb merged pull request #1879: Split up arrow::array::builder module (#1843)

2022-06-17 Thread GitBox
alamb merged PR #1879: URL: https://github.com/apache/arrow-rs/pull/1879

[GitHub] [arrow-rs] alamb closed issue #1843: Split up arrow::array::builder module

2022-06-17 Thread GitBox
alamb closed issue #1843: Split up arrow::array::builder module URL: https://github.com/apache/arrow-rs/issues/1843

[GitHub] [arrow] lidavidm commented on pull request #12868: ARROW-15130: [Docs] Add glossary

2022-06-17 Thread GitBox
lidavidm commented on PR #12868: URL: https://github.com/apache/arrow/pull/12868#issuecomment-1159179680 Any more comments here? I think we can keep iterating on the documentation

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2746: Support dates hash join

2022-06-17 Thread GitBox
andygrove opened a new pull request, #2746: URL: https://github.com/apache/arrow-datafusion/pull/2746 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/2744 # Rationale for this change I want to run a DISTINCT query wi

[GitHub] [arrow] westonpace commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-17 Thread GitBox
westonpace commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r900418662 ## cpp/src/arrow/engine/substrait/relation_internal.cc: ## @@ -116,7 +119,7 @@ Result FromProto(const substrait::Rel& rel, } else { return Status

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2745: MINOR: Improve unsupported data type error message

2022-06-17 Thread GitBox
andygrove opened a new pull request, #2745: URL: https://github.com/apache/arrow-datafusion/pull/2745 # Which issue does this PR close? N/A # Rationale for this change If I get an error telling me a data type is not supported then I would like to know which

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2744: Unsupported data type in hasher: Date32

2022-06-17 Thread GitBox
andygrove opened a new issue, #2744: URL: https://github.com/apache/arrow-datafusion/issues/2744 **Describe the bug** I am trying to run a DISTINCT query with a Date32 column **To Reproduce** Run a DISTINCT query with a Date32 column **Expected behavior** It should work

[GitHub] [arrow-rs] frolovdev commented on issue #1699: MapArrayReader Does Not Understand Nesting

2022-06-17 Thread GitBox
frolovdev commented on issue #1699: URL: https://github.com/apache/arrow-rs/issues/1699#issuecomment-1159147393 @tustvold maybe I got your idea wrong, but I can't reproduce it in any way ``` message table { repeated group my_group_map { REQUIRED BYT

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1879: Split up arrow::array::builder module (#1843)

2022-06-17 Thread GitBox
codecov-commenter commented on PR #1879: URL: https://github.com/apache/arrow-rs/pull/1879#issuecomment-1159144531 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1879?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow] wesm commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
wesm commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159136158 > An ExecBatch with all scalars does not necessarily have a length of 1. I think we're talking about different things — many of the ScalarKernel implementations have two implementation

[GitHub] [arrow-rs] alamb closed issue #1902: `dynamic_types` example does not print the projection

2022-06-17 Thread GitBox
alamb closed issue #1902: `dynamic_types` example does not print the projection URL: https://github.com/apache/arrow-rs/issues/1902

[GitHub] [arrow-rs] alamb merged pull request #1903: Closes #1902: Print the original and projected RecordBatch in dynamic_types example

2022-06-17 Thread GitBox
alamb merged PR #1903: URL: https://github.com/apache/arrow-rs/pull/1903

[GitHub] [arrow-rs] alamb commented on pull request #1903: Closes #1902: Print the original and projected RecordBatch in dynamic_types example

2022-06-17 Thread GitBox
alamb commented on PR #1903: URL: https://github.com/apache/arrow-rs/pull/1903#issuecomment-1159130878 Thanks @martin-g and @paddyhoran !

[GitHub] [arrow] westonpace commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
westonpace commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159118738 > In the Acero execution engine, vector functions won't be supported in expressions aside from window functions (I guess). I'm inclined to simply disable the scalar input path here sin

[GitHub] [arrow] westonpace commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
westonpace commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159113163 > In [ARROW-16577](https://issues.apache.org/jira/browse/ARROW-16577) (which I'm going to tackle within the next week hopefully), I'm going to remove the all-scalar input path from all

[GitHub] [arrow] wesm commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
wesm commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159111651 > I suppose, if the kernel functions are going to keep having scalars without length then the correct thing to do would be to always output either a scalar or an array of length 1. In

[GitHub] [arrow] wesm commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
wesm commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159110677 In the Acero execution engine, vector functions won't be supported in expressions aside from window functions (I guess). I'm inclined to simply disable the scalar input path here since it is

[GitHub] [arrow] westonpace commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
westonpace commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159103757 I suppose, if the kernel functions are going to keep having scalars without length then the correct thing to do would be to always output either a scalar or an array of length 1.

[GitHub] [arrow] westonpace commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
westonpace commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159097426 In the exec plan / engine (e.g. everywhere using ExecBatch), scalar columns have length (part of the reason I'm a fan of treating scalars everywhere as RLE encoded arrays is to easily

[GitHub] [arrow] arjunsr1 opened a new issue, #13396: Is the Arrow::Table.merge function in a working state?

2022-06-17 Thread GitBox
arjunsr1 opened a new issue, #13396: URL: https://github.com/apache/arrow/issues/13396 I'm trying to use the merge function in table.rb (Line 358) and it's not giving me the functionality I am expecting. From what I interpreted, the function should take the table that is passed in as a para

[GitHub] [arrow-rs] viirya merged pull request #1894: Minor: Add examples to docstring for `weekday`

2022-06-17 Thread GitBox
viirya merged PR #1894: URL: https://github.com/apache/arrow-rs/pull/1894

[GitHub] [arrow] willbowditch commented on issue #12653: Conversion from one dataset to another that will not fit in memory?

2022-06-17 Thread GitBox
willbowditch commented on issue #12653: URL: https://github.com/apache/arrow/issues/12653#issuecomment-1159030345 Finding the same thing in `pyarrow 8.0.0` converting from a CSV to Parquet - I've tried various batch sizes on the scanner and various min/max rows/groups on the writer.

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1893: Correct nullable in read_dictionary

2022-06-17 Thread GitBox
viirya commented on code in PR #1893: URL: https://github.com/apache/arrow-rs/pull/1893#discussion_r900294799 ## arrow/src/ipc/reader.rs: ## @@ -702,7 +702,11 @@ pub fn read_dictionary( DataType::Dictionary(_, ref value_type) => { // Make a fake schema for

[GitHub] [arrow] wesm commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
wesm commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159026620 (it's also unclear to me that supporting scalar inputs to this function is useful, but that's a separate question)

[GitHub] [arrow] wesm commented on pull request #12460: ARROW-13530: [C++] Implement cumulative sum compute function

2022-06-17 Thread GitBox
wesm commented on PR #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1159026036 I'm refactoring all the vector kernels and I observed that the behavior of this kernel is inconsistent between scalar and array inputs: A scalar null returns an array of length with 0

[GitHub] [arrow] westonpace commented on a diff in pull request #13390: ARROW-16424: [C++] Update uri_path parsing in FromProto

2022-06-17 Thread GitBox
westonpace commented on code in PR #13390: URL: https://github.com/apache/arrow/pull/13390#discussion_r900283284 ## cpp/src/arrow/engine/substrait/relation_internal.cc: ## @@ -106,25 +106,37 @@ Result FromProto(const substrait::Rel& rel, path = item.uri_path_glob();

[GitHub] [arrow] westonpace commented on a diff in pull request #13344: ARROW-16686: [C++] Use shared_ptr with FunctionOptions

2022-06-17 Thread GitBox
westonpace commented on code in PR #13344: URL: https://github.com/apache/arrow/pull/13344#discussion_r900281049 ## python/pyarrow/_compute.pyx: ## @@ -2067,16 +2067,17 @@ def _group_by(args, keys, aggregations): vector[CAggregate] c_aggregations CDatum result

[GitHub] [arrow] lidavidm commented on a diff in pull request #13390: ARROW-16424: [C++] Update uri_path parsing in FromProto

2022-06-17 Thread GitBox
lidavidm commented on code in PR #13390: URL: https://github.com/apache/arrow/pull/13390#discussion_r900280168 ## cpp/src/arrow/engine/substrait/relation_internal.cc: ## @@ -106,25 +106,37 @@ Result FromProto(const substrait::Rel& rel, path = item.uri_path_glob();

[GitHub] [arrow] westonpace commented on a diff in pull request #13390: ARROW-16424: [C++] Update uri_path parsing in FromProto

2022-06-17 Thread GitBox
westonpace commented on code in PR #13390: URL: https://github.com/apache/arrow/pull/13390#discussion_r900279339 ## cpp/src/arrow/engine/substrait/relation_internal.cc: ## @@ -106,25 +106,37 @@ Result FromProto(const substrait::Rel& rel, path = item.uri_path_glob();

[GitHub] [arrow] marsupialtail commented on a diff in pull request #13385: ARROW-16521 [C++][R] Configure curl timeout policy for S3

2022-06-17 Thread GitBox
marsupialtail commented on code in PR #13385: URL: https://github.com/apache/arrow/pull/13385#discussion_r900266915 ## cpp/src/arrow/filesystem/s3fs.h: ## @@ -102,6 +102,8 @@ struct ARROW_EXPORT S3Options { /// the region (environment variables, configuration profile, EC2 met

[GitHub] [arrow] drin commented on pull request #13383: ARROW-16769: [C++] Add Warn() function to Status

2022-06-17 Thread GitBox
drin commented on PR #13383: URL: https://github.com/apache/arrow/pull/13383#issuecomment-1158969053 yep, I think so!

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1903: Closes #1902: Print the original and projected RecordBatch in dynamic_types example

2022-06-17 Thread GitBox
codecov-commenter commented on PR #1903: URL: https://github.com/apache/arrow-rs/pull/1903#issuecomment-1158951662 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1903?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow] zagto commented on a diff in pull request #13333: ARROW-16773: [Docs][Format] Document Run-Length encoding in Arrow columnar format

2022-06-17 Thread GitBox
zagto commented on code in PR #13333: URL: https://github.com/apache/arrow/pull/13333#discussion_r900191747 ## docs/source/format/Columnar.rst: ## @@ -765,6 +765,65 @@ application. We discuss dictionary encoding as it relates to serialization further below. +.. _run-length-e

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2743: Discord invite link in communication page has expired

2022-06-17 Thread GitBox
andygrove opened a new issue, #2743: URL: https://github.com/apache/arrow-datafusion/issues/2743 **Describe the bug** The discord link in https://arrow.apache.org/datafusion/community/communication.html#slack-and-discord has expired Perhaps we should just ask people to email dev@

[GitHub] [arrow-datafusion] andygrove commented on issue #2709: Updating arrow2 branch

2022-06-17 Thread GitBox
andygrove commented on issue #2709: URL: https://github.com/apache/arrow-datafusion/issues/2709#issuecomment-1158922350 > The main issue a few months ago was that the datafusion codebase was not split like it is now, so it created conflicts in the various places where it wasn't possible to

[GitHub] [arrow-datafusion] andygrove commented on issue #2709: Updating arrow2 branch

2022-06-17 Thread GitBox
andygrove commented on issue #2709: URL: https://github.com/apache/arrow-datafusion/issues/2709#issuecomment-1158912362 > I do think there is broad agreement that the governance model of arrow2 (benign dictator) is not suitable for all users for a variety of reasons. I agree. Althoug

[GitHub] [arrow-rs] martin-g opened a new pull request, #1903: Closes #1902: Print the original and projected RecordBatch in dynamic_types example

2022-06-17 Thread GitBox
martin-g opened a new pull request, #1903: URL: https://github.com/apache/arrow-rs/pull/1903 # Which issue does this PR close? Closes #1902. # Rationale for this change The user can get a feeling of the data. # What changes are included in this PR?

[GitHub] [arrow-rs] martin-g opened a new issue, #1902: `dynamic_types` example does not print the projection

2022-06-17 Thread GitBox
martin-g opened a new issue, #1902: URL: https://github.com/apache/arrow-rs/issues/1902 **Describe the bug** https://github.com/apache/arrow-rs/blob/master/arrow/examples/dynamic_types.rs shows how to use and project `RecordBatch`es. I think it would be slightly more useful if i

[GitHub] [arrow] lidavidm commented on pull request #13383: ARROW-16769: [C++] Add Warn() function to Status

2022-06-17 Thread GitBox
lidavidm commented on PR #13383: URL: https://github.com/apache/arrow/pull/13383#issuecomment-1158893351 Looks like everything's passing. Is this ready @drin?

[GitHub] [arrow-rs] martin-g closed pull request #1895: Issue #1876: Explicitly declare the used features for each dependency in parquet

2022-06-17 Thread GitBox
martin-g closed pull request #1895: Issue #1876: Explicitly declare the used features for each dependency in parquet URL: https://github.com/apache/arrow-rs/pull/1895

[GitHub] [arrow-rs] martin-g closed pull request #1897: Issue #1876: Explicitly declare the used features for each dependency in parquet_derive_test

2022-06-17 Thread GitBox
martin-g closed pull request #1897: Issue #1876: Explicitly declare the used features for each dependency in parquet_derive_test URL: https://github.com/apache/arrow-rs/pull/1897

[GitHub] [arrow-rs] HaoYang670 commented on issue #1901: `log2(0)` panicked at `'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5`

2022-06-17 Thread GitBox
HaoYang670 commented on issue #1901: URL: https://github.com/apache/arrow-rs/issues/1901#issuecomment-1158847977 Ha! Maybe we could remove this function directly because we already have `num_required_bits`, which provides the same functionality. https://github.com/apache/arrow-rs/blob/master/

[GitHub] [arrow-rs] HaoYang670 commented on issue #1901: `log2(0)` panicked at `'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5`

2022-06-17 Thread GitBox
HaoYang670 commented on issue #1901: URL: https://github.com/apache/arrow-rs/issues/1901#issuecomment-1158844111 Thank you @jhorstmann. Agree with you on using `leading_zeros`, and this is also how the std library implements it: ```rust #[inline] pub

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1900: Use bit_slice in combine_option_bitmap

2022-06-17 Thread GitBox
codecov-commenter commented on PR #1900: URL: https://github.com/apache/arrow-rs/pull/1900#issuecomment-1158841833 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1900?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] jhorstmann commented on issue #1901: `log2(0)` panicked at `'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5`

2022-06-17 Thread GitBox
jhorstmann commented on issue #1901: URL: https://github.com/apache/arrow-rs/issues/1901#issuecomment-1158840184 Since this seems to be only used for getting the width of bitpacked data, a faster way to calculate that is using the `leading_zeros` function. I contributed a similar improvemen
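
The `leading_zeros` suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `parquet/src/util/bit_util.rs`: the names `bit_width` and `ceil_log2` are hypothetical, and treating an input of 0 as needing 0 bits is a choice made for this sketch, not something the thread settles.

```rust
/// Hypothetical helper (not the arrow-rs API): number of bits needed to
/// represent `x`, i.e. the width of a bit-packed value. Returns 0 for 0.
pub fn bit_width(x: u64) -> u32 {
    // `leading_zeros()` is 64 for x == 0, so this subtraction cannot underflow.
    u64::BITS - x.leading_zeros()
}

/// Hypothetical helper: ceiling of log2(x), defined here as 0 for x <= 1
/// so that the `x - 1` below can never underflow.
pub fn ceil_log2(x: u64) -> u32 {
    if x <= 1 {
        0
    } else {
        u64::BITS - (x - 1).leading_zeros()
    }
}
```

Both helpers avoid a loop, and neither performs an unchecked `x - 1` on a zero input, which is the kind of subtraction the panic message in the issue title points at.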

[GitHub] [arrow] cyb70289 commented on pull request #13394: ARROW-16850: [C++] Copy CSV data field and end chars separately

2022-06-17 Thread GitBox
cyb70289 commented on PR #13394: URL: https://github.com/apache/arrow/pull/13394#issuecomment-1158838873 From conbench, the CSV writer benchmark improves by about 8% to 25% on `i9`. The improvement on `m1` is smaller (5% to 10%).

[GitHub] [arrow] github-actions[bot] commented on pull request #13395: [Gandiva][C++] Add REGEXP_LIKE function

2022-06-17 Thread GitBox
github-actions[bot] commented on PR #13395: URL: https://github.com/apache/arrow/pull/13395#issuecomment-1158835036 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue

[GitHub] [arrow-rs] HaoYang670 opened a new issue, #1901: `log2(0)` panicked at `'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5`

2022-06-17 Thread GitBox
HaoYang670 opened a new issue, #1901: URL: https://github.com/apache/arrow-rs/issues/1901 **Describe the bug** https://github.com/apache/arrow-rs/blob/master/parquet/src/util/bit_util.rs#L142-L155 **To Reproduce** assert_eq!(log2(0), 0); **Expected behavior** Alter 1:

[GitHub] [arrow-rs] jhorstmann opened a new pull request, #1900: Use bit_slice in combine_option_bitmap

2022-06-17 Thread GitBox
jhorstmann opened a new pull request, #1900: URL: https://github.com/apache/arrow-rs/pull/1900 # Which issue does this PR close? Closes #1899. # Rationale for this change The buffers are storing validity bits and there the offset has to be interpreted as number o

[GitHub] [arrow] lidavidm commented on a diff in pull request #13109: ARROW-15365: [Python] Expose full cast options in the pyarrow.compute.cast function

2022-06-17 Thread GitBox
lidavidm commented on code in PR #13109: URL: https://github.com/apache/arrow/pull/13109#discussion_r90006 ## python/pyarrow/tests/test_compute.py: ## @@ -1702,13 +1702,37 @@ def test_logical(): def test_cast(): +arr = pa.array([1, 2, 3, 4], type='int64') +optio

[GitHub] [arrow] assignUser closed pull request #13240: ARROW-16406: [Docs][R] Update documentation with new nightly location

2022-06-17 Thread GitBox
assignUser closed pull request #13240: ARROW-16406: [Docs][R] Update documentation with new nightly location URL: https://github.com/apache/arrow/pull/13240

[GitHub] [arrow] assignUser commented on pull request #13240: ARROW-16406: [Docs][R] Update documentation with new nightly location

2022-06-17 Thread GitBox
assignUser commented on PR #13240: URL: https://github.com/apache/arrow/pull/13240#issuecomment-1158810321 Task merged with #13241/ARROW-16405

[GitHub] [arrow] lidavidm commented on issue #13391: There is no test case using the `2.0.0-compression` test file

2022-06-17 Thread GitBox
lidavidm commented on issue #13391: URL: https://github.com/apache/arrow/issues/13391#issuecomment-1158809150 They're tested by the integration suite, e.g. see the output of this run: https://github.com/apache/arrow/runs/6927219399?check_suite_focus=true#step:7:11104

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1898: Issue #1876: Explicitly declare the used features for each dependency in integration_testing

2022-06-17 Thread GitBox
codecov-commenter commented on PR #1898: URL: https://github.com/apache/arrow-rs/pull/1898#issuecomment-1158807754 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1898?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] jhorstmann opened a new issue, #1899: Final slicing in combine_option_bitmap needs to use bit slices

2022-06-17 Thread GitBox
jhorstmann opened a new issue, #1899: URL: https://github.com/apache/arrow-rs/issues/1899 **Describe the bug** In [`combine_option_bitmap`](https://github.com/apache/arrow-rs/blob/fc4044f35d4aa67e706c6d3f61a9f24bab5346be/arrow/src/compute/util.rs#L58) the buffer is sliced using `Buff

[GitHub] [arrow-rs] frolovdev commented on issue #1642: MapArray Requires Values Array

2022-06-17 Thread GitBox
frolovdev commented on issue #1642: URL: https://github.com/apache/arrow-rs/issues/1642#issuecomment-1158804062 @tustvold So the basic idea is to avoid the obligation of values in the map. According to ``` The value field encodes the map's value type and repetition. This field

[GitHub] [arrow] lidavidm commented on a diff in pull request #13390: ARROW-16424: [C++] Update uri_path parsing in FromProto

2022-06-17 Thread GitBox
lidavidm commented on code in PR #13390: URL: https://github.com/apache/arrow/pull/13390#discussion_r900050308 ## cpp/src/arrow/util/uri.h: ## @@ -68,6 +68,8 @@ class ARROW_EXPORT Uri { /// The URI path component. std::string path() const; + std::string extension() cons

[GitHub] [arrow-rs] martin-g opened a new pull request, #1898: Issue #1876: Explicitly declare the used features for each dependency in integration_testing

2022-06-17 Thread GitBox
martin-g opened a new pull request, #1898: URL: https://github.com/apache/arrow-rs/pull/1898 # Which issue does this PR close? Closes #1876. This is the last PR for https://github.com/apache/arrow-rs/issues/1876. It changes just integration_testing/Cargo.toml. The PR does **no

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1897: Issue #1876: Explicitly declare the used features for each dependency in parquet_derive_test

2022-06-17 Thread GitBox
codecov-commenter commented on PR #1897: URL: https://github.com/apache/arrow-rs/pull/1897#issuecomment-1158790046 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1897?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] martin-g opened a new pull request, #1897: Issue #1876: Explicitly declare the used features for each dependency in parquet_derive_test

2022-06-17 Thread GitBox
martin-g opened a new pull request, #1897: URL: https://github.com/apache/arrow-rs/pull/1897 # Which issue does this PR close? This is a PR for https://github.com/apache/arrow-rs/issues/1876. It changes just parquet_derive_test/Cargo.toml. The PR does not upgrade the dependencies

[GitHub] [arrow-rs] martin-g opened a new pull request, #1896: Issue #1876: Explicitly declare the used features for each dependency in parquet_derive

2022-06-17 Thread GitBox
martin-g opened a new pull request, #1896: URL: https://github.com/apache/arrow-rs/pull/1896 # Which issue does this PR close? This is a PR for https://github.com/apache/arrow-rs/issues/1876. It changes just parquet_derive/Cargo.toml. The PR does **not** upgrade the dependencies!

[GitHub] [arrow-rs] martin-g opened a new pull request, #1895: Issue #1876: Explicitly declare the used features for each dependency in parquet

2022-06-17 Thread GitBox
martin-g opened a new pull request, #1895: URL: https://github.com/apache/arrow-rs/pull/1895 Declare that parquet module uses rand's std and std_rng features # Which issue does this PR close? This is the third PR for https://github.com/apache/arrow-rs/issues/1876. It changes

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1891: Feature add weekday temporal kernel

2022-06-17 Thread GitBox
alamb commented on code in PR #1891: URL: https://github.com/apache/arrow-rs/pull/1891#discussion_r99399 ## arrow/src/compute/kernels/temporal.rs: ## @@ -211,6 +211,34 @@ where Ok(b.finish()) } +/// Extracts the day of week of a given temporal array as an array of in

[GitHub] [arrow-rs] alamb opened a new pull request, #1894: Minor: Add examples to docstring for `weekday`

2022-06-17 Thread GitBox
alamb opened a new pull request, #1894: URL: https://github.com/apache/arrow-rs/pull/1894 Minor follow on to https://github.com/apache/arrow-rs/pull/1891 from @nl5887 -- add an example in the docstring of `weekday` kernel

[GitHub] [arrow-rs] alamb merged pull request #1891: Feature add weekday temporal kernel

2022-06-17 Thread GitBox
alamb merged PR #1891: URL: https://github.com/apache/arrow-rs/pull/1891

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1891: Feature add weekday temporal kernel

2022-06-17 Thread GitBox
alamb commented on code in PR #1891: URL: https://github.com/apache/arrow-rs/pull/1891#discussion_r96406 ## arrow/src/compute/kernels/temporal.rs: ## @@ -211,6 +211,34 @@ where Ok(b.finish()) } +/// Extracts the day of week of a given temporal array as an array of in
