[GitHub] [arrow-rs] velvia commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

2021-07-14 Thread GitBox
velvia commented on issue #527: URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880418323 This is definitely a good feature. It seems to me that in most cases, that time/duration manipulations could be based on existing arithmetic kernels, that'd be one way to go abou

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #729: provide more details on required .parquet file extension

2021-07-14 Thread GitBox
Jimexist opened a new pull request #729: URL: https://github.com/apache/arrow-datafusion/pull/729 # Which issue does this PR close? provide more details on required .parquet file extension Closes #. # Rationale for this change provide more details on required .pa

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #728: implement FromStr for FileType

2021-07-14 Thread GitBox
Jimexist opened a new pull request #728: URL: https://github.com/apache/arrow-datafusion/pull/728 # Which issue does this PR close? implement FromStr for FileType Closes #. # Rationale for this change implement FromStr for FileType so that it can accept lower cas

[GitHub] [arrow-rs] codecov-commenter commented on pull request #521: Change `nullif` to support arbitrary arrays

2021-07-14 Thread GitBox
codecov-commenter commented on pull request #521: URL: https://github.com/apache/arrow-rs/pull/521#issuecomment-880382473 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/521?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+A

[GitHub] [arrow-rs] bjchambers commented on a change in pull request #521: Change `nullif` to support arbitrary arrays

2021-07-14 Thread GitBox
bjchambers commented on a change in pull request #521: URL: https://github.com/apache/arrow-rs/pull/521#discussion_r670118830 ## File path: arrow/src/compute/kernels/boolean.rs ## @@ -458,101 +458,113 @@ pub fn is_not_null(input: &Array) -> Result { Ok(BooleanArray::from(

[GitHub] [arrow-rs] bjchambers commented on a change in pull request #521: Change `nullif` to support arbitrary arrays

2021-07-14 Thread GitBox
bjchambers commented on a change in pull request #521: URL: https://github.com/apache/arrow-rs/pull/521#discussion_r670114367 ## File path: arrow/src/compute/kernels/boolean.rs ## @@ -1148,12 +1222,215 @@ mod tests { let comp = comp.slice(2, 3); // Some(false), None, S

[GitHub] [arrow-rs] bjchambers opened a new pull request #555: failing test

2021-07-14 Thread GitBox
bjchambers opened a new pull request #555: URL: https://github.com/apache/arrow-rs/pull/555 # Which issue does this PR close? Closes #514. # Rationale for this change This was caused by a problem in struct slices and equality. It was fixed by a separate PR, but it seems

[GitHub] [arrow-rs] bjchambers commented on issue #514: Struct equality on slices has false negatives

2021-07-14 Thread GitBox
bjchambers commented on issue #514: URL: https://github.com/apache/arrow-rs/issues/514#issuecomment-880371136 I think that #389 seems to have fixed this. I'll make a PR with the corresponding test case to prevent regression. -- This is an automated message from the Apache Git Service. To

[GitHub] [arrow-experimental-rs-parquet2] jorgecarleitao commented on pull request #1: Adds parquet2

2021-07-14 Thread GitBox
jorgecarleitao commented on pull request #1: URL: https://github.com/apache/arrow-experimental-rs-parquet2/pull/1#issuecomment-880369806 @jhorstmann , did you submit a CLA? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [arrow] domoritz commented on a change in pull request #10698: ARROW-13303: [JS] Revise bundles

2021-07-14 Thread GitBox
domoritz commented on a change in pull request #10698: URL: https://github.com/apache/arrow/pull/10698#discussion_r670105369 ## File path: js/gulp/package-task.js ## @@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => ({ ...createTypeScriptPack

[GitHub] [arrow] domoritz commented on a change in pull request #10698: ARROW-13303: [JS] Revise bundles

2021-07-14 Thread GitBox
domoritz commented on a change in pull request #10698: URL: https://github.com/apache/arrow/pull/10698#discussion_r670105202 ## File path: js/gulp/package-task.js ## @@ -46,14 +46,19 @@ const createMainPackageJson = (target, format) => (orig) => ({ ...createTypeScriptPack

[GitHub] [arrow] cyb70289 commented on a change in pull request #10663: ARROW-13253: [FlightRPC][C++] Fix segfault with large messages

2021-07-14 Thread GitBox
cyb70289 commented on a change in pull request #10663: URL: https://github.com/apache/arrow/pull/10663#discussion_r670100132 ## File path: cpp/src/arrow/flight/test_util.cc ## @@ -616,6 +640,22 @@ Status ExampleLargeBatches(BatchVector* out) { return Status::OK(); } +arro

[GitHub] [arrow] domoritz closed pull request #10695: ARROW-13299: [JS] Upgrade ix and rxjs

2021-07-14 Thread GitBox
domoritz closed pull request #10695: URL: https://github.com/apache/arrow/pull/10695 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubs

[GitHub] [arrow] NinaPeng commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName

2021-07-14 Thread GitBox
NinaPeng commented on pull request #10700: URL: https://github.com/apache/arrow/pull/10700#issuecomment-880357237 > > @liyafan82 Thanks for your review. anything else need to be fixed? or this pull request can be merged? > > @NinaPeng The change looks reasonable to me. I want to merg

[GitHub] [arrow] liyafan82 commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName

2021-07-14 Thread GitBox
liyafan82 commented on pull request #10700: URL: https://github.com/apache/arrow/pull/10700#issuecomment-880355669 > @liyafan82 Thanks for your review. anything else need to be fixed? or this pull request can be merged? @NinaPeng The change looks reasonable to me. I want to merge it

[GitHub] [arrow] cyb70289 commented on pull request #10679: ARROW-13170 [C++] Reducing branching in compute/kernels/vector_selection.cc

2021-07-14 Thread GitBox
cyb70289 commented on pull request #10679: URL: https://github.com/apache/arrow/pull/10679#issuecomment-880342643 > @wesm @cyb70289 @bkietz Is there anything else we could do for the low selectivity cases (1% select)? I don't have satisfying suggestions. A possible workaround I gu

[GitHub] [arrow] NinaPeng commented on pull request #10700: ARROW-13306: [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName

2021-07-14 Thread GitBox
NinaPeng commented on pull request #10700: URL: https://github.com/apache/arrow/pull/10700#issuecomment-880340157 @liyafan82 Thanks for your review. anything else need to be fixed? or this pull request can be merged? -- This is an automated message from the Apache Git Service. To respond

[GitHub] [arrow] kou commented on pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code

2021-07-14 Thread GitBox
kou commented on pull request #10614: URL: https://github.com/apache/arrow/pull/10614#issuecomment-880309249 Thanks! I've merged this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [arrow] kou closed pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code

2021-07-14 Thread GitBox
kou closed pull request #10614: URL: https://github.com/apache/arrow/pull/10614 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

[GitHub] [arrow-datafusion] andygrove opened a new pull request #727: UnresolvedShuffleExec should represent a single shuffle

2021-07-14 Thread GitBox
andygrove opened a new pull request #727: URL: https://github.com/apache/arrow-datafusion/pull/727 # Which issue does this PR close? Closes #726 # Rationale for this change Small step towards getting shuffle working. # What changes are included in thi

[GitHub] [arrow-datafusion] andygrove opened a new issue #726: Ballista: UnresolvedShuffleExec should only have a single stage_id

2021-07-14 Thread GitBox
andygrove opened a new issue #726: URL: https://github.com/apache/arrow-datafusion/issues/726 **Describe the bug** UnresolvedShuffleExec should represent a single shuffle, not multiple. **To Reproduce** I discovered this while working on the PR to get shuffles working. **

[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #10604: ARROW-13190: [C++] [Gandiva] Change behavior of INITCAP function

2021-07-14 Thread GitBox
anthonylouisbsb commented on a change in pull request #10604: URL: https://github.com/apache/arrow/pull/10604#discussion_r670020234 ## File path: cpp/src/gandiva/gdv_function_stubs.cc ## @@ -427,7 +427,8 @@ CAST_VARLEN_TYPE_FROM_NUMERIC(VARBINARY) #undef GDV_FN_CAST_VARCHAR_RE

[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669991533 ## File path: cpp/src/gandiva/precompiled/time.cc ## @@ -841,6 +843,161 @@ gdv_int64 castBIGINT_daytimeinterval(gdv_day_time_interval in) {

[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669991187 ## File path: cpp/src/gandiva/precompiled/time_test.cc ## @@ -839,4 +839,86 @@ TEST(TestTime, TestToTimeNumeric) { EXPECT_EQ(expected_output, to_

[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669990651 ## File path: cpp/src/gandiva/precompiled/time.cc ## @@ -841,6 +843,161 @@ gdv_int64 castBIGINT_daytimeinterval(gdv_day_time_interval in) {

[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669990486 ## File path: cpp/src/gandiva/precompiled/time.cc ## @@ -841,6 +843,161 @@ gdv_int64 castBIGINT_daytimeinterval(gdv_day_time_interval in) {

[GitHub] [arrow-datafusion] yordan-pavlov commented on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox
yordan-pavlov commented on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-88024 @mcassels thank you for reporting this - good find; in the implementation of the pruning predicate there is an assumption that for a predicate expression `f(v) OP c`, w

[GitHub] [arrow] augustoasilva commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox
augustoasilva commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669989632 ## File path: cpp/src/gandiva/precompiled/time.cc ## @@ -841,6 +843,161 @@ gdv_int64 castBIGINT_daytimeinterval(gdv_day_time_interval in) {

[GitHub] [arrow] github-actions[bot] commented on pull request #10721: ARROW-11673 - [C++] Casting dictionary type to use different index type

2021-07-14 Thread GitBox
github-actions[bot] commented on pull request #10721: URL: https://github.com/apache/arrow/pull/10721#issuecomment-880238653 https://issues.apache.org/jira/browse/ARROW-11673 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [arrow] nirandaperera opened a new pull request #10721: ARROW-11673 - [C++] Casting dictionary type to use different index type

2021-07-14 Thread GitBox
nirandaperera opened a new pull request #10721: URL: https://github.com/apache/arrow/pull/10721 This PR adds casting from one dictionary type to another dictionary type, e.g.: ``` dictionary(int8(), int16()) --> dictionary(int32(), int64()) ``` -- This is an automated message fro

[GitHub] [arrow] westonpace closed issue #10699: ModuleNotFoundError: No module named 'pyarrow._orc'

2021-07-14 Thread GitBox
westonpace closed issue #10699: URL: https://github.com/apache/arrow/issues/10699 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr.

[GitHub] [arrow-rs] alamb commented on issue #554: ArrayData::slice() does not work for nested types such as StructArray

2021-07-14 Thread GitBox
alamb commented on issue #554: URL: https://github.com/apache/arrow-rs/issues/554#issuecomment-880210994 > Thanks for this @alamb, I hadn't noticed that not linking PRs to existing issues affects the changelog. Yes @nevi-me I don't fully understand the intricacies of the changelog

[GitHub] [arrow] kevingurney commented on pull request #10614: ARROW-13100: [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code

2021-07-14 Thread GitBox
kevingurney commented on pull request #10614: URL: https://github.com/apache/arrow/pull/10614#issuecomment-880204002 @kou - we updated the description of the pull request to reflect the latest status. We also discovered a small issue during qualification. If you specified `MATLAB_BU

[GitHub] [arrow] lidavidm commented on pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox
lidavidm commented on pull request #10608: URL: https://github.com/apache/arrow/pull/10608#issuecomment-880197573 Broke up the main function a bit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [arrow] lidavidm commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types

2021-07-14 Thread GitBox
lidavidm commented on pull request #10557: URL: https://github.com/apache/arrow/pull/10557#issuecomment-880197464 Removed support for toplevel nulls. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #719: Optimize min/max queries with table statistics

2021-07-14 Thread GitBox
alamb commented on a change in pull request #719: URL: https://github.com/apache/arrow-datafusion/pull/719#discussion_r669933745 ## File path: datafusion/src/physical_plan/parquet.rs ## @@ -312,22 +431,47 @@ impl ParquetExec { if let Some(x) = &part.statistics.colu

[GitHub] [arrow-datafusion] alamb merged pull request #687: #554: Lead/lag window function with offset and default value arguments

2021-07-14 Thread GitBox
alamb merged pull request #687: URL: https://github.com/apache/arrow-datafusion/pull/687 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow-datafusion] alamb closed issue #554: implement lead and lag with 2nd and 3rd argument

2021-07-14 Thread GitBox
alamb closed issue #554: URL: https://github.com/apache/arrow-datafusion/issues/554 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubsc

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #687: #554: Lead/lag window function with offset and default value arguments

2021-07-14 Thread GitBox
alamb commented on a change in pull request #687: URL: https://github.com/apache/arrow-datafusion/pull/687#discussion_r669925638 ## File path: datafusion/src/physical_plan/expressions/lead_lag.rs ## @@ -176,6 +240,28 @@ mod tests { .iter() .collect::()

[GitHub] [arrow-rs] alamb closed issue #554: ArrayData::slice() does not work for nested types such as StructArray

2021-07-14 Thread GitBox
alamb closed issue #554: URL: https://github.com/apache/arrow-rs/issues/554 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arr

[GitHub] [arrow-rs] nevi-me commented on issue #554: ArrayData::slice() does not work for nested types such as StructArray

2021-07-14 Thread GitBox
nevi-me commented on issue #554: URL: https://github.com/apache/arrow-rs/issues/554#issuecomment-880178885 Thanks for this @alamb, I hadn't noticed that not linking PRs to existing issues affects the changelog. -- This is an automated message from the Apache Git Service. To respond to th

[GitHub] [arrow-rs] alamb merged pull request #389: make slice work for nested types

2021-07-14 Thread GitBox
alamb merged pull request #389: URL: https://github.com/apache/arrow-rs/pull/389 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr..

[GitHub] [arrow-rs] nevi-me commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

2021-07-14 Thread GitBox
nevi-me commented on issue #527: URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880177931 this also relates to #45 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [arrow-rs] alamb opened a new issue #554: ArrayData::slice() does not work for nested types such as StructArray

2021-07-14 Thread GitBox
alamb opened a new issue #554: URL: https://github.com/apache/arrow-rs/issues/554 **Describe the bug** `ArrayData::slice()` does not work for nested types, because only the `ArrayData::buffers` are updated with the new offset and length. This has caused a lot of issues in the past.

[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

2021-07-14 Thread GitBox
alamb commented on issue #527: URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880175659 @Jimexist no one that I know of is taking this on. If you were interested, that would be awesome. The reference implementation would probably still be postgres in my mind. There i

[GitHub] [arrow] nealrichardson commented on a change in pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

2021-07-14 Thread GitBox
nealrichardson commented on a change in pull request #10624: URL: https://github.com/apache/arrow/pull/10624#discussion_r669913235 ## File path: r/R/dplyr-functions.R ## @@ -280,6 +284,81 @@ nse_funcs$str_trim <- function(string, side = c("both", "left", "right")) { Express

[GitHub] [arrow-datafusion] alamb commented on pull request #716: #699 fix return type conflict when calling builtin math fuctions

2021-07-14 Thread GitBox
alamb commented on pull request #716: URL: https://github.com/apache/arrow-datafusion/pull/716#issuecomment-880173938 Any thoughts @Dandandan or @jorgecarleitao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #716: #699 fix return type conflict when calling builtin math fuctions

2021-07-14 Thread GitBox
alamb commented on a change in pull request #716: URL: https://github.com/apache/arrow-datafusion/pull/716#discussion_r669916709 ## File path: datafusion/src/execution/context.rs ## @@ -2364,6 +2365,75 @@ mod tests { assert_batches_sorted_eq!(expected, &results);

[GitHub] [arrow] bkietz commented on pull request #10636: ARROW-13153: [C++] `parquet_dataset` loses ordering of files in `_metadata`

2021-07-14 Thread GitBox
bkietz commented on pull request #10636: URL: https://github.com/apache/arrow/pull/10636#issuecomment-880169894 @westonpace needs rebase -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [arrow] bkietz closed pull request #10629: ARROW-13218: [Doc] Document/clarify conventions for timestamp storage

2021-07-14 Thread GitBox
bkietz closed pull request #10629: URL: https://github.com/apache/arrow/pull/10629 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

[GitHub] [arrow] bkietz closed pull request #10628: ARROW-12364: [Python] [Dataset] Add metadata_collector option to ds.write_dataset()

2021-07-14 Thread GitBox
bkietz closed pull request #10628: URL: https://github.com/apache/arrow/pull/10628 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

[GitHub] [arrow-rs] alamb commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-07-14 Thread GitBox
alamb commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-880147276 Thank you @MichaelBitard for taking the time to report it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow] bkietz closed pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode

2021-07-14 Thread GitBox
bkietz closed pull request #10720: URL: https://github.com/apache/arrow/pull/10720 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

[GitHub] [arrow] lidavidm commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types

2021-07-14 Thread GitBox
lidavidm commented on pull request #10557: URL: https://github.com/apache/arrow/pull/10557#issuecomment-880137081 Should be fine. It would certainly trim down the inner loop a lot. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

[GitHub] [arrow] bkietz commented on pull request #10557: ARROW-13064: [C++] Implement select ('case when') function for fixed-width types

2021-07-14 Thread GitBox
bkietz commented on pull request #10557: URL: https://github.com/apache/arrow/pull/10557#issuecomment-880136525 @lidavidm what would you think about just raising an error for top level nulls? It doesn't seem like a useful case to me -- This is an automated message from the Apache Git Ser

[GitHub] [arrow] bkietz commented on a change in pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox
bkietz commented on a change in pull request #10608: URL: https://github.com/apache/arrow/pull/10608#discussion_r669873043 ## File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc ## @@ -676,7 +677,339 @@ void AddPrimitiveIfElseKernels(const std::shared_ptr& scalar_fun

[GitHub] [arrow] thisisnic commented on pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

2021-07-14 Thread GitBox
thisisnic commented on pull request #10624: URL: https://github.com/apache/arrow/pull/10624#issuecomment-880129596 @nealrichardson - have made some updates; please could you re-review this when you have a chance? Tomorrow I'm going to add in the tests for warnings/errors raised when incor

[GitHub] [arrow] bkietz closed pull request #10606: ARROW-13005: [C++] Add support for take implementation on dense union type

2021-07-14 Thread GitBox
bkietz closed pull request #10606: URL: https://github.com/apache/arrow/pull/10606 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

[GitHub] [arrow] thisisnic commented on pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

2021-07-14 Thread GitBox
thisisnic commented on pull request #10624: URL: https://github.com/apache/arrow/pull/10624#issuecomment-880113887 This PR now also contains some unrelated styling changes as I ran styler on the files I changed before pushing my changes. -- This is an automated message from the Apache Gi

[GitHub] [arrow] lidavidm commented on a change in pull request #10693: ARROW-13224: [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset

2021-07-14 Thread GitBox
lidavidm commented on a change in pull request #10693: URL: https://github.com/apache/arrow/pull/10693#discussion_r669830370 ## File path: docs/source/python/dataset.rst ## @@ -456,20 +456,163 @@ is materialized as columns when reading the data and can be used for filtering:

[GitHub] [arrow] bkietz commented on a change in pull request #10693: ARROW-13224: [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset

2021-07-14 Thread GitBox
bkietz commented on a change in pull request #10693: URL: https://github.com/apache/arrow/pull/10693#discussion_r669825752 ## File path: docs/source/python/dataset.rst ## @@ -456,20 +456,163 @@ is materialized as columns when reading the data and can be used for filtering:

[GitHub] [arrow-datafusion] alamb commented on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox
alamb commented on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880076415 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

[GitHub] [arrow] github-actions[bot] commented on pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode

2021-07-14 Thread GitBox
github-actions[bot] commented on pull request #10720: URL: https://github.com/apache/arrow/pull/10720#issuecomment-880075978 https://issues.apache.org/jira/browse/ARROW-13341 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [arrow] bkietz opened a new pull request #10720: ARROW-13341: [C++][Compute] Fix race condition in ScalarAggregateNode

2021-07-14 Thread GitBox
bkietz opened a new pull request #10720: URL: https://github.com/apache/arrow/pull/10720 Multiple threads starting DoConsume would already have incremented `num_received_`, so if one were delayed another might erroneously begin to merge/finalize (leaving invalidated states) -- This is a

[GitHub] [arrow-datafusion] jgoday commented on pull request #687: #554: Lead/lag window function with offset and default value arguments

2021-07-14 Thread GitBox
jgoday commented on pull request #687: URL: https://github.com/apache/arrow-datafusion/pull/687#issuecomment-880075811 > > @Jimexist do you think this PR is ready? Do you need help reviewing ? > > looks okay after rebasing. Hi @Jimexist, just made the rebase from master -- T

[GitHub] [arrow-datafusion] lvheyang commented on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox
lvheyang commented on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880059465 @alamb I think the root problem is that some of the scalar functions (such as abs/sin/cos/pow) are not monotonic. We cannot prune all the rows when `fun(min) < Value` or `f
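
The point about non-monotonic functions in issue #723 can be illustrated with a small sketch. This is not DataFusion's `PruningPredicate` code: `naive_can_prune_gt` and the sample row group are hypothetical, and the rule shown (evaluate the function only at the column maximum) is an assumed simplification of min/max pruning for a predicate of the form `f(v) > c`.

```python
# Hypothetical sketch; names are illustrative and not DataFusion's API.

def naive_can_prune_gt(f, col_max, c):
    """Naive min/max pruning rule for the predicate `f(v) > c`.

    Assumes f is monotonically increasing, so f(col_max) is an upper
    bound on f(v) for every value v in the row group. Under that
    assumption the row group can be skipped when f(col_max) <= c.
    """
    return not (f(col_max) > c)


rows = [-5, -2, 0, 1]   # values of one row group
col_max = max(rows)     # statistic kept for the row group

# Predicate: abs(v) > 3. The row -5 satisfies it.
matching = [v for v in rows if abs(v) > 3]

# abs(col_max) == 1, so the naive rule prunes the whole row group
# even though it contains a matching row: abs is not monotonic.
wrongly_pruned = naive_can_prune_gt(abs, col_max, 3)

# For a monotonic function (identity here) the same rule is sound:
# no value in the row group exceeds 3, so pruning loses nothing.
soundly_pruned = naive_can_prune_gt(lambda v: v, col_max, 3)
```

With `abs`, the rule discards a row group that holds a matching value, which is consistent with the unexpected results reported in the issue. With the identity function, the same rule is safe.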

[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #10711: ARROW-13322: [C++][Gandiva] Add from_unixtime hive function to gandiva

2021-07-14 Thread GitBox
anthonylouisbsb commented on a change in pull request #10711: URL: https://github.com/apache/arrow/pull/10711#discussion_r669769114 ## File path: cpp/src/gandiva/precompiled/time.cc ## @@ -841,6 +843,161 @@ gdv_int64 castBIGINT_daytimeinterval(gdv_day_time_interval in) {

[GitHub] [arrow-rs] Jimexist commented on pull request #552: use sort_unstable_by in primitive sorting

2021-07-14 Thread GitBox
Jimexist commented on pull request #552: URL: https://github.com/apache/arrow-rs/pull/552#issuecomment-880037932 ``` sort 2^10 time: [110.68 us 111.64 us 112.55 us] change: [-14.710% -13.112% -11.406%] (p = 0.00 < 0.05)

[GitHub] [arrow-rs] codecov-commenter commented on pull request #552: use sort_unstable_by in primitive sorting

2021-07-14 Thread GitBox
codecov-commenter commented on pull request #552: URL: https://github.com/apache/arrow-rs/pull/552#issuecomment-880036383 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/552?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+A

[GitHub] [arrow] pachadotdev commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
pachadotdev commented on a change in pull request #10650: URL: https://github.com/apache/arrow/pull/10650#discussion_r669741245 ## File path: dev/archery/archery/crossbow/cli.py ## @@ -233,6 +233,27 @@ def latest_prefix(obj, prefix, fetch): click.echo(latest.branch) +@

[GitHub] [arrow-rs] Jimexist commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

2021-07-14 Thread GitBox
Jimexist commented on issue #527: URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880028583 anyone taking this? also is there any reference implementation in other languages? -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [arrow-datafusion] lvheyang edited a comment on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox
lvheyang edited a comment on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880027602 I have reproduced this problem. I found the problem is in datafusion/src/physical_optimizer/pruning.rs, the PruningPredicate. In its comment : ```

[GitHub] [arrow-datafusion] lvheyang commented on issue #723: ABS() function in WHERE clause gives unexpected results

2021-07-14 Thread GitBox
lvheyang commented on issue #723: URL: https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880027602 I have reproduced this problem. I found the problem is in datafusion/src/physical_optimizer/pruning.rs, the PruningPredicate. In its comment : ``` /// A

[GitHub] [arrow] ursabot edited a comment on pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox
ursabot edited a comment on pull request #10608: URL: https://github.com/apache/arrow/pull/10608#issuecomment-879986122 Benchmark runs are scheduled for baseline = 9c6d4179fefdf995fd0b940a292b81947fe68035 and contender = e32cf48c8f5f38ed5bbf69eb5d2ea8eda43d2b98. Results will be available a

[GitHub] [arrow-rs] Jimexist opened a new issue #553: use sort_unstable_by in primitive sorting

2021-07-14 Thread GitBox
Jimexist opened a new issue #553: URL: https://github.com/apache/arrow-rs/issues/553 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

[GitHub] [arrow-rs] Jimexist opened a new pull request #552: use sort_unstable_by in primitive sorting

2021-07-14 Thread GitBox
Jimexist opened a new pull request #552: URL: https://github.com/apache/arrow-rs/pull/552 # Which issue does this PR close? use [`sort_unstable_by`](https://doc.rust-lang.org/std/primitive.slice.html#method.sort_unstable_by) in primitive sorting Closes #. # Rationale f

[GitHub] [arrow] kszucs edited a comment on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
kszucs edited a comment on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880020349 > nevermind... a spaces problem, which led to > > ``` > ___ ERROR collecting archery/crossbow/tests/test_reports.py > archery/crossbow/tes

[GitHub] [arrow] kszucs commented on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
kszucs commented on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880020349 > nevermind... a spaces problem, which led to > > ``` > ___ ERROR collecting archery/crossbow/tests/test_reports.py > archery/crossbow/tests/test

[GitHub] [arrow] kszucs commented on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
kszucs commented on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880018883 > I got stuck with this > > ``` > archery/crossbow/tests/test_reports.py:20: in > from archery.crossbow.core import yaml > archery/crossbow/__init__.py:19: in

[GitHub] [arrow] rok commented on pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels

2021-07-14 Thread GitBox
rok commented on pull request #10476: URL: https://github.com/apache/arrow/pull/10476#issuecomment-880017358 Thanks all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [arrow] jonkeane closed pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels

2021-07-14 Thread GitBox
jonkeane closed pull request #10476: URL: https://github.com/apache/arrow/pull/10476 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubs

[GitHub] [arrow-datafusion] Dandandan commented on issue #725: Global limit isn't really limiting parquet file reads and stops earlier

2021-07-14 Thread GitBox
Dandandan commented on issue #725: URL: https://github.com/apache/arrow-datafusion/issues/725#issuecomment-880011030 I added some form of limit push down to parquet some time ago. Might be that it isn't applied to your dataset somehow? Or maybe getting the metadata / statistics itself m

[GitHub] [arrow] kszucs commented on pull request #10659: ARROW-12122: [Python] Cannot install via pip M1 mac

2021-07-14 Thread GitBox
kszucs commented on pull request #10659: URL: https://github.com/apache/arrow/pull/10659#issuecomment-880010837 @wesm @xhochy could you please verify locally the produced wheels? - [pyarrow-5.0.0.dev471-cp39-cp39-macosx_11_0_arm64.whl](https://github.com/ursacomputing/crossbow/releases/d

[GitHub] [arrow] pachadotdev commented on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
pachadotdev commented on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880009474 nevermind... a spaces problem, which led to ``` ___ ERROR collecting archery/crossbow/tests/test_reports.py archery/crossbow/tests/test_reports.

[GitHub] [arrow] pachadotdev commented on pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
pachadotdev commented on pull request #10650: URL: https://github.com/apache/arrow/pull/10650#issuecomment-880008583 I got stuck with this ``` archery/crossbow/tests/test_reports.py:20: in from archery.crossbow.core import yaml archery/crossbow/__init__.py:19: in fro

[GitHub] [arrow] lidavidm commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-07-14 Thread GitBox
lidavidm commented on pull request #10412: URL: https://github.com/apache/arrow/pull/10412#issuecomment-880001407 Merged, thanks. This should unblock ARROW-9431 if you do still plan to look at it. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [arrow] lidavidm closed pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-07-14 Thread GitBox
lidavidm closed pull request #10412: URL: https://github.com/apache/arrow/pull/10412 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubs

[GitHub] [arrow-rs] MichaelBitard commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-07-14 Thread GitBox
MichaelBitard commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-87541 Oops, you are right, sorry. If I generate the sample.parquet with the latest version, it no longer hangs during reading. Thanks for noticing and sorry again! --

[GitHub] [arrow] nirandaperera commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-07-14 Thread GitBox
nirandaperera commented on pull request #10412: URL: https://github.com/apache/arrow/pull/10412#issuecomment-879991878 > I think I've addressed all the feedback here. I'm +1 for this! -- This is an automated message from the Apache Git Service. To respond to the message, please l

[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
kszucs commented on a change in pull request #10650: URL: https://github.com/apache/arrow/pull/10650#discussion_r669719045 ## File path: dev/archery/archery/crossbow/reports.py ## @@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None):

[GitHub] [arrow] ursabot commented on pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox
ursabot commented on pull request #10608: URL: https://github.com/apache/arrow/pull/10608#issuecomment-879986122 Benchmark runs are scheduled for baseline = 9c6d4179fefdf995fd0b940a292b81947fe68035 and contender = e32cf48c8f5f38ed5bbf69eb5d2ea8eda43d2b98. Results will be available as each

[GitHub] [arrow] lidavidm commented on pull request #10608: ARROW-13136: [C++] Add coalesce function

2021-07-14 Thread GitBox
lidavidm commented on pull request #10608: URL: https://github.com/apache/arrow/pull/10608#issuecomment-879985810 @ursabot please benchmark lang=C++ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] lidavidm commented on pull request #10412: ARROW-9430: [C++] Implement replace_with_mask kernel

2021-07-14 Thread GitBox
lidavidm commented on pull request #10412: URL: https://github.com/apache/arrow/pull/10412#issuecomment-879985272 I think I've addressed all the feedback here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
kszucs commented on a change in pull request #10650: URL: https://github.com/apache/arrow/pull/10650#discussion_r669711765 ## File path: dev/archery/archery/crossbow/reports.py ## @@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None):

[GitHub] [arrow] lidavidm commented on a change in pull request #10663: ARROW-13253: [FlightRPC][C++] Fix segfault with large messages

2021-07-14 Thread GitBox
lidavidm commented on a change in pull request #10663: URL: https://github.com/apache/arrow/pull/10663#discussion_r669705163 ## File path: cpp/src/arrow/flight/serialization_internal.cc ## @@ -201,9 +193,7 @@ grpc::Status FlightDataSerialize(const FlightPayload& msg, ByteBuffe

[GitHub] [arrow] thisisnic commented on a change in pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

2021-07-14 Thread GitBox
thisisnic commented on a change in pull request #10624: URL: https://github.com/apache/arrow/pull/10624#discussion_r669142885 ## File path: r/src/compute.cpp ## @@ -316,6 +316,19 @@ std::shared_ptr make_compute_options( return std::make_shared(max_splits, reverse); }

[GitHub] [arrow-datafusion] Jimexist opened a new issue #725: Global limit isn't really limiting parquet file reads and stops earlier

2021-07-14 Thread GitBox
Jimexist opened a new issue #725: URL: https://github.com/apache/arrow-datafusion/issues/725 **Describe the bug** A clear and concise description of what the bug is. When given a global limit: ```sql select * from some_large_data limit 50; ``` even with a `-c` b

[GitHub] [arrow] kszucs commented on a change in pull request #10650: ARROW-13058: This is a draft to provide save-report

2021-07-14 Thread GitBox
kszucs commented on a change in pull request #10650: URL: https://github.com/apache/arrow/pull/10650#discussion_r669699660 ## File path: dev/archery/archery/crossbow/reports.py ## @@ -121,6 +121,61 @@ def show(self, outstream, asset_callback=None):
