[GitHub] [arrow] cyb70289 commented on a change in pull request #9843: ARROW-12145: [Developer][Archery] Flaky: test_static_runner_from_json

2021-03-29 Thread GitBox
cyb70289 commented on a change in pull request #9843: URL: https://github.com/apache/arrow/pull/9843#discussion_r603794401 ## File path: dev/archery/archery/tests/test_benchmarks.py ## @@ -94,10 +94,16 @@ def test_static_runner_from_json():

[GitHub] [arrow] github-actions[bot] commented on pull request #9842: ARROW-12040: [R] [CI] [C++] test-r-rstudio-r-base-3.6-opensuse15 timing out during tests

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9842: URL: https://github.com/apache/arrow/pull/9842#issuecomment-809917586 https://issues.apache.org/jira/browse/ARROW-12040 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] kou commented on pull request #9845: ARROW-12128: [CI][Crossbow] Remove test-ubuntu-16.04-cpp job

2021-03-29 Thread GitBox
kou commented on pull request #9845: URL: https://github.com/apache/arrow/pull/9845#issuecomment-809904947 @github-actions crossbow submit -g nightly

[GitHub] [arrow] kou opened a new pull request #9845: ARROW-12128: [CI][Crossbow] Remove test-ubuntu-16.04-cpp job

2021-03-29 Thread GitBox
kou opened a new pull request #9845: URL: https://github.com/apache/arrow/pull/9845 Ubuntu 16.04 is EOL in 2021-04.

[GitHub] [arrow] cyb70289 closed pull request #9835: ARROW-12103: [C++] Correctly handle unaligned access in bit-unpacking code

2021-03-29 Thread GitBox
cyb70289 closed pull request #9835: URL: https://github.com/apache/arrow/pull/9835

[GitHub] [arrow] projjal commented on a change in pull request #9816: ARROW-7215: [C++][Gandiva] Implement castVARCHAR(numeric_type) functions

2021-03-29 Thread GitBox
projjal commented on a change in pull request #9816: URL: https://github.com/apache/arrow/pull/9816#discussion_r603773257 ## File path: cpp/src/arrow/vendored/double-conversion/double-conversion.cc ## @@ -84,7 +84,24 @@ void

[GitHub] [arrow] cyb70289 commented on pull request #9835: ARROW-12103: [C++] Correctly handle unaligned access in bit-unpacking code

2021-03-29 Thread GitBox
cyb70289 commented on pull request #9835: URL: https://github.com/apache/arrow/pull/9835#issuecomment-809899280 Sanitizer test has succeeded. CI failure is not related. Will merge.

[GitHub] [arrow] cyb70289 commented on a change in pull request #9838: ARROW-12134: [C++] Add match_substring_regex kernel

2021-03-29 Thread GitBox
cyb70289 commented on a change in pull request #9838: URL: https://github.com/apache/arrow/pull/9838#discussion_r603771940 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -411,40 +411,104 @@ void TransformMatchSubstring(const uint8_t* pattern, int64_t

[GitHub] [arrow] cyb70289 commented on a change in pull request #9838: ARROW-12134: [C++] Add match_substring_regex kernel

2021-03-29 Thread GitBox
cyb70289 commented on a change in pull request #9838: URL: https://github.com/apache/arrow/pull/9838#discussion_r603769358 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -411,40 +411,104 @@ void TransformMatchSubstring(const uint8_t* pattern, int64_t

[GitHub] [arrow] projjal commented on a change in pull request #9724: ARROW-11986: [C++][Gandiva] Implement IN expressions for doubles and floats

2021-03-29 Thread GitBox
projjal commented on a change in pull request #9724: URL: https://github.com/apache/arrow/pull/9724#discussion_r603766965 ## File path: cpp/src/gandiva/proto/Types.proto ## @@ -219,8 +219,10 @@ message InNode { optional TreeNode node = 1; optional IntConstants intValues

[GitHub] [arrow] cyb70289 commented on a change in pull request #9841: ARROW-11070: [C++] Implement power / exponentiation compute kernel

2021-03-29 Thread GitBox
cyb70289 commented on a change in pull request #9841: URL: https://github.com/apache/arrow/pull/9841#discussion_r603736068 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc ## @@ -18,11 +18,13 @@ #include "arrow/compute/kernels/common.h" #include

[GitHub] [arrow] projjal commented on pull request #9724: ARROW-11986: [C++][Gandiva] Implement IN expressions for doubles and floats

2021-03-29 Thread GitBox
projjal commented on pull request #9724: URL: https://github.com/apache/arrow/pull/9724#issuecomment-809873483 > @projjal, can you please help me discover why the test TestInFloat is failing in this travis build: https://travis-ci.com/github/apache/arrow/jobs/494494759 ? I already tried

[GitHub] [arrow] cyb70289 edited a comment on pull request #9841: ARROW-11070: [C++] Implement power / exponentiation compute kernel

2021-03-29 Thread GitBox
cyb70289 edited a comment on pull request #9841: URL: https://github.com/apache/arrow/pull/9841#issuecomment-809853074 > I'm not sure about two things: > > * Is `Exponentiate` a good name or should we go with `Power`? I picked `Exponentiate` as all the other operation names are

[GitHub] [arrow] github-actions[bot] commented on pull request #9841: ARROW-11070: [C++] Implement power / exponentiation compute kernel

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9841: URL: https://github.com/apache/arrow/pull/9841#issuecomment-809862926 https://issues.apache.org/jira/browse/ARROW-11070

[GitHub] [arrow] projjal commented on pull request #9450: ARROW-11565: [C++][Gandiva] Modify upper()/lower() logic to make them work for utf8 strings

2021-03-29 Thread GitBox
projjal commented on pull request #9450: URL: https://github.com/apache/arrow/pull/9450#issuecomment-809860443 @sagnikc-dremio The failing test might be because the generated jit code can't resolve the symbols corresponding to the utf8 library. Can you add the function in gdv_fn_stubs and

[GitHub] [arrow] projjal closed pull request #8158: ARROW-7215: [C++][Gandiva] Implement castVARCHAR(numeric_type) functions

2021-03-29 Thread GitBox
projjal closed pull request #8158: URL: https://github.com/apache/arrow/pull/8158

[GitHub] [arrow] cyb70289 commented on pull request #9841: ARROW-11070: [C++] Implement power / exponentiation compute kernel

2021-03-29 Thread GitBox
cyb70289 commented on pull request #9841: URL: https://github.com/apache/arrow/pull/9841#issuecomment-809853074 > I'm not sure about two things: > > * Is `Exponentiate` a good name or should we go with `Power`? I picked `Exponentiate` as all the other operation names are verbs

[GitHub] [arrow] github-actions[bot] commented on pull request #9840: ARROW-12107: [Rust][DataFusion] Support SELECT * from information_schema.columns

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9840: URL: https://github.com/apache/arrow/pull/9840#issuecomment-809842655 https://issues.apache.org/jira/browse/ARROW-12107

[GitHub] [arrow] github-actions[bot] commented on pull request #9835: ARROW-12103: [C++] Correctly handle unaligned access in bit-unpacking code

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9835: URL: https://github.com/apache/arrow/pull/9835#issuecomment-809842473 Revision: 725e9b9215eed2feaedb59f5fbaab063040ca4dd Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] jorgecarleitao closed pull request #9291: ARROW-11345: [Rust] Made most ops not rely on `value(i)`

2021-03-29 Thread GitBox
jorgecarleitao closed pull request #9291: URL: https://github.com/apache/arrow/pull/9291

[GitHub] [arrow] github-actions[bot] commented on pull request #9839: ARROW-12140: [C++][CI] Fix Valgrind failures in Grouper tests

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9839: URL: https://github.com/apache/arrow/pull/9839#issuecomment-809829062 https://issues.apache.org/jira/browse/ARROW-12140

[GitHub] [arrow] jpedroantunes opened a new pull request #9844: ARROW-12146: [C++][Gandiva] Implement CONVERT_FROM(expression, ‘UTF8’, replacement char) function

2021-03-29 Thread GitBox
jpedroantunes opened a new pull request #9844: URL: https://github.com/apache/arrow/pull/9844 Implement CONVERT_FROM(expression, ‘UTF8’, replacement char) Converts the byte data in expression to UTF-8. Expression can be a literal string or a field name. Will replace any invalid
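
The PR description above is cut off, so exactly how invalid bytes are handled is not visible here. As a rough illustration of the general idea (decoding bytes as UTF-8 and substituting a caller-chosen replacement character for invalid data), here is a minimal Python sketch. The name `convert_from_utf8` is hypothetical, and the choice of one replacement character per undecodable byte is an assumption that may differ from what Gandiva's CONVERT_FROM actually does.

```python
def convert_from_utf8(data: bytes, replacement: str) -> str:
    """Hypothetical helper: decode `data` as UTF-8, swapping in `replacement`
    for every byte that is not part of a valid UTF-8 sequence."""
    # "surrogateescape" maps each undecodable byte 0xNN to the lone
    # surrogate code point U+DCNN instead of raising UnicodeDecodeError.
    decoded = data.decode("utf-8", errors="surrogateescape")
    # Assumption: replace each escaped byte individually with `replacement`.
    return "".join(
        replacement if "\udc80" <= ch <= "\udcff" else ch
        for ch in decoded
    )


print(convert_from_utf8(b"ab\xffcd", "?"))  # ab?cd
```

Valid multi-byte sequences pass through unchanged; only bytes the UTF-8 decoder rejects are substituted.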

[GitHub] [arrow] github-actions[bot] commented on pull request #9838: ARROW-12134: [C++] Add match_substring_regex kernel

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9838: URL: https://github.com/apache/arrow/pull/9838#issuecomment-809823570 https://issues.apache.org/jira/browse/ARROW-12134

[GitHub] [arrow] github-actions[bot] commented on pull request #9837: ARROW-12100: [C++][IPC] Allow null children field when num children is 0

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9837: URL: https://github.com/apache/arrow/pull/9837#issuecomment-809817571 https://issues.apache.org/jira/browse/ARROW-12100

[GitHub] [arrow] github-actions[bot] commented on pull request #9767: ARROW-12139: [Python][Packaging] Use vcpkg to build macOS wheels

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9767: URL: https://github.com/apache/arrow/pull/9767#issuecomment-809798227 Revision: 765117e7f72caa85997eee92363cab482fc0653c Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] github-actions[bot] commented on pull request #9767: ARROW-12139: [Python][Packaging] Use vcpkg to build macOS wheels

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9767: URL: https://github.com/apache/arrow/pull/9767#issuecomment-809792465 https://issues.apache.org/jira/browse/ARROW-12139

[GitHub] [arrow] github-actions[bot] commented on pull request #9836: ARROW-12138: [Go][IPC] Update flatbuffers definitions

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9836: URL: https://github.com/apache/arrow/pull/9836#issuecomment-809789886 https://issues.apache.org/jira/browse/ARROW-12138

[GitHub] [arrow] dianaclarke opened a new pull request #9843: ARROW-12145: [Developer][Archery] Flaky: test_static_runner_from_json

2021-03-29 Thread GitBox
dianaclarke opened a new pull request #9843: URL: https://github.com/apache/arrow/pull/9843 …_json

[GitHub] [arrow] dianaclarke commented on pull request #9822: ARROW-12110: [Java] Implement ZSTD compression

2021-03-29 Thread GitBox
dianaclarke commented on pull request #9822: URL: https://github.com/apache/arrow/pull/9822#issuecomment-809758214 > > @dianaclarke https://github.com/apache/arrow/pull/9822/checks?check_run_id=2219940455 > > Ah, now I see. I didn't write that test. I think @bkietz did, but I can

[GitHub] [arrow] seddonm1 commented on pull request #9428: ARROW-10354: [Rust][DataFusion] regexp_extract function to select regex groups from strings

2021-03-29 Thread GitBox
seddonm1 commented on pull request #9428: URL: https://github.com/apache/arrow/pull/9428#issuecomment-809756978 @sweb This is great and does work with all the cases I have run tests for :+1:. You can see my slight suggested change above and once thats done @alamb this is ready for merge.

[GitHub] [arrow] seddonm1 commented on a change in pull request #9428: ARROW-10354: [Rust][DataFusion] regexp_extract function to select regex groups from strings

2021-03-29 Thread GitBox
seddonm1 commented on a change in pull request #9428: URL: https://github.com/apache/arrow/pull/9428#discussion_r603656164 ## File path: rust/datafusion/tests/sql.rs ## @@ -2607,3 +2607,24 @@ async fn invalid_qualified_table_references() -> Result<()> { } Ok(()) }

[GitHub] [arrow] seddonm1 commented on a change in pull request #9428: ARROW-10354: [Rust][DataFusion] regexp_extract function to select regex groups from strings

2021-03-29 Thread GitBox
seddonm1 commented on a change in pull request #9428: URL: https://github.com/apache/arrow/pull/9428#discussion_r603655693 ## File path: rust/arrow/src/compute/kernels/regexp.rs ## @@ -0,0 +1,147 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more
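
The idea of selecting a regex capture group from each string can be sketched with Python's standard `re` module. This is only an illustration: the name `regexp_extract` mirrors the PR title, but the argument order, the use of `search` (a match may start anywhere), and returning `None` for null or non-matching inputs are assumptions, not the DataFusion implementation.

```python
import re
from typing import Optional, Sequence


def regexp_extract(
    values: Sequence[Optional[str]], pattern: str, group: int
) -> list[Optional[str]]:
    """Hypothetical sketch: pull capture group `group` out of each string."""
    compiled = re.compile(pattern)  # compile once, reuse for every row
    out: list[Optional[str]] = []
    for value in values:
        if value is None:
            out.append(None)  # assumption: nulls propagate
            continue
        m = compiled.search(value)  # first match anywhere in the string
        out.append(m.group(group) if m else None)  # assumption: no match -> None
    return out


print(regexp_extract(["abc-123", "xyz", None], r"([a-z]+)-(\d+)", 2))
# ['123', None, None]
```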

[GitHub] [arrow] dianaclarke commented on pull request #9822: ARROW-12110: [Java] Implement ZSTD compression

2021-03-29 Thread GitBox
dianaclarke commented on pull request #9822: URL: https://github.com/apache/arrow/pull/9822#issuecomment-809747930 > @dianaclarke https://github.com/apache/arrow/pull/9822/checks?check_run_id=2219940455 Ah, now I see. I didn't write that test. I think @bkietz did, but I can try to

[GitHub] [arrow] seddonm1 commented on pull request #9834: ARROW-12136: [Rust][DataFusion] Reduce default batch_size to 8192

2021-03-29 Thread GitBox
seddonm1 commented on pull request #9834: URL: https://github.com/apache/arrow/pull/9834#issuecomment-809747636 Does this result look the same with SF > 1?

[GitHub] [arrow] Demetrio92 commented on issue #1688: Possible to read categoricals back into Pandas from Parquet using Pyarrow?

2021-03-29 Thread GitBox
Demetrio92 commented on issue #1688: URL: https://github.com/apache/arrow/issues/1688#issuecomment-809739042 Seems like the issue is back. But the guys are working on it. https://issues.apache.org/jira/browse/ARROW-11157 > As a workaround, you can read with pyarrow and do the

[GitHub] [arrow] anthonylouisbsb edited a comment on pull request #9707: ARROW-11984: [C++][Gandiva] Implement SHA1 and SHA256 functions

2021-03-29 Thread GitBox
anthonylouisbsb edited a comment on pull request #9707: URL: https://github.com/apache/arrow/pull/9707#issuecomment-809736515 @projjal I made a correction to fix the broken nightly build for the MacOS, It was a missing include header in the CMakeLists.txt of the Gandiva project

[GitHub] [arrow] anthonylouisbsb commented on pull request #9707: ARROW-11984: [C++][Gandiva] Implement SHA1 and SHA256 functions

2021-03-29 Thread GitBox
anthonylouisbsb commented on pull request #9707: URL: https://github.com/apache/arrow/pull/9707#issuecomment-809736515 @projjal I made a correction to fix the broken nightly build, It was a missing include header in the CMakeLists.txt of the Gandiva project

[GitHub] [arrow] emkornfield commented on pull request #9822: ARROW-12110: [Java] Implement ZSTD compression

2021-03-29 Thread GitBox
emkornfield commented on pull request #9822: URL: https://github.com/apache/arrow/pull/9822#issuecomment-809735644 @dianaclarke https://github.com/apache/arrow/pull/9822/checks?check_run_id=2219940455

[GitHub] [arrow] alamb commented on pull request #9291: ARROW-11345: [Rust] Made most ops not rely on `value(i)`

2021-03-29 Thread GitBox
alamb commented on pull request #9291: URL: https://github.com/apache/arrow/pull/9291#issuecomment-809733266 @jorgecarleitao is this PR something that you plan to clean up and merge? Or should we close this PR as you have switched to working on arrow 2?

[GitHub] [arrow] alamb closed pull request #9506: WIP: [Rust] Remove ArrowPrimitiveType

2021-03-29 Thread GitBox
alamb closed pull request #9506: URL: https://github.com/apache/arrow/pull/9506

[GitHub] [arrow] alamb commented on pull request #9506: WIP: [Rust] Remove ArrowPrimitiveType

2021-03-29 Thread GitBox
alamb commented on pull request #9506: URL: https://github.com/apache/arrow/pull/9506#issuecomment-809732676 Closing this PR to hopefully make it clearer this revamp is happening in a separate repo

[GitHub] [arrow] westonpace opened a new pull request #9842: ARROW-12040: [R] [CI] [C++] test-r-rstudio-r-base-3.6-opensuse15 timing out during tests

2021-03-29 Thread GitBox
westonpace opened a new pull request #9842: URL: https://github.com/apache/arrow/pull/9842 From a deadlocked run... ``` #0 0x7f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x7f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0 #2

[GitHub] [arrow] github-actions[bot] commented on pull request #9835: ARROW-12103: [C++] Correctly handle unaligned access in bit-unpacking code

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9835: URL: https://github.com/apache/arrow/pull/9835#issuecomment-809729397 https://issues.apache.org/jira/browse/ARROW-12103

[GitHub] [arrow] github-actions[bot] commented on pull request #9834: ARROW-12136: [Rust][DataFusion] Reduce default batch_size to 8192

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9834: URL: https://github.com/apache/arrow/pull/9834#issuecomment-809727177 https://issues.apache.org/jira/browse/ARROW-12136

[GitHub] [arrow] rok edited a comment on pull request #9841: ARROW-11070: [C++] Implement power / exponentiation compute kernel

2021-03-29 Thread GitBox
rok edited a comment on pull request #9841: URL: https://github.com/apache/arrow/pull/9841#issuecomment-809678874 I'm not sure about two things: * Is `Exponentiate` a good name or should we go with `Power`? I picked `Exponentiate` as all the other operation names are verbs so `Power`

[GitHub] [arrow] trxcllnt commented on issue #9752: TS Project unable to build with webpack

2021-03-29 Thread GitBox
trxcllnt commented on issue #9752: URL: https://github.com/apache/arrow/issues/9752#issuecomment-809690137 @westandy I replicated this error locally and found a solution from [this comment](https://github.com/graphql/graphql-js/issues/2721#issuecomment-723008284). I installed the latest

[GitHub] [arrow] rok commented on pull request #9683: ARROW-10403: [C++] Implement unique kernel for dictionary type

2021-03-29 Thread GitBox
rok commented on pull request #9683: URL: https://github.com/apache/arrow/pull/9683#issuecomment-809682505 ping :)

[GitHub] [arrow] kou closed pull request #9832: ARROW-12131: [CI][GLib] Ensure upgrading MSYS2

2021-03-29 Thread GitBox
kou closed pull request #9832: URL: https://github.com/apache/arrow/pull/9832

[GitHub] [arrow] kou commented on pull request #9832: ARROW-12131: [CI][GLib] Ensure upgrading MSYS2

2021-03-29 Thread GitBox
kou commented on pull request #9832: URL: https://github.com/apache/arrow/pull/9832#issuecomment-809679772 +1

[GitHub] [arrow] rok commented on pull request #9841: ARROW-11070: [C++] Implement power / exponentiation compute kernel

2021-03-29 Thread GitBox
rok commented on pull request #9841: URL: https://github.com/apache/arrow/pull/9841#issuecomment-809678874 I'm not sure about two things: * Is `Exponentiate` a good name or should we go with `Power`? I picked `Exponentiate` as all the other operation names are verbs so `Power` feels

[GitHub] [arrow] rok opened a new pull request #9841: ARROW-11070: [C++] Implement power / exponentiation compute kernel

2021-03-29 Thread GitBox
rok opened a new pull request #9841: URL: https://github.com/apache/arrow/pull/9841 This is to resolve [https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11070](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11070).

[GitHub] [arrow] itamarst commented on pull request #9631: ARROW-11644: [Python][Parquet] Low-level Parquet decryption in Python

2021-03-29 Thread GitBox
itamarst commented on pull request #9631: URL: https://github.com/apache/arrow/pull/9631#issuecomment-809674064 Should be ready for review now.

[GitHub] [arrow] ianmcook commented on pull request #8990: ARROW-10959: [C++] Add scalar string join kernel

2021-03-29 Thread GitBox
ianmcook commented on pull request #8990: URL: https://github.com/apache/arrow/pull/8990#issuecomment-809672648 > In SQL it's called `CONCAT`? (https://www.w3schools.com/sql/func_mysql_concat.asp, although this doesn't have the concept of a join separator) In SQL, `concat_ws` is

[GitHub] [arrow] lidavidm commented on a change in pull request #9808: ARROW-12097: [C++] Modify BackgroundGenerator so it creates fewer threads

2021-03-29 Thread GitBox
lidavidm commented on a change in pull request #9808: URL: https://github.com/apache/arrow/pull/9808#discussion_r603574136 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -947,65 +947,134 @@ AsyncGenerator MakeIteratorGenerator(Iterator it) { template class

[GitHub] [arrow] jorisvandenbossche commented on pull request #8990: ARROW-10959: [C++] Add scalar string join kernel

2021-03-29 Thread GitBox
jorisvandenbossche commented on pull request #8990: URL: https://github.com/apache/arrow/pull/8990#issuecomment-809670982 In SQL it's called `CONCAT`? (https://www.w3schools.com/sql/func_mysql_concat.asp, although this doesn't have the concept of a join separator)

[GitHub] [arrow] ianmcook commented on pull request #8990: ARROW-10959: [C++] Add scalar string join kernel

2021-03-29 Thread GitBox
ianmcook commented on pull request #8990: URL: https://github.com/apache/arrow/pull/8990#issuecomment-809669201 > > I'll just point out that "join" is not a python-ism. There is a string join in Java, Rust, C#, JavaScript, etc. and it is consistently called join. I think R is the only

[GitHub] [arrow] ianmcook commented on pull request #8990: ARROW-10959: [C++] Add scalar string join kernel

2021-03-29 Thread GitBox
ianmcook commented on pull request #8990: URL: https://github.com/apache/arrow/pull/8990#issuecomment-809664771 > I'll just point out that "join" is not a python-ism. There is a string join in Java, Rust, C#, JavaScript, etc. and it is consistently called join. I think R is the only

[GitHub] [arrow] westonpace commented on pull request #8990: ARROW-10959: [C++] Add scalar string join kernel

2021-03-29 Thread GitBox
westonpace commented on pull request #8990: URL: https://github.com/apache/arrow/pull/8990#issuecomment-809664145 I'll just point out that "join" is not a python-ism. There is a string join in Java, Rust, C#, JavaScript, etc. and it is consistently called join. I think R is the only

[GitHub] [arrow] ianmcook commented on a change in pull request #8990: ARROW-10959: [C++] Add scalar string join kernel

2021-03-29 Thread GitBox
ianmcook commented on a change in pull request #8990: URL: https://github.com/apache/arrow/pull/8990#discussion_r603563736 ## File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc ## @@ -428,6 +433,26 @@ TYPED_TEST(TestStringKernels,

[GitHub] [arrow] westandy commented on issue #9752: TS Project unable to build with webpack

2021-03-29 Thread GitBox
westandy commented on issue #9752: URL: https://github.com/apache/arrow/issues/9752#issuecomment-809657482 @trxcllnt - Neither `resolve{ enforceExtension: false}` nor does adding `.mjs` to the list of extensions fix any of the issues. ``` resolve: { enforceExtension:

[GitHub] [arrow] rodrigojdebem commented on pull request #9750: [WIP] ARROW-12021: [C++][Gandiva] Implement to_char() function on Gandiva

2021-03-29 Thread GitBox
rodrigojdebem commented on pull request #9750: URL: https://github.com/apache/arrow/pull/9750#issuecomment-809653407 @emkornfield hello! Do you know the reason for the checks being skipped? Is it because the PR is old?

[GitHub] [arrow] pitrou commented on pull request #9614: ARROW-11839: [C++] Use xsimd for generation of accelerated bit-unpacking

2021-03-29 Thread GitBox
pitrou commented on pull request #9614: URL: https://github.com/apache/arrow/pull/9614#issuecomment-809653406 TODO: switch to SafeLoad as in PR #9835
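
For readers unfamiliar with bit-unpacking: it turns a buffer of tightly packed fixed-width integers back into one value per slot. The Python sketch below assumes little-endian, LSB-first packing (the order used by Parquet's RLE/bit-packing hybrid encoding) and only illustrates what such kernels compute, not how the C++ or xsimd versions do it. Python has no alignment requirements, so the unaligned-access problem addressed in #9835 does not arise here. `unpack_bits` is a hypothetical name.

```python
def unpack_bits(packed: bytes, bit_width: int, count: int) -> list[int]:
    """Hypothetical sketch: unpack `count` unsigned integers of `bit_width`
    bits each from an LSB-first, little-endian packed buffer."""
    if count * bit_width > len(packed) * 8:
        raise ValueError("buffer too short for requested values")
    # Treat the whole buffer as one little-endian integer, so bit i of the
    # packed stream is bit i of `stream`.
    stream = int.from_bytes(packed, "little")
    mask = (1 << bit_width) - 1
    # Value i occupies bits [i * bit_width, (i + 1) * bit_width).
    return [(stream >> (i * bit_width)) & mask for i in range(count)]


# 1, 2, 3, 4 packed at 3 bits each -> 0x08D1 -> b"\xd1\x08"
print(unpack_bits(b"\xd1\x08", 3, 4))  # [1, 2, 3, 4]
```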

[GitHub] [arrow] westonpace commented on a change in pull request #9808: ARROW-12097: [C++] Modify BackgroundGenerator so it creates fewer threads

2021-03-29 Thread GitBox
westonpace commented on a change in pull request #9808: URL: https://github.com/apache/arrow/pull/9808#discussion_r603557564 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -947,65 +947,134 @@ AsyncGenerator MakeIteratorGenerator(Iterator it) { template class

[GitHub] [arrow] ianmcook commented on pull request #8990: ARROW-10959: [C++] Add scalar string join kernel

2021-03-29 Thread GitBox
ianmcook commented on pull request #8990: URL: https://github.com/apache/arrow/pull/8990#issuecomment-809650002 Regarding "join" in the name: +1 for consistency with Python's join function -1 because as Arrow gains more database-like features, the word "join" is likely to confuse
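
To make the naming discussion concrete: the operation joins the strings in each list with a separator, like Python's `str.join` or SQL's `concat_ws`. A minimal row-wise Python sketch follows. The name `join_each` and the rule that a null list or any null element yields null are assumptions for illustration, not the kernel's confirmed semantics (MySQL's `CONCAT_WS`, for instance, skips NULL arguments instead).

```python
from typing import Optional, Sequence


def join_each(
    lists: Sequence[Optional[Sequence[Optional[str]]]], sep: str
) -> list[Optional[str]]:
    """Hypothetical sketch: join the strings in each list with `sep`."""
    out: list[Optional[str]] = []
    for items in lists:
        # Assumption: a null list, or a null element inside it, gives a null result.
        if items is None or any(s is None for s in items):
            out.append(None)
        else:
            out.append(sep.join(items))
    return out


print(join_each([["a", "b", "c"], [], None, ["x", None]], "-"))
# ['a-b-c', '', None, None]
```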

[GitHub] [arrow] alamb opened a new pull request #9840: ARROW-12107: [Rust][DataFusion] Support SELECT * from information_schema.columns

2021-03-29 Thread GitBox
alamb opened a new pull request #9840: URL: https://github.com/apache/arrow/pull/9840 Note this builds on the code in #9818 so putting up as a draft until that PR is merged # Rationale Provide schema metadata access (so a user can see what columns exist and their type).

[GitHub] [arrow] lidavidm commented on a change in pull request #9808: ARROW-12097: [C++] Modify BackgroundGenerator so it creates fewer threads

2021-03-29 Thread GitBox
lidavidm commented on a change in pull request #9808: URL: https://github.com/apache/arrow/pull/9808#discussion_r603536350 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -947,65 +947,134 @@ AsyncGenerator MakeIteratorGenerator(Iterator it) { template class

[GitHub] [arrow] nealrichardson commented on pull request #9835: ARROW-12103: [C++] Correctly handle unaligned access in bit-unpacking code

2021-03-29 Thread GitBox
nealrichardson commented on pull request #9835: URL: https://github.com/apache/arrow/pull/9835#issuecomment-809626062 @github-actions crossbow submit test-ubuntu-18.04-r-sanitizer

[GitHub] [arrow] velvia commented on pull request #9773: ARROW-12028 ARROW-11940: [Rust][DataFusion] Add TimestampMillisecond support to GROUP BY/hash aggregates

2021-03-29 Thread GitBox
velvia commented on pull request #9773: URL: https://github.com/apache/arrow/pull/9773#issuecomment-809590043 @alamb and others: have added a test and I believe addressed comments, hope this is all needed!

[GitHub] [arrow] pitrou commented on pull request #9837: ARROW-12100: [C++][IPC] Allow null children field when num children is 0

2021-03-29 Thread GitBox
pitrou commented on pull request #9837: URL: https://github.com/apache/arrow/pull/9837#issuecomment-809585733 I hope @wesm can give guidance on whether this is legal or a workaround.

[GitHub] [arrow] pitrou opened a new pull request #9839: ARROW-12140: [C++][CI] Fix Valgrind failures in Grouper tests

2021-03-29 Thread GitBox
pitrou opened a new pull request #9839: URL: https://github.com/apache/arrow/pull/9839 Make sure no uninitialized bits remain in generated null bitmap.
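
Arrow validity (null) bitmaps store one bit per slot, least-significant bit first. When the slot count is not a multiple of 8, the trailing bits of the last byte are unused; if they are left uninitialized, tools like Valgrind can flag reads of that memory. A minimal Python sketch of building such a bitmap from zero-initialized storage so those padding bits are always 0 (`make_validity_bitmap` is a hypothetical helper; the Grouper tests themselves are C++):

```python
def make_validity_bitmap(valid: list[bool]) -> bytes:
    """Hypothetical sketch: pack validity flags into an LSB-first bitmap."""
    # bytearray(n) is zero-filled, so the unused high bits of the final
    # byte are guaranteed to be 0 instead of leftover memory contents.
    buf = bytearray((len(valid) + 7) // 8)
    for i, is_valid in enumerate(valid):
        if is_valid:
            buf[i // 8] |= 1 << (i % 8)  # slot i lives in bit (i % 8) of byte (i // 8)
    return bytes(buf)


print(make_validity_bitmap([True, False, True]).hex())  # 05
```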

[GitHub] [arrow] github-actions[bot] commented on pull request #9833: ARROW-12133: [C++][Gandiva] Add option to disable setting mcpu flag to host cpu during llvm ir compilation

2021-03-29 Thread GitBox
github-actions[bot] commented on pull request #9833: URL: https://github.com/apache/arrow/pull/9833#issuecomment-809580631 https://issues.apache.org/jira/browse/ARROW-12133

[GitHub] [arrow] ritchie46 commented on pull request #9506: WIP: [Rust] Remove ArrowPrimitiveType

2021-03-29 Thread GitBox
ritchie46 commented on pull request #9506: URL: https://github.com/apache/arrow/pull/9506#issuecomment-809580218 > @ritchie46 , I ended up working this out on a separate repo, https://github.com/jorgecarleitao/arrow2 . Note that there is no ArrayData, which required a brutal refactor. No

[GitHub] [arrow] emkornfield commented on pull request #9837: ARROW-12100: [C++][IPC] Allow null children field when num children is 0

2021-03-29 Thread GitBox
emkornfield commented on pull request #9837: URL: https://github.com/apache/arrow/pull/9837#issuecomment-809578565 This seems OK. I think we should clarify in the specification whether null should be allowed, and add a comment here that this is either a temporary workaround or at least a

[GitHub] [arrow] jorgecarleitao commented on pull request #9506: WIP: [Rust] Remove ArrowPrimitiveType

2021-03-29 Thread GitBox
jorgecarleitao commented on pull request #9506: URL: https://github.com/apache/arrow/pull/9506#issuecomment-809551912 @ritchie46 , I ended up working this out on a separate repo, https://github.com/jorgecarleitao/arrow2 . Note that there is no `ArrayData`, which required a brutal

[GitHub] [arrow] lidavidm opened a new pull request #9838: ARROW-12134: [C++] Add match_substring_regex kernel

2021-03-29 Thread GitBox
lidavidm opened a new pull request #9838: URL: https://github.com/apache/arrow/pull/9838 For consistency with match_substring, this is the equivalent of Python's re.search(), not re.match().
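The distinction drawn here is that `re.search()` finds the pattern anywhere in the string, while `re.match()` only matches at the start. The sketch below shows the intended per-element semantics in plain Python. `match_substring_regex_py` is a hypothetical helper, not the Arrow kernel itself, and it propagates nulls as `None`.

```python
import re


def match_substring_regex_py(values, pattern):
    """Per-element regex test with re.search() semantics; None stays None."""
    compiled = re.compile(pattern)
    return [None if v is None else compiled.search(v) is not None for v in values]


values = ["apache arrow", "arrow", None, "parquet"]
print(match_substring_regex_py(values, "arrow"))
# [True, True, None, False]

# With re.match() semantics the first element would NOT match,
# because "arrow" does not occur at position 0:
print(re.match("arrow", "apache arrow"))  # None
```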

[GitHub] [arrow] trxcllnt commented on issue #9752: TS Project unable to build with webpack

2021-03-29 Thread GitBox
trxcllnt commented on issue #9752: URL: https://github.com/apache/arrow/issues/9752#issuecomment-809537936 @westandy could you try adding `.mjs` to your `resolve.extensions` list? Seems like webpack is picking up that we have `"modules": "Arrow.dom.mjs"` in our package.json, but I think

[GitHub] [arrow] ritchie46 commented on pull request #9506: WIP: [Rust] Remove ArrowPrimitiveType

2021-03-29 Thread GitBox
ritchie46 commented on pull request #9506: URL: https://github.com/apache/arrow/pull/9506#issuecomment-809531979 I fully understand the need for such a huge refactor. I am also very scared of such a refactor, as polars is not in the repo and it would force a huge redesign. If this

[GitHub] [arrow] pitrou commented on pull request #9837: ARROW-12100: [C++][IPC] Allow null children field when num children is 0

2021-03-29 Thread GitBox
pitrou commented on pull request #9837: URL: https://github.com/apache/arrow/pull/9837#issuecomment-809521540 @emkornfield What do you think?

[GitHub] [arrow] pitrou opened a new pull request #9837: ARROW-12100: [C++][IPC] Allow null children field when num children is 0

2021-03-29 Thread GitBox
pitrou opened a new pull request #9837: URL: https://github.com/apache/arrow/pull/9837 The C# implementation seems to omit the `Field.children` when writing primitive datatypes.
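The change is about tolerating a missing (null) `children` list on a `Field` when the type has no children, which is apparently what the C# writer produces for primitive types. The sketch below applies that rule to a plain dictionary standing in for a decoded `Field`; it is only an illustration. Both the dictionary layout and the `resolve_children` helper are hypothetical, not Arrow's IPC reader API.

```python
def resolve_children(field, expected_num_children):
    """Return the field's children, treating a null list as empty only when allowed.

    `field` is a hypothetical dict standing in for a decoded Flatbuffers Field.
    """
    children = field.get("children")
    if children is None:
        if expected_num_children == 0:
            return []  # primitive type: a null children field is harmless
        raise ValueError(
            f"field {field.get('name')!r} has null children but "
            f"{expected_num_children} were expected"
        )
    if len(children) != expected_num_children:
        raise ValueError("children count does not match the type")
    return children


print(resolve_children({"name": "x", "children": None}, 0))  # []
```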

[GitHub] [arrow] frank400 commented on pull request #9724: ARROW-11986: [C++][Gandiva] Implement IN expressions for doubles and floats

2021-03-29 Thread GitBox
frank400 commented on pull request #9724: URL: https://github.com/apache/arrow/pull/9724#issuecomment-809518153 @projjal, can you please help me discover why the test TestInFloat is failing in this Travis build: https://travis-ci.com/github/apache/arrow/jobs/494494759 ? I already tried to

[GitHub] [arrow] ritchie46 commented on a change in pull request #9778: ARROW-12052: [Rust] Add Child Data to Arrow's C FFI implementation. …

2021-03-29 Thread GitBox
ritchie46 commented on a change in pull request #9778: URL: https://github.com/apache/arrow/pull/9778#discussion_r603432652 ## File path: rust/arrow/src/ffi.rs ## @@ -193,6 +206,20 @@ fn to_datatype(format: ) -> Result { "ttm" =>

[GitHub] [arrow] kszucs commented on pull request #9767: ARROW-12139: [Python][Packaging] Use vcpkg to build macOS wheels

2021-03-29 Thread GitBox
kszucs commented on pull request #9767: URL: https://github.com/apache/arrow/pull/9767#issuecomment-809515403 @github-actions crossbow submit wheel-osx-*

[GitHub] [arrow] kszucs commented on a change in pull request #9767: ARROW-12139: [Python][Packaging] Use vcpkg to build macOS wheels

2021-03-29 Thread GitBox
kszucs commented on a change in pull request #9767: URL: https://github.com/apache/arrow/pull/9767#discussion_r603423545 ## File path: dev/tasks/crossbow.py ## @@ -322,7 +322,11 @@ def create_tree(self, files): def create_commit(self, files, parents=None, message='',

[GitHub] [arrow] zeroshade commented on pull request #9836: ARROW-12138: [Go][IPC] Update flatbuffers definitions

2021-03-29 Thread GitBox
zeroshade commented on pull request #9836: URL: https://github.com/apache/arrow/pull/9836#issuecomment-809502755 All that was done for this change was to run `go run ./gen-flatbuffers.go` followed by updating the couple of files in the ipc module which had to switch direct references to

[GitHub] [arrow] zeroshade opened a new pull request #9836: ARROW-12138: [Go][IPC] Update flatbuffers definitions

2021-03-29 Thread GitBox
zeroshade opened a new pull request #9836: URL: https://github.com/apache/arrow/pull/9836 Updating the generated flatbuffer code so that newer features like compression in IPC can get implemented. The generated code is updated first, as a separate change.

[GitHub] [arrow] pitrou commented on pull request #9779: ARROW-12056: [C++] Create sequencing AsyncGenerator

2021-03-29 Thread GitBox
pitrou commented on pull request #9779: URL: https://github.com/apache/arrow/pull/9779#issuecomment-809495463 Note conflicts need fixing now.

[GitHub] [arrow] nealrichardson closed pull request #9797: ARROW-11965: [R][Docs] Simplify install.packages command in R dev docs

2021-03-29 Thread GitBox
nealrichardson closed pull request #9797: URL: https://github.com/apache/arrow/pull/9797

[GitHub] [arrow] pitrou closed pull request #9644: ARROW-11887: [C++] Add asynchronous read to streaming CSV reader

2021-03-29 Thread GitBox
pitrou closed pull request #9644: URL: https://github.com/apache/arrow/pull/9644

[GitHub] [arrow] pitrou opened a new pull request #9835: ARROW-12103: [C++] Correctly handle unaligned access in bit-unpacking code

2021-03-29 Thread GitBox
pitrou opened a new pull request #9835: URL: https://github.com/apache/arrow/pull/9835 Found by the ubuntu-r-sanitizer job.
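For context, bit-unpacking reads fixed-width values packed back to back in a byte buffer. That buffer can start at any byte offset, so code that loads it as wider integers through a cast pointer may perform misaligned reads. In C++, such loads are commonly rewritten with `memcpy`. The sketch below illustrates the layout only and is not Arrow's implementation. It assumes LSB-first packing and assembles values from individual bytes, so no alignment assumption arises. `unpack_bits` is a hypothetical name.

```python
def unpack_bits(buf, bit_width, count, byte_offset=0):
    """Unpack `count` LSB-first packed values of `bit_width` bits each.

    Reading via int.from_bytes works from any byte offset, which mirrors
    the memcpy-style loads used to avoid unaligned access in C/C++.
    """
    nbytes = (bit_width * count + 7) // 8
    acc = int.from_bytes(buf[byte_offset:byte_offset + nbytes], "little")
    mask = (1 << bit_width) - 1
    return [(acc >> (i * bit_width)) & mask for i in range(count)]


# 1, 2, 3, 4 packed in 3-bit fields -> 0x08D1, stored after one junk byte.
packed = bytes([0xFF, 0xD1, 0x08])
print(unpack_bits(packed, 3, 4, byte_offset=1))  # [1, 2, 3, 4]
```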

[GitHub] [arrow] pitrou commented on pull request #9644: ARROW-11887: [C++] Add asynchronous read to streaming CSV reader

2021-03-29 Thread GitBox
pitrou commented on pull request #9644: URL: https://github.com/apache/arrow/pull/9644#issuecomment-809490088 Looks like the R tests are hanging hopelessly: https://github.com/westonpace/arrow/runs/2219131909?check_suite_focus=true

[GitHub] [arrow] pitrou edited a comment on pull request #9644: ARROW-11887: [C++] Add asynchronous read to streaming CSV reader

2021-03-29 Thread GitBox
pitrou edited a comment on pull request #9644: URL: https://github.com/apache/arrow/pull/9644#issuecomment-809490088 Looks like the R builds are hanging hopelessly: https://github.com/westonpace/arrow/runs/2219131909?check_suite_focus=true

[GitHub] [arrow] westonpace commented on a change in pull request #9779: ARROW-12056: [C++] Create sequencing AsyncGenerator

2021-03-29 Thread GitBox
westonpace commented on a change in pull request #9779: URL: https://github.com/apache/arrow/pull/9779#discussion_r603407695 ## File path: cpp/src/arrow/util/async_generator_test.cc ## @@ -793,6 +794,118 @@ TEST(TestAsyncUtil, ReadaheadFailed) {
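As the PR title describes it, a sequencing generator takes items that may arrive out of order and re-emits them in sequence order. The synchronous Python sketch below only illustrates that reordering idea. It assumes each item carries a consecutive integer index starting at 0. It is not Arrow's C++ AsyncGenerator API, and `sequence_items` is a hypothetical name.

```python
import heapq


def sequence_items(items, first_index=0):
    """Yield (index, value) pairs in index order, buffering early arrivals.

    Assumes indices are consecutive integers with no gaps or duplicates.
    """
    pending = []  # min-heap of items that arrived before their turn
    next_index = first_index
    for item in items:
        heapq.heappush(pending, item)
        # Flush every buffered item that is now next in sequence.
        while pending and pending[0][0] == next_index:
            yield heapq.heappop(pending)
            next_index += 1
    if pending:
        raise ValueError(f"missing item with index {next_index}")


arrivals = [(2, "c"), (0, "a"), (1, "b"), (3, "d")]
print(list(sequence_items(arrivals)))
# [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
```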

[GitHub] [arrow] emkornfield commented on a change in pull request #9822: ARROW-12110: [Java] Implement ZSTD compression

2021-03-29 Thread GitBox
emkornfield commented on a change in pull request #9822: URL: https://github.com/apache/arrow/pull/9822#discussion_r603407244 ## File path: java/compression/src/main/java/org/apache/arrow/compression/ZstdCompressionCodec.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] Dandandan opened a new pull request #9834: ARROW-12136: [Rust][DataFusion] Reduce default batch_size to 8192

2021-03-29 Thread GitBox
Dandandan opened a new pull request #9834: URL: https://github.com/apache/arrow/pull/9834 I did some comparisons with different batch sizes with TPC-H on SF=1 in memory / 16 partitions. We chose a higher batch_size earlier as DF had some problems with smaller batch sizes (in hash join,
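The `batch_size` setting controls how many rows each record batch carries. The hypothetical helper below is a rough illustration only, not DataFusion code. It splits a row count into `(offset, length)` slices using the proposed default of 8192.

```python
def batch_slices(num_rows, batch_size=8192):
    """Split `num_rows` rows into (offset, length) slices of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (offset, min(batch_size, num_rows - offset))
        for offset in range(0, num_rows, batch_size)
    ]


print(batch_slices(20_000))
# [(0, 8192), (8192, 8192), (16384, 3616)]
```

A smaller batch size produces more, smaller batches for the same input. This trades per-batch overhead against memory footprint and cache locality.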

[GitHub] [arrow] emkornfield commented on pull request #9822: ARROW-12110: [Java] Implement ZSTD compression

2021-03-29 Thread GitBox
emkornfield commented on pull request #9822: URL: https://github.com/apache/arrow/pull/9822#issuecomment-809482836 Yes, I expect it to be good. I'm going to try to run some end-to-end experiments and I can report back numbers.

[GitHub] [arrow] emkornfield commented on a change in pull request #9822: ARROW-12110: [Java] Implement ZSTD compression

2021-03-29 Thread GitBox
emkornfield commented on a change in pull request #9822: URL: https://github.com/apache/arrow/pull/9822#discussion_r603400465 ## File path: java/compression/src/main/java/org/apache/arrow/compression/ZstdCompressionCodec.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] emkornfield commented on a change in pull request #9822: ARROW-12110: [Java] Implement ZSTD compression

2021-03-29 Thread GitBox
emkornfield commented on a change in pull request #9822: URL: https://github.com/apache/arrow/pull/9822#discussion_r603397345 ## File path: java/compression/src/main/java/org/apache/arrow/compression/ZstdCompressionCodec.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] Dandandan commented on a change in pull request #9818: ARROW-12106: [Rust][DataFusion] Support `SELECT * from information_schema.tables`

2021-03-29 Thread GitBox
Dandandan commented on a change in pull request #9818: URL: https://github.com/apache/arrow/pull/9818#discussion_r603386970 ## File path: rust/datafusion/src/execution/context.rs ## @@ -310,16 +324,25 @@ impl ExecutionContext { name: impl Into, catalog: Arc,

[GitHub] [arrow] lidavidm commented on pull request #9802: ARROW-10882: [Python] Allow writing dataset from iterator of batches

2021-03-29 Thread GitBox
lidavidm commented on pull request #9802: URL: https://github.com/apache/arrow/pull/9802#issuecomment-809437568 Aha, the reason why scanning a fragment is empty is because it gets constructed with an empty schema due to a spot of undefined behavior. ```cpp
