[GitHub] [arrow] mrkn commented on pull request #6302: ARROW-7633: [C++][CI] Create fuzz targets for tensors and sparse tensors

2021-01-05 Thread GitBox
mrkn commented on pull request #6302: URL: https://github.com/apache/arrow/pull/6302#issuecomment-754473308 @pitrou I'm sorry for the late response. I've committed the fixes for two comments. And if I should move `MakeRandomTensor`, I'll add a commit to do it.

[GitHub] [arrow] nevi-me commented on pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2021-01-05 Thread GitBox
nevi-me commented on pull request #9025: URL: https://github.com/apache/arrow/pull/9025#issuecomment-754477186 Please enable `generate_custom_metadata_case()` in `dev/archery/archery/integration/datagen.py`. You'll see the Rust case disabled. That integration test fails with ```

[GitHub] [arrow] KirillLykov commented on a change in pull request #9079: ARROW-10578: [C++] Comparison kernels crashing for string array with null string scalar

2021-01-05 Thread GitBox
KirillLykov commented on a change in pull request #9079: URL: https://github.com/apache/arrow/pull/9079#discussion_r551791854 ## File path: python/pyarrow/tests/test_compute.py ## @@ -803,12 +836,17 @@ def con(values): return pa.array(values) def con(values): return pa

[GitHub] [arrow] jorisvandenbossche closed pull request #9079: ARROW-10578: [C++] Comparison kernels crashing for string array with null string scalar

2021-01-05 Thread GitBox
jorisvandenbossche closed pull request #9079: URL: https://github.com/apache/arrow/pull/9079 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [arrow] westonpace opened a new pull request #9100: Added a test to try and reproduce arrow-11067. Since it can only fai…

2021-01-05 Thread GitBox
westonpace opened a new pull request #9100: URL: https://github.com/apache/arrow/pull/9100 …l on mac I'm going to commit just the test first to see if it fails properly and I can avoid setting up a mac devenv.

[GitHub] [arrow] github-actions[bot] commented on pull request #9100: Added a test to try and reproduce arrow-11067. Since it can only fai…

2021-01-05 Thread GitBox
github-actions[bot] commented on pull request #9100: URL: https://github.com/apache/arrow/pull/9100#issuecomment-754533608 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] pitrou commented on pull request #9100: Added a test to try and reproduce arrow-11067. Since it can only fai…

2021-01-05 Thread GitBox
pitrou commented on pull request #9100: URL: https://github.com/apache/arrow/pull/9100#issuecomment-754557189 I restarted the MacOS jobs because they did fail, but not for the right reason :-(

[GitHub] [arrow] pitrou commented on a change in pull request #9100: Added a test to try and reproduce arrow-11067. Since it can only fai…

2021-01-05 Thread GitBox
pitrou commented on a change in pull request #9100: URL: https://github.com/apache/arrow/pull/9100#discussion_r551852896 ## File path: cpp/src/arrow/util/trie_test.cc ## @@ -175,6 +175,15 @@ TEST(Trie, EmptyString) { ASSERT_EQ(-1, trie.Find("x")); } +TEST(Trie, LongString

[GitHub] [arrow] kszucs commented on pull request #8915: ARROW-10904: [Python] Add support for Python 3.9 macOS wheels

2021-01-05 Thread GitBox
kszucs commented on pull request #8915: URL: https://github.com/apache/arrow/pull/8915#issuecomment-754560147 I'm afraid that it is enforced at organization level, so we need to update that action.

[GitHub] [arrow] mqy opened a new pull request #9101: ARROW-11131: [Rust] Improve performance of bool_equal

2021-01-05 Thread GitBox
mqy opened a new pull request #9101: URL: https://github.com/apache/arrow/pull/9101 This PR follows https://github.com/apache/arrow/pull/8541: 1. Implement the logic when both `lhs` and `rhs` have zero null count. 2. May fix a possible condition testing bug in `(0..len).all(|i|

[GitHub] [arrow] github-actions[bot] commented on pull request #9101: ARROW-11131: [Rust] Improve performance of bool_equal

2021-01-05 Thread GitBox
github-actions[bot] commented on pull request #9101: URL: https://github.com/apache/arrow/pull/9101#issuecomment-754570371 https://issues.apache.org/jira/browse/ARROW-11131

[GitHub] [arrow] liyafan82 commented on a change in pull request #8963: ARROW-10962: [FlightRPC][Java] fill in empty body buffer if needed

2021-01-05 Thread GitBox
liyafan82 commented on a change in pull request #8963: URL: https://github.com/apache/arrow/pull/8963#discussion_r551866156 ## File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/TestBasicOperation.java ## @@ -317,6 +325,71 @@ private void test(BiConsumer

[GitHub] [arrow] liyafan82 commented on a change in pull request #8963: ARROW-10962: [FlightRPC][Java] fill in empty body buffer if needed

2021-01-05 Thread GitBox
liyafan82 commented on a change in pull request #8963: URL: https://github.com/apache/arrow/pull/8963#discussion_r551866948 ## File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/TestBasicOperation.java ## @@ -317,6 +325,71 @@ private void test(BiConsumer

[GitHub] [arrow] liyafan82 commented on a change in pull request #8963: ARROW-10962: [FlightRPC][Java] fill in empty body buffer if needed

2021-01-05 Thread GitBox
liyafan82 commented on a change in pull request #8963: URL: https://github.com/apache/arrow/pull/8963#discussion_r551867296 ## File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/TestBasicOperation.java ## @@ -317,6 +325,71 @@ private void test(BiConsumer

[GitHub] [arrow] Dandandan commented on pull request #9070: ARROW-11030: [Rust][DataFusion] Concatenate left side batches to single batch in HashJoinExec

2021-01-05 Thread GitBox
Dandandan commented on pull request #9070: URL: https://github.com/apache/arrow/pull/9070#issuecomment-754573137 @jorgecarleitao @andygrove Do you think this design is good enough for now? For now it means anyone storing > 4GB variable size data in a build-side column should opt to u

[GitHub] [arrow] liyafan82 commented on a change in pull request #9088: ARROW-11114: [Java] Fix Schema and Field metadata JSON serialization

2021-01-05 Thread GitBox
liyafan82 commented on a change in pull request #9088: URL: https://github.com/apache/arrow/pull/9088#discussion_r551868939 ## File path: java/vector/src/test/java/org/apache/arrow/vector/types/pojo/TestField.java ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [arrow] pitrou opened a new pull request #9102: ARROW-10955: [C++] Fix JSON reading of list(null) values

2021-01-05 Thread GitBox
pitrou opened a new pull request #9102: URL: https://github.com/apache/arrow/pull/9102

[GitHub] [arrow] pitrou commented on a change in pull request #9097: ARROW-10881: [C++] Fix EXC_BAD_ACCESS in PutSpaced

2021-01-05 Thread GitBox
pitrou commented on a change in pull request #9097: URL: https://github.com/apache/arrow/pull/9097#discussion_r551874341 ## File path: cpp/src/parquet/encoding.cc ## @@ -110,11 +110,16 @@ class PlainEncoder : public EncoderImpl, virtual public TypedEncoder { void PutSpace

[GitHub] [arrow] codecov-io commented on pull request #9101: ARROW-11131: [Rust] Improve performance of bool_equal

2021-01-05 Thread GitBox
codecov-io commented on pull request #9101: URL: https://github.com/apache/arrow/pull/9101#issuecomment-754580244 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9101?src=pr&el=h1) Report > Merging [#9101](https://codecov.io/gh/apache/arrow/pull/9101?src=pr&el=desc) (4dd6bfb) into

[GitHub] [arrow] github-actions[bot] commented on pull request #9102: ARROW-10955: [C++] Fix JSON reading of list(null) values

2021-01-05 Thread GitBox
github-actions[bot] commented on pull request #9102: URL: https://github.com/apache/arrow/pull/9102#issuecomment-754584408 https://issues.apache.org/jira/browse/ARROW-10955

[GitHub] [arrow] alamb commented on a change in pull request #9086: [Rust] [DataFusion] [Experiment] Blocking threads filter

2021-01-05 Thread GitBox
alamb commented on a change in pull request #9086: URL: https://github.com/apache/arrow/pull/9086#discussion_r551883985 ## File path: rust/datafusion/src/physical_plan/filter.rs ## @@ -103,25 +103,23 @@ impl ExecutionPlan for FilterExec { } async fn execute(&self, p

[GitHub] [arrow] kszucs opened a new pull request #9103: [CI] Use pip to install crossbow's dependencies for the comment bot

2021-01-05 Thread GitBox
kszucs opened a new pull request #9103: URL: https://github.com/apache/arrow/pull/9103

[GitHub] [arrow] alamb commented on a change in pull request #9093: ARROW-11125: [Rust] Logical equality for list arrays

2021-01-05 Thread GitBox
alamb commented on a change in pull request #9093: URL: https://github.com/apache/arrow/pull/9093#discussion_r551885031 ## File path: rust/datafusion/tests/sql.rs ## @@ -132,6 +132,7 @@ async fn parquet_single_nan_schema() { } #[tokio::test] +#[ignore = "Test ignored, will

[GitHub] [arrow] alamb commented on a change in pull request #9064: ARROW-11074: [Rust][DataFusion] Implement predicate push-down for parquet tables

2021-01-05 Thread GitBox
alamb commented on a change in pull request #9064: URL: https://github.com/apache/arrow/pull/9064#discussion_r551900306 ## File path: rust/datafusion/src/datasource/parquet.rs ## @@ -62,17 +64,37 @@ impl TableProvider for ParquetTable { self.schema.clone() } +

[GitHub] [arrow] github-actions[bot] commented on pull request #9103: [CI] Use pip to install crossbow's dependencies for the comment bot

2021-01-05 Thread GitBox
github-actions[bot] commented on pull request #9103: URL: https://github.com/apache/arrow/pull/9103#issuecomment-754610712 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] nevi-me commented on a change in pull request #9093: ARROW-11125: [Rust] Logical equality for list arrays

2021-01-05 Thread GitBox
nevi-me commented on a change in pull request #9093: URL: https://github.com/apache/arrow/pull/9093#discussion_r551913784 ## File path: rust/arrow/src/array/equal/list.rs ## @@ -71,45 +79,94 @@ fn offset_value_equal( pub(super) fn list_equal( lhs: &ArrayData, rhs: &A

[GitHub] [arrow] nevi-me commented on a change in pull request #9093: ARROW-11125: [Rust] Logical equality for list arrays

2021-01-05 Thread GitBox
nevi-me commented on a change in pull request #9093: URL: https://github.com/apache/arrow/pull/9093#discussion_r551915799 ## File path: rust/datafusion/tests/sql.rs ## @@ -132,6 +132,7 @@ async fn parquet_single_nan_schema() { } #[tokio::test] +#[ignore = "Test ignored, wil

[GitHub] [arrow] nevi-me commented on a change in pull request #9093: ARROW-11125: [Rust] Logical equality for list arrays

2021-01-05 Thread GitBox
nevi-me commented on a change in pull request #9093: URL: https://github.com/apache/arrow/pull/9093#discussion_r551919876 ## File path: rust/arrow/src/array/equal/utils.rs ## @@ -76,3 +80,185 @@ pub(super) fn equal_len( ) -> bool { lhs_values[lhs_start..(lhs_start + len)]

[GitHub] [arrow] nevi-me commented on a change in pull request #9093: ARROW-11125: [Rust] Logical equality for list arrays

2021-01-05 Thread GitBox
nevi-me commented on a change in pull request #9093: URL: https://github.com/apache/arrow/pull/9093#discussion_r551920615 ## File path: rust/arrow/src/array/equal/utils.rs ## @@ -76,3 +80,185 @@ pub(super) fn equal_len( ) -> bool { lhs_values[lhs_start..(lhs_start + len)]

[GitHub] [arrow] kszucs commented on a change in pull request #9103: ARROW-11132: [CI] Use pip to install crossbow's dependencies for the comment bot

2021-01-05 Thread GitBox
kszucs commented on a change in pull request #9103: URL: https://github.com/apache/arrow/pull/9103#discussion_r551921283 ## File path: .github/workflows/comment_bot.yml ## @@ -34,17 +34,12 @@ jobs: uses: actions/checkout@v2 with: path: arrow -

[GitHub] [arrow] github-actions[bot] commented on pull request #9103: ARROW-11132: [CI] Use pip to install crossbow's dependencies for the comment bot

2021-01-05 Thread GitBox
github-actions[bot] commented on pull request #9103: URL: https://github.com/apache/arrow/pull/9103#issuecomment-754630798 https://issues.apache.org/jira/browse/ARROW-11132

[GitHub] [arrow] codecov-io edited a comment on pull request #9089: ARROW-11122: [Rust] Added FFI support for date and time.

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #9089: URL: https://github.com/apache/arrow/pull/9089#issuecomment-753752955 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9089?src=pr&el=h1) Report > Merging [#9089](https://codecov.io/gh/apache/arrow/pull/9089?src=pr&el=desc) (887df6e)

[GitHub] [arrow] liyafan82 commented on a change in pull request #9053: ARROW-11081: [Java] Make IPC option immutable

2021-01-05 Thread GitBox
liyafan82 commented on a change in pull request #9053: URL: https://github.com/apache/arrow/pull/9053#discussion_r551933603 ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/ArrowMessage.java ## @@ -194,10 +194,9 @@ public ArrowMessage(FlightDescripto

[GitHub] [arrow] liyafan82 commented on a change in pull request #9053: ARROW-11081: [Java] Make IPC option immutable

2021-01-05 Thread GitBox
liyafan82 commented on a change in pull request #9053: URL: https://github.com/apache/arrow/pull/9053#discussion_r551933913 ## File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/TestMetadataVersion.java ## @@ -56,9 +56,8 @@ public static void setUpClass()

[GitHub] [arrow] liyucheng09 commented on pull request #8386: ARROW-10224: [Python] Add support for Python 3.9 except macOS wheel and Windows wheel

2021-01-05 Thread GitBox
liyucheng09 commented on pull request #8386: URL: https://github.com/apache/arrow/pull/8386#issuecomment-754640646 @terencehonles Thanks a lot. I installed pyarrow successfully. I was just curious that when I type `pip install pyarrow` the error log shows that it was stopped during i

[GitHub] [arrow] alamb commented on pull request #8882: ARROW-10864: [Rust] Use standard ordering for floats

2021-01-05 Thread GitBox
alamb commented on pull request #8882: URL: https://github.com/apache/arrow/pull/8882#issuecomment-754640846 I will try and look at this PR later today

[GitHub] [arrow] pitrou commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
pitrou commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551938144 ## File path: cpp/src/arrow/compute/api_scalar.h ## @@ -345,6 +345,18 @@ Result IsValid(const Datum& values, ExecContext* ctx = NULLPTR); ARROW_EXPORT Res

[GitHub] [arrow] liyafan82 commented on pull request #9053: ARROW-11081: [Java] Make IPC option immutable

2021-01-05 Thread GitBox
liyafan82 commented on pull request #9053: URL: https://github.com/apache/arrow/pull/9053#issuecomment-754651491 > @liyafan82 does this actually make a difference in benchmarks? I agree it is easier to reason about, but is there any way to avoid backward incompatibility? @emkornfield S

[GitHub] [arrow] lidavidm commented on a change in pull request #8963: ARROW-10962: [FlightRPC][Java] fill in empty body buffer if needed

2021-01-05 Thread GitBox
lidavidm commented on a change in pull request #8963: URL: https://github.com/apache/arrow/pull/8963#discussion_r551954971 ## File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/TestBasicOperation.java ## @@ -317,6 +325,71 @@ private void test(BiConsumer c

[GitHub] [arrow] lidavidm commented on a change in pull request #8963: ARROW-10962: [FlightRPC][Java] fill in empty body buffer if needed

2021-01-05 Thread GitBox
lidavidm commented on a change in pull request #8963: URL: https://github.com/apache/arrow/pull/8963#discussion_r551959223 ## File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/TestBasicOperation.java ## @@ -317,6 +325,71 @@ private void test(BiConsumer c

[GitHub] [arrow] bu2 commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551960501 ## File path: cpp/src/arrow/compute/api_scalar.h ## @@ -345,6 +345,18 @@ Result IsValid(const Datum& values, ExecContext* ctx = NULLPTR); ARROW_EXPORT Result

[GitHub] [arrow] jorisvandenbossche commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2021-01-05 Thread GitBox
jorisvandenbossche commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-754672429 Trying this out locally, I see the following strange behaviour: ``` In [90]: size = 300 In [91]: table = pa.table({'str': [str(x) for x in range(size)]})

[GitHub] [arrow] bu2 commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551969566 ## File path: cpp/src/arrow/compute/kernels/common.h ## @@ -19,6 +19,7 @@ // IWYU pragma: begin_exports +#include Review comment: Initially I incl

[GitHub] [arrow] bu2 commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551969646 ## File path: cpp/src/arrow/compute/kernels/scalar_validity.cc ## @@ -132,6 +156,11 @@ const FunctionDoc is_null_doc("Return true if null",

[GitHub] [arrow] bu2 commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551970210 ## File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc ## @@ -31,61 +31,98 @@ namespace arrow { namespace compute { +template class TestValid

[GitHub] [arrow] pitrou commented on pull request #9024: ARROW-11044: [C++] Add "replace" kernel

2021-01-05 Thread GitBox
pitrou commented on pull request #9024: URL: https://github.com/apache/arrow/pull/9024#issuecomment-754673873 I wouldn't expect a "replace" operation to do this. Instead, this looks more like a "select" operation. @wesm What do you think?

[GitHub] [arrow] bu2 commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551971142 ## File path: docs/source/cpp/compute.rst ## @@ -453,22 +453,26 @@ Structural transforms +==++

[GitHub] [arrow] bu2 commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551972306 ## File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc ## @@ -31,61 +31,98 @@ namespace arrow { namespace compute { +template class TestValid

[GitHub] [arrow] bkietz closed pull request #9102: ARROW-10955: [C++] Fix JSON reading of list(null) values

2021-01-05 Thread GitBox
bkietz closed pull request #9102: URL: https://github.com/apache/arrow/pull/9102

[GitHub] [arrow] bu2 commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551974405 ## File path: cpp/src/arrow/compute/kernels/scalar_validity_test.cc ## @@ -31,61 +31,98 @@ namespace arrow { namespace compute { +template class TestValid

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2021-01-05 Thread GitBox
jorisvandenbossche commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r551972959 ## File path: python/pyarrow/parquet.py ## @@ -319,6 +319,44 @@ def read_row_groups(self, row_groups, columns=None, use_threads=True,

[GitHub] [arrow] bkietz commented on a change in pull request #8894: ARROW-10322: [C++][Dataset] Minimize Expression

2021-01-05 Thread GitBox
bkietz commented on a change in pull request #8894: URL: https://github.com/apache/arrow/pull/8894#discussion_r551978453 ## File path: cpp/src/arrow/dataset/partition_test.cc ## @@ -21,52 +21,51 @@ #include #include -#include #include #include #include #include

[GitHub] [arrow] kszucs closed pull request #9103: ARROW-11132: [CI] Use pip to install crossbow's dependencies for the comment bot

2021-01-05 Thread GitBox
kszucs closed pull request #9103: URL: https://github.com/apache/arrow/pull/9103

[GitHub] [arrow] jorisvandenbossche commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2021-01-05 Thread GitBox
jorisvandenbossche commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-754686227 BTW, the Dataset API also gives a way to get an iterator over record batches (with `Dataset.to_batches()`). The strange thing is that this seems to have another logic o

[GitHub] [arrow] jorisvandenbossche commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2021-01-05 Thread GitBox
jorisvandenbossche commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-754691714 > The strange thing is that this seems to have another logic of how many rows are included in each batch when crossing row groups, while in the end it is also using `Ge

[GitHub] [arrow] pitrou commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
pitrou commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551989364 ## File path: cpp/src/arrow/compute/kernels/common.h ## @@ -19,6 +19,7 @@ // IWYU pragma: begin_exports +#include Review comment: We only add i

[GitHub] [arrow] pitrou commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
pitrou commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r551989576 ## File path: cpp/src/arrow/compute/api_scalar.h ## @@ -345,6 +345,18 @@ Result IsValid(const Datum& values, ExecContext* ctx = NULLPTR); ARROW_EXPORT Res

[GitHub] [arrow] pokemaster7 opened a new issue #9104: Reading Feather File from Custom Offset

2021-01-05 Thread GitBox
pokemaster7 opened a new issue #9104: URL: https://github.com/apache/arrow/issues/9104 Is it possible to embed a feather file in another file (with known offset/length) and read the feather portion in a correct and performant way? Here is a naive idea of what I'm trying to do, though

[GitHub] [arrow] andygrove commented on pull request #9070: ARROW-11030: [Rust][DataFusion] Concatenate left side batches to single batch in HashJoinExec

2021-01-05 Thread GitBox
andygrove commented on pull request #9070: URL: https://github.com/apache/arrow/pull/9070#issuecomment-754694469 @Dandandan Yes, I personally think this is fine for now (hence the approval) and since no-one has objected I think we can go ahead and merge this.

[GitHub] [arrow] kszucs commented on pull request #8915: ARROW-10904: [Python] Add support for Python 3.9 macOS wheels

2021-01-05 Thread GitBox
kszucs commented on pull request #8915: URL: https://github.com/apache/arrow/pull/8915#issuecomment-754698886

[GitHub] [arrow] nbruno commented on a change in pull request #9088: ARROW-11114: [Java] Fix Schema and Field metadata JSON serialization

2021-01-05 Thread GitBox
nbruno commented on a change in pull request #9088: URL: https://github.com/apache/arrow/pull/9088#discussion_r551997285 ## File path: java/vector/src/test/java/org/apache/arrow/vector/types/pojo/TestField.java ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [arrow] kiszk commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-01-05 Thread GitBox
kiszk commented on a change in pull request #8949: URL: https://github.com/apache/arrow/pull/8949#discussion_r551997558 ## File path: java/vector/src/main/java/org/apache/arrow/vector/compression/Lz4CompressionCodec.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Soft

[GitHub] [arrow] jorisvandenbossche commented on pull request #9024: ARROW-11044: [C++] Add "replace" kernel

2021-01-05 Thread GitBox
jorisvandenbossche commented on pull request #9024: URL: https://github.com/apache/arrow/pull/9024#issuecomment-754703794 This might actually rather be the "setitem" kernel as proposed in https://issues.apache.org/jira/browse/ARROW-9430 ? (which is something we want as well)

[GitHub] [arrow] jorgecarleitao commented on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet

2021-01-05 Thread GitBox
jorgecarleitao commented on pull request #9099: URL: https://github.com/apache/arrow/pull/9099#issuecomment-754707400 Note that this is hanging in the CI, which I think is unrelated to the CI itself.

[GitHub] [arrow] andygrove commented on a change in pull request #9086: [Rust] [DataFusion] [Experiment] Blocking threads filter

2021-01-05 Thread GitBox
andygrove commented on a change in pull request #9086: URL: https://github.com/apache/arrow/pull/9086#discussion_r552008306 ## File path: rust/datafusion/src/physical_plan/filter.rs ## @@ -103,25 +103,23 @@ impl ExecutionPlan for FilterExec { } async fn execute(&sel

[GitHub] [arrow] andygrove commented on a change in pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet

2021-01-05 Thread GitBox
andygrove commented on a change in pull request #9099: URL: https://github.com/apache/arrow/pull/9099#discussion_r552010480 ## File path: rust/datafusion/src/physical_plan/parquet.rs ## @@ -256,16 +257,21 @@ impl ExecutionPlan for ParquetExec { let projection = self.pr

[GitHub] [arrow] andygrove commented on a change in pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet

2021-01-05 Thread GitBox
andygrove commented on a change in pull request #9099: URL: https://github.com/apache/arrow/pull/9099#discussion_r552012528 ## File path: rust/datafusion/src/physical_plan/parquet.rs ## @@ -256,16 +257,21 @@ impl ExecutionPlan for ParquetExec { let projection = self.pr

[GitHub] [arrow] HedgehogCode commented on pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-01-05 Thread GitBox
HedgehogCode commented on pull request #8949: URL: https://github.com/apache/arrow/pull/8949#issuecomment-754727763 When I use the changes and try to compress and decompress an empty buffer (by using a variable sized vector with only missing values) I get a SIGSEGV: ``` # # A fatal

[GitHub] [arrow] HedgehogCode edited a comment on pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-01-05 Thread GitBox
HedgehogCode edited a comment on pull request #8949: URL: https://github.com/apache/arrow/pull/8949#issuecomment-754727763 When I use the changes and try to compress and decompress an empty buffer (by using a variable sized vector with only missing values) I get a SIGSEGV ([hs_err_pid10504

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8894: ARROW-10322: [C++][Dataset] Minimize Expression

2021-01-05 Thread GitBox
jorisvandenbossche commented on a change in pull request #8894: URL: https://github.com/apache/arrow/pull/8894#discussion_r552030630 ## File path: cpp/src/arrow/dataset/scanner.h ## @@ -62,10 +63,7 @@ class ARROW_DS_EXPORT ScanOptions { std::shared_ptr ReplaceSchema(std::sha

[GitHub] [arrow] bkietz commented on a change in pull request #8894: ARROW-10322: [C++][Dataset] Minimize Expression

2021-01-05 Thread GitBox
bkietz commented on a change in pull request #8894: URL: https://github.com/apache/arrow/pull/8894#discussion_r552038470 ## File path: cpp/src/arrow/dataset/expression.cc ## @@ -0,0 +1,1177 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

[GitHub] [arrow] westonpace commented on a change in pull request #9100: ARROW-11067: [C++] read_csv_arrow silently fails to read some strings and returns nulls

2021-01-05 Thread GitBox
westonpace commented on a change in pull request #9100: URL: https://github.com/apache/arrow/pull/9100#discussion_r552044312 ## File path: cpp/src/arrow/util/trie.h ## @@ -125,6 +126,9 @@ class ARROW_EXPORT Trie { int32_t Find(util::string_view s) const { const Node* no

[GitHub] [arrow] westonpace commented on a change in pull request #9100: ARROW-11067: [C++] read_csv_arrow silently fails to read some strings and returns nulls

2021-01-05 Thread GitBox
westonpace commented on a change in pull request #9100: URL: https://github.com/apache/arrow/pull/9100#discussion_r552047298 ## File path: cpp/src/arrow/util/trie_test.cc ## @@ -175,6 +175,15 @@ TEST(Trie, EmptyString) { ASSERT_EQ(-1, trie.Find("x")); } +TEST(Trie, LongSt

[GitHub] [arrow] github-actions[bot] commented on pull request #8915: ARROW-10904: [Python] Add support for Python 3.9 macOS wheels

2021-01-05 Thread GitBox
github-actions[bot] commented on pull request #8915: URL: https://github.com/apache/arrow/pull/8915#issuecomment-754749536 Revision: 6f13c4c268be10a8c0898d9e297b0303c35dc11f Submitted crossbow builds: [ursa-labs/crossbow @ actions-823](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] bu2 commented on a change in pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on a change in pull request #9023: URL: https://github.com/apache/arrow/pull/9023#discussion_r552049108 ## File path: cpp/src/arrow/compute/kernels/common.h ## @@ -19,6 +19,7 @@ // IWYU pragma: begin_exports +#include Review comment: Good point. Movi

[GitHub] [arrow] pitrou commented on a change in pull request #9105: ARROW-11009: [C++] Allow changing default memory pool with an environment variable

2021-01-05 Thread GitBox
pitrou commented on a change in pull request #9105: URL: https://github.com/apache/arrow/pull/9105#discussion_r552053668 ## File path: cpp/src/arrow/dataset/filter.cc ## @@ -704,28 +704,38 @@ using arrow::internal::JoinStrings; std::string AndExpression::ToString() const {

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8894: ARROW-10322: [C++][Dataset] Minimize Expression

2021-01-05 Thread GitBox
jorisvandenbossche commented on a change in pull request #8894: URL: https://github.com/apache/arrow/pull/8894#discussion_r552053599 ## File path: python/pyarrow/tests/parquet/test_dataset.py ## @@ -509,7 +509,7 @@ def test_filters_invalid_column(tempdir, use_legacy_dataset):

[GitHub] [arrow] nealrichardson commented on a change in pull request #8894: ARROW-10322: [C++][Dataset] Minimize Expression

2021-01-05 Thread GitBox
nealrichardson commented on a change in pull request #8894: URL: https://github.com/apache/arrow/pull/8894#discussion_r552055128 ## File path: python/pyarrow/tests/parquet/test_dataset.py ## @@ -509,7 +509,7 @@ def test_filters_invalid_column(tempdir, use_legacy_dataset):

[GitHub] [arrow] bu2 commented on pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
bu2 commented on pull request #9023: URL: https://github.com/apache/arrow/pull/9023#issuecomment-754755701 PR rebased!

[GitHub] [arrow] pitrou opened a new pull request #9105: ARROW-11009: [C++] Allow changing default memory pool with an environment variable

2021-01-05 Thread GitBox
pitrou opened a new pull request #9105: URL: https://github.com/apache/arrow/pull/9105 ARROW_DEFAULT_MEMORY_POOL can take the name of the desired memory pool backend ('jemalloc', 'mimalloc', 'system').
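As a rough illustration of the mechanism described in this pull request, the Python sketch below picks a backend name from the ARROW_DEFAULT_MEMORY_POOL environment variable. It is not Arrow's implementation: the helper `resolve_pool_backend`, the fallback to 'system', and the choice to ignore unknown names are all assumptions made for the example. Only the variable name and the three backend names come from the message above.

```python
import os

# Backend names listed in the pull request description.
KNOWN_BACKENDS = ("jemalloc", "mimalloc", "system")


def resolve_pool_backend(environ=None, fallback="system"):
    """Return the backend named by ARROW_DEFAULT_MEMORY_POOL.

    Hypothetical helper: the fallback value and the decision to ignore
    unknown or empty names are assumptions, not Arrow's documented behavior.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("ARROW_DEFAULT_MEMORY_POOL", "").strip()
    if name in KNOWN_BACKENDS:
        return name
    return fallback
```

Passing a plain dictionary instead of the real environment keeps the sketch easy to try out, for example `resolve_pool_backend({"ARROW_DEFAULT_MEMORY_POOL": "mimalloc"})`.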

[GitHub] [arrow] terencehonles commented on pull request #8386: ARROW-10224: [Python] Add support for Python 3.9 except macOS wheel and Windows wheel

2021-01-05 Thread GitBox
terencehonles commented on pull request #8386: URL: https://github.com/apache/arrow/pull/8386#issuecomment-754763790 > @terencehonles Thanks a lot. I installed pyarrow successfully. > > I was just curious that when I type `pip install pyarrow` the error log shows that it was stopped

[GitHub] [arrow] pitrou commented on a change in pull request #9100: ARROW-11067: [C++] Fix CSV null detection on large values

2021-01-05 Thread GitBox
pitrou commented on a change in pull request #9100: URL: https://github.com/apache/arrow/pull/9100#discussion_r552071145 ## File path: cpp/src/arrow/util/trie.h ## @@ -125,6 +126,9 @@ class ARROW_EXPORT Trie { int32_t Find(util::string_view s) const { const Node* node =

[GitHub] [arrow] pitrou commented on a change in pull request #9100: ARROW-11067: [C++] Fix CSV null detection on large values

2021-01-05 Thread GitBox
pitrou commented on a change in pull request #9100: URL: https://github.com/apache/arrow/pull/9100#discussion_r552071844 ## File path: cpp/src/arrow/util/trie_test.cc ## @@ -175,6 +175,15 @@ TEST(Trie, EmptyString) { ASSERT_EQ(-1, trie.Find("x")); } +TEST(Trie, LongString

[GitHub] [arrow] pitrou commented on pull request #9100: ARROW-11067: [C++] Fix CSV null detection on large values

2021-01-05 Thread GitBox
pitrou commented on pull request #9100: URL: https://github.com/apache/arrow/pull/9100#issuecomment-754772279 @westonpace is GitHub Actions enabled on your fork?

[GitHub] [arrow] bu2 commented on pull request #9024: ARROW-11044: [C++] Add "replace" kernel

2021-01-05 Thread GitBox
bu2 commented on pull request #9024: URL: https://github.com/apache/arrow/pull/9024#issuecomment-754778829 @jorisvandenbossche: Thank you for bringing up [ARROW-9430](https://issues.apache.org/jira/browse/ARROW-9430). Then @all please tell me what would be a good name. I may h

[GitHub] [arrow] pitrou commented on pull request #9024: ARROW-11044: [C++] Add "replace" kernel

2021-01-05 Thread GitBox
pitrou commented on pull request #9024: URL: https://github.com/apache/arrow/pull/9024#issuecomment-754786007 "setitem" is confusing to me (I would expect something where you pass indices and values to set at those indices). "select" is used in [Numpy](https://numpy.org/doc/stable/referenc

[GitHub] [arrow] github-actions[bot] commented on pull request #9100: ARROW-11067: [C++] Fix CSV null detection on large values

2021-01-05 Thread GitBox
github-actions[bot] commented on pull request #9100: URL: https://github.com/apache/arrow/pull/9100#issuecomment-754789122 https://issues.apache.org/jira/browse/ARROW-11067

[GitHub] [arrow] Dandandan commented on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet

2021-01-05 Thread GitBox
Dandandan commented on pull request #9099: URL: https://github.com/apache/arrow/pull/9099#issuecomment-754789551 It seems to work locally now, thanks @andygrove. The CI jobs seem to have been queued for quite some time.

[GitHub] [arrow] westonpace commented on pull request #9100: ARROW-11067: [C++] Fix CSV null detection on large values

2021-01-05 Thread GitBox
westonpace commented on pull request #9100: URL: https://github.com/apache/arrow/pull/9100#issuecomment-754793203 @pitrou I'm pretty sure it is. Is there something I need to check?

[GitHub] [arrow] pitrou commented on pull request #9100: ARROW-11067: [C++] Fix CSV null detection on large values

2021-01-05 Thread GitBox
pitrou commented on pull request #9100: URL: https://github.com/apache/arrow/pull/9100#issuecomment-754794793 @westonpace MinGW Windows builds I think.

[GitHub] [arrow] pitrou edited a comment on pull request #9100: ARROW-11067: [C++] Fix CSV null detection on large values

2021-01-05 Thread GitBox
pitrou edited a comment on pull request #9100: URL: https://github.com/apache/arrow/pull/9100#issuecomment-754794793 @westonpace C++ MinGW Windows builds I think.

[GitHub] [arrow] pitrou commented on pull request #8805: ARROW-10725: [Python][Compute] Expose sort options in Python bindings

2021-01-05 Thread GitBox
pitrou commented on pull request #8805: URL: https://github.com/apache/arrow/pull/8805#issuecomment-754816122 @jorisvandenbossche Is this ready for review again?

[GitHub] [arrow] github-actions[bot] commented on pull request #9105: ARROW-11009: [C++] Allow changing default memory pool with an environment variable

2021-01-05 Thread GitBox
github-actions[bot] commented on pull request #9105: URL: https://github.com/apache/arrow/pull/9105#issuecomment-754816274 https://issues.apache.org/jira/browse/ARROW-11009

[GitHub] [arrow] jorisvandenbossche commented on pull request #9024: ARROW-11044: [C++] Add "replace" kernel

2021-01-05 Thread GitBox
jorisvandenbossche commented on pull request #9024: URL: https://github.com/apache/arrow/pull/9024#issuecomment-754822330 > "setitem" is confusing to me (I would expect something where you pass indices and values to set at those indices). That's https://issues.apache.org/jira/browse/

[GitHub] [arrow] pitrou closed pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
pitrou closed pull request #9023: URL: https://github.com/apache/arrow/pull/9023

[GitHub] [arrow] jorisvandenbossche commented on pull request #8805: ARROW-10725: [Python][Compute] Expose sort options in Python bindings

2021-01-05 Thread GitBox
jorisvandenbossche commented on pull request #8805: URL: https://github.com/apache/arrow/pull/8805#issuecomment-754839607 I haven't yet updated it to reflect the discussion on the keyword arguments API above at https://github.com/apache/arrow/pull/8805#discussion_r542440135 We probably

[GitHub] [arrow] nevi-me closed pull request #9093: ARROW-11125: [Rust] Logical equality for list arrays

2021-01-05 Thread GitBox
nevi-me closed pull request #9093: URL: https://github.com/apache/arrow/pull/9093

[GitHub] [arrow] pitrou commented on pull request #9024: ARROW-11044: [C++] Add "replace" kernel

2021-01-05 Thread GitBox
pitrou commented on pull request #9024: URL: https://github.com/apache/arrow/pull/9024#issuecomment-754836583 "if_else", "choose", "select" are all fine with me.
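The naming distinction in this thread can be sketched in plain Python. This is only an illustration under assumptions: the function names `set_at_indices` and `if_else` are hypothetical stand-ins, the list-based semantics are guessed from pitrou's earlier remark that "setitem" suggests passing indices and values, and nothing here reflects the signature of the kernel actually proposed in the pull request.

```python
def set_at_indices(values, indices, new_values):
    """'setitem'-style (assumed semantics): write new_values at given positions."""
    out = list(values)  # copy so the input is left untouched
    for i, v in zip(indices, new_values):
        out[i] = v
    return out


def if_else(mask, if_true, if_false):
    """'if_else'/'select'-style (assumed semantics): pick element-wise by a boolean mask."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]
```

The first form addresses elements by position, while the second walks three equal-length sequences in lockstep, which is closer to what the names "if_else", "choose", and "select" suggest.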

[GitHub] [arrow] jorisvandenbossche commented on pull request #9023: ARROW-11043: [C++] Add "is_nan" kernel

2021-01-05 Thread GitBox
jorisvandenbossche commented on pull request #9023: URL: https://github.com/apache/arrow/pull/9023#issuecomment-754846677 Thanks a lot @bu2 !

[GitHub] [arrow] pitrou commented on pull request #9097: ARROW-10881: [C++] Fix EXC_BAD_ACCESS in PutSpaced

2021-01-05 Thread GitBox
pitrou commented on pull request #9097: URL: https://github.com/apache/arrow/pull/9097#issuecomment-754828459 CI tests are green on my fork (except for the usual Homebrew dependency install failures).
