[GitHub] [arrow-datafusion] xudong963 commented on issue #1515: high level roadmap for Arrow / Datafusion

2022-01-03 Thread GitBox
xudong963 commented on issue #1515: URL: https://github.com/apache/arrow-datafusion/issues/1515#issuecomment-1003966933 > > Makes sense to me. > > BTW, I don't have edit permission in https://docs.google.com/document/d/1t64vZwZnXm9MyFj2qz3xcAkSxK3Wu12giS3KrS4nDE0/edit > > @xudong

[GitHub] [arrow-datafusion] Igosuki commented on pull request #68: Experimenting with arrow2

2022-01-03 Thread GitBox
Igosuki commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-1003995167 I merged the latest master and did some necessary updates to arrow2 here https://github.com/Igosuki/arrow-datafusion/tree/arrow22r (didn't fix handling timestamp logical t

[GitHub] [arrow-rs] alamb commented on a change in pull request #1127: *_dyn_scalar kernels: Support Float32Array and Float64Array,

2022-01-03 Thread GitBox
alamb commented on a change in pull request #1127: URL: https://github.com/apache/arrow-rs/pull/1127#discussion_r777220437 ## File path: arrow/src/array/array.rs ## @@ -227,6 +227,26 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual { /// A reference-counted reference t

[GitHub] [arrow-rs] alamb commented on pull request #1127: *_dyn_scalar kernels: Support Float32Array and Float64Array,

2022-01-03 Thread GitBox
alamb commented on pull request #1127: URL: https://github.com/apache/arrow-rs/pull/1127#issuecomment-1004026581 > I will review this pr, if it's ready. Thank you @liukun4515 -- the only reason I didn't mark it as ready is I wasn't sure about the change in https://github.com/apache/
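
The PR title says the `*_dyn_scalar` kernels gain support for Float32Array and Float64Array. The sketch below is a simplified, hypothetical model of that kind of runtime-type dispatch. The `DynArray` enum and the `eq_dyn_scalar(array, f64)` signature are assumptions for illustration only and are not the arrow-rs API. It shows how each array type can receive the scalar in a form that avoids surprising matches.

```rust
/// Hypothetical, simplified type-erased array (one variant per supported type).
/// This is NOT the arrow-rs `ArrayRef`; it only models the dispatch.
pub enum DynArray {
    Int32(Vec<Option<i32>>),
    Float32(Vec<Option<f32>>),
    Float64(Vec<Option<f64>>),
}

/// Element-wise `==` against a scalar already in the array's native type.
/// Nulls stay null in the output.
fn eq_native<T: Copy + PartialEq>(values: &[Option<T>], scalar: T) -> Vec<Option<bool>> {
    values.iter().map(|&v| v.map(|x| x == scalar)).collect()
}

/// Sketch of an `eq_dyn_scalar`-style kernel: dispatch on the runtime type.
pub fn eq_dyn_scalar(array: &DynArray, scalar: f64) -> Vec<Option<bool>> {
    match array {
        // Every i32 is exactly representable as f64, so compare in f64;
        // this way a scalar of 1.5 never matches 1 through truncation.
        DynArray::Int32(v) => v.iter().map(|&o| o.map(|x| f64::from(x) == scalar)).collect(),
        // Narrow the scalar to f32 so that e.g. 0.1 matches a stored 0.1f32.
        DynArray::Float32(v) => eq_native(v, scalar as f32),
        DynArray::Float64(v) => eq_native(v, scalar),
    }
}
```

Narrowing the scalar for Float32 and widening the values for Int32 are deliberate choices in this sketch. Either direction works for some inputs and surprises for others.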

[GitHub] [arrow] amol- commented on pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
amol- commented on pull request #11886: URL: https://github.com/apache/arrow/pull/11886#issuecomment-1004028229 > I'm only thinking about this now, but does this function work on chunked arrays? Can you add a test for that? Added a test for chunkedarray -- This is an automated mes
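
The review question above concerns whether `indices_nonzero` works on chunked arrays. The sketch below shows what that involves: each chunk's local index must be shifted by the total length of the preceding chunks. It is a hypothetical model, not the Arrow C++ kernel. Chunks are modelled as `Vec<Option<f64>>`, null slots are assumed to be skipped, and indices are assumed to be returned as `u64`.

```rust
/// Sketch: global indices of non-zero, non-null values across a chunked input.
/// Assumptions: nulls are skipped; indices count across all chunks.
pub fn indices_nonzero_chunked(chunks: &[Vec<Option<f64>>]) -> Vec<u64> {
    let mut out = Vec::new();
    // Number of slots (including nulls) in all chunks seen so far.
    let mut offset: u64 = 0;
    for chunk in chunks {
        for (i, v) in chunk.iter().enumerate() {
            if let Some(x) = v {
                if *x != 0.0 {
                    out.push(offset + i as u64);
                }
            }
        }
        offset += chunk.len() as u64;
    }
    out
}
```

Empty chunks contribute nothing but must still be handled. They leave the offset unchanged, which is why a chunked-array test with an empty chunk is a useful case.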

[GitHub] [arrow-datafusion] alamb commented on pull request #68: Experimenting with arrow2

2022-01-03 Thread GitBox
alamb commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-1004031259 This is very cool work @Igosuki 👍

[GitHub] [arrow-rs] alamb commented on issue #1128: Implement `Array` for `ArrayRef`

2022-01-03 Thread GitBox
alamb commented on issue #1128: URL: https://github.com/apache/arrow-rs/issues/1128#issuecomment-1004033560 > Would it work for the kernels to take impl AsRef or similar. I tried doing that in > I think implementing Array for ArrayRef as proposed will otherwise result in two

[GitHub] [arrow-rs] alamb edited a comment on issue #1128: Implement `Array` for `ArrayRef`

2022-01-03 Thread GitBox
alamb edited a comment on issue #1128: URL: https://github.com/apache/arrow-rs/issues/1128#issuecomment-1004033560 > Would it work for the kernels to take impl AsRef or similar. I tried doing that in -- I got stuck in lifetimes with `as_primitive_array` -- will get a cutdown example

[GitHub] [arrow] rok commented on a change in pull request #11990: ARROW-15032: [C++] Add DateStruct Function

2022-01-03 Thread GitBox
rok commented on a change in pull request #11990: URL: https://github.com/apache/arrow/pull/11990#discussion_r777436434 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc ## @@ -580,6 +627,7 @@ TEST_F(ScalarTemporalTest, TestZoned2) { CheckScalarUnary("yea

[GitHub] [arrow-datafusion] yjshen commented on pull request #68: Experimenting with arrow2

2022-01-03 Thread GitBox
yjshen commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-1004049147 oops, I think we've done something repetitive here. @houqp and I also merged the latest master except for the Avro feature [here](https://github.com/houqp/arrow-datafusion

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #68: Experimenting with arrow2

2022-01-03 Thread GitBox
yjshen edited a comment on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-1004049147 oops, I think we've done something repetitive. @houqp and I also merged the latest master except for the Avro feature [here](https://github.com/houqp/arrow-datafusi

[GitHub] [arrow-rs] alamb commented on issue #1128: Implement `Array` for `ArrayRef`

2022-01-03 Thread GitBox
alamb commented on issue #1128: URL: https://github.com/apache/arrow-rs/issues/1128#issuecomment-1004051178 So what I am really trying to do is the following ## Initial state: ```rust pub fn as_primitive_array(arr: &ArrayRef) -> &PrimitiveArray where T: ArrowPri

[GitHub] [arrow] ahadnagy commented on pull request #11863: ARROW-14906: [C++] Enable CSV Writer to control the type of escape used for quoting

2022-01-03 Thread GitBox
ahadnagy commented on pull request #11863: URL: https://github.com/apache/arrow/pull/11863#issuecomment-1004062137 > 2. It seems it would be nice to have a common setting for quoting and escaping (either you quote or you escape special chars, not both at once) These are different opt

[GitHub] [arrow] pitrou commented on pull request #11863: ARROW-14906: [C++] Enable CSV Writer to control the type of escape used for quoting

2022-01-03 Thread GitBox
pitrou commented on pull request #11863: URL: https://github.com/apache/arrow/pull/11863#issuecomment-1004066106 @ahadnagy If you have both quoting and escaping enabled, how do you decide whether you should quote or escape a special character?

[GitHub] [arrow-datafusion] alamb commented on pull request #68: Experimenting with arrow2

2022-01-03 Thread GitBox
alamb commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-1004066438 @yjshen / @houqp / @Igosuki -- how close would you say your branches using arrow2 are ready to be used more broadly? Specifically, are you at the point where it would be

[GitHub] [arrow] pitrou commented on pull request #11841: ARROW-13923: [C++] Faster CSV chunker with long CSV cells

2022-01-03 Thread GitBox
pitrou commented on pull request #11841: URL: https://github.com/apache/arrow/pull/11841#issuecomment-1004068831 The performance seems unfortunately fragile with this approach. Updated benchmark numbers (again): ```

[GitHub] [arrow-rs] alamb commented on pull request #1130: Fix reading of dictionary encoded pages with null values (#1111)

2022-01-03 Thread GitBox
alamb commented on pull request #1130: URL: https://github.com/apache/arrow-rs/pull/1130#issuecomment-1004071113 Thank you @yordan-pavlov -- I plan to review / test this patch later today

[GitHub] [arrow] ahadnagy commented on pull request #11863: ARROW-14906: [C++] Enable CSV Writer to control the type of escape used for quoting

2022-01-03 Thread GitBox
ahadnagy commented on pull request #11863: URL: https://github.com/apache/arrow/pull/11863#issuecomment-1004072707 Currently, the escape character is being used (this aligns with readr). The Pandas csv_writer has a `doublequote` parameter to control quoting/escaping inside a field explic

[GitHub] [arrow] pitrou commented on pull request #11863: ARROW-14906: [C++] Enable CSV Writer to control the type of escape used for quoting

2022-01-03 Thread GitBox
pitrou commented on pull request #11863: URL: https://github.com/apache/arrow/pull/11863#issuecomment-1004074040 Right, but this also makes the whole API confusing (why is the escape character being used? what if both double-quoting and escaping are enabled?).
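
The thread above asks what should happen when both double-quoting and escaping are enabled. One way to sidestep that question is to model the choice as a single enum rather than two independent flags. The sketch below takes that approach. It is hypothetical and not the Arrow C++ CSV writer API. The names loosely echo Python's `csv` dialect options (`doublequote`, `escapechar`), but the sketch does not reproduce their exact semantics.

```rust
/// How a quote character occurring inside a quoted field is written out.
/// One enum (rather than two booleans) makes "both enabled" unrepresentable.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum QuoteEscape {
    /// RFC 4180 style: `"` is written as `""`.
    DoubleQuote,
    /// The quote is prefixed with an escape character, e.g. `\"`.
    EscapeChar(char),
}

/// Writes `value` as a quoted CSV field according to `policy`.
pub fn write_quoted_field(value: &str, policy: QuoteEscape) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for c in value.chars() {
        match policy {
            QuoteEscape::DoubleQuote if c == '"' => out.push_str("\"\""),
            QuoteEscape::EscapeChar(e) if c == '"' || c == e => {
                // The escape character itself must be escaped too, otherwise a
                // literal `\"` in the data would be read back as an escaped quote.
                out.push(e);
                out.push(c);
            }
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}
```

For the input `say "hi"`, `DoubleQuote` produces `"say ""hi"""` and `EscapeChar('\\')` produces `"say \"hi\""`.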

[GitHub] [arrow] lidavidm commented on a change in pull request #11990: ARROW-15032: [C++] Add DateStruct Function

2022-01-03 Thread GitBox
lidavidm commented on a change in pull request #11990: URL: https://github.com/apache/arrow/pull/11990#discussion_r77742 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc ## @@ -546,6 +591,23 @@ TEST_F(ScalarTemporalTest, TestZoned2) { auto unit = tim

[GitHub] [arrow-rs] tustvold commented on issue #1128: Implement `Array` for `ArrayRef`

2022-01-03 Thread GitBox
tustvold commented on issue #1128: URL: https://github.com/apache/arrow-rs/issues/1128#issuecomment-1004081792 Perhaps `fn as_primitive_array(arr: &impl AsRef) -> &PrimitiveArray`? If you explicitly specify the lifetimes it is easier to see what the compiler is complaining about
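
The generic parameters in the suggested signature above were stripped from the archived text. The sketch below assumes the intended bound was `AsRef<dyn Array>`. It shows how spelling out the lifetime makes it clear that the returned reference borrows from the argument. `Array`, `Int32Array` and `ArrayRef` here are hypothetical, heavily simplified stand-ins for the arrow-rs types, and `as_int32_array` is an illustrative name. The standard library implements `AsRef<T>` for `Arc<T>`, so `&ArrayRef` satisfies the bound.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical, heavily simplified stand-ins for arrow-rs types.
pub trait Array {
    fn as_any(&self) -> &dyn Any;
}

pub struct Int32Array {
    pub values: Vec<i32>,
}

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Lets a concrete array be passed wherever `impl AsRef<dyn Array>` is accepted.
impl AsRef<dyn Array> for Int32Array {
    fn as_ref(&self) -> &(dyn Array + 'static) {
        self
    }
}

pub type ArrayRef = Arc<dyn Array>;

/// Accepts `&ArrayRef` or `&Int32Array`. The explicit `'a` shows that the
/// result borrows from `arr` itself, not from a temporary.
pub fn as_int32_array<'a>(arr: &'a impl AsRef<dyn Array>) -> &'a Int32Array {
    arr.as_ref()
        .as_any()
        .downcast_ref::<Int32Array>()
        .expect("expected an Int32Array")
}
```

Both `as_int32_array(&concrete)` and `as_int32_array(&shared)` compile, where `shared: ArrayRef`. No `impl Array for ArrayRef` is needed for that.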

[GitHub] [arrow-rs] tustvold edited a comment on issue #1128: Implement `Array` for `ArrayRef`

2022-01-03 Thread GitBox
tustvold edited a comment on issue #1128: URL: https://github.com/apache/arrow-rs/issues/1128#issuecomment-1004081792 Perhaps `fn as_primitive_array(arr: &impl AsRef) -> &PrimitiveArray`? If you explicitly specify the lifetimes it is easier to see what the compiler is complaining ab

[GitHub] [arrow] rok commented on a change in pull request #11990: ARROW-15032: [C++] Add DateStruct Function

2022-01-03 Thread GitBox
rok commented on a change in pull request #11990: URL: https://github.com/apache/arrow/pull/11990#discussion_r777474147 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc ## @@ -546,6 +591,23 @@ TEST_F(ScalarTemporalTest, TestZoned2) { auto unit = timestam
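
The excerpts for ARROW-15032 are truncated. The sketch below therefore assumes the "DateStruct" function decomposes a timestamp into a struct of year, month and day fields. It uses Howard Hinnant's public-domain `civil_from_days` algorithm. Timestamps are assumed to be in seconds and UTC; the zoned cases tested in the PR are not handled here. `YearMonthDay` and `year_month_day` are hypothetical names.

```rust
/// Converts days since 1970-01-01 into a proleptic Gregorian (year, month, day),
/// following Howard Hinnant's `civil_from_days` algorithm.
pub fn civil_from_days(days: i64) -> (i64, u32, u32) {
    // Shift the epoch to 0000-03-01 so the leap day falls at the end of a year.
    let z = days + 719_468;
    // 400-year eras of 146_097 days each (floor division for negative z).
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    (year, month, day)
}

/// Hypothetical struct-shaped output, one field per date component.
#[derive(Debug, PartialEq)]
pub struct YearMonthDay {
    pub year: i64,
    pub month: u32,
    pub day: u32,
}

/// Sketch: decompose UTC timestamps (seconds since the epoch).
pub fn year_month_day(timestamps_s: &[i64]) -> Vec<YearMonthDay> {
    timestamps_s
        .iter()
        .map(|&t| {
            // Euclidean division so that -1 s maps to 1969-12-31, not 1970-01-01.
            let (year, month, day) = civil_from_days(t.div_euclid(86_400));
            YearMonthDay { year, month, day }
        })
        .collect()
}
```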

[GitHub] [arrow] paleolimbot opened a new pull request #12062: ARROW-15173: [R] Provide backward compatibility for bridge to older versions of pyarrow

2022-01-03 Thread GitBox
paleolimbot opened a new pull request #12062: URL: https://github.com/apache/arrow/pull/12062 This PR updates the pointer logic that changed in #12011 (ARROW-15169) to make sure that users can use the arrow R package to communicate with pyarrow that hasn't been upgraded yet. Should

[GitHub] [arrow] github-actions[bot] commented on pull request #12062: ARROW-15173: [R] Provide backward compatibility for bridge to older versions of pyarrow

2022-01-03 Thread GitBox
github-actions[bot] commented on pull request #12062: URL: https://github.com/apache/arrow/pull/12062#issuecomment-1004093200

[GitHub] [arrow] pitrou commented on a change in pull request #12030: ARROW-9186: [R] Allow specifying CSV file encoding

2022-01-03 Thread GitBox
pitrou commented on a change in pull request #12030: URL: https://github.com/apache/arrow/pull/12030#discussion_r777481461 ## File path: r/src/io.cpp ## @@ -178,4 +180,134 @@ void io___BufferOutputStream__Write( StopIfNotOk(stream->Write(RAW(bytes), bytes.size())); } +//

[GitHub] [arrow] jonkeane commented on pull request #12062: ARROW-15173: [R] Provide backward compatibility for bridge to older versions of pyarrow

2022-01-03 Thread GitBox
jonkeane commented on pull request #12062: URL: https://github.com/apache/arrow/pull/12062#issuecomment-1004105805 > ...but we don't have a good way to test against the old pyarrow version in our tests (unless @jonkeane can think of one!) I'm not sure we need to test the backwards co

[GitHub] [arrow-rs] andyredhead commented on issue #180: Parquet does not support wasm32-unknown-unknown target

2022-01-03 Thread GitBox
andyredhead commented on issue #180: URL: https://github.com/apache/arrow-rs/issues/180#issuecomment-1004108660 I had a play with getting a minimalist apache (v1) arrow & parquet to compile to wasm32-unknown-unknown back in July/August (2021) when the released version of arrow was ~5.0. It

[GitHub] [arrow] paleolimbot commented on pull request #12062: ARROW-15173: [R] Provide backward compatibility for bridge to older versions of pyarrow

2022-01-03 Thread GitBox
paleolimbot commented on pull request #12062: URL: https://github.com/apache/arrow/pull/12062#issuecomment-1004116153 I think there's a JIRA ticket for testing on Windows (if there isn't, I'll make one). Maybe that's a good ticket to use for adding a few tests for whatever `pip3 install py

[GitHub] [arrow] lidavidm commented on a change in pull request #11853: ARROW-1699: [C++] forward, backward fill kernel functions

2022-01-03 Thread GitBox
lidavidm commented on a change in pull request #11853: URL: https://github.com/apache/arrow/pull/11853#discussion_r777509436 ## File path: cpp/src/arrow/compute/kernels/vector_replace.cc ## @@ -489,17 +872,57 @@ void RegisterVectorReplace(FunctionRegistry* registry) { } a

[GitHub] [arrow] nirandaperera commented on a change in pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
nirandaperera commented on a change in pull request #11886: URL: https://github.com/apache/arrow/pull/11886#discussion_r777521811 ## File path: cpp/src/arrow/compute/kernels/vector_selection_test.cc ## @@ -2328,5 +2328,40 @@ TEST_F(TestDropNullKernelWithTable, DropNullTableWit

[GitHub] [arrow] amol- commented on pull request #11726: ARROW-14738: [Python][Doc] Make return types clickable

2022-01-03 Thread GitBox
amol- commented on pull request #11726: URL: https://github.com/apache/arrow/pull/11726#issuecomment-1004145688 Given that further formatting discussions are probably expected to happen in the theme issue ( https://github.com/pydata/pydata-sphinx-theme/issues/527 ) should we ship this to m

[GitHub] [arrow] pitrou commented on a change in pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
pitrou commented on a change in pull request #11886: URL: https://github.com/apache/arrow/pull/11886#discussion_r777518602 ## File path: cpp/src/arrow/compute/kernels/vector_selection_test.cc ## @@ -2328,5 +2328,40 @@ TEST_F(TestDropNullKernelWithTable, DropNullTableWithSlices

[GitHub] [arrow] pitrou commented on pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
pitrou commented on pull request #11886: URL: https://github.com/apache/arrow/pull/11886#issuecomment-1004157346 Note the crash on Python CI is unrelated, I've created https://issues.apache.org/jira/browse/ARROW-15234 for it.

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1518: remove python wrapper and redirect to the contrib repo

2022-01-03 Thread GitBox
Jimexist opened a new pull request #1518: URL: https://github.com/apache/arrow-datafusion/pull/1518 # Which issue does this PR close? Closes #1324 # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

[GitHub] [arrow] pitrou opened a new pull request #12063: ARROW-15234: [Python] Fix crash with custom CSV invalid row handler

2022-01-03 Thread GitBox
pitrou opened a new pull request #12063: URL: https://github.com/apache/arrow/pull/12063 The crash would happen at handler destruction if called from a non-Python thread that doesn't hold the GIL.

[GitHub] [arrow] github-actions[bot] commented on pull request #12063: ARROW-15234: [Python] Fix crash with custom CSV invalid row handler

2022-01-03 Thread GitBox
github-actions[bot] commented on pull request #12063: URL: https://github.com/apache/arrow/pull/12063#issuecomment-1004159523

[GitHub] [arrow] amol- commented on a change in pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
amol- commented on a change in pull request #11886: URL: https://github.com/apache/arrow/pull/11886#discussion_r777539373 ## File path: docs/source/cpp/compute.rst ## @@ -1549,6 +1549,17 @@ These functions select and return a subset of their input. * \(4) For each element *i*

[GitHub] [arrow-datafusion] Jimexist commented on pull request #1514: Add __version__ attribute to python library

2022-01-03 Thread GitBox
Jimexist commented on pull request #1514: URL: https://github.com/apache/arrow-datafusion/pull/1514#issuecomment-1004166862 After https://github.com/apache/arrow-datafusion/pull/1518 is merged, please consider merging to that repo instead. Thanks!

[GitHub] [arrow-datafusion] Jimexist commented on pull request #1508: Ship Cargo.lock in the source distribution

2022-01-03 Thread GitBox
Jimexist commented on pull request #1508: URL: https://github.com/apache/arrow-datafusion/pull/1508#issuecomment-1004167073 After https://github.com/apache/arrow-datafusion/pull/1518 is merged, please consider merging to that repo instead. Thanks!

[GitHub] [arrow] jonkeane commented on pull request #11751: ARROW-14694: [R] Let me dput a schema

2022-01-03 Thread GitBox
jonkeane commented on pull request #11751: URL: https://github.com/apache/arrow/pull/11751#issuecomment-1004167282 This looks good to me — I've also rebased to resolve conflicts (though someone should probably take a second glance at that and make sure I didn't futz up any of the changes!)

[GitHub] [arrow-datafusion] Igosuki commented on pull request #68: Experimenting with arrow2

2022-01-03 Thread GitBox
Igosuki commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-1004169207 I only did it to test out the latest versions of both arrow2 and datafusion. There are breaking changes, and I had to make some small updates to arrow2. Additionally,

[GitHub] [arrow-rs] paddyhoran commented on issue #1120: More frequent major releases for arrow-rs

2022-01-03 Thread GitBox
paddyhoran commented on issue #1120: URL: https://github.com/apache/arrow-rs/issues/1120#issuecomment-1004170916 Also another pro is that we can hopefully bring arrow2 back into the fold (though that would need confirmation/additional work). As I remember it, @jorgecarleitao's main sticki

[GitHub] [arrow] lidavidm commented on a change in pull request #11853: ARROW-1699: [C++] forward, backward fill kernel functions

2022-01-03 Thread GitBox
lidavidm commented on a change in pull request #11853: URL: https://github.com/apache/arrow/pull/11853#discussion_r777545990 ## File path: cpp/src/arrow/compute/kernels/vector_replace_test.cc ## @@ -793,5 +851,772 @@ TYPED_TEST(TestReplaceBinary, ReplaceWithMaskRandom) { }

[GitHub] [arrow] lidavidm commented on a change in pull request #11853: ARROW-1699: [C++] forward, backward fill kernel functions

2022-01-03 Thread GitBox
lidavidm commented on a change in pull request #11853: URL: https://github.com/apache/arrow/pull/11853#discussion_r777547587 ## File path: cpp/src/arrow/compute/kernels/vector_replace.cc ## @@ -442,23 +442,414 @@ struct ReplaceWithMaskFunctor { } return ReplaceWithMas

[GitHub] [arrow] lidavidm commented on a change in pull request #11853: ARROW-1699: [C++] forward, backward fill kernel functions

2022-01-03 Thread GitBox
lidavidm commented on a change in pull request #11853: URL: https://github.com/apache/arrow/pull/11853#discussion_r777548551 ## File path: cpp/src/arrow/compute/kernels/vector_replace.cc ## @@ -442,23 +442,414 @@ struct ReplaceWithMaskFunctor { } return ReplaceWithMas

[GitHub] [arrow] lidavidm commented on a change in pull request #11853: ARROW-1699: [C++] forward, backward fill kernel functions

2022-01-03 Thread GitBox
lidavidm commented on a change in pull request #11853: URL: https://github.com/apache/arrow/pull/11853#discussion_r777552367 ## File path: cpp/src/arrow/compute/kernels/vector_replace.cc ## @@ -442,23 +442,460 @@ struct ReplaceWithMaskFunctor { } return ReplaceWithMas

[GitHub] [arrow] lidavidm commented on a change in pull request #11853: ARROW-1699: [C++] forward, backward fill kernel functions

2022-01-03 Thread GitBox
lidavidm commented on a change in pull request #11853: URL: https://github.com/apache/arrow/pull/11853#discussion_r777552716 ## File path: cpp/src/arrow/compute/kernels/vector_replace.cc ## @@ -442,23 +442,409 @@ struct ReplaceWithMaskFunctor { } return ReplaceWithMas
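
ARROW-1699 adds forward and backward fill kernels. The sketch below shows the semantics those names usually imply. It is a hypothetical illustration over `Vec<Option<T>>`, not the `vector_replace.cc` implementation. A forward fill replaces each null with the last non-null value before it. A backward fill uses the next non-null value after it. Nulls with no value to copy from stay null. The function names only echo the kernels under review.

```rust
/// Forward fill: each null takes the most recent preceding non-null value.
/// Leading nulls (nothing seen yet) stay null.
pub fn fill_null_forward<T: Clone>(values: &[Option<T>]) -> Vec<Option<T>> {
    let mut last: Option<T> = None;
    values
        .iter()
        .map(|v| {
            if v.is_some() {
                last = v.clone();
            }
            last.clone()
        })
        .collect()
}

/// Backward fill: each null takes the next following non-null value.
/// Implemented as a forward fill over the reversed input, reversed back.
/// Trailing nulls stay null.
pub fn fill_null_backward<T: Clone>(values: &[Option<T>]) -> Vec<Option<T>> {
    let reversed: Vec<Option<T>> = values.iter().rev().cloned().collect();
    let mut filled = fill_null_forward(&reversed);
    filled.reverse();
    filled
}
```

The reverse-and-reuse approach keeps the sketch short. A real kernel would more likely scan backwards directly to avoid the two extra copies.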

[GitHub] [arrow-datafusion] maxburke commented on pull request #1477: Fix SortExec discards field metadata on the output schema

2022-01-03 Thread GitBox
maxburke commented on pull request #1477: URL: https://github.com/apache/arrow-datafusion/pull/1477#issuecomment-1004187409 @alamb If I recall the reason this change was made was because in our cases the schema was stripped of the timezone information but the underlying record batches pre

[GitHub] [arrow] github-actions[bot] commented on pull request #12064: ARROW-12042: [C++] Fix array_sort_indices on chunked arrays

2022-01-03 Thread GitBox
github-actions[bot] commented on pull request #12064: URL: https://github.com/apache/arrow/pull/12064#issuecomment-1004188327 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [arrow] amol- commented on a change in pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
amol- commented on a change in pull request #11886: URL: https://github.com/apache/arrow/pull/11886#discussion_r777564422 ## File path: cpp/src/arrow/compute/kernels/vector_selection_test.cc ## @@ -2328,5 +2328,40 @@ TEST_F(TestDropNullKernelWithTable, DropNullTableWithSlices)

[GitHub] [arrow] amol- commented on a change in pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
amol- commented on a change in pull request #11886: URL: https://github.com/apache/arrow/pull/11886#discussion_r777564836 ## File path: cpp/src/arrow/compute/kernels/vector_selection_test.cc ## @@ -2328,5 +2328,40 @@ TEST_F(TestDropNullKernelWithTable, DropNullTableWithSlices)

[GitHub] [arrow] coryan commented on pull request #11996: ARROW-15114: [C++] GcsFileSystem uses metadata for directory markers

2022-01-03 Thread GitBox
coryan commented on pull request #11996: URL: https://github.com/apache/arrow/pull/11996#issuecomment-1004194721 Ping

[GitHub] [arrow-datafusion] tfeda closed pull request #1514: Add __version__ attribute to python library

2022-01-03 Thread GitBox
tfeda closed pull request #1514: URL: https://github.com/apache/arrow-datafusion/pull/1514

[GitHub] [arrow-datafusion] tfeda commented on pull request #1514: Add __version__ attribute to python library

2022-01-03 Thread GitBox
tfeda commented on pull request #1514: URL: https://github.com/apache/arrow-datafusion/pull/1514#issuecomment-1004201457 Ah, great timing. I'll do that!

[GitHub] [arrow-datafusion] cpcloud commented on pull request #1508: Ship Cargo.lock in the source distribution

2022-01-03 Thread GitBox
cpcloud commented on pull request #1508: URL: https://github.com/apache/arrow-datafusion/pull/1508#issuecomment-1004216889 Ok, thanks. I'll just proactively move it over

[GitHub] [arrow] nirandaperera commented on a change in pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
nirandaperera commented on a change in pull request #11886: URL: https://github.com/apache/arrow/pull/11886#discussion_r777588759 ## File path: cpp/src/arrow/compute/kernels/vector_selection.cc ## @@ -2378,26 +2378,44 @@ struct NonZeroVisitor { using T = typename GetViewTy

[GitHub] [arrow] lidavidm commented on pull request #12063: ARROW-15234: [Python] Fix crash with custom CSV invalid row handler

2022-01-03 Thread GitBox
lidavidm commented on pull request #12063: URL: https://github.com/apache/arrow/pull/12063#issuecomment-1004223585 Ah, I do see it failing elsewhere in CI. I think the existing tests are sufficient then and we should get in the fix.

[GitHub] [arrow] lidavidm closed pull request #12063: ARROW-15234: [Python] Fix crash with custom CSV invalid row handler

2022-01-03 Thread GitBox
lidavidm closed pull request #12063: URL: https://github.com/apache/arrow/pull/12063

[GitHub] [arrow] ursabot commented on pull request #12063: ARROW-15234: [Python] Fix crash with custom CSV invalid row handler

2022-01-03 Thread GitBox
ursabot commented on pull request #12063: URL: https://github.com/apache/arrow/pull/12063#issuecomment-1004225974 Benchmark runs are scheduled for baseline = c6143a2396058dcc31506050238dc0f932aae9ba and contender = 47f6bc3976bcbabbd64c8bdf7f5e00cb0223f78d. 47f6bc3976bcbabbd64c8bdf7f5e00cb

[GitHub] [arrow] ursabot edited a comment on pull request #12063: ARROW-15234: [Python] Fix crash with custom CSV invalid row handler

2022-01-03 Thread GitBox
ursabot edited a comment on pull request #12063: URL: https://github.com/apache/arrow/pull/12063#issuecomment-1004225974 Benchmark runs are scheduled for baseline = c6143a2396058dcc31506050238dc0f932aae9ba and contender = 47f6bc3976bcbabbd64c8bdf7f5e00cb0223f78d. 47f6bc3976bcbabbd64c8bdf7

[GitHub] [arrow] pitrou commented on pull request #12064: ARROW-12042: [C++] Fix array_sort_indices on chunked arrays

2022-01-03 Thread GitBox
pitrou commented on pull request #12064: URL: https://github.com/apache/arrow/pull/12064#issuecomment-1004238546 CI failures are unrelated.

[GitHub] [arrow-datafusion] rdettai commented on issue #1504: The destruction of GroupState in high cardinality aggregation takes a lot of time

2022-01-03 Thread GitBox
rdettai commented on issue #1504: URL: https://github.com/apache/arrow-datafusion/issues/1504#issuecomment-1004244530 Thanks for the analysis @ic4y ! I am quite surprised we pay the fragmentation that comes from the row oriented structure of `Accumulators` that much more at de-allocation

[GitHub] [arrow] lidavidm closed pull request #12064: ARROW-12042: [C++] Fix array_sort_indices on chunked arrays

2022-01-03 Thread GitBox
lidavidm closed pull request #12064: URL: https://github.com/apache/arrow/pull/12064

[GitHub] [arrow] ursabot commented on pull request #12064: ARROW-12042: [C++] Fix array_sort_indices on chunked arrays

2022-01-03 Thread GitBox
ursabot commented on pull request #12064: URL: https://github.com/apache/arrow/pull/12064#issuecomment-1004249768 Benchmark runs are scheduled for baseline = 47f6bc3976bcbabbd64c8bdf7f5e00cb0223f78d and contender = 27c264cfed14800e55a49fbcb7fa404efb68b7c8. 27c264cfed14800e55a49fbcb7fa404e

[GitHub] [arrow] kszucs closed pull request #12058: ARROW-15231: [Packaging][deb] Add missing ArrowFlight-1.0.typelib

2022-01-03 Thread GitBox
kszucs closed pull request #12058: URL: https://github.com/apache/arrow/pull/12058

[GitHub] [arrow] ursabot edited a comment on pull request #12064: ARROW-12042: [C++] Fix array_sort_indices on chunked arrays

2022-01-03 Thread GitBox
ursabot edited a comment on pull request #12064: URL: https://github.com/apache/arrow/pull/12064#issuecomment-1004249768 Benchmark runs are scheduled for baseline = 47f6bc3976bcbabbd64c8bdf7f5e00cb0223f78d and contender = 27c264cfed14800e55a49fbcb7fa404efb68b7c8. 27c264cfed14800e55a49fbcb

[GitHub] [arrow] ursabot commented on pull request #12058: ARROW-15231: [Packaging][deb] Add missing ArrowFlight-1.0.typelib

2022-01-03 Thread GitBox
ursabot commented on pull request #12058: URL: https://github.com/apache/arrow/pull/12058#issuecomment-1004254801 Benchmark runs are scheduled for baseline = 27c264cfed14800e55a49fbcb7fa404efb68b7c8 and contender = cb1897ee0d20cfbaad1d879573362ce29c3e11b0. cb1897ee0d20cfbaad1d879573362ce2

[GitHub] [arrow] ursabot edited a comment on pull request #12058: ARROW-15231: [Packaging][deb] Add missing ArrowFlight-1.0.typelib

2022-01-03 Thread GitBox
ursabot edited a comment on pull request #12058: URL: https://github.com/apache/arrow/pull/12058#issuecomment-1004254801 Benchmark runs are scheduled for baseline = 27c264cfed14800e55a49fbcb7fa404efb68b7c8 and contender = cb1897ee0d20cfbaad1d879573362ce29c3e11b0. cb1897ee0d20cfbaad1d87957

[GitHub] [arrow] JabariBooker commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

2022-01-03 Thread GitBox
JabariBooker commented on a change in pull request #12014: URL: https://github.com/apache/arrow/pull/12014#discussion_r777631342 ## File path: cpp/src/arrow/array/validate.cc ## @@ -166,6 +166,80 @@ struct ValidateArrayImpl { return Status::OK(); } + Status Visit(con

[GitHub] [arrow] ursabot edited a comment on pull request #12063: ARROW-15234: [Python] Fix crash with custom CSV invalid row handler

2022-01-03 Thread GitBox
ursabot edited a comment on pull request #12063: URL: https://github.com/apache/arrow/pull/12063#issuecomment-1004225974 Benchmark runs are scheduled for baseline = c6143a2396058dcc31506050238dc0f932aae9ba and contender = 47f6bc3976bcbabbd64c8bdf7f5e00cb0223f78d. 47f6bc3976bcbabbd64c8bdf7

[GitHub] [arrow] pitrou opened a new pull request #12065: ARROW-9483: [C++] Reorganize testing headers

2022-01-03 Thread GitBox
pitrou opened a new pull request #12065: URL: https://github.com/apache/arrow/pull/12065 Include fewer headers transitively from `gtest_util.h`. Remove `gtest_common.h`, use `RandomArrayGenerator` instead.

[GitHub] [arrow] github-actions[bot] commented on pull request #12065: ARROW-9483: [C++] Reorganize testing headers

2022-01-03 Thread GitBox
github-actions[bot] commented on pull request #12065: URL: https://github.com/apache/arrow/pull/12065#issuecomment-1004278248

[GitHub] [arrow] thisisnic closed pull request #11992: ARROW-14653: [R] head() hangs on CSV datasets > 600MB

2022-01-03 Thread GitBox
thisisnic closed pull request #11992: URL: https://github.com/apache/arrow/pull/11992

[GitHub] [arrow] wjones127 commented on a change in pull request #12007: ARROW-15087: [Python][Docs] Document MapArray and update parent class to ListArray

2022-01-03 Thread GitBox
wjones127 commented on a change in pull request #12007: URL: https://github.com/apache/arrow/pull/12007#discussion_r777642358 ## File path: docs/source/python/data.rst ## @@ -264,6 +270,32 @@ individual arrays, and no copy is involved: arr.type arr +Map arrays +~~

[GitHub] [arrow] wjones127 commented on a change in pull request #12007: ARROW-15087: [Python][Docs] Document MapArray and update parent class to ListArray

2022-01-03 Thread GitBox
wjones127 commented on a change in pull request #12007: URL: https://github.com/apache/arrow/pull/12007#discussion_r777643176 ## File path: python/pyarrow/tests/test_array.py ## @@ -2643,6 +2643,30 @@ def test_fixed_size_list_array_flatten(): assert arr2.flatten().flatten(

[GitHub] [arrow] wjones127 commented on a change in pull request #11855: ARROW-13735: [C++][Python] Creating a Map array with non-default field names segfaults

2022-01-03 Thread GitBox
wjones127 commented on a change in pull request #11855: URL: https://github.com/apache/arrow/pull/11855#discussion_r777643434 ## File path: cpp/src/arrow/array/array_list_test.cc ## @@ -717,6 +717,29 @@ TEST_F(TestMapArray, BuildingStringToInt) { ASSERT_ARRAYS_EQUAL(*actual,

[GitHub] [arrow] pitrou commented on a change in pull request #11886: ARROW-13035: [C++] indices_nonzero compute function

2022-01-03 Thread GitBox
pitrou commented on a change in pull request #11886: URL: https://github.com/apache/arrow/pull/11886#discussion_r777643128 ## File path: cpp/src/arrow/compute/kernels/vector_selection.cc ## @@ -2355,6 +2358,98 @@ const FunctionDoc array_take_doc( "given by `indices`. Nul

[GitHub] [arrow] ursabot commented on pull request #11992: ARROW-14653: [R] head() hangs on CSV datasets > 600MB

2022-01-03 Thread GitBox
ursabot commented on pull request #11992: URL: https://github.com/apache/arrow/pull/11992#issuecomment-1004287116 Benchmark runs are scheduled for baseline = cb1897ee0d20cfbaad1d879573362ce29c3e11b0 and contender = 762fad5e5d1499b20db81a75cbc448c1ef6fca03. 762fad5e5d1499b20db81a75cbc448c1

[GitHub] [arrow] pitrou commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

2022-01-03 Thread GitBox
pitrou commented on a change in pull request #12014: URL: https://github.com/apache/arrow/pull/12014#discussion_r777648679 ## File path: cpp/src/arrow/array/validate.cc ## @@ -166,6 +166,80 @@ struct ValidateArrayImpl { return Status::OK(); } + Status Visit(const Dat

[GitHub] [arrow] ursabot edited a comment on pull request #11992: ARROW-14653: [R] head() hangs on CSV datasets > 600MB

2022-01-03 Thread GitBox
ursabot edited a comment on pull request #11992: URL: https://github.com/apache/arrow/pull/11992#issuecomment-1004287116 Benchmark runs are scheduled for baseline = cb1897ee0d20cfbaad1d879573362ce29c3e11b0 and contender = 762fad5e5d1499b20db81a75cbc448c1ef6fca03. 762fad5e5d1499b20db81a75c

[GitHub] [arrow] paleolimbot commented on a change in pull request #12030: ARROW-9186: [R] Allow specifying CSV file encoding

2022-01-03 Thread GitBox
paleolimbot commented on a change in pull request #12030: URL: https://github.com/apache/arrow/pull/12030#discussion_r777654597 ## File path: r/src/io.cpp ## @@ -178,4 +180,134 @@ void io___BufferOutputStream__Write( StopIfNotOk(stream->Write(RAW(bytes), bytes.size())); }

[GitHub] [arrow] westonpace commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

2022-01-03 Thread GitBox
westonpace commented on a change in pull request #12031: URL: https://github.com/apache/arrow/pull/12031#discussion_r777654664 ## File path: cpp/src/arrow/compute/exec/plan_test.cc ## @@ -301,11 +301,11 @@ TEST(ExecPlan, ToString) { {"sink", SinkNodeOptions

[GitHub] [arrow] dhruv9vats commented on pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

2022-01-03 Thread GitBox
dhruv9vats commented on pull request #11946: URL: https://github.com/apache/arrow/pull/11946#issuecomment-1004303644 Added some basic tests. If more sophisticated ones are also needed, could you give a brief outline of those? @pitrou

[GitHub] [arrow] lafiona commented on pull request #12004: ARROW-13185: [MATLAB] Create a single MEX gateway function which delegates to specific C++ functions

2022-01-03 Thread GitBox
lafiona commented on pull request #12004: URL: https://github.com/apache/arrow/pull/12004#issuecomment-1004306539 @edponce, `mexfcn` only receives input arguments from MATLAB. End users will interact with the MATLAB APIs, which will make use of `mexfcn`. `mexfcn` will dispatch to individua

[GitHub] [arrow] westonpace commented on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

2022-01-03 Thread GitBox
westonpace commented on pull request #12031: URL: https://github.com/apache/arrow/pull/12031#issuecomment-1004306629 I have a test plan which is a hard-coded version of one of the TPC-H queries and, using this, I think indentation is missing in the GroupByNode's aggregates. But, this is m

[GitHub] [arrow] paleolimbot commented on a change in pull request #12030: ARROW-9186: [R] Allow specifying CSV file encoding

2022-01-03 Thread GitBox
paleolimbot commented on a change in pull request #12030: URL: https://github.com/apache/arrow/pull/12030#discussion_r77766 ## File path: r/src/io.cpp ## @@ -178,4 +180,134 @@ void io___BufferOutputStream__Write( StopIfNotOk(stream->Write(RAW(bytes), bytes.size())); }

[GitHub] [arrow] paleolimbot commented on a change in pull request #12030: ARROW-9186: [R] Allow specifying CSV file encoding

2022-01-03 Thread GitBox
paleolimbot commented on a change in pull request #12030: URL: https://github.com/apache/arrow/pull/12030#discussion_r777666240 ## File path: r/src/io.cpp ## @@ -178,4 +180,134 @@ void io___BufferOutputStream__Write( StopIfNotOk(stream->Write(RAW(bytes), bytes.size())); }

[GitHub] [arrow] edponce commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-03 Thread GitBox
edponce commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r777666713 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -156,39 +210,50 @@ struct Maximum { } }; +// Check if timestamp timezones are co

[GitHub] [arrow] paleolimbot commented on a change in pull request #12030: ARROW-9186: [R] Allow specifying CSV file encoding

2022-01-03 Thread GitBox
paleolimbot commented on a change in pull request #12030: URL: https://github.com/apache/arrow/pull/12030#discussion_r777668121 ## File path: r/src/io.cpp ## @@ -178,4 +180,134 @@ void io___BufferOutputStream__Write( StopIfNotOk(stream->Write(RAW(bytes), bytes.size())); }

[GitHub] [arrow] paleolimbot commented on a change in pull request #12030: ARROW-9186: [R] Allow specifying CSV file encoding

2022-01-03 Thread GitBox
paleolimbot commented on a change in pull request #12030: URL: https://github.com/apache/arrow/pull/12030#discussion_r777670303 ## File path: r/src/io.cpp ## @@ -178,4 +180,134 @@ void io___BufferOutputStream__Write( StopIfNotOk(stream->Write(RAW(bytes), bytes.size())); }
