[GitHub] [arrow] vibhatha commented on a change in pull request #12267: ARROW-15258: [C++] Easy options to create a source node from a table

2022-02-14 Thread GitBox
vibhatha commented on a change in pull request #12267: URL: https://github.com/apache/arrow/pull/12267#discussion_r806544545 ## File path: cpp/src/arrow/compute/exec/source_node.cc ## @@ -174,12 +177,82 @@ struct SourceNode : ExecNode { AsyncGenerator> generator_; }; +str

[GitHub] [arrow-datafusion] seddonm1 commented on issue #1836: Register multiple tables into `ExecutionContext` at once

2022-02-14 Thread GitBox
seddonm1 commented on issue #1836: URL: https://github.com/apache/arrow-datafusion/issues/1836#issuecomment-1039961634 Given that some of the API is still changing (like https://github.com/apache/arrow-datafusion/pull/1779) - which in my head means that requirements are not fully understo

[GitHub] [arrow] vibhatha commented on a change in pull request #12267: ARROW-15258: [C++] Easy options to create a source node from a table

2022-02-14 Thread GitBox
vibhatha commented on a change in pull request #12267: URL: https://github.com/apache/arrow/pull/12267#discussion_r806541037 ## File path: cpp/src/arrow/compute/exec/source_node.cc ## @@ -174,12 +177,82 @@ struct SourceNode : ExecNode { AsyncGenerator> generator_; }; +str

[GitHub] [arrow] ursabot edited a comment on pull request #12164: ARROW-14183: [C++] Improve select_k_unstable performance

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12164: URL: https://github.com/apache/arrow/pull/12164#issuecomment-1039741409 Benchmark runs are scheduled for baseline = 7b5efe47ba5a31f9850e5cdbf47feea4e0f6c455 and contender = 49c6849acdaf8450276e93ccafe96aa6972d5a28. Results will be available

[GitHub] [arrow] ursabot edited a comment on pull request #12313: ARROW-15351: [Doc][Guide] Additional tutorial for R bindings

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12313: URL: https://github.com/apache/arrow/pull/12313#issuecomment-1038908489 Benchmark runs are scheduled for baseline = 90edde22090dbb97a5f3728f476511c5b5c388de and contender = 3cc7df895e0cf2bf0aa1ac1977c20e5f93ec2f25. 3cc7df895e0cf2bf0aa1ac197

[GitHub] [arrow] HaykManukyanAvetiky closed issue #12413: Pyarrow write dataset ignores delimiter

2022-02-14 Thread GitBox
HaykManukyanAvetiky closed issue #12413: URL: https://github.com/apache/arrow/issues/12413 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-

[GitHub] [arrow] HaykManukyanAvetiky commented on issue #12413: Pyarrow write dataset ignores delimiter

2022-02-14 Thread GitBox
HaykManukyanAvetiky commented on issue #12413: URL: https://github.com/apache/arrow/issues/12413#issuecomment-1039915015 ok thanks

[GitHub] [arrow] ursabot edited a comment on pull request #12339: ARROW-14908: [C++][R] Dataset hash join segfaults on Windows

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12339: URL: https://github.com/apache/arrow/pull/12339#issuecomment-1039652084 Benchmark runs are scheduled for baseline = 5ad5ddcafee8fada9cebb341df638b750c98efb7 and contender = 7b5efe47ba5a31f9850e5cdbf47feea4e0f6c455. 7b5efe47ba5a31f9850e5cdbf

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1296: Implement an iterator for DictionaryArray

2022-02-14 Thread GitBox
codecov-commenter edited a comment on pull request #1296: URL: https://github.com/apache/arrow-rs/pull/1296#issuecomment-1034646600 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1296?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm

[GitHub] [arrow-datafusion] houqp commented on pull request #1834: protobuf OctetLength should be deserialized as octet_length, not length

2022-02-14 Thread GitBox
houqp commented on pull request #1834: URL: https://github.com/apache/arrow-datafusion/pull/1834#issuecomment-1039897790 Thanks @carols10cents

[GitHub] [arrow-datafusion] houqp merged pull request #1834: protobuf OctetLength should be deserialized as octet_length, not length

2022-02-14 Thread GitBox
houqp merged pull request #1834: URL: https://github.com/apache/arrow-datafusion/pull/1834

[GitHub] [arrow-datafusion] houqp closed issue #1833: OctetLength in protobuf is getting deserialized to length, not octet_length

2022-02-14 Thread GitBox
houqp closed issue #1833: URL: https://github.com/apache/arrow-datafusion/issues/1833

[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #1823: implement bitmap_distinct function using bitmap

2022-02-14 Thread GitBox
Ted-Jiang commented on issue #1823: URL: https://github.com/apache/arrow-datafusion/issues/1823#issuecomment-1039896766 Test results using `roaring-rs` and `croaring-rs` (same logic: insert one value at a time) 1million_rows_10thousand_distinct.parquet ``` bitmap distinct

[GitHub] [arrow] cyb70289 commented on pull request #12399: ARROW-14993: [C++] Benchmark CSV writer

2022-02-14 Thread GitBox
cyb70289 commented on pull request #12399: URL: https://github.com/apache/arrow/pull/12399#issuecomment-1039896632 Benchmark result on xeon gold 5218, clang-12. ``` --- Benchmark

[GitHub] [arrow] cyb70289 removed a comment on pull request #12399: ARROW-14993: [C++] Benchmark CSV writer

2022-02-14 Thread GitBox
cyb70289 removed a comment on pull request #12399: URL: https://github.com/apache/arrow/pull/12399#issuecomment-1035881525 Example output: ``` --- Benchmark Time

[GitHub] [arrow-datafusion] matthewmturner commented on issue #1836: Register multiple tables into `ExecutionContext` at once

2022-02-14 Thread GitBox
matthewmturner commented on issue #1836: URL: https://github.com/apache/arrow-datafusion/issues/1836#issuecomment-1039876376 @seddonm1 @alamb @houqp I'm interested in your thoughts on this.

[GitHub] [arrow-datafusion] matthewmturner opened a new issue #1836: Register multiple tables into `ExecutionContext` at once

2022-02-14 Thread GitBox
matthewmturner opened a new issue #1836: URL: https://github.com/apache/arrow-datafusion/issues/1836 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrate

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1649: feat: implement exists subquery

2022-02-14 Thread GitBox
xudong963 commented on pull request #1649: URL: https://github.com/apache/arrow-datafusion/pull/1649#issuecomment-1039856271 > Hi @xudong963, how's the progress? > > In our case, we need to use an IN subquery. Do you have a plan for that? If not, I'll try to implement that. The PR

[GitHub] [arrow] dhruv9vats commented on a change in pull request #12368: ARROW-13993: [C++] [Compute] Add hash_one aggregate function

2022-02-14 Thread GitBox
dhruv9vats commented on a change in pull request #12368: URL: https://github.com/apache/arrow/pull/12368#discussion_r806444672 ## File path: cpp/src/arrow/compute/kernels/hash_aggregate_test.cc ## @@ -2460,6 +2461,558 @@ TEST(GroupBy, Distinct) { } } +MATCHER_P(AnyOfScala

[GitHub] [arrow-datafusion] xudong963 commented on issue #1209: support more subqueries

2022-02-14 Thread GitBox
xudong963 commented on issue #1209: URL: https://github.com/apache/arrow-datafusion/issues/1209#issuecomment-1039854915 > Hi @xudong963, could you add #1835 to your list? I think it has already had. ![image](https://user-images.githubusercontent.com/41979257/153994402-116bd298-1

[GitHub] [arrow-datafusion] yahoNanJing commented on issue #1209: support more subqueries

2022-02-14 Thread GitBox
yahoNanJing commented on issue #1209: URL: https://github.com/apache/arrow-datafusion/issues/1209#issuecomment-1039851651 Hi @xudong963, could you add #1835 to your list?

[GitHub] [arrow-datafusion] yahoNanJing opened a new issue #1835: Implement In Subquery

2022-02-14 Thread GitBox
yahoNanJing opened a new issue #1835: URL: https://github.com/apache/arrow-datafusion/issues/1835 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** **Describe the solution you'd like** **Describe alternatives you'

[GitHub] [arrow-datafusion] yahoNanJing commented on pull request #1649: feat: implement exists subquery

2022-02-14 Thread GitBox
yahoNanJing commented on pull request #1649: URL: https://github.com/apache/arrow-datafusion/pull/1649#issuecomment-1039849922 Hi @xudong963, how's the progress? In our case, we need to use an IN subquery. Do you have a plan for that? If not, I'll try to implement that.

[GitHub] [arrow-rs] sunchao commented on a change in pull request #1284: Vectorized DeltaBitPackDecoder (#1281)

2022-02-14 Thread GitBox
sunchao commented on a change in pull request #1284: URL: https://github.com/apache/arrow-rs/pull/1284#discussion_r806434263 ## File path: parquet/src/encodings/decoding.rs ## @@ -431,232 +433,253 @@ pub struct DeltaBitPackDecoder { initialized: bool, // Header info

[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #1780: Enable periodic cleanup of work_dir directories in ballista executor

2022-02-14 Thread GitBox
Ted-Jiang commented on issue #1780: URL: https://github.com/apache/arrow-datafusion/issues/1780#issuecomment-1039826718 > > > preemptively > > > @houqp > > > Sorry for my confusion. You mean if a job has 3 stages, when stage 3 is running, we can delete stage 1 first? > > IMO, I

[GitHub] [arrow-datafusion] Ted-Jiang removed a comment on issue #1780: Enable periodic cleanup of work_dir directories in ballista executor

2022-02-14 Thread GitBox
Ted-Jiang removed a comment on issue #1780: URL: https://github.com/apache/arrow-datafusion/issues/1780#issuecomment-1032254580 > preemptively @houqp Sorry for my confusion. You mean if a job has 3 stages, when stage 3 is running, we can delete stage 1 first?

[GitHub] [arrow-datafusion] Ted-Jiang commented on a change in pull request #1783: Enable periodic cleanup of work_dir directories in ballista executor

2022-02-14 Thread GitBox
Ted-Jiang commented on a change in pull request #1783: URL: https://github.com/apache/arrow-datafusion/pull/1783#discussion_r806422958 ## File path: ballista/rust/executor/src/main.rs ## @@ -148,3 +167,108 @@ async fn main() -> Result<()> { Ok(()) } + +/// This function

[GitHub] [arrow-datafusion] Ted-Jiang commented on a change in pull request #1783: Enable periodic cleanup of work_dir directories in ballista executor

2022-02-14 Thread GitBox
Ted-Jiang commented on a change in pull request #1783: URL: https://github.com/apache/arrow-datafusion/pull/1783#discussion_r806422483 ## File path: ballista/rust/executor/src/main.rs ## @@ -148,3 +167,108 @@ async fn main() -> Result<()> { Ok(()) } + +/// This function

[GitHub] [arrow] ursabot edited a comment on pull request #12397: ARROW-15657: [C++][Java] Upgrade Apache ORC to 1.7.3

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12397: URL: https://github.com/apache/arrow/pull/12397#issuecomment-1038373441 Benchmark runs are scheduled for baseline = 6b7c7a2702466f7c3c9c1f9dd41bc42458cff398 and contender = 45041fcd92b72bd36c08ca8d03074ccef7d9d782. 45041fcd92b72bd36c08ca8d0

[GitHub] [arrow] ursabot edited a comment on pull request #12164: ARROW-14183: [C++] Improve select_k_unstable performance

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12164: URL: https://github.com/apache/arrow/pull/12164#issuecomment-1039491210 Benchmark runs are scheduled for baseline = 5ad5ddcafee8fada9cebb341df638b750c98efb7 and contender = 3b3e28df6efb4b19cae989cb199b25bd2ea13412. Results will be available

[GitHub] [arrow-julia] baumgold commented on issue #233: error serializing `Union{Missing,Nothing,Nanosecond}`

2022-02-14 Thread GitBox
baumgold commented on issue #233: URL: https://github.com/apache/arrow-julia/issues/233#issuecomment-1039799040 I ran into the same issue recently. Any update here? Thanks.

[GitHub] [arrow-datafusion] liukun4515 commented on pull request #1810: Refactor scheduler state with different management policy for volatile and stable states

2022-02-14 Thread GitBox
liukun4515 commented on pull request #1810: URL: https://github.com/apache/arrow-datafusion/pull/1810#issuecomment-1039793878 I will look at this later.

[GitHub] [arrow-julia] codecov-commenter commented on pull request #277: refactor Arrow.write to support incremental writes

2022-02-14 Thread GitBox
codecov-commenter commented on pull request #277: URL: https://github.com/apache/arrow-julia/pull/277#issuecomment-1039780445 # [Codecov](https://codecov.io/gh/apache/arrow-julia/pull/277?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow-julia] kou commented on pull request #277: refactor Arrow.write to support incremental writes

2022-02-14 Thread GitBox
kou commented on pull request #277: URL: https://github.com/apache/arrow-julia/pull/277#issuecomment-1039769773 Approved.

[GitHub] [arrow] guyuqi commented on pull request #12398: ARROW-15440: [Go] Implement 'unpack_bool' with Arm64 GoLang Assembly

2022-02-14 Thread GitBox
guyuqi commented on pull request #12398: URL: https://github.com/apache/arrow/pull/12398#issuecomment-1039759352 @zeroshade Could you please have a look at this PR? Thanks.

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1296: wip. Implement an iterator for DictionaryArray

2022-02-14 Thread GitBox
codecov-commenter edited a comment on pull request #1296: URL: https://github.com/apache/arrow-rs/pull/1296#issuecomment-1034646600 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1296?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm

[GitHub] [arrow] ursabot commented on pull request #12164: ARROW-14183: [C++] Improve select_k_unstable performance

2022-02-14 Thread GitBox
ursabot commented on pull request #12164: URL: https://github.com/apache/arrow/pull/12164#issuecomment-1039741409 Benchmark runs are scheduled for baseline = 7b5efe47ba5a31f9850e5cdbf47feea4e0f6c455 and contender = 49c6849acdaf8450276e93ccafe96aa6972d5a28. Results will be available as each

[GitHub] [arrow] AlvinJ15 commented on pull request #12164: ARROW-14183: [C++] Improve select_k_unstable performance

2022-02-14 Thread GitBox
AlvinJ15 commented on pull request #12164: URL: https://github.com/apache/arrow/pull/12164#issuecomment-1039741330 @ursabot please benchmark

[GitHub] [arrow] vibhatha commented on pull request #12113: ARROW-14679: [R] [C++] Handle suffix argument in joins

2022-02-14 Thread GitBox
vibhatha commented on pull request #12113: URL: https://github.com/apache/arrow/pull/12113#issuecomment-1039722482 > I wasn't sure if your TODO task was meant to be finished or not so I noted the places where I saw prefix still in use. That’s a modification still pending.

[GitHub] [arrow] github-actions[bot] commented on pull request #12427: PARQUET-2124: Remove Parquet Dictionary DCHECK

2022-02-14 Thread GitBox
github-actions[bot] commented on pull request #12427: URL: https://github.com/apache/arrow/pull/12427#issuecomment-1039701625 https://issues.apache.org/jira/browse/PARQUET-2124

[GitHub] [arrow] tachyonwill opened a new pull request #12427: PARQUET-2124: Remove Parquet Dictionary DCHECK

2022-02-14 Thread GitBox
tachyonwill opened a new pull request #12427: URL: https://github.com/apache/arrow/pull/12427 DCHECK doesn't make sense here as we can hit this condition due to a Parquet file with a non-dictionary encoded data page followed by a dictionary encoded data page (dictionary page before both d

[GitHub] [arrow-julia] baumgold commented on pull request #277: refactor Arrow.write to support incremental writes

2022-02-14 Thread GitBox
baumgold commented on pull request #277: URL: https://github.com/apache/arrow-julia/pull/277#issuecomment-1039698085 Could someone please approve running CI again? As a side-note, it appears that the main branch currently produces failing tests with Julia v1.3 and v1.4 (note that all

[GitHub] [arrow] ursabot edited a comment on pull request #12400: MINOR: [Docs][Archery] Correct the links in the README.md

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12400: URL: https://github.com/apache/arrow/pull/12400#issuecomment-1039120561 Benchmark runs are scheduled for baseline = 269f5d2d42259971e291bd61dadc4cff4d969273 and contender = 699449f2f5fe36938191d771f321ec15d3fd3331. 699449f2f5fe36938191d771f

[GitHub] [arrow] ursabot edited a comment on pull request #12417: ARROW-15674: [C++][Gandiva] Like function doesn't properly handle patterns with special characters in certain cases

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12417: URL: https://github.com/apache/arrow/pull/12417#issuecomment-1039069433 Benchmark runs are scheduled for baseline = 3cc7df895e0cf2bf0aa1ac1977c20e5f93ec2f25 and contender = 5f590e9e64d880e2290dacc76ac85b4cd0d5f40a. 5f590e9e64d880e2290dacc76

[GitHub] [arrow] ursabot edited a comment on pull request #12364: ARROW-15606: [CI] [R] Add brew build that exercises the R package

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12364: URL: https://github.com/apache/arrow/pull/12364#issuecomment-1039282958 Benchmark runs are scheduled for baseline = 699449f2f5fe36938191d771f321ec15d3fd3331 and contender = 5ad5ddcafee8fada9cebb341df638b750c98efb7. 5ad5ddcafee8fada9cebb341d

[GitHub] [arrow] ursabot edited a comment on pull request #12351: ARROW-15598: [C++][Gandiva] Avoid using hardcoded raw pointer addresses in generated code

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12351: URL: https://github.com/apache/arrow/pull/12351#issuecomment-1039069459 Benchmark runs are scheduled for baseline = 5f590e9e64d880e2290dacc76ac85b4cd0d5f40a and contender = 269f5d2d42259971e291bd61dadc4cff4d969273. 269f5d2d42259971e291bd61d

[GitHub] [arrow-datafusion] realno commented on issue #1832: Extract datafusion protobuf serialization into its own crate

2022-02-14 Thread GitBox
realno commented on issue #1832: URL: https://github.com/apache/arrow-datafusion/issues/1832#issuecomment-1039659235 +1 It sounds reasonable to create a crate instead of depending on Ballista.

[GitHub] [arrow] wjones127 commented on issue #12416: Parquet Partition issues with Int64 Null

2022-02-14 Thread GitBox
wjones127 commented on issue #12416: URL: https://github.com/apache/arrow/issues/12416#issuecomment-1039656356 > The issue is when I read it back using pq.read_table. I tried providing a schema during pq.write_to_dataset but it didn't work. Could you provide a reproducible example where th

[GitHub] [arrow] ursabot commented on pull request #12339: ARROW-14908: [C++][R] Dataset hash join segfaults on Windows

2022-02-14 Thread GitBox
ursabot commented on pull request #12339: URL: https://github.com/apache/arrow/pull/12339#issuecomment-1039652084 Benchmark runs are scheduled for baseline = 5ad5ddcafee8fada9cebb341df638b750c98efb7 and contender = 7b5efe47ba5a31f9850e5cdbf47feea4e0f6c455. 7b5efe47ba5a31f9850e5cdbf47feea4

[GitHub] [arrow-datafusion] realno commented on pull request #1834: protobuf OctetLength should be deserialized as octet_length, not length

2022-02-14 Thread GitBox
realno commented on pull request #1834: URL: https://github.com/apache/arrow-datafusion/pull/1834#issuecomment-1039651446 Thanks for the fix!

[GitHub] [arrow] westonpace closed pull request #12339: ARROW-14908: [C++][R] Dataset hash join segfaults on Windows

2022-02-14 Thread GitBox
westonpace closed pull request #12339: URL: https://github.com/apache/arrow/pull/12339

[GitHub] [arrow] NarayanB commented on issue #12416: Parquet Partition issues with Int64 Null

2022-02-14 Thread GitBox
NarayanB commented on issue #12416: URL: https://github.com/apache/arrow/issues/12416#issuecomment-1039647488 Hi, thanks for the answer. In my case, I don't convert to pandas, but I have the table whose schema is correct as int64 before I call pq.write_to_dataset(...). The issue is when

[GitHub] [arrow] github-actions[bot] commented on pull request #12426: ARROW-15672: [C++] Enable CSV writer to control the field delimiter

2022-02-14 Thread GitBox
github-actions[bot] commented on pull request #12426: URL: https://github.com/apache/arrow/pull/12426#issuecomment-1039640235

[GitHub] [arrow] sanjibansg opened a new pull request #12426: ARROW-15672: [C++] Enable CSV writer to control the field delimiter

2022-02-14 Thread GitBox
sanjibansg opened a new pull request #12426: URL: https://github.com/apache/arrow/pull/12426 This PR modifies the WriteOptions for CSV to allow changing the default delimiter.

[GitHub] [arrow-datafusion] alamb commented on pull request #1810: Refactor scheduler state with different management policy for volatile and stable states

2022-02-14 Thread GitBox
alamb commented on pull request #1810: URL: https://github.com/apache/arrow-datafusion/pull/1810#issuecomment-1039627937 Hi @yahoNanJing I will look at this tomorrow. Also FYI I think @houqp may be delayed in responding for a while.

[GitHub] [arrow] ElenaHenderson commented on pull request #12164: ARROW-14183: [C++] Improve select_k_unstable performance

2022-02-14 Thread GitBox
ElenaHenderson commented on pull request #12164: URL: https://github.com/apache/arrow/pull/12164#issuecomment-1039622824 @AlvinJ15 @pitrou Benchmark builds are failing because of an issue with `aws-sdk-cpp` that was fixed in master branch by pinning `aws-sdk-cpp` version. I recommend getti

[GitHub] [arrow] lafiona commented on a change in pull request #12424: ARROW-15650: [MATLAB] Rename the MEX gateway function [WIP]

2022-02-14 Thread GitBox
lafiona commented on a change in pull request #12424: URL: https://github.com/apache/arrow/pull/12424#discussion_r806280978 ## File path: matlab/CMakeLists.txt ## @@ -305,16 +305,17 @@ target_include_directories(arrow_matlab PRIVATE ${CPP_SOURCE_DIR}) target_include_directori

[GitHub] [arrow] westonpace commented on a change in pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12425: URL: https://github.com/apache/arrow/pull/12425#discussion_r806280641 ## File path: .github/workflows/js.yml ## @@ -94,7 +94,7 @@ jobs: windows: name: AMD64 Windows 2019 NodeJS ${{ matrix.node }} -runs-on: w

[GitHub] [arrow] emkornfield commented on a change in pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
emkornfield commented on a change in pull request #12425: URL: https://github.com/apache/arrow/pull/12425#discussion_r806280633 ## File path: .github/workflows/go.yml ## @@ -134,7 +134,7 @@ jobs: windows: name: AMD64 Windows 2019 Go ${{ matrix.go }} -runs-on: wind

[GitHub] [arrow] westonpace commented on a change in pull request #12316: ARROW-15517: [R] Use WriteNode in write_dataset()

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12316: URL: https://github.com/apache/arrow/pull/12316#discussion_r806279597 ## File path: r/src/compute-exec.cpp ## @@ -157,7 +158,33 @@ std::shared_ptr ExecNode_Scan( arrow::dataset::ScanNodeOption

[GitHub] [arrow] domoritz commented on a change in pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
domoritz commented on a change in pull request #12425: URL: https://github.com/apache/arrow/pull/12425#discussion_r806279225 ## File path: .github/workflows/js.yml ## @@ -94,7 +94,7 @@ jobs: windows: name: AMD64 Windows 2019 NodeJS ${{ matrix.node }} -runs-on: win

[GitHub] [arrow] nealrichardson commented on a change in pull request #12316: ARROW-15517: [R] Use WriteNode in write_dataset()

2022-02-14 Thread GitBox
nealrichardson commented on a change in pull request #12316: URL: https://github.com/apache/arrow/pull/12316#discussion_r806278437 ## File path: r/src/compute-exec.cpp ## @@ -157,7 +158,33 @@ std::shared_ptr ExecNode_Scan( arrow::dataset::ScanNodeOp

[GitHub] [arrow] nealrichardson commented on a change in pull request #12316: ARROW-15517: [R] Use WriteNode in write_dataset()

2022-02-14 Thread GitBox
nealrichardson commented on a change in pull request #12316: URL: https://github.com/apache/arrow/pull/12316#discussion_r806277566 ## File path: r/R/dataset-write.R ## @@ -116,25 +116,40 @@ write_dataset <- function(dataset, if (inherits(dataset, "arrow_dplyr_query")) {

[GitHub] [arrow] westonpace commented on a change in pull request #12316: ARROW-15517: [R] Use WriteNode in write_dataset()

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12316: URL: https://github.com/apache/arrow/pull/12316#discussion_r806276486 ## File path: r/src/compute-exec.cpp ## @@ -157,7 +158,33 @@ std::shared_ptr ExecNode_Scan( arrow::dataset::ScanNodeOption

[GitHub] [arrow] nealrichardson commented on a change in pull request #12316: ARROW-15517: [R] Use WriteNode in write_dataset()

2022-02-14 Thread GitBox
nealrichardson commented on a change in pull request #12316: URL: https://github.com/apache/arrow/pull/12316#discussion_r806275298 ## File path: r/src/compute-exec.cpp ## @@ -157,7 +158,33 @@ std::shared_ptr ExecNode_Scan( arrow::dataset::ScanNodeOp

[GitHub] [arrow] jonkeane commented on a change in pull request #12324: ARROW-15013: [R] Expose concatenate at the R level

2022-02-14 Thread GitBox
jonkeane commented on a change in pull request #12324: URL: https://github.com/apache/arrow/pull/12324#discussion_r806267955 ## File path: r/tests/testthat/test-Array.R ## @@ -989,6 +989,59 @@ test_that("auto int64 conversion to int can be disabled (ARROW-10093)", { }) })

[GitHub] [arrow-datafusion] alamb commented on issue #1818: A corner bug in union

2022-02-14 Thread GitBox
alamb commented on issue #1818: URL: https://github.com/apache/arrow-datafusion/issues/1818#issuecomment-1039599084 Nice find @xudong963 > Currently, DF judges if the logical plans in union have the same schema by arrow schema's field name This sounds like the problem to me

[GitHub] [arrow] ursabot edited a comment on pull request #12379: ARROW-15215: [C++] Consolidate kernel data-copy utilities between replace_with_mask, case_when, coalesce, choose, fill_null_forward, f

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12379: URL: https://github.com/apache/arrow/pull/12379#issuecomment-1037355037 Benchmark runs are scheduled for baseline = 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd and contender = 6b7c7a2702466f7c3c9c1f9dd41bc42458cff398. 6b7c7a2702466f7c3c9c1f9dd

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #12409: ARROW-15668: Simplified skip logic in integration tests

2022-02-14 Thread GitBox
jorgecarleitao commented on a change in pull request #12409: URL: https://github.com/apache/arrow/pull/12409#discussion_r806261144 ## File path: dev/archery/archery/integration/runner.py ## @@ -129,6 +136,11 @@ def _gold_tests(self, gold_dir): skip = set()

[GitHub] [arrow-rs] alamb commented on a change in pull request #1296: wip. Implement an iterator for DictionaryArray

2022-02-14 Thread GitBox
alamb commented on a change in pull request #1296: URL: https://github.com/apache/arrow-rs/pull/1296#discussion_r806260714 ## File path: arrow/src/array/array_primitive.rs ## @@ -155,6 +155,16 @@ impl PrimitiveArray { }; PrimitiveArray::from(data) } + +

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #12409: ARROW-15668: Simplified skip logic in integration tests

2022-02-14 Thread GitBox
jorgecarleitao commented on a change in pull request #12409: URL: https://github.com/apache/arrow/pull/12409#discussion_r806258603 ## File path: dev/archery/archery/integration/languages/cpp.yaml ## @@ -0,0 +1,19 @@ +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] pitrou commented on pull request #12422: ARROW-15678: [C++][CI] a crossbow job with MinRelSize enabled

2022-02-14 Thread GitBox
pitrou commented on pull request #12422: URL: https://github.com/apache/arrow/pull/12422#issuecomment-1039590176 Yes, it's based on that info that I estimate that it's a compiler bug. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [arrow] jonkeane commented on pull request #12422: ARROW-15678: [C++][CI] a crossbow job with MinRelSize enabled

2022-02-14 Thread GitBox
jonkeane commented on pull request #12422: URL: https://github.com/apache/arrow/pull/12422#issuecomment-1039589243 Yup, that's the same place I saw locally. If it helps, here's the stack trace + disassembly from when I triggered it locally as well: ``` 2491 Thread_8298

[GitHub] [arrow] westonpace commented on a change in pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12425: URL: https://github.com/apache/arrow/pull/12425#discussion_r806256661 ## File path: .github/workflows/julia.yml ## @@ -46,7 +46,7 @@ jobs: - 'nightly' os: - ubuntu-latest - - window

[GitHub] [arrow] westonpace commented on a change in pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12425: URL: https://github.com/apache/arrow/pull/12425#discussion_r806255892 ## File path: .github/workflows/js.yml ## @@ -94,7 +94,7 @@ jobs: windows: name: AMD64 Windows 2019 NodeJS ${{ matrix.node }} -runs-on: w

[GitHub] [arrow] westonpace commented on a change in pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12425: URL: https://github.com/apache/arrow/pull/12425#discussion_r806254868 ## File path: .github/workflows/go.yml ## @@ -134,7 +134,7 @@ jobs: windows: name: AMD64 Windows 2019 Go ${{ matrix.go }} -runs-on: windo

[GitHub] [arrow] westonpace commented on a change in pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12425: URL: https://github.com/apache/arrow/pull/12425#discussion_r806254305 ## File path: .github/workflows/cpp.yml ## @@ -186,9 +186,9 @@ jobs: fail-fast: false matrix: os: - - windows-latest +

[GitHub] [arrow] github-actions[bot] commented on pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
github-actions[bot] commented on pull request #12425: URL: https://github.com/apache/arrow/pull/12425#issuecomment-1039583034 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [arrow] westonpace opened a new pull request #12425: ARROW-15682: [CI] Github starting to migrate "windows-latest" tag from windows 2019 to windows 2022

2022-02-14 Thread GitBox
westonpace opened a new pull request #12425: URL: https://github.com/apache/arrow/pull/12425 Announcement: https://github.blog/changelog/2022-01-11-github-actions-jobs-running-on-windows-latest-are-now-running-on-windows-server-2022/ It will be rolling out over the next 8 weeks. I n

[GitHub] [arrow] wjones127 edited a comment on issue #12416: Parquet Partition issues with Int64 Null

2022-02-14 Thread GitBox
wjones127 edited a comment on issue #12416: URL: https://github.com/apache/arrow/issues/12416#issuecomment-1039292700 Hi, this problem here likely isn't the partitioning read, but the conversion to pandas. From [the docs](https://arrow.apache.org/docs/python/pandas.html#nullable-types):

[GitHub] [arrow] ursabot edited a comment on pull request #12364: ARROW-15606: [CI] [R] Add brew build that exercises the R package

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12364: URL: https://github.com/apache/arrow/pull/12364#issuecomment-1039282958 Benchmark runs are scheduled for baseline = 699449f2f5fe36938191d771f321ec15d3fd3331 and contender = 5ad5ddcafee8fada9cebb341df638b750c98efb7. 5ad5ddcafee8fada9cebb341d

[GitHub] [arrow] ursabot edited a comment on pull request #12400: MINOR: [Docs][Archery] Correct the links in the README.md

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12400: URL: https://github.com/apache/arrow/pull/12400#issuecomment-1039120561 Benchmark runs are scheduled for baseline = 269f5d2d42259971e291bd61dadc4cff4d969273 and contender = 699449f2f5fe36938191d771f321ec15d3fd3331. 699449f2f5fe36938191d771f

[GitHub] [arrow] westonpace commented on a change in pull request #12113: ARROW-14679: [R] [C++] Handle suffix argument in joins

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12113: URL: https://github.com/apache/arrow/pull/12113#discussion_r806242165 ## File path: r/R/query-engine.R ## @@ -161,14 +160,18 @@ ExecPlan <- R6Class("ExecPlan", # (as when we've done collapse() and not projected a

[GitHub] [arrow] westonpace commented on a change in pull request #12113: ARROW-14679: [R] [C++] Handle suffix argument in joins

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12113: URL: https://github.com/apache/arrow/pull/12113#discussion_r806241299 ## File path: r/src/compute-exec.cpp ## @@ -217,7 +217,8 @@ std::shared_ptr ExecNode_Join( const std::shared_ptr& input, int type, const std:

[GitHub] [arrow] westonpace commented on a change in pull request #12339: ARROW-14908: [C++][R] Dataset hash join segfaults on Windows

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12339: URL: https://github.com/apache/arrow/pull/12339#discussion_r806238789 ## File path: cpp/src/arrow/compute/exec/hash_join.cc ## @@ -103,7 +103,7 @@ class HashJoinBasicImpl : public HashJoinImpl { filter_ = std::move(f

[GitHub] [arrow] westonpace commented on a change in pull request #12316: ARROW-15517: [R] Use WriteNode in write_dataset()

2022-02-14 Thread GitBox
westonpace commented on a change in pull request #12316: URL: https://github.com/apache/arrow/pull/12316#discussion_r806235649 ## File path: r/R/dataset-write.R ## @@ -116,25 +116,40 @@ write_dataset <- function(dataset, if (inherits(dataset, "arrow_dplyr_query")) { # p

[GitHub] [arrow] ursabot edited a comment on pull request #12351: ARROW-15598: [C++][Gandiva] Avoid using hardcoded raw pointer addresses in generated code

2022-02-14 Thread GitBox
ursabot edited a comment on pull request #12351: URL: https://github.com/apache/arrow/pull/12351#issuecomment-1039069459 Benchmark runs are scheduled for baseline = 5f590e9e64d880e2290dacc76ac85b4cd0d5f40a and contender = 269f5d2d42259971e291bd61dadc4cff4d969273. 269f5d2d42259971e291bd61d
