[GitHub] [arrow] github-actions[bot] commented on pull request #12152: ARROW-15123: [R] Schema order not respected and file header ignored

2022-01-14 Thread GitBox
github-actions[bot] commented on pull request #12152: URL: https://github.com/apache/arrow/pull/12152#issuecomment-1012912869 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [arrow] thisisnic closed pull request #12152: ARROW-15123: [R] CSV dataset file header read in as data

2022-01-14 Thread GitBox
thisisnic closed pull request #12152: URL: https://github.com/apache/arrow/pull/12152

[GitHub] [arrow] thisisnic commented on pull request #12152: ARROW-15123: [R] CSV dataset file header read in as data

2022-01-14 Thread GitBox
thisisnic commented on pull request #12152: URL: https://github.com/apache/arrow/pull/12152#issuecomment-1012934828 Closed this as I realised that the intended behaviour (to match `read_csv_arrow()`) is that including a schema means that we assume there is no header row.

[GitHub] [arrow-datafusion] jon-chuang commented on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang commented on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tune.

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] yjshen opened a new pull request #1564: Minor: Fix newly emerged lint problems on the master branch

2022-01-14 Thread GitBox
yjshen opened a new pull request #1564: URL: https://github.com/apache/arrow-datafusion/pull/1564 # Which issue does this PR close? Closes #. # Rationale for this change https://github.com/apache/arrow-datafusion/runs/4809829269?check_suite_focus=true I

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879

[GitHub] [arrow] thisisnic removed a comment on pull request #12152: ARROW-15123: [R] CSV dataset file header read in as data

2022-01-14 Thread GitBox
thisisnic removed a comment on pull request #12152: URL: https://github.com/apache/arrow/pull/12152#issuecomment-1012934828 Closed this as I realised that the intended behaviour (to match `read_csv_arrow()`) is that including a schema means that we assume there is no header row.

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow] ursabot edited a comment on pull request #11821: ARROW-13841: [Doc] Document the different subcomponents that make up the CI and how they fit together

2022-01-14 Thread GitBox
ursabot edited a comment on pull request #11821: URL: https://github.com/apache/arrow/pull/11821#issuecomment-1012296992 Benchmark runs are scheduled for baseline = 0c5cd7316c7da4ca00012a8e15f044560db2b1d0 and contender = 71ce8db417fda0066adc64e5e37cbf0da7fb0da2. 71ce8db417fda0066adc64e5e

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow] ursabot edited a comment on pull request #11751: ARROW-14694: [R] Let me dput a schema

2022-01-14 Thread GitBox
ursabot edited a comment on pull request #11751: URL: https://github.com/apache/arrow/pull/11751#issuecomment-1012152838 Benchmark runs are scheduled for baseline = 77fc23fcae0331da3adf94619a381a371a6e414f and contender = c48353fd21b26bcb894d791d49c29371607eb9b9. c48353fd21b26bcb894d791d4

[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1170: Fix new clippy lints introduced in Rust 1.58

2022-01-14 Thread GitBox
jhorstmann commented on a change in pull request #1170: URL: https://github.com/apache/arrow-rs/pull/1170#discussion_r784709789 ## File path: arrow/src/buffer/immutable.rs ## @@ -244,6 +244,7 @@ impl std::ops::Deref for Buffer { } unsafe impl Sync for Buffer {} +#[allow(cli

[GitHub] [arrow] dongjoon-hyun opened a new pull request #12153: ARROW-15338: [Python] Add `pyarrow.orc.read_table` API

2022-01-14 Thread GitBox
dongjoon-hyun opened a new pull request #12153: URL: https://github.com/apache/arrow/pull/12153 This PR aims to add `pyarrow.orc.read_table` like `pyarrow.parquet.read_table`.

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #1562: Consolidate configurations in `ExecutionConfig`, `RuntimeConfig` and `PhysicalPlanConfig`

2022-01-14 Thread GitBox
yjshen commented on a change in pull request #1562: URL: https://github.com/apache/arrow-datafusion/pull/1562#discussion_r784709208 ## File path: ballista/rust/core/src/serde/logical_plan/from_proto.rs ## @@ -246,8 +246,8 @@ impl TryInto for &protobuf::LogicalPlanNode {

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #1562: Consolidate configurations in `ExecutionConfig`, `RuntimeConfig` and `PhysicalPlanConfig`

2022-01-14 Thread GitBox
yjshen commented on a change in pull request #1562: URL: https://github.com/apache/arrow-datafusion/pull/1562#discussion_r784712799 ## File path: datafusion/src/execution/context.rs ## @@ -901,14 +898,13 @@ pub struct ExecutionConfig { /// Should Datafusion parquet reader

[GitHub] [arrow] github-actions[bot] commented on pull request #12153: ARROW-15338: [Python] Add `pyarrow.orc.read_table` API

2022-01-14 Thread GitBox
github-actions[bot] commented on pull request #12153: URL: https://github.com/apache/arrow/pull/12153#issuecomment-1012977611

[GitHub] [arrow-datafusion] tustvold commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-14 Thread GitBox
tustvold commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012985001 > That's why I think it would be good if we can come up with a way to avoid cherry-picking commits from arrow2 into arrow-rs Sorry, I meant more cherry-picking idea

[GitHub] [arrow-datafusion] tustvold edited a comment on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-14 Thread GitBox
tustvold edited a comment on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012985001 > That's why I think it would be good if we can come up with a way to avoid cherry-picking commits from arrow2 into arrow-rs Sorry, I meant more cherry-picki

[GitHub] [arrow-datafusion] tustvold edited a comment on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-14 Thread GitBox
tustvold edited a comment on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012985001 > That's why I think it would be good if we can come up with a way to avoid cherry-picking commits from arrow2 into arrow-rs Sorry, I meant more cherry-picki

[GitHub] [arrow-datafusion] tustvold edited a comment on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-14 Thread GitBox
tustvold edited a comment on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012985001 > That's why I think it would be good if we can come up with a way to avoid cherry-picking commits from arrow2 into arrow-rs Sorry, I meant more cherry-picki

[GitHub] [arrow] ViniciusSouzaRoque commented on pull request #12145: ARROW-15326: [C++] Fix Gandiva crashes

2022-01-14 Thread GitBox
ViniciusSouzaRoque commented on pull request #12145: URL: https://github.com/apache/arrow/pull/12145#issuecomment-1012988726 > @augustoasilva @pravindra @ViniciusSouzaRoque Can you take a look? OK, everything looks right with this change.

[GitHub] [arrow-datafusion] tustvold edited a comment on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-14 Thread GitBox
tustvold edited a comment on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012985001 > That's why I think it would be good if we can come up with a way to avoid cherry-picking commits from arrow2 into arrow-rs Sorry, I meant more cherry-picki

[GitHub] [arrow-datafusion] Igosuki commented on issue #1544: Streaming support for DataFusion

2022-01-14 Thread GitBox
Igosuki commented on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1013001849 I've used kafka-streams, flink and beam professionally, the point of streaming is to execute windowed functions, join aggregated data with in-memory tables and distribute

[GitHub] [arrow-datafusion] Igosuki commented on issue #1061: ARROW2: support avro

2022-01-14 Thread GitBox
Igosuki commented on issue #1061: URL: https://github.com/apache/arrow-datafusion/issues/1061#issuecomment-1013006341 No, besides the patch I mentioned which enables arrays to be null https://github.com/jorgecarleitao/arrow2/blob/main/src/io/avro/read/deserialize.rs#L101

[GitHub] [arrow-datafusion] Igosuki commented on pull request #1556: Officially maintained Arrow2 branch

2022-01-14 Thread GitBox
Igosuki commented on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013010833 Just use massif or heaptrack

[GitHub] [arrow-rs] jhorstmann opened a new issue #1171: Fix or remove memory-tracking feature

2022-01-14 Thread GitBox
jhorstmann opened a new issue #1171: URL: https://github.com/apache/arrow-rs/issues/1171 **Describe the bug** The non-default `memory-check` feature fails to compile (probably at least since 2021-04, that was the last time `alloc/mod.rs` was changed). **To Reproduce** `

[GitHub] [arrow-datafusion] Igosuki commented on a change in pull request #1556: Officially maintained Arrow2 branch

2022-01-14 Thread GitBox
Igosuki commented on a change in pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#discussion_r784746803 ## File path: datafusion/src/avro_to_arrow/reader.rs ## @@ -101,30 +100,49 @@ impl ReaderBuilder { } /// Create a new `Reader` from t

[GitHub] [arrow-datafusion] Igosuki commented on a change in pull request #1556: Officially maintained Arrow2 branch

2022-01-14 Thread GitBox
Igosuki commented on a change in pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#discussion_r784748048 ## File path: datafusion/src/datasource/object_store/mod.rs ## @@ -33,6 +33,12 @@ use local::LocalFileSystem; use crate::error::{DataFusionErr

[GitHub] [arrow-datafusion] alamb merged pull request #1557: Update to rust 1.58

2022-01-14 Thread GitBox
alamb merged pull request #1557: URL: https://github.com/apache/arrow-datafusion/pull/1557

[GitHub] [arrow-datafusion] alamb commented on pull request #1555: Fix new clippy lints introduced in Rust 1.58

2022-01-14 Thread GitBox
alamb commented on pull request #1555: URL: https://github.com/apache/arrow-datafusion/pull/1555#issuecomment-1013021361 > looks like this is a subset of #1557? Yes, I think #1557 from @xudong963 looks better 👍 closing this one

[GitHub] [arrow-datafusion] alamb closed pull request #1555: Fix new clippy lints introduced in Rust 1.58

2022-01-14 Thread GitBox
alamb closed pull request #1555: URL: https://github.com/apache/arrow-datafusion/pull/1555

[GitHub] [arrow-datafusion] alamb commented on pull request #1557: Update to rust 1.58

2022-01-14 Thread GitBox
alamb commented on pull request #1557: URL: https://github.com/apache/arrow-datafusion/pull/1557#issuecomment-1013021592 Thanks @xudong963 !

[GitHub] [arrow-datafusion] yjshen commented on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
yjshen commented on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013033180 @jon-chuang Thanks for bringing this up. I may be mistaken about parts of Ray; please point out anything I get wrong. IMHO, Ray is designed to ease the development of general-purpose dis

[GitHub] [arrow-datafusion] Igosuki commented on pull request #1556: Officially maintained Arrow2 branch

2022-01-14 Thread GitBox
Igosuki commented on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013038787 @houqp What hardware/setup did you use to run the benchmark? I'm actually getting much worse performance when running TPC-H using Parquet

[GitHub] [arrow] ursabot edited a comment on pull request #12147: ARROW-15318: [C++][Python] Regression reading partition keys of large batches.

2022-01-14 Thread GitBox
ursabot edited a comment on pull request #12147: URL: https://github.com/apache/arrow/pull/12147#issuecomment-1012509111 Benchmark runs are scheduled for baseline = 71ce8db417fda0066adc64e5e37cbf0da7fb0da2 and contender = 270416b0d071030bc2ce64adde40df83356c3d2f. 270416b0d071030bc2ce64add

[GitHub] [arrow] ursabot edited a comment on pull request #12142: ARROW-15322: [Docs][Go] Update sidebar link for Go docs.

2022-01-14 Thread GitBox
ursabot edited a comment on pull request #12142: URL: https://github.com/apache/arrow/pull/12142#issuecomment-1012269048 Benchmark runs are scheduled for baseline = c48353fd21b26bcb894d791d49c29371607eb9b9 and contender = 111347de8295e465c947ef26ba057b8a527bcaa7. 111347de8295e465c947ef26b

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1557: Update to rust 1.58

2022-01-14 Thread GitBox
xudong963 commented on pull request #1557: URL: https://github.com/apache/arrow-datafusion/pull/1557#issuecomment-1013042019 > Thanks @xudong963 ! I wrote this PR without reading the email today, so I didn't notice that you have already filed a ticket. I'm sorry @alamb

[GitHub] [arrow] djnavarro opened a new pull request #12154: ARROW-14281: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-01-14 Thread GitBox
djnavarro opened a new pull request #12154: URL: https://github.com/apache/arrow/pull/12154 This patch provides dplyr bindings for lubridate functions `floor_date()`, `ceiling_date()`, and `round_date()`. This is my first attempt at writing a patch, so my apologies if I've made any erro

[GitHub] [arrow] github-actions[bot] commented on pull request #12154: ARROW-14281: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-01-14 Thread GitBox
github-actions[bot] commented on pull request #12154: URL: https://github.com/apache/arrow/pull/12154#issuecomment-1013051995

[GitHub] [arrow-datafusion] Igosuki edited a comment on pull request #1556: Officially maintained Arrow2 branch

2022-01-14 Thread GitBox
Igosuki edited a comment on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013038787 @houqp What hardware/setup did you use to run the benchmark? I'm actually getting much worse performance when running TPC-H using Parquet. Edit: ok flamegra

[GitHub] [arrow-datafusion] jon-chuang commented on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang commented on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ yes. yes and failure re

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] alamb commented on pull request #1557: Update to rust 1.58

2022-01-14 Thread GitBox
alamb commented on pull request #1557: URL: https://github.com/apache/arrow-datafusion/pull/1557#issuecomment-1013069049 No worries at all 👍

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow] jcralmeida commented on a change in pull request #11982: ARROW-15313: [C++][Java][FlightRPC] Implement type info method to flight-sql

2022-01-14 Thread GitBox
jcralmeida commented on a change in pull request #11982: URL: https://github.com/apache/arrow/pull/11982#discussion_r784797747 ## File path: format/FlightSql.proto ## @@ -867,6 +867,69 @@ enum SqlSupportsConvert { SQL_CONVERT_VARCHAR = 19; } +/* + * Represents a request t

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] alamb commented on pull request #1556: Officially maintained Arrow2 branch

2022-01-14 Thread GitBox
alamb commented on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013079492 Thank you @houqp and @Igosuki -- I'll try and take a look at this later today or tomorrow. I will also start the discussion of "what does this mean for arrow-rs", whi

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow] rok commented on pull request #12154: ARROW-14281: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-01-14 Thread GitBox
rok commented on pull request #12154: URL: https://github.com/apache/arrow/pull/12154#issuecomment-1013080010 This looks great @djnavarro thanks for doing it! Regarding your TODOs: > support week_start (essential) Perhaps you can put in an R shim to get lubridate matchin

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1012935879 Hi all, I've been working on a Rust API for the Ray distributed computing framework that powers many popular python ML libraries like RLLib, Ray Train and Ray Tu

[GitHub] [arrow-datafusion] Igosuki commented on pull request #1556: Officially maintained Arrow2 branch

2022-01-14 Thread GitBox
Igosuki commented on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013080406 Ok adding BufReader gave 50% perf on parquet. https://github.com/houqp/arrow-datafusion/pull/19/commits/d8a184969bd2a88292158cbc704e0cb959b28ea6

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow] kszucs opened a new pull request #12155: ARROW-15095: [Dev][Website] Changelog generation should use commit messages

2022-01-14 Thread GitBox
kszucs opened a new pull request #12155: URL: https://github.com/apache/arrow/pull/12155 Display commit titles in the changelog instead of jira titles. Since we list all of the resolved jira tickets even without an applied patch, this is only possible where we have an issue-commit pair ava

[GitHub] [arrow] github-actions[bot] commented on pull request #12155: ARROW-15095: [Dev][Website] Changelog generation should use commit messages

2022-01-14 Thread GitBox
github-actions[bot] commented on pull request #12155: URL: https://github.com/apache/arrow/pull/12155#issuecomment-1013083443 https://issues.apache.org/jira/browse/ARROW-15095

[GitHub] [arrow] rok commented on pull request #12154: ARROW-14281: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-01-14 Thread GitBox
rok commented on pull request #12154: URL: https://github.com/apache/arrow/pull/12154#issuecomment-1013085390 Another thing regarding timezones - only timestamps in arrow have timezones. Dates do not.

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow] kszucs closed pull request #12145: ARROW-15326: [C++] Fix Gandiva crashes

2022-01-14 Thread GitBox
kszucs closed pull request #12145: URL: https://github.com/apache/arrow/pull/12145

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-14 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions > task scheduling, keepalive monitoring, struggler detection, and speculative task execution\ - yes. - yes and

[GitHub] [arrow] joosthooz commented on a change in pull request #12089: ARROW-9285: [C++] Detect unauthorized memory allocations in function kernels

2022-01-14 Thread GitBox
joosthooz commented on a change in pull request #12089: URL: https://github.com/apache/arrow/pull/12089#discussion_r784816517 ## File path: cpp/src/arrow/compute/exec.cc ## @@ -110,6 +112,78 @@ int64_t ExecBatch::TotalBufferSize() const { return sum; } +bool AddBuffersToS

[GitHub] [arrow] ursabot commented on pull request #12145: ARROW-15326: [C++] Fix Gandiva crashes

2022-01-14 Thread GitBox
ursabot commented on pull request #12145: URL: https://github.com/apache/arrow/pull/12145#issuecomment-1013091190 Benchmark runs are scheduled for baseline = 99f7c3cf3e6c2a9555ceff3d48ef73e485ede546 and contender = fc6d408f5adf09cfcf1949354df396edf6e3e4ab. fc6d408f5adf09cfcf1949354df396ed

[GitHub] [arrow] paleolimbot commented on a change in pull request #12154: ARROW-14281: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-01-14 Thread GitBox
paleolimbot commented on a change in pull request #12154: URL: https://github.com/apache/arrow/pull/12154#discussion_r784807691 ## File path: r/R/util.R ## @@ -209,3 +209,74 @@ handle_csv_read_error <- function(e, schema) { abort(e) } + + +parse_period_unit <- function(x)

[GitHub] [arrow] paleolimbot commented on pull request #12154: ARROW-14281: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-01-14 Thread GitBox
paleolimbot commented on pull request #12154: URL: https://github.com/apache/arrow/pull/12154#issuecomment-1013092505 @jonkeane Can you approve the workflows to run?

[GitHub] [arrow] kszucs closed pull request #12146: ARROW-15076: [C++][Gandiva] Fix allocation of AES {en,de}cryption result

2022-01-14 Thread GitBox
kszucs closed pull request #12146: URL: https://github.com/apache/arrow/pull/12146

[GitHub] [arrow] ursabot commented on pull request #12146: ARROW-15076: [C++][Gandiva] Fix allocation of AES {en,de}cryption result

2022-01-14 Thread GitBox
ursabot commented on pull request #12146: URL: https://github.com/apache/arrow/pull/12146#issuecomment-1013097351 Benchmark runs are scheduled for baseline = fc6d408f5adf09cfcf1949354df396edf6e3e4ab and contender = 5632423fea91f5c2c69709c56eb64696bd9301ef. 5632423fea91f5c2c69709c56eb64696

[GitHub] [arrow] paleolimbot commented on a change in pull request #12152: ARROW-15123: [R] CSV dataset file header read in as data

2022-01-14 Thread GitBox
paleolimbot commented on a change in pull request #12152: URL: https://github.com/apache/arrow/pull/12152#discussion_r784824022 ## File path: r/R/util.R ## @@ -209,5 +209,5 @@ handle_csv_read_error <- function(e, schema) { )) } - abort(e) + stop(e) Review comment:

[GitHub] [arrow] ursabot edited a comment on pull request #12145: ARROW-15326: [C++] Fix Gandiva crashes

2022-01-14 Thread GitBox
ursabot edited a comment on pull request #12145: URL: https://github.com/apache/arrow/pull/12145#issuecomment-1013091190 Benchmark runs are scheduled for baseline = 99f7c3cf3e6c2a9555ceff3d48ef73e485ede546 and contender = fc6d408f5adf09cfcf1949354df396edf6e3e4ab. fc6d408f5adf09cfcf1949354

[GitHub] [arrow] kszucs commented on a change in pull request #11938: ARROW-15077: [Python] Move Expression class from _dataset to _compute cython module

2022-01-14 Thread GitBox
kszucs commented on a change in pull request #11938: URL: https://github.com/apache/arrow/pull/11938#discussion_r784830527 ## File path: python/pyarrow/dataset.py ## @@ -47,6 +46,9 @@ _get_partition_keys, _filesystemdataset_write, ) +# keep Expression functionality e

[GitHub] [arrow] kszucs commented on pull request #11938: ARROW-15077: [Python] Move Expression class from _dataset to _compute cython module

2022-01-14 Thread GitBox
kszucs commented on pull request #11938: URL: https://github.com/apache/arrow/pull/11938#issuecomment-1013105300 @jorisvandenbossche could you please rebase since the PR is rather old

[GitHub] [arrow] kszucs edited a comment on pull request #11938: ARROW-15077: [Python] Move Expression class from _dataset to _compute cython module

2022-01-14 Thread GitBox
kszucs edited a comment on pull request #11938: URL: https://github.com/apache/arrow/pull/11938#issuecomment-1013105300 @jorisvandenbossche could you please rebase since the PR is rather old?

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11938: ARROW-15077: [Python] Move Expression class from _dataset to _compute cython module

2022-01-14 Thread GitBox
jorisvandenbossche commented on a change in pull request #11938: URL: https://github.com/apache/arrow/pull/11938#discussion_r784831930 ## File path: python/pyarrow/dataset.py ## @@ -47,6 +46,9 @@ _get_partition_keys, _filesystemdataset_write, ) +# keep Expression fun

[GitHub] [arrow] jorisvandenbossche commented on pull request #11938: ARROW-15077: [Python] Move Expression class from _dataset to _compute cython module

2022-01-14 Thread GitBox
jorisvandenbossche commented on pull request #11938: URL: https://github.com/apache/arrow/pull/11938#issuecomment-1013105809 I updated it yesterday
