[GitHub] [arrow-rs] tustvold opened a new pull request, #2898: Specialize interleave integer

2022-10-19 Thread GitBox
tustvold opened a new pull request, #2898: URL: https://github.com/apache/arrow-rs/pull/2898 _Draft as builds on #2885_ # Which issue does this PR close? Part of #2864 # Rationale for this change ``` interleave i32(0.0) 100 [0..100, 100..230, 450..

[GitHub] [arrow-ballista] yahoNanJing opened a new issue, #411: Support count distinct aggregation function

2022-10-19 Thread GitBox
yahoNanJing opened a new issue, #411: URL: https://github.com/apache/arrow-ballista/issues/411 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** **Describe the solution you'd like** **Describe alternatives you've c

[GitHub] [arrow-ballista] dependabot[bot] commented on pull request #410: Update sqlparser requirement from 0.25 to 0.26

2022-10-19 Thread GitBox
dependabot[bot] commented on PR #410: URL: https://github.com/apache/arrow-ballista/pull/410#issuecomment-1284919374 The following labels could not be found: `auto-dependencies`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [arrow-ballista] dependabot[bot] opened a new pull request, #410: Update sqlparser requirement from 0.25 to 0.26

2022-10-19 Thread GitBox
dependabot[bot] opened a new pull request, #410: URL: https://github.com/apache/arrow-ballista/pull/410 Updates the requirements on [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) to permit the latest version. Changelog Sourced from https://github.com/sqlparser-rs/sqlpar

[GitHub] [arrow-datafusion] comphead commented on pull request #3882: simplify unqualified col name aliases

2022-10-19 Thread GitBox
comphead commented on PR #3882: URL: https://github.com/apache/arrow-datafusion/pull/3882#issuecomment-1284895937 > Ok, but the current branch now also fails with `select trim(a), trim(b) from table`? As the two are now giving the same alias. This is my main worry to merge this. As suffix

[GitHub] [arrow-ballista] r4ntix opened a new pull request, #409: Fix q20 sql typo in benchmarks

2022-10-19 Thread GitBox
r4ntix opened a new pull request, #409: URL: https://github.com/apache/arrow-ballista/pull/409 # Which issue does this PR close? Closes #374 # Rationale for this change See #374 # What changes are included in this PR? fix q20 sql typo

[GitHub] [arrow-datafusion] Ted-Jiang closed pull request #3899: Factorize common AND factors out of OR predicates to support filterPushDown as possible

2022-10-19 Thread GitBox
Ted-Jiang closed pull request #3899: Factorize common AND factors out of OR predicates to support filterPushDown as possible URL: https://github.com/apache/arrow-datafusion/pull/3899

[GitHub] [arrow-ballista] r4ntix commented on issue #375: Benchmark q22 produces empty result set

2022-10-19 Thread GitBox
r4ntix commented on issue #375: URL: https://github.com/apache/arrow-ballista/issues/375#issuecomment-1284860302 It's a datafusion-proto issue; I submitted a PR to fix it: https://github.com/apache/arrow-datafusion/pull/3902

[GitHub] [arrow-datafusion] r4ntix opened a new pull request, #3902: Add `Substring(str [from int] [for int])` support in `datafusion-proto`

2022-10-19 Thread GitBox
r4ntix opened a new pull request, #3902: URL: https://github.com/apache/arrow-datafusion/pull/3902 # Which issue does this PR close? Closes #3901 and https://github.com/apache/arrow-ballista/issues/375 # Rationale for this change See #3901 # What chan

[GitHub] [arrow-datafusion] r4ntix opened a new issue, #3901: datafusion-proto deserialize with Substring(str [from int] [for int]) fails

2022-10-19 Thread GitBox
r4ntix opened a new issue, #3901: URL: https://github.com/apache/arrow-datafusion/issues/3901 **Describe the bug** Serialize and Deserialize with `Substring(str [from int] [for int])`: ```rust // substr(string, position, count) let test_expr_with_count = Expr::ScalarFunction {

[GitHub] [arrow-datafusion] Dandandan commented on pull request #3882: simplify unqualified col name aliases

2022-10-19 Thread GitBox
Dandandan commented on PR #3882: URL: https://github.com/apache/arrow-datafusion/pull/3882#issuecomment-1284843513 Ok, but the current branch now also fails with `select trim(a), trim(b) from table`? As the two are now giving the same alias. This is my main worry to merge this. As suffix

[GitHub] [arrow-datafusion] HaoYang670 opened a new issue, #3900: [QUESTION] How many times should be the function `create_name` called when executing a query?

2022-10-19 Thread GitBox
HaoYang670 opened a new issue, #3900: URL: https://github.com/apache/arrow-datafusion/issues/3900 **Describe the bug** A clear and concise description of what the bug is. Related to this function: https://github.com/apache/arrow-datafusion/blob/master/datafusion/expr/src/expr.rs#L951

[GitHub] [arrow-datafusion] Ted-Jiang opened a new pull request, #3899: Factorize common AND factors out of OR predicates to support filterPushDown as possible

2022-10-19 Thread GitBox
Ted-Jiang opened a new pull request, #3899: URL: https://github.com/apache/arrow-datafusion/pull/3899 # Which issue does this PR close? Closes #3858 . # Rationale for this change # What changes are included in this PR? # Are there any user-facing c

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #3859: Factorize common AND factors out of OR predicates to support filterPu…

2022-10-19 Thread GitBox
Ted-Jiang commented on PR #3859: URL: https://github.com/apache/arrow-datafusion/pull/3859#issuecomment-1284825350 > Actually, I think the regression is pretty bad (there are many filters) -- I am going to revert this PR to get the tests green again. I am really sorry @Ted-Jiang see

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #3859: Factorize common AND factors out of OR predicates to support filterPu…

2022-10-19 Thread GitBox
Ted-Jiang commented on PR #3859: URL: https://github.com/apache/arrow-datafusion/pull/3859#issuecomment-1284821175 > @Ted-Jiang can you please re-create this PR (or I can do so too) and we can iterate on it some more. I am really sorry about that @alamb of course! This is not your faul

[GitHub] [arrow-datafusion] HaoYang670 commented on issue #3891: `count(Literal)` gives wrong column name

2022-10-19 Thread GitBox
HaoYang670 commented on issue #3891: URL: https://github.com/apache/arrow-datafusion/issues/3891#issuecomment-1284818587 Hi @andygrove, I guess this bug is probably related to the sql-parser. Because there is no error when constructing the expression directly: ```rust #[test]

[GitHub] [arrow-ballista] liurenjie1024 opened a new issue, #408: Design: Implement bubble execution mode.

2022-10-19 Thread GitBox
liurenjie1024 opened a new issue, #408: URL: https://github.com/apache/arrow-ballista/issues/408 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated whe

[GitHub] [arrow] kou merged pull request #14462: ARROW-18103: [Packaging][deb][RPM] Fix upload artifacts patterns

2022-10-19 Thread GitBox
kou merged PR #14462: URL: https://github.com/apache/arrow/pull/14462

[GitHub] [arrow] kou commented on pull request #14462: ARROW-18103: [Packaging][deb][RPM] Fix upload artifacts patterns

2022-10-19 Thread GitBox
kou commented on PR #14462: URL: https://github.com/apache/arrow/pull/14462#issuecomment-1284802384 +1

[GitHub] [arrow] cyb70289 commented on a diff in pull request #14442: ARROW-18081: [Go] Add Scalar Boolean functions

2022-10-19 Thread GitBox
cyb70289 commented on code in PR #14442: URL: https://github.com/apache/arrow/pull/14442#discussion_r166822 ## go/arrow/bitutil/bitmaps.go: ## @@ -390,15 +396,26 @@ func CopyBitmap(src []byte, srcOffset, length int, dst []byte, dstOffset int) { rdr := NewBit

[GitHub] [arrow] paleolimbot commented on a diff in pull request #14328: ARROW-17879: [R] Intermittent memory leaks in the valgrind nightly test

2022-10-19 Thread GitBox
paleolimbot commented on code in PR #14328: URL: https://github.com/apache/arrow/pull/14328#discussion_r164056 ## r/src/compute-exec.cpp: ## @@ -147,6 +147,35 @@ class ExecPlanReader : public arrow::RecordBatchReader { ~ExecPlanReader() { StopProducing(); } Review Comm

[GitHub] [arrow-datafusion] HaoYang670 commented on issue #3892: Improve efficiency of multiple optimizer passes

2022-10-19 Thread GitBox
HaoYang670 commented on issue #3892: URL: https://github.com/apache/arrow-datafusion/issues/3892#issuecomment-1284776650 As this is very similar to the optimizer in compilers, I guess we can also find some inspiration from how compilers decide when to stop the optimization process.

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r149117 ## datafusion/expr/src/operator.rs: ## @@ -115,6 +115,41 @@ impl Operator { | Operator::StringConcat => None, } } + +/// Ret

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r148906 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -640,6 +640,155 @@ impl PhysicalExpr for BinaryExpr { self.evaluate_with_resolved_arg

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r148750 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -640,6 +640,155 @@ impl PhysicalExpr for BinaryExpr { self.evaluate_with_resolved_arg

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r148474 ## datafusion/physical-expr/src/physical_expr.rs: ## @@ -61,6 +62,81 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug { Ok(tmp_result)

[GitHub] [arrow-datafusion] isidentical commented on pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#issuecomment-1284751479 Since there were a lot of discussions (and many thanks for it @alamb, it was very inspiring), here is a quick summary: - I've pushed https://github.com/apache/arrow-d

[GitHub] [arrow-datafusion] tustvold commented on issue #847: Implement parquet page-level skipping with column index, using min/max stats

2022-10-19 Thread GitBox
tustvold commented on issue #847: URL: https://github.com/apache/arrow-datafusion/issues/847#issuecomment-1284748127 I believe this has been implemented by #3780, feel free to reopen if I have missed anything

[GitHub] [arrow-datafusion] tustvold closed issue #847: Implement parquet page-level skipping with column index, using min/max stats

2022-10-19 Thread GitBox
tustvold closed issue #847: Implement parquet page-level skipping with column index, using min/max stats URL: https://github.com/apache/arrow-datafusion/issues/847

[GitHub] [arrow] westonpace commented on a diff in pull request #14415: ARROW-17966: [C++] Adjust to new format for Substrait optional arguments

2022-10-19 Thread GitBox
westonpace commented on code in PR #14415: URL: https://github.com/apache/arrow/pull/14415#discussion_r145450 ## cpp/src/arrow/engine/substrait/extension_set.cc: ## @@ -645,22 +663,19 @@ struct ExtensionIdRegistryImpl : ExtensionIdRegistry { }; template -using EnumParse

[GitHub] [arrow] westonpace commented on pull request #14415: ARROW-17966: [C++] Adjust to new format for Substrait optional arguments

2022-10-19 Thread GitBox
westonpace commented on PR #14415: URL: https://github.com/apache/arrow/pull/14415#issuecomment-1284746747 @bkietz Thanks for the review. I still need to take a look at the cmake changes but I'll try that out later.

[GitHub] [arrow] westonpace commented on a diff in pull request #14415: ARROW-17966: [C++] Adjust to new format for Substrait optional arguments

2022-10-19 Thread GitBox
westonpace commented on code in PR #14415: URL: https://github.com/apache/arrow/pull/14415#discussion_r144978 ## cpp/src/arrow/engine/substrait/expression_internal.cc: ## @@ -90,6 +95,9 @@ Result DecodeScalarFunction( ARROW_RETURN_NOT_OK(DecodeArg(scalar_fn.arguments(i)

[GitHub] [arrow] vibhatha commented on pull request #14024: ARROW-17521: [Python] Add python bindings for NamedTableProvider for Substrait consumer

2022-10-19 Thread GitBox
vibhatha commented on PR #14024: URL: https://github.com/apache/arrow/pull/14024#issuecomment-1284745953 > This is a nice improvement! > > I am wondering one thing: why does the function take a list of names, instead of a single name? > > I understand from [#14024 (comment)](h

[GitHub] [arrow-datafusion] tustvold commented on issue #1923: Local object store accepts file:/// as base path, but LocalStore returns meta without the prefix.

2022-10-19 Thread GitBox
tustvold commented on issue #1923: URL: https://github.com/apache/arrow-datafusion/issues/1923#issuecomment-1284745812 I believe this was implemented by a combination of https://github.com/apache/arrow-datafusion/pull/2578 and https://github.com/apache/arrow-datafusion/pull/2677. These sta

[GitHub] [arrow-datafusion] tustvold closed issue #1923: Local object store accepts file:/// as base path, but LocalStore returns meta without the prefix.

2022-10-19 Thread GitBox
tustvold closed issue #1923: Local object store accepts file:/// as base path, but LocalStore returns meta without the prefix. URL: https://github.com/apache/arrow-datafusion/issues/1923

[GitHub] [arrow] westonpace commented on a diff in pull request #14415: ARROW-17966: [C++] Adjust to new format for Substrait optional arguments

2022-10-19 Thread GitBox
westonpace commented on code in PR #14415: URL: https://github.com/apache/arrow/pull/14415#discussion_r143752 ## cpp/src/arrow/engine/substrait/extension_set.cc: ## @@ -156,15 +156,33 @@ Result SubstraitCall::GetValueArg(uint32_t index) const { return value_arg_it->secon

[GitHub] [arrow-datafusion] tustvold closed issue #537: Support min/max statistics in ParquetTable and ParquetExec

2022-10-19 Thread GitBox
tustvold closed issue #537: Support min/max statistics in ParquetTable and ParquetExec URL: https://github.com/apache/arrow-datafusion/issues/537

[GitHub] [arrow-datafusion] tustvold commented on issue #537: Support min/max statistics in ParquetTable and ParquetExec

2022-10-19 Thread GitBox
tustvold commented on issue #537: URL: https://github.com/apache/arrow-datafusion/issues/537#issuecomment-1284744800 I believe this is now supported, but feel free to reopen if I am mistaken

[GitHub] [arrow-datafusion] tustvold closed issue #1383: Reading nested parquet files results in `index out of bounds`

2022-10-19 Thread GitBox
tustvold closed issue #1383: Reading nested parquet files results in `index out of bounds` URL: https://github.com/apache/arrow-datafusion/issues/1383

[GitHub] [arrow-datafusion] tustvold commented on issue #1383: Reading nested parquet files results in `index out of bounds`

2022-10-19 Thread GitBox
tustvold commented on issue #1383: URL: https://github.com/apache/arrow-datafusion/issues/1383#issuecomment-1284743894 This appears to now work correctly, I suspect it was fixed by https://github.com/apache/arrow-rs/pull/1588

[GitHub] [arrow] kou closed pull request #14431: ARROW-18071: [Dev][Archery][Crossbow] Show body from server on upload error

2022-10-19 Thread GitBox
kou closed pull request #14431: ARROW-18071: [Dev][Archery][Crossbow] Show body from server on upload error URL: https://github.com/apache/arrow/pull/14431

[GitHub] [arrow] kou commented on pull request #14431: ARROW-18071: [Dev][Archery][Crossbow] Show body from server on upload error

2022-10-19 Thread GitBox
kou commented on PR #14431: URL: https://github.com/apache/arrow/pull/14431#issuecomment-1284743026 Closing in favor of #14462.

[GitHub] [arrow] westonpace commented on a diff in pull request #14415: ARROW-17966: [C++] Adjust to new format for Substrait optional arguments

2022-10-19 Thread GitBox
westonpace commented on code in PR #14415: URL: https://github.com/apache/arrow/pull/14415#discussion_r141720 ## cpp/src/arrow/engine/substrait/extension_set.cc: ## @@ -698,12 +744,15 @@ Result> GetValueArgs(const SubstraitCall& call, ExtensionIdRegistry::SubstraitCallToAr

[GitHub] [arrow-datafusion] tustvold commented on issue #825: Add documentation for support for skipping Parquet row groups

2022-10-19 Thread GitBox
tustvold commented on issue #825: URL: https://github.com/apache/arrow-datafusion/issues/825#issuecomment-1284741470 FWIW this may overlap with #3464

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r140728 ## datafusion/physical-expr/src/physical_expr.rs: ## @@ -61,6 +62,81 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug { Ok(tmp_result)

[GitHub] [arrow-datafusion] tustvold closed issue #1058: Use `parquet2` async reader in `physical_plan/parquet`

2022-10-19 Thread GitBox
tustvold closed issue #1058: Use `parquet2` async reader in `physical_plan/parquet` URL: https://github.com/apache/arrow-datafusion/issues/1058

[GitHub] [arrow] westonpace commented on a diff in pull request #14415: ARROW-17966: [C++] Adjust to new format for Substrait optional arguments

2022-10-19 Thread GitBox
westonpace commented on code in PR #14415: URL: https://github.com/apache/arrow/pull/14415#discussion_r140264 ## cpp/src/arrow/engine/substrait/extension_set.cc: ## @@ -678,17 +693,48 @@ static EnumParser kOverflowParser = GetEnumParser(kOverflowOptions); template -

[GitHub] [arrow-datafusion] tustvold commented on issue #1058: Use `parquet2` async reader in `physical_plan/parquet`

2022-10-19 Thread GitBox
tustvold commented on issue #1058: URL: https://github.com/apache/arrow-datafusion/issues/1058#issuecomment-1284739319 Closing as duplicate of #1532

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r139870 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -640,6 +640,155 @@ impl PhysicalExpr for BinaryExpr { self.evaluate_with_resolved_arg

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r139432 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -640,6 +640,155 @@ impl PhysicalExpr for BinaryExpr { self.evaluate_with_resolved_arg

[GitHub] [arrow-datafusion] isidentical opened a new issue, #3898: A framework for expression boundary analysis (and statistics)

2022-10-19 Thread GitBox
isidentical opened a new issue, #3898: URL: https://github.com/apache/arrow-datafusion/issues/3898 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** TBD **Describe the solution you'd like** TBD **Describe alternative

[GitHub] [arrow-datafusion] tustvold commented on issue #2044: wrong result when operation parquet

2022-10-19 Thread GitBox
tustvold commented on issue #2044: URL: https://github.com/apache/arrow-datafusion/issues/2044#issuecomment-1284735428 I think this should have been resolved by https://github.com/apache/arrow-rs/pull/1682, could you let me know if the issue still persists?

[GitHub] [arrow-datafusion] tustvold closed issue #1341: Benchmark `constellation-rs/amadeus`'s parquet implementation

2022-10-19 Thread GitBox
tustvold closed issue #1341: Benchmark `constellation-rs/amadeus`'s parquet implementation URL: https://github.com/apache/arrow-datafusion/issues/1341

[GitHub] [arrow-datafusion] tustvold commented on issue #1341: Benchmark `constellation-rs/amadeus`'s parquet implementation

2022-10-19 Thread GitBox
tustvold commented on issue #1341: URL: https://github.com/apache/arrow-datafusion/issues/1341#issuecomment-1284733387 https://github.com/constellation-rs/amadeus appears to be abandoned so closing. _FWIW there was work in February of this year to port across many of the ideas from

[GitHub] [arrow-datafusion] tustvold commented on issue #83: [Rust] Parquet data source does not support complex types

2022-10-19 Thread GitBox
tustvold commented on issue #83: URL: https://github.com/apache/arrow-datafusion/issues/83#issuecomment-1284728218 I believe this issue can now be closed, as of https://github.com/apache/arrow-rs/pull/2500 parquet has full support for arbitrarily nested types. Feel free to reopen if I have

[GitHub] [arrow-datafusion] tustvold closed issue #83: [Rust] Parquet data source does not support complex types

2022-10-19 Thread GitBox
tustvold closed issue #83: [Rust] Parquet data source does not support complex types URL: https://github.com/apache/arrow-datafusion/issues/83

[GitHub] [arrow] amoeba commented on pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on PR #14452: URL: https://github.com/apache/arrow/pull/14452#issuecomment-1284724655 Thanks for the review @westonpace, this is really helpful. I have some language to work on and I'll update here when this is ready for another review.

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r131167 ## docs/source/cpp/csv.rst: ## @@ -56,19 +67,84 @@ A CSV file is read from a :class:`~arrow::io::InputStream`. parse_options,

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r130908 ## docs/source/cpp/csv.rst: ## @@ -56,19 +67,84 @@ A CSV file is read from a :class:`~arrow::io::InputStream`. parse_options,

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r130204 ## docs/source/cpp/csv.rst: ## @@ -56,19 +67,84 @@ A CSV file is read from a :class:`~arrow::io::InputStream`. parse_options,

[GitHub] [arrow] github-actions[bot] commented on pull request #14462: ARROW-18103: [Packaging][deb][RPM] Fix upload artifacts patterns

2022-10-19 Thread GitBox
github-actions[bot] commented on PR #14462: URL: https://github.com/apache/arrow/pull/14462#issuecomment-1284719463 Revision: 9e48a87679cd9f6d9791e901222825ebe9732067 Submitted crossbow builds: [ursacomputing/crossbow @ actions-777aa91d6c](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #14462: ARROW-18103: [Packaging][deb][RPM] Fix upload artifacts patterns

2022-10-19 Thread GitBox
kou commented on PR #14462: URL: https://github.com/apache/arrow/pull/14462#issuecomment-1284715866 @github-actions crossbow submit -g linux

[GitHub] [arrow] github-actions[bot] commented on pull request #14462: ARROW-18103: [Packaging][deb][RPM] Fix upload artifacts patterns

2022-10-19 Thread GitBox
github-actions[bot] commented on PR #14462: URL: https://github.com/apache/arrow/pull/14462#issuecomment-1284715786 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #14462: ARROW-18103: [Packaging][deb][RPM] Fix upload artifacts patterns

2022-10-19 Thread GitBox
github-actions[bot] commented on PR #14462: URL: https://github.com/apache/arrow/pull/14462#issuecomment-1284715582 https://issues.apache.org/jira/browse/ARROW-18103

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r125482 ## docs/source/cpp/csv.rst: ## @@ -56,19 +67,84 @@ A CSV file is read from a :class:`~arrow::io::InputStream`. parse_options,

[GitHub] [arrow] kou opened a new pull request, #14462: ARROW-18103: [Packaging][deb][RPM] Fix upload artifacts patterns

2022-10-19 Thread GitBox
kou opened a new pull request, #14462: URL: https://github.com/apache/arrow/pull/14462 The current patterns may match multiple files that have the same base name. For example: * arrow/dev/tasks/linux-packages/apache-arrow/apt/repositories/debian/pool/bookworm/main/a/apache-arrow/liba

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r124176 ## docs/source/cpp/csv.rst: ## @@ -25,15 +25,26 @@ Reading and Writing CSV files = Arrow provides a fast CSV reader allowing ingestion of

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r123812 ## docs/source/cpp/csv.rst: ## @@ -25,15 +25,26 @@ Reading and Writing CSV files = Arrow provides a fast CSV reader allowing ingestion of

[GitHub] [arrow] westonpace commented on a diff in pull request #14415: ARROW-17966: [C++] Adjust to new format for Substrait optional arguments

2022-10-19 Thread GitBox
westonpace commented on code in PR #14415: URL: https://github.com/apache/arrow/pull/14415#discussion_r121551 ## cpp/src/arrow/engine/substrait/extension_set.cc: ## @@ -156,15 +156,33 @@ Result SubstraitCall::GetValueArg(uint32_t index) const { return value_arg_it->secon

[GitHub] [arrow] kou commented on pull request #14255: ARROW-17871: [Go] initial binary arithmetic implementation

2022-10-19 Thread GitBox
kou commented on PR #14255: URL: https://github.com/apache/arrow/pull/14255#issuecomment-1284706658 > @kou If you add the mingw64 path to your System level path, reboot, and then try running an exe through the Run Dialog (windows key+R) or through file explorer, then it'll give you those er

[GitHub] [arrow] westonpace commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
westonpace commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r108261 ## docs/source/cpp/csv.rst: ## @@ -25,15 +25,26 @@ Reading and Writing CSV files = Arrow provides a fast CSV reader allowing ingestio

[GitHub] [arrow] amoeba commented on pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on PR #14452: URL: https://github.com/apache/arrow/pull/14452#issuecomment-1284633523 Thanks so much @pitrou for taking a look. I made all your suggested changes in atomic commits.

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r80886 ## docs/source/cpp/csv.rst: ## @@ -56,19 +65,88 @@ A CSV file is read from a :class:`~arrow::io::InputStream`. parse_options,

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r81661 ## docs/source/cpp/csv.rst: ## @@ -56,19 +65,88 @@ A CSV file is read from a :class:`~arrow::io::InputStream`. parse_options,

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r81582 ## docs/source/cpp/csv.rst: ## @@ -25,15 +25,24 @@ Reading and Writing CSV files = Arrow provides a fast CSV reader allowing ingestion of

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r81762 ## docs/source/cpp/csv.rst: ## @@ -56,19 +65,88 @@ A CSV file is read from a :class:`~arrow::io::InputStream`. parse_options,

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r81582 ## docs/source/cpp/csv.rst: ## @@ -25,15 +25,24 @@ Reading and Writing CSV files = Arrow provides a fast CSV reader allowing ingestion of

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r81695 ## docs/source/cpp/csv.rst: ## @@ -275,11 +353,13 @@ Write Options The format of written CSV files can be customized via :class:`~arrow::csv::WriteOptions`. Currently

[GitHub] [arrow] amoeba commented on a diff in pull request #14452: ARROW-15328: [C++][Docs] Streaming CSV reader missing from documentation

2022-10-19 Thread GitBox
amoeba commented on code in PR #14452: URL: https://github.com/apache/arrow/pull/14452#discussion_r80886 ## docs/source/cpp/csv.rst: ## @@ -56,19 +65,88 @@ A CSV file is read from a :class:`~arrow::io::InputStream`. parse_options,

[GitHub] [arrow-ballista] yahoNanJing merged pull request #393: Cache encoded stage plan

2022-10-19 Thread GitBox
yahoNanJing merged PR #393: URL: https://github.com/apache/arrow-ballista/pull/393 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@ar

[GitHub] [arrow-ballista] yahoNanJing closed issue #142: Save encoded execution plan in the ExecutionStage to reduce cost of task serialization and deserialization

2022-10-19 Thread GitBox
yahoNanJing closed issue #142: Save encoded execution plan in the ExecutionStage to reduce cost of task serialization and deserialization URL: https://github.com/apache/arrow-ballista/issues/142 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [arrow-ballista] yahoNanJing merged pull request #392: Remove active execution graph when the related job is successful or failed

2022-10-19 Thread GitBox
yahoNanJing merged PR #392: URL: https://github.com/apache/arrow-ballista/pull/392 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@ar

[GitHub] [arrow-ballista] yahoNanJing closed issue #391: Remove active execution graph when the related job is successful or failed.

2022-10-19 Thread GitBox
yahoNanJing closed issue #391: Remove active execution graph when the related job is successful or failed. URL: https://github.com/apache/arrow-ballista/issues/391 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r68364 ## datafusion/physical-expr/src/physical_expr.rs: ## @@ -61,6 +62,81 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug { Ok(tmp_result)

[GitHub] [arrow-ballista] avantgardnerio commented on a diff in pull request #403: Enable scheduler UI in docker builds

2022-10-19 Thread GitBox
avantgardnerio commented on code in PR #403: URL: https://github.com/apache/arrow-ballista/pull/403#discussion_r66092 ## dev/docker/ballista-scheduler.Dockerfile: ## @@ -24,7 +24,16 @@ ENV RUST_LOG=info ENV RUST_BACKTRACE=full ENV DEBIAN_FRONTEND=noninteractive -RUN apt-

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
alamb commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r64895 ## datafusion/physical-expr/src/physical_expr.rs: ## @@ -61,6 +62,81 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug { Ok(tmp_result)

[GitHub] [arrow-datafusion] alamb commented on pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
alamb commented on PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#issuecomment-1284609702 > My only worry is that a technically correct version should produce ExprBoundaries {min: false, max:true} Yes, you are correct -- and in fact that is good information (it m

[GitHub] [arrow-datafusion] isidentical commented on pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#issuecomment-1284600945 That is definitely an interesting point of view 👀 I was thinking more restricted towards what the filter's outcome would be (what sort of `a`'s can there be after we execute

[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
isidentical commented on code in PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#discussion_r46718 ## datafusion/physical-expr/src/physical_expr.rs: ## @@ -61,6 +62,81 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug { Ok(tmp_result)

[GitHub] [arrow] amoeba commented on pull request #13983: ARROW-15006: [Python][CI][Doc] Enable numpydoc check PR03

2022-10-19 Thread GitBox
amoeba commented on PR #13983: URL: https://github.com/apache/arrow/pull/13983#issuecomment-1284588549 Thanks for the review @jorisvandenbossche. I didn't mean to remove that bit and I'm not sure how I did. Good catch! All three instances of that param docstring now have the expected text.

[GitHub] [arrow] amoeba commented on a diff in pull request #13983: ARROW-15006: [Python][CI][Doc] Enable numpydoc check PR03

2022-10-19 Thread GitBox
amoeba commented on code in PR #13983: URL: https://github.com/apache/arrow/pull/13983#discussion_r44630 ## python/pyarrow/_dataset.pyx: ## @@ -2397,12 +2396,14 @@ cdef class Scanner(_Weakrefable): record batches are overflowing memory then this method can be

[GitHub] [arrow] amoeba commented on a diff in pull request #13983: ARROW-15006: [Python][CI][Doc] Enable numpydoc check PR03

2022-10-19 Thread GitBox
amoeba commented on code in PR #13983: URL: https://github.com/apache/arrow/pull/13983#discussion_r44404 ## python/pyarrow/_dataset.pyx: ## @@ -2484,9 +2482,11 @@ cdef class Scanner(_Weakrefable): record batches are overflowing memory then this method can be

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2890: Respect Page Size Limits in ArrowWriter (#2853)

2022-10-19 Thread GitBox
tustvold commented on code in PR #2890: URL: https://github.com/apache/arrow-rs/pull/2890#discussion_r42326 ## parquet/tests/arrow_writer_layout.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2890: Respect Page Size Limits in ArrowWriter (#2853)

2022-10-19 Thread GitBox
tustvold commented on code in PR #2890: URL: https://github.com/apache/arrow-rs/pull/2890#discussion_r42326 ## parquet/tests/arrow_writer_layout.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2890: Respect Page Size Limits in ArrowWriter (#2853)

2022-10-19 Thread GitBox
tustvold commented on code in PR #2890: URL: https://github.com/apache/arrow-rs/pull/2890#discussion_r42326 ## parquet/tests/arrow_writer_layout.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2890: Respect Page Size Limits in ArrowWriter (#2853)

2022-10-19 Thread GitBox
tustvold commented on code in PR #2890: URL: https://github.com/apache/arrow-rs/pull/2890#discussion_r42326 ## parquet/tests/arrow_writer_layout.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

[GitHub] [arrow-rs] alamb commented on a diff in pull request #2890: Respect Page Size Limits in ArrowWriter (#2853)

2022-10-19 Thread GitBox
alamb commented on code in PR #2890: URL: https://github.com/apache/arrow-rs/pull/2890#discussion_r40817 ## parquet/tests/arrow_writer_layout.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-datafusion] comphead commented on pull request #3890: skip failing tests default to false

2022-10-19 Thread GitBox
comphead commented on PR #3890: URL: https://github.com/apache/arrow-datafusion/pull/3890#issuecomment-1284582166 I'm wondering what the point is of a test that works with `skip_failing_rules` enabled but fails otherwise. Shouldn't they just be ignored and fixed later, as now it l

[GitHub] [arrow-datafusion] alamb commented on issue #3887: Consolidate SessionConfig and ConfigOptions

2022-10-19 Thread GitBox
alamb commented on issue #3887: URL: https://github.com/apache/arrow-datafusion/issues/3887#issuecomment-1284580793 > Do you think we should leave them in SessionConfig or move into ConfigOptions? I think if there are specialized setters they can do the validation -- so if left the

[GitHub] [arrow-datafusion] alamb commented on pull request #3868: Implement foundational filter selectivity analysis

2022-10-19 Thread GitBox
alamb commented on PR #3868: URL: https://github.com/apache/arrow-datafusion/pull/3868#issuecomment-1284578987 > @alamb would you mind giving an example? Expression boundaries for a boolean field (like a, or true) would be ExpressionBoundaries { min: true, max: true, distinct: 1}. And for
