[GitHub] [arrow] fgenoese commented on issue #29844: [Python] Cannot convert pd.DataFrame with geometry cells to pa.Table

2023-03-11 Thread via GitHub
fgenoese commented on issue #29844: URL: https://github.com/apache/arrow/issues/29844#issuecomment-1465114620 This issue seems to be connected: https://github.com/streamlit/streamlit/issues/1002#issuecomment-916885614 Basically, when loading a geojson with geopandas the shape fails

[GitHub] [arrow] dinimar commented on pull request #34537: GH-14939: [C++] Support Table lookups in FieldRef and FieldPath

2023-03-11 Thread via GitHub
dinimar commented on PR #34537: URL: https://github.com/apache/arrow/pull/34537#issuecomment-1465107553 cc @rok @pitrou @benibus -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[GitHub] [arrow-julia] codecov-commenter commented on pull request #395: Implement Tables.columnnames and Tables.schema for Arrow.Stream

2023-03-11 Thread via GitHub
codecov-commenter commented on PR #395: URL: https://github.com/apache/arrow-julia/pull/395#issuecomment-1465099481 ## [Codecov](https://codecov.io/gh/apache/arrow-julia/pull/395?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apac

[GitHub] [arrow-datafusion] yukkit opened a new pull request, #5559: improve: support combining multiple grouping expressions

2023-03-11 Thread via GitHub
yukkit opened a new pull request, #5559: URL: https://github.com/apache/arrow-datafusion/pull/5559 # Which issue does this PR close? Closes #5361 . # Rationale for this change # What changes are included in this PR? # Are these changes teste

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #5555: feat: extract (epoch from col)

2023-03-11 Thread via GitHub
comphead commented on code in PR #: URL: https://github.com/apache/arrow-datafusion/pull/#discussion_r1133172510 ## datafusion/core/tests/sql/expr.rs: ## @@ -1313,6 +1313,23 @@ async fn test_extract_date_part() -> Result<()> { Ok(()) } +#[tokio::test] +async fn t

[GitHub] [arrow-rs] viirya commented on pull request #3846: feat: add compression options

2023-03-11 Thread via GitHub
viirya commented on PR #3846: URL: https://github.com/apache/arrow-rs/pull/3846#issuecomment-1465067951 https://github.com/apache/arrow-rs/blob/9ce0ebb06550be943febc226f61bf083016d7652/parquet/src/format.rs#L453

[GitHub] [arrow-julia] baumgold commented on issue #393: Benchmark of Arrow.jl vs Pyarrow (/Polars)

2023-03-11 Thread via GitHub
baumgold commented on issue #393: URL: https://github.com/apache/arrow-julia/issues/393#issuecomment-1465065885 I’m certainly interested! Thanks for this hard work, @svilupp !

[GitHub] [arrow-rs] Weijun-H opened a new pull request, #3846: feat: add compression options

2023-03-11 Thread via GitHub
Weijun-H opened a new pull request, #3846: URL: https://github.com/apache/arrow-rs/pull/3846 # Which issue does this PR close? Closes #3844 # Rationale for this change # What changes are included in this PR? # Are there any user-facing chan

[GitHub] [arrow-datafusion] Jefffrey opened a new issue, #5558: Dataframe describe() method cannot handle dataframes without a numeric type column

2023-03-11 Thread via GitHub
Jefffrey opened a new issue, #5558: URL: https://github.com/apache/arrow-datafusion/issues/5558 **Describe the bug** If calling `describe(...)` on a dataframe without a numeric type method then it returns an error. **To Reproduce** ```rust ctx.sql("select

[GitHub] [arrow-datafusion] Jefffrey opened a new issue, #5557: median on empty input returns confusing error

2023-03-11 Thread via GitHub
Jefffrey opened a new issue, #5557: URL: https://github.com/apache/arrow-datafusion/issues/5557 **Describe the bug** If trying to get median on empty input/dataframe a confusing error is returned **To Reproduce** ```sql DataFusion CLI v20.0.0 ❯ select media

[GitHub] [arrow-datafusion] Jefffrey commented on pull request #5556: Revert describe count() workaround

2023-03-11 Thread via GitHub
Jefffrey commented on PR #5556: URL: https://github.com/apache/arrow-datafusion/pull/5556#issuecomment-1465044015 cc @jiangzhx

[GitHub] [arrow] ianmcook commented on issue #33958: [Format][Docs] Document primitive types in the Arrow format docs

2023-03-11 Thread via GitHub
ianmcook commented on issue #33958: URL: https://github.com/apache/arrow/issues/33958#issuecomment-1465043962 Duplicate of 14752

[GitHub] [arrow-datafusion] Jefffrey opened a new pull request, #5556: Revert describe count() workaround

2023-03-11 Thread via GitHub
Jefffrey opened a new pull request, #5556: URL: https://github.com/apache/arrow-datafusion/pull/5556 # Which issue does this PR close? Closes #. # Rationale for this change Revert workaround applied by https://github.com/apache/arrow-datafusion/pull/5468

[GitHub] [arrow-datafusion] jychen7 commented on issue #5547: Improve the performance of COUNT DISTINCT queries for high cardinality groups

2023-03-11 Thread via GitHub
jychen7 commented on issue #5547: URL: https://github.com/apache/arrow-datafusion/issues/5547#issuecomment-1465043866 I am not sure yet how it might inspire DataFusion, but for reference, there are two improvements in DuckDB for parallelizing `distinct` - without groupby, https://github.c

[GitHub] [arrow] ianmcook commented on issue #33958: [Format][Docs] Document primitive types in the Arrow format docs

2023-03-11 Thread via GitHub
ianmcook commented on issue #33958: URL: https://github.com/apache/arrow/issues/33958#issuecomment-1465043799 Duplicate of #4752

[GitHub] [arrow] ianmcook commented on issue #33958: [Format][Docs] Document primitive types in the Arrow format docs

2023-03-11 Thread via GitHub
ianmcook commented on issue #33958: URL: https://github.com/apache/arrow/issues/33958#issuecomment-1465043586 Closed as dup of #4752

[GitHub] [arrow-datafusion] Weijun-H opened a new pull request, #5555: feat: extract (epoch from col)

2023-03-11 Thread via GitHub
Weijun-H opened a new pull request, #: URL: https://github.com/apache/arrow-datafusion/pull/ # Which issue does this PR close? Closes #2785 # Rationale for this change Explain #2785 # What changes are included in this PR? Add `EXTRACT( EPOCH

[GitHub] [arrow-datafusion] ozankabak commented on pull request #5290: TopDown EnforceSorting implementation

2023-03-11 Thread via GitHub
ozankabak commented on PR #5290: URL: https://github.com/apache/arrow-datafusion/pull/5290#issuecomment-1465039603 @mingmwang, it seems you are busy these days. I think it might be a good idea to create a PR to get the new/extended test suite and a base (passing) implementation in.

[GitHub] [arrow-datafusion] jaylmiller commented on issue #258: Improve performance of COUNT (distinct x) for dictionary columns

2023-03-11 Thread via GitHub
jaylmiller commented on issue #258: URL: https://github.com/apache/arrow-datafusion/issues/258#issuecomment-1465038530 I've made a little PR for this. But I'm not sure about how to go about measuring the performance improvements... @alamb do you know of any existing benches in the codebase

[GitHub] [arrow] rok commented on issue #34536: [C++][Parquet] Benchmark and maybe Override DeltaBitPackEncoder Defaults

2023-03-11 Thread via GitHub
rok commented on issue #34536: URL: https://github.com/apache/arrow/issues/34536#issuecomment-1465035138 Benchmarking with overridden defaults makes a lot of sense yes! Do you think different bitwidth ranges (of random data) have different optimal results? If they have strong influence we m

[GitHub] [arrow-datafusion] jaylmiller opened a new pull request, #5554: Improve performance of COUNT (distinct x) for dictionary columns #258

2023-03-11 Thread via GitHub
jaylmiller opened a new pull request, #5554: URL: https://github.com/apache/arrow-datafusion/pull/5554 # Which issue does this PR close? Closes #258. # Rationale for this change The count distinct physical expr was doing a lot of unnecessary hashing when it is run on dictionary t

[GitHub] [arrow] ianmcook commented on issue #34255: [Website] Add information about meetings to Community page

2023-03-11 Thread via GitHub
ianmcook commented on issue #34255: URL: https://github.com/apache/arrow/issues/34255#issuecomment-1465028475 https://arrow.apache.org/community/#meetings

[GitHub] [arrow-julia] svilupp commented on issue #393: Benchmark of Arrow.jl vs Pyarrow (/Polars)

2023-03-11 Thread via GitHub
svilupp commented on issue #393: URL: https://github.com/apache/arrow-julia/issues/393#issuecomment-1465018021 I’ve already implemented most of the changes locally. I’ll post some benchmarks and learnings here tomorrow, and open the relevant PRs, if there is interest.

[GitHub] [arrow-flight-sql-postgresql] lidavidm commented on pull request #28: Add benchmark for integer only data

2023-03-11 Thread via GitHub
lidavidm commented on PR #28: URL: https://github.com/apache/arrow-flight-sql-postgresql/pull/28#issuecomment-1465017593 Ah, thanks for the clarification. That's even better!

[GitHub] [arrow-flight-sql-postgresql] kou commented on pull request #28: Add benchmark for integer only data

2023-03-11 Thread via GitHub
kou commented on PR #28: URL: https://github.com/apache/arrow-flight-sql-postgresql/pull/28#issuecomment-1465001257 I reconsidered `COPY`. We may not need to use `COPY` because we don't use the PostgreSQL's wire protocol. We use SPI (Server Programming Interface, https://www.postgre

[GitHub] [arrow-datafusion-python] dependabot[bot] opened a new pull request, #267: build(deps): bump futures from 0.3.26 to 0.3.27

2023-03-11 Thread via GitHub
dependabot[bot] opened a new pull request, #267: URL: https://github.com/apache/arrow-datafusion-python/pull/267 Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.26 to 0.3.27. Release notes Sourced from [futures's releases](https://github.com/rust-lang/futures-rs/releases).

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #5542: fix: failed to execute sql with subquery

2023-03-11 Thread via GitHub
comphead commented on code in PR #5542: URL: https://github.com/apache/arrow-datafusion/pull/5542#discussion_r1133137618 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -390,6 +390,22 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {

[GitHub] [arrow] dinimar commented on issue #33957: [C++] Add Rank chunked array benchmarks

2023-03-11 Thread via GitHub
dinimar commented on issue #33957: URL: https://github.com/apache/arrow/issues/33957#issuecomment-1464991495 @pitrou I'm new at benchmarks in this project. Could you please explain which file should be changed and provide some examples of benchmarks?

[GitHub] [arrow-datafusion] ursabot commented on pull request #5485: make AggregateStatistics return the same result whether optimizer disabled or enabled

2023-03-11 Thread via GitHub
ursabot commented on PR #5485: URL: https://github.com/apache/arrow-datafusion/pull/5485#issuecomment-1464976198 Benchmark runs are scheduled for baseline = c5ae3e80cde3ba4b70f6e2698652b87bd2302e81 and contender = ecbc843a1a8c38b2466748bc92a6e22ce08d51ed. ecbc843a1a8c38b2466748bc92a6e22ce

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5485: make AggregateStatistics return the same result whether optimizer disabled or enabled

2023-03-11 Thread via GitHub
alamb commented on code in PR #5485: URL: https://github.com/apache/arrow-datafusion/pull/5485#discussion_r1133130150 ## datafusion/core/tests/sql/aggregates.rs: ## @@ -99,11 +99,11 @@ async fn aggregate_timestamps_count() -> Result<()> { .await; let expected = vec![

[GitHub] [arrow-datafusion] alamb closed issue #5444: Expr.alias function not work with count aggregation

2023-03-11 Thread via GitHub
alamb closed issue #5444: Expr.alias function not work with count aggregation URL: https://github.com/apache/arrow-datafusion/issues/5444

[GitHub] [arrow-datafusion] alamb merged pull request #5485: make AggregateStatistics return the same result whether optimizer disabled or enabled

2023-03-11 Thread via GitHub
alamb merged PR #5485: URL: https://github.com/apache/arrow-datafusion/pull/5485

[GitHub] [arrow-datafusion] ursabot commented on pull request #5552: Remove unused dependencies found by cargo-machete

2023-03-11 Thread via GitHub
ursabot commented on PR #5552: URL: https://github.com/apache/arrow-datafusion/pull/5552#issuecomment-1464970160 Benchmark runs are scheduled for baseline = dd98aabdaebcfc30ec4c370be93f6663de50e02f and contender = c5ae3e80cde3ba4b70f6e2698652b87bd2302e81. c5ae3e80cde3ba4b70f6e2698652b87bd

[GitHub] [arrow-datafusion] alamb commented on pull request #5536: Avoid circular(ish) dependency parquet-test-utils on datafusion, try 2

2023-03-11 Thread via GitHub
alamb commented on PR #5536: URL: https://github.com/apache/arrow-datafusion/pull/5536#issuecomment-1464967666 It appears that using parquet-test-utils was masking a breakage in the feature flag for regular expressions. I fixed it in https://github.com/apache/arrow-datafusion/pull/5536/co

[GitHub] [arrow-datafusion] alamb merged pull request #5552: Remove unused dependencies found by cargo-machete

2023-03-11 Thread via GitHub
alamb merged PR #5552: URL: https://github.com/apache/arrow-datafusion/pull/5552

[GitHub] [arrow] mapleFU commented on issue #34510: Reading FixedSizeList from parquet is slower than reading values into more rows

2023-03-11 Thread via GitHub
mapleFU commented on issue #34510: URL: https://github.com/apache/arrow/issues/34510#issuecomment-1464950991 Learned a lot, thanks!

[GitHub] [arrow-julia] quinnj merged pull request #394: define eltype for Stream

2023-03-11 Thread via GitHub
quinnj merged PR #394: URL: https://github.com/apache/arrow-julia/pull/394

[GitHub] [arrow-julia] quinnj commented on issue #391: Precompilation broken on Julia 1.9-rc1

2023-03-11 Thread via GitHub
quinnj commented on issue #391: URL: https://github.com/apache/arrow-julia/issues/391#issuecomment-1464949249 H.I do see the issue on 1.9-rc1, but not on a recent julia#master (as of 4 days ago).

[GitHub] [arrow-julia] codecov-commenter commented on pull request #394: define eltype for Stream

2023-03-11 Thread via GitHub
codecov-commenter commented on PR #394: URL: https://github.com/apache/arrow-julia/pull/394#issuecomment-1464949140 ## [Codecov](https://codecov.io/gh/apache/arrow-julia/pull/394?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apac

[GitHub] [arrow-rs] tustvold commented on pull request #3835: Support timestamp/time and date json decoding

2023-03-11 Thread via GitHub
tustvold commented on PR #3835: URL: https://github.com/apache/arrow-rs/pull/3835#issuecomment-1464948773 FYI added timezone support in https://github.com/apache/arrow-rs/pull/3845

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3845: Add timezone support to JSON reader

2023-03-11 Thread via GitHub
tustvold commented on code in PR #3845: URL: https://github.com/apache/arrow-rs/pull/3845#discussion_r1133117710 ## arrow-array/src/types.rs: ## @@ -287,30 +287,28 @@ impl ArrowTemporalType for DurationMicrosecondType {} impl ArrowTemporalType for DurationNanosecondType {} /

[GitHub] [arrow-rs] tustvold opened a new pull request, #3845: Add timezone support to JSON reader

2023-03-11 Thread via GitHub
tustvold opened a new pull request, #3845: URL: https://github.com/apache/arrow-rs/pull/3845 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes

[GitHub] [arrow-julia] ericphanson commented on issue #393: Benchmark of Arrow.jl vs Pyarrow (/Polars)

2023-03-11 Thread via GitHub
ericphanson commented on issue #393: URL: https://github.com/apache/arrow-julia/issues/393#issuecomment-1464942972 Oops, misclick

[GitHub] [arrow-julia] quinnj commented on issue #393: Benchmark of Arrow.jl vs Pyarrow (/Polars)

2023-03-11 Thread via GitHub
quinnj commented on issue #393: URL: https://github.com/apache/arrow-julia/issues/393#issuecomment-1464942431 Wow! Thanks for the detailed research/investigation/writeup @svilupp! I think most of what you mentioned all sounds like things we should indeed do. I'm currently bogged down in a f

[GitHub] [arrow-rs] alamb commented on pull request #3839: Add as_any() to the ObjectStore to make it able to identify which ObjectStore is using for the related trait object

2023-03-11 Thread via GitHub
alamb commented on PR #3839: URL: https://github.com/apache/arrow-rs/pull/3839#issuecomment-1464923448 > I would like to understand the use-case more before merging this, I wonder if it is something like flushing the cache, or being able to take some different action if the item i

[GitHub] [arrow-datafusion] ursabot commented on pull request #5553: minor: improve sqllogictest docs

2023-03-11 Thread via GitHub
ursabot commented on PR #5553: URL: https://github.com/apache/arrow-datafusion/pull/5553#issuecomment-1464920466 Benchmark runs are scheduled for baseline = aaa4e1496ba89daa03de47dc1feed7d7e57f62a8 and contender = dd98aabdaebcfc30ec4c370be93f6663de50e02f. dd98aabdaebcfc30ec4c370be93f6663d

[GitHub] [arrow-datafusion] jackwener merged pull request #5553: minor: improve sqllogictest docs

2023-03-11 Thread via GitHub
jackwener merged PR #5553: URL: https://github.com/apache/arrow-datafusion/pull/5553

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5553: minor: improve sqllogictest docs

2023-03-11 Thread via GitHub
jackwener commented on code in PR #5553: URL: https://github.com/apache/arrow-datafusion/pull/5553#discussion_r1133088625 ## datafusion/core/tests/sqllogictests/README.md: ## @@ -19,25 +19,41 @@ Overview -This is the Datafusion implementation of [sqllogictest](https:/

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3802: object_store: (GCP) Add support for Workload Identity Federation from AWS

2023-03-11 Thread via GitHub
tustvold commented on code in PR #3802: URL: https://github.com/apache/arrow-rs/pull/3802#discussion_r1133079167 ## object_store/src/aws/credential.rs: ## @@ -84,7 +84,7 @@ const AUTH_HEADER: &str = "authorization"; const ALL_HEADERS: &[&str; 4] = &[DATE_HEADER, HASH_HEADER, TO

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3802: object_store: (GCP) Add support for Workload Identity Federation from AWS

2023-03-11 Thread via GitHub
tustvold commented on code in PR #3802: URL: https://github.com/apache/arrow-rs/pull/3802#discussion_r1133078969 ## object_store/src/gcp/credential.rs: ## @@ -466,33 +474,216 @@ impl ApplicationDefaultCredentialsFile { } } -const DEFAULT_TOKEN_GCP_URI: &str = "https://a

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3802: object_store: (GCP) Add support for Workload Identity Federation from AWS

2023-03-11 Thread via GitHub
tustvold commented on code in PR #3802: URL: https://github.com/apache/arrow-rs/pull/3802#discussion_r1133078886 ## object_store/Cargo.toml: ## @@ -41,6 +41,7 @@ tokio = { version = "1.25.0", features = ["sync", "macros", "rt", "time", "io-ut tracing = { version = "0.1" } url

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3802: object_store: (GCP) Add support for Workload Identity Federation from AWS

2023-03-11 Thread via GitHub
tustvold commented on code in PR #3802: URL: https://github.com/apache/arrow-rs/pull/3802#discussion_r1133078703 ## .github/workflows/object_store.yml: ## @@ -56,7 +56,7 @@ jobs: - name: Run clippy with aws_profile feature run: cargo clippy -p object_store --feat

[GitHub] [arrow-rs] tustvold commented on pull request #3839: Add as_any() to the ObjectStore to make it able to identify which ObjectStore is using for the related trait object

2023-03-11 Thread via GitHub
tustvold commented on PR #3839: URL: https://github.com/apache/arrow-rs/pull/3839#issuecomment-1464902004 I would like to understand the use-case more before merging this, I'm not a massive fan of breaking encapsulation in this way, it undermines the decorator pattern, and would rather avoi

[GitHub] [arrow-datafusion] ursabot commented on pull request #5511: Simplify simplify test cases, support `^`, `&`, `|`, `<<` and `>>` operators for building exprs

2023-03-11 Thread via GitHub
ursabot commented on PR #5511: URL: https://github.com/apache/arrow-datafusion/pull/5511#issuecomment-1464900576 Benchmark runs are scheduled for baseline = 9587339b0fb060f8d153bbb0f8de6a740195ccea and contender = aaa4e1496ba89daa03de47dc1feed7d7e57f62a8. aaa4e1496ba89daa03de47dc1feed7d7e

[GitHub] [arrow-datafusion] alamb merged pull request #5511: Simplify simplify test cases, support `^`, `&`, `|`, `<<` and `>>` operators for building exprs

2023-03-11 Thread via GitHub
alamb merged PR #5511: URL: https://github.com/apache/arrow-datafusion/pull/5511

[GitHub] [arrow-datafusion] alamb commented on pull request #5553: minor: improve sqllogictest docs

2023-03-11 Thread via GitHub
alamb commented on PR #5553: URL: https://github.com/apache/arrow-datafusion/pull/5553#issuecomment-1464899289 FYI @melgenek

[GitHub] [arrow-datafusion] alamb opened a new pull request, #5553: minor: improve sqllogictest docs

2023-03-11 Thread via GitHub
alamb opened a new pull request, #5553: URL: https://github.com/apache/arrow-datafusion/pull/5553 # Which issue does this PR close? N/A # Rationale for this change I keep pointing / suggesting people use these tests so I want to make it as easy as possible to use

[GitHub] [arrow] dinimar commented on pull request #34537: GH-14939: [C++] Support Table lookups in FieldRef and FieldPath

2023-03-11 Thread via GitHub
dinimar commented on PR #34537: URL: https://github.com/apache/arrow/pull/34537#issuecomment-1464897845 @benibus please review

[GitHub] [arrow-datafusion] alamb commented on pull request #5536: Avoid circular(ish) dependency parquet-test-utils on datafusion, try 2

2023-03-11 Thread via GitHub
alamb commented on PR #5536: URL: https://github.com/apache/arrow-datafusion/pull/5536#issuecomment-1464897563 > I do wonder if the test_util module should be behind some feature flag, but definitely not a blocker to this PR Yeah, I agree that would probably be a good idea; Let's do

[GitHub] [arrow-datafusion] alamb commented on pull request #5509: Enforce ambiguity check whilst normalizing columns

2023-03-11 Thread via GitHub
alamb commented on PR #5509: URL: https://github.com/apache/arrow-datafusion/pull/5509#issuecomment-1464897079 Thank you @ygf11 for the review. I plan to merge this PR tomorrow unless anyone else would like more time to review.

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5542: fix: failed to execute sql with subquery

2023-03-11 Thread via GitHub
alamb commented on code in PR #5542: URL: https://github.com/apache/arrow-datafusion/pull/5542#discussion_r1133074974 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -390,6 +390,22 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {

[GitHub] [arrow-rs] alamb commented on a diff in pull request #3839: Add as_any() to the ObjectStore to make it able to identify which ObjectStore is using for the related trait object

2023-03-11 Thread via GitHub
alamb commented on code in PR #3839: URL: https://github.com/apache/arrow-rs/pull/3839#discussion_r1133074897 ## object_store/src/aws/mod.rs: ## @@ -169,6 +170,10 @@ impl std::fmt::Display for AmazonS3 { #[async_trait] impl ObjectStore for AmazonS3 { +fn as_any(&self) ->

[GitHub] [arrow-rs] alamb commented on pull request #3839: Add as_any() to the ObjectStore to make it able to identify which ObjectStore is using for the related trait object

2023-03-11 Thread via GitHub
alamb commented on PR #3839: URL: https://github.com/apache/arrow-rs/pull/3839#issuecomment-1464895729 There appears to be some CI errors. I think adding a `as_any` function is probably fine> it would have wider support from older rust versions, and there are many existing examples o

[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #5509: Enforce ambiguity check whilst normalizing columns

2023-03-11 Thread via GitHub
ygf11 commented on code in PR #5509: URL: https://github.com/apache/arrow-datafusion/pull/5509#discussion_r1133074480 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -947,16 +939,16 @@ impl LogicalPlanBuilder { let right_key = r.into();

[GitHub] [arrow-rs] ursabot commented on pull request #3824: [ObjectStore] Add `append` API impl for `LocalFileSystem`

2023-03-11 Thread via GitHub
ursabot commented on PR #3824: URL: https://github.com/apache/arrow-rs/pull/3824#issuecomment-1464891545 Benchmark runs are scheduled for baseline = c96274a562625f091ca4c06fca21ac35ef330358 and contender = 9ce0ebb06550be943febc226f61bf083016d7652. 9ce0ebb06550be943febc226f61bf083016d7652 i

[GitHub] [arrow-rs] alamb closed issue #3742: Support for Async JSON Writer

2023-03-11 Thread via GitHub
alamb closed issue #3742: Support for Async JSON Writer URL: https://github.com/apache/arrow-rs/issues/3742

[GitHub] [arrow-rs] alamb merged pull request #3824: [ObjectStore] Add `append` API impl for `LocalFileSystem`

2023-03-11 Thread via GitHub
alamb merged PR #3824: URL: https://github.com/apache/arrow-rs/pull/3824

[GitHub] [arrow-rs] alamb commented on pull request #3824: [ObjectStore] Add `append` API impl for `LocalFileSystem`

2023-03-11 Thread via GitHub
alamb commented on PR #3824: URL: https://github.com/apache/arrow-rs/pull/3824#issuecomment-1464891217 Thanks everyone!

[GitHub] [arrow-rs] alamb closed issue #3740: Support for Async CSV Writer

2023-03-11 Thread via GitHub
alamb closed issue #3740: Support for Async CSV Writer URL: https://github.com/apache/arrow-rs/issues/3740

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5520: INSERT INTO support for MemTable

2023-03-11 Thread via GitHub
alamb commented on code in PR #5520: URL: https://github.com/apache/arrow-datafusion/pull/5520#discussion_r1133071239 ## datafusion/core/src/datasource/memory.rs: ## @@ -143,22 +147,95 @@ impl TableProvider for MemTable { _filters: &[Expr], _limit: Option,

[GitHub] [arrow] dinimar commented on issue #14939: [C++] Support Table lookups in FieldRef and FieldPath

2023-03-11 Thread via GitHub
dinimar commented on issue #14939: URL: https://github.com/apache/arrow/issues/14939#issuecomment-1464886636 @benibus Big thanks! You helped me a lot. I completed work on the issue. PR in progress

[GitHub] [arrow] github-actions[bot] commented on pull request #34537: GH-14939: [C++] Support Table lookups in FieldRef and FieldPath

2023-03-11 Thread via GitHub
github-actions[bot] commented on PR #34537: URL: https://github.com/apache/arrow/pull/34537#issuecomment-1464886291 * Closes: #14939

[GitHub] [arrow] github-actions[bot] commented on pull request #34537: GH014939: [C++] Support Table lookups in FieldRef and FieldPath

2023-03-11 Thread via GitHub
github-actions[bot] commented on PR #34537: URL: https://github.com/apache/arrow/pull/34537#issuecomment-1464886131 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[GitHub] [arrow] dinimar opened a new pull request, #34537: GH014939: [C++] Support Table lookups in FieldRef and FieldPath

2023-03-11 Thread via GitHub
dinimar opened a new pull request, #34537: URL: https://github.com/apache/arrow/pull/34537 ### Rationale for this change Described in the issue ### What changes are included in this PR? - added implementations for `FieldPath::Get(const Table& table)` and `FindAll

[GitHub] [arrow-datafusion] ursabot commented on pull request #5345: Refactor DecorrelateWhereExists and add back Distinct if needs

2023-03-11 Thread via GitHub
ursabot commented on PR #5345: URL: https://github.com/apache/arrow-datafusion/pull/5345#issuecomment-1464881728 Benchmark runs are scheduled for baseline = 860918d17b6bde396b04d718ee1c76d93054bf11 and contender = 9587339b0fb060f8d153bbb0f8de6a740195ccea. 9587339b0fb060f8d153bbb0f8de6a740

[GitHub] [arrow-datafusion] jackwener closed issue #5344: Add back Distinct for where-exists if subquery is a DISTINCT

2023-03-11 Thread via GitHub
jackwener closed issue #5344: Add back Distinct for where-exists if subquery is a DISTINCT URL: https://github.com/apache/arrow-datafusion/issues/5344

[GitHub] [arrow-datafusion] jackwener merged pull request #5345: Refactor DecorrelateWhereExists and add back Distinct if needs

2023-03-11 Thread via GitHub
jackwener merged PR #5345: URL: https://github.com/apache/arrow-datafusion/pull/5345

[GitHub] [arrow-datafusion] ygf11 commented on pull request #5345: Refactor DecorrelateWhereExists and add back Distinct if needs

2023-03-11 Thread via GitHub
ygf11 commented on PR #5345: URL: https://github.com/apache/arrow-datafusion/pull/5345#issuecomment-1464870297 Thanks for your ideas, learned a lot! @mingmwang @alamb @jackwener And I fixed the merge conflict.

[GitHub] [arrow-rs] tustvold commented on pull request #3824: [ObjectStore] Add `append` API impl for `LocalFileSystem`

2023-03-11 Thread via GitHub
tustvold commented on PR #3824: URL: https://github.com/apache/arrow-rs/pull/3824#issuecomment-1464870290 Fine with me, was just a suggestion.

[GitHub] [arrow] tustvold commented on issue #34510: Reading FixedSizeList from parquet is slower than reading values into more rows

2023-03-11 Thread via GitHub
tustvold commented on issue #34510: URL: https://github.com/apache/arrow/issues/34510#issuecomment-1464864258 > why DELTA_BINARY_PACKED is deeply flawed The paper they link to actually explains why the approach is problematic - http://arxiv.org/pdf/1209.2137v5.pdf. The whole paper is

[GitHub] [arrow] leprechaunt33 commented on issue #33049: [C++][Python] Large strings cause ArrowInvalid: offset overflow while concatenating arrays

2023-03-11 Thread via GitHub
leprechaunt33 commented on issue #33049: URL: https://github.com/apache/arrow/issues/33049#issuecomment-1464857986 > > which only occurs when vaex is forced to do a df.take on rows which contain a string column whose unfiltered in memory representation is larger than 2GB. > > That so