Re: [PR] feat: impl the basic `string_agg` function [arrow-datafusion]

2023-11-14 Thread via GitHub
2010YOUY01 commented on PR #8148: URL: https://github.com/apache/arrow-datafusion/pull/8148#issuecomment-1811962384 Thank you! The implementation looks good to me, one minor suggestion is to add some empty string cases to sqllogictest like `select string_agg('', '|'), string_agg('a', '');`
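The empty-string cases suggested above can be reasoned about with a small model of the aggregate. The sketch below is plain Python, not DataFusion code. It assumes PostgreSQL-style semantics (NULL inputs are skipped, and an input with no non-NULL values yields NULL); whether the PR matches these semantics exactly is not confirmed here, and the helper name `string_agg` is only illustrative.

```python
from typing import Iterable, Optional


def string_agg(values: Iterable[Optional[str]], delimiter: str) -> Optional[str]:
    """Toy model of a string_agg aggregate (assumed PostgreSQL-style semantics)."""
    # NULL (None) inputs are skipped rather than rendered.
    present = [v for v in values if v is not None]
    # Assumption: with no non-NULL input, the aggregate returns NULL (None).
    if not present:
        return None
    return delimiter.join(present)


# Single-row cases mirroring the suggested sqllogictest query.
print(repr(string_agg([""], "|")))
print(repr(string_agg(["a"], "")))
# A multi-row case where an empty string sits between two values.
print(repr(string_agg(["a", "", "b"], "|")))
```

Under these assumptions an empty string is still a value, so it contributes a delimiter, while an empty delimiter simply concatenates the inputs.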

Re: [PR] Implement StreamTable and StreamTableProvider (#7994) [arrow-datafusion]

2023-11-14 Thread via GitHub
metesynnada commented on PR #8021: URL: https://github.com/apache/arrow-datafusion/pull/8021#issuecomment-1811955691 The previous locks ensure that the FIFO is read and written in a streaming manner, simulating infinite data streams. This guarantees that each batch produced contains a spec

Re: [PR] MINOR: [Docs] Make cards on index page of docs responsive [arrow]

2023-11-14 Thread via GitHub
github-actions[bot] commented on PR #38693: URL: https://github.com/apache/arrow/pull/38693#issuecomment-1811950398 Revision: 8b4ab43b3c1d52cbca7d03b29863f93f31dbe3ec Submitted crossbow builds: [ursacomputing/crossbow @ actions-50427bf095](https://github.com/ursacomputing/crossbow/bra

Re: [PR] Minor: add `apply_filter` to Precision [arrow-datafusion]

2023-11-14 Thread via GitHub
berkaysynnada commented on code in PR #8177: URL: https://github.com/apache/arrow-datafusion/pull/8177#discussion_r1393772792 ## datafusion/common/src/stats.rs: ## @@ -151,6 +151,13 @@ impl Precision { (_, _) => Precision::Absent, } } + +/// Return

Re: [PR] MINOR: [Docs] Make cards on index page of docs responsive [arrow]

2023-11-14 Thread via GitHub
AlenkaF commented on PR #38693: URL: https://github.com/apache/arrow/pull/38693#issuecomment-1811947973 @github-actions crossbow submit preview-docs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [I] [CI][Docs] Docs preview jobs failing [arrow]

2023-11-14 Thread via GitHub
AlenkaF commented on issue #38711: URL: https://github.com/apache/arrow/issues/38711#issuecomment-1811947371 Thank you @llama90 for the help!

Re: [PR] feat(csharp): separate arrowType and providerType [arrow-adbc]

2023-11-14 Thread via GitHub
ruowan commented on PR #1183: URL: https://github.com/apache/arrow-adbc/pull/1183#issuecomment-1811940161 Closed this stale PR.

Re: [PR] feat(csharp): separate arrowType and providerType [arrow-adbc]

2023-11-14 Thread via GitHub
ruowan closed pull request #1183: feat(csharp): separate arrowType and providerType URL: https://github.com/apache/arrow-adbc/pull/1183

Re: [PR] GH-37857: [Python][Dataset] Expose file size to python dataset [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on PR #37868: URL: https://github.com/apache/arrow/pull/37868#issuecomment-1811939882 Would you mind fixing the lint here?

Re: [I] [Parquet][Python] Variable download speed from threads when reading S3 [arrow]

2023-11-14 Thread via GitHub
eeroel commented on issue #38664: URL: https://github.com/apache/arrow/issues/38664#issuecomment-1811938124 > I mean that maybe some flags in https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html helps, but I didn't test them. S3FS in arrow fs just wraps the S3 SDK, ma

Re: [PR] Minor: simplify DataSource statistics code [arrow-datafusion]

2023-11-14 Thread via GitHub
berkaysynnada commented on code in PR #8172: URL: https://github.com/apache/arrow-datafusion/pull/8172#discussion_r1393733385 ## datafusion/core/src/datasource/statistics.rs: ## @@ -211,49 +199,3 @@ pub(crate) fn get_col_stats( }) .collect() } - -/// If the gi

Re: [PR] GH-38711: [CI] Rollback aws-cli for preview documentation [arrow]

2023-11-14 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38723: URL: https://github.com/apache/arrow/pull/38723#issuecomment-1811921674 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit e49d8ae15583ceff03237571569099a6ad62be32. There were no

Re: [I] [Parquet][Python] Variable download speed from threads when reading S3 [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on issue #38664: URL: https://github.com/apache/arrow/issues/38664#issuecomment-1811916635 I mean that maybe some flags in https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html helps, but I didn't test them. S3FS in arrow fs just wraps the S3 SDK, may

Re: [I] [Parquet][Python] Variable download speed from threads when reading S3 [arrow]

2023-11-14 Thread via GitHub
eeroel commented on issue #38664: URL: https://github.com/apache/arrow/issues/38664#issuecomment-1811904585 > [#38664 (comment)](https://github.com/apache/arrow/issues/38664#issuecomment-1806837877) > > Some AWS EC env is able to control using `export ...`, I think maybe try that wou

Re: [PR] GH-38697: [C++][Gandiva] Use arrow io util to replace std::filesystem::path in gandiva [arrow]

2023-11-14 Thread via GitHub
niyue commented on PR #38698: URL: https://github.com/apache/arrow/pull/38698#issuecomment-1811887213 > Perhaps we should use PlatformFilename::FromString and DCHECK the result instead? Thanks for the suggestion. I took the approach and it did work. And Windows build should be okay n

Re: [PR] Support no distinct aggregate sum/min/max in `single_distinct_to_group_by` rule [arrow-datafusion]

2023-11-14 Thread via GitHub
haohuaijin closed pull request #8124: Support no distinct aggregate sum/min/max in `single_distinct_to_group_by` rule URL: https://github.com/apache/arrow-datafusion/pull/8124

Re: [PR] Support no distinct aggregate sum/min/max in `single_distinct_to_group_by` rule [arrow-datafusion]

2023-11-14 Thread via GitHub
haohuaijin commented on PR #8124: URL: https://github.com/apache/arrow-datafusion/pull/8124#issuecomment-1811845757 Due to #8176, I am closing this PR; I will reorganize the code and submit a new PR.

Re: [PR] Preserve all of the valid orderings during merging. [arrow-datafusion]

2023-11-14 Thread via GitHub
mustafasrepo commented on code in PR #8169: URL: https://github.com/apache/arrow-datafusion/pull/8169#discussion_r1393660010 ## datafusion/physical-expr/src/equivalence.rs: ## @@ -575,10 +588,13 @@ impl OrderingEquivalenceClass { } } -/// Gets the first order

Re: [PR] GH-38715: [R] Fix possible bashism in configure script [arrow]

2023-11-14 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38716: URL: https://github.com/apache/arrow/pull/38716#issuecomment-1811829147 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit a886fdaa2d80a2e7f56cf8a3cf94b367443b6e8e. There were no

Re: [PR] Revert "Minor: remove unnecessary projection in `single_distinct_to_g… [arrow-datafusion]

2023-11-14 Thread via GitHub
haohuaijin commented on PR #8176: URL: https://github.com/apache/arrow-datafusion/pull/8176#issuecomment-1811825131 Thanks for this fix @NGA-TRAN @alamb . I apologize for introducing this bug. The issue arises from mishandling aliases in the group by clause. Consequently, any queries like

Re: [PR] GH-38711: [CI] Rollback aws-cli for preview documentation [arrow]

2023-11-14 Thread via GitHub
kou merged PR #38723: URL: https://github.com/apache/arrow/pull/38723

Re: [PR] fix(glib): Vala's vapi's name should be same as pkg-config package [arrow-adbc]

2023-11-14 Thread via GitHub
kou merged PR #1298: URL: https://github.com/apache/arrow-adbc/pull/1298

Re: [PR] GH-37841: [Java] Dictionary decoding not using the compression factory from the ArrowReader [arrow]

2023-11-14 Thread via GitHub
vibhatha commented on PR #38371: URL: https://github.com/apache/arrow/pull/38371#issuecomment-1811818551 @danepitkin Thanks a lot for the review comments, I will address them.

Re: [PR] fix: Timestamp with timezone not considered `join on` [arrow-datafusion]

2023-11-14 Thread via GitHub
ACking-you commented on code in PR #8150: URL: https://github.com/apache/arrow-datafusion/pull/8150#discussion_r1393641816 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -2468,6 +2489,14 @@ SELECT * FROM test_timestamps_table as t1 JOIN (SELECT * FROM test_timestamps_ta

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1811799713 Nice, let matt decide that!

Re: [PR] Add Library Guide for User Defined Functions: Window/Aggregate [arrow-datafusion]

2023-11-14 Thread via GitHub
andygrove commented on PR #8171: URL: https://github.com/apache/arrow-datafusion/pull/8171#issuecomment-1811795377 Thanks for contributing more documentation @Veeupup

[PR] GH-38503: [Go][Parquet] Make the arrow column writer internal [arrow]

2023-11-14 Thread via GitHub
tschaub opened a new pull request, #38727: URL: https://github.com/apache/arrow/pull/38727 This makes it so the Arrow column writer is not exported from the `pqarrow` package. This follows up on comments from #38581.

Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

2023-11-14 Thread via GitHub
kylebarron commented on PR #5070: URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1811784975 I added some tests. Do you want to test against old versions of pyarrow as well? We're effectively no longer testing `_export_to_c` and `_import_from_c` because the rust code will alw

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-14 Thread via GitHub
tschaub commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1811776138 Feel free to cherry pick if this is what you had in mind: https://github.com/apache/arrow/compare/main...tschaub:arrow:inernal-arrow-column-writer

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1811773465 @zeroshade I found that some tests in other modules, such as encoding, use ArrowColumnWriter. Should I also remove them, or how should I fix them?

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811764922 > Only the metadata describes that the column might have RLE (it doesn't have it). If the data is nullable, it will contain def levels, and def levels are usually encoded with RLE :-
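To make the point about definition levels concrete, here is a deliberately simplified Python sketch. It is not the Parquet writer: the real format uses an RLE/bit-packing hybrid, and the function names below are hypothetical. It only illustrates why a nullable flat column carries a stream of small integers (1 for present, 0 for null) that compresses well as runs, independently of how the float values themselves are encoded.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def definition_levels(values: List[Optional[float]]) -> List[int]:
    """For a flat nullable column: level 1 = value present, level 0 = null."""
    return [0 if v is None else 1 for v in values]


def run_length_encode(levels: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal levels into (level, run_length) pairs."""
    return [(level, len(list(run))) for level, run in groupby(levels)]


column = [1.5, 2.5, None, None, None, 4.0]
levels = definition_levels(column)
print(levels)                     # [1, 1, 0, 0, 0, 1]
print(run_length_encode(levels))  # [(1, 2), (0, 3), (1, 1)]
```

In this toy model the float values are never touched by the run-length step; only the level stream is.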

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811764719 > Only the metadata describes that the column might have RLE (it doesn't have it).

Re: [PR] GH-36133: [C#] Support Set on the `BinaryArray` Builder [arrow]

2023-11-14 Thread via GitHub
danmoseley commented on code in PR #36134: URL: https://github.com/apache/arrow/pull/36134#discussion_r1393599083 ## csharp/src/Apache.Arrow/Arrays/BinaryArray.cs: ## @@ -258,8 +259,77 @@ public TBuilder Swap(int i, int j) public TBuilder Set(int index, byte value

Re: [I] [R] CRAN packaging checklist for 14.0.0 [arrow]

2023-11-14 Thread via GitHub
thisisnic commented on issue #38141: URL: https://github.com/apache/arrow/issues/38141#issuecomment-1811757535 Cherry-picked #38716 across

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-14 Thread via GitHub
tschaub commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1811754007 > @mapleFU Do you think it would make more sense to simply change it and no longer expose the `ArrowColumnWriter` directly, and direct users to using the `FileWriter` apis instead?

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
alippai commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811749735 Thanks @mapleFU!

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
alippai commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811749298 Ok, I see the data and pages look good. Only the metadata describes that the column _might_ have RLE (it doesn't have it). ``` // And for now, we always add RLE even if ther

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811743196 Aha, @alippai. Parquet-format has an encoding in the file; it will: 1. mark `RLE` if rep/def levels use RLE (for nullable types) 2. Since this is not using a dictionary, so dict

Re: [PR] GH-38697: [C++][Gandiva] Use arrow io util to replace std::filesystem::path in gandiva [arrow]

2023-11-14 Thread via GitHub
js8544 commented on PR #38698: URL: https://github.com/apache/arrow/pull/38698#issuecomment-1811738656 The AppVeyor check is still failing: https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/48524728. `std::wstring` can't be constructed directly without encoding. Perha

Re: [PR] GH-28994: [C++][JSON] Add support for customizing the max rows [arrow]

2023-11-14 Thread via GitHub
Ox0400 commented on PR #38582: URL: https://github.com/apache/arrow/pull/38582#issuecomment-1811734892 Hi @bkietz, some CI jobs failed; some of the failures look like they were caused by lost runners. `The hosted runner: GitHub Actions 389 lost communication with the server. Anything in your work

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
alippai commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811734609 parquet-mr looks good? ``` Row group 0: count: 1 76.00 B records start: 4 total(compressed): 76 B total(uncompressed):74 B --

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
alippai commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811730028 Making the field not nullable doesn't change the output

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
alippai commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811728162 A small example: ```python import pyarrow as pa import pyarrow.parquet as pq import duckdb print(pa.__version__) #13.0.0 print(duckdb.__version__) #0.9.2 t

Re: [PR] GH-38503: [Go][Parquet] Style improvement for using ArrowColumnWriter [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on PR #38581: URL: https://github.com/apache/arrow/pull/38581#issuecomment-1811727839 Hmmm, I see what you mean, but I think it may be better to separate or close this patch, since some other modules already use `NewArrowColumnWriter`. I don't know if it is OK to just set it

Re: [I] [C++] Ensure compatibility between std::span and arrow::util::span [arrow]

2023-11-14 Thread via GitHub
felipecrv commented on issue #36612: URL: https://github.com/apache/arrow/issues/36612#issuecomment-1811718936 @ShaiviAgarwal2 is this a ChatGPT-generated answer?

Re: [I] [C++] Ensure compatibility between std::span and arrow::util::span [arrow]

2023-11-14 Thread via GitHub
ShaiviAgarwal2 commented on issue #36612: URL: https://github.com/apache/arrow/issues/36612#issuecomment-1811714791 @pitrou To ensure compatibility between `std::span` and `arrow::util::span`, we need to modify the `arrow::util::span` class to meet the requirements of `std::span`. This

Re: [PR] GH-38699: [C++][FS][Azure] Implement `CreateDir()` [arrow]

2023-11-14 Thread via GitHub
felipecrv commented on code in PR #38708: URL: https://github.com/apache/arrow/pull/38708#discussion_r1393558114 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -611,6 +611,110 @@ class AzureFileSystem::Impl { RETURN_NOT_OK(ptr->Init()); return ptr; } + + Status Crea

Re: [PR] GH-34865: [C++][Flight RPC] Add Session management messages [arrow]

2023-11-14 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1393559588 ## format/Flight.proto: ## @@ -503,3 +504,100 @@ message FlightData { message PutResult { bytes app_metadata = 1; } + +/* + * Request message for the "Close Ses

Re: [I] [Python] Implement `__arrow_c_stream__` on ChunkedArray [arrow]

2023-11-14 Thread via GitHub
paleolimbot commented on issue #38717: URL: https://github.com/apache/arrow/issues/38717#issuecomment-1811704310 Yes, I think this needs C++-level support (but I think it's worth it!). I will try to take a stab at an implementation before 15.0.0...I would like to use it in the R bindings as

Re: [PR] GH-38697: [C++][Gandiva] Use arrow io util to replace std::filesystem::path in gandiva [arrow]

2023-11-14 Thread via GitHub
niyue commented on PR #38698: URL: https://github.com/apache/arrow/pull/38698#issuecomment-1811701289 > It would be better to make the PR description self-contained Sure. Updated.

Re: [PR] GH-38715: [R] Fix possible bashism in configure script [arrow]

2023-11-14 Thread via GitHub
paleolimbot merged PR #38716: URL: https://github.com/apache/arrow/pull/38716

Re: [I] [C++] Reduce number of internal APIs that define default_memory_pool() as default argument value [arrow]

2023-11-14 Thread via GitHub
ShaiviAgarwal2 commented on issue #36360: URL: https://github.com/apache/arrow/issues/36360#issuecomment-1811694640 @felipecrv Could you please check whether I'm understanding it correctly or not!! To enhance the code, we need to modify the internal functions to not define `default_m

[PR] feat: refactor udf/udaf/udwf ReturnType [arrow-datafusion]

2023-11-14 Thread via GitHub
JasonLi-cn opened a new pull request, #8183: URL: https://github.com/apache/arrow-datafusion/pull/8183 ## Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/8182 ## Rationale for this change In some cases, I need to de

Re: [PR] GH-38699: [C++][FS][Azure] Implement `CreateDir()` [arrow]

2023-11-14 Thread via GitHub
felipecrv commented on code in PR #38708: URL: https://github.com/apache/arrow/pull/38708#discussion_r1393530485 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -611,6 +611,110 @@ class AzureFileSystem::Impl { RETURN_NOT_OK(ptr->Init()); return ptr; } + + Status Crea

Re: [I] UDF/UDAF/UDWF: refactor ReturnType [arrow-datafusion]

2023-11-14 Thread via GitHub
JasonLi-cn commented on issue #8182: URL: https://github.com/apache/arrow-datafusion/issues/8182#issuecomment-1811667774 https://github.com/apache/arrow-datafusion/discussions/7657

[I] UDF/UDAF/UDWF: refactor ReturnType [arrow-datafusion]

2023-11-14 Thread via GitHub
JasonLi-cn opened a new issue, #8182: URL: https://github.com/apache/arrow-datafusion/issues/8182 ### Define a UDF: ```rust my_udf(expr, 'return_type') ``` ### Using: case1: ```rust my_udf(col, 'UInt32') Return Type is: DataType::UInt32 ```

Re: [PR] GH-38711: [CI] Rollback aws-cli for preview documentation [arrow]

2023-11-14 Thread via GitHub
github-actions[bot] commented on PR #38723: URL: https://github.com/apache/arrow/pull/38723#issuecomment-1811659842 Revision: 82f7be610db73c7bb8522697e528c16bc0f03304 Submitted crossbow builds: [ursacomputing/crossbow @ actions-a74317da1d](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-38711: [CI] Rollback aws-cli for preview documentation [arrow]

2023-11-14 Thread via GitHub
kou commented on PR #38723: URL: https://github.com/apache/arrow/pull/38723#issuecomment-1811658871 Sure! (I think that you can also submit a job by `@github-actions crossbow submit preview-docs` because your commits exist in apache/arrow.)

Re: [PR] GH-38711: [CI] Rollback aws-cli for preview documentation [arrow]

2023-11-14 Thread via GitHub
kou commented on PR #38723: URL: https://github.com/apache/arrow/pull/38723#issuecomment-1811658098 @github-actions crossbow submit preview-docs

[PR] test: Added more tests for verifying GetObjects (Depths and Patterns) [arrow-adbc]

2023-11-14 Thread via GitHub
ryan-syed opened a new pull request, #1299: URL: https://github.com/apache/arrow-adbc/pull/1299 Added tests to verify GetObjects for different depths like: Catalogs DbSchemas Tables All Also added tests to verify GetObjects with catalog, schema, and table names passed as patt

Re: [PR] Replace macro with function for `array_position` and `array_positions` [arrow-datafusion]

2023-11-14 Thread via GitHub
jayzhan211 commented on code in PR #8170: URL: https://github.com/apache/arrow-datafusion/pull/8170#discussion_r1393513521 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -953,114 +1025,68 @@ fn general_list_repeat( )?)) } -macro_rules! position { -($ARRA

Re: [PR] Replace macro with function for `array_position` and `array_positions` [arrow-datafusion]

2023-11-14 Thread via GitHub
Veeupup commented on code in PR #8170: URL: https://github.com/apache/arrow-datafusion/pull/8170#discussion_r1393510396 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -953,114 +1025,68 @@ fn general_list_repeat( )?)) } -macro_rules! position { -($ARRAY:e

Re: [I] [Parquet][Python] Variable download speed from threads when reading S3 [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on issue #38664: URL: https://github.com/apache/arrow/issues/38664#issuecomment-1811644845 https://github.com/apache/arrow/issues/38664#issuecomment-1806837877 Some AWS EC env is able to control using `export ...`, I think maybe try that would helps

[I] Support empty array for `array_union`, `array_intersect`, and `array_except` [arrow-datafusion]

2023-11-14 Thread via GitHub
jayzhan211 opened a new issue, #8181: URL: https://github.com/apache/arrow-datafusion/issues/8181 ### Is your feature request related to a problem or challenge? We can't handle empty arrays for these three array functions # Result ```text query ? select array_union([],

Re: [I] [C++] Parquet RecordReader Skip performance is bad [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on issue #38707: URL: https://github.com/apache/arrow/issues/38707#issuecomment-1811638762 Would you mind trying to set these records "cached" like `valid_bits_` and `Resize` without shrinking to fit? I think this might help in some cases.

Re: [I] Is there a way to turn off RLE encoding for float columns in parquet (using `write_table()`)? [arrow]

2023-11-14 Thread via GitHub
mapleFU commented on issue #38722: URL: https://github.com/apache/arrow/issues/38722#issuecomment-1811634713 Can you help with how to reproduce this? And how is RLE represented? `RLE` might occur when: 1. Page has null, so we have RLE for rep/def levels 2. `RLE_DICTIONARY` for dict

Re: [PR] Implement func `array_pop_front` [arrow-datafusion]

2023-11-14 Thread via GitHub
Veeupup commented on code in PR #8142: URL: https://github.com/apache/arrow-datafusion/pull/8142#discussion_r1393495905 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -563,13 +563,42 @@ pub fn array_slice(args: &[ArrayRef]) -> Result { define_array_slice(list_

Re: [PR] GH-38711: [CI] Rollback aws-cli for preview documentation [arrow]

2023-11-14 Thread via GitHub
llama90 commented on PR #38723: URL: https://github.com/apache/arrow/pull/38723#issuecomment-1811593809 @kou Would it be possible for you to assist in testing this part? I understood `aws-sam` to be related to SAM (Serverless Application Model) and therefore did not restore it.

Re: [PR] GH-38711: [CI] Rollback aws-cli for preview documentation [arrow]

2023-11-14 Thread via GitHub
github-actions[bot] commented on PR #38723: URL: https://github.com/apache/arrow/pull/38723#issuecomment-1811590673 :warning: GitHub issue #38711 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-38711: [CI] Rollback aws-cli for preview documentation [arrow]

2023-11-14 Thread via GitHub
llama90 opened a new pull request, #38723: URL: https://github.com/apache/arrow/pull/38723 ### Rationale for this change Restored the Runner image to not delete `aws-cli` for the execution of the `preview-docs` command. ### Are these changes tested?

Re: [PR] Draft: Extend `Expr::ScalarFunction` to support `Expr` for `ScalarUDF` [arrow-datafusion]

2023-11-14 Thread via GitHub
2010YOUY01 commented on code in PR #8180: URL: https://github.com/apache/arrow-datafusion/pull/8180#discussion_r1393450192 ## datafusion/expr/src/expr_schema.rs: ## @@ -89,25 +89,39 @@ impl ExprSchemable for Expr { .collect::>>()?; Ok((fun.r

[PR] Draft: Extend `Expr::ScalarFunction` to support `Expr` for `ScalarUDF` [arrow-datafusion]

2023-11-14 Thread via GitHub
2010YOUY01 opened a new pull request, #8180: URL: https://github.com/apache/arrow-datafusion/pull/8180 ## Which issue does this PR close? POC for https://github.com/apache/arrow-datafusion/issues/8157 ## Rationale for this change Based on discussion in #8157 ,

Re: [I] [CI][Docs] Docs preview jobs failing [arrow]

2023-11-14 Thread via GitHub
kou commented on issue #38711: URL: https://github.com/apache/arrow/issues/38711#issuecomment-1811563655 Oh, sorry. I should not have removed `aws-cli` related files in GH-38233. Could you remove `aws-cli` related `rm -rf` from `ci/scripts/util_free_space.sh`?

Re: [PR] Update red-arrow.gemspec to add xsimd msys2 dependency [arrow]

2023-11-14 Thread via GitHub
kou commented on PR #38720: URL: https://github.com/apache/arrow/pull/38720#issuecomment-1811558810 Why do we need this? Could you show a full error log you got?

Re: [PR] GH-36760: [Go] Add Avro OCF reader [arrow]

2023-11-14 Thread via GitHub
loicalleyne commented on PR #37115: URL: https://github.com/apache/arrow/pull/37115#issuecomment-1811544686 Now the builds are failing because of the bug in SchemaEqual(). Can I remove that test until we figure out what's wrong with that function? The test that compares the schema string re

[PR] MINOR: [C++][Docs] Add \deprecated tag to deprecated BufferReader constructors [arrow]

2023-11-14 Thread via GitHub
amoeba opened a new pull request, #38721: URL: https://github.com/apache/arrow/pull/38721 ### Rationale for this change The PR [GH-37360](https://github.com/apache/arrow/pull/37360) for issue [GH-37212](https://github.com/apache/arrow/issues/37212) deprecated three BufferReader const

Re: [PR] GH-37739: [Java] Add experimental arrow-memory-ffm module [arrow]

2023-11-14 Thread via GitHub
danepitkin commented on code in PR #38016: URL: https://github.com/apache/arrow/pull/38016#discussion_r1393425015 ## java/memory/memory-core/src/test/java/org/apache/arrow/memory/JavaForeignAllocationManager.java: ## @@ -0,0 +1,77 @@ +/* Review Comment: This file was added a

Re: [PR] Preserve all of the valid orderings during merging. [arrow-datafusion]

2023-11-14 Thread via GitHub
alamb commented on code in PR #8169: URL: https://github.com/apache/arrow-datafusion/pull/8169#discussion_r1393420890 ## datafusion/physical-expr/src/equivalence.rs: ## @@ -575,10 +588,13 @@ impl OrderingEquivalenceClass { } } -/// Gets the first ordering ent

Re: [PR] Preserve all of the valid orderings during merging. [arrow-datafusion]

2023-11-14 Thread via GitHub
alamb commented on PR #8169: URL: https://github.com/apache/arrow-datafusion/pull/8169#issuecomment-1811540241 Thank you @mustafasrepo -- this is very neat

Re: [PR] GH-37739: [Java] Add experimental arrow-memory-ffm module [arrow]

2023-11-14 Thread via GitHub
danepitkin commented on code in PR #38016: URL: https://github.com/apache/arrow/pull/38016#discussion_r1393425015 ## java/memory/memory-core/src/test/java/org/apache/arrow/memory/JavaForeignAllocationManager.java: ## @@ -0,0 +1,77 @@ +/* Review Comment: This was added as a h

Re: [PR] GH-37739: [Java] Add experimental arrow-memory-ffm module [arrow]

2023-11-14 Thread via GitHub
danepitkin commented on PR #38016: URL: https://github.com/apache/arrow/pull/38016#issuecomment-1811537255 I'm going to put this on hold for now. The next steps are to refactor out the usage of `UNSAFE` since it's not accessible in Java 16+. This is a huge task, but would allow us to actual

Re: [I] Improve GADBC.Statement.bind/bind_stream for Vala API [arrow-adbc]

2023-11-14 Thread via GitHub
esodan commented on issue #1280: URL: https://github.com/apache/arrow-adbc/issues/1280#issuecomment-1811531519 As with the other languages, which each have their own directory, I can add a folder for a pure GObject/Vala bindings library, with interfaces and classes that make it easy to use ADBC with GO

Re: [PR] GH-38599: [Docs] Update Headers [arrow]

2023-11-14 Thread via GitHub
llama90 commented on code in PR #38696: URL: https://github.com/apache/arrow/pull/38696#discussion_r1393417799 ## docs/source/format/CDeviceDataInterface.rst: ## @@ -627,38 +626,38 @@ streaming source of Arrow arrays. It has the following fields: handled by the producer, a

Re: [PR] feat: impl the basic `string_agg` function [arrow-datafusion]

2023-11-14 Thread via GitHub
alamb commented on PR #8148: URL: https://github.com/apache/arrow-datafusion/pull/8148#issuecomment-1811529857 cc @universalmind303 -- do you have time to help review this PR?

Re: [PR] Replace macro with function for `array_position` and `array_positions` [arrow-datafusion]

2023-11-14 Thread via GitHub
alamb commented on code in PR #8170: URL: https://github.com/apache/arrow-datafusion/pull/8170#discussion_r1393410774 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -131,6 +131,78 @@ macro_rules! array { }}; } +/// Computes a BooleanArray indicating equality

Re: [PR] GH-38599: [Docs] Update Headers [arrow]

2023-11-14 Thread via GitHub
llama90 commented on code in PR #38696: URL: https://github.com/apache/arrow/pull/38696#discussion_r1393416177 ## docs/source/format/CDeviceDataInterface.rst: ## @@ -627,38 +626,38 @@ streaming source of Arrow arrays. It has the following fields: handled by the producer, a

[PR] fix(glib): Vala's vapí's name should be same as pkg-config package [arrow-adbc]

2023-11-14 Thread via GitHub
esodan opened a new pull request, #1298: URL: https://github.com/apache/arrow-adbc/pull/1298 In sync with arrow-glib, GADBC should name its Vala VAPI file adbc-glib, matching the name of the pkg-config file adbc-glib.pc

Re: [PR] Support no distinct aggregate sum/min/max in `single_distinct_to_group_by` rule [arrow-datafusion]

2023-11-14 Thread via GitHub
alamb commented on PR #8124: URL: https://github.com/apache/arrow-datafusion/pull/8124#issuecomment-1811520131 I am sorry -- I did not have a chance to get to this one. Note that this PR seems to have non-trivial conflicts now (perhaps due to https://github.com/apache/arrow-datafusion/pul

Re: [PR] test: show stats in explain of two representative queries [arrow-datafusion]

2023-11-14 Thread via GitHub
alamb commented on code in PR #8173: URL: https://github.com/apache/arrow-datafusion/pull/8173#discussion_r1393407460 ## datafusion/sqllogictest/test_files/explain.slt: ## @@ -279,13 +279,97 @@ physical_plan GlobalLimitExec: skip=0, fetch=10, statistics=[Rows=Inexact(10), Bytes

Re: [PR] Introduce `array_except` function [arrow-datafusion]

2023-11-14 Thread via GitHub
alamb commented on PR #8135: URL: https://github.com/apache/arrow-datafusion/pull/8135#issuecomment-1811515561 There appear to be a non-trivial number of conflicts in this PR now

Re: [PR] GH-38718: [Go][Format][Integration] Add StringView/BinaryView to Go implementation [arrow]

2023-11-14 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #35769: URL: https://github.com/apache/arrow/pull/35769#issuecomment-1811509118 After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 26149d9fab0360e6d4d9a295f934100470c4bc37. There were no

Re: [PR] GH-37199: [C++] Expose a span converter for Buffer and ArraySpan [arrow]

2023-11-14 Thread via GitHub
felipecrv commented on code in PR #38027: URL: https://github.com/apache/arrow/pull/38027#discussion_r1393396653 ## cpp/src/arrow/array/data.h: ## @@ -434,6 +434,21 @@ struct ARROW_EXPORT ArraySpan { return GetValues(i, this->offset); } + // Access a buffer's data as

Re: [I] adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory [arrow-adbc]

2023-11-14 Thread via GitHub
zeroshade commented on issue #1283: URL: https://github.com/apache/arrow-adbc/issues/1283#issuecomment-1811500175 I'll try to use the updated repo scripts and see if I can reproduce it myself and figure out what I'm missing

Re: [I] adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory [arrow-adbc]

2023-11-14 Thread via GitHub
bascheibler commented on issue #1283: URL: https://github.com/apache/arrow-adbc/issues/1283#issuecomment-1811488775 I believe this branch hasn't totally fixed the issue yet. It is still returning the `panic: close of nil channel` error. My repo has been fixed and now it's able to reproduc
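
For readers unfamiliar with the `panic: close of nil channel` error mentioned above: in Go, calling `close` on a channel variable that was never initialized panics at runtime. The following is a minimal sketch of that behavior and of one possible guard. It is not the Snowflake driver's code, and `safeClose` and `closePanics` are hypothetical helpers introduced only for illustration.

```go
package main

import "fmt"

// safeClose (hypothetical helper) closes ch only when it is non-nil
// and reports whether it actually closed the channel.
func safeClose(ch chan struct{}) bool {
	if ch == nil {
		return false
	}
	close(ch)
	return true
}

// closePanics (hypothetical helper) calls close(ch) directly and
// reports whether that call panicked, along with the panic message.
func closePanics(ch chan struct{}) (panicked bool, msg string) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
			msg = fmt.Sprint(r)
		}
	}()
	close(ch)
	return false, ""
}

func main() {
	var unset chan struct{} // declared but never made, so it is nil

	panicked, msg := closePanics(unset)
	fmt.Println("direct close panicked:", panicked, msg)

	fmt.Println("guarded close on nil channel closed it:", safeClose(unset))
	fmt.Println("guarded close on live channel closed it:", safeClose(make(chan struct{})))
}
```

A nil check only covers the uninitialized case. Closing the same channel twice also panics, so code with several shutdown paths would typically need something like `sync.Once` as well.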

Re: [PR] MINOR: [Python] Fix name of new keyword in the concat_tables future warning [arrow]

2023-11-14 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38710: URL: https://github.com/apache/arrow/pull/38710#issuecomment-1811488068 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit cd0d7f53b3ab7dfac7a3477751a87586d4da3782. There were no

Re: [PR] GH-36760: [Go] Add Avro OCF reader [arrow]

2023-11-14 Thread via GitHub
loicalleyne commented on PR #37115: URL: https://github.com/apache/arrow/pull/37115#issuecomment-1811450437 The static check was failing because the error variable name did not conform to the `ErrFoo` convention. Renamed `NullStructData` to `ErrNullStructData`.

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Nov 13, 2023 [arrow-datafusion]

2023-11-14 Thread via GitHub
alamb commented on issue #8151: URL: https://github.com/apache/arrow-datafusion/issues/8151#issuecomment-1811447614 Wrote up https://github.com/apache/arrow-datafusion/discussions/8152# as part of community growth
