Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
jorisvandenbossche commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1746244230 @IkeNefcy the file you uploaded (https://github.com/apache/arrow/issues/38000#issuecomment-1745759195) is created with pyarrow 12.0, and is the file that works fine, is th

Re: [I] Implement Value(Null) [arrow-datafusion]

2023-10-03 Thread via GitHub
qrilka commented on issue #130: URL: https://github.com/apache/arrow-datafusion/issues/130#issuecomment-1746239806 @alamb wasn't it implemented already? E.g. in datafusion-cli I see: ``` ❯ select char_length(null),a from test; ++---+ | character_length(NUL

Re: [PR] GH-37996: [MATLAB] Add a static constructor method named `fromMATLAB` to `arrow.array.StructArray` [arrow]

2023-10-03 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37998: URL: https://github.com/apache/arrow/pull/37998#issuecomment-1746227811 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 081e354e6fc9ded8f7d3d19d8a785cff31fa5cfb. There were no

Re: [I] Remove input_schema from AggregateExec [arrow-datafusion]

2023-10-03 Thread via GitHub
viirya closed issue #7728: Remove input_schema from AggregateExec URL: https://github.com/apache/arrow-datafusion/issues/7728 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] GH-23221: [C++] Add support for building with Emscripten [arrow]

2023-10-03 Thread via GitHub
joemarshall commented on PR #37821: URL: https://github.com/apache/arrow/pull/37821#issuecomment-1746161481 I've done those various review fixes, and fixed the debug build now (here at least). Should build fine with emsdk 3.1.45. You're right you need `emcmake cmake` by the way, docs

Re: [I] [R] Read CSV with comma as decimal mark [arrow]

2023-10-03 Thread via GitHub
thisisnic commented on issue #29184: URL: https://github.com/apache/arrow/issues/29184#issuecomment-1746150275 @paleolimbot Your instructions here are 🔥

Re: [PR] GH-37002: [C++][Parquet] Add api to get RecordReader from RowGroupReader [arrow]

2023-10-03 Thread via GitHub
mapleFU commented on code in PR #37003: URL: https://github.com/apache/arrow/pull/37003#discussion_r1345165118 ## cpp/src/parquet/file_reader.h: ## @@ -24,6 +24,7 @@ #include "arrow/io/caching.h" #include "arrow/util/type_fwd.h" +#include "parquet/column_reader.h" Review Co

Re: [PR] GH-38005: [Java] disable the debug log when running Java tests [arrow]

2023-10-03 Thread via GitHub
github-actions[bot] commented on PR #38006: URL: https://github.com/apache/arrow/pull/38006#issuecomment-1746097952 :warning: GitHub issue #38005 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-38005: [Java] disable the debug log when running Java tests [arrow]

2023-10-03 Thread via GitHub
davisusanibar opened a new pull request, #38006: URL: https://github.com/apache/arrow/pull/38006 ### Rationale for this change To disable the debug log when running Java tests ### What changes are included in this PR? Java Testing resource that configure SLF4J API and

Re: [PR] GH-37984: [Release] Use ISO 8601 format for YAML date value [arrow]

2023-10-03 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37985: URL: https://github.com/apache/arrow/pull/37985#issuecomment-1746088197 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 45588a77af3548720dc64fc8ae917e469e6f106a. There were no

Re: [I] js: when creating a Table from Vectors, arrow always infers non-nullable Fields [arrow]

2023-10-03 Thread via GitHub
domoritz commented on issue #37983: URL: https://github.com/apache/arrow/issues/37983#issuecomment-1746081583 Hmm, the vector is nullable so it seems like we should pass that to the table constructor. I don't think it's good to always make vectors nullable if we can avoid it.

Re: [PR] Add operator section to user guide [arrow-datafusion]

2023-10-03 Thread via GitHub
ongchi commented on PR #7732: URL: https://github.com/apache/arrow-datafusion/pull/7732#issuecomment-1746056637 Folder links do not work in the document. Change all file links to GitHub, which would be easier to browse in this commit 046e020a1b7eab4c84aee6da0716efc92694e16d

Re: [PR] Add operator section to user guide [arrow-datafusion]

2023-10-03 Thread via GitHub
ongchi commented on PR #7732: URL: https://github.com/apache/arrow-datafusion/pull/7732#issuecomment-1746050317 Methods added to `Expr` in this PR: - Methods already listed in documentation but not implemented: - bitwise_and - bitwise_or - bitwise_xor - bit

Re: [PR] Add operator section to user guide [arrow-datafusion]

2023-10-03 Thread via GitHub
ongchi commented on PR #7732: URL: https://github.com/apache/arrow-datafusion/pull/7732#issuecomment-1746036365 Several functions added in this PR: - Functions already listed in documentation but not implemented: - not - eq - not_eq - gt - gt_eq

Re: [PR] GH-37735: [C++][FreeBSD] Suppress a shorten-64-to-32 warning [arrow]

2023-10-03 Thread via GitHub
github-actions[bot] commented on PR #38004: URL: https://github.com/apache/arrow/pull/38004#issuecomment-1746028795 :warning: GitHub issue #37735 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-37735: [C++][FreeBSD] Suppress a shorten-64-to-32 warning [arrow]

2023-10-03 Thread via GitHub
kou opened a new pull request, #38004: URL: https://github.com/apache/arrow/pull/38004 ### Rationale for this change It's caused by the `backtrace()` and `backtrace_symbols_fd()` signatures being different on Linux and FreeBSD (`int` vs `size_t`). Linux: ```c extern int bac

Re: [PR] GH-37767: [C++][CMake] Don't touch .git/index [arrow]

2023-10-03 Thread via GitHub
github-actions[bot] commented on PR #38003: URL: https://github.com/apache/arrow/pull/38003#issuecomment-1746001397 :warning: GitHub issue #37767 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-37767: [C++][CMake] Don't touch .git/index [arrow]

2023-10-03 Thread via GitHub
kou opened a new pull request, #38003: URL: https://github.com/apache/arrow/pull/38003 ### Rationale for this change We run "git describe --tag --dirty" implicitly in cpp/cmake_modules/DefineOptions.cmake. If we use "--dirty", .git/index's owner may be changed. Because "git describ

Re: [PR] GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED [arrow]

2023-10-03 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37940: URL: https://github.com/apache/arrow/pull/37940#issuecomment-1745972658 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 5514b223ebc1345c7bbd501fe3cc57dcac668e86. There were no

Re: [PR] GH-33618: [C++] add option needs_extended_file_info and implement localFS [arrow]

2023-10-03 Thread via GitHub
amoeba commented on PR #34170: URL: https://github.com/apache/arrow/pull/34170#issuecomment-1745967429 I'm making good progress finishing off the remainder of this over on [my fork](https://github.com/amoeba/arrow/tree/ARROW-33618/FileSelector) and should have this ready for review within o

Re: [I] [C++] examples for self-defined compute function [arrow]

2023-10-03 Thread via GitHub
kou commented on issue #37924: URL: https://github.com/apache/arrow/issues/37924#issuecomment-1745962195 Sorry. We don't have a document that describes how to implement compute functions... If you're interested in Gandiva not compute functions https://arrow.apache.org/docs/cpp/compute.h

Re: [PR] GH-37994: [R] Create wrapper functions for the CSV*Options classes [arrow]

2023-10-03 Thread via GitHub
thisisnic commented on PR #37995: URL: https://github.com/apache/arrow/pull/37995#issuecomment-1745958662 We should rebase this after #38001 merges and incorporate changes from that PR too

Re: [PR] MINOR: [Python][Docs] Fix two typos in data.rst [arrow]

2023-10-03 Thread via GitHub
kou merged PR #37997: URL: https://github.com/apache/arrow/pull/37997

Re: [PR] GH-37978: [C++] Add support for specifying custom Array element delimiter to `arrow::PrettyPrintOptions` [arrow]

2023-10-03 Thread via GitHub
kou commented on code in PR #37981: URL: https://github.com/apache/arrow/pull/37981#discussion_r1345016394 ## cpp/src/arrow/pretty_print.h: ## @@ -77,6 +77,9 @@ struct PrettyPrintOptions { /// If true, display schema metadata when pretty-printing a Schema bool show_schem

Re: [PR] GH-23221: [Python] python changes for pyodide build [arrow]

2023-10-03 Thread via GitHub
kou commented on code in PR #37822: URL: https://github.com/apache/arrow/pull/37822#discussion_r1344887227 ## python/setup.py: ## @@ -133,8 +143,68 @@ def run(self): 'bundle the Arrow C++ headers')] + _build_ext.user_options) +de

Re: [PR] GH-23221: [Python] python changes for pyodide build [arrow]

2023-10-03 Thread via GitHub
kou commented on code in PR #37822: URL: https://github.com/apache/arrow/pull/37822#discussion_r1344881620 ## python/CMakeLists.txt: ## @@ -68,6 +68,25 @@ if(POLICY CMP0095) cmake_policy(SET CMP0095 NEW) endif() +# this option is used to auto-set defaults for pyarrow build

Re: [PR] GH-37429: [C++] Add arrow::ipc::StreamDecoder::Reset() [arrow]

2023-10-03 Thread via GitHub
kou commented on code in PR #37970: URL: https://github.com/apache/arrow/pull/37970#discussion_r1344878886 ## cpp/src/arrow/ipc/reader.h: ## @@ -425,6 +425,37 @@ class ARROW_EXPORT StreamDecoder { /// \return Status Status Consume(std::shared_ptr buffer); + /// \brief R

Re: [PR] GH-29184: [R] Read CSV with comma as decimal mark [arrow]

2023-10-03 Thread via GitHub
thisisnic commented on PR #38002: URL: https://github.com/apache/arrow/pull/38002#issuecomment-1745891755 Also closes #38001

Re: [PR] GH-29184: [R] Read CSV with comma as decimal mark [arrow]

2023-10-03 Thread via GitHub
github-actions[bot] commented on PR #38002: URL: https://github.com/apache/arrow/pull/38002#issuecomment-1745890443 :warning: GitHub issue #29184 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-29184: [R] Read CSV with comma as decimal mark [arrow]

2023-10-03 Thread via GitHub
thisisnic opened a new pull request, #38002: URL: https://github.com/apache/arrow/pull/38002 ### Rationale for this change Allow customisable decimal points when reading data ### What changes are included in this PR? Expose the C++ option in R ### Are these changes
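
To make "comma as decimal mark" concrete, here is a minimal standard-library sketch. It is not the Arrow C++ option or its R wrapper from this PR; the function name `read_decimal_comma_csv` and its `delimiter` / `decimal_point` parameters are hypothetical and exist only for illustration.

```python
import csv
import io


def read_decimal_comma_csv(text, delimiter=";", decimal_point=","):
    """Parse CSV text whose numbers use `decimal_point` instead of '.'.

    Hypothetical helper for illustration only; not part of Arrow.
    Returns a list of dicts keyed by the header row.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = []
    for record in reader:
        parsed = {}
        for name, raw in zip(header, record):
            try:
                # Normalise the decimal mark, then try a numeric conversion.
                parsed[name] = float(raw.replace(decimal_point, "."))
            except ValueError:
                # Not a number: keep the original string.
                parsed[name] = raw
        rows.append(parsed)
    return rows


sample = "name;price\napple;1,5\npear;2,25\n"
rows = read_decimal_comma_csv(sample)
```

Files written this way usually use `;` as the field delimiter, because `,` is already taken by the decimal mark; that is why the sketch defaults to `;`.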

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
IkeNefcy commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745880514 nvm I was able to print something similar, but using the kwarg above for version it did not change the format used. ``` created_by: parquet-cpp-arrow version 13.0.0

Re: [PR] GH-37002: [C++][Parquet] Add api to get RecordReader from RowGroupReader [arrow]

2023-10-03 Thread via GitHub
fatemehp commented on code in PR #37003: URL: https://github.com/apache/arrow/pull/37003#discussion_r1344868650 ## cpp/src/parquet/file_reader.h: ## @@ -58,6 +59,11 @@ class PARQUET_EXPORT RowGroupReader { // column. Ownership is shared with the RowGroupReader. std::shared

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
IkeNefcy commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745877518 Sorry for delay. I was able to test ``` data.to_parquet(index=False, version='2.4') ``` With no error. @mapleFU can I ask how you were able to view that metadata includi

Re: [I] python: Postgres.execute commits even with a rollback [arrow-adbc]

2023-10-03 Thread via GitHub
lidavidm commented on issue #1158: URL: https://github.com/apache/arrow-adbc/issues/1158#issuecomment-1745837582 Ooh, that's bad.

[I] Add settings description in `SHOW ALL` output [arrow-datafusion]

2023-10-03 Thread via GitHub
comphead opened a new issue, #7736: URL: https://github.com/apache/arrow-datafusion/issues/7736 ### Is your feature request related to a problem or challenge? I found its required lots of time to get both settings description, current value and name. The code contains descripti

Re: [PR] Expand SHOW ALL stmt to show settings description [arrow-datafusion]

2023-10-03 Thread via GitHub
comphead commented on code in PR #7735: URL: https://github.com/apache/arrow-datafusion/pull/7735#discussion_r1344822902 ## datafusion/sqllogictest/test_files/information_schema.slt: ## @@ -136,91 +136,91 @@ statement ok SET datafusion.execution.parquet.created_by=datafusion

[PR] Expand SHOW ALL stmt to show settings description [arrow-datafusion]

2023-10-03 Thread via GitHub
comphead opened a new pull request, #7735: URL: https://github.com/apache/arrow-datafusion/pull/7735 ## Which issue does this PR close? Closes #. ## Rationale for this change Expand SHOW ALL stmt to show settings description. Currently it requires going through the c

Re: [PR] GH-37635: [Format][C++][Go] Add app_metadata to FlightInfo and FlightEndpoint [arrow]

2023-10-03 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37679: URL: https://github.com/apache/arrow/pull/37679#issuecomment-1745814464 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 92de9a3028fdfb76e0ca1e216845e09bf68ab4ac. There were no

Re: [PR] GH-37861: [C#] Fix StringArray.GetString returning null instead of empty [arrow]

2023-10-03 Thread via GitHub
spanglerco commented on code in PR #37862: URL: https://github.com/apache/arrow/pull/37862#discussion_r1344797320 ## csharp/test/Apache.Arrow.Tests/StringArrayTests.cs: ## @@ -0,0 +1,50 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor li

Re: [PR] GH-37002: [C++][Parquet] Add api to get RecordReader from RowGroupReader [arrow]

2023-10-03 Thread via GitHub
fatemehp commented on code in PR #37003: URL: https://github.com/apache/arrow/pull/37003#discussion_r1344786897 ## cpp/src/parquet/file_reader.h: ## @@ -24,6 +24,7 @@ #include "arrow/io/caching.h" #include "arrow/util/type_fwd.h" +#include "parquet/column_reader.h" Review C

Re: [PR] GH-32439: [Python] Fix off by one bug when chunking nested structs [arrow]

2023-10-03 Thread via GitHub
mikelui commented on PR #37376: URL: https://github.com/apache/arrow/pull/37376#issuecomment-1745779283 @AlenkaF @wjones127 There are some backlog PRs is there any chance of getting this reviewed?

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
IkeNefcy commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745774963 I'm not seeing any options for versions here other than kwargs https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_parquet.html. But I will upgrade back to 13.0.0 and t

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
IkeNefcy commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745772284 And this works no issues

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
IkeNefcy commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745771975 I will point out that by chance we found a fix last week when we didn't think this was a real issue, we thought it was related to the data we were uploading at the time. We woul

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
mapleFU commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745770404 ``` { "Version": "2.6", "CreatedBy": "parquet-cpp-arrow version 12.0.0", "TotalRows": "1", "NumberOfRowGroups": "1", "NumberOfRealColumns": "4", "Numbe

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
IkeNefcy commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745759195 This is how I am writing this sample ``` import pandas as pd import boto3 def main(): s3 = boto3.resource('s3') schedule = [] row = {

[I] Deployment on AWS [arrow-ballista]

2023-10-03 Thread via GitHub
ehenry2 opened a new issue, #886: URL: https://github.com/apache/arrow-ballista/issues/886 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I would like to deploy Ballista on AWS using ECS and AWS Batch. Although many organizations

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
IkeNefcy commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745751806 I'll grab a sample in a moment yeah. For error message there is no error, from upload to query there is no push back from any systems, it allows the parquet to be read during the co

Re: [I] conda-forge missing Windows Packages [arrow-adbc]

2023-10-03 Thread via GitHub
lidavidm commented on issue #1149: URL: https://github.com/apache/arrow-adbc/issues/1149#issuecomment-1745749449 Ah, so we need to trick CMake into using gcc for the go build and cl for everything else...

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
mapleFU commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745747689 Would you mind uploading a sample Parquet file or code, and reporting the error message here 🤔 By the way, I strongly suspect that it's because https://github.com/apache/arrow/pull/

Re: [I] pd.to_parquet() Writes Timestamps In a method unreadable by Redshift Spectrum [arrow]

2023-10-03 Thread via GitHub
IkeNefcy commented on issue #38000: URL: https://github.com/apache/arrow/issues/38000#issuecomment-1745742229 Adding this is using Python 3.10.9, library is 13.0.0, this does not happen with any version from 10.0.1 to 12.0.1

Re: [PR] preserve array type / timezone in `date_bin` and `date_trunc` functions [arrow-datafusion]

2023-10-03 Thread via GitHub
alamb commented on code in PR #7729: URL: https://github.com/apache/arrow-datafusion/pull/7729#discussion_r1344732638 ## datafusion/physical-expr/src/datetime_expressions.rs: ## @@ -1051,6 +1077,91 @@ mod tests { }); } +#[test] +fn test_date_trunc_timezon

Re: [PR] chore(ci): bump Go for Windows to 1.20.8 [arrow-adbc]

2023-10-03 Thread via GitHub
lidavidm commented on PR #1157: URL: https://github.com/apache/arrow-adbc/pull/1157#issuecomment-1745712545 Hmm, supposedly it would resolve `__imp___iob_func` not being found but I still see it...

Re: [PR] chore(ci): bump Go for Windows to 1.20.8 [arrow-adbc]

2023-10-03 Thread via GitHub
zeroshade commented on PR #1157: URL: https://github.com/apache/arrow-adbc/pull/1157#issuecomment-1745711253 Was there something specific in go 1.20 that fixed a windows cgo issue?

Re: [PR] preserve array type in date_bin and date_trunc functions [arrow-datafusion]

2023-10-03 Thread via GitHub
alamb commented on PR #7729: URL: https://github.com/apache/arrow-datafusion/pull/7729#issuecomment-1745709891 I am sorry @mhilton -- I ran out of time today to review this PR. I plan to do so tomorrow

Re: [PR] MINOR: change file to column index in page_filter trace log [arrow-datafusion]

2023-10-03 Thread via GitHub
alamb merged PR #7730: URL: https://github.com/apache/arrow-datafusion/pull/7730

Re: [PR] GH-37721: [C++] Dependency: bump aws-sdk to 1.11.68 and avoid memory leak [arrow]

2023-10-03 Thread via GitHub
kou commented on PR #37736: URL: https://github.com/apache/arrow/pull/37736#issuecomment-1745687756 No problem. :-) Thanks for working on this.

Re: [PR] GH-37996: [MATLAB] Add a static constructor method named `fromMATLAB` to `arrow.array.StructArray` [arrow]

2023-10-03 Thread via GitHub
kevingurney merged PR #37998: URL: https://github.com/apache/arrow/pull/37998

Re: [PR] GH-37996: [MATLAB] Add a static constructor method named `fromMATLAB` to `arrow.array.StructArray` [arrow]

2023-10-03 Thread via GitHub
kevingurney commented on PR #37998: URL: https://github.com/apache/arrow/pull/37998#issuecomment-1745684734 +1

Re: [PR] GH-37984: [Release] Use ISO 8601 format for YAML date value [arrow]

2023-10-03 Thread via GitHub
kou merged PR #37985: URL: https://github.com/apache/arrow/pull/37985

Re: [PR] GH-37984: [Release] Use ISO 8601 format for YAML date value [arrow]

2023-10-03 Thread via GitHub
kou commented on PR #37985: URL: https://github.com/apache/arrow/pull/37985#issuecomment-1745679715 Thanks for double checking! I'll merge this.

Re: [I] python/snowflake: cursor not able to parse timestamp columns correctly [arrow-adbc]

2023-10-03 Thread via GitHub
lidavidm commented on issue #1154: URL: https://github.com/apache/arrow-adbc/issues/1154#issuecomment-1745673662 Thanks!

Re: [PR] fix(go/adbc/driver/snowflake): proper timezone for timestamp_ltz [arrow-adbc]

2023-10-03 Thread via GitHub
zeroshade merged PR #1155: URL: https://github.com/apache/arrow-adbc/pull/1155

Re: [I] [CI][Dev][Archery] ARM self-hosted runners fail to set up archery due to missing Python.h when installing ruamel.yaml [arrow]

2023-10-03 Thread via GitHub
assignUser commented on issue #37999: URL: https://github.com/apache/arrow/issues/37999#issuecomment-1745652051 I think it's missing libpython3.12-dev?

Re: [PR] GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED [arrow]

2023-10-03 Thread via GitHub
mapleFU commented on PR #37940: URL: https://github.com/apache/arrow/pull/37940#issuecomment-1745630985 > There are 2 DELTA_BINARY_PACKED streams in DELTA_BYTE_ARRAY, so if the deltas are small and varying +/-, this could still be a big benefit. In fact, I discovered this problem while impl
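
The effect discussed in this thread (small deltas that vary +/-) can be sketched in a few lines. This is an illustrative model, not the Arrow C++ encoder: the function names are hypothetical, a single block is assumed, and the "unsigned" variant is only my reading of the PR title, namely that picking the frame of reference (the minimum delta) with an unsigned comparison makes a negative delta look huge.

```python
def packed_bit_width_signed(values):
    """Bits per value when the frame of reference is the signed minimum delta."""
    if len(values) < 2:
        return 0
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas)  # signed comparison
    # Each stored value is (delta - min_delta), which is always >= 0.
    return max(d - min_delta for d in deltas).bit_length()


def packed_bit_width_unsigned(values, bits=64):
    """Same computation, but deltas are compared as unsigned `bits`-wide ints.

    Hypothetical model of the pre-fix behaviour; an assumption, not taken
    from the Arrow source.
    """
    if len(values) < 2:
        return 0
    mask = (1 << bits) - 1
    deltas = [(b - a) & mask for a, b in zip(values, values[1:])]
    min_delta = min(deltas)  # a negative delta wraps to a very large value
    return max((d - min_delta) & mask for d in deltas).bit_length()


values = [100, 101, 99, 102, 100, 101]  # deltas: 1, -2, 3, -2, 1
signed_width = packed_bit_width_signed(values)
unsigned_width = packed_bit_width_unsigned(values)
```

With the signed minimum (-2) the adjusted deltas are 3, 0, 5, 0, 3 and fit in 3 bits each; under the unsigned model the same block needs the full 64 bits, which is the kind of lost space saving the thread refers to.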

Re: [PR] chore(go/adbc): update go.mod dependencies [arrow-adbc]

2023-10-03 Thread via GitHub
lidavidm merged PR #1156: URL: https://github.com/apache/arrow-adbc/pull/1156

Re: [PR] chore(go/adbc): update go.mod dependencies [arrow-adbc]

2023-10-03 Thread via GitHub
lidavidm commented on PR #1156: URL: https://github.com/apache/arrow-adbc/pull/1156#issuecomment-1745622080 Filed https://github.com/apache/arrow-adbc/pull/1157 to bump the Go version in the other pipeline, too

[PR] chore: bump Go for Windows to 1.20.8 [arrow-adbc]

2023-10-03 Thread via GitHub
lidavidm opened a new pull request, #1157: URL: https://github.com/apache/arrow-adbc/pull/1157 Needed to avoid linker issues with cgo

Re: [PR] GH-37002: [C++][Parquet] Add api to get RecordReader from RowGroupReader [arrow]

2023-10-03 Thread via GitHub
fatemehp commented on PR #37003: URL: https://github.com/apache/arrow/pull/37003#issuecomment-1745617673 I am sorry I got distracted for a while. I will make the changes requested and update here. @mapleFU are you suggesting that it is not necessary to remove the internal namespace?

Re: [PR] GH-37993: [CI] Fix conda-integration build [arrow]

2023-10-03 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37990: URL: https://github.com/apache/arrow/pull/37990#issuecomment-1745602216 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit f6afc33a9e1b4cbd336982bf99b330ccacbf8566. There were no

Re: [I] [CI][Dev][Archery] ARM self-hosted runners fail to set up archery due to missing Python.h when installing ruamel.yaml [arrow]

2023-10-03 Thread via GitHub
raulcd commented on issue #37999: URL: https://github.com/apache/arrow/issues/37999#issuecomment-1745600120 @assignUser @amaldonadomat can you investigate this issue with our self-hosted runners? I'll be traveling for the rest of the week and won't be able to follow up

Re: [PR] GH-37996: [MATLAB] Add a static constructor method named `fromMATLAB` to `arrow.array.StructArray` [arrow]

2023-10-03 Thread via GitHub
kevingurney commented on code in PR #37998: URL: https://github.com/apache/arrow/pull/37998#discussion_r1344627052 ## matlab/test/arrow/array/tStructArray.m: ## @@ -273,5 +273,91 @@ function IsEqualFalse(tc) tc.verifyFalse(isequal(array1, array3)); end +

Re: [PR] Remove redundant is_numeric for DataType [arrow-datafusion]

2023-10-03 Thread via GitHub
qrilka commented on PR #7734: URL: https://github.com/apache/arrow-datafusion/pull/7734#issuecomment-1745592310 I didn't find any other methods from `DataTypes` which could replace functions from the `type_coercion` module

[PR] Remove redundant is_numeric for DataType [arrow-datafusion]

2023-10-03 Thread via GitHub
qrilka opened a new pull request, #7734: URL: https://github.com/apache/arrow-datafusion/pull/7734 The method `is_numeric` is available in `arrow-rs` since version 3.0.0 ## Which issue does this PR close? Closes #1613 . ## Rationale for this change Remo

Re: [I] [Packaging][Release] Reduce disk requirements for linux packaging jobs [arrow]

2023-10-03 Thread via GitHub
raulcd commented on issue #35964: URL: https://github.com/apache/arrow/issues/35964#issuecomment-1745584476 There is not much update yet, as a temporary workaround we do clean some unused binaries and cached packages on the GitHub runner by using the following script: https://github.com/apa

[PR] fix(go/adbc/driver/snowflake): proper timezone for timestamp_ltz [arrow-adbc]

2023-10-03 Thread via GitHub
zeroshade opened a new pull request, #1155: URL: https://github.com/apache/arrow-adbc/pull/1155 Fixes #1154

[PR] chore(go/adbc): update go.mod dependencies [arrow-adbc]

2023-10-03 Thread via GitHub
zeroshade opened a new pull request, #1156: URL: https://github.com/apache/arrow-adbc/pull/1156 (no comment)

Re: [PR] GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED [arrow]

2023-10-03 Thread via GitHub
etseidl commented on PR #37940: URL: https://github.com/apache/arrow/pull/37940#issuecomment-1745578469 Thanks again @pitrou @mapleFU @rok for shepherding this PR through. It wound up much better than when I started :)

[PR] GH-37996: [MATLAB] Add a static constructor method named `fromMATLAB` to `arrow.array.StructArray` [arrow]

2023-10-03 Thread via GitHub
sgilmore10 opened a new pull request, #37998: URL: https://github.com/apache/arrow/pull/37998 ### Rationale for this change Right now, the only way to construct an `arrow.array.StructArray` is to call its static method `fromArrays` method. Doing so requires users to first con

Re: [PR] GH-37917: [Parquet] Add OpenAsync for FileSource [arrow]

2023-10-03 Thread via GitHub
eeroel commented on PR #37918: URL: https://github.com/apache/arrow/pull/37918#issuecomment-1745576851 > @eeroel could you please rebase to pick up the fix ( https://github.com/apache/arrow/pull/37867/files#diff-1bba462ab050e89360fd88110a689e85ee037749cea091a1848ab574381d3795R155 ) for seve

Re: [PR] GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED [arrow]

2023-10-03 Thread via GitHub
pitrou commented on PR #37940: URL: https://github.com/apache/arrow/pull/37940#issuecomment-1745573758 Thanks a lot for this @etseidl . It was embarrassing not to get any space-saving benefits from the encoding...

Re: [PR] GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED [arrow]

2023-10-03 Thread via GitHub
pitrou merged PR #37940: URL: https://github.com/apache/arrow/pull/37940

Re: [I] conda-forge missing Windows Packages [arrow-adbc]

2023-10-03 Thread via GitHub
lidavidm commented on issue #1149: URL: https://github.com/apache/arrow-adbc/issues/1149#issuecomment-1745558324 Let's see how this goes: https://github.com/conda-forge/arrow-adbc-split-feedstock/pull/17

Re: [PR] fix: avro_to_arrow: Handle avro nested nullable struct (union) [arrow-datafusion]

2023-10-03 Thread via GitHub
sarutak commented on PR #7663: URL: https://github.com/apache/arrow-datafusion/pull/7663#issuecomment-1745557946 @alamb Could you trigger GA workflows?

Re: [I] [FlightRPC] FlightSQL: Expose custom query metadata [arrow]

2023-10-03 Thread via GitHub
lidavidm commented on issue #37635: URL: https://github.com/apache/arrow/issues/37635#issuecomment-1745539127 they're generated during the build

Re: [PR] GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED [arrow]

2023-10-03 Thread via GitHub
etseidl commented on PR #37940: URL: https://github.com/apache/arrow/pull/37940#issuecomment-1745530511 > > The compression ratio in the DELTA_BYTE_ARRAY benchmarks is now much better as well. > > Yeah, but I guess you mean `DELTA_BINARY_PACKED`, and not `DELTA_BYTE_ARRAY`...

Re: [I] [FlightRPC] FlightSQL: Expose custom query metadata [arrow]

2023-10-03 Thread via GitHub
aiguofer commented on issue #37635: URL: https://github.com/apache/arrow/issues/37635#issuecomment-1745530012 I looked around, but I'm not sure how to generate the grpc stubs for java. Are there any docs on that?

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-10-03 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1344563962 ## cpp/src/arrow/array/builder_nested.h: ## @@ -40,37 +40,46 @@ namespace arrow { /// @{ // --

[PR] MINOR: [docs] fix two typos in data.rst [arrow]

2023-10-03 Thread via GitHub
Erik-McKelvey opened a new pull request, #37997: URL: https://github.com/apache/arrow/pull/37997 ### What changes are included in this PR? Fixed two minor typos in data.rst

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-10-03 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1344560663 ## cpp/src/arrow/array/array_nested.h: ## @@ -216,6 +231,170 @@ class ARROW_EXPORT LargeListArray : public BaseListArray { void SetData(const std::shared_ptr& dat

Re: [I] [Packaging][Release] Reduce disk requirements for linux packaging jobs [arrow]

2023-10-03 Thread via GitHub
lriggs commented on issue #35964: URL: https://github.com/apache/arrow/issues/35964#issuecomment-1745525445 Are there any updates on work towards a long term solution? I'm running into this problem trying to build arrow jars using the default ubuntu github runner. The problem started recent

Re: [PR] MINOR: change file to column index in page_filter trace log [arrow-datafusion]

2023-10-03 Thread via GitHub
mapleFU commented on PR #7730: URL: https://github.com/apache/arrow-datafusion/pull/7730#issuecomment-1745525744 I've rebased and updated the description now.

Re: [PR] GH-37939: [C++] Use signed arithmetic for frame of reference when encoding DELTA_BINARY_PACKED [arrow]

2023-10-03 Thread via GitHub
mapleFU commented on PR #37940: URL: https://github.com/apache/arrow/pull/37940#issuecomment-1745523217 > The compression ratio in the DELTA_BYTE_ARRAY benchmarks is now much better as well. Yeah, but I guess you mean `DELTA_BINARY_PACKED`, and not `DELTA_BYTE_ARRAY`...

Re: [PR] Optimize "ORDER BY + LIMIT" queries for speed / memory with special TopK operator [arrow-datafusion]

2023-10-03 Thread via GitHub
Dandandan commented on code in PR #7721: URL: https://github.com/apache/arrow-datafusion/pull/7721#discussion_r1344548482 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -0,0 +1,647 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [PR] GH-37917: [Parquet] Add OpenAsync for FileSource [arrow]

2023-10-03 Thread via GitHub
bkietz commented on PR #37918: URL: https://github.com/apache/arrow/pull/37918#issuecomment-1745507640 @eeroel could you please rebase to pick up the fix ( https://github.com/apache/arrow/pull/37867/files#diff-1bba462ab050e89360fd88110a689e85ee037749cea091a1848ab574381d3795R155 ) for severa

Re: [PR] GH-37876: [Format] Add list-view specification to arrow format [arrow]

2023-10-03 Thread via GitHub
felipecrv commented on code in PR #37877: URL: https://github.com/apache/arrow/pull/37877#discussion_r1344545981 ## docs/source/format/Columnar.rst: ## @@ -618,8 +726,8 @@ for the null struct but they are "hidden" by the struct array's validity bitmap. However, when treated in

Re: [PR] Optimize "ORDER BY + LIMIT" queries for speed / memory with special TopK operator [arrow-datafusion]

2023-10-03 Thread via GitHub
Dandandan commented on code in PR #7721: URL: https://github.com/apache/arrow-datafusion/pull/7721#discussion_r1344546762 ## datafusion/sqllogictest/test_files/aal.slt: ## @@ -0,0 +1,232 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: Aha I w

Re: [PR] GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API [arrow]

2023-10-03 Thread via GitHub
github-actions[bot] commented on PR #34616: URL: https://github.com/apache/arrow/pull/34616#issuecomment-1745498694 Revision: d5ba855ea4804c5baa6bd8a4a0852b8e634a5ed4 Submitted crossbow builds: [ursacomputing/crossbow @ actions-0500a0c90f](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API [arrow]

2023-10-03 Thread via GitHub
anjakefala commented on PR #34616: URL: https://github.com/apache/arrow/pull/34616#issuecomment-1745494358 @github-actions crossbow submit test-conda-python-3.10-pandas-latest

Re: [PR] Support Parsing Avro File Headers [arrow-rs]

2023-10-03 Thread via GitHub
tustvold commented on code in PR #4888: URL: https://github.com/apache/arrow-rs/pull/4888#discussion_r1344537602 ## arrow-avro/src/reader/mod.rs: ## @@ -0,0 +1,92 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See t
