Re: [PR] GH-38893: [R] Fix printf syntax in altrep.cpp [arrow]

2023-11-27 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38894: URL: https://github.com/apache/arrow/pull/38894#issuecomment-1829270150 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 9cece9dc3e63956a8cbfc125026a17eb3a7ae3dc. There were no

Re: [PR] MINOR: [Docs] Replace "have" with "indicate" in the "Struct validity" section of the docs [arrow]

2023-11-27 Thread via GitHub
kou merged PR #38895: URL: https://github.com/apache/arrow/pull/38895 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.or

Re: [PR] MINOR: [Docs] Replace "have" with "indicate" in the "Struct validity" section of the docs [arrow]

2023-11-27 Thread via GitHub
kou commented on PR #38895: URL: https://github.com/apache/arrow/pull/38895#issuecomment-1829260182 Ah, I see.

Re: [PR] GH-38883: [Docs] Replace "have" with "indicate" in the "Struct validity" section of the docs [arrow]

2023-11-27 Thread via GitHub
stfdxv commented on PR #38895: URL: https://github.com/apache/arrow/pull/38895#issuecomment-1829191830 I've made this PR for training purposes, as I'm new to this. It's useful but it's not connected with the issue GH-38883. Please see my second PR https://github.com/apache/arrow/pull/3889

Re: [PR] GH-36831: [C++] DictionaryArray support for MinMax Function [arrow]

2023-11-27 Thread via GitHub
R-JunmingChen commented on code in PR #37100: URL: https://github.com/apache/arrow/pull/37100#discussion_r1371683818 ## cpp/src/arrow/compute/kernels/aggregate_basic_internal.h: ## @@ -912,6 +912,122 @@ struct NullMinMaxImpl : public ScalarAggregator { } }; +template +str
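For context on GH-36831: a dictionary-encoded array stores each distinct value once, plus one index per row. A MinMax kernel therefore has to aggregate over the values that the indices actually reference, because a dictionary may contain entries that no row uses. Below is a minimal Rust sketch of that idea. It is not the C++ kernel under review; `DictColumn` and `dict_min_max` are hypothetical names, and nulls are modelled as `None` indices.

```rust
/// Hypothetical, simplified dictionary-encoded column: `values` holds the
/// distinct entries and `indices` points into it (None = null slot).
pub struct DictColumn {
    pub values: Vec<i64>,
    pub indices: Vec<Option<usize>>,
}

/// Returns (min, max) over the non-null slots, or None if every slot is null.
/// Only values referenced by `indices` are considered, so dictionary entries
/// that no row uses cannot leak into the result.
pub fn dict_min_max(col: &DictColumn) -> Option<(i64, i64)> {
    let mut result: Option<(i64, i64)> = None;
    // `flatten` skips the null (None) slots.
    for idx in col.indices.iter().flatten() {
        let v = col.values[*idx];
        result = Some(match result {
            None => (v, v),
            Some((lo, hi)) => (lo.min(v), hi.max(v)),
        });
    }
    result
}
```

In this sketch the unused entry `99` in a dictionary like `[10, -3, 99, 7]` is ignored unless some index points at it.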

Re: [PR] GH-37857: [Python][Dataset] Expose file size to python dataset [arrow]

2023-11-27 Thread via GitHub
eeroel commented on code in PR #37868: URL: https://github.com/apache/arrow/pull/37868#discussion_r1407276532 ## python/pyarrow/_dataset.pyx: ## @@ -96,27 +96,33 @@ def _get_parquet_symbol(name): return _dataset_pq and getattr(_dataset_pq, name) -cdef CFileSource _make_

Re: [PR] GH-37857: [Python][Dataset] Expose file size to python dataset [arrow]

2023-11-27 Thread via GitHub
eeroel commented on code in PR #37868: URL: https://github.com/apache/arrow/pull/37868#discussion_r1407274503 ## python/pyarrow/_dataset.pyx: ## @@ -96,27 +96,33 @@ def _get_parquet_symbol(name): return _dataset_pq and getattr(_dataset_pq, name) -cdef CFileSource _make_

Re: [PR] GH-37857: [Python][Dataset] Expose file size to python dataset [arrow]

2023-11-27 Thread via GitHub
eeroel commented on code in PR #37868: URL: https://github.com/apache/arrow/pull/37868#discussion_r1407275703 ## python/pyarrow/_dataset.pyx: ## @@ -96,27 +96,33 @@ def _get_parquet_symbol(name): return _dataset_pq and getattr(_dataset_pq, name) -cdef CFileSource _make_

Re: [PR] GH-38883: [Docs] Replace "have" with "indicate" in the "Struct validity" section of the docs [arrow]

2023-11-27 Thread via GitHub
kou commented on PR #38895: URL: https://github.com/apache/arrow/pull/38895#issuecomment-1829171375 Is it intentional that you change the text instead of example validities?

Re: [I] Change output ordering display of sources [arrow-datafusion]

2023-11-27 Thread via GitHub
mustafasrepo commented on issue #8297: URL: https://github.com/apache/arrow-datafusion/issues/8297#issuecomment-1829163971 Thanks @QuenKar!

Re: [PR] GH-38909: [Packaging] Drop support for Ubuntu 23.04 [arrow]

2023-11-27 Thread via GitHub
github-actions[bot] commented on PR #38910: URL: https://github.com/apache/arrow/pull/38910#issuecomment-1829142097 Revision: ad853ca73c5275db595cba5b07fc655190e55fb0 Submitted crossbow builds: [ursacomputing/crossbow @ actions-377bacf654](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-38738: [C++] Check variadic buffer counts in bounds [arrow]

2023-11-27 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38740: URL: https://github.com/apache/arrow/pull/38740#issuecomment-1829142549 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 84c15da1997559c37841dc16f9e2c70c643dd9d2. There were no

Re: [PR] GH-38909: [Packaging] Drop support for Ubuntu 23.04 [arrow]

2023-11-27 Thread via GitHub
github-actions[bot] commented on PR #38910: URL: https://github.com/apache/arrow/pull/38910#issuecomment-1829141277 ``` Invalid group(s) {'ubuntu'}. Must be one of {'conan', 'vcpkg', 'nightly-packaging', 'linux-arm64', 'linux-amd64', 'linux', 'fuzz', 'c-glib', 'homebrew', 'example-cpp',

Re: [PR] GH-38909: [Packaging] Drop support for Ubuntu 23.04 [arrow]

2023-11-27 Thread via GitHub
github-actions[bot] commented on PR #38910: URL: https://github.com/apache/arrow/pull/38910#issuecomment-1829140451 :warning: GitHub issue #38909 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH-38909: [Packaging] Drop support for Ubuntu 23.04 [arrow]

2023-11-27 Thread via GitHub
kou commented on PR #38910: URL: https://github.com/apache/arrow/pull/38910#issuecomment-1829140188 @github-actions crossbow submit -g ubuntu

[PR] GH-38909: [Packaging] Drop support for Ubuntu 23.04 [arrow]

2023-11-27 Thread via GitHub
kou opened a new pull request, #38910: URL: https://github.com/apache/arrow/pull/38910 ### Rationale for this change It will reach EOL in 2024-01 and our next major release will not happen this year. ### What changes are included in this PR? Remove Ubuntu 23.04 rel

Re: [PR] GH-38701: [C++][FS][Azure] Implement `DeleteDirContents()` [arrow]

2023-11-27 Thread via GitHub
kou commented on PR #3: URL: https://github.com/apache/arrow/pull/3#issuecomment-1829133709 Updated: * Use `PathNotFound()` * Use `for` * Add an argument name comment * Skip a test on macOS that has a problem with Azurite

Re: [PR] GH-38701: [C++][FS][Azure] Implement `DeleteDirContents()` [arrow]

2023-11-27 Thread via GitHub
kou commented on code in PR #3: URL: https://github.com/apache/arrow/pull/3#discussion_r1407195527 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -970,6 +970,78 @@ class AzureFileSystem::Impl { return stream; } + private: + Status DeleteDirContentsWihtoutHierar
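Background for the `DeleteDirContents()` review: without a hierarchical namespace, Azure Blob Storage is flat, and a "directory" is only a shared name prefix, so deleting its contents means deleting every blob whose name starts with `dir/`. Below is a minimal Rust sketch of that idea, using an in-memory sorted map as a stand-in for a container. It is neither the Azure SDK nor the code in azurefs.cc, and `delete_dir_contents` is a hypothetical name; real code would list blobs by prefix page by page and delete them remotely.

```rust
use std::collections::BTreeMap;

/// Deletes every blob whose name lies under `dir` in a flat namespace and
/// returns how many blobs were removed.
pub fn delete_dir_contents(container: &mut BTreeMap<String, Vec<u8>>, dir: &str) -> usize {
    // Normalise to a trailing '/' so that "dir" does not match "dir2/...".
    let prefix = format!("{}/", dir.trim_end_matches('/'));
    // Keys sharing a prefix are contiguous in sorted order, so scan forward
    // from the prefix and stop at the first key that no longer matches.
    let doomed: Vec<String> = container
        .range(prefix.clone()..)
        .take_while(|(name, _)| name.starts_with(&prefix))
        .map(|(name, _)| name.clone())
        .collect();
    for name in &doomed {
        container.remove(name);
    }
    doomed.len()
}
```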

Re: [PR] GH-38884: [C++] DatasetWriter release rows_in_flight_throttle when allocate writing failed [arrow]

2023-11-27 Thread via GitHub
mapleFU commented on code in PR #38885: URL: https://github.com/apache/arrow/pull/38885#discussion_r1407194544 ## cpp/src/arrow/dataset/dataset_writer.cc: ## @@ -621,11 +621,18 @@ class DatasetWriter::DatasetWriterImpl { backpressure = writer_state_.open_files_throttle.
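The fix in GH-38884 is about returning throttle capacity when a write cannot be set up. The pattern is sketched below in Rust rather than the C++ of dataset_writer.cc: a reservation guard hands its rows back when dropped, unless the success path commits it. `Throttle`, `Reservation` and `write_batch` are hypothetical names, and the boolean `allocation_ok` stands in for whatever allocation step can fail.

```rust
use std::cell::Cell;

/// Hypothetical row-count throttle: tracks rows currently in flight.
pub struct Throttle {
    in_flight: Cell<u64>,
}

impl Throttle {
    pub fn new() -> Self {
        Throttle { in_flight: Cell::new(0) }
    }

    pub fn in_flight(&self) -> u64 {
        self.in_flight.get()
    }

    /// Reserve `rows`; the guard returns them on drop unless committed.
    pub fn acquire(&self, rows: u64) -> Reservation<'_> {
        self.in_flight.set(self.in_flight.get() + rows);
        Reservation { throttle: self, rows, committed: false }
    }

    fn release(&self, rows: u64) {
        self.in_flight.set(self.in_flight.get() - rows);
    }
}

pub struct Reservation<'a> {
    throttle: &'a Throttle,
    rows: u64,
    committed: bool,
}

impl Reservation<'_> {
    /// Hand the reservation over to the write path; Drop will then not release it.
    pub fn commit(mut self) {
        self.committed = true;
    }
}

impl Drop for Reservation<'_> {
    fn drop(&mut self) {
        if !self.committed {
            self.throttle.release(self.rows);
        }
    }
}

/// Hypothetical write step. On failure the early return drops the guard,
/// which releases the rows. On success they stay reserved (a real writer
/// would release them when the write finishes; that part is omitted).
pub fn write_batch(throttle: &Throttle, rows: u64, allocation_ok: bool) -> Result<(), String> {
    let reservation = throttle.acquire(rows);
    if !allocation_ok {
        return Err("could not allocate a file writer".to_string());
    }
    reservation.commit();
    Ok(())
}
```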

Re: [PR] perf: Better bit packing-unpacking algorithms [arrow-nanoarrow]

2023-11-27 Thread via GitHub
WillAyd commented on code in PR #326: URL: https://github.com/apache/arrow-nanoarrow/pull/326#discussion_r1407063468 ## src/nanoarrow/buffer_inline.h: ## @@ -223,14 +224,11 @@ static inline int64_t _ArrowBytesForBits(int64_t bits) { } static inline void _ArrowBitsUnpackInt8(

Re: [PR] perf: Better bit packing-unpacking algorithms [arrow-nanoarrow]

2023-11-27 Thread via GitHub
codecov-commenter commented on PR #326: URL: https://github.com/apache/arrow-nanoarrow/pull/326#issuecomment-1829046319 ## [Codecov](https://app.codecov.io/gh/apache/arrow-nanoarrow/pull/326?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_t

[PR] perf: Better bit packing-unpacking algorithms [arrow-nanoarrow]

2023-11-27 Thread via GitHub
WillAyd opened a new pull request, #326: URL: https://github.com/apache/arrow-nanoarrow/pull/326 In https://github.com/apache/arrow-nanoarrow/pull/280 the algorithms seemed to make a huge difference in performance on my intel x86 chip, but other platforms didn't see as much. I [asked on SO
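For context on the nanoarrow bit-unpacking discussion: judging by its name, `_ArrowBitsUnpackInt8` expands one bitmap byte into eight values. Below is a hedged Rust sketch of two ways to do that: a simple per-bit loop and a branch-free multiply-and-mask (SWAR) variant. Neither is claimed to be the algorithm the PR adopts, and the function names are hypothetical. Both assume Arrow's least-significant-bit-first bit order.

```rust
/// Per-bit loop: output slot i receives bit i of `word`.
pub fn unpack_byte_naive(word: u8) -> [i8; 8] {
    let mut out = [0i8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((word >> i) & 1) as i8;
    }
    out
}

/// Branch-free variant operating on all eight lanes of a u64 at once.
pub fn unpack_byte_swar(word: u8) -> [i8; 8] {
    // Copy the byte into every lane (255 * 0x0101.. == u64::MAX, so no overflow).
    let spread = u64::from(word) * 0x0101_0101_0101_0101;
    // Lane i keeps only bit i, so each lane is either 0 or 1 << i.
    let isolated = spread & 0x8040_2010_0804_0201;
    // Adding 0x7F sets a lane's top bit exactly when the lane is non-zero;
    // the largest lane sum is 0x80 + 0x7F = 0xFF, so no carry crosses lanes.
    let flagged = isolated + 0x7F7F_7F7F_7F7F_7F7F;
    // Move each lane's top bit down to bit 0 and drop everything else.
    let bits = (flagged >> 7) & 0x0101_0101_0101_0101;
    // Little-endian bytes put lane 0 (bit 0) first.
    bits.to_le_bytes().map(|b| b as i8)
}
```

Whether the branch-free form wins depends on the compiler and CPU, which matches the platform-dependent results described in the PR.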

Re: [I] [R] Fix failing windows build on CI [arrow]

2023-11-27 Thread via GitHub
assignUser commented on issue #38906: URL: https://github.com/apache/arrow/issues/38906#issuecomment-1829037283 I think disabling until that PR is merged is fine?

Re: [I] [R] Include and LIB flags are missing on macOS [arrow]

2023-11-27 Thread via GitHub
assignUser commented on issue #38902: URL: https://github.com/apache/arrow/issues/38902#issuecomment-1829036364 I am pretty sure that I was able to reproduce the error on a gha mac runner by using the pkg-config binary from CRAN. ``` Package libcurl was not found in the pkg-config sea

Re: [PR] GH-38007: [C++][Python] Add VariableShapeTensor implementation [arrow]

2023-11-27 Thread via GitHub
rok commented on code in PR #38008: URL: https://github.com/apache/arrow/pull/38008#discussion_r1407051615 ## python/pyarrow/scalar.pxi: ## @@ -1027,6 +1027,53 @@ cdef class ExtensionScalar(Scalar): return pyarrow_wrap_scalar( sp_scalar) +cdef class VariableShapeTen

Re: [PR] GH-38824: [Go] Enable GC checks [arrow]

2023-11-27 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38826: URL: https://github.com/apache/arrow/pull/38826#issuecomment-1829029324 After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 1cd22df3fbf69840fa96a22596a79ccab4a666cc. There were no

Re: [PR] GH-38907: [C++] Stop installing internal bpacking_simd* headers [arrow]

2023-11-27 Thread via GitHub
github-actions[bot] commented on PR #38908: URL: https://github.com/apache/arrow/pull/38908#issuecomment-1829028979 :warning: GitHub issue #38907 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-38907: [C++] Stop installing internal bpacking_simd* headers [arrow]

2023-11-27 Thread via GitHub
kou opened a new pull request, #38908: URL: https://github.com/apache/arrow/pull/38908 ### Rationale for this change They are for internal use. We don't need to install them. ### What changes are included in this PR? Use `_internal.h` suffix to avoid installing them.

Re: [PR] GH-38701: [C++][FS][Azure] Implement `DeleteDirContents()` [arrow]

2023-11-27 Thread via GitHub
felipecrv commented on code in PR #3: URL: https://github.com/apache/arrow/pull/3#discussion_r1407044310 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -970,6 +970,78 @@ class AzureFileSystem::Impl { return stream; } + private: + Status DeleteDirContentsWihtout

Re: [PR] GH-38007: [C++][Python] Add VariableShapeTensor implementation [arrow]

2023-11-27 Thread via GitHub
rok commented on code in PR #38008: URL: https://github.com/apache/arrow/pull/38008#discussion_r1407043145 ## python/pyarrow/scalar.pxi: ## @@ -1027,6 +1027,53 @@ cdef class ExtensionScalar(Scalar): return pyarrow_wrap_scalar( sp_scalar) +cdef class VariableShapeTen

Re: [PR] GH-38874: [C++][Parquet] Minor: making parquet TypedComparator operation as const method [arrow]

2023-11-27 Thread via GitHub
mapleFU commented on PR #38875: URL: https://github.com/apache/arrow/pull/38875#issuecomment-1829007479 Will merge it tonight if no further comments

Re: [PR] feat:implement sql style 'find_in_set' string function [arrow-datafusion]

2023-11-27 Thread via GitHub
Syleechan commented on code in PR #8328: URL: https://github.com/apache/arrow-datafusion/pull/8328#discussion_r1406997686 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -1170,6 +1171,20 @@ substr_index(str, delim, count) - **delim**: the string to find in str to split
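The PR adds a SQL-style `find_in_set`. Assuming it follows MySQL's `FIND_IN_SET(str, strlist)` contract, the row-level logic can be sketched in Rust as below. Under that contract the function returns the 1-based position of `str` among the comma-separated items of `strlist`, 0 if `str` is absent or `strlist` is empty, and NULL if either argument is NULL. This is a scalar illustration, not DataFusion's array-based implementation; `find_in_set` here takes `Option<&str>` to model NULLs.

```rust
/// MySQL-style FIND_IN_SET: 1-based position of `needle` in the
/// comma-separated `list`, Some(0) when absent, None when either input is NULL.
pub fn find_in_set(needle: Option<&str>, list: Option<&str>) -> Option<i64> {
    let (needle, list) = (needle?, list?);
    // "".split(',') would yield one empty item; an empty list matches nothing.
    if list.is_empty() {
        return Some(0);
    }
    let pos = list.split(',').position(|item| item == needle);
    Some(pos.map_or(0, |p| p as i64 + 1))
}
```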

Re: [PR] GH-38865 [C++][Parquet] support passing a RowRange to RecordBatchReader [arrow]

2023-11-27 Thread via GitHub
binmahone commented on PR #38867: URL: https://github.com/apache/arrow/pull/38867#issuecomment-1828959046 hi @emkornfield , thanks for your thorough review. Looks like I should have provided some background to reviewers: 1. We're working on native parquet reading in Clickhouse, which

Re: [PR] GH-34865: [C++][Flight RPC] Add Session management messages [arrow]

2023-11-27 Thread via GitHub
indigophox commented on PR #34817: URL: https://github.com/apache/arrow/pull/34817#issuecomment-1828958425 > Thanks for the suggestions. I haven't tried this yet but can I reuse `ServerSessionMiddleware` for my case? Or should I implement a similar middleware from scratch? There are ar

Re: [PR] GH-38432: [C++][Parquet] Try to fix performance regression in the DictByteArrayDecoderImpl [arrow]

2023-11-27 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38784: URL: https://github.com/apache/arrow/pull/38784#issuecomment-1828943728 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 60708150f1e668ccf51302959921d3a9977d7118. There was 1 b

Re: [I] [C++] GCC complains about redundant move [arrow]

2023-11-27 Thread via GitHub
MrJia1997 commented on issue #38889: URL: https://github.com/apache/arrow/issues/38889#issuecomment-1828943662 Sorry, it turns out there's a problem on our build setup. Closed the issue. Thank you for your quick response!

Re: [PR] GH-38865 [C++][Parquet] support passing a RowRange to RecordBatchReader [arrow]

2023-11-27 Thread via GitHub
wgtmac commented on code in PR #38867: URL: https://github.com/apache/arrow/pull/38867#discussion_r1406978633 ## cpp/src/parquet/column_reader.h: ## @@ -302,8 +303,274 @@ class TypedColumnReader : public ColumnReader { int32_t* dict_len

Re: [I] [C++][Parquet] Using BMI to implement filter pushdown [arrow]

2023-11-27 Thread via GitHub
wgtmac commented on issue #37559: URL: https://github.com/apache/arrow/issues/37559#issuecomment-1828931646 > > Finally I have got some time to complete the design doc drafted by @mapleFU: https://docs.google.com/document/d/1SeVcYudu6uD9rb9zRAnlLGgdauutaNZlAaS0gVzjkgM/. > > This prop

Re: [I] [C++][Parquet] Using BMI to implement filter pushdown [arrow]

2023-11-27 Thread via GitHub
emkornfield commented on issue #37559: URL: https://github.com/apache/arrow/issues/37559#issuecomment-1828913912 > > Finally I have got some time to complete the design doc drafted by @mapleFU: https://docs.google.com/document/d/1SeVcYudu6uD9rb9zRAnlLGgdauutaNZlAaS0gVzjkgM/. > > This

Re: [I] [C++][Parquet] support passing a RowRange to RecordBatchReader [arrow]

2023-11-27 Thread via GitHub
emkornfield commented on issue #38865: URL: https://github.com/apache/arrow/issues/38865#issuecomment-1828912196 @wgtmac Yes, left some comments on the doc

[PR] Avoid concat for `array_replace` [arrow-datafusion]

2023-11-27 Thread via GitHub
jayzhan211 opened a new pull request, #8337: URL: https://github.com/apache/arrow-datafusion/pull/8337 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] [C++][Parquet] support passing a RowRange to RecordBatchReader [arrow]

2023-11-27 Thread via GitHub
wgtmac commented on issue #38865: URL: https://github.com/apache/arrow/issues/38865#issuecomment-1828902852 Oops. Sorry about that. I have changed the doc to allow comment. Could you confirm that? @emkornfield

Re: [PR] Upgrade Arrow Java project to JPMS Java Platform Module System [arrow]

2023-11-27 Thread via GitHub
jduo commented on PR #38876: URL: https://github.com/apache/arrow/pull/38876#issuecomment-1828899964 I'm looking for some ideas on how to fix up a maven issue. I've rigged the general build configuration in the parent POM to continue to use JDK 8 and exclude module-info.java files. I've als

Re: [PR] GH-38728: [Go] ipc: put lz4 decompression buffers back into sync.Pool [arrow]

2023-11-27 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38729: URL: https://github.com/apache/arrow/pull/38729#issuecomment-1828849001 After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit b0e1f748f5f96d3a7f79edf2b959a01af347dfda. There were no

Re: [PR] GH-38905: Spelling fixes [arrow]

2023-11-27 Thread via GitHub
kou commented on PR #38896: URL: https://github.com/apache/arrow/pull/38896#issuecomment-1828825286 This is too large to review. Could you split this into smaller PRs? For example, you can open one PR per subdirectory such as `.github/`, `c_glib/` and `ci/`. But one PR per subdirectory

Re: [PR] Introduce Boolean Coercion [arrow-datafusion]

2023-11-27 Thread via GitHub
jayzhan211 commented on code in PR #8331: URL: https://github.com/apache/arrow-datafusion/pull/8331#discussion_r1406899986 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -557,10 +557,16 @@ select column1[0:5], column2[0:3], column3[0:9] from arrays; ## make_array (alia

Re: [PR] Fix negative decimal string [arrow-rs]

2023-11-27 Thread via GitHub
viirya commented on code in PR #5128: URL: https://github.com/apache/arrow-rs/pull/5128#discussion_r1406872076 ## arrow-cast/src/cast.rs: ## @@ -2677,7 +2688,17 @@ where i256::ZERO }; -format!("{}", integers.add_wrapping(adjusted)) +let in

Re: [PR] Fix negative decimal string [arrow-rs]

2023-11-27 Thread via GitHub
viirya commented on code in PR #5128: URL: https://github.com/apache/arrow-rs/pull/5128#discussion_r1406870924 ## arrow-cast/src/cast.rs: ## @@ -8304,6 +8331,8 @@ mod tests { Some(""), Some(" "), None, +Some("-1.2349"), +

Re: [PR] Fix negative decimal string [arrow-rs]

2023-11-27 Thread via GitHub
viirya commented on code in PR #5128: URL: https://github.com/apache/arrow-rs/pull/5128#discussion_r1406870642 ## arrow-cast/src/cast.rs: ## @@ -2642,7 +2642,18 @@ where } let integers = parts[0].trim_start_matches('0'); -let decimals = if parts.len() == 2 { part
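The arrow-rs fix concerns negative decimal strings such as `"-1.2349"`. One way such a bug can appear is when a signed integer part is combined with an unsigned fractional part. At scale 3, that gives `-1 * 1000 + 234 = -766` instead of `-1234`, and `"-0.5"` loses its sign entirely because the integer part is `-0 = 0`. Below is a minimal Rust sketch that strips the sign first, accumulates the magnitude, and applies the sign last. `parse_decimal` is a hypothetical helper that truncates extra fractional digits rather than rounding (an assumption); it is not the code in cast.rs.

```rust
/// Parses a plain decimal string into an integer scaled by 10^scale.
/// Returns None on malformed input or i128 overflow.
pub fn parse_decimal(s: &str, scale: u32) -> Option<i128> {
    let s = s.trim();
    // Peel off the sign once, so every digit below is accumulated as a magnitude.
    let (negative, digits) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    let all_digits = int_part.chars().chain(frac_part.chars()).all(|c| c.is_ascii_digit());
    if (int_part.is_empty() && frac_part.is_empty()) || !all_digits {
        return None;
    }
    let mut magnitude: i128 = 0;
    for c in int_part.chars() {
        magnitude = magnitude.checked_mul(10)?.checked_add(i128::from(c.to_digit(10)?))?;
    }
    // Consume exactly `scale` fractional digits: pad with zeros, truncate the rest.
    let mut frac = frac_part.chars();
    for _ in 0..scale {
        let d = frac.next().map_or(Some(0), |c| c.to_digit(10))?;
        magnitude = magnitude.checked_mul(10)?.checked_add(i128::from(d))?;
    }
    // Apply the sign only after the full magnitude is known.
    Some(if negative { -magnitude } else { magnitude })
}
```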

Re: [I] Timestamp overflows for extreme low/high values [arrow-datafusion]

2023-11-27 Thread via GitHub
comphead commented on issue #8336: URL: https://github.com/apache/arrow-datafusion/issues/8336#issuecomment-1828765504 > > The overflow happens because Datafusion treats underlying i64 as nanoseconds which is obviously not big enough. > > I'm confused by this statement, DataFusion fo

Re: [PR] GH-38905: Spelling fixes [arrow]

2023-11-27 Thread via GitHub
github-actions[bot] commented on PR #38896: URL: https://github.com/apache/arrow/pull/38896#issuecomment-1828762302 :warning: GitHub issue #38905 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] Detect when filters on unique constraints make subqueries scalar [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on PR #8312: URL: https://github.com/apache/arrow-datafusion/pull/8312#issuecomment-1828758686 Thank you @Jesse-Bakker -- I plan to review this carefully tomorrow

Re: [PR] Double type argument for to_timestamp function [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on PR #8159: URL: https://github.com/apache/arrow-datafusion/pull/8159#issuecomment-1828758196 Marking as draft to signify this isn't waiting on feedback anymore. Please mark it as ready for review when it is

Re: [PR] feat:implement sql style 'find_in_set' string function [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on code in PR #8328: URL: https://github.com/apache/arrow-datafusion/pull/8328#discussion_r1406862383 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -1170,6 +1171,20 @@ substr_index(str, delim, count) - **delim**: the string to find in str to split str

Re: [PR] GH-38900:[JS] Fix spelling [arrow]

2023-11-27 Thread via GitHub
domoritz merged PR #38901: URL: https://github.com/apache/arrow/pull/38901

Re: [I] Change output ordering display of sources [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb closed issue #8297: Change output ordering display of sources URL: https://github.com/apache/arrow-datafusion/issues/8297

Re: [PR] refactor: output ordering [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb merged PR #8304: URL: https://github.com/apache/arrow-datafusion/pull/8304

Re: [PR] Minor: rename parquet.rs to parquet/mod.rs [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb merged PR #8301: URL: https://github.com/apache/arrow-datafusion/pull/8301

Re: [PR] Introduce Boolean Coercion [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on code in PR #8331: URL: https://github.com/apache/arrow-datafusion/pull/8331#discussion_r1406858141 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -353,6 +354,20 @@ fn string_temporal_coercion( } } +/// Coerce `Boolean` to other larger types, lik

Re: [I] Timestamp overflows for extreme low/high values [arrow-datafusion]

2023-11-27 Thread via GitHub
tustvold commented on issue #8336: URL: https://github.com/apache/arrow-datafusion/issues/8336#issuecomment-1828748601 > The overflow happens because Datafusion treats underlying i64 as nanoseconds which is obviously not big enough. I'm confused by this statement, DataFusion follows

Re: [I] Timestamp overflows for extreme low/high values [arrow-datafusion]

2023-11-27 Thread via GitHub
comphead commented on issue #8336: URL: https://github.com/apache/arrow-datafusion/issues/8336#issuecomment-1828745837 The overflow happens because Datafusion treats underlying `i64` as nanoseconds which is obviously not big enough. Other sql engines like PG/Spark/DuckDB treat `i64` as m
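The arithmetic behind the overflow in #8336 is easy to check. The repro value `-62125747200` seconds (nearly two thousand years before 1970), multiplied by 10^9, exceeds `i64::MAX` (about 9.22e18), while the same instant expressed in microseconds fits comfortably. A signed 64-bit nanosecond count covers only about ±292 years around the epoch. The Rust sketch below uses `checked_mul` to surface the overflow instead of wrapping; `seconds_to_ticks` and the constants are illustrative names, not DataFusion APIs.

```rust
pub const NANOS_PER_SECOND: i64 = 1_000_000_000;
pub const MICROS_PER_SECOND: i64 = 1_000_000;

/// Converts seconds since the Unix epoch into ticks of the given unit,
/// returning None instead of wrapping when the result leaves the i64 range.
pub fn seconds_to_ticks(seconds: i64, ticks_per_second: i64) -> Option<i64> {
    seconds.checked_mul(ticks_per_second)
}
```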

Re: [PR] fix: wrong result of range function [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb merged PR #8313: URL: https://github.com/apache/arrow-datafusion/pull/8313

Re: [PR] fix: wrong result of range function [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on PR #8313: URL: https://github.com/apache/arrow-datafusion/pull/8313#issuecomment-1828739177 Thank you everyone!

Re: [I] The range function results in an error when step is a negative number [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb closed issue #8311: The range function results in an error when step is a negative number URL: https://github.com/apache/arrow-datafusion/issues/8311
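Issue #8311 concerns `range` with a negative step. Assuming half-open semantics like Python's `range(start, stop, step)` (the stop value is excluded, and a negative step counts down), the expected output can be sketched in Rust as below. `range_values` is a hypothetical helper, and returning `None` for a zero step is an assumption rather than DataFusion's documented behaviour.

```rust
/// Half-open range [start, stop) with a signed step; a negative step counts
/// down and stops before `stop`. Returns None for a zero step.
pub fn range_values(start: i64, stop: i64, step: i64) -> Option<Vec<i64>> {
    if step == 0 {
        return None;
    }
    let mut out = Vec::new();
    let mut cur = start;
    // The loop condition depends on the direction of the step.
    while (step > 0 && cur < stop) || (step < 0 && cur > stop) {
        out.push(cur);
        // Stop rather than overflow near the edges of the i64 range.
        cur = match cur.checked_add(step) {
            Some(v) => v,
            None => break,
        };
    }
    Some(out)
}
```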

Re: [PR] minor: fix documentation [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb merged PR #8323: URL: https://github.com/apache/arrow-datafusion/pull/8323

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Nov 27, 2023 [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on issue #8329: URL: https://github.com/apache/arrow-datafusion/issues/8329#issuecomment-1828735234 I am very backed up on reviews. I will try and work through them over the next few days

Re: [PR] move array function unit_tests to sqllogictest [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on PR #8332: URL: https://github.com/apache/arrow-datafusion/pull/8332#issuecomment-1828734218 Let me know if this looks good to you @jayzhan211

[I] Timestamp overflows for extreme low/high values [arrow-datafusion]

2023-11-27 Thread via GitHub
comphead opened a new issue, #8336: URL: https://github.com/apache/arrow-datafusion/issues/8336 ### Describe the bug Timestamp literal conversion fails to be created from extreme values. ### To Reproduce ``` ❯ SELECT to_timestamp(-62125747200); Optimizer rule 'simpl

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406836605 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,925 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [I] [C++] GCS: report common prefixes as directories [arrow]

2023-11-27 Thread via GitHub
drauschenbach commented on issue #32403: URL: https://github.com/apache/arrow/issues/32403#issuecomment-1828724118 I wanted to leave this breadcrumb somewhere, but not sure where. I noticed a discrepancy between "directories" created via Arrow vs directories created via the GCS cloud consol

Re: [I] [R] Update NEWS.md for 14.0.0.1 [arrow]

2023-11-27 Thread via GitHub
assignUser commented on issue #38864: URL: https://github.com/apache/arrow/issues/38864#issuecomment-1828720613 I have opened https://github.com/apache/arrow/issues/38904 for the 14.0.2 news

Re: [PR] GH-36760: [Go] Add Avro OCF reader [arrow]

2023-11-27 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37115: URL: https://github.com/apache/arrow/pull/37115#issuecomment-1828708787 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 5ab60eaea3afaf1ff58e9f70bed481d6e726dd69. There were no

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406816141 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,925 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[I] Parquet pruning will be incorrect if field names are repeated [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb opened a new issue, #8335: URL: https://github.com/apache/arrow-datafusion/issues/8335 ### Describe the bug Basically if you have a parquet file with a nested field that has the same name as a top level field, the datafusion parquet reader will read statistics for the nested fi
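Issue #8335 describes statistics for a nested field being read in place of a top-level field with the same name. Below is a hedged Rust sketch of why a lookup keyed on the leaf name alone is ambiguous, and how matching the full column path avoids the problem. `ColumnPath`, `find_by_leaf_name` and `find_top_level` are hypothetical and do not reflect the actual API of DataFusion or the parquet crate.

```rust
/// Leaf column of a hypothetical Parquet schema, identified by its full path,
/// e.g. ["id"] for a top-level column or ["nested", "id"] for a nested one.
pub type ColumnPath = Vec<&'static str>;

/// Ambiguous lookup: matches on the leaf name only, so a nested field that
/// happens to share a top-level column's name can be returned instead.
pub fn find_by_leaf_name(columns: &[ColumnPath], name: &str) -> Option<usize> {
    columns.iter().position(|p| p.last().copied() == Some(name))
}

/// Path-aware lookup: a top-level column must match the whole path.
pub fn find_top_level(columns: &[ColumnPath], name: &str) -> Option<usize> {
    columns.iter().position(|p| p.len() == 1 && p[0] == name)
}
```

With leaf columns ordered `[nested.id, id]`, the leaf-name lookup for `id` returns the nested column, while the path-aware lookup returns the top-level one.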

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406810742 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,925 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] GH-38864: [R] Update NEWS.md for 14.0.0.1 [arrow]

2023-11-27 Thread via GitHub
thisisnic merged PR #38866: URL: https://github.com/apache/arrow/pull/38866

Re: [I] [R] Include and LIB flags are missing on macOS [arrow]

2023-11-27 Thread via GitHub
assignUser commented on issue #38902: URL: https://github.com/apache/arrow/issues/38902#issuecomment-1828661489 I have marked this as a blocker for now, if it seems to hold up the 14.0.2 release we will likely be able to cherry pick it before the release to cran as it is most likely an issu

Re: [I] [R] CRAN packaging checklist for 14.0.0 [arrow]

2023-11-27 Thread via GitHub
assignUser commented on issue #38141: URL: https://github.com/apache/arrow/issues/38141#issuecomment-1828647744 An RC for 14.0.2 will be cut this week, so I am adding all of the cherry-picked issues (including the CMake change, as there is no issue adding that to a new release :tada: )

Re: [PR] GH-38893: [R] Fix printf syntax in altrep.cpp [arrow]

2023-11-27 Thread via GitHub
assignUser merged PR #38894: URL: https://github.com/apache/arrow/pull/38894

[PR] GH-38900: js Fix spelling [arrow]

2023-11-27 Thread via GitHub
jsoref opened a new pull request, #38901: URL: https://github.com/apache/arrow/pull/38901 # ### Rationale for this change ### What changes are included in this PR? Spelling fixes to js/ ### Are these changes tested? ### Are there any

Re: [PR] GH-38893: [R] Fix printf syntax in altrep.cpp [arrow]

2023-11-27 Thread via GitHub
assignUser commented on PR #38894: URL: https://github.com/apache/arrow/pull/38894#issuecomment-1828630024 > The Windows error is from cpp11 It looks like we are using most recent cpp11 and that doesn't show this issue on r devel, are we not vendoring the newest headers or something?

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
tustvold commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406752147 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,807 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
tustvold commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406748077 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,925 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
tustvold commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406745661 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,807 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
tustvold commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406743880 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,925 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] GH-28994: [C++][JSON] Change the max rows to Unlimited(int_32) [arrow]

2023-11-27 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38582: URL: https://github.com/apache/arrow/pull/38582#issuecomment-1828605442 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit ca4655764900f3e216d4d0a9586a03b78dee7f01. There were no

Re: [I] Support parquet statistics for struct columns [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on issue #8334: URL: https://github.com/apache/arrow-datafusion/issues/8334#issuecomment-1828596893 > Yes, this is what I would expect in the absence of a column name collision. My greater concern is that the presence of the struct column will mess up the ordinals of the ot

Re: [PR] Parquet: derive boundary order when writing [arrow-rs]

2023-11-27 Thread via GitHub
Jefffrey commented on code in PR #5110: URL: https://github.com/apache/arrow-rs/pull/5110#discussion_r1406735823 ## parquet/src/column/writer/mod.rs: ## @@ -2891,6 +2938,158 @@ mod tests { assert!(incremented.is_none()) } +#[test] +fn test_boundary_order(

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406736395 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,807 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406731439 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,807 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] Extract parquet statistics to its own module, add tests [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb commented on code in PR #8294: URL: https://github.com/apache/arrow-datafusion/pull/8294#discussion_r1406585965 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -0,0 +1,807 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] feat: Add integration testing reader for column and batch [arrow-nanoarrow]

2023-11-27 Thread via GitHub
codecov-commenter commented on PR #325: URL: https://github.com/apache/arrow-nanoarrow/pull/325#issuecomment-1828576328 ## [Codecov](https://app.codecov.io/gh/apache/arrow-nanoarrow/pull/325?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_t

[PR] feat: Add integration testing reader for column and batch [arrow-nanoarrow]

2023-11-27 Thread via GitHub
paleolimbot opened a new pull request, #325: URL: https://github.com/apache/arrow-nanoarrow/pull/325 (no comment)

Re: [I] Support parquet statistics for struct columns [arrow-datafusion]

2023-11-27 Thread via GitHub
tustvold commented on issue #8334: URL: https://github.com/apache/arrow-datafusion/issues/8334#issuecomment-1828563816 Yes, this is what I would expect in the absence of a column name collision. My greater concern is that the presence of the struct column will mess up the ordinals of the o

Re: [I] Extension type not preserved on reading from the serialized schema [arrow]

2023-11-27 Thread via GitHub
danepitkin commented on issue #38891: URL: https://github.com/apache/arrow/issues/38891#issuecomment-1828549646 No problem! Glad I could help.

[I] Support parquet statistics for struct columns [arrow-datafusion]

2023-11-27 Thread via GitHub
alamb opened a new issue, #8334: URL: https://github.com/apache/arrow-datafusion/issues/8334 ### Is your feature request related to a problem or challenge? While working on https://github.com/apache/arrow-datafusion/pull/8294 @tustvold noted that the statistics extraction code does n
