[GitHub] [arrow] Fokko commented on pull request #36846: GH-36845: [C++][Python] Allow type promotion on `pa.concat_tables`

2023-08-22 Thread via GitHub
Fokko commented on PR #36846: URL: https://github.com/apache/arrow/pull/36846#issuecomment-1689372348 Alright, doing casts during reads has its [own issues](https://github.com/apache/arrow/issues/36845) (this might be faster, because it reads it right into the correct format?). Also, other

[GitHub] [arrow] wgtmac commented on a diff in pull request #36574: GH-34950: [C++][Parquet] Support encryption for page index

2023-08-22 Thread via GitHub
wgtmac commented on code in PR #36574: URL: https://github.com/apache/arrow/pull/36574#discussion_r1302547704 ## cpp/src/parquet/page_index.cc: ## @@ -830,14 +894,13 @@ RowGroupIndexReadRange PageIndexReader::DeterminePageIndexRangesInRowGroup( // -

[GitHub] [arrow] wgtmac commented on a diff in pull request #36574: GH-34950: [C++][Parquet] Support encryption for page index

2023-08-22 Thread via GitHub
wgtmac commented on code in PR #36574: URL: https://github.com/apache/arrow/pull/36574#discussion_r1302546880 ## cpp/src/parquet/encryption/test_encryption_util.cc: ## @@ -509,4 +513,178 @@ void FileDecryptor::CheckFile(parquet::ParquetFileReader* file_reader, } } +void F

[GitHub] [arrow] rsm-23 commented on a diff in pull request #37301: GH-35167: [Docs][C++] Updated arguments count in example for TableReader

2023-08-22 Thread via GitHub
rsm-23 commented on code in PR #37301: URL: https://github.com/apache/arrow/pull/37301#discussion_r1302538388 ## docs/source/cpp/json.rst: ## @@ -58,9 +58,8 @@ the output table. // Instantiate TableReader from input stream and options std::shared_ptr reader; Rev

[GitHub] [arrow] wgtmac commented on a diff in pull request #36574: GH-34950: [C++][Parquet] Support encryption for page index

2023-08-22 Thread via GitHub
wgtmac commented on code in PR #36574: URL: https://github.com/apache/arrow/pull/36574#discussion_r1302537048 ## cpp/src/parquet/page_index.h: ## @@ -332,7 +340,8 @@ class PARQUET_EXPORT OffsetIndexBuilder { class PARQUET_EXPORT PageIndexBuilder { public: /// \brief API co

[GitHub] [arrow] wgtmac commented on a diff in pull request #36574: GH-34950: [C++][Parquet] Support encryption for page index

2023-08-22 Thread via GitHub
wgtmac commented on code in PR #36574: URL: https://github.com/apache/arrow/pull/36574#discussion_r1302534492 ## cpp/src/parquet/page_index.h: ## @@ -186,8 +191,7 @@ class PARQUET_EXPORT PageIndexReader { /// that creates this PageIndexReader. static std::shared_ptr Make(

[GitHub] [arrow-datafusion] smallzhongfeng commented on issue #7289: Implement `array_distinct` function

2023-08-22 Thread via GitHub
smallzhongfeng commented on issue #7289: URL: https://github.com/apache/arrow-datafusion/issues/7289#issuecomment-1689346970 Can you assign it to me, I am more interested. @izveigor :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

[GitHub] [arrow] Light-City commented on a diff in pull request #37171: GH-37170: [C++] Support schema rewriting of RecordBatch.

2023-08-22 Thread via GitHub
Light-City commented on code in PR #37171: URL: https://github.com/apache/arrow/pull/37171#discussion_r1302524312 ## cpp/src/arrow/record_batch.cc: ## @@ -283,6 +283,25 @@ bool RecordBatch::ApproxEquals(const RecordBatch& other, const EqualOptions& opt return true; } +Sta

[GitHub] [arrow] Light-City commented on a diff in pull request #37171: GH-37170: [C++] Support schema rewriting of RecordBatch.

2023-08-22 Thread via GitHub
Light-City commented on code in PR #37171: URL: https://github.com/apache/arrow/pull/37171#discussion_r1302523577 ## cpp/src/arrow/record_batch.cc: ## @@ -283,6 +283,25 @@ bool RecordBatch::ApproxEquals(const RecordBatch& other, const EqualOptions& opt return true; } +Sta

[GitHub] [arrow] zinking commented on pull request #36704: GH-36703: [Java] Enable HDFS by default on Java Dataset module

2023-08-22 Thread via GitHub
zinking commented on PR #36704: URL: https://github.com/apache/arrow/pull/36704#issuecomment-1689332969 https://github.com/apache/arrow/issues/37323 @davisusanibar could this be pushed forward?

[GitHub] [arrow] wgtmac commented on a diff in pull request #36574: GH-34950: [C++][Parquet] Support encryption for page index

2023-08-22 Thread via GitHub
wgtmac commented on code in PR #36574: URL: https://github.com/apache/arrow/pull/36574#discussion_r1302509826 ## cpp/src/parquet/encryption/test_encryption_util.cc: ## @@ -509,4 +513,178 @@ void FileDecryptor::CheckFile(parquet::ParquetFileReader* file_reader, } } +void F

[GitHub] [arrow] wgtmac commented on a diff in pull request #36574: GH-34950: [C++][Parquet] Support encryption for page index

2023-08-22 Thread via GitHub
wgtmac commented on code in PR #36574: URL: https://github.com/apache/arrow/pull/36574#discussion_r1302503911 ## cpp/src/parquet/encryption/test_encryption_util.cc: ## @@ -509,4 +513,178 @@ void FileDecryptor::CheckFile(parquet::ParquetFileReader* file_reader, } } +void F

[GitHub] [arrow-datafusion] spaydar commented on a diff in pull request #7362: DML documentation

2023-08-22 Thread via GitHub
spaydar commented on code in PR #7362: URL: https://github.com/apache/arrow-datafusion/pull/7362#discussion_r1302501616 ## docs/source/user-guide/sql/dml.md: ## @@ -0,0 +1,60 @@ + + +# DML + +## COPY + +Copy a table to file(s). Supported file formats are `parquet`, `csv`, and `

[GitHub] [arrow] wgtmac commented on a diff in pull request #36574: GH-34950: [C++][Parquet] Support encryption for page index

2023-08-22 Thread via GitHub
wgtmac commented on code in PR #36574: URL: https://github.com/apache/arrow/pull/36574#discussion_r1302499077 ## cpp/src/parquet/encryption/test_encryption_util.cc: ## @@ -509,4 +513,178 @@ void FileDecryptor::CheckFile(parquet::ParquetFileReader* file_reader, } } +void F

[GitHub] [arrow-datafusion] spaydar commented on a diff in pull request #7362: DML documentation

2023-08-22 Thread via GitHub
spaydar commented on code in PR #7362: URL: https://github.com/apache/arrow-datafusion/pull/7362#discussion_r1302497654 ## docs/source/user-guide/sql/dml.md: ## @@ -0,0 +1,60 @@ + + +# DML + +## COPY + +Copy a table to file(s). Supported file formats are `parquet`, `csv`, and `

[GitHub] [arrow] mapleFU commented on pull request #36967: GH-36924: [Java] support offset/length and filter in scan option

2023-08-22 Thread via GitHub
mapleFU commented on PR #36967: URL: https://github.com/apache/arrow/pull/36967#issuecomment-1689296443 @westonpace @pitrou would you mind taking a look at this interface? It splits a Parquet file scanner by offset and length.

[GitHub] [arrow] AlenkaF commented on a diff in pull request #35865: GH-35740: Add documentation for list arrays' values property

2023-08-22 Thread via GitHub
AlenkaF commented on code in PR #35865: URL: https://github.com/apache/arrow/pull/35865#discussion_r1302476258 ## python/pyarrow/array.pxi: ## @@ -2053,6 +2053,38 @@ cdef class ListArray(BaseListArray): @property def values(self): +""" +Return the und

[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #37272: GH-37266: [CI][C++] Use ARROW_CMAKE_ARGS not CMAKE_ARGS

2023-08-22 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37272: URL: https://github.com/apache/arrow/pull/37272#issuecomment-1689267864 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 9ddd8d5c52796f20cf619b8f43538b9d454fb9c0. There were no

[GitHub] [arrow-rs] JayjeetAtGithub opened a new issue, #4725: Support like operation on binary operands in arrow-string

2023-08-22 Thread via GitHub
JayjeetAtGithub opened a new issue, #4725: URL: https://github.com/apache/arrow-rs/issues/4725 (no comment)

[GitHub] [arrow-datafusion] JayjeetAtGithub commented on issue #7342: Error: There isn't a common type to coerce Binary and Utf8 in LIKE expression

2023-08-22 Thread via GitHub
JayjeetAtGithub commented on issue #7342: URL: https://github.com/apache/arrow-datafusion/issues/7342#issuecomment-1689266808 I looked into this issue a little bit. It looks like changes are needed in `arrow-string`, which is part of `arrow-rs`. Specifically, I found out tha

[GitHub] [arrow] mapleFU commented on pull request #37264: GH-37268: [C++] adding move in some ctor in fs and dataset

2023-08-22 Thread via GitHub
mapleFU commented on PR #37264: URL: https://github.com/apache/arrow/pull/37264#issuecomment-1689259680 I've fixed the comment. Would you mind taking another look?

[GitHub] [arrow-datafusion] yjshen merged pull request #7329: Minor: add `WriteOp::name` and `DmlStatement::name`

2023-08-22 Thread via GitHub
yjshen merged PR #7329: URL: https://github.com/apache/arrow-datafusion/pull/7329

[GitHub] [arrow] mapleFU commented on a diff in pull request #37171: GH-37170: [C++] Support schema rewriting of RecordBatch.

2023-08-22 Thread via GitHub
mapleFU commented on code in PR #37171: URL: https://github.com/apache/arrow/pull/37171#discussion_r1302430191 ## cpp/src/arrow/record_batch.cc: ## @@ -283,6 +283,25 @@ bool RecordBatch::ApproxEquals(const RecordBatch& other, const EqualOptions& opt return true; } +Status

[GitHub] [arrow] mapleFU commented on a diff in pull request #36073: GH-36036: [C++][Python][Parquet] Implement Float16 logical type

2023-08-22 Thread via GitHub
mapleFU commented on code in PR #36073: URL: https://github.com/apache/arrow/pull/36073#discussion_r1302427543 ## cpp/src/parquet/column_writer.cc: ## @@ -2305,6 +2307,74 @@ struct SerializeFunctor< int64_t* scratch; }; +// -

[GitHub] [arrow-datafusion] nseekhao opened a new pull request, #7382: Add ROLLUP and GROUPING SETS support

2023-08-22 Thread via GitHub
nseekhao opened a new pull request, #7382: URL: https://github.com/apache/arrow-datafusion/pull/7382 ## Which issue does this PR close? Closes #7381 . ## Rationale for this change To add support for aggregation with `ROLLUP` and `GROUPING SETS`. ## What

[GitHub] [arrow-datafusion] jiangzhx commented on issue #7380: Arrays with elements other than literal are not supported

2023-08-22 Thread via GitHub
jiangzhx commented on issue #7380: URL: https://github.com/apache/arrow-datafusion/issues/7380#issuecomment-1689204698 I found another way to make this work. select make_array(case when 1>0 then true else false end,case when 2>0 then true else false end);

[GitHub] [arrow] kou commented on issue #37296: [Dev][Release] Verification script fails detecting CUDA

2023-08-22 Thread via GitHub
kou commented on issue #37296: URL: https://github.com/apache/arrow/issues/37296#issuecomment-1689198643 Do you have `nvcc`?

[GitHub] [arrow] mapleFU commented on issue #31678: PyArrow: RuntimeError: AppendRowGroups requires equal schemas when writing _metadata file

2023-08-22 Thread via GitHub
mapleFU commented on issue #31678: URL: https://github.com/apache/arrow/issues/31678#issuecomment-1689196976 ``` >>> metadata_collector[0].schema required group field_id=-1 schema { optional int64 field_id=-1 A; optional binary field_id=-1 B (String); } >>> metad

[GitHub] [arrow-datafusion] nseekhao opened a new issue, #7381: Substrait: ROLL UP and GROUPING SETS support

2023-08-22 Thread via GitHub
nseekhao opened a new issue, #7381: URL: https://github.com/apache/arrow-datafusion/issues/7381 ### Is your feature request related to a problem or challenge? The Substrait producer currently throws an error if `ROLLUP` or `GROUPING SETS` is used in the query. ### Describe the

[GitHub] [arrow] wgtmac commented on pull request #36519: GH-36518: [Java] Fix ArrowFlightJdbcTimeStampVectorAccessor to return Timestamp objects with date and time that corresponds with local time in

2023-08-22 Thread via GitHub
wgtmac commented on PR #36519: URL: https://github.com/apache/arrow/pull/36519#issuecomment-1689190499 Thanks, I agree with you @lidavidm

[GitHub] [arrow] kou merged pull request #37315: GH-37290: [MATLAB] Add `arrow.array.Time32Array` class

2023-08-22 Thread via GitHub
kou merged PR #37315: URL: https://github.com/apache/arrow/pull/37315

[GitHub] [arrow] mhkeller commented on issue #35041: [JavaScript] How to write an arrow file in Node.JS from an IPC stream?

2023-08-22 Thread via GitHub
mhkeller commented on issue #35041: URL: https://github.com/apache/arrow/issues/35041#issuecomment-1689171844 Is there any update on this?

[GitHub] [arrow-datafusion] jiangzhx commented on issue #7380: Arrays with elements other than literal are not supported

2023-08-22 Thread via GitHub
jiangzhx commented on issue #7380: URL: https://github.com/apache/arrow-datafusion/issues/7380#issuecomment-1689169251 I'm not sure if this feature should be included in the discussion on the following issue: https://github.com/apache/arrow-datafusion/issues/6980.

[GitHub] [arrow-datafusion] jiangzhx opened a new issue, #7380: Arrays with elements other than literal are not supported

2023-08-22 Thread via GitHub
jiangzhx opened a new issue, #7380: URL: https://github.com/apache/arrow-datafusion/issues/7380 ### Is your feature request related to a problem or challenge? Using the CASE WHEN statement in ARRAY. `select [case when col1>0 then true else false end,case when col1>0 then true e

[GitHub] [arrow] github-actions[bot] commented on pull request #37321: GH-37320: [C++] Feature: support is distinct and is not distinct expression.

2023-08-22 Thread via GitHub
github-actions[bot] commented on PR #37321: URL: https://github.com/apache/arrow/pull/37321#issuecomment-1689163906 :warning: GitHub issue #37320 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] Light-City opened a new pull request, #37321: GH-37320: [C++] Feature: support is distinct and is not distinct expression.

2023-08-22 Thread via GitHub
Light-City opened a new pull request, #37321: URL: https://github.com/apache/arrow/pull/37321 ### Rationale for this change Ordinary comparison operators yield null (signifying “unknown”), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> N
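The null-handling rule described in this rationale can be illustrated with a minimal Python sketch. This is not Arrow's C++ implementation; it assumes `None` stands in for SQL NULL, and the function names `sql_equal`, `is_distinct_from`, and `is_not_distinct_from` are hypothetical, chosen only for illustration.

```python
from typing import Any, Optional


def sql_equal(a: Any, b: Any) -> Optional[bool]:
    """Ordinary '=' under three-valued logic; None stands in for SQL NULL."""
    if a is None or b is None:
        return None  # "unknown" when either input is null
    return a == b


def is_distinct_from(a: Any, b: Any) -> bool:
    """Null-safe inequality: always returns True or False, never None."""
    if a is None and b is None:
        return False  # two nulls are treated as not distinct
    if a is None or b is None:
        return True  # exactly one null: the inputs are distinct
    return a != b


def is_not_distinct_from(a: Any, b: Any) -> bool:
    """Null-safe equality: the negation of is_distinct_from."""
    return not is_distinct_from(a, b)
```

With these definitions, `sql_equal(7, None)` evaluates to `None`, while `is_distinct_from(7, None)` is `True` and `is_not_distinct_from(None, None)` is `True`.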

[GitHub] [arrow-datafusion] avantgardnerio commented on pull request #7192: Create a Priority Queue based Aggregation with `limit`

2023-08-22 Thread via GitHub
avantgardnerio commented on PR #7192: URL: https://github.com/apache/arrow-datafusion/pull/7192#issuecomment-1689158528 > Reported performance results I'd like to reiterate that this PR is really about using constant memory (which it does), not increasing throughput, but here's some

[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #7364: Projection Order Propagation

2023-08-22 Thread via GitHub
ozankabak commented on code in PR #7364: URL: https://github.com/apache/arrow-datafusion/pull/7364#discussion_r1302364213 ## datafusion/sqllogictest/test_files/order.slt: ## @@ -410,3 +410,38 @@ SELECT DISTINCT time as "first_seen" FROM t ORDER BY 1; ## Cleanup statement ok d

[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36977: GH-36240: [Python] Refactor CumulativeSumOptions to a separate class for independent deprecation

2023-08-22 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #36977: URL: https://github.com/apache/arrow/pull/36977#issuecomment-1689147489 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit fe750ed10531c47131b447397e67486656cf8135. There were no

[GitHub] [arrow-datafusion] avantgardnerio commented on pull request #7192: Create a Priority Queue based Aggregation with `limit`

2023-08-22 Thread via GitHub
avantgardnerio commented on PR #7192: URL: https://github.com/apache/arrow-datafusion/pull/7192#issuecomment-1689144857 > Tests for the optimizer pass @alamb this bothered me as well. Would you be able to direct me to the most exemplary test to reference?

[GitHub] [arrow] zinking commented on issue #37005: Dataset JNI bridge for rust target

2023-08-22 Thread via GitHub
zinking commented on issue #37005: URL: https://github.com/apache/arrow/issues/37005#issuecomment-1689137038 > Sorry, I deleted the comment I just posted. You mean call into Arrow Rust _from_ Java, right? that's correct.

[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #280: perf: Improved Bit (Un)packing Performance

2023-08-22 Thread via GitHub
paleolimbot commented on PR #280: URL: https://github.com/apache/arrow-nanoarrow/pull/280#issuecomment-1689124104 Yes, I'm on M1. If I change the unpacking to a macro, I get 3x faster unpacking (and no difference between shift/no shift): ```c #define ARROW_BITS_UNPACK1(word,

[GitHub] [arrow] kou commented on a diff in pull request #37238: GH-37237: [C++] Set extraction time to all downloaded contents timestamp

2023-08-22 Thread via GitHub
kou commented on code in PR #37238: URL: https://github.com/apache/arrow/pull/37238#discussion_r1302344970 ## cpp/CMakeLists.txt: ## @@ -18,35 +18,55 @@ cmake_minimum_required(VERSION 3.16) message(STATUS "Building using CMake version: ${CMAKE_VERSION}") -# Compiler id for A

[GitHub] [arrow] kou merged pull request #37258: GH-37257: [Ruby][FlightSQL] Use the same options for auto prepared statement close request

2023-08-22 Thread via GitHub
kou merged PR #37258: URL: https://github.com/apache/arrow/pull/37258

[GitHub] [arrow] kou commented on pull request #37258: GH-37257: [Ruby][FlightSQL] Use the same options for auto prepared statement close request

2023-08-22 Thread via GitHub
kou commented on PR #37258: URL: https://github.com/apache/arrow/pull/37258#issuecomment-1689118588 +1

[GitHub] [arrow-datafusion] parkma99 commented on pull request #7350: feat: The "character_length" function handle "Binary" type

2023-08-22 Thread via GitHub
parkma99 commented on PR #7350: URL: https://github.com/apache/arrow-datafusion/pull/7350#issuecomment-1689114310 > Perhaps we could implement this as part of the coercion rules as opposed to internal to the evaluation logic? See coerce_arguments_for_fun perhaps? Thank you, it looks

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #989: feat(python/adbc_driver_manager): add fetch_record_batch

2023-08-22 Thread via GitHub
lidavidm commented on code in PR #989: URL: https://github.com/apache/arrow-adbc/pull/989#discussion_r1302336577 ## python/adbc_driver_manager/adbc_driver_manager/dbapi.py: ## @@ -973,7 +1012,7 @@ def fetchone(self) -> Optional[tuple]: self.rownumber += 1 retur

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #989: feat(python/adbc_driver_manager): add fetch_record_batch

2023-08-22 Thread via GitHub
lidavidm commented on code in PR #989: URL: https://github.com/apache/arrow-adbc/pull/989#discussion_r1302335936 ## python/adbc_driver_manager/adbc_driver_manager/dbapi.py: ## @@ -926,6 +927,44 @@ def fetch_df(self) -> "pandas.DataFrame": ) return self._res

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #989: feat(python/adbc_driver_manager): add fetch_record_batch

2023-08-22 Thread via GitHub
lidavidm commented on code in PR #989: URL: https://github.com/apache/arrow-adbc/pull/989#discussion_r1302335724 ## python/adbc_driver_manager/adbc_driver_manager/dbapi.py: ## @@ -926,6 +927,44 @@ def fetch_df(self) -> "pandas.DataFrame": ) return self._res

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #989: feat(python/adbc_driver_manager): add fetch_record_batch

2023-08-22 Thread via GitHub
lidavidm commented on code in PR #989: URL: https://github.com/apache/arrow-adbc/pull/989#discussion_r1302335522 ## python/adbc_driver_manager/adbc_driver_manager/dbapi.py: ## @@ -926,6 +927,44 @@ def fetch_df(self) -> "pandas.DataFrame": ) return self._res

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #989: feat(python/adbc_driver_manager): add fetch_record_batch

2023-08-22 Thread via GitHub
lidavidm commented on code in PR #989: URL: https://github.com/apache/arrow-adbc/pull/989#discussion_r1302334457 ## python/adbc_driver_manager/adbc_driver_manager/dbapi.py: ## @@ -973,7 +1012,7 @@ def fetchone(self) -> Optional[tuple]: self.rownumber += 1 retur

[GitHub] [arrow] lidavidm commented on issue #37318: pyarrow.ChunkedArray.combine_chunks is slow

2023-08-22 Thread via GitHub
lidavidm commented on issue #37318: URL: https://github.com/apache/arrow/issues/37318#issuecomment-1689106759 Hmm, interesting. Ideally on the C++ side we would change concat_arrays to not allocate a new array if there's only one chunk. But that would actually be a breaking change, since th

[GitHub] [arrow] R-JunmingChen commented on a diff in pull request #37100: GH-36831: [C++] DictionaryArray support for MinMax Function

2023-08-22 Thread via GitHub
R-JunmingChen commented on code in PR #37100: URL: https://github.com/apache/arrow/pull/37100#discussion_r1302332197 ## cpp/src/arrow/compute/kernels/aggregate_basic.cc: ## @@ -492,11 +492,24 @@ Result> MinMaxInit(KernelContext* ctx, return visitor.Create(); } +namespace

[GitHub] [arrow-datafusion] 2010YOUY01 commented on pull request #7337: feat: Implement quantile_cont()/quantile_disc() aggregate functions

2023-08-22 Thread via GitHub
2010YOUY01 commented on PR #7337: URL: https://github.com/apache/arrow-datafusion/pull/7337#issuecomment-1689072998 > # Does this belong in Datafusion core? Or does it belong as an add on? > With this level of specialization required, I wonder where shall we stop adding built in aggregat

[GitHub] [arrow-adbc] ywc88 opened a new pull request, #989: feat(python/adbc_driver_manager): add fetch_record_batch

2023-08-22 Thread via GitHub
ywc88 opened a new pull request, #989: URL: https://github.com/apache/arrow-adbc/pull/989 Fixes #968

[GitHub] [arrow-datafusion] 2010YOUY01 commented on pull request #7337: feat: Implement quantile_cont()/quantile_disc() aggregate functions

2023-08-22 Thread via GitHub
2010YOUY01 commented on PR #7337: URL: https://github.com/apache/arrow-datafusion/pull/7337#issuecomment-1689064579 > Thank you @2010YOUY01 . This PR, as all your others, is well written, documented and tested and is easy to read and understand. Thank you so much. > > # Sorting >

[GitHub] [arrow] spenczar commented on pull request #35865: GH-35740: Add documentation for list arrays' values property

2023-08-22 Thread via GitHub
spenczar commented on PR #35865: URL: https://github.com/apache/arrow/pull/35865#issuecomment-1689059622 Is there anything I can do to get this merged?

[GitHub] [arrow] github-actions[bot] commented on pull request #37319: GH-37318: [Python]: Optimize combine_chunks when there is only one chunk

2023-08-22 Thread via GitHub
github-actions[bot] commented on PR #37319: URL: https://github.com/apache/arrow/pull/37319#issuecomment-1689054222 :warning: GitHub issue #37318 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] spenczar opened a new pull request, #37319: GH-37318: [Python]: Optimize combine_chunks when there is only one chunk

2023-08-22 Thread via GitHub
spenczar opened a new pull request, #37319: URL: https://github.com/apache/arrow/pull/37319 ### Rationale for this change The associated issue explains the rationale. I'd love to add benchmarks, but don't really know how; Arrow's benchmark system is pretty daunting for adding a

[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #37275: GH-37273: [C++] Bump vendored xxhash version

2023-08-22 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37275: URL: https://github.com/apache/arrow/pull/37275#issuecomment-1689043221 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 369bb318b016e26db1c9933418a8855975eeab01. There were no

[GitHub] [arrow-datafusion] wiedld commented on a diff in pull request #7379: feat(7181): cascading loser tree merges

2023-08-22 Thread via GitHub
wiedld commented on code in PR #7379: URL: https://github.com/apache/arrow-datafusion/pull/7379#discussion_r1302269471 ## datafusion/core/src/physical_plan/sorts/streaming_merge.rs: ## @@ -0,0 +1,92 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

[GitHub] [arrow-datafusion] alamb commented on pull request #7355: Change error type of invalid argument to PlanError rather than InternalError, remove misleading comments

2023-08-22 Thread via GitHub
alamb commented on PR #7355: URL: https://github.com/apache/arrow-datafusion/pull/7355#issuecomment-1689036656 Thank you @DDtKey !

[GitHub] [arrow-datafusion] wiedld commented on a diff in pull request #7379: feat(7181): cascading loser tree merges

2023-08-22 Thread via GitHub
wiedld commented on code in PR #7379: URL: https://github.com/apache/arrow-datafusion/pull/7379#discussion_r1302270314 ## datafusion/core/src/physical_plan/sorts/merge.rs: ## @@ -15,95 +15,20 @@ // specific language governing permissions and limitations // under the License.

[GitHub] [arrow-datafusion] wiedld commented on a diff in pull request #7379: feat(7181): cascading loser tree merges

2023-08-22 Thread via GitHub
wiedld commented on code in PR #7379: URL: https://github.com/apache/arrow-datafusion/pull/7379#discussion_r1302264850 ## datafusion/core/src/physical_plan/sorts/cursor.rs: ## @@ -99,6 +100,16 @@ pub trait Cursor: Ord { /// Advance the cursor, returning the previous row i

[GitHub] [arrow-datafusion] wiedld opened a new pull request, #7379: feat(7181): cascading loser tree merges

2023-08-22 Thread via GitHub
wiedld opened a new pull request, #7379: URL: https://github.com/apache/arrow-datafusion/pull/7379 **WIP: a few optimizations are still to do, including those noted in this code.** ## Which issue does this PR close? External sorting (cascading merges) of the internal-sorted (in-memory

[GitHub] [arrow] github-actions[bot] commented on pull request #37255: GH-37254: [Python] Parametrize all pickling tests to use both the pickle and cloudpickle modules

2023-08-22 Thread via GitHub
github-actions[bot] commented on PR #37255: URL: https://github.com/apache/arrow/pull/37255#issuecomment-1689012808 Revision: 8eb925d16c808109b174ca1e82e53e6e4f87b06b Submitted crossbow builds: [ursacomputing/crossbow @ actions-a1c4d5ead7](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] danepitkin commented on pull request #37255: GH-37254: [Python] Parametrize all pickling tests to use both the pickle and cloudpickle modules

2023-08-22 Thread via GitHub
danepitkin commented on PR #37255: URL: https://github.com/apache/arrow/pull/37255#issuecomment-1689011035 @github-actions crossbow submit test-conda-python-3.10-hdfs*

[GitHub] [arrow] github-actions[bot] commented on pull request #37317: GH-37310: [Python][CI] Enable warnings in PyArrow pytests

2023-08-22 Thread via GitHub
github-actions[bot] commented on PR #37317: URL: https://github.com/apache/arrow/pull/37317#issuecomment-1688996632 :warning: GitHub issue #37310 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] danepitkin opened a new pull request, #37317: GH-37310: [Python][CI] Enable warnings in PyArrow pytests

2023-08-22 Thread via GitHub
danepitkin opened a new pull request, #37317: URL: https://github.com/apache/arrow/pull/37317 ### Rationale for this change Warnings are enabled for some nightly jobs, but not for CI jobs. This is not helpful since devs typically rely on the CI job as part of the PR process. ##

[GitHub] [arrow] github-actions[bot] commented on pull request #37255: GH-37254: [Python] Parametrize all pickling tests to use both the pickle and cloudpickle modules

2023-08-22 Thread via GitHub
github-actions[bot] commented on PR #37255: URL: https://github.com/apache/arrow/pull/37255#issuecomment-1688948279 Revision: 7710a866a309148304da9873f77c5f0b7637cc33 Submitted crossbow builds: [ursacomputing/crossbow @ actions-c53268ddd6](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] danepitkin commented on pull request #37255: GH-37254: [Python] Parametrize all pickling tests to use both the pickle and cloudpickle modules

2023-08-22 Thread via GitHub
danepitkin commented on PR #37255: URL: https://github.com/apache/arrow/pull/37255#issuecomment-1688943523 @github-actions crossbow submit *python*

[GitHub] [arrow] Fokko commented on issue #37219: [Python] Nullability not maintained when casting a column

2023-08-22 Thread via GitHub
Fokko commented on issue #37219: URL: https://github.com/apache/arrow/issues/37219#issuecomment-1688941267 Thanks @bkietz for the context and pointer. I was digging into the code, but was unable to see when the loop is actually executed. It seems that the size of `exprs` is always zero. I w

[GitHub] [arrow-datafusion] viirya commented on a diff in pull request #7378: Fix IN expr for NaN

2023-08-22 Thread via GitHub
viirya commented on code in PR #7378: URL: https://github.com/apache/arrow-datafusion/pull/7378#discussion_r1302200666 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -94,7 +94,7 @@ impl Set for ArraySet where T: Array + 'static, for<'a> &'a T: ArrayAcce

[GitHub] [arrow] kou commented on a diff in pull request #37311: GH-37308: [C++][Docs] Change name for CPP tutorial and minor fixes to the job

2023-08-22 Thread via GitHub
kou commented on code in PR #37311: URL: https://github.com/apache/arrow/pull/37311#discussion_r1302186757 ## cpp/examples/tutorial_examples/CMakeLists.txt: ## @@ -23,6 +23,7 @@ find_package(Arrow REQUIRED) get_filename_component(ARROW_CONFIG_PATH ${Arrow_CONFIG} DIRECTORY)

[GitHub] [arrow] zeroshade commented on pull request #37174: GH-37173: [C++][Format] C-export/import Run-End Encoded Arrays

2023-08-22 Thread via GitHub
zeroshade commented on PR #37174: URL: https://github.com/apache/arrow/pull/37174#issuecomment-1688930246 @felipecrv I've pushed the Go implementation for REE with c-export/import -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [arrow-datafusion] sarutak commented on a diff in pull request #7378: Fix IN expr for NaN

2023-08-22 Thread via GitHub
sarutak commented on code in PR #7378: URL: https://github.com/apache/arrow-datafusion/pull/7378#discussion_r1302178468 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -609,50 +643,100 @@ mod tests { #[test] fn in_list_float64() -> Result<()> { l

[GitHub] [arrow] kou commented on a diff in pull request #37301: GH-35167: [Docs][C++] Updated arguments count in example for TableReader

2023-08-22 Thread via GitHub
kou commented on code in PR #37301: URL: https://github.com/apache/arrow/pull/37301#discussion_r1302178475 ## docs/source/cpp/json.rst: ## @@ -58,9 +58,8 @@ the output table. // Instantiate TableReader from input stream and options std::shared_ptr reader; Review

[GitHub] [arrow-datafusion] sarutak opened a new pull request, #7378: Fix IN expr for NaN

2023-08-22 Thread via GitHub
sarutak opened a new pull request, #7378: URL: https://github.com/apache/arrow-datafusion/pull/7378 ## Which issue does this PR close? Closes #7377 ## Rationale for this change This PR fixes an issue that `'NaN'::double in ('NaN'::double)` is evaluated as `false`, which is inco

[GitHub] [arrow] legout commented on issue #31678: PyArrow: RuntimeError: AppendRowGroups requires equal schemas when writing _metadata file

2023-08-22 Thread via GitHub
legout commented on issue #31678: URL: https://github.com/apache/arrow/issues/31678#issuecomment-1688905002 Create toy dataset with parquet files having identical column types, but different column ordering. ```python import os import tempfile import pyarrow as pa impo

[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #37285: MINOR: [C#] Bump K4os.Compression.LZ4.Streams from 1.3.5 to 1.3.6 in /csharp

2023-08-22 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37285: URL: https://github.com/apache/arrow/pull/37285#issuecomment-1688897859 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 9ecd0f2a5fb76cca859269a6ff13eaf315abac62. There were no

[GitHub] [arrow] kou merged pull request #37300: GH-37299: [C++] Fix clang-format version mismatch error with Homebrew's clang-format

2023-08-22 Thread via GitHub
kou merged PR #37300: URL: https://github.com/apache/arrow/pull/37300 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.or

[GitHub] [arrow-datafusion] sarutak opened a new issue, #7377: IN expr doesn't work properly for NaN

2023-08-22 Thread via GitHub
sarutak opened a new issue, #7377: URL: https://github.com/apache/arrow-datafusion/issues/7377 ### Describe the bug Given the following query. ``` SELECT 'NAN'::double in ('NAN'::double); ``` I expected the result is `true` but the actual is `false`. It's not inconsisten
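
For readers unfamiliar with why a NaN membership test can come back `false`, here is a minimal sketch in plain Python. It is not DataFusion's implementation, and the function names `in_list_ieee` and `in_list_nan_aware` are hypothetical, introduced only for illustration. It assumes the report's result comes from IEEE 754 equality, under which NaN never compares equal to itself, and contrasts that with a membership test that treats NaN as equal to NaN.

```python
import math


def in_list_ieee(value, candidates):
    # Plain IEEE 754 equality: NaN == NaN is False, so a NaN probe
    # can never match a NaN entry in the list.
    return any(value == c for c in candidates)


def in_list_nan_aware(value, candidates):
    # Hypothetical alternative: additionally treat two NaNs as a match.
    return any(
        value == c or (math.isnan(value) and math.isnan(c))
        for c in candidates
    )


nan = float("nan")
print(in_list_ieee(nan, [nan]))       # equality-based membership
print(in_list_nan_aware(nan, [nan]))  # NaN-aware membership
```

The explicit `==` is deliberate: Python's built-in `in` operator checks object identity before equality, which would hide the effect being illustrated.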

[GitHub] [arrow-nanoarrow] jorisvandenbossche commented on pull request #280: perf: Improved Bit (Un)packing Performance

2023-08-22 Thread via GitHub
jorisvandenbossche commented on PR #280: URL: https://github.com/apache/arrow-nanoarrow/pull/280#issuecomment-1688896563 (I am on ubuntu / intel cpu) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #7376: Specialize Median Accumulator

2023-08-22 Thread via GitHub
Dandandan commented on code in PR #7376: URL: https://github.com/apache/arrow-datafusion/pull/7376#discussion_r1302163472 ## datafusion/physical-expr/src/aggregate/median.rs: ## @@ -106,159 +126,75 @@ impl PartialEq for Median { } } -#[derive(Debug)] /// The median accu

[GitHub] [arrow] sgilmore10 commented on a diff in pull request #37315: GH-37290: [MATLAB] Add `arrow.array.Time32Array` class

2023-08-22 Thread via GitHub
sgilmore10 commented on code in PR #37315: URL: https://github.com/apache/arrow/pull/37315#discussion_r1302141482 ## matlab/src/matlab/+arrow/+array/Time32Array.m: ## @@ -0,0 +1,84 @@ +% arrow.array.Time32Array + +% Licensed to the Apache Software Foundation (ASF) under one or m

[GitHub] [arrow] rsm-23 commented on pull request #37301: GH-35167: [Doc][C++] Updated arguments count in example for TableReader

2023-08-22 Thread via GitHub
rsm-23 commented on PR #37301: URL: https://github.com/apache/arrow/pull/37301#issuecomment-1688850144 @wjones127 resolved all the comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #280: perf: Improved Bit (Un)packing Performance

2023-08-22 Thread via GitHub
paleolimbot commented on PR #280: URL: https://github.com/apache/arrow-nanoarrow/pull/280#issuecomment-1688849511 FWIW I also had to compile slightly differently because I got a bunch of missing symbol errors. ``` gcc -O3 -Wall -Werror -shared -fPIC \ -I$(python -c "import s

[GitHub] [arrow] sgilmore10 commented on a diff in pull request #37315: GH-37290: [MATLAB] Add `arrow.array.Time32Array` class

2023-08-22 Thread via GitHub
sgilmore10 commented on code in PR #37315: URL: https://github.com/apache/arrow/pull/37315#discussion_r1302132211 ## matlab/src/cpp/arrow/matlab/array/proxy/time32_array.cc: ## @@ -0,0 +1,62 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #7362: DML documentation

2023-08-22 Thread via GitHub
Dandandan commented on code in PR #7362: URL: https://github.com/apache/arrow-datafusion/pull/7362#discussion_r1302125394 ## docs/source/user-guide/sql/dml.md: ## @@ -0,0 +1,60 @@ + + +# DML + +## COPY + +Copy a table to file(s). Supported file formats are `parquet`, `csv`, and

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #7362: DML documentation

2023-08-22 Thread via GitHub
Dandandan commented on code in PR #7362: URL: https://github.com/apache/arrow-datafusion/pull/7362#discussion_r1302124882 ## docs/source/user-guide/sql/dml.md: ## @@ -0,0 +1,60 @@ + + +# DML + +## COPY + +Copy a table to file(s). Supported file formats are `parquet`, `csv`, and

[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #280: perf: Improved Bit (Un)packing Performance

2023-08-22 Thread via GitHub
paleolimbot commented on PR #280: URL: https://github.com/apache/arrow-nanoarrow/pull/280#issuecomment-1688836990 The difference is more subtle for me on packing (but more pronounced for packing)...I'm game! Can you add a comment above each hard-coded shift and explain that it was do

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #7362: DML documentation

2023-08-22 Thread via GitHub
Dandandan commented on code in PR #7362: URL: https://github.com/apache/arrow-datafusion/pull/7362#discussion_r1302124087 ## docs/source/user-guide/sql/dml.md: ## @@ -0,0 +1,60 @@ + + +# DML + +## COPY + +Copy a table to file(s). Supported file formats are `parquet`, `csv`, and

[GitHub] [arrow] kevingurney merged pull request #37316: GH-37253: [MATLAB] Add test cases which verify that the `NumFields`, `BitWidth`, and `ID` properties can not be modified to `hFixedWidth` test

2023-08-22 Thread via GitHub
kevingurney merged PR #37316: URL: https://github.com/apache/arrow/pull/37316 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.a

[GitHub] [arrow] kevingurney commented on pull request #37316: GH-37253: [MATLAB] Add test cases which verify that the `NumFields`, `BitWidth`, and `ID` properties can not be modified to `hFixedWidth`

2023-08-22 Thread via GitHub
kevingurney commented on PR #37316: URL: https://github.com/apache/arrow/pull/37316#issuecomment-1688831897 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscrib

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #7364: Projection Order Propagation

2023-08-22 Thread via GitHub
Dandandan commented on code in PR #7364: URL: https://github.com/apache/arrow-datafusion/pull/7364#discussion_r1302119255 ## datafusion/sqllogictest/test_files/order.slt: ## @@ -410,3 +410,38 @@ SELECT DISTINCT time as "first_seen" FROM t ORDER BY 1; ## Cleanup statement ok d

[GitHub] [arrow-adbc] lidavidm commented on pull request #988: docs: pin furo version

2023-08-22 Thread via GitHub
lidavidm commented on PR #988: URL: https://github.com/apache/arrow-adbc/pull/988#issuecomment-1688825412 I subscribed to the issue for making Breathe compatible with Sphinx 7. Once that passes, I'll unpin this and bump the minimum Sphinx version. (That said, I've been seeing colleagues eva
