[GitHub] [arrow] kou commented on a diff in pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
kou commented on code in PR #14585: URL: https://github.com/apache/arrow/pull/14585#discussion_r1048123988 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -2203,6 +2214,51 @@ if(ARROW_WITH_RAPIDJSON) endif() endif() +macro(build_qpl) + message(STATUS "Building QPL f

[GitHub] [arrow] github-actions[bot] commented on pull request #14944: GH-14943: [Python] Fix pyarrow.get_libraries() order

2022-12-13 Thread GitBox
github-actions[bot] commented on PR #14944: URL: https://github.com/apache/arrow/pull/14944#issuecomment-1350578429 :warning: GitHub issue #14943 **has been automatically assigned in GitHub** to PR creator. -- This is an automated message from the Apache Git Service. To respond to the mes

[GitHub] [arrow] github-actions[bot] commented on pull request #14944: GH-14943: [Python] Fix pyarrow.get_libraries() order

2022-12-13 Thread GitBox
github-actions[bot] commented on PR #14944: URL: https://github.com/apache/arrow/pull/14944#issuecomment-1350578455 :warning: GitHub issue #14943 **has no components**, please add labels for components. -- This is an automated message from the Apache Git Service. To respond to the message

[GitHub] [arrow] github-actions[bot] commented on pull request #14944: GH-14943: [Python] Fix pyarrow.get_libraries() order

2022-12-13 Thread GitBox
github-actions[bot] commented on PR #14944: URL: https://github.com/apache/arrow/pull/14944#issuecomment-1350578366 * Closes: #14943 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

[GitHub] [arrow] kou opened a new pull request, #14944: GH-14943: [Python] Fix pyarrow.get_libraries() order

2022-12-13 Thread GitBox
kou opened a new pull request, #14944: URL: https://github.com/apache/arrow/pull/14944 pyarrow.get_libraries() returns ['arrow', 'arrow_python'] but it should be ['arrow_python', 'arrow'] because libarrow_python.so depends on libarrow.so. -- This is an automated message from the Apache Gi
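The ordering argument in this PR description (a library that depends on another should be listed before it) can be sketched in a few lines. This is only an illustration under stated assumptions: `link_order` and the `deps` map are hypothetical names, not part of pyarrow, and the sketch does not show how `pyarrow.get_libraries()` is actually implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def link_order(depends_on):
    """Return library names with each dependent listed before its dependencies.

    `depends_on` maps a library name to the set of libraries it depends on.
    This helper is hypothetical and exists only for illustration.
    """
    # static_order() yields dependencies first, so reverse it to put
    # dependents first, which is the order linkers generally expect.
    dependency_first = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(dependency_first))


# Assumed dependency map mirroring the PR description:
# libarrow_python.so depends on libarrow.so.
deps = {"arrow_python": {"arrow"}, "arrow": set()}
print(link_order(deps))
```

With the assumed map above, `arrow_python` comes out ahead of `arrow`, matching the order the PR description says `pyarrow.get_libraries()` should return.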

[GitHub] [arrow] yaqi-zhao commented on a diff in pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
yaqi-zhao commented on code in PR #14585: URL: https://github.com/apache/arrow/pull/14585#discussion_r1048119591 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -2203,6 +2214,51 @@ if(ARROW_WITH_RAPIDJSON) endif() endif() +macro(build_qpl) + message(STATUS "Building

[GitHub] [arrow] kou commented on a diff in pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
kou commented on code in PR #14585: URL: https://github.com/apache/arrow/pull/14585#discussion_r1048117411 ## cpp/src/arrow/util/bit_stream_utils.h: ## @@ -398,6 +412,27 @@ inline int BitReader::GetBatch(int num_bits, T* v, int batch_size) { return batch_size; } +#ifdef A

[GitHub] [arrow] kou commented on a diff in pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
kou commented on code in PR #14585: URL: https://github.com/apache/arrow/pull/14585#discussion_r1048116742 ## cpp/src/arrow/util/bit_stream_utils.h: ## @@ -398,6 +412,27 @@ inline int BitReader::GetBatch(int num_bits, T* v, int batch_size) { return batch_size; } +#ifdef A

[GitHub] [arrow-datafusion] mingmwang opened a new issue, #4610: Should not allow Window expressions in Filter or Having clause

2022-12-13 Thread GitBox
mingmwang opened a new issue, #4610: URL: https://github.com/apache/arrow-datafusion/issues/4610 **Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: **Expected behavior** A clear and concise descript

[GitHub] [arrow] kou commented on pull request #14235: ARROW-17692: [R] Add support for building with system AWS SDK C++

2022-12-13 Thread GitBox
kou commented on PR #14235: URL: https://github.com/apache/arrow/pull/14235#issuecomment-1350558076 > (how many people are building aws-sdk-cpp separately from source--it's not available in package managers so that's what this entails, right?--and then trying to install the R package?)

[GitHub] [arrow] yaqi-zhao commented on a diff in pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
yaqi-zhao commented on code in PR #14585: URL: https://github.com/apache/arrow/pull/14585#discussion_r1048109312 ## cpp/src/arrow/util/bit_stream_utils.h: ## @@ -398,6 +412,27 @@ inline int BitReader::GetBatch(int num_bits, T* v, int batch_size) { return batch_size; } +#i

[GitHub] [arrow-ballista] r4ntix opened a new pull request, #567: Support Alibaba Cloud OSS with ObjectStore

2022-12-13 Thread GitBox
r4ntix opened a new pull request, #567: URL: https://github.com/apache/arrow-ballista/pull/567 # Which issue does this PR close? Closes #566 # Rationale for this change See #566 # What changes are included in this PR? * update object_store version to "0.5.2"

[GitHub] [arrow] kou commented on a diff in pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
kou commented on code in PR #14585: URL: https://github.com/apache/arrow/pull/14585#discussion_r1048070297 ## cpp/src/arrow/util/qpl_job_pool.cc: ## @@ -0,0 +1,122 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

[GitHub] [arrow-ballista] r4ntix opened a new issue, #566: Support Alibaba Cloud OSS with ObjectStore

2022-12-13 Thread GitBox
r4ntix opened a new issue, #566: URL: https://github.com/apache/arrow-ballista/issues/566 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The current version 0.5.2 of `object_store` already supports Alibaba Cloud OSS, use S3 compa

[GitHub] [arrow-datafusion] metesynnada commented on issue #4603: [window function] support min max with self define sliding window.

2022-12-13 Thread GitBox
metesynnada commented on issue #4603: URL: https://github.com/apache/arrow-datafusion/issues/4603#issuecomment-1350488338 cc @mustafasrepo -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [arrow] ursabot commented on pull request #14832: GH-14828: [CI][Conda] Sync with conda-forge, fix nightly jobs

2022-12-13 Thread GitBox
ursabot commented on PR #14832: URL: https://github.com/apache/arrow/pull/14832#issuecomment-1350480043 Benchmark runs are scheduled for baseline = 3e2a2242d547c270785aecab757e55ff504ec8ba and contender = 16d0eb4dd25fdeb2229b7cd845ccfd0dc54a1c73. 16d0eb4dd25fdeb2229b7cd845ccfd0dc54a1c73 is

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #4586: Add need_data_exchange in the ExecutionPlan to indicate whether a physical operator needs data exchange

2022-12-13 Thread GitBox
Dandandan commented on code in PR #4586: URL: https://github.com/apache/arrow-datafusion/pull/4586#discussion_r1048066022 ## datafusion/core/tests/sql/joins.rs: ## @@ -2040,8 +2040,8 @@ async fn left_semi_join() -> Result<()> { let physical_plan = state.create_physical_

[GitHub] [arrow] emkornfield commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

2022-12-13 Thread GitBox
emkornfield commented on issue #14748: URL: https://github.com/apache/arrow/issues/14748#issuecomment-1350476049 Apologies for the late reply. > So, we need to update all the doc or comments mentioned "ownership" of a returned shared_ptr, right? Yes, I believe so. Would need t

[GitHub] [arrow] yaqi-zhao closed pull request #14217: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
yaqi-zhao closed pull request #14217: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode URL: https://github.com/apache/arrow/pull/14217 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[GitHub] [arrow] wgtmac commented on pull request #14942: ARROW-18435: [C++][Java] Update ORC to 1.8.1

2022-12-13 Thread GitBox
wgtmac commented on PR #14942: URL: https://github.com/apache/arrow/pull/14942#issuecomment-1350473330 @kou Can you please take a look when you have time? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [arrow] yaqi-zhao commented on pull request #14217: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
yaqi-zhao commented on PR #14217: URL: https://github.com/apache/arrow/pull/14217#issuecomment-1350471436 > Can we close this in favor of #14585 ? @kou Sure. I'll close this one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [arrow] kou commented on pull request #14217: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
kou commented on PR #14217: URL: https://github.com/apache/arrow/pull/14217#issuecomment-1350467854 Can we close this in favor of #14585 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [arrow] yaqi-zhao commented on pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-12-13 Thread GitBox
yaqi-zhao commented on PR #14585: URL: https://github.com/apache/arrow/pull/14585#issuecomment-1350460281 @kou Thanks for your comments; the code has been updated. Please take a look, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[GitHub] [arrow] kou merged pull request #14941: MINOR: [Docs] Fix a typo and remove a duplicated item

2022-12-13 Thread GitBox
kou merged PR #14941: URL: https://github.com/apache/arrow/pull/14941 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.or

[GitHub] [arrow] emkornfield commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2022-12-13 Thread GitBox
emkornfield commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1048052988 ## cpp/src/parquet/column_reader.h: ## @@ -55,6 +56,29 @@ static constexpr uint32_t kDefaultMaxPageHeaderSize = 16 * 1024 * 1024; // 16 KB is the default expected

[GitHub] [arrow] emkornfield commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2022-12-13 Thread GitBox
emkornfield commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1048052298 ## cpp/src/parquet/column_reader.h: ## @@ -115,11 +141,27 @@ class PARQUET_EXPORT PageReader { bool always_compressed = f

[GitHub] [arrow] emkornfield commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2022-12-13 Thread GitBox
emkornfield commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1048051811 ## cpp/src/parquet/column_reader.h: ## @@ -55,6 +56,29 @@ static constexpr uint32_t kDefaultMaxPageHeaderSize = 16 * 1024 * 1024; // 16 KB is the default expected

[GitHub] [arrow] emkornfield commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2022-12-13 Thread GitBox
emkornfield commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1048051616 ## cpp/src/parquet/column_reader.h: ## @@ -55,6 +56,29 @@ static constexpr uint32_t kDefaultMaxPageHeaderSize = 16 * 1024 * 1024; // 16 KB is the default expected

[GitHub] [arrow] emkornfield commented on pull request #14803: ARROW-18420: [C++][Parquet] Introduce ColumnIndex & OffsetIndex

2022-12-13 Thread GitBox
emkornfield commented on PR #14803: URL: https://github.com/apache/arrow/pull/14803#issuecomment-1350438573 CC @fatemehp -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [arrow] github-actions[bot] commented on pull request #14942: ARROW-18435: [C++][Java] Update ORC to 1.8.1

2022-12-13 Thread GitBox
github-actions[bot] commented on PR #14942: URL: https://github.com/apache/arrow/pull/14942#issuecomment-1350421364 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [arrow] github-actions[bot] commented on pull request #14942: ARROW-18435: [C++][Java] Update ORC to 1.8.1

2022-12-13 Thread GitBox
github-actions[bot] commented on PR #14942: URL: https://github.com/apache/arrow/pull/14942#issuecomment-1350421353 https://issues.apache.org/jira/browse/ARROW-18435 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow] wgtmac opened a new pull request, #14942: ARROW-18435: [C++][Java] Update ORC to 1.8.1

2022-12-13 Thread GitBox
wgtmac opened a new pull request, #14942: URL: https://github.com/apache/arrow/pull/14942 This PR aims to upgrade ORC to version 1.8.1. Apache ORC 1.8.1 is the most recent release: https://github.com/apache/orc/releases/tag/v1.8.1 https://orc.apache.org/news/2022/12/02/ORC-1.8.1/

[GitHub] [arrow-adbc] kou commented on a diff in pull request #174: chore: set up release process

2022-12-13 Thread GitBox
kou commented on code in PR #174: URL: https://github.com/apache/arrow-adbc/pull/174#discussion_r1048023024 ## .gitignore: ## @@ -15,9 +15,16 @@ # specific language governing permissions and limitations # under the License. +# Release artifacts +adbc-*.tar.gz Review Comment

[GitHub] [arrow-adbc] kou commented on pull request #218: chore: publish docs to website

2022-12-13 Thread GitBox
kou commented on PR #218: URL: https://github.com/apache/arrow-adbc/pull/218#issuecomment-1350391223 We can use apache/arrow-adbc for the "redirect" approach because we just need to put a simple HTML. We don't need to use the same design in https://arrow.apache.org/ for the approach. --

[GitHub] [arrow] tuziershi commented on issue #14924: [GO] does it support repeated fields read?

2022-12-13 Thread GitBox
tuziershi commented on issue #14924: URL: https://github.com/apache/arrow/issues/14924#issuecomment-1350387551 @drin @zeroshade Thanks! It is very helpful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [arrow-datafusion] jackwener commented on pull request #4599: add `try_optimize()` for all rules.

2022-12-13 Thread GitBox
jackwener commented on PR #4599: URL: https://github.com/apache/arrow-datafusion/pull/4599#issuecomment-1350372850 > I wonder if after this change we should perhaps simply remove OptimizerRule::optimize to simplify the code 🤔 I plan to do it in a follow-up PR -- This is an automated mes

[GitHub] [arrow] ursabot commented on pull request #14769: MINOR: [Python] Bump max_line_length = 88

2022-12-13 Thread GitBox
ursabot commented on PR #14769: URL: https://github.com/apache/arrow/pull/14769#issuecomment-1350372413 Benchmark runs are scheduled for baseline = c7eddff959896f00b7576e5b121323cf1aab0fe7 and contender = 3e2a2242d547c270785aecab757e55ff504ec8ba. 3e2a2242d547c270785aecab757e55ff504ec8ba is

[GitHub] [arrow-rs] askoa opened a new pull request, #3342: feat: configure null value in arrow csv writer

2022-12-13 Thread GitBox
askoa opened a new pull request, #3342: URL: https://github.com/apache/arrow-rs/pull/3342 # Which issue does this PR close? Closes #3268 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [arrow-rs] askoa commented on issue #3268: Allow ArrowCSV writer to control the display of NULL values

2022-12-13 Thread GitBox
askoa commented on issue #3268: URL: https://github.com/apache/arrow-rs/issues/3268#issuecomment-1350317692 I'll pick this up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [arrow] cyb70289 commented on pull request #14353: ARROW-17735: [C++][Parquet] Optimize parquet reading for String/Binary type

2022-12-13 Thread GitBox
cyb70289 commented on PR #14353: URL: https://github.com/apache/arrow/pull/14353#issuecomment-1350315565 > > I am fixing the errors in GitHub CI. For "TestArrowReadDeltaEncoding.DeltaByteArray", why is it skipped in my local testing but failing on CI? > Don't know the deta

[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4586: Add need_data_exchange in the ExecutionPlan to indicate whether a physical operator needs data exchange

2022-12-13 Thread GitBox
mingmwang commented on code in PR #4586: URL: https://github.com/apache/arrow-datafusion/pull/4586#discussion_r1047962931 ## datafusion/core/src/physical_optimizer/enforcement.rs: ## @@ -885,14 +915,9 @@ fn ensure_distribution_and_ordering( Ok(child)

[GitHub] [arrow] zhixingheyi-tian commented on pull request #14353: ARROW-17735: [C++][Parquet] Optimize parquet reading for String/Binary type

2022-12-13 Thread GitBox
zhixingheyi-tian commented on PR #14353: URL: https://github.com/apache/arrow/pull/14353#issuecomment-1350299053 > Would you fix the CI failures? Thanks @cyb70289 I am fixing the errors in GitHub CI. For "TestArrowReadDeltaEncoding.DeltaByteArray", why is it skipped in my local te

[GitHub] [arrow] mapleFU commented on pull request #14351: ARROW-17904: [C++] Parquet Implement crc in reading and writing Page for DATA_PAGE (v1)

2022-12-13 Thread GitBox
mapleFU commented on PR #14351: URL: https://github.com/apache/arrow/pull/14351#issuecomment-1350296152 After discussion on `parquet-format`, the checksum for DICT and DATA_PAGE_V2 will be implemented in the coming patches. -- This is an automated message from the Apache Git Service. To r

[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4586: Add need_data_exchange in the ExecutionPlan to indicate whether a physical operator needs data exchange

2022-12-13 Thread GitBox
mingmwang commented on code in PR #4586: URL: https://github.com/apache/arrow-datafusion/pull/4586#discussion_r1047955341 ## datafusion/core/src/physical_optimizer/enforcement.rs: ## @@ -835,13 +836,42 @@ fn new_join_conditions( new_join_on } +/// Within this function, i

[GitHub] [arrow-datafusion] HaoYang670 opened a new pull request, #4609: Remove the function `consume_token` from the parser

2022-12-13 Thread GitBox
HaoYang670 opened a new pull request, #4609: URL: https://github.com/apache/arrow-datafusion/pull/4609 Signed-off-by: remzi <1371656737...@gmail.com> # Which issue does this PR close? None. This is a follow-up of #4262. I found there were some cases in the `parser.rs`

[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4586: Add need_data_exchange in the ExecutionPlan to indicate whether a physical operator needs data exchange

2022-12-13 Thread GitBox
mingmwang commented on code in PR #4586: URL: https://github.com/apache/arrow-datafusion/pull/4586#discussion_r1047948804 ## datafusion/core/tests/sql/joins.rs: ## @@ -2040,8 +2040,8 @@ async fn left_semi_join() -> Result<()> { let physical_plan = state.create_physical_

[GitHub] [arrow] cyb70289 commented on pull request #14353: ARROW-17735: [C++][Parquet] Optimize parquet reading for String/Binary type

2022-12-13 Thread GitBox
cyb70289 commented on PR #14353: URL: https://github.com/apache/arrow/pull/14353#issuecomment-1350262966 Would you fix the CI failures? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [arrow] david-engelmann commented on issue #14931: pyarrow 10.0.1 is missing libarrow_python.so

2022-12-13 Thread GitBox
david-engelmann commented on issue #14931: URL: https://github.com/apache/arrow/issues/14931#issuecomment-1350251058 @kou Amazing feedback, going to close this out and see how the testing goes -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [arrow-adbc] kou commented on pull request #218: chore: publish docs to website

2022-12-13 Thread GitBox
kou commented on PR #218: URL: https://github.com/apache/arrow-adbc/pull/218#issuecomment-1350241777 > I was thinking about having some sort of (brief) landing page, but it is redundant. (Or maybe a redirect, or a version selector.) Ah, it makes sense. I used the "redirect" approac

[GitHub] [arrow] cyb70289 merged pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
cyb70289 merged PR #14938: URL: https://github.com/apache/arrow/pull/14938 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apac

[GitHub] [arrow] ursabot commented on pull request #14910: GH-14909: [Java] Prevent potential memory leak of ListSubfieldEncoder and StructSubfieldEncoder

2022-12-13 Thread GitBox
ursabot commented on PR #14910: URL: https://github.com/apache/arrow/pull/14910#issuecomment-1350236389 Benchmark runs are scheduled for baseline = 6f86fce8f15e492cb3eceb4b5e29f3a66233942b and contender = c7eddff959896f00b7576e5b121323cf1aab0fe7. c7eddff959896f00b7576e5b121323cf1aab0fe7 is

[GitHub] [arrow-datafusion] ursabot commented on pull request #4596: Normalize datafusion configuration names

2022-12-13 Thread GitBox
ursabot commented on PR #4596: URL: https://github.com/apache/arrow-datafusion/pull/4596#issuecomment-1350236183 Benchmark runs are scheduled for baseline = f8a3d584c8a392574347ebab97b26c07b054e93a and contender = a5cf57789a73646b92ecefc1124cd38215b91ee7. a5cf57789a73646b92ecefc1124cd3821

[GitHub] [arrow-datafusion] yahoNanJing merged pull request #4596: Normalize datafusion configuration names

2022-12-13 Thread GitBox
yahoNanJing merged PR #4596: URL: https://github.com/apache/arrow-datafusion/pull/4596 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr..

[GitHub] [arrow-datafusion] yahoNanJing closed issue #4595: Normalize datafusion configuration names

2022-12-13 Thread GitBox
yahoNanJing closed issue #4595: Normalize datafusion configuration names URL: https://github.com/apache/arrow-datafusion/issues/4595 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[GitHub] [arrow-adbc] lidavidm commented on pull request #218: chore: publish docs to website

2022-12-13 Thread GitBox
lidavidm commented on PR #218: URL: https://github.com/apache/arrow-adbc/pull/218#issuecomment-1350228706 > Why do we want to generate `/adbc/index.html` in Jekyll? I think that it's better that we generate all files for https://arrow.apache.org/adbc/ in apache/arrow-adbc. I was thin

[GitHub] [arrow] cyb70289 commented on a diff in pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
cyb70289 commented on code in PR #14938: URL: https://github.com/apache/arrow/pull/14938#discussion_r1047920984 ## cpp/src/arrow/util/benchmark_util.h: ## @@ -46,12 +39,22 @@ struct BenchmarkArgsType::type; +using internal::CpuInfo; + +static const CpuInfo* cpu_info = CpuInfo:

[GitHub] [arrow] WillAyd commented on pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
WillAyd commented on PR #14938: URL: https://github.com/apache/arrow/pull/14938#issuecomment-135032 I would be happy to try and add that in a follow up -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [arrow] github-actions[bot] commented on pull request #14941: Minor documentation cleanup

2022-12-13 Thread GitBox
github-actions[bot] commented on PR #14941: URL: https://github.com/apache/arrow/pull/14941#issuecomment-1350204885 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue

[GitHub] [arrow-adbc] kou commented on pull request #218: chore: publish docs to website

2022-12-13 Thread GitBox
kou commented on PR #218: URL: https://github.com/apache/arrow-adbc/pull/218#issuecomment-1350186957 Why do we want to generate `/adbc/index.html` in Jekyll? I think that it's better that we generate all files for https://arrow.apache.org/adbc/ in apache/arrow-adbc. How about putti

[GitHub] [arrow] rok commented on pull request #14191: ARROW-17798: [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer

2022-12-13 Thread GitBox
rok commented on PR #14191: URL: https://github.com/apache/arrow/pull/14191#issuecomment-1350179567 @pitrou I think I addressed everything, please take a look if it makes sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [arrow] rok commented on a diff in pull request #14191: ARROW-17798: [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer

2022-12-13 Thread GitBox
rok commented on code in PR #14191: URL: https://github.com/apache/arrow/pull/14191#discussion_r1047891761 ## cpp/src/parquet/encoding_test.cc: ## @@ -1276,5 +1282,128 @@ TEST(ByteStreamSplitEncodeDecode, InvalidDataTypes) { ASSERT_THROW(MakeTypedDecoder(Encoding::BYTE_STREAM

[GitHub] [arrow] kou commented on issue #14931: pyarrow 10.0.1 is missing libarrow_python.so

2022-12-13 Thread GitBox
kou commented on issue #14931: URL: https://github.com/apache/arrow/issues/14931#issuecomment-1350175477 I've commented on the pull request. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [arrow] kou commented on pull request #14921: MINOR: [Go][CI] Increase workflow timeout due to benchmarking

2022-12-13 Thread GitBox
kou commented on PR #14921: URL: https://github.com/apache/arrow/pull/14921#issuecomment-1350155824 It seems that this doesn't solve the problem: https://github.com/apache/arrow/actions/runs/3689554605 For example: https://github.com/apache/arrow/actions/runs/3689554605/jobs/62456235

[GitHub] [arrow-adbc] lidavidm commented on pull request #174: chore: set up release process

2022-12-13 Thread GitBox
lidavidm commented on PR #174: URL: https://github.com/apache/arrow-adbc/pull/174#issuecomment-1350150010 Fixed, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [arrow] ursabot commented on pull request #14911: MINOR: [Java] ArrowBuf#setOne should have int64 params

2022-12-13 Thread GitBox
ursabot commented on PR #14911: URL: https://github.com/apache/arrow/pull/14911#issuecomment-1350053693 Benchmark runs are scheduled for baseline = 34672cb256ffe7a297d7b6ad439bf6843b161b9c and contender = 6f86fce8f15e492cb3eceb4b5e29f3a66233942b. 6f86fce8f15e492cb3eceb4b5e29f3a66233942b is

[GitHub] [arrow] david-engelmann commented on issue #14931: pyarrow 10.0.1 is missing libarrow_python.so

2022-12-13 Thread GitBox
david-engelmann commented on issue #14931: URL: https://github.com/apache/arrow/issues/14931#issuecomment-1349976721 > We need to use `pyarrow.get_include()`, `pyarrow.get_libraries()` and `pyarrow.get_library_dirs()` to find `libarrow_python`. > > See also: https://arrow.apache.org/

[GitHub] [arrow] pitrou commented on a diff in pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
pitrou commented on code in PR #14938: URL: https://github.com/apache/arrow/pull/14938#discussion_r1047833711 ## cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc: ## @@ -301,5 +333,38 @@ BENCHMARK(TableSortIndicesInt64Wide) }) ->Unit(benchmark::TimeUnit::kNanosec

[GitHub] [arrow] westonpace commented on a diff in pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
westonpace commented on code in PR #14938: URL: https://github.com/apache/arrow/pull/14938#discussion_r1047832866 ## cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc: ## @@ -301,5 +333,38 @@ BENCHMARK(TableSortIndicesInt64Wide) }) ->Unit(benchmark::TimeUnit::kNan

[GitHub] [arrow] westonpace commented on issue #14792: [C++] CSVBufferIterator potentially used incorrectly

2022-12-13 Thread GitBox
westonpace commented on issue #14792: URL: https://github.com/apache/arrow/issues/14792#issuecomment-1349949748 > Right, I guess there wouldn't be an alternative in that case. I suppose that also explains why BlockParsingOperator doesn't concern itself with thread-safety either while BlockD

[GitHub] [arrow] pitrou commented on pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
pitrou commented on PR #14938: URL: https://github.com/apache/arrow/pull/14938#issuecomment-1349948618 > For both rank & sort do we have any kind of UTF8 benchmarks? We don't. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[GitHub] [arrow] pitrou commented on a diff in pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
pitrou commented on code in PR #14938: URL: https://github.com/apache/arrow/pull/14938#discussion_r1047827839 ## cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc: ## @@ -301,5 +333,38 @@ BENCHMARK(TableSortIndicesInt64Wide) }) ->Unit(benchmark::TimeUnit::kNanosec

[GitHub] [arrow] westonpace commented on pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
westonpace commented on PR #14938: URL: https://github.com/apache/arrow/pull/14938#issuecomment-1349941214 For both rank & sort do we have any kind of UTF8 benchmarks? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [arrow] westonpace commented on a diff in pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
westonpace commented on code in PR #14938: URL: https://github.com/apache/arrow/pull/14938#discussion_r1047826532 ## cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc: ## @@ -301,5 +333,38 @@ BENCHMARK(TableSortIndicesInt64Wide) }) ->Unit(benchmark::TimeUnit::kNan

[GitHub] [arrow-datafusion] tustvold commented on pull request #4560: Write faster kernel for is_distinct

2022-12-13 Thread GitBox
tustvold commented on PR #4560: URL: https://github.com/apache/arrow-datafusion/pull/4560#issuecomment-1349929493 > iterating through .values() is significantly faster than accessing value_unchecked(i) Yeah, I've seen this before. Last time it required some hackery with `inline(neve

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4560: Write faster kernel for is_distinct

2022-12-13 Thread GitBox
tustvold commented on code in PR #4560: URL: https://github.com/apache/arrow-datafusion/pull/4560#discussion_r1047821803 ## datafusion/physical-expr/src/expressions/binary/kernels_arrow.rs: ## @@ -515,4 +513,40 @@ mod tests { let err = modulus_decimal_scalar(&left_decim

[GitHub] [arrow-datafusion] comphead commented on pull request #4560: Write faster kernel for is_distinct

2022-12-13 Thread GitBox
comphead commented on PR #4560: URL: https://github.com/apache/arrow-datafusion/pull/4560#issuecomment-1349906330 Avg time now **70s** @alamb added tests for nulls @tustvold iterating through `.values()` is significantly faster than accessing `value_unchecked(i)` @Dandandan you

[GitHub] [arrow] david-engelmann commented on issue #14931: pyarrow 10.0.1 is missing libarrow_python.so

2022-12-13 Thread GitBox
david-engelmann commented on issue #14931: URL: https://github.com/apache/arrow/issues/14931#issuecomment-1349887179 @kou I was able to find the ARROW_PYTHON_SHARED_LIB path but I can't find the ARROW_PYTHON_STATIC_LIB which is the libarrow_python.a.

[GitHub] [arrow-adbc] kou commented on a diff in pull request #174: chore: set up release process

2022-12-13 Thread GitBox
kou commented on code in PR #174: URL: https://github.com/apache/arrow-adbc/pull/174#discussion_r1047783811 ## .github/workflows/nightly-website.yml: ## @@ -71,3 +74,41 @@ jobs: git add --force dev/ git commit -m "publish documentation" --allow-empty

[GitHub] [arrow-cookbook] raulcd commented on a diff in pull request #280: [Release] Update post release tasks for cookbooks

2022-12-13 Thread GitBox
raulcd commented on code in PR #280: URL: https://github.com/apache/arrow-cookbook/pull/280#discussion_r1047791186 ## dev/release/01-prepare.sh: ## @@ -0,0 +1,55 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licen

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #4582: Remove `AggregateState` wrapper

2022-12-13 Thread GitBox
Dandandan commented on code in PR #4582: URL: https://github.com/apache/arrow-datafusion/pull/4582#discussion_r1047790400 ## datafusion/physical-expr/src/aggregate/utils.rs: ## @@ -21,28 +21,13 @@ use arrow::array::ArrayRef; use datafusion_common::{Result, ScalarValue}; use da

[GitHub] [arrow-rs] tustvold opened a new pull request, #3341: Upstream newline_delimited_stream and ChunkedStore from DataFusion

2022-12-13 Thread GitBox
tustvold opened a new pull request, #3341: URL: https://github.com/apache/arrow-rs/pull/3341 # Which issue does this PR close? Closes #. # Rationale for this change I originally wrote these for DataFusion, where they have proved useful. Let's upstream the

[GitHub] [arrow-cookbook] kou commented on a diff in pull request #280: [Release] Update post release tasks for cookbooks

2022-12-13 Thread GitBox
kou commented on code in PR #280: URL: https://github.com/apache/arrow-cookbook/pull/280#discussion_r1047771088 ## dev/release/utils-prepare.sh: ## @@ -0,0 +1,70 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

[GitHub] [arrow] zeroshade merged pull request #14921: MINOR: [Go][CI] Increase workflow timeout due to benchmarking

2022-12-13 Thread GitBox
zeroshade merged PR #14921: URL: https://github.com/apache/arrow/pull/14921

[GitHub] [arrow] ursabot commented on pull request #14857: GH-14855: [C++] Support importing zero-case unions

2022-12-13 Thread GitBox
ursabot commented on PR #14857: URL: https://github.com/apache/arrow/pull/14857#issuecomment-1349707267 Benchmark runs are scheduled for baseline = 45d83fec743b79d48ba213f93cae95eacf5ec806 and contender = 34672cb256ffe7a297d7b6ad439bf6843b161b9c. 34672cb256ffe7a297d7b6ad439bf6843b161b9c is

[GitHub] [arrow-rs] ursabot commented on pull request #3339: Add MapArray to pretty print

2022-12-13 Thread GitBox
ursabot commented on PR #3339: URL: https://github.com/apache/arrow-rs/pull/3339#issuecomment-1349707004 Benchmark runs are scheduled for baseline = 2749dcca50e6dd0ac72db7fe802552c2db742c3c and contender = a93859b07516b91511ffe3106a423b9af4b69f34. a93859b07516b91511ffe3106a423b9af4b69f34 i

[GitHub] [arrow-rs] tustvold closed issue #3322: Pretty print not implemented for Map

2022-12-13 Thread GitBox
tustvold closed issue #3322: Pretty print not implemented for Map URL: https://github.com/apache/arrow-rs/issues/3322

[GitHub] [arrow-rs] tustvold merged pull request #3339: Add MapArray to pretty print

2022-12-13 Thread GitBox
tustvold merged PR #3339: URL: https://github.com/apache/arrow-rs/pull/3339

[GitHub] [arrow] WillAyd commented on a diff in pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
WillAyd commented on code in PR #14938: URL: https://github.com/apache/arrow/pull/14938#discussion_r1047738881 ## cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc: ## @@ -301,5 +333,38 @@ BENCHMARK(TableSortIndicesInt64Wide) }) ->Unit(benchmark::TimeUnit::kNanose

[GitHub] [arrow-datafusion] ursabot commented on pull request #4606: Lazy system tables

2022-12-13 Thread GitBox
ursabot commented on PR #4606: URL: https://github.com/apache/arrow-datafusion/pull/4606#issuecomment-1349666385 Benchmark runs are scheduled for baseline = 1c6b1439c5454742c1e7c02eeef9886291b448ca and contender = f8a3d584c8a392574347ebab97b26c07b054e93a. f8a3d584c8a392574347ebab97b26c07b

[GitHub] [arrow] kou commented on issue #14931: pyarrow 10.0.1 is missing libarrow_python.so

2022-12-13 Thread GitBox
kou commented on issue #14931: URL: https://github.com/apache/arrow/issues/14931#issuecomment-1349659963 We need to use `pyarrow.get_include()`, `pyarrow.get_libraries()` and `pyarrow.get_library_dirs()` to find `libarrow_python`. See also: https://arrow.apache.org/docs/python/inte
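As a minimal sketch of the approach described in the comment above: the three `pyarrow` helpers it names can be combined into compiler and linker flags for an extension that links against `libarrow_python`. The `compiler_flags` and `linker_flags` helpers below are hypothetical names introduced only for illustration, and the `-I`/`-L`/`-l` flag syntax assumes a GCC- or Clang-style toolchain.

```python
def compiler_flags(include_dir):
    """Return the include flag for a single header directory."""
    return [f"-I{include_dir}"]


def linker_flags(library_dirs, libraries):
    """Return -L flags followed by -l flags, preserving the given library order."""
    return [f"-L{d}" for d in library_dirs] + [f"-l{name}" for name in libraries]


if __name__ == "__main__":
    try:
        import pyarrow
    except ImportError:
        # pyarrow is optional for this sketch; the helpers work on plain lists.
        print("pyarrow is not installed")
    else:
        # These three functions are the ones named in the comment above.
        print(compiler_flags(pyarrow.get_include()))
        print(linker_flags(pyarrow.get_library_dirs(), pyarrow.get_libraries()))
```

The helpers keep the library order they are given, so whatever order `pyarrow.get_libraries()` returns is the order passed to the linker.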

[GitHub] [arrow] pitrou commented on a diff in pull request #14938: GH-14937: [C++] Add rank kernel benchmarks

2022-12-13 Thread GitBox
pitrou commented on code in PR #14938: URL: https://github.com/apache/arrow/pull/14938#discussion_r1047719213 ## cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc: ## @@ -301,5 +333,38 @@ BENCHMARK(TableSortIndicesInt64Wide) }) ->Unit(benchmark::TimeUnit::kNanosec

[GitHub] [arrow-datafusion] tustvold merged pull request #4606: Lazy system tables

2022-12-13 Thread GitBox
tustvold merged PR #4606: URL: https://github.com/apache/arrow-datafusion/pull/4606

[GitHub] [arrow-ballista] thinkharderdev closed issue #544: Make it concurrently to launch tasks to executors

2022-12-13 Thread GitBox
thinkharderdev closed issue #544: Make it concurrently to launch tasks to executors URL: https://github.com/apache/arrow-ballista/issues/544

[GitHub] [arrow-ballista] thinkharderdev merged pull request #557: Make it concurrently to launch tasks to executors

2022-12-13 Thread GitBox
thinkharderdev merged PR #557: URL: https://github.com/apache/arrow-ballista/pull/557

[GitHub] [arrow-datafusion] tustvold commented on pull request #4601: Remove ObjectStore from FileStream (#4533)

2022-12-13 Thread GitBox
tustvold commented on PR #4601: URL: https://github.com/apache/arrow-datafusion/pull/4601#issuecomment-1349647791 > the rationale that any opener that needs a object_store reference should obtain one as part of construction (rather than requiring it on the trait)? The rationale is th

[GitHub] [arrow-datafusion] alamb commented on issue #4603: [window function] support min max with self define sliding window.

2022-12-13 Thread GitBox
alamb commented on issue #4603: URL: https://github.com/apache/arrow-datafusion/issues/4603#issuecomment-1349647333 cc @metesynnada @retikulum

[GitHub] [arrow-datafusion] alamb commented on issue #4533: FileStream requires fake ObjectStore when ParquetFileReaderFactory is used

2022-12-13 Thread GitBox
alamb commented on issue #4533: URL: https://github.com/apache/arrow-datafusion/issues/4533#issuecomment-1349643940 > Is there an open issue/pr for that work ? Maybe I could suggest adding unique id to identify requests in TableProvider scan operations. I am tracking some work in ht

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4606: Lazy system tables

2022-12-13 Thread GitBox
tustvold commented on code in PR #4606: URL: https://github.com/apache/arrow-datafusion/pull/4606#discussion_r1047703275 ## datafusion/core/src/catalog/information_schema.rs: ## @@ -220,74 +217,113 @@ impl InformationSchemaProvider { } } }
