[GitHub] [arrow-rs] tustvold merged pull request #3522: minor fix: use the unified decimal type builder

2023-01-12 Thread GitBox
tustvold merged PR #3522: URL: https://github.com/apache/arrow-rs/pull/3522 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-julia] codecov-commenter commented on pull request #379: Test ArrowTypes package in RC verification script

2023-01-12 Thread GitBox
codecov-commenter commented on PR #379: URL: https://github.com/apache/arrow-julia/pull/379#issuecomment-1381443121 # [Codecov](https://codecov.io/gh/apache/arrow-julia/pull/379?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apach

[GitHub] [arrow-rs] tustvold commented on pull request #3517: Add Global deallocation variant (#3516)

2023-01-12 Thread GitBox
tustvold commented on PR #3517: URL: https://github.com/apache/arrow-rs/pull/3517#issuecomment-1381441796 I need to think a bit more on this; the challenge is that Vec will call dealloc with the alignment it would ask for, not the alignment that is actually correct :thinking:
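The alignment concern in this comment can be illustrated with a minimal sketch. It assumes only the standard library's `std::alloc::Layout`; the helper names `vec_layout` and `over_aligned_layout` and the 64-byte alignment are hypothetical and are not taken from the PR. The point is that a `Vec<T>` describes its allocation with `align_of::<T>()`, so a buffer that was originally allocated with a larger alignment would be handed back to the allocator with a different layout.

```rust
use std::alloc::Layout;

/// Layout that a `Vec<T>` of the given capacity would describe:
/// size = capacity * size_of::<T>(), align = align_of::<T>().
/// (Hypothetical helper, for illustration only.)
pub fn vec_layout<T>(capacity: usize) -> Layout {
    Layout::array::<T>(capacity).expect("capacity overflow")
}

/// Layout of a buffer that was deliberately over-aligned when allocated.
/// `align` is an assumed value such as 64; it must be a power of two.
pub fn over_aligned_layout(size: usize, align: usize) -> Layout {
    Layout::from_size_align(size, align).expect("invalid size/align")
}

/// True only when both layouts agree on size and alignment, which is what
/// an allocator expects between the `alloc` and `dealloc` calls.
pub fn layouts_match(a: Layout, b: Layout) -> bool {
    a == b
}
```

Under these assumptions, `vec_layout::<u8>(1024)` and `over_aligned_layout(1024, 64)` have the same size but different alignments, so they do not match.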

[GitHub] [arrow-datafusion] crepererum commented on a diff in pull request #4867: refactor: improve repartition buffering

2023-01-12 Thread GitBox
crepererum commented on code in PR #4867: URL: https://github.com/apache/arrow-datafusion/pull/4867#discussion_r1069049222 ## datafusion/core/src/physical_plan/repartition/distributor_channels.rs: ## @@ -0,0 +1,669 @@ +// Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [arrow-rs] ursabot commented on pull request #3510: Make consistent behavior on zeros equality on floating point types

2023-01-12 Thread GitBox
ursabot commented on PR #3510: URL: https://github.com/apache/arrow-rs/pull/3510#issuecomment-1381439716 Benchmark runs are scheduled for baseline = 8688dba69b925f5be3b6484c19ca7d54da1c0511 and contender = d49cd21f9c5ac27961041f7a2a9dbf4cea9708de. d49cd21f9c5ac27961041f7a2a9dbf4cea9708de i

[GitHub] [arrow-rs] alamb commented on a diff in pull request #3523: Fix reading null booleans from CSV

2023-01-12 Thread GitBox
alamb commented on code in PR #3523: URL: https://github.com/apache/arrow-rs/pull/3523#discussion_r1069048424 ## arrow-csv/src/reader/mod.rs: ## @@ -2067,4 +2071,30 @@ mod tests { assert_eq!(b.num_rows(), expected, "{}", idx); } } + +#[test] +f

[GitHub] [arrow] kou commented on issue #15287: [Ruby] Add option to keep/merge join keys in Table#join

2023-01-12 Thread GitBox
kou commented on issue #15287: URL: https://github.com/apache/arrow/issues/15287#issuecomment-1381438649 I think that (2) is needless. I think that the current implementation already implements (2).

[GitHub] [arrow-rs] tustvold merged pull request #3510: Make consistent behavior on zeros equality on floating point types

2023-01-12 Thread GitBox
tustvold merged PR #3510: URL: https://github.com/apache/arrow-rs/pull/3510

[GitHub] [arrow-rs] tustvold closed issue #3509: Make consistent behavior on zeros equality on floating point types

2023-01-12 Thread GitBox
tustvold closed issue #3509: Make consistent behavior on zeros equality on floating point types URL: https://github.com/apache/arrow-rs/issues/3509

[GitHub] [arrow-datafusion-python] martin-g commented on a diff in pull request #124: Introduce conda directory containing datafusion-dev.yaml conda enviro…

2023-01-12 Thread GitBox
martin-g commented on code in PR #124: URL: https://github.com/apache/arrow-datafusion-python/pull/124#discussion_r1069042109 ## README.md: ## @@ -149,12 +149,18 @@ assert result.column(0) == pyarrow.array([6.0]) ## How to install (from pip) +### Pip ```bash pip install

[GitHub] [arrow-datafusion] crepererum commented on a diff in pull request #4867: refactor: improve repartition buffering

2023-01-12 Thread GitBox
crepererum commented on code in PR #4867: URL: https://github.com/apache/arrow-datafusion/pull/4867#discussion_r1069043945 ## datafusion/core/src/physical_plan/repartition/distributor_channels.rs: ## @@ -0,0 +1,669 @@ +// Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [arrow-rs] tustvold opened a new pull request, #3523: Fix reading null booleans from CSV

2023-01-12 Thread GitBox
tustvold opened a new pull request, #3523: URL: https://github.com/apache/arrow-rs/pull/3523 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes

[GitHub] [arrow-rs] ursabot commented on pull request #3518: Update version to `31.0.0` and add changelog

2023-01-12 Thread GitBox
ursabot commented on PR #3518: URL: https://github.com/apache/arrow-rs/pull/3518#issuecomment-1381430381 Benchmark runs are scheduled for baseline = 79d823f9ad4b6c02d4aa7a6d4a5a178a25fc4363 and contender = 8688dba69b925f5be3b6484c19ca7d54da1c0511. 8688dba69b925f5be3b6484c19ca7d54da1c0511 i

[GitHub] [arrow-datafusion] crepererum commented on a diff in pull request #4867: refactor: improve repartition buffering

2023-01-12 Thread GitBox
crepererum commented on code in PR #4867: URL: https://github.com/apache/arrow-datafusion/pull/4867#discussion_r1069038446 ## datafusion/core/src/physical_plan/repartition/distributor_channels.rs: ## @@ -0,0 +1,669 @@ +// Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [arrow-datafusion] crepererum commented on a diff in pull request #4867: refactor: improve repartition buffering

2023-01-12 Thread GitBox
crepererum commented on code in PR #4867: URL: https://github.com/apache/arrow-datafusion/pull/4867#discussion_r1069036415 ## datafusion/core/src/physical_plan/repartition/distributor_channels.rs: ## @@ -0,0 +1,669 @@ +// Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [arrow-julia] kou opened a new pull request, #379: Test ArrowTypes package in RC verification script

2023-01-12 Thread GitBox
kou opened a new pull request, #379: URL: https://github.com/apache/arrow-julia/pull/379 fixes #378

[GitHub] [arrow-rs] liukun4515 opened a new pull request, #3522: minor fix: use the unified decimal type builder

2023-01-12 Thread GitBox
liukun4515 opened a new pull request, #3522: URL: https://github.com/apache/arrow-rs/pull/3522 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chang

[GitHub] [arrow-rs] tustvold commented on issue #3415: Release Arrow `XXX` (next release after `30.0.0`)

2023-01-12 Thread GitBox
tustvold commented on issue #3415: URL: https://github.com/apache/arrow-rs/issues/3415#issuecomment-1381422896 I plan to cut the release later today, once I've fixed a minor bug in the CSV reader

[GitHub] [arrow-rs] tustvold merged pull request #3518: Update version to `31.0.0` and add changelog

2023-01-12 Thread GitBox
tustvold merged PR #3518: URL: https://github.com/apache/arrow-rs/pull/3518

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-12 Thread GitBox
tustvold commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1069027517 ## arrow-csv/src/reader/mod.rs: ## @@ -916,31 +856,23 @@ fn build_primitive_array( // parses a specific column (col_idx) into an Arrow Array. fn build_boolean_array

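[GitHub] [arrow] mapleFU closed pull request #13874: ARROW-17408: [c++] Fixing C++20 compile warning using operator==(const T&, const T&)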

2023-01-12 Thread GitBox
mapleFU closed pull request #13874: ARROW-17408: [c++] Fixing C++20 compile warning using operator==(const T&, const T&) URL: https://github.com/apache/arrow/pull/13874

[GitHub] [arrow-datafusion] crepererum commented on a diff in pull request #4867: refactor: improve repartition buffering

2023-01-12 Thread GitBox
crepererum commented on code in PR #4867: URL: https://github.com/apache/arrow-datafusion/pull/4867#discussion_r1069026114 ## datafusion/core/src/physical_plan/repartition/mod.rs: ## @@ -132,70 +135,103 @@ impl BatchPartitioner { where F: FnMut(usize, RecordBatch)

[GitHub] [arrow-datafusion] crepererum commented on a diff in pull request #4867: refactor: improve repartition buffering

2023-01-12 Thread GitBox
crepererum commented on code in PR #4867: URL: https://github.com/apache/arrow-datafusion/pull/4867#discussion_r1069025876 ## datafusion/core/src/physical_plan/repartition/mod.rs: ## @@ -132,70 +135,103 @@ impl BatchPartitioner { where F: FnMut(usize, RecordBatch)

[GitHub] [arrow-julia] quinnj commented on issue #376: Release ArrowTypes@2.0.2

2023-01-12 Thread GitBox
quinnj commented on issue #376: URL: https://github.com/apache/arrow-julia/issues/376#issuecomment-1381419031 Yeah, let's do an Arrow 2.4.2 release as well. https://github.com/apache/arrow-julia/pull/377

[GitHub] [arrow] AlenkaF commented on pull request #33609: GH-31506: [Python] Address docstrings in Streams and File Access (Factory Functions)

2023-01-12 Thread GitBox
AlenkaF commented on PR #33609: URL: https://github.com/apache/arrow/pull/33609#issuecomment-1381416577 That makes sense, thank you for explaining! Will do.

[GitHub] [arrow-datafusion] HaoYang670 opened a new pull request, #4888: Update `optimize_children` to return `Result<Option<LogicalPlan>>`

2023-01-12 Thread GitBox
HaoYang670 opened a new pull request, #4888: URL: https://github.com/apache/arrow-datafusion/pull/4888 # Which issue does this PR close? Closes #4882. # Rationale for this change # What changes are included in this PR? # Are these changes te

[GitHub] [arrow-julia] kou commented on issue #376: Release ArrowTypes@2.0.2

2023-01-12 Thread GitBox
kou commented on issue #376: URL: https://github.com/apache/arrow-julia/issues/376#issuecomment-1381414099 @quinnj Should we release only `ArrowTypes`? (We don't release `Arrow`?) Our established release script doesn't assume this case. (`ArrowTypes` is only released.) Can we also rele

[GitHub] [arrow-rs] bjchambers opened a new issue, #3521: Unable to read CSV with null boolean value

2023-01-12 Thread GitBox
bjchambers opened a new issue, #3521: URL: https://github.com/apache/arrow-rs/issues/3521 See comment here https://github.com/apache/arrow-rs/pull/3365/files#r1069008088 Removing this block causes errors when parsing CSV files with null values.
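The failure mode reported here (a boolean column that contains nulls) can be sketched as follows. This is not the arrow-csv implementation: the function names are hypothetical, and it assumes that an empty field stands for null and that only `true`/`false` (case-insensitive) are valid values. It shows why a missing null check turns an empty field into a parse error.

```rust
/// Parse one CSV field as a nullable boolean.
/// Assumption (not from arrow-csv): an empty field means null, and only
/// "true"/"false" in any ASCII case are accepted as values.
pub fn parse_nullable_bool(field: &str) -> Result<Option<bool>, String> {
    // Without this early return, "" would fall through to the error branch,
    // which is the kind of failure the issue describes.
    if field.is_empty() {
        return Ok(None);
    }
    if field.eq_ignore_ascii_case("true") {
        Ok(Some(true))
    } else if field.eq_ignore_ascii_case("false") {
        Ok(Some(false))
    } else {
        Err(format!("cannot parse {:?} as boolean", field))
    }
}

/// Parse a whole column, stopping at the first field that is neither
/// null nor a valid boolean.
pub fn parse_bool_column(fields: &[&str]) -> Result<Vec<Option<bool>>, String> {
    fields.iter().map(|f| parse_nullable_bool(f)).collect()
}
```

With these assumptions, a column such as `["true", "", "false"]` parses to a value, a null, and a value, while an unrecognised token such as `"maybe"` is rejected.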

[GitHub] [arrow-rs] bjchambers commented on a diff in pull request #3365: Add csv-core based reader (#3338)

2023-01-12 Thread GitBox
bjchambers commented on code in PR #3365: URL: https://github.com/apache/arrow-rs/pull/3365#discussion_r1069008088 ## arrow-csv/src/reader/mod.rs: ## @@ -916,31 +856,23 @@ fn build_primitive_array( // parses a specific column (col_idx) into an Arrow Array. fn build_boolean_arr

[GitHub] [arrow] vibhatha commented on issue #30891: [C++] The C++ API for writing datasets could be improved

2023-01-12 Thread GitBox
vibhatha commented on issue #30891: URL: https://github.com/apache/arrow/issues/30891#issuecomment-1381404152 @westonpace there is an issue in defining the `format`, because all the format definitions depend on the definitions in this `file_base.h` base class. It is not very clear how to se

[GitHub] [arrow] vibhatha commented on pull request #33623: GH-33212: [C++][Python] Add use_threads to pyarrow.substrait.run_query

2023-01-12 Thread GitBox
vibhatha commented on PR #33623: URL: https://github.com/apache/arrow/pull/33623#issuecomment-1381391196 I created the PR here: https://github.com/apache/arrow/pull/33651 Let's wait for the CIs.

[GitHub] [arrow] github-actions[bot] commented on pull request #33651: GH-33649: [C++] Fix documentation on Newly Added Methods on ExecPlan

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33651: URL: https://github.com/apache/arrow/pull/33651#issuecomment-1381391001 * Closes: #33649

[GitHub] [arrow] vibhatha opened a new pull request, #33651: GH-33649: [C++] Fix documentation on Newly Added Methods on ExecPlan

2023-01-12 Thread GitBox
vibhatha opened a new pull request, #33651: URL: https://github.com/apache/arrow/pull/33651 This is a minor fix for CI failure in https://github.com/apache/arrow/pull/33623#issuecomment-1381326714.

[GitHub] [arrow] ursabot commented on pull request #15293: GH-15292: [C++] Typeclass alias is missing in ExtensionArray

2023-01-12 Thread GitBox
ursabot commented on PR #15293: URL: https://github.com/apache/arrow/pull/15293#issuecomment-1381376215 Benchmark runs are scheduled for baseline = 252e1e04d1dbed684efb11f550a6b4c8a9603d45 and contender = df2aa384eedc6dfd2a816f66e3ef8af1ecda3e8f. df2aa384eedc6dfd2a816f66e3ef8af1ecda3e8f is

[GitHub] [arrow] vibhatha commented on a diff in pull request #33623: GH-33212: [C++][Python] Add use_threads to pyarrow.substrait.run_query

2023-01-12 Thread GitBox
vibhatha commented on code in PR #33623: URL: https://github.com/apache/arrow/pull/33623#discussion_r1068967555 ## cpp/src/arrow/engine/substrait/util.cc: ## @@ -40,113 +40,15 @@ namespace arrow { namespace engine { -namespace { - -/// \brief A SinkNodeConsumer specialized

[GitHub] [arrow] vibhatha commented on pull request #33623: GH-33212: [C++][Python] Add use_threads to pyarrow.substrait.run_query

2023-01-12 Thread GitBox
vibhatha commented on PR #33623: URL: https://github.com/apache/arrow/pull/33623#issuecomment-1381355043 ```bash /arrow/cpp/src/arrow/compute/exec/exec_plan.h:427: error: The following parameter of arrow::compute::DeclarationToTable(Declaration declaration, bool use_threads=true, MemoryP

[GitHub] [arrow] vibhatha commented on issue #33649: [C++] Fix documentation on Newly Added Methods on ExecPlan

2023-01-12 Thread GitBox
vibhatha commented on issue #33649: URL: https://github.com/apache/arrow/issues/33649#issuecomment-1381355127 take

[GitHub] [arrow] heronshoes commented on issue #33289: [Ruby] Arrow::Table#join returns duplicated key columns

2023-01-12 Thread GitBox
heronshoes commented on issue #33289: URL: https://github.com/apache/arrow/issues/33289#issuecomment-1381344025 Fixed in #15088 and #15287

[GitHub] [arrow] mapleFU commented on issue #15173: [Parquet][C++] ByteStreamSplitDecoder broken in presence of nulls

2023-01-12 Thread GitBox
mapleFU commented on issue #15173: URL: https://github.com/apache/arrow/issues/15173#issuecomment-1381335774 I reproduced another problem with the code below: ```c++ void CheckRoundtripSpaced(const uint8_t* valid_bits, int64_t valid_bits_offset) ove

[GitHub] [arrow] mapleFU commented on issue #15173: [Parquet][C++] ByteStreamSplitDecoder broken in presence of nulls

2023-01-12 Thread GitBox
mapleFU commented on issue #15173: URL: https://github.com/apache/arrow/issues/15173#issuecomment-1381335403 @pitrou @wjones127 What do you think of this?

[GitHub] [arrow] VirendraYadav1234 commented on issue #14939: [C++] Support Table lookups in FieldRef and FieldPath

2023-01-12 Thread GitBox
VirendraYadav1234 commented on issue #14939: URL: https://github.com/apache/arrow/issues/14939#issuecomment-1381335027 I want to work on this issue; please assign it to me.

[GitHub] [arrow] westonpace commented on pull request #33648: GH-33640: [C++] Add backpressure to asof join node

2023-01-12 Thread GitBox
westonpace commented on PR #33648: URL: https://github.com/apache/arrow/pull/33648#issuecomment-1381331162 CC @rtpsw can you please review

[GitHub] [arrow] westonpace commented on issue #33640: [C++] as-of-join backpressure for large sources

2023-01-12 Thread GitBox
westonpace commented on issue #33640: URL: https://github.com/apache/arrow/issues/33640#issuecomment-1381330671 Turns out we hadn't added https://github.com/westonpace/arrow/commit/45791de8311b0c2e2525e72f4c4746cc3b4364e3 anyways. So I've combined both "AsofJoin backpressure" and "backpres

[GitHub] [arrow] github-actions[bot] commented on pull request #33648: GH-33640: [C++] Add backpressure to asof join node

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33648: URL: https://github.com/apache/arrow/pull/33648#issuecomment-1381328910 :warning: GitHub issue #33640 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] github-actions[bot] commented on pull request #33648: GH-33640: [C++] Add backpressure to asof join node

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33648: URL: https://github.com/apache/arrow/pull/33648#issuecomment-1381328878 * Closes: #33640

[GitHub] [arrow] westonpace opened a new pull request, #33648: GH-33640: [C++] Add backpressure to asof join node

2023-01-12 Thread GitBox
westonpace opened a new pull request, #33648: URL: https://github.com/apache/arrow/pull/33648 If any input starts to accumulate too many batches then we ask that input to slow down.
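The backpressure rule described in this PR can be sketched generically. The sketch below is not the Arrow C++ asof join node: `InputQueue`, `Signal`, and the two thresholds are hypothetical names chosen for illustration. It assumes each input keeps its own queue of batches, asks its source to pause once the queue grows past a high-water mark, and asks it to resume once the queue drains below a low-water mark.

```rust
use std::collections::VecDeque;

/// Request sent back to the producer of one input (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Signal {
    Pause,
    Resume,
}

/// Per-input batch queue with pause/resume thresholds (hypothetical type).
pub struct InputQueue<T> {
    batches: VecDeque<T>,
    pause_above: usize,
    resume_below: usize,
    paused: bool,
}

impl<T> InputQueue<T> {
    pub fn new(pause_above: usize, resume_below: usize) -> Self {
        assert!(resume_below <= pause_above, "thresholds out of order");
        Self { batches: VecDeque::new(), pause_above, resume_below, paused: false }
    }

    /// Queue a batch; returns `Pause` the first time the queue exceeds
    /// the high-water mark.
    pub fn push(&mut self, batch: T) -> Option<Signal> {
        self.batches.push_back(batch);
        if !self.paused && self.batches.len() > self.pause_above {
            self.paused = true;
            return Some(Signal::Pause);
        }
        None
    }

    /// Take the oldest batch; returns `Resume` once a paused queue has
    /// drained below the low-water mark.
    pub fn pop(&mut self) -> (Option<T>, Option<Signal>) {
        let batch = self.batches.pop_front();
        if self.paused && self.batches.len() < self.resume_below {
            self.paused = false;
            return (batch, Some(Signal::Resume));
        }
        (batch, None)
    }
}
```

Using two thresholds rather than one is a design choice in this sketch: it avoids toggling between pause and resume on every batch when the queue length hovers around a single limit.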

[GitHub] [arrow] saurabhacellere closed pull request #14733: Update message.cpp

2023-01-12 Thread GitBox
saurabhacellere closed pull request #14733: Update message.cpp URL: https://github.com/apache/arrow/pull/14733

[GitHub] [arrow] js8544 commented on pull request #33623: GH-33212: [C++][Python] Add use_threads to pyarrow.substrait.run_query

2023-01-12 Thread GitBox
js8544 commented on PR #33623: URL: https://github.com/apache/arrow/pull/33623#issuecomment-1381326714 https://github.com/apache/arrow/actions/runs/3907791966/jobs/6677356655 @westonpace, I think this CI failure is caused by this pr. Could you please have a look?

[GitHub] [arrow] mapleFU commented on issue #15173: [Parquet][C++] ByteStreamSplitDecoder broken in presence of nulls

2023-01-12 Thread GitBox
mapleFU commented on issue #15173: URL: https://github.com/apache/arrow/issues/15173#issuecomment-1381287613 ```c++ // The total number of values stored in the data page. This is the maximum of // the number of encoded definition levels or encoded values. For // non-repeated,

[GitHub] [arrow-datafusion] waynexia commented on issue #4880: Add datafusion-substrait to workspace

2023-01-12 Thread GitBox
waynexia commented on issue #4880: URL: https://github.com/apache/arrow-datafusion/issues/4880#issuecomment-1381278545 I noticed that commits in `datafusion-contrib/datafusion-substrait` after Dec 7, 2022 are not included. How do we handle them? Not sure if upgrading to `substrait v0.3` ca

[GitHub] [arrow] ursabot commented on pull request #15294: MINOR: [C++] Optimize Gandiva log function

2023-01-12 Thread GitBox
ursabot commented on PR #15294: URL: https://github.com/apache/arrow/pull/15294#issuecomment-1381278233 Benchmark runs are scheduled for baseline = 48da0dfb6c0425646f6043afc41a2515e93a4ffb and contender = 252e1e04d1dbed684efb11f550a6b4c8a9603d45. 252e1e04d1dbed684efb11f550a6b4c8a9603d45 is

[GitHub] [arrow-datafusion] waynexia commented on a diff in pull request #4879: Update datafusion-substrait crate to build against repo version of DataFusion

2023-01-12 Thread GitBox
waynexia commented on code in PR #4879: URL: https://github.com/apache/arrow-datafusion/pull/4879#discussion_r1068886119 ## datafusion/substrait/tests/serialize.rs: ## @@ -33,15 +33,15 @@ mod tests { let sql = "SELECT a, b FROM data"; // Test reference

[GitHub] [arrow] youngfn commented on issue #33627: [C++][HDFS] Can't get performance improve when increase the thread number of IO thread pool

2023-01-12 Thread GitBox
youngfn commented on issue #33627: URL: https://github.com/apache/arrow/issues/33627#issuecomment-1381268030 > Ah, that number 5 jogged my memory. Perhaps you need to increase fragment_readahead in ScanOptions? I've changed fragment_readahead to 10(ARROW_IO_THREADS=40) and 40(ARROW_I

[GitHub] [arrow] mapleFU commented on a diff in pull request #15182: GH-15074: [Parquet][C++] change 16-bit page_ordinal to 32-bit

2023-01-12 Thread GitBox
mapleFU commented on code in PR #15182: URL: https://github.com/apache/arrow/pull/15182#discussion_r1068863630 ## cpp/src/parquet/column_reader.cc: ## @@ -329,17 +329,16 @@ void SerializedPageReader::InitDecryption() { } void SerializedPageReader::UpdateDecryption(const std:

[GitHub] [arrow] mapleFU commented on a diff in pull request #15182: GH-15074: [Parquet][C++] change 16-bit page_ordinal to 32-bit

2023-01-12 Thread GitBox
mapleFU commented on code in PR #15182: URL: https://github.com/apache/arrow/pull/15182#discussion_r1068861869 ## cpp/src/parquet/column_reader.cc: ## @@ -329,17 +329,16 @@ void SerializedPageReader::InitDecryption() { } void SerializedPageReader::UpdateDecryption(const std:

[GitHub] [arrow] anjakefala commented on a diff in pull request #14293: ARROW-17799: [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to Parquet writer

2023-01-12 Thread GitBox
anjakefala commented on code in PR #14293: URL: https://github.com/apache/arrow/pull/14293#discussion_r1068857310 ## cpp/src/parquet/encoding.cc: ## @@ -2537,6 +2537,114 @@ class DeltaBitPackDecoder : public DecoderImpl, virtual public TypedDecoder { + public: + using T = typ

[GitHub] [arrow] js8544 opened a new pull request, #33647: MINOR: Remove unnecessary code in MultipathLevelBuilder::Write

2023-01-12 Thread GitBox
js8544 opened a new pull request, #33647: URL: https://github.com/apache/arrow/pull/33647 The PathBuilder run is unnecessary as `MultipathLevelBuilder::Make` on line 896 does the visitation already.

[GitHub] [arrow] youngfn commented on issue #33627: [C++][HDFS] Can't get performance improve when increase the thread number of IO thread pool

2023-01-12 Thread GitBox
youngfn commented on issue #33627: URL: https://github.com/apache/arrow/issues/33627#issuecomment-1381226315 > thx~~~ I will test it right now. Btw what about batch_size and batch_readahead, do they affect the thread number? Also, Do you think the HDFS connection number wi

[GitHub] [arrow] js8544 commented on pull request #33608: GH-33607: [C++] Support optional additional arguments for inline visit functions

2023-01-12 Thread GitBox
js8544 commented on PR #33608: URL: https://github.com/apache/arrow/pull/33608#issuecomment-1381221712 > Ok, is it possible to exploit this new possibility in some select places? It will validate that it works and also show the benefits. > > (you can of course grep through the codebas

[GitHub] [arrow] js8544 commented on a diff in pull request #33608: GH-33607: [C++] Support optional additional arguments for inline visit functions

2023-01-12 Thread GitBox
js8544 commented on code in PR #33608: URL: https://github.com/apache/arrow/pull/33608#discussion_r1068845438 ## cpp/src/arrow/visit_type_inline.h: ## @@ -71,31 +77,33 @@ inline Status VisitTypeInline(const DataType& type, VISITOR* visitor) { /// /// The intent is for this to

[GitHub] [arrow] westonpace merged pull request #33644: GH-33643: [C++] Remove implicit = capture of this which is not valid in c++20

2023-01-12 Thread GitBox
westonpace merged PR #33644: URL: https://github.com/apache/arrow/pull/33644

[GitHub] [arrow-rs] JayjeetAtGithub commented on pull request #3514: Use array_value_to_string in arrow-csv

2023-01-12 Thread GitBox
JayjeetAtGithub commented on PR #3514: URL: https://github.com/apache/arrow-rs/pull/3514#issuecomment-1381209988 Thanks for the idea @tustvold @alamb. I will explore how to do that.

[GitHub] [arrow] westonpace commented on issue #33627: [C++][HDFS] Can't get performance improve when increase the thread number of IO thread pool

2023-01-12 Thread GitBox
westonpace commented on issue #33627: URL: https://github.com/apache/arrow/issues/33627#issuecomment-1381209310 Ah, that number 5 jogged my memory. Perhaps you need to increase fragment_readahead in ScanOptions?

[GitHub] [arrow-datafusion-python] jdye64 opened a new pull request, #124: Introduce conda directory containing datafusion-dev.yaml conda enviro…

2023-01-12 Thread GitBox
jdye64 opened a new pull request, #124: URL: https://github.com/apache/arrow-datafusion-python/pull/124 …nment file, conda build files for building and publishing packages to anaconda.org as well as supporting documentation updates to README.md # Which issue does this PR close? Clo

[GitHub] [arrow] youngfn commented on issue #33627: [C++][HDFS] Can't get performance improve when increase the thread number of IO thread pool

2023-01-12 Thread GitBox
youngfn commented on issue #33627: URL: https://github.com/apache/arrow/issues/33627#issuecomment-1381201481 > Do you have multiple CSV files or just one CSV file? -- Yes. My test table has 196 files; even though I've set ARROW_IO_THREADS to 200, it just runs with 5 reading t

[GitHub] [arrow-datafusion-python] jdye64 opened a new issue, #123: [DISCUSS] arrow-datafusion-python versioning

2023-01-12 Thread GitBox
jdye64 opened a new issue, #123: URL: https://github.com/apache/arrow-datafusion-python/issues/123 Apologies if this isn't the correct venue for discussing this but I want to discuss versioning for `arrow-datafusion-python`. With the `arrow-datafusion` 16.0.0 release in motion I was thinkin

[GitHub] [arrow] amoeba commented on issue #33646: [Python][Doc] Enable remainder of discussed numpydoc checks

2023-01-12 Thread GitBox
amoeba commented on issue #33646: URL: https://github.com/apache/arrow/issues/33646#issuecomment-1381196384 I've started working on PR06 and ran into one blocker with how numpydoc works. I filed an issue at https://github.com/numpy/numpydoc/issues/446.

[GitHub] [arrow] amoeba commented on issue #33646: [Python][Doc] Enable remainder of discussed numpydoc checks

2023-01-12 Thread GitBox
amoeba commented on issue #33646: URL: https://github.com/apache/arrow/issues/33646#issuecomment-1381195995 assign

[GitHub] [arrow] h-vetinari commented on issue #20272: [C++] Bump version of bundled AWS SDK

2023-01-12 Thread GitBox
h-vetinari commented on issue #20272: URL: https://github.com/apache/arrow/issues/20272#issuecomment-1381194930 > Yes, it should be. Unfortunately, it might require bumping all AWS-related dependencies, or perhaps even adding some of them. cc @kou AFAIU, the aws-sdk-cpp bundles everyt

[GitHub] [arrow] westonpace commented on pull request #13751: ARROW-17022: [C++] Add unit tests and documentation for swiss-join

2023-01-12 Thread GitBox
westonpace commented on PR #13751: URL: https://github.com/apache/arrow/pull/13751#issuecomment-1381171673 Closing my stale PRs

[GitHub] [arrow] westonpace closed pull request #13751: ARROW-17022: [C++] Add unit tests and documentation for swiss-join

2023-01-12 Thread GitBox
westonpace closed pull request #13751: ARROW-17022: [C++] Add unit tests and documentation for swiss-join URL: https://github.com/apache/arrow/pull/13751

[GitHub] [arrow] westonpace commented on pull request #13368: ARROW-16811: [C++] Remove default exec context from Expression::Bind

2023-01-12 Thread GitBox
westonpace commented on PR #13368: URL: https://github.com/apache/arrow/pull/13368#issuecomment-1381171497 Closing my stale PRs

[GitHub] [arrow] westonpace closed pull request #13368: ARROW-16811: [C++] Remove default exec context from Expression::Bind

2023-01-12 Thread GitBox
westonpace closed pull request #13368: ARROW-16811: [C++] Remove default exec context from Expression::Bind URL: https://github.com/apache/arrow/pull/13368

[GitHub] [arrow] westonpace closed pull request #12871: ARROW-16178: [C++] Add a ThreadLocalState concept built on thread local

2023-01-12 Thread GitBox
westonpace closed pull request #12871: ARROW-16178: [C++] Add a ThreadLocalState concept built on thread local URL: https://github.com/apache/arrow/pull/12871

[GitHub] [arrow] westonpace commented on pull request #12871: ARROW-16178: [C++] Add a ThreadLocalState concept built on thread local

2023-01-12 Thread GitBox
westonpace commented on PR #12871: URL: https://github.com/apache/arrow/pull/12871#issuecomment-1381171325 Closing my stale PRs

[GitHub] [arrow] github-actions[bot] commented on pull request #33644: GH-33643: [C++] Remove implicit = capture of this which is not valid in c++20

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33644: URL: https://github.com/apache/arrow/pull/33644#issuecomment-1381171217 Revision: 964b96a4a591327a2f119781ba58cf38f996d46f Submitted crossbow builds: [ursacomputing/crossbow @ actions-1937fb2067](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] ursabot commented on pull request #15182: GH-15074: [Parquet][C++] change 16-bit page_ordinal to 32-bit

2023-01-12 Thread GitBox
ursabot commented on PR #15182: URL: https://github.com/apache/arrow/pull/15182#issuecomment-1381171100 Benchmark runs are scheduled for baseline = fc53ff8c5e2797c1a5a99db7f3aece80dd0b9f3e and contender = 48da0dfb6c0425646f6043afc41a2515e93a4ffb. 48da0dfb6c0425646f6043afc41a2515e93a4ffb is

[GitHub] [arrow] westonpace closed pull request #12586: ARROW-15877: [C++] Add a C++ query testing tool

2023-01-12 Thread GitBox
westonpace closed pull request #12586: ARROW-15877: [C++] Add a C++ query testing tool URL: https://github.com/apache/arrow/pull/12586

[GitHub] [arrow] westonpace merged pull request #33623: GH-33212: [C++][Python] Add use_threads to pyarrow.substrait.run_query

2023-01-12 Thread GitBox
westonpace merged PR #33623: URL: https://github.com/apache/arrow/pull/33623

[GitHub] [arrow] westonpace commented on pull request #33644: GH-33643: [C++] Remove implicit = capture of this which is not valid in c++20

2023-01-12 Thread GitBox
westonpace commented on PR #33644: URL: https://github.com/apache/arrow/pull/33644#issuecomment-1381169604 @github-actions crossbow submit test-ubuntu-20.04-cpp-20

[GitHub] [arrow] github-actions[bot] commented on pull request #33645: GH-33000: [doc] Various documentation syntax fixes

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33645: URL: https://github.com/apache/arrow/pull/33645#issuecomment-1381168298 :warning: GitHub issue #33000 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] github-actions[bot] commented on pull request #33645: GH-33000: [doc] Various documentation syntax fixes

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33645: URL: https://github.com/apache/arrow/pull/33645#issuecomment-1381168278 * Closes: #33000

[GitHub] [arrow-datafusion] askoa opened a new issue, #4887: The rule `common_sub_expression_eliminate` removes non-duplicate expressions

2023-01-12 Thread GitBox
askoa opened a new issue, #4887: URL: https://github.com/apache/arrow-datafusion/issues/4887 **Describe the bug** While analyzing the Logical Plan for TPCH-DS query 10 to find the cause of the issue #4795, I found another issue with the Logical Plan generated. I noticed that the below

[GitHub] [arrow] github-actions[bot] commented on pull request #33645: GH-33000: [doc] Various documentation syntax fixes

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33645: URL: https://github.com/apache/arrow/pull/33645#issuecomment-1381168069 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue

[GitHub] [arrow] lidavidm commented on issue #33633: [C++] Separate Protobuf dependencies into a `libarrow_proto.so`

2023-01-12 Thread GitBox
lidavidm commented on issue #33633: URL: https://github.com/apache/arrow/issues/33633#issuecomment-1381161497 That said, I can investigate more - just not in time for the release.

[GitHub] [arrow] lidavidm commented on issue #33633: [C++] Separate Protobuf dependencies into a `libarrow_proto.so`

2023-01-12 Thread GitBox
lidavidm commented on issue #33633: URL: https://github.com/apache/arrow/issues/33633#issuecomment-1381161354 Ah, thanks for reminding me. That's an option, I believe, but what I could find seemed waffly about whether it would actually solve this. (Also: the conflict is in a core Protobuf t

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4818: Update to arrow `30.1.0`

2023-01-12 Thread GitBox
tustvold commented on code in PR #4818: URL: https://github.com/apache/arrow-datafusion/pull/4818#discussion_r1068802571 ## datafusion/core/src/physical_plan/file_format/csv.rs: ## @@ -444,24 +444,11 @@ mod tests { assert_eq!(14, csv.schema().fields().len());

[GitHub] [arrow] github-actions[bot] commented on pull request #33644: GH-33643: [C++] Remove implicit = capture of this which is not valid in c++20

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33644: URL: https://github.com/apache/arrow/pull/33644#issuecomment-1381125288 ``` Unable to match any tasks for `ubuntu-20.04-cpp-20` The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3907058236 ```

[GitHub] [arrow] westonpace commented on pull request #33644: GH-33643: [C++] Remove implicit = capture of this which is not valid in c++20

2023-01-12 Thread GitBox
westonpace commented on PR #33644: URL: https://github.com/apache/arrow/pull/33644#issuecomment-1381124123 @github-actions crossbow submit ubuntu-20.04-cpp-20

[GitHub] [arrow] github-actions[bot] commented on pull request #33644: GH-33643: [C++] Remove implicit = capture of this which is not valid in c++20

2023-01-12 Thread GitBox
github-actions[bot] commented on PR #33644: URL: https://github.com/apache/arrow/pull/33644#issuecomment-1381120065 ``` No such option: -e The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3907028385 ```

[GitHub] [arrow] westonpace commented on pull request #33644: GH-33643: [C++] Remove implicit = capture of this which is not valid in c++20

2023-01-12 Thread GitBox
westonpace commented on PR #33644: URL: https://github.com/apache/arrow/pull/33644#issuecomment-1381119413 @github-actions crossbow submit -e CMAKE_CXX_STANDARD=20 ubuntu-cpp

[GitHub] [arrow] westonpace commented on issue #33638: R package has pragmas that are supressing diagnostics

2023-01-12 Thread GitBox
westonpace commented on issue #33638: URL: https://github.com/apache/arrow/issues/33638#issuecomment-138412 So the problem stems from the case when `use_threads` is false. `ExecPlan::Make(ExecContext*)` changed to `ExecPlan::Make(ExecContext)` but the behavior subtly changed. In

[GitHub] [arrow-ballista] andygrove opened a new pull request, #593: Python: add method to get explain output as a string

2023-01-12 Thread GitBox
andygrove opened a new pull request, #593: URL: https://github.com/apache/arrow-ballista/pull/593 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

[GitHub] [arrow] minyoung commented on pull request #14989: ARROW-18438: [Go][Parquet] Panic in bitmap writer

2023-01-12 Thread GitBox
minyoung commented on PR #14989: URL: https://github.com/apache/arrow/pull/14989#issuecomment-1381106835 @zeroshade the `+1` is almost certainly a hack. I tried digging into this, but am not sure what the right solution is. Disclaimer, my investigation may or may not be correct since I'm

[GitHub] [arrow] westonpace commented on issue #33633: [C++] Separate Protobuf dependencies into a `libarrow_proto.so`

2023-01-12 Thread GitBox
westonpace commented on issue #33633: URL: https://github.com/apache/arrow/issues/33633#issuecomment-1381101118 Configuring the linker to hide the symbols would be my preference I think. Pity this is all so complicated. Did we give up on the "protobuf lite" option?

[GitHub] [arrow-datafusion] andygrove commented on pull request #4879: Update datafusion-substrait crate to build against repo version of DataFusion

2023-01-12 Thread GitBox
andygrove commented on PR #4879: URL: https://github.com/apache/arrow-datafusion/pull/4879#issuecomment-1381096090 @waynexia @nseekhao fyi

[GitHub] [arrow] emkornfield merged pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2023-01-12 Thread GitBox
emkornfield merged PR #14603: URL: https://github.com/apache/arrow/pull/14603

[GitHub] [arrow] emkornfield commented on pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2023-01-12 Thread GitBox
emkornfield commented on PR #14603: URL: https://github.com/apache/arrow/pull/14603#issuecomment-1381086143 Travis CI failures look infrastructure related. going to merge.

[GitHub] [arrow] wjones127 commented on pull request #14743: replace sample_frac with slice_sample

2023-01-12 Thread GitBox
wjones127 commented on PR #14743: URL: https://github.com/apache/arrow/pull/14743#issuecomment-1381085579 If we can use `slice_sample()` directly on Arrow queries, the example is still useful since we use it later to call `predict()` on each batch with the model we fit from the sample. I'd
