[GitHub] [arrow] moria97 commented on issue #14606: [C++] memory consumption question

2022-11-15 Thread GitBox
moria97 commented on issue #14606: URL: https://github.com/apache/arrow/issues/14606#issuecomment-1316549270 @westonpace Thanks for the explanation! Yes, I'm using write_table to write ndjson input to parquet file. I'm using PInvoke from dotnet and pass the original multiple line json fi

[GitHub] [arrow-rs] viirya commented on pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#issuecomment-1316547100 > I think a test of the null behaviour would be good, along with potentially one of the slicing behaviour (which I think will currently fail, which is fine). This has a null test no

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023629988 ## arrow-array/src/builder/boolean_buffer_builder.rs: ## @@ -33,6 +33,10 @@ impl BooleanBufferBuilder { Self { buffer, len: 0 } } +pub fn new_from_b

[GitHub] [arrow-datafusion] timvw commented on a diff in pull request #4227: refactor how we create listing tables

2022-11-15 Thread GitBox
timvw commented on code in PR #4227: URL: https://github.com/apache/arrow-datafusion/pull/4227#discussion_r1023566499 ## datafusion/core/src/execution/runtime_env.rs: ## @@ -152,7 +153,16 @@ pub struct RuntimeConfig { impl RuntimeConfig { /// New with default values p

[GitHub] [arrow] yaqi-zhao commented on pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-11-15 Thread GitBox
yaqi-zhao commented on PR #14585: URL: https://github.com/apache/arrow/pull/14585#issuecomment-1316519433 @emkornfield hi, emkornfield. For the 2 questions you proposed last week, is the solution on comment acceptable? > https://github.com/apache/arrow/pull/14585#issuecomment-13052

[GitHub] [arrow] djouallah commented on issue #14619: arrow dataset: how to use date.year and date.month as partitioning

2022-11-15 Thread GitBox
djouallah commented on issue #14619: URL: https://github.com/apache/arrow/issues/14619#issuecomment-1316512016 @westonpace I think we are talking about two different things, ideally arrow should be able to partition by a field date, but instead of generating a file by day, it will generate a
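The request in issue #14619 (partitioning on the year and month of a date field rather than on the full date) can be illustrated with a small standard-library sketch. This is not Arrow's API: the function name `year_month_partitions`, the `date` key, and the hive-style `year=.../month=...` directory naming are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date


def year_month_partitions(rows):
    """Group rows by the year and month of their 'date' value.

    Returns a dict mapping a hive-style partition path (an assumed
    naming scheme, not taken from the thread) to the rows in it.
    """
    groups = defaultdict(list)
    for row in rows:
        d = row["date"]
        # One partition per (year, month) instead of one per day.
        groups[f"year={d.year}/month={d.month}"].append(row)
    return dict(groups)


rows = [
    {"date": date(2021, 9, 1), "value": 1},
    {"date": date(2021, 9, 2), "value": 2},
    {"date": date(2021, 10, 1), "value": 3},
]
partitions = year_month_partitions(rows)
for path, members in partitions.items():
    print(path, len(members))
```

Both September rows land in the same partition, so the number of output groups grows with the number of months rather than the number of days.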

[GitHub] [arrow-datafusion] HaoYang670 commented on a diff in pull request #4234: [WIP] Unfold the `round` function in logical plan if it has 2 arguments

2022-11-15 Thread GitBox
HaoYang670 commented on code in PR #4234: URL: https://github.com/apache/arrow-datafusion/pull/4234#discussion_r1023593430 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -728,6 +729,30 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a, S> {

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
tustvold commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023593199 ## arrow-buffer/src/buffer/immutable.rs: ## @@ -227,6 +227,28 @@ impl Buffer { pub fn count_set_bits_offset(&self, offset: usize, len: usize) -> usize {

[GitHub] [arrow] github-actions[bot] commented on pull request #14655: Update FlightSql.rst

2022-11-15 Thread GitBox
github-actions[bot] commented on PR #14655: URL: https://github.com/apache/arrow/pull/14655#issuecomment-1316505489 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue

[GitHub] [arrow] thatstatsguy opened a new pull request, #14655: Update FlightSql.rst

2022-11-15 Thread GitBox
thatstatsguy opened a new pull request, #14655: URL: https://github.com/apache/arrow/pull/14655 Add Go example to documentation -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
Ted-Jiang commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023591486 ## arrow-buffer/src/buffer/immutable.rs: ## @@ -227,6 +227,28 @@ impl Buffer { pub fn count_set_bits_offset(&self, offset: usize, len: usize) -> usize {

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023587474 ## arrow-array/src/array/primitive_array.rs: ## @@ -489,6 +519,43 @@ impl PrimitiveArray { ) } } + +/// Returns `PrimitiveBuilder` of thi

[GitHub] [arrow] ursabot commented on pull request #14219: ARROW-17825: [C++] Allow the possibility to write several tables in ORCFileWriter

2022-11-15 Thread GitBox
ursabot commented on PR #14219: URL: https://github.com/apache/arrow/pull/14219#issuecomment-1316494425 Benchmark runs are scheduled for baseline = 4d8c21bd303833c124f9d5801755c953c6c3260e and contender = 77f099fb5c324afc8ee38cda4976bf20a08e7a4a. 77f099fb5c324afc8ee38cda4976bf20a08e7a4a is

[GitHub] [arrow-datafusion] HaoYang670 commented on issue #4236: `cargo test` reports errors on the master branch.

2022-11-15 Thread GitBox
HaoYang670 commented on issue #4236: URL: https://github.com/apache/arrow-datafusion/issues/4236#issuecomment-1316488590 @alamb, could you please take a look?

[GitHub] [arrow-datafusion] HaoYang670 commented on issue #4236: `cargo test` reports errors on the master branch.

2022-11-15 Thread GitBox
HaoYang670 commented on issue #4236: URL: https://github.com/apache/arrow-datafusion/issues/4236#issuecomment-1316485735 A weird thing is that the `cargo test` doesn't fail.

[GitHub] [arrow-datafusion] HaoYang670 opened a new issue, #4236: `cargo test` reports errors on the master branch.

2022-11-15 Thread GitBox
HaoYang670 opened a new issue, #4236: URL: https://github.com/apache/arrow-datafusion/issues/4236 **Describe the bug** A clear and concise description of what the bug is. ``` [2022-11-16T07:11:47Z ERROR datafusion::physical_plan::file_format::parquet::page_filter] Error evaluating

[GitHub] [arrow] vibhatha commented on pull request #14641: ARROW-15716: [Dataset][Python] Parse a list of fragment paths to gather filters

2022-11-15 Thread GitBox
vibhatha commented on PR #14641: URL: https://github.com/apache/arrow/pull/14641#issuecomment-1316485067 Should we close this PR? I also think handling things externally using the existing APIs is fine for the job.

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4235: remove duplicate or redundant code

2022-11-15 Thread GitBox
jackwener commented on code in PR #4235: URL: https://github.com/apache/arrow-datafusion/pull/4235#discussion_r1023578666 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1001,11 +1001,6 @@ mod tests { let expr = call_fn("to_timestamp", vec![c

[GitHub] [arrow-datafusion] jackwener opened a new pull request, #4235: remove duplicate or redundant code

2022-11-15 Thread GitBox
jackwener opened a new pull request, #4235: URL: https://github.com/apache/arrow-datafusion/pull/4235 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are these changes tested?

[GitHub] [arrow-datafusion] HaoYang670 opened a new pull request, #4234: [WIP] Unfold the `round` function in logical plan if it has 2 arguments

2022-11-15 Thread GitBox
HaoYang670 opened a new pull request, #4234: URL: https://github.com/apache/arrow-datafusion/pull/4234 Signed-off-by: remzi <1371656737...@gmail.com> # Which issue does this PR close? Closes #4191. # Rationale for this change # What changes are incl

[GitHub] [arrow-datafusion] waitingkuo commented on issue #4220: Bug displaying fractional seconds in `IntervalMonthDayNano`

2022-11-15 Thread GitBox
waitingkuo commented on issue #4220: URL: https://github.com/apache/arrow-datafusion/issues/4220#issuecomment-1316477074 @alamb just found another bug while displaying fraction seconds ```bash ❯ select interval '-0.0001 second'; +-

[GitHub] [arrow-datafusion] timvw commented on a diff in pull request #4227: refactor how we create listing tables

2022-11-15 Thread GitBox
timvw commented on code in PR #4227: URL: https://github.com/apache/arrow-datafusion/pull/4227#discussion_r1023570338 ## datafusion/core/src/execution/runtime_env.rs: ## @@ -152,7 +153,16 @@ pub struct RuntimeConfig { impl RuntimeConfig { /// New with default values p

[GitHub] [arrow] westonpace commented on pull request #14641: ARROW-15716: [Dataset][Python] Parse a list of fragment paths to gather filters

2022-11-15 Thread GitBox
westonpace commented on PR #14641: URL: https://github.com/apache/arrow/pull/14641#issuecomment-1316471121 > I am actually not fully sure if the code to evaluate pushdown filters would actually understand an isin kernel. I think this is handled in SimplifyWithGuarantee: I'm pretty su

[GitHub] [arrow] vibhatha commented on pull request #14373: ARROW-17292: [C++] Segmentation fault on arrow-compute-hash-join-node-test on macos nightlies

2022-11-15 Thread GitBox
vibhatha commented on PR #14373: URL: https://github.com/apache/arrow/pull/14373#issuecomment-1316463297 > https://issues.apache.org/jira/browse/ARROW-18018 Yes probably, you're correct here. Let's close this one.

[GitHub] [arrow] kou commented on pull request #14576: ARROW-18232: [Release][macOS][wheel] Disable GCS/S3 tests on not available env

2022-11-15 Thread GitBox
kou commented on PR #14576: URL: https://github.com/apache/arrow/pull/14576#issuecomment-1316462157 This is needless because ARROW-17487 / #14499 removed this block.

[GitHub] [arrow] kou closed pull request #14576: ARROW-18232: [Release][macOS][wheel] Disable GCS/S3 tests on not available env

2022-11-15 Thread GitBox
kou closed pull request #14576: ARROW-18232: [Release][macOS][wheel] Disable GCS/S3 tests on not available env URL: https://github.com/apache/arrow/pull/14576

[GitHub] [arrow] westonpace commented on pull request #14373: ARROW-17292: [C++] Segmentation fault on arrow-compute-hash-join-node-test on macos nightlies

2022-11-15 Thread GitBox
westonpace commented on PR #14373: URL: https://github.com/apache/arrow/pull/14373#issuecomment-1316462232 Are we still seeing the issue? I was hopeful that https://issues.apache.org/jira/browse/ARROW-18018 was the root cause for the macos nightlies failures. I imagine we've had enough r

[GitHub] [arrow-datafusion] timvw commented on pull request #4227: refactor how we create listing tables

2022-11-15 Thread GitBox
timvw commented on PR #4227: URL: https://github.com/apache/arrow-datafusion/pull/4227#issuecomment-1316461984 It is still possible to register other/unknown tableproviderfactories. This happens in eg: test sql::create_drop::create_external_table_with_ddl. There is a slight change in

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
tustvold commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023563726 ## arrow-array/src/builder/primitive_builder.rs: ## @@ -114,6 +115,26 @@ impl PrimitiveBuilder { } } +pub fn new_from_buffer( +values_buff

[GitHub] [arrow] kou merged pull request #14577: ARROW-18233: [Release][JS] don't install yarn to system

2022-11-15 Thread GitBox
kou merged PR #14577: URL: https://github.com/apache/arrow/pull/14577

[GitHub] [arrow] westonpace commented on issue #14619: arrow dataset: how to use date.year and date.month as partitioning

2022-11-15 Thread GitBox
westonpace commented on issue #14619: URL: https://github.com/apache/arrow/issues/14619#issuecomment-1316460668 Are you using pyarrow? Have you looked at https://arrow.apache.org/docs/python/dataset.html#writing-partitioned-data ? > And when do filtering like date > "2021-09-02" and

[GitHub] [arrow] kou commented on pull request #14577: ARROW-18233: [Release][JS] don't install yarn to system

2022-11-15 Thread GitBox
kou commented on PR #14577: URL: https://github.com/apache/arrow/pull/14577#issuecomment-1316460495 +1

[GitHub] [arrow] kou merged pull request #14597: ARROW-18259: [C++][CMake] Add support for system Thrift CMake package

2022-11-15 Thread GitBox
kou merged PR #14597: URL: https://github.com/apache/arrow/pull/14597

[GitHub] [arrow] kou commented on pull request #14597: ARROW-18259: [C++][CMake] Add support for system Thrift CMake package

2022-11-15 Thread GitBox
kou commented on PR #14597: URL: https://github.com/apache/arrow/pull/14597#issuecomment-1316458908 +1

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
tustvold commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023559246 ## arrow-array/src/array/primitive_array.rs: ## @@ -489,6 +519,43 @@ impl PrimitiveArray { ) } } + +/// Returns `PrimitiveBuilder` of t

[GitHub] [arrow] kou merged pull request #14609: ARROW-18287: [C++][CMake] Add support for Brotli/utf8proc provided by vcpkg

2022-11-15 Thread GitBox
kou merged PR #14609: URL: https://github.com/apache/arrow/pull/14609

[GitHub] [arrow] kou commented on pull request #14609: ARROW-18287: [C++][CMake] Add support for Brotli/utf8proc provided by vcpkg

2022-11-15 Thread GitBox
kou commented on PR #14609: URL: https://github.com/apache/arrow/pull/14609#issuecomment-1316457922 +1

[GitHub] [arrow] westonpace commented on a diff in pull request #14527: ARROW-15641: [C++][Python] UDF Aggregate Function Implementation

2022-11-15 Thread GitBox
westonpace commented on code in PR #14527: URL: https://github.com/apache/arrow/pull/14527#discussion_r1023541046 ## python/pyarrow/tests/test_udf.py: ## @@ -504,3 +504,132 @@ def test_input_lifetime(unary_func_fixture): # Calling a UDF should not have kept `v` alive longer

[GitHub] [arrow] kou merged pull request #14610: ARROW-18289: [Release][vcpkg] Add a script to update vcpkg's arrow port

2022-11-15 Thread GitBox
kou merged PR #14610: URL: https://github.com/apache/arrow/pull/14610

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023553811 ## arrow-array/src/array/primitive_array.rs: ## @@ -397,6 +397,61 @@ impl PrimitiveArray { unsafe { build_primitive_array(len, buffer, null_count, null_buffer)

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023553580 ## arrow-buffer/src/buffer/mutable.rs: ## @@ -92,6 +93,23 @@ impl MutableBuffer { } } +/// Allocates a new [MutableBuffer] from given `Bytes`. +

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023552487 ## arrow-array/src/builder/boolean_buffer_builder.rs: ## @@ -33,6 +33,10 @@ impl BooleanBufferBuilder { Self { buffer, len: 0 } } +pub fn new_from_b

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023552250 ## arrow-array/src/array/primitive_array.rs: ## @@ -489,6 +544,42 @@ impl PrimitiveArray { ) } } + +/// Returns `PrimitiveBuilder` of thi

[GitHub] [arrow] vibhatha commented on pull request #14646: ARROW-18269: [C++] Slash character in partition value handling

2022-11-15 Thread GitBox
vibhatha commented on PR #14646: URL: https://github.com/apache/arrow/pull/14646#issuecomment-1316438569 > > Good question. Should we do in a separate PR? > > I think that's fine. I also don't know if we do the URI decoding with directory or filename partitioning. If I recall, URI dec

[GitHub] [arrow] westonpace commented on pull request #14646: ARROW-18269: [C++] Slash character in partition value handling

2022-11-15 Thread GitBox
westonpace commented on PR #14646: URL: https://github.com/apache/arrow/pull/14646#issuecomment-1316418833 > Good question. Should we do in a separate PR? I think that's fine. I also don't know if we do the URI decoding with directory or filename partitioning. If I recall, URI decod

[GitHub] [arrow-datafusion] jackwener opened a new pull request, #4233: add a check to confirm optimizer can keep plan schema immutable.

2022-11-15 Thread GitBox
jackwener opened a new pull request, #4233: URL: https://github.com/apache/arrow-datafusion/pull/4233 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are these changes tested?

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4232: separate `projection` and `alias-projection`

2022-11-15 Thread GitBox
jackwener commented on code in PR #4232: URL: https://github.com/apache/arrow-datafusion/pull/4232#discussion_r1023529656 ## datafusion/core/tests/sql/subqueries.rs: ## @@ -56,13 +56,15 @@ where c_acctbal < ( Inner Join: customer.c_custkey = __sq_2.o_custkey Tabl

[GitHub] [arrow-datafusion] jackwener opened a new pull request, #4232: separate `projection` and `alias-projection`

2022-11-15 Thread GitBox
jackwener opened a new pull request, #4232: URL: https://github.com/apache/arrow-datafusion/pull/4232 # Which issue does this PR close? Part of #3927 #2212 We should separate `projection` and `alias-projection`. In fact, `alias-projection` is `subqueryAlias`, after this j

[GitHub] [arrow] djnavarro commented on a diff in pull request #14514: ARROW-17887: [R][Doc][WIP] Improve readability of the Get Started and README pages

2022-11-15 Thread GitBox
djnavarro commented on code in PR #14514: URL: https://github.com/apache/arrow/pull/14514#discussion_r1023528743 ## r/vignettes/data_wrangling.Rmd: ## @@ -0,0 +1,172 @@ +--- +title: "Data analysis with dplyr syntax" +description: > + Learn how to use the `dplyr` backend supplie

[GitHub] [arrow-rs] tustvold commented on pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
tustvold commented on PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#issuecomment-1316393749 Aah yes, my mistake I thought I had already added that, it seems I didn't get further than adding `as_slice_mut` to BufferBuilder

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023525036 ## arrow-array/src/array/primitive_array.rs: ## @@ -1939,4 +2032,52 @@ mod tests { array.value(4); } + +#[test] +fn test_into_builder() { +

[GitHub] [arrow-rs] viirya commented on pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#issuecomment-1316390606 > This is by design, as it would allow for uninitialised memory. My suggestion is that `into_builder` would create a builder with the same length as the array from which it was created, i.e

[GitHub] [arrow-datafusion] mingmwang commented on issue #4230: HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result

2022-11-15 Thread GitBox
mingmwang commented on issue #4230: URL: https://github.com/apache/arrow-datafusion/issues/4230#issuecomment-1316389109 For DataFusion, a possible fix is to maintain a global `visited_left_side` data structure. And for Ballista, we can not rely on any global structure, because differen

[GitHub] [arrow] vibhatha commented on a diff in pull request #14174: ARROW-17486: [C++] Substrait To Arrow Emit feature testing

2022-11-15 Thread GitBox
vibhatha commented on code in PR #14174: URL: https://github.com/apache/arrow/pull/14174#discussion_r1023522551 ## cpp/src/arrow/engine/substrait/test_util.h: ## @@ -0,0 +1,118 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

[GitHub] [arrow-datafusion] doki23 commented on pull request #4221: Nonstring partitioned cols

2022-11-15 Thread GitBox
doki23 commented on PR #4221: URL: https://github.com/apache/arrow-datafusion/pull/4221#issuecomment-1316385432 I find that [ListingTable](https://github.com/apache/arrow-datafusion/blob/406c1087bc16f8d2a49e5a9b05d2a0e1b67f7aa5/datafusion/core/src/datasource/listing/table.rs#L416) also tre

[GitHub] [arrow-datafusion] mingmwang commented on issue #4230: HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result

2022-11-15 Thread GitBox
mingmwang commented on issue #4230: URL: https://github.com/apache/arrow-datafusion/issues/4230#issuecomment-1316385502 @yahoNanJing

[GitHub] [arrow-datafusion] mingmwang commented on issue #4230: HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result

2022-11-15 Thread GitBox
mingmwang commented on issue #4230: URL: https://github.com/apache/arrow-datafusion/issues/4230#issuecomment-1316384866 The root cause is that, for a Left Outer join, for each partition of the right side, they are running independently, each of them constructs the `HashJoinStream` and has th
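The failure mode described for issue #4230 can be sketched in plain Python. This is a simplified model, not DataFusion code: it assumes that each right-side partition probes the whole collected left side and tracks matched left rows in its own private record (the truncated comment does not confirm that detail), and it contrasts that with a single shared record in the spirit of the "global `visited_left_side`" idea mentioned in the thread. All function names and sample rows here are made up for illustration.

```python
def probe_partition(left, right_partition, visited):
    """Probe one right partition against the full left side.

    Marks matched left rows in `visited` and returns the matched rows.
    """
    out = []
    for r_key, r_val in right_partition:
        for i, (l_key, l_val) in enumerate(left):
            if l_key == r_key:
                visited[i] = True
                out.append((l_key, l_val, r_val))
    return out


def emit_unmatched(left, visited):
    """Left-join padding: left rows that never matched get a None right value."""
    return [(k, v, None) for i, (k, v) in enumerate(left) if not visited[i]]


def join_private_state(left, right_partitions):
    """Assumed buggy shape: every partition has its own visited record."""
    out = []
    for part in right_partitions:
        visited = [False] * len(left)
        out += probe_partition(left, part, visited)
        # Each partition pads rows that a different partition matched.
        out += emit_unmatched(left, visited)
    return out


def join_shared_state(left, right_partitions):
    """Sketch of the fix: one visited record shared by all partitions."""
    visited = [False] * len(left)
    out = []
    for part in right_partitions:
        out += probe_partition(left, part, visited)
    # Pad unmatched left rows once, after every partition has probed.
    out += emit_unmatched(left, visited)
    return out


left = [(11, "a"), (22, "b"), (33, "c")]
right_partitions = [[(11, "z")], [(22, "y")]]
wrong = join_private_state(left, right_partitions)
right = join_shared_state(left, right_partitions)
print(len(wrong), len(right))
```

With private state, a left row matched in one partition is still reported as unmatched by the other, so the output contains spurious null-padded rows. A shared record only works when all partitions run in one process, which is consistent with the thread's remark that Ballista cannot rely on a global structure.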

[GitHub] [arrow] djnavarro commented on a diff in pull request #14514: ARROW-17887: [R][Doc][WIP] Improve readability of the Get Started and README pages

2022-11-15 Thread GitBox
djnavarro commented on code in PR #14514: URL: https://github.com/apache/arrow/pull/14514#discussion_r1023520893 ## r/vignettes/dataset.Rmd: ## @@ -548,4 +451,11 @@ Most file formats have magic numbers which are written at the end. This means a partial file write can safely b

[GitHub] [arrow-datafusion] HaoYang670 opened a new issue, #4231: Doc of the expression function`log2` is incorrect

2022-11-15 Thread GitBox
HaoYang670 opened a new issue, #4231: URL: https://github.com/apache/arrow-datafusion/issues/4231 **Describe the bug** A clear and concise description of what the bug is. https://github.com/apache/arrow-datafusion/blob/406c1087bc16f8d2a49e5a9b05d2a0e1b67f7aa5/datafusion/expr/src/expr_f

[GitHub] [arrow] kou merged pull request #14623: ARROW-18278: [Java] Adjust path in Maven generate-libs-jni-macos-linux

2022-11-15 Thread GitBox
kou merged PR #14623: URL: https://github.com/apache/arrow/pull/14623

[GitHub] [arrow] djnavarro commented on a diff in pull request #14514: ARROW-17887: [R][Doc][WIP] Improve readability of the Get Started and README pages

2022-11-15 Thread GitBox
djnavarro commented on code in PR #14514: URL: https://github.com/apache/arrow/pull/14514#discussion_r1023516220 ## r/vignettes/dataset.Rmd: ## @@ -1,157 +1,95 @@ --- -title: "Working with Arrow Datasets and dplyr" +title: "Working with multi-file data sets" +description: > +

[GitHub] [arrow] AlenkaF commented on pull request #14631: ARROW-18173: [Python] Drop older versions of Pandas (<1.0)

2022-11-15 Thread GitBox
AlenkaF commented on PR #14631: URL: https://github.com/apache/arrow/pull/14631#issuecomment-1316373299 I have applied all the changes from suggestions and can merge once the CI is green.

[GitHub] [arrow] kou merged pull request #14653: ARROW-18336: [Release][Docs] Don't update versions not in major release

2022-11-15 Thread GitBox
kou merged PR #14653: URL: https://github.com/apache/arrow/pull/14653

[GitHub] [arrow-rs] tustvold commented on pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
tustvold commented on PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#issuecomment-1316357244 > currently doesn't offer the capacity to advance without setting its value buffer This is by design, as it would allow for uninitialised memory. My suggestion is that `into_builder` w

[GitHub] [arrow-rs] viirya commented on pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
viirya commented on PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#issuecomment-1316355302 > This is really exciting, left some further comments. I had envisaged that `into_builder` and related APIs would keep the existing values. Effectively they are a way to go from the immut

[GitHub] [arrow] ursabot commented on pull request #13718: ARROW-15026: [Python] Error if datetime.timedelta to pyarrow.duration conversion overflows

2022-11-15 Thread GitBox
ursabot commented on PR #13718: URL: https://github.com/apache/arrow/pull/13718#issuecomment-1316353373 Benchmark runs are scheduled for baseline = c3cfc7934ebdc652399af95b8696bd5a05d943fa and contender = 4d8c21bd303833c124f9d5801755c953c6c3260e. 4d8c21bd303833c124f9d5801755c953c6c3260e is

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4185: Reimplement `Eliminate cross join`

2022-11-15 Thread GitBox
jackwener commented on code in PR #4185: URL: https://github.com/apache/arrow-datafusion/pull/4185#discussion_r1023477191 ## datafusion/sql/src/planner.rs: ## @@ -2955,30 +2807,6 @@ fn extract_join_keys( } } -/// Extract join keys from a WHERE clause -fn extract_possible

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4185: Reimplement `Eliminate cross join`

2022-11-15 Thread GitBox
jackwener commented on code in PR #4185: URL: https://github.com/apache/arrow-datafusion/pull/4185#discussion_r1023476969 ## datafusion/sql/src/planner.rs: ## @@ -2854,41 +2741,6 @@ fn normalize_sql_object_name(sql_object_name: &ObjectName) -> String { .join(".") }

[GitHub] [arrow] github-actions[bot] commented on pull request #14652: ARROW-18335: [CI][Release][JS] Use Node.js 16 as workaround

2022-11-15 Thread GitBox
github-actions[bot] commented on PR #14652: URL: https://github.com/apache/arrow/pull/14652#issuecomment-1316282126 Revision: 714386bc463d020d3844d17e8ae227c72c22ece9 Submitted crossbow builds: [ursacomputing/crossbow @ actions-b758d1df41](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #14652: ARROW-18335: [CI][Release][JS] Use Node.js 16 as workaround

2022-11-15 Thread GitBox
kou commented on PR #14652: URL: https://github.com/apache/arrow/pull/14652#issuecomment-1316280179 @github-actions crossbow submit verify-rc-source-js-linux-ubuntu-*-amd64 verify-rc-source-integration-linux-ubuntu-*-amd64

[GitHub] [arrow] kou commented on a diff in pull request #14652: ARROW-18335: [CI][Release][JS] Use Node.js 16 as workaround

2022-11-15 Thread GitBox
kou commented on code in PR #14652: URL: https://github.com/apache/arrow/pull/14652#discussion_r1023474544 ## dev/release/verify-release-candidate.sh: ## @@ -326,7 +326,9 @@ install_nodejs() { PROFILE=/dev/null bash [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"

[GitHub] [arrow] kou commented on pull request #14654: WIP: [Release] Verify release-10.0.1-rc0

2022-11-15 Thread GitBox
kou commented on PR #14654: URL: https://github.com/apache/arrow/pull/14654#issuecomment-1316276925 @nealrichardson @paleolimbot @thisisnic [r-binary-packages](https://github.com/ursacomputing/crossbow/actions/runs/347523/jobs/5809504360) failed. Could you confirm this? ```text

[GitHub] [arrow] domoritz commented on a diff in pull request #14652: ARROW-18335: [CI][Release][JS] Use Node.js 16 as workaround

2022-11-15 Thread GitBox
domoritz commented on code in PR #14652: URL: https://github.com/apache/arrow/pull/14652#discussion_r1023470519 ## dev/release/verify-release-candidate.sh: ## @@ -326,7 +326,9 @@ install_nodejs() { PROFILE=/dev/null bash [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.s

[GitHub] [arrow-rs] ursabot commented on pull request #3102: parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo

2022-11-15 Thread GitBox
ursabot commented on PR #3102: URL: https://github.com/apache/arrow-rs/pull/3102#issuecomment-1316245597 Benchmark runs are scheduled for baseline = c95eb4c80a532653bc91e04e78814f1282c8d005 and contender = 73d66d837c20e3b80a77fdad5018f7872de4ef9d. 73d66d837c20e3b80a77fdad5018f7872de4ef9d i

[GitHub] [arrow-rs] tustvold merged pull request #3102: parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo

2022-11-15 Thread GitBox
tustvold merged PR #3102: URL: https://github.com/apache/arrow-rs/pull/3102

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4219: [CBO] JoinSelection Rule, select HashJoin Partition Mode based on the available statistics

2022-11-15 Thread GitBox
mingmwang commented on PR #4219: URL: https://github.com/apache/arrow-datafusion/pull/4219#issuecomment-1316235523 There is a bug with HashJoin CollectLeft: https://github.com/apache/arrow-datafusion/issues/4230

[GitHub] [arrow-datafusion] mingmwang commented on issue #4230: HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result

2022-11-15 Thread GitBox
mingmwang commented on issue #4230: URL: https://github.com/apache/arrow-datafusion/issues/4230#issuecomment-1316234564 expected: [ "+---+-+-+", "| t1_id | t1_name | t2_name |", "+---+-+-+", "| 11| a | z

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
tustvold commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023452284 ## arrow-array/src/array/primitive_array.rs: ## @@ -489,6 +544,42 @@ impl PrimitiveArray { ) } } + +/// Returns `PrimitiveBuilder` of t

[GitHub] [arrow-datafusion] mingmwang commented on issue #4230: HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result

2022-11-15 Thread GitBox
mingmwang commented on issue #4230: URL: https://github.com/apache/arrow-datafusion/issues/4230#issuecomment-1316230360 @alamb @tustvold

[GitHub] [arrow-datafusion] mingmwang opened a new issue, #4230: HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result

2022-11-15 Thread GitBox
mingmwang opened a new issue, #4230: URL: https://github.com/apache/arrow-datafusion/issues/4230 **Describe the bug** A clear and concise description of what the bug is. The join result is wrong. **To Reproduce** Steps to reproduce the behavior: #[tokio::test] as

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3115: Add COW conversion for Buffer and PrimitiveArray and unary_mut

2022-11-15 Thread GitBox
tustvold commented on code in PR #3115: URL: https://github.com/apache/arrow-rs/pull/3115#discussion_r1023450806 ## arrow-array/src/array/primitive_array.rs: ## @@ -397,6 +397,61 @@ impl PrimitiveArray { unsafe { build_primitive_array(len, buffer, null_count, null_buffe

[GitHub] [arrow] vibhatha commented on pull request #14646: ARROW-18269: [C++] Slash character in partition value handling

2022-11-15 Thread GitBox
vibhatha commented on PR #14646: URL: https://github.com/apache/arrow/pull/14646#issuecomment-1316222474 > This fixes hive partitioning. Do we need to also fix directory partitioning or filename partitioning? Good question. Should we do that in a separate PR?

[GitHub] [arrow] vibhatha commented on a diff in pull request #14646: ARROW-18269: [C++] Slash character in partition value handling

2022-11-15 Thread GitBox
vibhatha commented on code in PR #14646: URL: https://github.com/apache/arrow/pull/14646#discussion_r1023451492 ## python/pyarrow/tests/test_dataset.py: ## @@ -4912,3 +4912,33 @@ def test_read_table_nested_columns(tempdir, format): {'user_id': 'qrs456', 'type': 'scroll'

[GitHub] [arrow] ursabot commented on pull request #14395: ARROW-17960: [C++][Python] Implement list_slice kernel

2022-11-15 Thread GitBox
ursabot commented on PR #14395: URL: https://github.com/apache/arrow/pull/14395#issuecomment-1316217185 Benchmark runs are scheduled for baseline = 3b852e49fec85b57545c6edc6c66d3da93de2c06 and contender = c3cfc7934ebdc652399af95b8696bd5a05d943fa. c3cfc7934ebdc652399af95b8696bd5a05d943fa is

[GitHub] [arrow-datafusion] doki23 commented on pull request #4221: Nonstring partitioned cols

2022-11-15 Thread GitBox
doki23 commented on PR #4221: URL: https://github.com/apache/arrow-datafusion/pull/4221#issuecomment-1316164123 > Thank you @doki23 -- I think the functionality needs some test of a non-string partition column before we would consider merging it. > > Otherwise how would we know if we

[GitHub] [arrow-datafusion] doki23 commented on pull request #4221: Nonstring partitioned cols

2022-11-15 Thread GitBox
doki23 commented on PR #4221: URL: https://github.com/apache/arrow-datafusion/pull/4221#issuecomment-1316161354 > I think not all the types of columns can be used as partition columns, should there be a whitelist for supported types? > And can you explain a little how you will infer th

[GitHub] [arrow-datafusion] doki23 commented on pull request #4194: create table with schema

2022-11-15 Thread GitBox
doki23 commented on PR #4194: URL: https://github.com/apache/arrow-datafusion/pull/4194#issuecomment-1316147872 > Thank you @doki23 -- this looks like a nice improvement! I have a few suggestions -- the only thing I think is needed prior to merge is a test for mismatched column count. >

[GitHub] [arrow-datafusion] doki23 commented on a diff in pull request #4194: create table with schema

2022-11-15 Thread GitBox
doki23 commented on code in PR #4194: URL: https://github.com/apache/arrow-datafusion/pull/4194#discussion_r1023411537 ## datafusion/sql/src/planner.rs: ## @@ -180,12 +180,36 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { if_not_exists, or_re

[GitHub] [arrow-datafusion] ygf11 commented on issue #4210: Add ambiguous check when generate projection plan

2022-11-15 Thread GitBox
ygf11 commented on issue #4210: URL: https://github.com/apache/arrow-datafusion/issues/4210#issuecomment-1316123771 Like join and selection in PostgreSQL, it will report an error: ```sql psql -d "$POSTGRES_DB" -h "$POSTGRES_HOST" -p "$POSTGRES_PORT" -U "$POSTGRES_USER" -c "select

[GitHub] [arrow-rs] Jimexist commented on a diff in pull request #3102: parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo

2022-11-15 Thread GitBox
Jimexist commented on code in PR #3102: URL: https://github.com/apache/arrow-rs/pull/3102#discussion_r1023393988 ## parquet/src/file/reader.rs: ## @@ -143,6 +145,10 @@ pub trait RowGroupReader: Send + Sync { Ok(col_reader) } +#[cfg(feature = "bloom")] Review

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3102: parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo

2022-11-15 Thread GitBox
tustvold commented on code in PR #3102: URL: https://github.com/apache/arrow-rs/pull/3102#discussion_r1023393798 ## parquet/src/file/reader.rs: ## @@ -143,6 +145,10 @@ pub trait RowGroupReader: Send + Sync { Ok(col_reader) } +#[cfg(feature = "bloom")] Review

[GitHub] [arrow] westonpace commented on a diff in pull request #14646: ARROW-18269: [C++] Slash character in partition value handling

2022-11-15 Thread GitBox
westonpace commented on code in PR #14646: URL: https://github.com/apache/arrow/pull/14646#discussion_r1023392610 ## python/pyarrow/tests/test_dataset.py: ## @@ -4912,3 +4912,33 @@ def test_read_table_nested_columns(tempdir, format): {'user_id': 'qrs456', 'type': 'scrol

[GitHub] [arrow-rs] Jimexist commented on a diff in pull request #3102: parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo

2022-11-15 Thread GitBox
Jimexist commented on code in PR #3102: URL: https://github.com/apache/arrow-rs/pull/3102#discussion_r1023389113 ## parquet/src/file/reader.rs: ## @@ -143,6 +145,10 @@ pub trait RowGroupReader: Send + Sync { Ok(col_reader) } +#[cfg(feature = "bloom")] Review

[GitHub] [arrow] kou commented on pull request #14654: WIP: [Release] Verify release-10.0.1-rc0

2022-11-15 Thread GitBox
kou commented on PR #14654: URL: https://github.com/apache/arrow/pull/14654#issuecomment-1316086506 Revision: a6eabc2b890030578131aecc5e85900597d694a4 Submitted crossbow builds: [ursacomputing/crossbow @ release-10.0.1-rc0-0](https://github.com/ursacomputing/crossbow/branches/all?quer

[GitHub] [arrow-rs] Jimexist commented on a diff in pull request #3102: parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo

2022-11-15 Thread GitBox
Jimexist commented on code in PR #3102: URL: https://github.com/apache/arrow-rs/pull/3102#discussion_r1023386158 ## parquet/src/bin/parquet-show-bloom-filter.rs: ## @@ -0,0 +1,110 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a
