[GitHub] [arrow] codecov-io commented on pull request #9243: ARROW-11298: [Rust][DataFusion] Implement Postgres String Functions [WIP]

2021-01-20 Thread GitBox
codecov-io commented on pull request #9243: URL: https://github.com/apache/arrow/pull/9243#issuecomment-764451929 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9243?src=pr&el=h1) Report > Merging [#9243](https://codecov.io/gh/apache/arrow/pull/9243?src=pr&el=desc) (64abbc5) into

[GitHub] [arrow] seddonm1 commented on a change in pull request #9243: ARROW-11298: [Rust][DataFusion] Implement Postgres String Functions [WIP]

2021-01-20 Thread GitBox
seddonm1 commented on a change in pull request #9243: URL: https://github.com/apache/arrow/pull/9243#discussion_r561653478 ## File path: rust/datafusion/src/physical_plan/string_expressions.rs ## @@ -34,42 +34,446 @@ macro_rules! downcast_vec { }}; } -/// concatenate st

[GitHub] [arrow] jorgecarleitao commented on pull request #9281: ARROW-11333: [Rust] Generalized creation of empty arrays.

2021-01-20 Thread GitBox
jorgecarleitao commented on pull request #9281: URL: https://github.com/apache/arrow/pull/9281#issuecomment-764422417 @ovr, this is what I was trying to express in your PR: we can use `ArrayData` directly and thereby support arbitrary nested types.

[GitHub] [arrow] github-actions[bot] commented on pull request #9281: ARROW-11333: [Rust] Generalized creation of empty arrays.

2021-01-20 Thread GitBox
github-actions[bot] commented on pull request #9281: URL: https://github.com/apache/arrow/pull/9281#issuecomment-764412043 https://issues.apache.org/jira/browse/ARROW-11333 This is an automated message from the Apache Git Ser

[GitHub] [arrow] jorgecarleitao opened a new pull request #9281: ARROW-11333: [Rust] Generalized creation of empty arrays.

2021-01-20 Thread GitBox
jorgecarleitao opened a new pull request #9281: URL: https://github.com/apache/arrow/pull/9281 # Rational Creating an empty array was scattered around parquet, arrow and datafusion crates, mostly derived from need. Recently, #9156 harmonized some of that for list arrays. However

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-20 Thread GitBox
jorgecarleitao commented on a change in pull request #9215: URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -86,13 +86,9 @@ impl PrimitiveArray { } /// Returns the primitive value at in

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-20 Thread GitBox
jorgecarleitao commented on a change in pull request #9215: URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -86,13 +86,9 @@ impl PrimitiveArray { } /// Returns the primitive value at in

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-20 Thread GitBox
jorgecarleitao commented on a change in pull request #9215: URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -86,13 +86,9 @@ impl PrimitiveArray { } /// Returns the primitive value at in

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-20 Thread GitBox
jorgecarleitao commented on a change in pull request #9215: URL: https://github.com/apache/arrow/pull/9215#discussion_r561618199 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -86,13 +86,9 @@ impl PrimitiveArray { } /// Returns the primitive value at in

[GitHub] [arrow] codecov-io edited a comment on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

2021-01-20 Thread GitBox
codecov-io edited a comment on pull request #9122: URL: https://github.com/apache/arrow/pull/9122#issuecomment-756042833 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=h1) Report > Merging [#9122](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=desc) (8ccc5bd)

[GitHub] [arrow] nevi-me commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

2021-01-20 Thread GitBox
nevi-me commented on pull request #9122: URL: https://github.com/apache/arrow/pull/9122#issuecomment-764332590 > @nevi-me is this one ready to merge? Should we rebase it against latest master and rerun the checks? Or I can merge it in as is too I've rebased, and changed a few comment

[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
codecov-io edited a comment on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (f5413b1)

[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
codecov-io edited a comment on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (2b5cc6a)

[GitHub] [arrow] cyb70289 commented on a change in pull request #9280: ARROW-8928: [C++] Add microbenchmarks to help measure ExecBatchIterator overhead

2021-01-20 Thread GitBox
cyb70289 commented on a change in pull request #9280: URL: https://github.com/apache/arrow/pull/9280#discussion_r561589436 ## File path: cpp/src/arrow/compute/internals_benchmark.cc ## @@ -0,0 +1,86 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

[GitHub] [arrow] cyb70289 commented on a change in pull request #9280: ARROW-8928: [C++] Add microbenchmarks to help measure ExecBatchIterator overhead

2021-01-20 Thread GitBox
cyb70289 commented on a change in pull request #9280: URL: https://github.com/apache/arrow/pull/9280#discussion_r561589279 ## File path: cpp/src/arrow/compute/internals_benchmark.cc ## @@ -0,0 +1,86 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

[GitHub] [arrow] cyb70289 commented on pull request #9274: ARROW-11299: [C++][Python] Fix invalid-offsetof warnings

2021-01-20 Thread GitBox
cyb70289 commented on pull request #9274: URL: https://github.com/apache/arrow/pull/9274#issuecomment-764217852 > @cyb70289 `arrow::dataset::Expression::Call::options` requires that virtual destructor since a `shared_ptr` is used to store subclasses of FunctionOptions. Removing this virtua

[GitHub] [arrow] liyafan82 commented on pull request #9187: ARROW-11223: [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overflo

2021-01-20 Thread GitBox
liyafan82 commented on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-764200223 > @liyafan82 But, I don't agree. See this section comment in source code: > https://github.com/apache/arrow/blob/691286975f277f00586cabc6d834ff1efd8caf8c/java/vector/src/main/

[GitHub] [arrow] mathyingzhou commented on a change in pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-20 Thread GitBox
mathyingzhou commented on a change in pull request #8648: URL: https://github.com/apache/arrow/pull/8648#discussion_r561521360 ## File path: cpp/src/arrow/adapters/orc/adapter_util.cc ## @@ -316,10 +326,482 @@ Status AppendBatch(const liborc::Type* type, liborc::ColumnVectorBa

[GitHub] [arrow] mathyingzhou commented on a change in pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-20 Thread GitBox
mathyingzhou commented on a change in pull request #8648: URL: https://github.com/apache/arrow/pull/8648#discussion_r561520934 ## File path: cpp/src/arrow/adapters/orc/adapter_util.cc ## @@ -316,10 +326,482 @@ Status AppendBatch(const liborc::Type* type, liborc::ColumnVectorBa

[GitHub] [arrow] mathyingzhou commented on a change in pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-20 Thread GitBox
mathyingzhou commented on a change in pull request #8648: URL: https://github.com/apache/arrow/pull/8648#discussion_r561520435 ## File path: cpp/src/arrow/adapters/orc/adapter_util.cc ## @@ -40,15 +44,21 @@ namespace orc { using internal::checked_cast; -// The number of na

[GitHub] [arrow] liyafan82 commented on pull request #9187: ARROW-11223: [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overflo

2021-01-20 Thread GitBox
liyafan82 commented on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-764197687 > @liyafan82, does it imply that `setValueCount` should always be called before `vector.getBufferSizeFor` to have the guaranteed result without failure? It would be great if we

[GitHub] [arrow] codecov-io edited a comment on pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-20 Thread GitBox
codecov-io edited a comment on pull request #9215: URL: https://github.com/apache/arrow/pull/9215#issuecomment-761621256 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=h1) Report > Merging [#9215](https://codecov.io/gh/apache/arrow/pull/9215?src=pr&el=desc) (5cd8075)

[GitHub] [arrow] WeichenXu123 edited a comment on pull request #9187: ARROW-11223: [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buff

2021-01-20 Thread GitBox
WeichenXu123 edited a comment on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-763540715 > In a streaming scenario, in which you fill up batches up to a given size, you need to check the buffer size without actually closing the batch. Yes. and I thi

[GitHub] [arrow] tyrelr commented on pull request #9215: ARROW-11270: [Rust] Array slice accessors

2021-01-20 Thread GitBox
tyrelr commented on pull request #9215: URL: https://github.com/apache/arrow/pull/9215#issuecomment-764164139 Rebased. But I didn't hit the merge conflicts during... I can see that a number of tests were swapped to use Buffer::from_slice_ref()... so I'll add another commit to switch to t

[GitHub] [arrow] github-actions[bot] commented on pull request #9280: ARROW-8928: [C++] Add microbenchmarks to help measure ExecBatchIterator overhead

2021-01-20 Thread GitBox
github-actions[bot] commented on pull request #9280: URL: https://github.com/apache/arrow/pull/9280#issuecomment-764142561 https://issues.apache.org/jira/browse/ARROW-8928 This is an automated message from the Apache Git Serv

[GitHub] [arrow] wesm opened a new pull request #9280: ARROW-8928: [C++] Add microbenchmarks to help measure ExecBatchIterator overhead

2021-01-20 Thread GitBox
wesm opened a new pull request #9280: URL: https://github.com/apache/arrow/pull/9280 These are only preliminary benchmarks but may help in examining microperformance overhead related to `ExecBatch` and its implementation (as a `vector`). It may be desirable to devise an "array refer

[GitHub] [arrow] rtyler commented on a change in pull request #9240: ARROW-10766: [Rust] [Parquet] Compute nested list definitions

2021-01-20 Thread GitBox
rtyler commented on a change in pull request #9240: URL: https://github.com/apache/arrow/pull/9240#discussion_r561361029 ## File path: rust/parquet/src/arrow/array_reader.rs ## @@ -912,11 +912,36 @@ impl ArrayReader for ListArrayReader { )); } -

[GitHub] [arrow] mathyingzhou edited a comment on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-20 Thread GitBox
mathyingzhou edited a comment on pull request #8648: URL: https://github.com/apache/arrow/pull/8648#issuecomment-764050107 > I didn't see anything rust related in this PR so I removed the Rust label @alamb Thanks! There is nothing parquet-related in this PR either. Can the parquet la

[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-20 Thread GitBox
mathyingzhou commented on pull request #8648: URL: https://github.com/apache/arrow/pull/8648#issuecomment-764050107 > I didn't see anything rust related in this PR so I removed the Rust label @alamb Sorry it didn’t appear in the right place but there is nothing parquet-related in thi

[GitHub] [arrow] mathyingzhou commented on a change in pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-01-20 Thread GitBox
mathyingzhou commented on a change in pull request #8648: URL: https://github.com/apache/arrow/pull/8648#discussion_r561405645 ## File path: cpp/src/arrow/adapters/orc/adapter_test.cc ## @@ -157,4 +225,2478 @@ TEST(TestAdapter, readIntAndStringFileMultipleStripes) { EXPECT

[GitHub] [arrow] alamb commented on a change in pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

2021-01-20 Thread GitBox
alamb commented on a change in pull request #9278: URL: https://github.com/apache/arrow/pull/9278#discussion_r561366255 ## File path: rust/datafusion/src/logical_plan/expr.rs ## @@ -422,6 +422,136 @@ impl Expr { nulls_first, } } + +/// Performs a

[GitHub] [arrow] alamb commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

2021-01-20 Thread GitBox
alamb commented on pull request #9278: URL: https://github.com/apache/arrow/pull/9278#issuecomment-764011355 Thanks @jorgecarleitao and @Dandandan . I am personally quite excited at using this same idea for expression rewriting. --

[GitHub] [arrow] Dandandan commented on a change in pull request #9279: ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec

2021-01-20 Thread GitBox
Dandandan commented on a change in pull request #9279: URL: https://github.com/apache/arrow/pull/9279#discussion_r561307105 ## File path: rust/arrow/src/compute/kernels/take.rs ## @@ -423,7 +423,7 @@ where let mut offsets_buffer = MutableBuffer::from_len_zeroed(bytes_offse

[GitHub] [arrow] Dandandan commented on pull request #9279: ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec

2021-01-20 Thread GitBox
Dandandan commented on pull request #9279: URL: https://github.com/apache/arrow/pull/9279#issuecomment-763951458 I think a nice followup for `take_string` would maybe be looking into using the offsets data to calculate the total length of the buffer to allocate, which should be pretty fast

[GitHub] [arrow] Dandandan commented on a change in pull request #9279: ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec

2021-01-20 Thread GitBox
Dandandan commented on a change in pull request #9279: URL: https://github.com/apache/arrow/pull/9279#discussion_r561307105 ## File path: rust/arrow/src/compute/kernels/take.rs ## @@ -423,7 +423,7 @@ where let mut offsets_buffer = MutableBuffer::from_len_zeroed(bytes_offse

[GitHub] [arrow] kou commented on pull request #9155: ARROW-6103: [Java] Remove mvn release plugin [WIP]

2021-01-20 Thread GitBox
kou commented on pull request #9155: URL: https://github.com/apache/arrow/pull/9155#issuecomment-763948236 Great! Can we only build Java packages(*) without publishing to the Maven Central Repository for testing? For example, our nightly Gandiva for Java builds[1] just build `*.jar`

[GitHub] [arrow] github-actions[bot] commented on pull request #9279: ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec

2021-01-20 Thread GitBox
github-actions[bot] commented on pull request #9279: URL: https://github.com/apache/arrow/pull/9279#issuecomment-763944867 https://issues.apache.org/jira/browse/ARROW-11332 This is an automated message from the Apache Git Ser

[GitHub] [arrow] Dandandan opened a new pull request #9279: ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec

2021-01-20 Thread GitBox
Dandandan opened a new pull request #9279: URL: https://github.com/apache/arrow/pull/9279 This PR changes take string to use `MutableBuffer` to create a byte array directly instead of converting it from a `Vec`. There used to be some overhead compared to using a `Vec` and converting it

[GitHub] [arrow] andygrove commented on pull request #9155: ARROW-6103: [Java] Remove mvn release plugin [WIP]

2021-01-20 Thread GitBox
andygrove commented on pull request #9155: URL: https://github.com/apache/arrow/pull/9155#issuecomment-763939243 > > > Sorry for my late response. > > We have a post manual(!) task that publishes Java packages to the Maven Central Repository: https://cwiki.apache.org/conflue

[GitHub] [arrow] kou commented on pull request #9155: ARROW-6103: [Java] Remove mvn release plugin [WIP]

2021-01-20 Thread GitBox
kou commented on pull request #9155: URL: https://github.com/apache/arrow/pull/9155#issuecomment-763933322 Sorry for my late response. We have a post manual(!) task that publishes Java packages to the Maven Central Repository: https://cwiki.apache.org/confluence/display/ARROW/Releas

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
jorgecarleitao edited a comment on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763898758 I made a mistake in benchmarking and have to retract my statement. I am really sorry for the noise. :( I mistakenly updated the wrong part of the code before r

[GitHub] [arrow] andygrove commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

2021-01-20 Thread GitBox
andygrove commented on pull request #9278: URL: https://github.com/apache/arrow/pull/9278#issuecomment-763902393 I just wanted to add that there is no need to wait for me to review before merging. This is an automated messag

[GitHub] [arrow] jorgecarleitao commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
jorgecarleitao commented on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763898758 I made a mistake in benchmarking and have to retract my statement. I am really sorry for the noise. :( I mistakenly updated the wrong part of the code before running

[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mbrubeck commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,131 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0u8`, i

[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mbrubeck commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,131 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0u8`, i

[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mbrubeck commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,131 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0u8`, i

[GitHub] [arrow] codecov-io edited a comment on pull request #9276: ARROW-11327: [Rust][DataFusion] Add DictionarySupport to create_batch_empty

2021-01-20 Thread GitBox
codecov-io edited a comment on pull request #9276: URL: https://github.com/apache/arrow/pull/9276#issuecomment-763672876 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9276?src=pr&el=h1) Report > Merging [#9276](https://codecov.io/gh/apache/arrow/pull/9276?src=pr&el=desc) (07d8ce8)

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

2021-01-20 Thread GitBox
jorgecarleitao commented on a change in pull request #9278: URL: https://github.com/apache/arrow/pull/9278#discussion_r561229060 ## File path: rust/datafusion/src/logical_plan/expr.rs ## @@ -422,6 +422,136 @@ impl Expr { nulls_first, } } + +/// Pe

[GitHub] [arrow] andygrove commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

2021-01-20 Thread GitBox
andygrove commented on pull request #9278: URL: https://github.com/apache/arrow/pull/9278#issuecomment-763885202 I took a quick look and this looks good to me. I will try and review in more detail tonight. This is an automat

[GitHub] [arrow] mbrubeck commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mbrubeck commented on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763880774 > When we commit to master we squash all commits. Would you be willing to take credit for this whole PR + your changes? I think it would be more appropriate to keep your na

[GitHub] [arrow] jorgecarleitao commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
jorgecarleitao commented on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763878812 I validated that with @mbrubeck 's change, we can just use `collect` and drop the `trusted_len` and `unsafe` hack with the same performance on the arithmetics 🎉 🎉🎉 @

[GitHub] [arrow] mbrubeck edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mbrubeck edited a comment on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763846523 I was able to speed up the fast path of `extend_from_iter` by a few percent, by splitting the loop into two parts. This version is only about 8% slower than `extend_from_

[GitHub] [arrow] mbrubeck edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mbrubeck edited a comment on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763846523 I was able to speed up the fast path of `extend_from_iter` by a few percent, by splitting the loop into two parts. This version is only about 8% slower than `extend_from_

[GitHub] [arrow] github-actions[bot] commented on pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

2021-01-20 Thread GitBox
github-actions[bot] commented on pull request #9278: URL: https://github.com/apache/arrow/pull/9278#issuecomment-763867008 https://issues.apache.org/jira/browse/ARROW-11330 This is an automated message from the Apache Git Ser

[GitHub] [arrow] alamb commented on a change in pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encode expression walking

2021-01-20 Thread GitBox
alamb commented on a change in pull request #9278: URL: https://github.com/apache/arrow/pull/9278#discussion_r561197743 ## File path: rust/datafusion/src/optimizer/utils.rs ## @@ -46,75 +50,48 @@ pub fn exprlist_to_column_names( /// Recursively walk an expression tree, colle

[GitHub] [arrow] alamb opened a new pull request #9278: ARROW-11330: [Rust][DataFusion] add ExpressionVisitor to encod

2021-01-20 Thread GitBox
alamb opened a new pull request #9278: URL: https://github.com/apache/arrow/pull/9278 ## Problem: * There are several places in the DataFusion codebase where a walk of an Expression tree is needed * The logic of how to walk the tree is replicated * Adding new expression types often

[GitHub] [arrow] mbrubeck commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mbrubeck commented on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763846523 I was able to speed up the fast path of `extend_from_iter` by a few percent, by splitting the loop into two parts. This version is only about 8% slower than `extend_from_trusted

[GitHub] [arrow] jorgecarleitao closed pull request #9277: ARROW-11329: [Rust] Don't rerun build.rs on every file change

2021-01-20 Thread GitBox
jorgecarleitao closed pull request #9277: URL: https://github.com/apache/arrow/pull/9277 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [arrow] github-actions[bot] removed a comment on pull request #9277: ARROW-11329: [Rust] Don't rerun build.rs on every file change

2021-01-20 Thread GitBox
github-actions[bot] removed a comment on pull request #9277: URL: https://github.com/apache/arrow/pull/9277#issuecomment-763808991 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Th

[GitHub] [arrow] alamb closed pull request #9114: ARROW-11149: [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty

2021-01-20 Thread GitBox
alamb closed pull request #9114: URL: https://github.com/apache/arrow/pull/9114 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] alamb edited a comment on pull request #9114: ARROW-11149: [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty

2021-01-20 Thread GitBox
alamb edited a comment on pull request #9114: URL: https://github.com/apache/arrow/pull/9114#issuecomment-763838115 The travis CI run is backed up -- https://github.com/apache/arrow/pull/9114/checks?check_run_id=1735587739 hasn't finished -- and this PR has no non-rust changes. I think it

[GitHub] [arrow] alamb commented on pull request #9114: ARROW-11149: [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty

2021-01-20 Thread GitBox
alamb commented on pull request #9114: URL: https://github.com/apache/arrow/pull/9114#issuecomment-763838115 The travis CI run is backed up -- and this PR has no non-rust changes. I think it is good to go This is an automate

[GitHub] [arrow] alamb commented on pull request #9114: ARROW-11149: [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty

2021-01-20 Thread GitBox
alamb commented on pull request #9114: URL: https://github.com/apache/arrow/pull/9114#issuecomment-763837298 Awesome -- thanks @ovr ! This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [arrow] bkietz commented on pull request #9274: ARROW-11299: [C++][Python] Fix invalid-offsetof warnings

2021-01-20 Thread GitBox
bkietz commented on pull request #9274: URL: https://github.com/apache/arrow/pull/9274#issuecomment-763833521 I think the best way to resolve this issue is to avoid direct instances of derived classes of FunctionOptions in Cython, since Cython inappropriately applies `offsetof`. For exampl

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
jorgecarleitao commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561163855 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,131 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
jorgecarleitao commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561163855 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,131 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
jorgecarleitao commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561163855 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,131 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0

[GitHub] [arrow] codecov-io commented on pull request #9277: ARROW-11329: [Rust] Don't rerun build.rs on every file change

2021-01-20 Thread GitBox
codecov-io commented on pull request #9277: URL: https://github.com/apache/arrow/pull/9277#issuecomment-763823687 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9277?src=pr&el=h1) Report > Merging [#9277](https://codecov.io/gh/apache/arrow/pull/9277?src=pr&el=desc) (c636cf1) into

[GitHub] [arrow] bkietz commented on pull request #9274: ARROW-11299: [C++][Python] Fix invalid-offsetof warnings

2021-01-20 Thread GitBox
bkietz commented on pull request #9274: URL: https://github.com/apache/arrow/pull/9274#issuecomment-763823491 @cyb70289 `arrow::dataset::Expression::Call::options` requires that virtual destructor since a `shared_ptr` is used to store subclasses of FunctionOptions. Removing this virtual de

[GitHub] [arrow] github-actions[bot] commented on pull request #9277: ARROW-11329: [Rust] Don't rerun build.rs on every file change

2021-01-20 Thread GitBox
github-actions[bot] commented on pull request #9277: URL: https://github.com/apache/arrow/pull/9277#issuecomment-763817914 https://issues.apache.org/jira/browse/ARROW-11329 This is an automated message from the Apache Git Ser

[GitHub] [arrow] github-actions[bot] commented on pull request #9277: [Rust] Don't rerun build.rs on every file change

2021-01-20 Thread GitBox
github-actions[bot] commented on pull request #9277: URL: https://github.com/apache/arrow/pull/9277#issuecomment-763808991 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mqy commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561133856 ## File path: rust/arrow/src/buffer.rs ## @@ -1008,7 +1008,7 @@ impl MutableBuffer { let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };

[GitHub] [arrow] mbrubeck opened a new pull request #9277: [Rust] Don't rerun build.rs on every file change

2021-01-20 Thread GitBox
mbrubeck opened a new pull request #9277: URL: https://github.com/apache/arrow/pull/9277 This speeds up development by avoiding rebuilding the whole library when any file in the package directory is touched. This is an aut

[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mbrubeck commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561128399 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,131 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0u8`, i

[GitHub] [arrow] mqy commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mqy commented on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763763668 > Looks like there is an error in the SIMD tests https://github.com/apache/arrow/pull/9235/checks? `cargo test --doc --package arrow -- buffer::MutableBuffer::extend_from_trust

[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mqy commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561038192 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,130 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0u8`, increm

[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mqy commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561038192 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,130 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0u8`, increm

[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
mqy commented on a change in pull request #9235: URL: https://github.com/apache/arrow/pull/9235#discussion_r561038192 ## File path: rust/arrow/src/buffer.rs ## @@ -963,11 +968,130 @@ impl MutableBuffer { /// Extends the buffer by `additional` bytes equal to `0u8`, increm

[GitHub] [arrow] ovr commented on pull request #9114: ARROW-11149: [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty

2021-01-20 Thread GitBox
ovr commented on pull request #9114: URL: https://github.com/apache/arrow/pull/9114#issuecomment-763672988 @alamb Rebased + renamed NYI -> `NotYetImplemented` in Arrow. Thanks This is an automated message from the Apache Git

[GitHub] [arrow] codecov-io commented on pull request #9276: ARROW-11327: [Rust][DataFusion] Add DictionarySupport to create_batch_empty

2021-01-20 Thread GitBox
codecov-io commented on pull request #9276: URL: https://github.com/apache/arrow/pull/9276#issuecomment-763672876 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9276?src=pr&el=h1) Report > Merging [#9276](https://codecov.io/gh/apache/arrow/pull/9276?src=pr&el=desc) (47bf88b) into

[GitHub] [arrow] codecov-io commented on pull request #9271: ARROW-11300: [Rust][DataFusion] Further performance improvements on hash aggregation with small groups

2021-01-20 Thread GitBox
codecov-io commented on pull request #9271: URL: https://github.com/apache/arrow/pull/9271#issuecomment-763660623 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9271?src=pr&el=h1) Report > Merging [#9271](https://codecov.io/gh/apache/arrow/pull/9271?src=pr&el=desc) (9933c7b) into

[GitHub] [arrow] alamb commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

2021-01-20 Thread GitBox
alamb commented on pull request #9235: URL: https://github.com/apache/arrow/pull/9235#issuecomment-763651758 Looks like there is an error in the SIMD tests https://github.com/apache/arrow/pull/9235/checks?check_run_id=1733044845 ``` failures: src/buffer.rs - buffer

[GitHub] [arrow] github-actions[bot] commented on pull request #9276: ARROW-11327: [Rust][DataFusion] Add DictionarySupport to create_batch_empty

2021-01-20 Thread GitBox
github-actions[bot] commented on pull request #9276: URL: https://github.com/apache/arrow/pull/9276#issuecomment-763648633 https://issues.apache.org/jira/browse/ARROW-11327 This is an automated message from the Apache Git Ser

[GitHub] [arrow] Dandandan commented on pull request #9271: ARROW-11300: [Rust][DataFusion] Further performance improvements on hash aggregation with small groups

2021-01-20 Thread GitBox
Dandandan commented on pull request #9271: URL: https://github.com/apache/arrow/pull/9271#issuecomment-763646185 This PR in itself is ready for review now. The change in approach will help us in the future, even more with `Array.slice` being improved (will create a ticket later), execution

[GitHub] [arrow] alamb commented on a change in pull request #9276: ARROW-11327: [Rust][DataFusion] Add DictionarySupport to create_batch_empty

2021-01-20 Thread GitBox
alamb commented on a change in pull request #9276: URL: https://github.com/apache/arrow/pull/9276#discussion_r561003845 ## File path: rust/datafusion/src/physical_plan/common.rs ## @@ -121,121 +128,130 @@ pub fn build_file_list(dir: &str, filenames: &mut Vec, ext: &str) -> Res

[GitHub] [arrow] alamb opened a new pull request #9276: ARROW-11327: [Rust][DataFusion] Add DictionarySupport to create_batch_empty

2021-01-20 Thread GitBox
alamb opened a new pull request #9276: URL: https://github.com/apache/arrow/pull/9276 The `create_batch_empty` function is used for creating output during aggregation. As part of my plan for better dictionary support it also needs to support DictionaryArray as well. I think this may

[GitHub] [arrow] alamb commented on a change in pull request #9275: ARROW-11323: [Rust][DataFusion] Allow sort queries to return no results

2021-01-20 Thread GitBox
alamb commented on a change in pull request #9275: URL: https://github.com/apache/arrow/pull/9275#discussion_r560979269 ## File path: rust/datafusion/src/physical_plan/sort.rs ## @@ -141,7 +141,10 @@ fn sort_batches( batches: &Vec, schema: &SchemaRef, expr: &[Phy

[GitHub] [arrow] WeichenXu123 commented on pull request #9187: ARROW-11223: [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer over

2021-01-20 Thread GitBox
WeichenXu123 commented on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-763621581 @liyafan82 But, I don't agree. See this section comment in source code: https://github.com/apache/arrow/blob/691286975f277f00586cabc6d834ff1efd8caf8c/java/vector/src/main/j

[GitHub] [arrow] HyukjinKwon commented on pull request #9187: ARROW-11223: [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overf

2021-01-20 Thread GitBox
HyukjinKwon commented on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-763607402 @liyafan82, does it imply that `setValueCount` should always be called before `vector.getBufferSizeFor` to have the guaranteed result without failure? It would be great if we

[GitHub] [arrow] alamb commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

2021-01-20 Thread GitBox
alamb commented on pull request #9122: URL: https://github.com/apache/arrow/pull/9122#issuecomment-763601989 @nevi-me is this one ready to merge? Should we rebase it against latest master and rerun the checks? Or I can merge it in as is too ---

[GitHub] [arrow] liyafan82 commented on pull request #9187: ARROW-11223: [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overflo

2021-01-20 Thread GitBox
liyafan82 commented on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-763600746 @bogdanghit @WeichenXu123 Thanks for your feedback. IMO, calling `setValueCount` does not close the vector. You can still append values and call `setValueCount` multiple t

[GitHub] [arrow] alamb edited a comment on pull request #9262: ARROW-11317: [Rust] Include the prettyprint feature in CI Coverage

2021-01-20 Thread GitBox
alamb edited a comment on pull request #9262: URL: https://github.com/apache/arrow/pull/9262#issuecomment-763597193 You can now see that the pretty print tests are running: ![Screen Shot 2021-01-20 at 8 12 13 AM](https://user-images.githubusercontent.com/490673/105179381-479fa980-5af

[GitHub] [arrow] alamb commented on pull request #9262: ARROW-11317: [Rust] Include the prettyprint feature in CI Coverage

2021-01-20 Thread GitBox
alamb commented on pull request #9262: URL: https://github.com/apache/arrow/pull/9262#issuecomment-763597193 You can now see that the pretty print tests are running: ![Uploading Screen Shot 2021-01-20 at 8.12.13 AM.png…]()

[GitHub] [arrow] alamb commented on a change in pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

2021-01-20 Thread GitBox
alamb commented on a change in pull request #9264: URL: https://github.com/apache/arrow/pull/9264#discussion_r560947936 ## File path: rust/datafusion/src/test/mod.rs ## @@ -106,137 +106,7 @@ pub fn aggr_test_schema() -> SchemaRef { ])) } -/// Format a batch as csv -pub

[GitHub] [arrow] codecov-io commented on pull request #9275: ARROW-11323: [Rust][DataFusion] Allow sort queries to return no results

2021-01-20 Thread GitBox
codecov-io commented on pull request #9275: URL: https://github.com/apache/arrow/pull/9275#issuecomment-763592223 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9275?src=pr&el=h1) Report > Merging [#9275](https://codecov.io/gh/apache/arrow/pull/9275?src=pr&el=desc) (0297af3) into

[GitHub] [arrow] alamb commented on a change in pull request #9114: ARROW-11149: [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty

2021-01-20 Thread GitBox
alamb commented on a change in pull request #9114: URL: https://github.com/apache/arrow/pull/9114#discussion_r560935453 ## File path: rust/arrow/src/error.rs ## @@ -24,6 +24,9 @@ use std::error::Error; /// Many different operations in the `arrow` crate return this error type.

[GitHub] [arrow] github-actions[bot] commented on pull request #9275: ARROW-11323: [Rust][DataFusion] Allow sort queries to return no results

2021-01-20 Thread GitBox
github-actions[bot] commented on pull request #9275: URL: https://github.com/apache/arrow/pull/9275#issuecomment-763583275 https://issues.apache.org/jira/browse/ARROW-11323 This is an automated message from the Apache Git Ser

[GitHub] [arrow] alamb opened a new pull request #9275: ARROW-11323: [Rust][DataFusion] Allow sort queries to return no results

2021-01-20 Thread GitBox
alamb opened a new pull request #9275: URL: https://github.com/apache/arrow/pull/9275 Prior to this PR, if a plan had an ORDER BY (Sort) that got no input rows, you would get an output error. Now the test passes and produces the (expected) no output rows ---

[GitHub] [arrow] alamb closed pull request #9234: ARROW-11290: [Rust][DataFusion] Address hash aggregate performance issue with high number of groups

2021-01-20 Thread GitBox
alamb closed pull request #9234: URL: https://github.com/apache/arrow/pull/9234 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] alamb commented on pull request #9234: ARROW-11290: [Rust][DataFusion] Address hash aggregate performance issue with high number of groups

2021-01-20 Thread GitBox
alamb commented on pull request #9234: URL: https://github.com/apache/arrow/pull/9234#issuecomment-763570051 I merged this branch locally to master and re-ran all the tests. Things looked good so merging it in. This is an a

  1   2   >