[GitHub] [arrow-datafusion] Jimexist opened a new issue #1232: add feature to allow datafusion cli to list columns from a table

2021-11-02 Thread GitBox
Jimexist opened a new issue #1232: URL: https://github.com/apache/arrow-datafusion/issues/1232 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** add feature to allow datafusion cli to list columns from a table **Describe

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1231: Add cli show columns

2021-11-02 Thread GitBox
Jimexist opened a new pull request #1231: URL: https://github.com/apache/arrow-datafusion/pull/1231 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes

[GitHub] [arrow-datafusion] Jimexist merged pull request #1229: datafusion-cli to add list table command

2021-11-02 Thread GitBox
Jimexist merged pull request #1229: URL: https://github.com/apache/arrow-datafusion/pull/1229 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gith

[GitHub] [arrow-datafusion] Jimexist closed issue #1230: datafusion cli to add list tables command

2021-11-02 Thread GitBox
Jimexist closed issue #1230: URL: https://github.com/apache/arrow-datafusion/issues/1230 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow-datafusion] Dandandan commented on issue #700: Improve performance polling / task sharing mechanism in Ballista

2021-11-02 Thread GitBox
Dandandan commented on issue #700: URL: https://github.com/apache/arrow-datafusion/issues/700#issuecomment-958685830 AFAIK No @mingmwang . Feel free to work on this - that would be great. -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[GitHub] [arrow] vvellanki commented on a change in pull request #11180: ARROW-14032: [C++][Gandiva] Add concat_ws hive function to gandiva

2021-11-02 Thread GitBox
vvellanki commented on a change in pull request #11180: URL: https://github.com/apache/arrow/pull/11180#discussion_r741641131 ## File path: cpp/src/gandiva/gdv_function_stubs.cc ## @@ -794,6 +794,56 @@ const char* gdv_fn_initcap_utf8(int64_t context, const char* data, int32_t

[GitHub] [arrow] vvellanki commented on a change in pull request #11287: ARROW-14193: [C++][Gandiva] Implement INSTR function

2021-11-02 Thread GitBox
vvellanki commented on a change in pull request #11287: URL: https://github.com/apache/arrow/pull/11287#discussion_r741638784 ## File path: cpp/src/gandiva/function_registry_string.cc ## @@ -406,6 +406,10 @@ std::vector GetStringFunctionRegistry() { NativeFunction("spl

[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #1202: Fix `between` in select query

2021-11-02 Thread GitBox
xudong963 edited a comment on pull request #1202: URL: https://github.com/apache/arrow-datafusion/pull/1202#issuecomment-958674138 The remaining LGTM, wait for @alamb to review. Nice work @capkurmagati -- This is an automated message from the Apache Git Service. To respond to the message

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1202: Fix `between` in select query

2021-11-02 Thread GitBox
xudong963 commented on pull request #1202: URL: https://github.com/apache/arrow-datafusion/pull/1202#issuecomment-958674138 The remaining LGTM, wait for @alamb to review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

[GitHub] [arrow] vvellanki commented on a change in pull request #11287: ARROW-14193: [C++][Gandiva] Implement INSTR function

2021-11-02 Thread GitBox
vvellanki commented on a change in pull request #11287: URL: https://github.com/apache/arrow/pull/11287#discussion_r741632047 ## File path: cpp/src/gandiva/gdv_function_stubs.cc ## @@ -794,6 +794,27 @@ const char* gdv_fn_initcap_utf8(int64_t context, const char* data, int32_t

[GitHub] [arrow] cyb70289 commented on pull request #11588: ARROW-14548: [C++] Add madvise random support for memory mapped file

2021-11-02 Thread GitBox
cyb70289 commented on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-958669125 Thanks @niyue ! I did a similar test to read randomly at most 1/4 pages of a memory mapped file with 1G size. Without `madvise`, almost all 1G data is in the page cache afte

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1202: Fix `between` in select query

2021-11-02 Thread GitBox
xudong963 commented on a change in pull request #1202: URL: https://github.com/apache/arrow-datafusion/pull/1202#discussion_r741628302 ## File path: datafusion/src/logical_plan/expr.rs ## @@ -1935,10 +1935,32 @@ fn create_name(e: &Expr, input_schema: &DFSchema) -> Result {

[GitHub] [arrow] bkmgit commented on a change in pull request #11205: ARROW-14039: [C++][Docs] Indicate memory requirements for building

2021-11-02 Thread GitBox
bkmgit commented on a change in pull request #11205: URL: https://github.com/apache/arrow/pull/11205#discussion_r741627611 ## File path: docs/source/developers/cpp/building.rst ## @@ -41,6 +41,9 @@ Building requires: sufficient. For Windows, at least Visual Studio 2017 is re

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1202: Fix `between` in select query

2021-11-02 Thread GitBox
xudong963 commented on a change in pull request #1202: URL: https://github.com/apache/arrow-datafusion/pull/1202#discussion_r741626999 ## File path: datafusion/src/physical_plan/planner.rs ## @@ -176,6 +176,21 @@ fn create_physical_name(e: &Expr, is_first_expr: bool) -> Result

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1202: Fix `between` in select query

2021-11-02 Thread GitBox
xudong963 commented on a change in pull request #1202: URL: https://github.com/apache/arrow-datafusion/pull/1202#discussion_r741626089 ## File path: datafusion/src/logical_plan/expr.rs ## @@ -1935,10 +1935,32 @@ fn create_name(e: &Expr, input_schema: &DFSchema) -> Result {

[GitHub] [arrow] rkavanap commented on a change in pull request #11051: ARROW-13827: [C++][Gandiva] Implement LEVENSHTEIN Hive functions on Gandiva

2021-11-02 Thread GitBox
rkavanap commented on a change in pull request #11051: URL: https://github.com/apache/arrow/pull/11051#discussion_r741622617 ## File path: cpp/src/gandiva/precompiled/string_ops.cc ## @@ -1642,6 +1642,55 @@ const char* convert_toUTF8(int64_t context, const char* value, int32_t

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1135: Implement INTERSECT & INTERSECT DISTINCT

2021-11-02 Thread GitBox
xudong963 commented on pull request #1135: URL: https://github.com/apache/arrow-datafusion/pull/1135#issuecomment-958658978 > Am I correct that in the current design, the `null_equal_safe` argument will apply to all join key pairs? i.e. will we be able to have `t1 JOIN t2 ON t1.col1 = t2.c

[GitHub] [arrow-datafusion] Jimexist opened a new issue #1230: datafusion cli to add list tables command

2021-11-02 Thread GitBox
Jimexist opened a new issue #1230: URL: https://github.com/apache/arrow-datafusion/issues/1230 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** datafusion cli to add list tables command **Describe the solution you'd like

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1229: add list table command

2021-11-02 Thread GitBox
Jimexist opened a new pull request #1229: URL: https://github.com/apache/arrow-datafusion/pull/1229 # Which issue does this PR close? Closes #. # Rationale for this change add list table command # What changes are included in this PR? # Are there any u

[GitHub] [arrow-datafusion] Jimexist merged pull request #1224: add \q as quit command and add \? for help

2021-11-02 Thread GitBox
Jimexist merged pull request #1224: URL: https://github.com/apache/arrow-datafusion/pull/1224 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gith

[GitHub] [arrow-datafusion] Jimexist closed issue #1216: Discussions about commands in datafusion-cli

2021-11-02 Thread GitBox
Jimexist closed issue #1216: URL: https://github.com/apache/arrow-datafusion/issues/1216 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow-datafusion] capkurmagati commented on a change in pull request #1202: Fix `between` in select query

2021-11-02 Thread GitBox
capkurmagati commented on a change in pull request #1202: URL: https://github.com/apache/arrow-datafusion/pull/1202#discussion_r741610021 ## File path: datafusion/tests/sql.rs ## @@ -5305,3 +5305,57 @@ async fn case_with_bool_type_result() -> Result<()> { assert_eq!(expect

[GitHub] [arrow-datafusion] capkurmagati commented on pull request #1202: Fix `between` in select query

2021-11-02 Thread GitBox
capkurmagati commented on pull request #1202: URL: https://github.com/apache/arrow-datafusion/pull/1202#issuecomment-958643103 @xudong963 @alamb Thanks for the advice. I added some tests for the expr. PTAL. I will rebase the branch after reflecting your reviews. -- This is an automate

[GitHub] [arrow-datafusion] houqp commented on issue #1228: Extract logical plans in LogicalPlan as independent struct

2021-11-02 Thread GitBox
houqp commented on issue #1228: URL: https://github.com/apache/arrow-datafusion/issues/1228#issuecomment-958640365 @alamb if I recall correctly, the partialOrd for Expr was added to help with unit testing, right? -- This is an automated message from the Apache Git Service. To respond to t

[GitHub] [arrow-datafusion] xudong963 commented on issue #1228: Extract logical plans in LogicalPlan as independent struct

2021-11-02 Thread GitBox
xudong963 commented on issue #1228: URL: https://github.com/apache/arrow-datafusion/issues/1228#issuecomment-958638881 Let me know your thoughts ❤️ @houqp @alamb @Jimexist @Dandandan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [arrow-datafusion] xudong963 opened a new issue #1228: Extract logical plans in LogicalPlan as independent struct

2021-11-02 Thread GitBox
xudong963 opened a new issue #1228: URL: https://github.com/apache/arrow-datafusion/issues/1228 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** While trying to implement `Select (subquery)`, I added `Select(Box)` in `Expr`. Becau

[GitHub] [arrow-datafusion] houqp edited a comment on pull request #1135: Implement INTERSECT & INTERSECT DISTINCT

2021-11-02 Thread GitBox
houqp edited a comment on pull request #1135: URL: https://github.com/apache/arrow-datafusion/pull/1135#issuecomment-958638468 Am I correct that in the current design, the `null_equal_safe` argument will apply to all join key pairs? i.e. will we be able to have `t1 JOIN t2 ON t1.col1 = t2.

[GitHub] [arrow-datafusion] houqp commented on pull request #1135: Implement INTERSECT & INTERSECT DISTINCT

2021-11-02 Thread GitBox
houqp commented on pull request #1135: URL: https://github.com/apache/arrow-datafusion/pull/1135#issuecomment-958638468 Am I correct that in the current design, the `null_equal_safe` argument will apply to all join key pairs? i.e. will we be able to have `JOIN ON t1.col1 = t2.col1, t1.col2

[GitHub] [arrow-datafusion] Jimexist opened a new issue #1227: datafusion cli to support listing functions

2021-11-02 Thread GitBox
Jimexist opened a new issue #1227: URL: https://github.com/apache/arrow-datafusion/issues/1227 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A challenge to add function searching and listing in datafusion cli **Descri

[GitHub] [arrow-datafusion] Jimexist opened a new issue #1226: Datafusion cli should properly handle interrupt

2021-11-02 Thread GitBox
Jimexist opened a new issue #1226: URL: https://github.com/apache/arrow-datafusion/issues/1226 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** a challenge to add support for interrupt signal, to clear current line instead of

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1225: datafusion cli to handle EoF and interrupt signal

2021-11-02 Thread GitBox
Jimexist opened a new pull request #1225: URL: https://github.com/apache/arrow-datafusion/pull/1225 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes

[GitHub] [arrow] westonpace commented on pull request #11486: ARROW-12683 [C++] Enable fine-grained I/O (coalescing) in IPC reader

2021-11-02 Thread GitBox
westonpace commented on pull request #11486: URL: https://github.com/apache/arrow/pull/11486#issuecomment-958617799 Hmm, I'll have to take a look and see what's up there. Some other ways to run lint are `ninja lint` and `archery lint --cpplint`. The former requires you to use the ninja g

[GitHub] [arrow] niyue commented on pull request #11486: ARROW-12683 [C++] Enable fine-grained I/O (coalescing) in IPC reader

2021-11-02 Thread GitBox
niyue commented on pull request #11486: URL: https://github.com/apache/arrow/pull/11486#issuecomment-958615422 > Looks like one last CI formatting thing: > > ``` > /arrow/cpp/src/arrow/ipc/reader_internal.h:84: Could not find a newline character at the end of the file. [whitespa

[GitHub] [arrow] niyue edited a comment on pull request #11588: ARROW-14548: [C++] Add madvise random support for memory mapped file

2021-11-02 Thread GitBox
niyue edited a comment on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-958612133 > Thanks for the PR. > I'm not sure if exposing madvise options is beneficial. If we expose `RANDOM`, then why not `SEQUENTIAL`? > I prefer don't expose these access p

[GitHub] [arrow] niyue commented on pull request #11588: ARROW-14548: [C++] Add madvise random support for memory mapped file

2021-11-02 Thread GitBox
niyue commented on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-958612133 > Thanks for the PR. > I'm not sure if exposing madvise options is beneficial. If we expose `RANDOM`, then why not `SEQUENTIAL`? > I prefer don't expose these access pattern

[GitHub] [arrow] ursabot edited a comment on pull request #11585: ARROW-14538: [R] Work around empty tr call on Solaris

2021-11-02 Thread GitBox
ursabot edited a comment on pull request #11585: URL: https://github.com/apache/arrow/pull/11585#issuecomment-958210366 Benchmark runs are scheduled for baseline = 92e3da573738d21ef7e74775f67ede499d59ebd7 and contender = bf67ec74635db2183619601f025e4724bd5a6b75. bf67ec74635db2183619601f02

[GitHub] [arrow] westonpace commented on pull request #11486: ARROW-12683 [C++] Enable fine-grained I/O (coalescing) in IPC reader

2021-11-02 Thread GitBox
westonpace commented on pull request #11486: URL: https://github.com/apache/arrow/pull/11486#issuecomment-958609714 Looks like one last CI formatting thing: ``` /arrow/cpp/src/arrow/ipc/reader_internal.h:84: Could not find a newline character at the end of the file. [whitespace/

[GitHub] [arrow] cyb70289 edited a comment on pull request #11588: ARROW-14548: [C++] Add madvise random support for memory mapped file

2021-11-02 Thread GitBox
cyb70289 edited a comment on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-958605062 Thanks for the PR. I'm not sure if exposing madvise options is beneficial. If we expose `RANDOM`, then why not `SEQUENTIAL`? I prefer don't expose these access patt

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1224: add \q as quit command and add \? for help

2021-11-02 Thread GitBox
Jimexist opened a new pull request #1224: URL: https://github.com/apache/arrow-datafusion/pull/1224 # Which issue does this PR close? Closes #1216 # Rationale for this change add \q as quit command and add \? for help # What changes are included in this PR?

[GitHub] [arrow] cyb70289 commented on pull request #11588: ARROW-14548: [C++] Add madvise random support for memory mapped file

2021-11-02 Thread GitBox
cyb70289 commented on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-958605062 Thanks for the PR. I'm not sure if exposing madvise options is beneficial. If we expose `RANDOM`, then why not `SEQUENTIAL`? I prefer don't expose these access pattern rel

[GitHub] [arrow] westonpace commented on a change in pull request #11542: ARROW-14356: [C++] Create kernel to determine buffer memory "referenced" by arrays (even if there are offsets)

2021-11-02 Thread GitBox
westonpace commented on a change in pull request #11542: URL: https://github.com/apache/arrow/pull/11542#discussion_r74154 ## File path: cpp/src/arrow/compute/kernels/vector_buffer_test.cc ## @@ -0,0 +1,382 @@ +// Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [arrow] westonpace commented on a change in pull request #11542: ARROW-14356: [C++] Create kernel to determine buffer memory "referenced" by arrays (even if there are offsets)

2021-11-02 Thread GitBox
westonpace commented on a change in pull request #11542: URL: https://github.com/apache/arrow/pull/11542#discussion_r741577703 ## File path: cpp/src/arrow/compute/kernels/vector_buffer.cc ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] ianmcook commented on a change in pull request #11023: ARROW-12712: [C++] String repeat kernel

2021-11-02 Thread GitBox
ianmcook commented on a change in pull request #11023: URL: https://github.com/apache/arrow/pull/11023#discussion_r741576370 ## File path: r/tests/testthat/test-dplyr-funcs-string.R ## @@ -467,6 +467,18 @@ test_that("strsplit and str_split", { ) }) +test_that("strrep", {

[GitHub] [arrow-datafusion] yahoNanJing commented on a change in pull request #1062: Add support of HDFS as remote object store

2021-11-02 Thread GitBox
yahoNanJing commented on a change in pull request #1062: URL: https://github.com/apache/arrow-datafusion/pull/1062#discussion_r741575976 ## File path: datafusion/src/datasource/object_store/hdfs/os_parquet.rs ## @@ -0,0 +1,102 @@ +// Licensed to the Apache Software Foundation (

[GitHub] [arrow-datafusion] yahoNanJing commented on a change in pull request #1062: Add support of HDFS as remote object store

2021-11-02 Thread GitBox
yahoNanJing commented on a change in pull request #1062: URL: https://github.com/apache/arrow-datafusion/pull/1062#discussion_r741575479 ## File path: datafusion/src/datasource/object_store/hdfs/os_parquet.rs ## @@ -0,0 +1,102 @@ +// Licensed to the Apache Software Foundation (

[GitHub] [arrow] ianmcook commented on a change in pull request #11023: ARROW-12712: [C++] String repeat kernel

2021-11-02 Thread GitBox
ianmcook commented on a change in pull request #11023: URL: https://github.com/apache/arrow/pull/11023#discussion_r741575436 ## File path: r/R/expression.R ## @@ -100,7 +100,8 @@ # use `%/%` above. "%%" = "divide_checked", "^" = "power_checked", - "%in%" = "is_in_meta

[GitHub] [arrow-datafusion] yahoNanJing opened a new pull request #1223: Add support of HDFS as remote object store

2021-11-02 Thread GitBox
yahoNanJing opened a new pull request #1223: URL: https://github.com/apache/arrow-datafusion/pull/1223 # Which issue does this PR close? Closes #1060. It's a refactor version of PR #1062. # Rationale for this change Currently, we can only read parquet files from

[GitHub] [arrow] westonpace commented on a change in pull request #11542: ARROW-14356: [C++] Create kernel to determine buffer memory "referenced" by arrays (even if there are offsets)

2021-11-02 Thread GitBox
westonpace commented on a change in pull request #11542: URL: https://github.com/apache/arrow/pull/11542#discussion_r741573603 ## File path: cpp/src/arrow/compute/kernels/vector_buffer.cc ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow-rs] kingeasternsun opened a new pull request #909: Update mod.rs

2021-11-02 Thread GitBox
kingeasternsun opened a new pull request #909: URL: https://github.com/apache/arrow-rs/pull/909 # Which issue does this PR close? Closes #908 # Rationale for this change # What changes are included in this PR? fix the document li

[GitHub] [arrow-rs] kingeasternsun opened a new issue #908: the SerializedFileReader has moved from reader to serialized_reader , but document of mod.rs not updated

2021-11-02 Thread GitBox
kingeasternsun opened a new issue #908: URL: https://github.com/apache/arrow-rs/issues/908 **Describe the bug** The SerializedFileReader has moved from reader.rs to serialized_reader.rs, but the documentation in mod.rs has not been updated

[GitHub] [arrow-datafusion] yahoNanJing commented on issue #1210: Bug of twice projection when creating ParquetExec during deserialization

2021-11-02 Thread GitBox
yahoNanJing commented on issue #1210: URL: https://github.com/apache/arrow-datafusion/issues/1210#issuecomment-958580781 It's fixed from the root cause by #1141. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [arrow-datafusion] yahoNanJing closed issue #1210: Bug of twice projection when creating ParquetExec during deserialization

2021-11-02 Thread GitBox
yahoNanJing closed issue #1210: URL: https://github.com/apache/arrow-datafusion/issues/1210 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github

[GitHub] [arrow-datafusion] yahoNanJing closed pull request #1211: Fix bug of twice projection when creating ParquetExec during deserialization (#1210)

2021-11-02 Thread GitBox
yahoNanJing closed pull request #1211: URL: https://github.com/apache/arrow-datafusion/pull/1211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: g

[GitHub] [arrow-datafusion] yahoNanJing commented on pull request #1211: Fix bug of twice projection when creating ParquetExec during deserialization (#1210)

2021-11-02 Thread GitBox
yahoNanJing commented on pull request #1211: URL: https://github.com/apache/arrow-datafusion/pull/1211#issuecomment-958580503 Hi @houqp, thanks to #1141. It fixed the root cause. We can close this PR. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] westonpace commented on a change in pull request #11556: ARROW-14426: [C++] Add a minimum_row_group_size to dataset writing

2021-11-02 Thread GitBox
westonpace commented on a change in pull request #11556: URL: https://github.com/apache/arrow/pull/11556#discussion_r741557476 ## File path: cpp/src/arrow/dataset/dataset_writer.cc ## @@ -83,128 +84,163 @@ class Throttle { std::mutex mutex_; }; +struct DatasetWriterState

[GitHub] [arrow] westonpace commented on a change in pull request #11556: ARROW-14426: [C++] Add a minimum_row_group_size to dataset writing

2021-11-02 Thread GitBox
westonpace commented on a change in pull request #11556: URL: https://github.com/apache/arrow/pull/11556#discussion_r741556380 ## File path: cpp/src/arrow/table.h ## @@ -208,6 +208,15 @@ class ARROW_EXPORT Table { Result> CombineChunks( MemoryPool* pool = default_memo

[GitHub] [arrow] westonpace commented on a change in pull request #11556: ARROW-14426: [C++] Add a minimum_row_group_size to dataset writing

2021-11-02 Thread GitBox
westonpace commented on a change in pull request #11556: URL: https://github.com/apache/arrow/pull/11556#discussion_r741556138 ## File path: cpp/src/arrow/util/async_util.h ## @@ -176,6 +176,13 @@ class ARROW_EXPORT SerializedAsyncTaskGroup { /// The returned future that wil

[GitHub] [arrow] westonpace commented on a change in pull request #11556: ARROW-14426: [C++] Add a minimum_row_group_size to dataset writing

2021-11-02 Thread GitBox
westonpace commented on a change in pull request #11556: URL: https://github.com/apache/arrow/pull/11556#discussion_r741556010 ## File path: cpp/src/arrow/util/async_util.h ## @@ -176,6 +176,13 @@ class ARROW_EXPORT SerializedAsyncTaskGroup { /// The returned future that wil

[GitHub] [arrow] ursabot edited a comment on pull request #11587: ARROW-14530: [GLib] Return error for invalid decimal string

2021-11-02 Thread GitBox
ursabot edited a comment on pull request #11587: URL: https://github.com/apache/arrow/pull/11587#issuecomment-958141990 Benchmark runs are scheduled for baseline = 2917baf4744940ebe809c5b16effc806859f8843 and contender = 92e3da573738d21ef7e74775f67ede499d59ebd7. 92e3da573738d21ef7e74775f6

[GitHub] [arrow-datafusion] mingmwang commented on issue #700: Improve performance polling / task sharing mechanism in Ballista

2021-11-02 Thread GitBox
mingmwang commented on issue #700: URL: https://github.com/apache/arrow-datafusion/issues/700#issuecomment-958525993 Is there any PR related to this? If not, I think I can work on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [arrow] michalursa commented on a change in pull request #11446: ARROW-14181: [C++][Compute] Support for dictionaries in hash join

2021-11-02 Thread GitBox
michalursa commented on a change in pull request #11446: URL: https://github.com/apache/arrow/pull/11446#discussion_r741544566 ## File path: cpp/src/arrow/compute/exec/hash_join_dict.h ## @@ -0,0 +1,321 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

[GitHub] [arrow] cpcloud commented on pull request #11466: ARROW-13987: [C++] Support nested field refs

2021-11-02 Thread GitBox
cpcloud commented on pull request #11466: URL: https://github.com/apache/arrow/pull/11466#issuecomment-958419705 I think making these kernels would probably be a good idea, if that allows us to move some of the logic into smaller functions. Unions seem like a great follow up, and wel

[GitHub] [arrow] cpcloud commented on a change in pull request #11466: ARROW-13987: [C++] Support nested field refs

2021-11-02 Thread GitBox
cpcloud commented on a change in pull request #11466: URL: https://github.com/apache/arrow/pull/11466#discussion_r741531580 ## File path: cpp/src/arrow/compute/exec/expression.h ## @@ -112,7 +113,7 @@ class ARROW_EXPORT Expression { // post-bind properties ValueDesc

[GitHub] [arrow] cpcloud commented on a change in pull request #11466: ARROW-13987: [C++] Support nested field refs

2021-11-02 Thread GitBox
cpcloud commented on a change in pull request #11466: URL: https://github.com/apache/arrow/pull/11466#discussion_r741531506 ## File path: cpp/src/arrow/compute/exec/expression.cc ## @@ -512,7 +511,31 @@ Result ExecuteScalarExpression(const Expression& expr, const ExecBatch& i

[GitHub] [arrow] cpcloud commented on a change in pull request #11466: ARROW-13987: [C++] Support nested field refs

2021-11-02 Thread GitBox
cpcloud commented on a change in pull request #11466: URL: https://github.com/apache/arrow/pull/11466#discussion_r741531355 ## File path: cpp/src/arrow/array/array_nested.cc ## @@ -541,56 +541,62 @@ std::shared_ptr StructArray::GetFieldByName(const std::string& name) cons R

[GitHub] [arrow] lidavidm commented on a change in pull request #11452: ARROW-13988: [C++] Support base binary types in hash_min_max

2021-11-02 Thread GitBox
lidavidm commented on a change in pull request #11452: URL: https://github.com/apache/arrow/pull/11452#discussion_r741528938 ## File path: cpp/src/arrow/compute/kernels/hash_aggregate.cc ## @@ -1677,6 +1703,177 @@ struct GroupedMinMaxImpl final : public GroupedAggregator {

[GitHub] [arrow] lidavidm commented on a change in pull request #11452: ARROW-13988: [C++] Support base binary types in hash_min_max

2021-11-02 Thread GitBox
lidavidm commented on a change in pull request #11452: URL: https://github.com/apache/arrow/pull/11452#discussion_r741528772 ## File path: cpp/src/arrow/compute/kernels/hash_aggregate.cc ## @@ -1677,6 +1703,177 @@ struct GroupedMinMaxImpl final : public GroupedAggregator {

[GitHub] [arrow] edponce commented on a change in pull request #11452: ARROW-13988: [C++] Support base binary types in hash_min_max

2021-11-02 Thread GitBox
edponce commented on a change in pull request #11452: URL: https://github.com/apache/arrow/pull/11452#discussion_r741473297 ## File path: cpp/src/arrow/compute/kernels/hash_aggregate.cc ## @@ -603,8 +603,9 @@ struct GroupedValueTraits { }; template -void VisitGroupedValues

[GitHub] [arrow] niyue commented on a change in pull request #11486: ARROW-12683 [C++] Enable fine-grained I/O (coalescing) in IPC reader

2021-11-02 Thread GitBox
niyue commented on a change in pull request #11486: URL: https://github.com/apache/arrow/pull/11486#discussion_r741521123 ## File path: cpp/src/arrow/ipc/message.h ## @@ -441,6 +441,10 @@ class ARROW_EXPORT MessageReader { virtual Result> ReadNextMessage() = 0; }; +// the

[GitHub] [arrow] niyue commented on a change in pull request #11486: ARROW-12683 [C++] Enable fine-grained I/O (coalescing) in IPC reader

2021-11-02 Thread GitBox
niyue commented on a change in pull request #11486: URL: https://github.com/apache/arrow/pull/11486#discussion_r741520738 ## File path: cpp/src/arrow/ipc/reader_internal.h ## @@ -0,0 +1,82 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] ursabot edited a comment on pull request #11586: ARROW-14529: [GLib] Validate Decimal{128,256}DataType precision

2021-11-02 Thread GitBox
ursabot edited a comment on pull request #11586: URL: https://github.com/apache/arrow/pull/11586#issuecomment-958141322 Benchmark runs are scheduled for baseline = 7667c10c448777b4b11dd88a084fa53e1da3e7c6 and contender = 2917baf4744940ebe809c5b16effc806859f8843. 2917baf4744940ebe809c5b16e

[GitHub] [arrow] simpl1g commented on a change in pull request #11584: MINOR: [Ruby][Docs] Add examples how to use red-arrow

2021-11-02 Thread GitBox
simpl1g commented on a change in pull request #11584: URL: https://github.com/apache/arrow/pull/11584#discussion_r741500686 ## File path: ruby/red-arrow/doc/text/examples.md ## @@ -0,0 +1,106 @@ + + +# Examples + +## Create table +### From file +```ruby +table = Arrow::Table.lo

[GitHub] [arrow] ursabot edited a comment on pull request #11585: ARROW-14538: [R] Work around empty tr call on Solaris

2021-11-02 Thread GitBox
ursabot edited a comment on pull request #11585: URL: https://github.com/apache/arrow/pull/11585#issuecomment-958210366 Benchmark runs are scheduled for baseline = 92e3da573738d21ef7e74775f67ede499d59ebd7 and contender = bf67ec74635db2183619601f025e4724bd5a6b75. bf67ec74635db2183619601f02

[GitHub] [arrow] ursabot commented on pull request #11585: ARROW-14538: [R] Work around empty tr call on Solaris

2021-11-02 Thread GitBox
ursabot commented on pull request #11585: URL: https://github.com/apache/arrow/pull/11585#issuecomment-958210366 Benchmark runs are scheduled for baseline = 92e3da573738d21ef7e74775f67ede499d59ebd7 and contender = bf67ec74635db2183619601f025e4724bd5a6b75. bf67ec74635db2183619601f025e4724b

[GitHub] [arrow] nealrichardson closed pull request #11585: ARROW-14538: [R] Work around empty tr call on Solaris

2021-11-02 Thread GitBox
nealrichardson closed pull request #11585: URL: https://github.com/apache/arrow/pull/11585

[GitHub] [arrow] coryan commented on pull request #11569: ARROW-14171: [C++] add google-cloud-cpp to vcpkg

2021-11-02 Thread GitBox
coryan commented on pull request #11569: URL: https://github.com/apache/arrow/pull/11569#issuecomment-958207722 Ping? Is there something else I need to do here?

[GitHub] [arrow] lidavidm commented on pull request #11507: ARROW-14421: [C++] Implement Flight SQL

2021-11-02 Thread GitBox
lidavidm commented on pull request #11507: URL: https://github.com/apache/arrow/pull/11507#issuecomment-958183255 Alright - I can replicate it now, on Ubuntu 20.04: This is _without_ any patches - so I think the problem runs even deeper than we expected. I'll do my best to find a so

[GitHub] [arrow] emkornfield commented on pull request #11591: [Java] ARROW-12163 - Make compression levels configurable

2021-11-02 Thread GitBox
emkornfield commented on pull request #11591: URL: https://github.com/apache/arrow/pull/11591#issuecomment-957998681 One more note. The JIRA linked is about adjusting the compression level used by compression algorithms (trading off speed for output size). I didn't notice a change here fo

[GitHub] [arrow] ianmcook commented on a change in pull request #11592: ARROW-14227: [R] Implement lubridate is.* methods

2021-11-02 Thread GitBox
ianmcook commented on a change in pull request #11592: URL: https://github.com/apache/arrow/pull/11592#discussion_r741201435 ## File path: r/R/dplyr-functions.R ## @@ -848,6 +848,21 @@ nse_funcs$wday <- function(x, Expression$create("day_of_week", x, options = list(count_fro

[GitHub] [arrow] paleolimbot commented on pull request #11592: ARROW-14227: [R] Implement lubridate is.* methods

2021-11-02 Thread GitBox
paleolimbot commented on pull request #11592: URL: https://github.com/apache/arrow/pull/11592#issuecomment-957997188 Ok! I think this is cleaner...I had incorrectly included TIME32 and TIME64 with TIMESTAMP as well. Does DATE64 map back to POSIXct? I think it's best to leave POSIXlt

[GitHub] [arrow] lidavidm commented on a change in pull request #11023: ARROW-12712: [C++] String repeat kernel

2021-11-02 Thread GitBox
lidavidm commented on a change in pull request #11023: URL: https://github.com/apache/arrow/pull/11023#discussion_r741054324 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc ## @@ -20,6 +20,7 @@ #include #include #include +#include Review comment:

[GitHub] [arrow] bkietz commented on a change in pull request #11384: ARROW-14074: [C++][Compute] C++ consumer of compute IR

2021-11-02 Thread GitBox
bkietz commented on a change in pull request #11384: URL: https://github.com/apache/arrow/pull/11384#discussion_r741160879 ## File path: cpp/cmake_modules/BuildUtils.cmake ## @@ -723,22 +724,27 @@ function(ADD_TEST_CASE REL_TEST_NAME) add_dependencies(${TEST_NAME} ${ARG_EX

[GitHub] [arrow] jonkeane commented on a change in pull request #11592: ARROW-14227: [R] Implement lubridate is.* methods

2021-11-02 Thread GitBox
jonkeane commented on a change in pull request #11592: URL: https://github.com/apache/arrow/pull/11592#discussion_r741269486 ## File path: r/R/dplyr-functions.R ## @@ -848,6 +848,21 @@ nse_funcs$wday <- function(x, Expression$create("day_of_week", x, options = list(count_fro

[GitHub] [arrow] kou closed pull request #11587: ARROW-14530: [GLib] Return error for invalid decimal string

2021-11-02 Thread GitBox
kou closed pull request #11587: URL: https://github.com/apache/arrow/pull/11587

[GitHub] [arrow] jonkeane commented on pull request #11592: ARROW-14227: [R] Implement lubridate is.* methods

2021-11-02 Thread GitBox
jonkeane commented on pull request #11592: URL: https://github.com/apache/arrow/pull/11592#issuecomment-957908069

[GitHub] [arrow-datafusion] alamb commented on pull request #1204: Improve GetIndexedFieldExpr adding utf8 key based access for struct v…

2021-11-02 Thread GitBox
alamb commented on pull request #1204: URL: https://github.com/apache/arrow-datafusion/pull/1204#issuecomment-958128372

[GitHub] [arrow-datafusion] alamb commented on pull request #1212: add untyped null

2021-11-02 Thread GitBox
alamb commented on pull request #1212: URL: https://github.com/apache/arrow-datafusion/pull/1212#issuecomment-958114854 I think that during planning we should treat the type of any `ScalarValue::*(None)` as `DataType::Null` and coerce them appropriately to the needed types as part of physi

[GitHub] [arrow] ursabot commented on pull request #11587: ARROW-14530: [GLib] Return error for invalid decimal string

2021-11-02 Thread GitBox
ursabot commented on pull request #11587: URL: https://github.com/apache/arrow/pull/11587#issuecomment-958141990 Benchmark runs are scheduled for baseline = 2917baf4744940ebe809c5b16effc806859f8843 and contender = 92e3da573738d21ef7e74775f67ede499d59ebd7. 92e3da573738d21ef7e74775f67ede499

[GitHub] [arrow-datafusion] alamb closed issue #614: physical_plan::repartition::tests::repartition_with_dropping_output_stream failing locally

2021-11-02 Thread GitBox
alamb closed issue #614: URL: https://github.com/apache/arrow-datafusion/issues/614

[GitHub] [arrow-datafusion] alamb closed issue #1162: Add additional algebraic simplifications

2021-11-02 Thread GitBox
alamb closed issue #1162: URL: https://github.com/apache/arrow-datafusion/issues/1162

[GitHub] [arrow-datafusion] houqp merged pull request #873: Rework the python bindings using conversion traits from arrow-rs

2021-11-02 Thread GitBox
houqp merged pull request #873: URL: https://github.com/apache/arrow-datafusion/pull/873

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1062: Add support of HDFS as remote object store

2021-11-02 Thread GitBox
alamb commented on a change in pull request #1062: URL: https://github.com/apache/arrow-datafusion/pull/1062#discussion_r740941986 ## File path: datafusion/src/datasource/object_store/hdfs/os_parquet.rs ## @@ -0,0 +1,102 @@ +// Licensed to the Apache Software Foundation (ASF) u

[GitHub] [arrow-rs] codecov-commenter commented on pull request #907: (WIP) Use latest nightly for MIRI runs

2021-11-02 Thread GitBox
codecov-commenter commented on pull request #907: URL: https://github.com/apache/arrow-rs/pull/907#issuecomment-958134744 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/907?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+A

[GitHub] [arrow] edponce commented on issue #11559: cython how to access / check / print values for each row

2021-11-02 Thread GitBox
edponce commented on issue #11559: URL: https://github.com/apache/arrow/issues/11559#issuecomment-958150821 @teneon If these solutions worked for you, please close this issue.

[GitHub] [arrow] emkornfield commented on a change in pull request #11591: [Java] ARROW-12163 - Make compression levels configurable

2021-11-02 Thread GitBox
emkornfield commented on a change in pull request #11591: URL: https://github.com/apache/arrow/pull/11591#discussion_r741330445 ## File path: java/compression/src/test/java/org/apache/arrow/compression/TestCompressionCodecFile.java ## @@ -0,0 +1,176 @@ +/* + * Licensed to the

[GitHub] [arrow-rs] alamb commented on pull request #521: Change `nullif` to support arbitrary arrays

2021-11-02 Thread GitBox
alamb commented on pull request #521: URL: https://github.com/apache/arrow-rs/pull/521#issuecomment-958149214 I am going through old PRs and this one seems stalled. I am wondering what we would like to do with this one? Is it ok to merge? Are we doing an alternate implementation? Do we have s
