[GitHub] [arrow-datafusion] houqp edited a comment on pull request #1072: Expose a static object store registry

2021-10-06 Thread GitBox
houqp edited a comment on pull request #1072: URL: https://github.com/apache/arrow-datafusion/pull/1072#issuecomment-937478271 Interesting, from the discussion we had in https://github.com/rdettai/arrow-datafusion/pull/1, I got the opposite impression on where we are heading ;P I thought t

[GitHub] [arrow-datafusion] houqp commented on pull request #1072: Expose a static object store registry

2021-10-06 Thread GitBox
houqp commented on pull request #1072: URL: https://github.com/apache/arrow-datafusion/pull/1072#issuecomment-937478271 Interesting, from the discussion we had in https://github.com/rdettai/arrow-datafusion/pull/1, I got the opposite impression on where we are heading ;P I thought the conc

[GitHub] [arrow-datafusion] houqp commented on pull request #1010: Reorganize table providers by table format

2021-10-06 Thread GitBox
houqp commented on pull request #1010: URL: https://github.com/apache/arrow-datafusion/pull/1010#issuecomment-937473432 Sounds good, thanks @rdettai for being patient with this big change :) I should be able to help with the clean up in two days if you still need a second hand by then.

[GitHub] [arrow] github-actions[bot] commented on pull request #11350: ARROW-14197: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-937465189 Revision: 96823881bfd42876dfbcc577415a1870611fee25 Submitted crossbow builds: [ursacomputing/crossbow @ actions-903](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] michalursa commented on pull request #11350: ARROW-14197: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-06 Thread GitBox
michalursa commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-937464815 @github-actions crossbow submit -g cpp -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [arrow] github-actions[bot] commented on pull request #11351: ARROW-13151: [C++][Parquet] Propagate schema changes from selection all the way up the stack

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11351: URL: https://github.com/apache/arrow/pull/11351#issuecomment-937462760

[GitHub] [arrow] emkornfield opened a new pull request #11351: ARROW-13151: [C++][Parquet] Propagate schema changes from selection all the way up the stack

2021-10-06 Thread GitBox
emkornfield opened a new pull request #11351: URL: https://github.com/apache/arrow/pull/11351 Previously, structs would only work for a single level, and other nested types did not do this at all. This PR propagates types through lists, large_lists, fixed_sized_lists and multiple nesting

[GitHub] [arrow] emkornfield edited a comment on issue #11347: How to read nested fields when using read table to read a parquet file in PyArrow?

2021-10-06 Thread GitBox
emkornfield edited a comment on issue #11347: URL: https://github.com/apache/arrow/issues/11347#issuecomment-937318958 It is but currently only one level of nesting. This is a known bug: http://issues.apache.org/jira/browse/ARROW-13151?filter=-1 I am working on fix, I don't know if

[GitHub] [arrow] emkornfield edited a comment on issue #11347: How to read nested fields when using read table to read a parquet file in PyArrow?

2021-10-06 Thread GitBox
emkornfield edited a comment on issue #11347: URL: https://github.com/apache/arrow/issues/11347#issuecomment-937318958 It is but currently only one level of nesting. This is a known bug: http://issues.apache.org/jira/browse/ARROW-13151 I am working on fix, I don't know if it will ma

[GitHub] [arrow] emkornfield commented on issue #11224: How can I get the row view of data read from parquet file?

2021-10-06 Thread GitBox
emkornfield commented on issue #11224: URL: https://github.com/apache/arrow/issues/11224#issuecomment-937460961 This seems answered. resolving.

[GitHub] [arrow] emkornfield closed issue #11224: How can I get the row view of data read from parquet file?

2021-10-06 Thread GitBox
emkornfield closed issue #11224: URL: https://github.com/apache/arrow/issues/11224

[GitHub] [arrow] emkornfield commented on a change in pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
emkornfield commented on a change in pull request #11302: URL: https://github.com/apache/arrow/pull/11302#discussion_r723849509 ## File path: python/pyarrow/types.pxi ## @@ -2176,6 +2176,14 @@ def duration(unit): return out +def month_day_nano_interval(): +""" +

[GitHub] [arrow] github-actions[bot] commented on pull request #11350: ARROW-14197: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-937440090 Revision: 16ded42dddc5e654dd177707c1c47dd6d8005c0d Submitted crossbow builds: [ursacomputing/crossbow @ actions-902](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] github-actions[bot] commented on pull request #11350: ARROW-14197: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-937439660 https://issues.apache.org/jira/browse/ARROW-14197

[GitHub] [arrow] michalursa commented on pull request #11350: ARROW-14197: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-06 Thread GitBox
michalursa commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-937439647 @github-actions crossbow submit -g cpp

[GitHub] [arrow] michalursa opened a new pull request #11350: ARROW-14197: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-06 Thread GitBox
michalursa opened a new pull request #11350: URL: https://github.com/apache/arrow/pull/11350 Fixing 3 issues: - one in SchemaProjectionMaps - I simplified all of the code to get rid of thread synchronization at all - one in TaskScheduler - added (unnecessary) mutex - one in HashJoi

[GitHub] [arrow] westonpace commented on a change in pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11302: URL: https://github.com/apache/arrow/pull/11302#discussion_r723842177 ## File path: python/pyarrow/types.pxi ## @@ -2176,6 +2176,14 @@ def duration(unit): return out +def month_day_nano_interval(): +""" +

[GitHub] [arrow] emkornfield commented on a change in pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
emkornfield commented on a change in pull request #11302: URL: https://github.com/apache/arrow/pull/11302#discussion_r723837221 ## File path: python/pyarrow/types.pxi ## @@ -2176,6 +2176,14 @@ def duration(unit): return out +def month_day_nano_interval(): +""" +

[GitHub] [arrow] emkornfield commented on a change in pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
emkornfield commented on a change in pull request #11302: URL: https://github.com/apache/arrow/pull/11302#discussion_r723836536 ## File path: cpp/src/arrow/python/datetime.cc ## @@ -71,6 +74,26 @@ bool MatchFixedOffset(const std::string& tz, util::string_view* sign, return

[GitHub] [arrow-datafusion] houqp opened a new pull request #1078: debug python test failure

2021-10-06 Thread GitBox
houqp opened a new pull request #1078: URL: https://github.com/apache/arrow-datafusion/pull/1078 # Which issue does this PR close? Debug intermittent python test failure. Not able to reproduce it from my local machine.

[GitHub] [arrow] emkornfield commented on issue #11347: How to read nested fields when using read table to read a parquet file in PyArrow?

2021-10-06 Thread GitBox
emkornfield commented on issue #11347: URL: https://github.com/apache/arrow/issues/11347#issuecomment-937418742 Sorry closed this too soon, something seems off here. 'a.b.c' won't work because of the JIRA issue but the error you are getting seems to indicate something strange is going on w

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1077: [window function] add `percent_rank` function

2021-10-06 Thread GitBox
Jimexist opened a new pull request #1077: URL: https://github.com/apache/arrow-datafusion/pull/1077 # Which issue does this PR close? [window function] add `percent_rank` function Closes #667 Related #1076 # Rationale for this change # What changes are i

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1076: add `cume_dist` implementation

2021-10-06 Thread GitBox
Jimexist opened a new pull request #1076: URL: https://github.com/apache/arrow-datafusion/pull/1076 # Which issue does this PR close? add `cume_dist` implementation Related #667 # Rationale for this change # What changes are included in this PR? #

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #821: [nit] update readme.md and reformat

2021-10-06 Thread GitBox
codecov-commenter edited a comment on pull request #821: URL: https://github.com/apache/arrow-rs/pull/821#issuecomment-937385658 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/821?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow-rs] codecov-commenter commented on pull request #821: [nit] update readme.md and reformat

2021-10-06 Thread GitBox
codecov-commenter commented on pull request #821: URL: https://github.com/apache/arrow-rs/pull/821#issuecomment-937385658 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/821?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+A

[GitHub] [arrow-datafusion] Jimexist commented on pull request #1073: [WIP] update python verion to 3.10

2021-10-06 Thread GitBox
Jimexist commented on pull request #1073: URL: https://github.com/apache/arrow-datafusion/pull/1073#issuecomment-937382407 i guess it needs to have `numpy` updated first

[GitHub] [arrow-rs] Jimexist opened a new pull request #821: [nit] update readme.md and reformat

2021-10-06 Thread GitBox
Jimexist opened a new pull request #821: URL: https://github.com/apache/arrow-rs/pull/821 # Which issue does this PR close? update readme.md and reformat. Closes #. # Rationale for this change # What changes are included in this PR? # Are

[GitHub] [arrow] Jimexist closed pull request #10986: MINOR: [Doc][Python] Update compute kernel list

2021-10-06 Thread GitBox
Jimexist closed pull request #10986: URL: https://github.com/apache/arrow/pull/10986

[GitHub] [arrow-rs] nevi-me commented on pull request #820: Fewer ByteArray allocations when writing binary columns

2021-10-06 Thread GitBox
nevi-me commented on pull request #820: URL: https://github.com/apache/arrow-rs/pull/820#issuecomment-937370923 @alamb this is on top of #818. I'm seeing the below perf change on my machine, so around double the throughput: ```rust write_batch primitive/65536 values string

[GitHub] [arrow-rs] nevi-me opened a new pull request #820: Fewer ByteArray allocations when writing binary columns

2021-10-06 Thread GitBox
nevi-me opened a new pull request #820: URL: https://github.com/apache/arrow-rs/pull/820 # Which issue does this PR close? Closes #819. # Rationale for this change See linked issue, reduces allocations when writing binary arrays. # What changes are included in th

[GitHub] [arrow] kou closed pull request #11349: ARROW-14240: [C++] Fix wrong nlohmann-json header path

2021-10-06 Thread GitBox
kou closed pull request #11349: URL: https://github.com/apache/arrow/pull/11349

[GitHub] [arrow] kou commented on pull request #11349: ARROW-14240: [C++] Fix wrong nlohmann-json header path

2021-10-06 Thread GitBox
kou commented on pull request #11349: URL: https://github.com/apache/arrow/pull/11349#issuecomment-937358483 +1

[GitHub] [arrow] kou closed pull request #11348: ARROW-14246: [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()

2021-10-06 Thread GitBox
kou closed pull request #11348: URL: https://github.com/apache/arrow/pull/11348

[GitHub] [arrow-rs] nevi-me opened a new issue #819: Improve parquet binary writer speed by reducing allocations

2021-10-06 Thread GitBox
nevi-me opened a new issue #819: URL: https://github.com/apache/arrow-rs/issues/819 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** When writing arrow binary columns to parquet, we create thousands of small `ByteBuffer` objec

[GitHub] [arrow-rs] nevi-me closed issue #815: Incorrect null count for cast kernel for list arrays

2021-10-06 Thread GitBox
nevi-me closed issue #815: URL: https://github.com/apache/arrow-rs/issues/815

[GitHub] [arrow-rs] nevi-me merged pull request #816: Fix null count when casting ListArray

2021-10-06 Thread GitBox
nevi-me merged pull request #816: URL: https://github.com/apache/arrow-rs/pull/816

[GitHub] [arrow-rs] nevi-me opened a new pull request #818: Separate parquet writer benchmarks

2021-10-06 Thread GitBox
nevi-me opened a new pull request #818: URL: https://github.com/apache/arrow-rs/pull/818 # Which issue does this PR close? Not yet created # Rationale for this change I've been tracking a tricky nested list write bug, and because I've been away for a while, I've go

[GitHub] [arrow] kou commented on pull request #11348: ARROW-14246: [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()

2021-10-06 Thread GitBox
kou commented on pull request #11348: URL: https://github.com/apache/arrow/pull/11348#issuecomment-937339127 Thanks for confirming this. You don't need to run a test for this because I tested locally. I just want to share GCS related code changes to you. I merge this once CI is

[GitHub] [arrow] westonpace commented on a change in pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11302: URL: https://github.com/apache/arrow/pull/11302#discussion_r723744352 ## File path: python/pyarrow/types.pxi ## @@ -2176,6 +2176,14 @@ def duration(unit): return out +def month_day_nano_interval(): +""" +

[GitHub] [arrow] emkornfield edited a comment on issue #11347: How to read nested fields when using read table to read a parquet file in PyArrow?

2021-10-06 Thread GitBox
emkornfield edited a comment on issue #11347: URL: https://github.com/apache/arrow/issues/11347#issuecomment-937318958 It is but currently only one level of nesting. This is a known bug: https://issues.apache.org/jira/browse/ARROW-13151?filter=-1 I am working on fix, I don't know if

[GitHub] [arrow] emkornfield closed issue #11347: How to read nested fields when using read table to read a parquet file in PyArrow?

2021-10-06 Thread GitBox
emkornfield closed issue #11347: URL: https://github.com/apache/arrow/issues/11347

[GitHub] [arrow] emkornfield commented on issue #11347: How to read nested fields when using read table to read a parquet file in PyArrow?

2021-10-06 Thread GitBox
emkornfield commented on issue #11347: URL: https://github.com/apache/arrow/issues/11347#issuecomment-937318958 It is but currently only one level of nesting. This is a known bug: https://issues.apache.org/jira/browse/ARROW-13151?filter=-1 I am working on fix, I don't know if it wil

[GitHub] [arrow] eerhardt closed pull request #10973: ARROW-13689: [C#][Integration] Initial commit of C# Integration tests

2021-10-06 Thread GitBox
eerhardt closed pull request #10973: URL: https://github.com/apache/arrow/pull/10973

[GitHub] [arrow] emkornfield commented on pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
emkornfield commented on pull request #11302: URL: https://github.com/apache/arrow/pull/11302#issuecomment-937291121 @jorisvandenbossche thanks for the feedback. Please see responses inline > Looks good! Added a few inline comments. And some additional non-inline comments: >

[GitHub] [arrow] emkornfield commented on pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
emkornfield commented on pull request #11302: URL: https://github.com/apache/arrow/pull/11302#issuecomment-937287047 > @emkornfield It can live somewhere with the other Python helpers IMHO. Consolidated in datetime.h/datetime.cc.

[GitHub] [arrow-datafusion] houqp merged pull request #1075: Add a LogicalPlanBuilder::schema() function

2021-10-06 Thread GitBox
houqp merged pull request #1075: URL: https://github.com/apache/arrow-datafusion/pull/1075

[GitHub] [arrow-datafusion] houqp closed issue #1074: Make a `LogicalPlanBuilder::schema()` function

2021-10-06 Thread GitBox
houqp closed issue #1074: URL: https://github.com/apache/arrow-datafusion/issues/1074

[GitHub] [arrow] lidavidm closed pull request #11345: MINOR: [C++] Avoid exposing arrow_vendored::date in public headers

2021-10-06 Thread GitBox
lidavidm closed pull request #11345: URL: https://github.com/apache/arrow/pull/11345

[GitHub] [arrow] lidavidm commented on pull request #11345: MINOR: [C++] Avoid exposing arrow_vendored::date in public headers

2021-10-06 Thread GitBox
lidavidm commented on pull request #11345: URL: https://github.com/apache/arrow/pull/11345#issuecomment-937266510 The failures look unrelated. I filed ARROW-14247 for the Valgrind error in parquet-error-test in test-conda-cpp-valgrind.

[GitHub] [arrow] nealrichardson commented on pull request #11306: ARROW-13739: [R] Support dplyr::count() and tally()

2021-10-06 Thread GitBox
nealrichardson commented on pull request #11306: URL: https://github.com/apache/arrow/pull/11306#issuecomment-937173815 I've fixed the tests, 🤞 we get a passing build now

[GitHub] [arrow] emkornfield commented on a change in pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
emkornfield commented on a change in pull request #11302: URL: https://github.com/apache/arrow/pull/11302#discussion_r723688517 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -321,6 +323,14 @@ void InitPandasStaticData() { pandas_NA = ref.obj(); } + // Import Da

[GitHub] [arrow] emkornfield commented on a change in pull request #11302: ARROW-13806: [C++][Python] Add support for new MonthDayNano Interval Type

2021-10-06 Thread GitBox
emkornfield commented on a change in pull request #11302: URL: https://github.com/apache/arrow/pull/11302#discussion_r723687949 ## File path: cpp/src/arrow/python/arrow_to_python_internal.h ## @@ -0,0 +1,52 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// o

[GitHub] [arrow] kou edited a comment on issue #11342: Can arrow be used to load a parquet file from an Azure blob store (dbfs://) using Arrow ruby

2021-10-06 Thread GitBox
kou edited a comment on issue #11342: URL: https://github.com/apache/arrow/issues/11342#issuecomment-937162999 We don't implement Azure blob storage support yet. You can track it on ARROW-2034 .

[GitHub] [arrow] kou commented on issue #11342: Can arrow be used to load a parquet file from an Azure blob store (dbfs://) using Arrow ruby

2021-10-06 Thread GitBox
kou commented on issue #11342: URL: https://github.com/apache/arrow/issues/11342#issuecomment-937162999 We didn't implement Azure blob storage support yet. You can track it on ARROW-2034 .

[GitHub] [arrow] kou closed issue #11342: Can arrow be used to load a parquet file from an Azure blob store (dbfs://) using Arrow ruby

2021-10-06 Thread GitBox
kou closed issue #11342: URL: https://github.com/apache/arrow/issues/11342

[GitHub] [arrow] kou commented on pull request #10991: ARROW-13572: [C++][Datasets] Add ORC support to Datasets API

2021-10-06 Thread GitBox
kou commented on pull request #10991: URL: https://github.com/apache/arrow/pull/10991#issuecomment-937157675 Ah, #11343 fixes something. I don't touch the opened issue.

[GitHub] [arrow] kou commented on pull request #10991: ARROW-13572: [C++][Datasets] Add ORC support to Datasets API

2021-10-06 Thread GitBox
kou commented on pull request #10991: URL: https://github.com/apache/arrow/pull/10991#issuecomment-937153181 Thanks for confirming this. I don't know why but java-jars build failure has gone since 2021-10-06 daily build...: https://lists.apache.org/thread.html/r4a20c19088d6f21f961494d54

[GitHub] [arrow] kou closed pull request #11300: ARROW-14207: [C++] Add missing dependencies for bundled Boost targets

2021-10-06 Thread GitBox
kou closed pull request #11300: URL: https://github.com/apache/arrow/pull/11300

[GitHub] [arrow] kou commented on pull request #11300: ARROW-14207: [C++] Add missing dependencies for bundled Boost targets

2021-10-06 Thread GitBox
kou commented on pull request #11300: URL: https://github.com/apache/arrow/pull/11300#issuecomment-937146813 +1

[GitHub] [arrow] kou commented on pull request #11349: ARROW-14240: [C++] Fix wrong nlohmann-json header path

2021-10-06 Thread GitBox
kou commented on pull request #11349: URL: https://github.com/apache/arrow/pull/11349#issuecomment-937144898 FYI: @coryan

[GitHub] [arrow] github-actions[bot] commented on pull request #11349: ARROW-14240: [C++] Fix wrong nlohmann-json header path

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11349: URL: https://github.com/apache/arrow/pull/11349#issuecomment-937144138

[GitHub] [arrow] kou commented on pull request #11348: ARROW-14246: [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()

2021-10-06 Thread GitBox
kou commented on pull request #11348: URL: https://github.com/apache/arrow/pull/11348#issuecomment-937138351 FYI: @coryan

[GitHub] [arrow] github-actions[bot] commented on pull request #11348: ARROW-14246: [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11348: URL: https://github.com/apache/arrow/pull/11348#issuecomment-937137837

[GitHub] [arrow] lidavidm commented on a change in pull request #11285: ARROW-13611: [C++] Scanning datasets does not enforce back pressure

2021-10-06 Thread GitBox
lidavidm commented on a change in pull request #11285: URL: https://github.com/apache/arrow/pull/11285#discussion_r723676397 ## File path: cpp/src/arrow/dataset/scanner.h ## @@ -123,6 +123,12 @@ struct ARROW_DS_EXPORT ScanOptions { /// Fragment-specific scan options. std:

[GitHub] [arrow] kou opened a new pull request #11348: ARROW-14246: [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()

2021-10-06 Thread GitBox
kou opened a new pull request #11348: URL: https://github.com/apache/arrow/pull/11348 Error message: CMake Error at /usr/share/cmake-3.18/Modules/FindCURL.cmake:163 (message): CURL: Required feature 7.47.0 is not found Call Stack (most recent call first):

[GitHub] [arrow] westonpace commented on a change in pull request #11285: ARROW-13611: [C++] Scanning datasets does not enforce back pressure

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11285: URL: https://github.com/apache/arrow/pull/11285#discussion_r723674829 ## File path: cpp/src/arrow/dataset/scanner_test.cc ## @@ -1025,13 +1035,20 @@ TEST_F(TestBackpressure, ScanBatchesUnordered) { EXPECT_OK_AND_ASSIGN

[GitHub] [arrow] westonpace commented on a change in pull request #11285: ARROW-13611: [C++] Scanning datasets does not enforce back pressure

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11285: URL: https://github.com/apache/arrow/pull/11285#discussion_r723673903 ## File path: cpp/src/arrow/dataset/scanner.h ## @@ -123,6 +123,12 @@ struct ARROW_DS_EXPORT ScanOptions { /// Fragment-specific scan options. st

[GitHub] [arrow-datafusion] alamb commented on pull request #1072: Expose a static object store registry

2021-10-06 Thread GitBox
alamb commented on pull request #1072: URL: https://github.com/apache/arrow-datafusion/pull/1072#issuecomment-937128950 > I am not really happy about this solution either, so any alternative solution is welcome! I think the alternative would be to thread the ObjectStoreRegistry on so

[GitHub] [arrow] westonpace commented on a change in pull request #11285: ARROW-13611: [C++] Scanning datasets does not enforce back pressure

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11285: URL: https://github.com/apache/arrow/pull/11285#discussion_r723663091 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -341,6 +341,9 @@ class ARROW_EXPORT ThreadPool : public Executor { // tasks are finished. St

[GitHub] [arrow] mpeterson-p4 commented on pull request #11324: ARROW-14228: [R] Allow for creation of nullable fields

2021-10-06 Thread GitBox
mpeterson-p4 commented on pull request #11324: URL: https://github.com/apache/arrow/pull/11324#issuecomment-937100414 Thanks for the feedback - removed that default value, and added a test to check that nullable is set correctly, and that equality works as expected.

[GitHub] [arrow-datafusion] alamb opened a new pull request #1075: Add a LogicalPlanBuilder::schema() function

2021-10-06 Thread GitBox
alamb opened a new pull request #1075: URL: https://github.com/apache/arrow-datafusion/pull/1075 # Which issue does this PR close? Resolves https://github.com/apache/arrow-datafusion/issues/1074 # Rationale for this change I have code in IOx (https://github.com/influxdata/i

[GitHub] [arrow] westonpace commented on a change in pull request #11285: ARROW-13611: [C++] Scanning datasets does not enforce back pressure

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11285: URL: https://github.com/apache/arrow/pull/11285#discussion_r723662097 ## File path: cpp/src/arrow/dataset/scanner.h ## @@ -123,6 +123,12 @@ struct ARROW_DS_EXPORT ScanOptions { /// Fragment-specific scan options. st

[GitHub] [arrow] lidavidm commented on a change in pull request #11285: ARROW-13611: [C++] Scanning datasets does not enforce back pressure

2021-10-06 Thread GitBox
lidavidm commented on a change in pull request #11285: URL: https://github.com/apache/arrow/pull/11285#discussion_r723660021 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -341,6 +341,9 @@ class ARROW_EXPORT ThreadPool : public Executor { // tasks are finished. Stat

[GitHub] [arrow] vikasmalhotra08 opened a new issue #11347: How to read nested fields when using read table to read a parquet file in PyArrow?

2021-10-06 Thread GitBox
vikasmalhotra08 opened a new issue #11347: URL: https://github.com/apache/arrow/issues/11347 Hello, Is it possible to read specific nested fields when trying to read a parquet file? I am getting an error that: ```pyarrow.lib.ArrowInvalid: Field named 'a.b' not found or not unique

[GitHub] [arrow-datafusion] alamb opened a new issue #1074: Make a `LogicalPlanBuilder::schema()` function

2021-10-06 Thread GitBox
alamb opened a new issue #1074: URL: https://github.com/apache/arrow-datafusion/issues/1074 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I have code in IOx that builds up custom plans using `LogicalPlanBuilder` and for one of

[GitHub] [arrow] nealrichardson commented on pull request #11306: ARROW-13739: [R] Support dplyr::count() and tally()

2021-10-06 Thread GitBox
nealrichardson commented on pull request #11306: URL: https://github.com/apache/arrow/pull/11306#issuecomment-937033917 Turns out we're getting quosures inside of quosures still. I'll suggest another patch, which might help for any others who are building NSE wrappers, and if that doesn't

[GitHub] [arrow-rs] taralx commented on pull request #812: parquet: enable base64 if needed

2021-10-06 Thread GitBox
taralx commented on pull request #812: URL: https://github.com/apache/arrow-rs/pull/812#issuecomment-936995837 Curses. Requires https://github.com/rust-lang/cargo/issues/5565 to hit stable first.

[GitHub] [arrow-rs] taralx closed pull request #812: parquet: enable base64 if needed

2021-10-06 Thread GitBox
taralx closed pull request #812: URL: https://github.com/apache/arrow-rs/pull/812

[GitHub] [arrow] westonpace commented on a change in pull request #11017: ARROW-13542: [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11017: URL: https://github.com/apache/arrow/pull/11017#discussion_r723603737 ## File path: cpp/src/arrow/compute/exec/plan_test.cc ## @@ -374,6 +374,108 @@ TEST(ExecPlanExecution, SourceSinkError) { Finishes(Raise

[GitHub] [arrow] westonpace commented on a change in pull request #11017: ARROW-13542: [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11017: URL: https://github.com/apache/arrow/pull/11017#discussion_r723602579 ## File path: cpp/src/arrow/compute/exec/plan_test.cc ## @@ -374,6 +374,108 @@ TEST(ExecPlanExecution, SourceSinkError) { Finishes(Raise

[GitHub] [arrow] lidavidm commented on a change in pull request #11017: ARROW-13542: [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk

2021-10-06 Thread GitBox
lidavidm commented on a change in pull request #11017: URL: https://github.com/apache/arrow/pull/11017#discussion_r723601470 ## File path: cpp/src/arrow/compute/exec/plan_test.cc ## @@ -374,6 +374,108 @@ TEST(ExecPlanExecution, SourceSinkError) { Finishes(Raises(

[GitHub] [arrow] westonpace commented on a change in pull request #11017: ARROW-13542: [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk

2021-10-06 Thread GitBox
westonpace commented on a change in pull request #11017: URL: https://github.com/apache/arrow/pull/11017#discussion_r723599186 ## File path: cpp/src/arrow/util/future.h ## @@ -840,6 +840,17 @@ inline Future<>::Future(Status s) : Future(internal::Empty::ToResult(std::move(s AR

[GitHub] [arrow] github-actions[bot] commented on pull request #11346: ARROW-14063: [R] open_dataset() does not work on CSVs without header rows

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11346: URL: https://github.com/apache/arrow/pull/11346#issuecomment-936892657 https://issues.apache.org/jira/browse/ARROW-14063

[GitHub] [arrow-datafusion] rdettai commented on pull request #1072: Expose a static object store registry

2021-10-06 Thread GitBox
rdettai commented on pull request #1072: URL: https://github.com/apache/arrow-datafusion/pull/1072#issuecomment-936866927 > Could you elaborate on the root cause for requiring this static? The problem is mainly present in Ballista: - you need the `ObjectStore` in the `ExecutionPlan

[GitHub] [arrow] ursabot edited a comment on pull request #11330: ARROW-14224: [C++] Try to reduce build time/memory usage

2021-10-06 Thread GitBox
ursabot edited a comment on pull request #11330: URL: https://github.com/apache/arrow/pull/11330#issuecomment-936651389 Benchmark runs are scheduled for baseline = 7766c2feb64f9008a863f2aa3fab79f81e11fe38 and contender = 6a5ff9150f11159c9c9c45b862c57da588c79e75. Results will be available a

[GitHub] [arrow] tachyonwill commented on pull request #11281: PARQUET-2067: [C++][Parquet] Fix Parquet null count stats for enclosing null lists

2021-10-06 Thread GitBox
tachyonwill commented on pull request #11281: URL: https://github.com/apache/arrow/pull/11281#issuecomment-936801035 > @tachyonwill Do you have an id on https://issues.apache.org/ so that this issue can be assigned to you? willb_google

[GitHub] [arrow] bkmgit commented on pull request #11340: ARROW-14229: [C++] WIP Bump versions of bundled dependencies

2021-10-06 Thread GitBox
bkmgit commented on pull request #11340: URL: https://github.com/apache/arrow/pull/11340#issuecomment-936768381 @jonkeane Thanks. As indicated in https://issues.apache.org/jira/browse/ARROW-6407 would you also consider having a mirror of https://github.com/ursa-labs/thirdparty on [OSDN]

[GitHub] [arrow] github-actions[bot] commented on pull request #11345: MINOR: [C++] Avoid exposing arrow_vendored::date in public headers

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11345: URL: https://github.com/apache/arrow/pull/11345#issuecomment-936765725 Revision: 4acc5353c2b5e40b54c8907aaac96083fca71fa8 Submitted crossbow builds: [ursacomputing/crossbow @ actions-901](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] pitrou commented on pull request #11345: MINOR: [C++] Avoid exposing arrow_vendored::date in public headers

2021-10-06 Thread GitBox
pitrou commented on pull request #11345: URL: https://github.com/apache/arrow/pull/11345#issuecomment-936763645 @github-actions crossbow submit -g cpp

[GitHub] [arrow] pitrou opened a new pull request #11345: MINOR: [C++] Avoid exposing arrow_vendored::date in public headers

2021-10-06 Thread GitBox
pitrou opened a new pull request #11345: URL: https://github.com/apache/arrow/pull/11345 This may also reduce compilation cost slightly.

[GitHub] [arrow] github-actions[bot] commented on pull request #11343: ARROW-14241: [C++][Java][CI] Fix java-jars build

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11343: URL: https://github.com/apache/arrow/pull/11343#issuecomment-936706913 Revision: 4bb541a24716a4e517689daacbd67ae098db8f34 Submitted crossbow builds: [ursacomputing/crossbow @ actions-900](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] pitrou commented on pull request #11343: ARROW-14241: [C++][Java][CI] Fix java-jars build

2021-10-06 Thread GitBox
pitrou commented on pull request #11343: URL: https://github.com/apache/arrow/pull/11343#issuecomment-936704999 @github-actions crossbow submit java-jars

[GitHub] [arrow] pitrou commented on a change in pull request #11309: ARROW-13227: [Documentation][Compute] Document ExecNode

2021-10-06 Thread GitBox
pitrou commented on a change in pull request #11309: URL: https://github.com/apache/arrow/pull/11309#discussion_r723491214 ## File path: docs/source/cpp/compute.rst ## @@ -50,6 +50,8 @@ both array (chunked or not) and scalar inputs, however some will mandate either. For exam

[GitHub] [arrow] github-actions[bot] commented on pull request #11344: ARROW-14156: [C++] Properly synthesize validity buffer in StructArray::Flatten

2021-10-06 Thread GitBox
github-actions[bot] commented on pull request #11344: URL: https://github.com/apache/arrow/pull/11344#issuecomment-936688037

[GitHub] [arrow] lidavidm opened a new pull request #11344: ARROW-14156: [C++] Properly synthesize validity buffer in StructArray::Flatten

2021-10-06 Thread GitBox
lidavidm opened a new pull request #11344: URL: https://github.com/apache/arrow/pull/11344 We were copying the parent's validity buffer, but if the child has an offset, we need a validity buffer that's properly offset as well.
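
The offset problem described in this pull request can be illustrated with a small, self-contained sketch. This is not the Arrow C++ code from the PR: `get_bit`, `pack_bits` and `realign_validity` are hypothetical helpers, and the bitmaps are plain Python `bytes` using least-significant-bit-first ordering. The sketch only shows why a parent's validity bitmap cannot be reused unchanged when the child reads its buffers starting at a non-zero offset.

```python
def get_bit(bitmap: bytes, i: int) -> bool:
    """Read bit i of a least-significant-bit-first bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def pack_bits(bits) -> bytes:
    """Pack a sequence of booleans into an LSB-first bitmap."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def realign_validity(parent_bitmap: bytes, parent_offset: int,
                     child_offset: int, length: int) -> bytes:
    """Hypothetical helper: build a bitmap whose bit (child_offset + i)
    equals the parent's bit (parent_offset + i) for every slot i."""
    bits = [False] * (child_offset + length)
    for i in range(length):
        bits[child_offset + i] = get_bit(parent_bitmap, parent_offset + i)
    return pack_bits(bits)


# Parent struct: offset 0, three slots, the middle slot is null.
parent_bitmap = pack_bits([True, False, True])

# Assumed child layout: its three values start at offset 2 in its buffers.
child_offset, length = 2, 3

# Naive approach: reuse the parent's bitmap, but read it at the child's offset.
naive = [get_bit(parent_bitmap, child_offset + i) for i in range(length)]

# Realigned approach: bit (child_offset + i) mirrors the parent's slot i.
fixed_bitmap = realign_validity(parent_bitmap, 0, child_offset, length)
fixed = [get_bit(fixed_bitmap, child_offset + i) for i in range(length)]
```

In this toy layout the naive read marks the last slot as null instead of the middle one, while the realigned bitmap reproduces the parent's validity slot by slot. Whether the actual fix also combines the result with the child's own validity bits is not stated in the snippet above, so the sketch leaves that out.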

[GitHub] [arrow] jonkeane edited a comment on pull request #11340: ARROW-14229: [C++] WIP Bump versions of bundled dependencies

2021-10-06 Thread GitBox
jonkeane edited a comment on pull request #11340: URL: https://github.com/apache/arrow/pull/11340#issuecomment-936677294 I triggered our offline-r build and it failed for the reasons stated in the ticket (we need to upload the new source(s)). But once that's done it would be good to re-reu

[GitHub] [arrow] jonkeane commented on pull request #11340: ARROW-14229: [C++] WIP Bump versions of bundled dependencies

2021-10-06 Thread GitBox
jonkeane commented on pull request #11340: URL: https://github.com/apache/arrow/pull/11340#issuecomment-936677294 I triggered our offline-r build and it failed for the reasons stated in the ticket (we need to upload the new source(s)). But once that's done it would be good to re-run this

[GitHub] [arrow] ursabot edited a comment on pull request #11284: ARROW-14187: [Python] Performance Regression on file-read benchmark

2021-10-06 Thread GitBox
ursabot edited a comment on pull request #11284: URL: https://github.com/apache/arrow/pull/11284#issuecomment-936410780 Benchmark runs are scheduled for baseline = 3080d25125266cd11d6645fdfcf570e40fb5376c and contender = 79192025c2423e019a33eecd5e150688a19d6be0. Results will be available a

[GitHub] [arrow] ursabot commented on pull request #11330: ARROW-14224: [C++] Try to reduce build time/memory usage

2021-10-06 Thread GitBox
ursabot commented on pull request #11330: URL: https://github.com/apache/arrow/pull/11330#issuecomment-936651389 Benchmark runs are scheduled for baseline = 7766c2feb64f9008a863f2aa3fab79f81e11fe38 and contender = 6a5ff9150f11159c9c9c45b862c57da588c79e75. Results will be available as each

[GitHub] [arrow] pitrou commented on pull request #11330: ARROW-14224: [C++] Try to reduce build time/memory usage

2021-10-06 Thread GitBox
pitrou commented on pull request #11330: URL: https://github.com/apache/arrow/pull/11330#issuecomment-936650469 @ursabot please benchmark lang=C++
