[GitHub] [arrow] cyb70289 commented on a change in pull request #8524: ARROW-10345: [C++][Compute] Fix NaN error in sorting and topn kernels

2020-11-09 Thread GitBox
cyb70289 commented on a change in pull request #8524: URL: https://github.com/apache/arrow/pull/8524#discussion_r520353233 ## File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc ## @@ -243,25 +258,23 @@ TYPED_TEST(TestSortToIndicesKernelForReal, SortReal) { this->As

[GitHub] [arrow] cyb70289 commented on a change in pull request #8524: ARROW-10345: [C++][Compute] Fix NaN error in sorting and topn kernels

2020-11-09 Thread GitBox
cyb70289 commented on a change in pull request #8524: URL: https://github.com/apache/arrow/pull/8524#discussion_r520352989 ## File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc ## @@ -121,6 +132,10 @@ TYPED_TEST(TestNthToIndicesForReal, Real) { this->AssertNthToInd

[GitHub] [arrow] emkornfield commented on a change in pull request #8516: PARQUET-1935: [C++] Fix bug in WriteBatchSpaced

2020-11-09 Thread GitBox
emkornfield commented on a change in pull request #8516: URL: https://github.com/apache/arrow/pull/8516#discussion_r520337463 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1190,7 +1192,7 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter

[GitHub] [arrow] nevi-me commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

2020-11-09 Thread GitBox
nevi-me commented on pull request #8200: URL: https://github.com/apache/arrow/pull/8200#issuecomment-724504342 @jorgecarleitao we're down to 3 failures. ```rust thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', arrow/src/array/equal/variable_size.rs

[GitHub] [arrow] kcyea commented on issue #1510: Question: Can arrow be used to load a parquet file from an Azure blob store?

2020-11-09 Thread GitBox
kcyea commented on issue #1510: URL: https://github.com/apache/arrow/issues/1510#issuecomment-724494811 Can we read a list of parquet files under subdirectory of a container in Azure blob storage? For example, under container "test_container", there are a list of files under subdirectory o

[GitHub] [arrow] nevi-me commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

2020-11-09 Thread GitBox
nevi-me commented on pull request #8200: URL: https://github.com/apache/arrow/pull/8200#issuecomment-724493404 > From what I have read in this PR, it looks good so far. Is there anything blocking it? @jorgecarleitao the actual integration tests not passing, are the blocker šŸ˜… T

[GitHub] [arrow] jorgecarleitao commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

2020-11-09 Thread GitBox
jorgecarleitao commented on pull request #8200: URL: https://github.com/apache/arrow/pull/8200#issuecomment-724490574 From what I have read in this PR, it looks good so far. Is there anything blocking it? This is an automate

[GitHub] [arrow] nevi-me commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

2020-11-09 Thread GitBox
nevi-me commented on pull request #8200: URL: https://github.com/apache/arrow/pull/8200#issuecomment-724489474 @jorgecarleitao @carols10cents I'm contemplating closing this, and breaking it up into smaller PRs (if this round of changes don't pass integration). I'll work on it this week in

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8611: ARROW-4804: [Rust] Parse Date32 and Date64 in CSV reader

2020-11-09 Thread GitBox
jorgecarleitao commented on a change in pull request #8611: URL: https://github.com/apache/arrow/pull/8611#discussion_r520319490 ## File path: rust/arrow/src/csv/reader.rs ## @@ -219,6 +226,35 @@ pub fn infer_schema_from_files( Schema::try_merge(&schemas) } +/// Parses

[GitHub] [arrow] nevi-me closed pull request #8543: ARROW-9361 [Rust] Move array types into their own modules

2020-11-09 Thread GitBox
nevi-me closed pull request #8543: URL: https://github.com/apache/arrow/pull/8543 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [arrow] github-actions[bot] commented on pull request #8620: ARROW-10539: [Packaging][Python] Use GitHub Actions to build wheels for Windows

2020-11-09 Thread GitBox
github-actions[bot] commented on pull request #8620: URL: https://github.com/apache/arrow/pull/8620#issuecomment-724447789 https://issues.apache.org/jira/browse/ARROW-10539 This is an automated message from the Apache Git Ser

[GitHub] [arrow] github-actions[bot] commented on pull request #8620: ARROW-10539: [Packaging][Python] Use GitHub Actions to build wheels for Windows

2020-11-09 Thread GitBox
github-actions[bot] commented on pull request #8620: URL: https://github.com/apache/arrow/pull/8620#issuecomment-724445334 Revision: 8308be07f369f47ebb694ac448035cc397839346 Submitted crossbow builds: [ursa-labs/crossbow @ actions-696](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] emkornfield commented on pull request #8589: ARROW-10493: [C++][Parquet] Fix offset lost in MaybeReplaceValidity

2020-11-09 Thread GitBox
emkornfield commented on pull request #8589: URL: https://github.com/apache/arrow/pull/8589#issuecomment-72564 I will move the slice method into parquet, expect some pushes to the branch over the next couple of hours. Th

[GitHub] [arrow] kou opened a new pull request #8620: ARROW-10539: [Packaging][Python] Use GitHub Actions to build wheels for Windows

2020-11-09 Thread GitBox
kou opened a new pull request #8620: URL: https://github.com/apache/arrow/pull/8620 Because we are banned by AppVeyor. Python 3.5 support is also removed. Because Python 3.5 reaches EOL. This is an automated message fr

[GitHub] [arrow] kou commented on pull request #8620: ARROW-10539: [Packaging][Python] Use GitHub Actions to build wheels for Windows

2020-11-09 Thread GitBox
kou commented on pull request #8620: URL: https://github.com/apache/arrow/pull/8620#issuecomment-72513 @github-actions crossbow submit wheel-win-* This is an automated message from the Apache Git Service. To respond to th

[GitHub] [arrow] cyb70289 commented on pull request #8524: ARROW-10345: [C++][Compute] Fix NaN error in sorting and topn kernels

2020-11-09 Thread GitBox
cyb70289 commented on pull request #8524: URL: https://github.com/apache/arrow/pull/8524#issuecomment-724403787 > Is "sort_indices" actually stable? The [documentation](https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions) says "non-stable". In current code, "sort_ind

[GitHub] [arrow] chrisavl commented on a change in pull request #8589: ARROW-10493: [C++][Parquet] Fix offset lost in MaybeReplaceValidity

2020-11-09 Thread GitBox
chrisavl commented on a change in pull request #8589: URL: https://github.com/apache/arrow/pull/8589#discussion_r520225236 ## File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc ## @@ -3199,6 +3200,36 @@ TEST(TestArrowReaderAdHoc, HandleDictPageOffsetZero) { TryRead

[GitHub] [arrow] vertexclique commented on pull request #8598: ARROW-10500: [Rust] Refactor bit slice, bit view iterator for array buffers

2020-11-09 Thread GitBox
vertexclique commented on pull request #8598: URL: https://github.com/apache/arrow/pull/8598#issuecomment-724344182 I have created: https://issues.apache.org/jira/browse/ARROW-10535 I can only move tests to another PR because it is a generic code and without the generic code won't compile.

[GitHub] [arrow] github-actions[bot] commented on pull request #8619: ARROW-10531: [Rust][DataFusion]: Add schema and graphviz formatting for LogicalPlans and a PlanVisitor

2020-11-09 Thread GitBox
github-actions[bot] commented on pull request #8619: URL: https://github.com/apache/arrow/pull/8619#issuecomment-724333176 https://issues.apache.org/jira/browse/ARROW-10531 This is an automated message from the Apache Git Ser

[GitHub] [arrow] vertexclique commented on a change in pull request #8609: ARROW-10513: [Rust]: ARMv7 compilation

2020-11-09 Thread GitBox
vertexclique commented on a change in pull request #8609: URL: https://github.com/apache/arrow/pull/8609#discussion_r520177140 ## File path: rust/arrow/src/array/union.rs ## @@ -698,7 +699,11 @@ mod tests { 4 * 8 * 4 * mem::size_of::(), union.get_buffe

[GitHub] [arrow] alamb commented on a change in pull request #8619: ARROW-10531: [Rust][DataFusion]: Add schema and graphviz formatting for LogicalPlans and a PlanVisitor

2020-11-09 Thread GitBox
alamb commented on a change in pull request #8619: URL: https://github.com/apache/arrow/pull/8619#discussion_r520176174 ## File path: rust/datafusion/src/logical_plan/mod.rs ## @@ -956,117 +956,570 @@ impl LogicalPlan { } } +/// Trait that implements the [Visitor +/// p

[GitHub] [arrow] alamb commented on pull request #8619: ARROW-10531: [Rust][DataFusion]: Add schema and graphviz formatting for LogicalPlans and a PlanVisitor

2020-11-09 Thread GitBox
alamb commented on pull request #8619: URL: https://github.com/apache/arrow/pull/8619#issuecomment-724330317 I am sorry for the PR size -- but it is mostly comments and tests This is an automated message from the Apache Git S

[GitHub] [arrow] alamb opened a new pull request #8619: ARROW-10531: [Rust][DataFusion]: Add schema and graphviz formatting for LogicalPlans and a PlanVisitor

2020-11-09 Thread GitBox
alamb opened a new pull request #8619: URL: https://github.com/apache/arrow/pull/8619 # Rationale: I have been tracking down potential issues DataFusion for my work project, and I have found myself wanting to print out the state of the logical_plan several times. The existing debug form

[GitHub] [arrow] vertexclique commented on pull request #8611: ARROW-4804: [Rust] Parse Date32 and Date64 in CSV reader

2020-11-09 Thread GitBox
vertexclique commented on pull request #8611: URL: https://github.com/apache/arrow/pull/8611#issuecomment-724324997 Truly it is. Otherwise platform native primitives mightn't fit into the return types. This is an automated m

[GitHub] [arrow] pitrou closed pull request #8529: ARROW-5394: [C++][Benchmark] IsIn and IndexIn benchmark for integer and string types

2020-11-09 Thread GitBox
pitrou closed pull request #8529: URL: https://github.com/apache/arrow/pull/8529 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] kou commented on pull request #8386: ARROW-10224: [Python] Build, test, and support Python 3.9

2020-11-09 Thread GitBox
kou commented on pull request #8386: URL: https://github.com/apache/arrow/pull/8386#issuecomment-724286962 Oh, I didn't know about it. I thought we need to use the same Visual Studio that is used for building Python. I'll try using Visual Studio 2017. ---

[GitHub] [arrow] pitrou commented on pull request #8386: ARROW-10224: [Python] Build, test, and support Python 3.9

2020-11-09 Thread GitBox
pitrou commented on pull request #8386: URL: https://github.com/apache/arrow/pull/8386#issuecomment-724276115 Also, in case you are worrying about ABI issues, they are gone with the [UCRT](https://docs.microsoft.com/en-us/cpp/porting/upgrade-your-code-to-the-universal-crt?view=msvc-160).

[GitHub] [arrow] pitrou commented on pull request #8386: ARROW-10224: [Python] Build, test, and support Python 3.9

2020-11-09 Thread GitBox
pitrou commented on pull request #8386: URL: https://github.com/apache/arrow/pull/8386#issuecomment-724274421 I'll note that [cibuildwheel](https://github.com/joerick/cibuildwheel) supports building Windows wheels on GHA. The [setup](https://cibuildwheel.readthedocs.io/en/stable/setup/#gi

[GitHub] [arrow] pitrou commented on pull request #8386: ARROW-10224: [Python] Build, test, and support Python 3.9

2020-11-09 Thread GitBox
pitrou commented on pull request #8386: URL: https://github.com/apache/arrow/pull/8386#issuecomment-724271692 Hmm... why is VS2015 required? This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] kou commented on pull request #8386: ARROW-10224: [Python] Build, test, and support Python 3.9

2020-11-09 Thread GitBox
kou commented on pull request #8386: URL: https://github.com/apache/arrow/pull/8386#issuecomment-724271156 Sorry... We can't use GitHub Actions for building wheels for Python on Windows. It requires Visual Studio 2015 but GitHub Actions provides only 2017 and 2019. ---

[GitHub] [arrow] github-actions[bot] commented on pull request #8618: ARROW-10530: [R] Optionally use distro package in linuxlibs.R

2020-11-09 Thread GitBox
github-actions[bot] commented on pull request #8618: URL: https://github.com/apache/arrow/pull/8618#issuecomment-724268622 https://issues.apache.org/jira/browse/ARROW-10530 This is an automated message from the Apache Git Ser

[GitHub] [arrow] nealrichardson opened a new pull request #8618: ARROW-10530: [R] Optionally use distro package in linuxlibs.R

2020-11-09 Thread GitBox
nealrichardson opened a new pull request #8618: URL: https://github.com/apache/arrow/pull/8618 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

[GitHub] [arrow] nevi-me commented on pull request #8598: ARROW-10500: [Rust] Refactor bit slice, bit view iterator for array buffers

2020-11-09 Thread GitBox
nevi-me commented on pull request #8598: URL: https://github.com/apache/arrow/pull/8598#issuecomment-724263021 A few weeks/months ago when I was trying to work on bit slicing, `bit_vec` was recommended over at Reddit, so I think it's generally a good dependency for us to carry along, as it

[GitHub] [arrow] nevi-me commented on a change in pull request #8609: ARROW-10513: [Rust]: ARMv7 compilation

2020-11-09 Thread GitBox
nevi-me commented on a change in pull request #8609: URL: https://github.com/apache/arrow/pull/8609#discussion_r520092924 ## File path: rust/arrow/src/array/union.rs ## @@ -698,7 +699,11 @@ mod tests { 4 * 8 * 4 * mem::size_of::(), union.get_buffer_mem

[GitHub] [arrow] bkietz commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
bkietz commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r520087424 ## File path: cpp/src/arrow/util/future_test.cc ## @@ -330,34 +347,58 @@ TEST(FutureSyncTest, MoveOnlyDataType) { } } -TEST(FutureSyncTest, void) { +TE

[GitHub] [arrow] nealrichardson commented on a change in pull request #8579: ARROW-10481: [R] Bindings to add, remove, replace Table columns

2020-11-09 Thread GitBox
nealrichardson commented on a change in pull request #8579: URL: https://github.com/apache/arrow/pull/8579#discussion_r520073139 ## File path: r/R/table.R ## @@ -254,6 +257,68 @@ names.Table <- function(x) x$ColumnNames() #' @export `[[.Table` <- `[[.RecordBatch` +#' @expor

[GitHub] [arrow] nealrichardson commented on a change in pull request #8579: ARROW-10481: [R] Bindings to add, remove, replace Table columns

2020-11-09 Thread GitBox
nealrichardson commented on a change in pull request #8579: URL: https://github.com/apache/arrow/pull/8579#discussion_r520072414 ## File path: r/tests/testthat/test-Table.R ## @@ -179,10 +183,34 @@ test_that("[[<- assignment", { # can use $ tab$new <- NULL expect_null(

[GitHub] [arrow] pitrou commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r520062916 ## File path: cpp/src/arrow/util/future_test.cc ## @@ -330,34 +347,58 @@ TEST(FutureSyncTest, MoveOnlyDataType) { } } -TEST(FutureSyncTest, void) { +TE

[GitHub] [arrow] jorisvandenbossche commented on pull request #8557: ARROW-10433 [Python] Swopped the conditions for checking for fsspec filesystems

2020-11-09 Thread GitBox
jorisvandenbossche commented on pull request #8557: URL: https://github.com/apache/arrow/pull/8557#issuecomment-724225124 I was going to propose to actually completely remove the `S3FSWrapper` call here (instead of only moving it after the other check). This class if from before my

[GitHub] [arrow] pitrou commented on pull request #8529: ARROW-5394: [C++][Benchmark] IsIn and IndexIn benchmark for integer and string types

2020-11-09 Thread GitBox
pitrou commented on pull request #8529: URL: https://github.com/apache/arrow/pull/8529#issuecomment-724216961 Ok, it seems the git situation is still a bit messed up, I'll fix it up :-) This is an automated message from the A

[GitHub] [arrow] github-actions[bot] commented on pull request #8617: ARROW-10525: [C++] Fix crash on unsupported IPC stream

2020-11-09 Thread GitBox
github-actions[bot] commented on pull request #8617: URL: https://github.com/apache/arrow/pull/8617#issuecomment-724195841 https://issues.apache.org/jira/browse/ARROW-10525 This is an automated message from the Apache Git Ser

[GitHub] [arrow] pitrou closed pull request #8511: ARROW-7531: [C++] Reduce header inclusion cost slightly

2020-11-09 Thread GitBox
pitrou closed pull request #8511: URL: https://github.com/apache/arrow/pull/8511 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] pitrou opened a new pull request #8617: ARROW-10525: [C++] Fix crash on unsupported IPC stream

2020-11-09 Thread GitBox
pitrou opened a new pull request #8617: URL: https://github.com/apache/arrow/pull/8617 Detect situations where we have a delta dictionary with an unresolved nested dictionary. Found by OSS-Fuzz. Should fix the following issue: - https://bugs.chromium.org/p/oss-fuzz/issues/detai

[GitHub] [arrow] david1437 commented on pull request #8529: ARROW-5394: [C++][Benchmark] IsIn and IndexIn benchmark for integer and string types

2020-11-09 Thread GitBox
david1437 commented on pull request #8529: URL: https://github.com/apache/arrow/pull/8529#issuecomment-724171027 Not too proficient with git so hopefully, I rebased correctly. Added changes as you recommended. This is an au

[GitHub] [arrow] david1437 commented on a change in pull request #8529: ARROW-5394: [C++][Benchmark] IsIn and IndexIn benchmark for integer and string types

2020-11-09 Thread GitBox
david1437 commented on a change in pull request #8529: URL: https://github.com/apache/arrow/pull/8529#discussion_r520002950 ## File path: cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc ## @@ -0,0 +1,143 @@ +// Licensed to the Apache Software Foundation (ASF) under

[GitHub] [arrow] github-actions[bot] commented on pull request #8616: ARROW-10522: [R] Allow rename Table and RecordBatch columns with names()

2020-11-09 Thread GitBox
github-actions[bot] commented on pull request #8616: URL: https://github.com/apache/arrow/pull/8616#issuecomment-724144351 https://issues.apache.org/jira/browse/ARROW-10522 This is an automated message from the Apache Git Ser

[GitHub] [arrow] bkietz commented on pull request #8582: ARROW-10483: [C++] Move Executor to future.h

2020-11-09 Thread GitBox
bkietz commented on pull request #8582: URL: https://github.com/apache/arrow/pull/8582#issuecomment-724137914 I think I see what you're saying; I was incorrect on the #include order logic and we can make it public again if that'd be preferable. The OO objection remains, but I'm not committ

[GitHub] [arrow] ianmcook commented on pull request #8616: ARROW-10522: [R] Allow rename Table and RecordBatch columns with names()

2020-11-09 Thread GitBox
ianmcook commented on pull request #8616: URL: https://github.com/apache/arrow/pull/8616#issuecomment-724129628 @nealrichardson PTAL This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [arrow] ianmcook opened a new pull request #8616: ARROW-10522: [R] Allow rename Table and RecordBatch columns with names()

2020-11-09 Thread GitBox
ianmcook opened a new pull request #8616: URL: https://github.com/apache/arrow/pull/8616 This allows R users to rename columns inĀ `Table` andĀ `RecordBatch` objects usingĀ `names(obj) <- value` syntax. This is an automated mes

[GitHub] [arrow] pitrou commented on pull request #8582: ARROW-10483: [C++] Move Executor to future.h

2020-11-09 Thread GitBox
pitrou commented on pull request #8582: URL: https://github.com/apache/arrow/pull/8582#issuecomment-724121436 Ok, but I still don't understand why it should make `ThreadPool` private :-) This is an automated message from the

[GitHub] [arrow] bkietz commented on pull request #8582: ARROW-10483: [C++] Move Executor to future.h

2020-11-09 Thread GitBox
bkietz commented on pull request #8582: URL: https://github.com/apache/arrow/pull/8582#issuecomment-724120890 The specific function I had in mind to write is ```c++ /// execute continuation using a specific Executor Future Future::Then(Executor* where, Continuation&& continuation) {

[GitHub] [arrow] pitrou commented on pull request #8511: ARROW-7531: [C++] Reduce header inclusion cost slightly

2020-11-09 Thread GitBox
pitrou commented on pull request #8511: URL: https://github.com/apache/arrow/pull/8511#issuecomment-724115565 Rebased, will merge if CI is green. This is an automated message from the Apache Git Service. To respond to the mes

[GitHub] [arrow] pitrou commented on pull request #8501: ARROW-10364: [Dev][Archery] Add support for semver 2.13.0

2020-11-09 Thread GitBox
pitrou commented on pull request #8501: URL: https://github.com/apache/arrow/pull/8501#issuecomment-724107751 Ping @kszucs This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [arrow] pitrou closed pull request #8523: ARROW-10325: [C++][Compute] Refine aggregate kernel registration

2020-11-09 Thread GitBox
pitrou closed pull request #8523: URL: https://github.com/apache/arrow/pull/8523 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] bkietz commented on a change in pull request #8589: ARROW-10493: [C++][Parquet] Fix offset lost in MaybeReplaceValidity

2020-11-09 Thread GitBox
bkietz commented on a change in pull request #8589: URL: https://github.com/apache/arrow/pull/8589#discussion_r519899025 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1222,12 +1223,17 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] mariusvniekerk commented on pull request #8557: ARROW-10433 [Python] Swopped the conditions for checking for fsspec filesystems

2020-11-09 Thread GitBox
mariusvniekerk commented on pull request #8557: URL: https://github.com/apache/arrow/pull/8557#issuecomment-724095124 might be useful still for s3fs<0.4 ? This is an automated message from the Apache Git Service. To respond t

[GitHub] [arrow] pitrou commented on pull request #8524: ARROW-10345: [C++][Compute] Fix NaN error in sorting and topn kernels

2020-11-09 Thread GitBox
pitrou commented on pull request #8524: URL: https://github.com/apache/arrow/pull/8524#issuecomment-724091219 Also, you'll need to update the docs :-) This is an automated message from the Apache Git Service. To respond to th

[GitHub] [arrow] pitrou commented on pull request #8524: ARROW-10345: [C++][Compute] Fix NaN error in sorting and topn kernels

2020-11-09 Thread GitBox
pitrou commented on pull request #8524: URL: https://github.com/apache/arrow/pull/8524#issuecomment-724090735 Is "sort_indices" actually stable? The [documentation](https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions) says "non-stable". ---

[GitHub] [arrow] pitrou commented on a change in pull request #8524: ARROW-10345: [C++][Compute] Fix NaN error in sorting and topn kernels

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8524: URL: https://github.com/apache/arrow/pull/8524#discussion_r519904982 ## File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc ## @@ -243,25 +258,23 @@ TYPED_TEST(TestSortToIndicesKernelForReal, SortReal) { this->Asse

[GitHub] [arrow] pitrou commented on a change in pull request #8524: ARROW-10345: [C++][Compute] Fix NaN error in sorting and topn kernels

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8524: URL: https://github.com/apache/arrow/pull/8524#discussion_r519903359 ## File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc ## @@ -121,6 +132,10 @@ TYPED_TEST(TestNthToIndicesForReal, Real) { this->AssertNthToIndic

[GitHub] [arrow] pitrou commented on a change in pull request #8529: ARROW-5394: [C++][Benchmark] IsIn and IndexIn benchmark for integer and string types

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8529: URL: https://github.com/apache/arrow/pull/8529#discussion_r519891890 ## File path: cpp/src/arrow/compute/kernels/scalar_set_lookup_benchmark.cc ## @@ -0,0 +1,143 @@ +// Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [arrow] pitrou commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519890573 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -64,14 +64,39 @@ namespace detail { // Make sure that both functions returning T and Result can be call

[GitHub] [arrow] pitrou commented on pull request #8582: ARROW-10483: [C++] Move Executor to future.h

2020-11-09 Thread GitBox
pitrou commented on pull request #8582: URL: https://github.com/apache/arrow/pull/8582#issuecomment-724077302 Hmm... I'm afraid I don't follow the explanation. Certainly `future.h` doesn't include `thread_pool.h`, so what is wrong with `thread_pool.h` including `future.h`? --

[GitHub] [arrow] pitrou commented on a change in pull request #8516: PARQUET-1935: [C++] Fix bug in WriteBatchSpaced

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8516: URL: https://github.com/apache/arrow/pull/8516#discussion_r519886116 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1190,7 +1192,7 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] bkietz commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
bkietz commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519888335 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -64,14 +64,39 @@ namespace detail { // Make sure that both functions returning T and Result can be call

[GitHub] [arrow] bkietz commented on pull request #8582: ARROW-10483: [C++] Move Executor to future.h

2020-11-09 Thread GitBox
bkietz commented on pull request #8582: URL: https://github.com/apache/arrow/pull/8582#issuecomment-724072730 > I'm roughly ok with making ThreadPool private, but why is it necessary to do so? Currently `thread_pool.h` includes `future.h` because `Executor::Submit` depends on a comp

[GitHub] [arrow] pitrou commented on pull request #8542: ARROW-10407: [C++] Add BasicDecimal256 Division Support

2020-11-09 Thread GitBox
pitrou commented on pull request #8542: URL: https://github.com/apache/arrow/pull/8542#issuecomment-724071921 I didn't find any performance regression on an AMD Zen 2 CPU. This is an automated message from the Apache Git Serv

[GitHub] [arrow] pitrou commented on a change in pull request #8542: ARROW-10407: [C++] Add BasicDecimal256 Division Support

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8542: URL: https://github.com/apache/arrow/pull/8542#discussion_r519873678 ## File path: cpp/src/arrow/util/basic_decimal.cc ## @@ -395,33 +395,49 @@ BasicDecimal128& BasicDecimal128::operator*=(const BasicDecimal128& right) { r

[GitHub] [arrow] pitrou commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519871842 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -64,14 +64,39 @@ namespace detail { // Make sure that both functions returning T and Result can be call

[GitHub] [arrow] bkietz commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
bkietz commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519870446 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -64,14 +64,39 @@ namespace detail { // Make sure that both functions returning T and Result can be call

[GitHub] [arrow] pitrou closed pull request #8568: ARROW-10346: [Python] Ensure tests aren't affected by user-supplied AWS config

2020-11-09 Thread GitBox
pitrou closed pull request #8568: URL: https://github.com/apache/arrow/pull/8568 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] bkietz commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
bkietz commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519864378 ## File path: cpp/src/arrow/util/future.h ## @@ -265,28 +188,24 @@ class Future { } /// \brief Wait for the Future to complete and return its Result

[GitHub] [arrow] bkietz commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
bkietz commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519864564 ## File path: cpp/src/arrow/util/future.h ## @@ -310,63 +229,66 @@ class Future { return impl_->Wait(seconds); } - /// If a Result holds an error

[GitHub] [arrow] pitrou commented on pull request #8582: ARROW-10483: [C++] Move Executor to future.h

2020-11-09 Thread GitBox
pitrou commented on pull request #8582: URL: https://github.com/apache/arrow/pull/8582#issuecomment-724055871 I'm roughly ok with making ThreadPool private, but why is it necessary to do so? > Shutdown is only referenced in tests and benchmarks, is in necessary? In some cases

[GitHub] [arrow] pitrou commented on pull request #8557: ARROW-10433 [Python] Swopped the conditions for checking for fsspec filesystems

2020-11-09 Thread GitBox
pitrou commented on pull request #8557: URL: https://github.com/apache/arrow/pull/8557#issuecomment-724052440 So what use is the `S3FSWrapper` after this PR? This is an automated message from the Apache Git Service. To respon

[GitHub] [arrow] github-actions[bot] commented on pull request #8615: ARROW-10519: [Python] Fix deadlock when importing pandas from several threads

2020-11-09 Thread GitBox
github-actions[bot] commented on pull request #8615: URL: https://github.com/apache/arrow/pull/8615#issuecomment-724050695 https://issues.apache.org/jira/browse/ARROW-10519 This is an automated message from the Apache Git Ser

[GitHub] [arrow] pitrou commented on pull request #8585: ARROW-10475: [C++][FlightRPC] handle IPv6 hosts

2020-11-09 Thread GitBox
pitrou commented on pull request #8585: URL: https://github.com/apache/arrow/pull/8585#issuecomment-724049802 > There's also already tests for the current behavior (IPv6 host gets returned without brackets) Those tests are correct, but we can add a function to encode back the host (

[GitHub] [arrow] jonkeane commented on pull request #8579: ARROW-10481: [R] Bindings to add, remove, replace Table columns

2020-11-09 Thread GitBox
jonkeane commented on pull request #8579: URL: https://github.com/apache/arrow/pull/8579#issuecomment-724047954 Ok, I opened https://issues.apache.org/jira/browse/ARROW-10520 and https://issues.apache.org/jira/browse/ARROW-10521 for the two follow ons, added some more NA and type checking,

[GitHub] [arrow] pitrou commented on pull request #8588: ARROW-4637: [Python] Must lock on import, to avoid deadlock due to circular imports

2020-11-09 Thread GitBox
pitrou commented on pull request #8588: URL: https://github.com/apache/arrow/pull/8588#issuecomment-724044398 Closed in favour of #8615. This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [arrow] pitrou opened a new pull request #8615: ARROW-10519: [Python] Fix deadlock when importing pandas from several threads

2020-11-09 Thread GitBox
pitrou opened a new pull request #8615: URL: https://github.com/apache/arrow/pull/8615 Fix a lock ordering-induced deadlock between the Python GIL and the `std::once_flag` lock used in `arrow::py::internal::InitPandasStaticData`. Thread A: * takes the GIL * calls `arrow::py:

[GitHub] [arrow] pitrou closed pull request #8588: ARROW-4637: [Python] Must lock on import, to avoid deadlock due to circular imports

2020-11-09 Thread GitBox
pitrou closed pull request #8588: URL: https://github.com/apache/arrow/pull/8588 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] pitrou commented on pull request #8588: ARROW-4637: [Python] Must lock on import, to avoid deadlock due to circular imports

2020-11-09 Thread GitBox
pitrou commented on pull request #8588: URL: https://github.com/apache/arrow/pull/8588#issuecomment-724037836 Opened ARROW-10519. This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [arrow] pitrou commented on pull request #8588: ARROW-4637: [Python] Must lock on import, to avoid deadlock due to circular imports

2020-11-09 Thread GitBox
pitrou commented on pull request #8588: URL: https://github.com/apache/arrow/pull/8588#issuecomment-724031162 I've got a reproducer. The problem is actually more complex. We've got a deadlock between two threads: Thread A: * takes the GIL * calls `arrow::py::internal::InitPanda

[GitHub] [arrow] pitrou commented on pull request #8588: ARROW-4637: [Python] Must lock on import, to avoid deadlock due to circular imports

2020-11-09 Thread GitBox
pitrou commented on pull request #8588: URL: https://github.com/apache/arrow/pull/8588#issuecomment-724014295 @malthe Could you please open a separate JIRA for this? This is an automated message from the Apache Git Service. T

[GitHub] [arrow] pitrou commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519813747 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -64,14 +64,39 @@ namespace detail { // Make sure that both functions returning T and Result can be call

[GitHub] [arrow] pitrou commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519809843 ## File path: cpp/src/arrow/util/future.h ## @@ -265,28 +188,24 @@ class Future { } /// \brief Wait for the Future to complete and return its Result

[GitHub] [arrow] pitrou commented on a change in pull request #8591: ARROW-10484: [C++] Make Future<> more generic

2020-11-09 Thread GitBox
pitrou commented on a change in pull request #8591: URL: https://github.com/apache/arrow/pull/8591#discussion_r519810188 ## File path: cpp/src/arrow/util/future.h ## @@ -310,63 +229,66 @@ class Future { return impl_->Wait(seconds); } - /// If a Result holds an error

[GitHub] [arrow] pitrou closed pull request #8604: ARROW-10509: [C++] Define operator<<(ostream, ParquetException) for clang+Windows

2020-11-09 Thread GitBox
pitrou closed pull request #8604: URL: https://github.com/apache/arrow/pull/8604 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] pitrou commented on pull request #8604: ARROW-10509: [C++] Define operator<<(ostream, ParquetException) for clang+Windows

2020-11-09 Thread GitBox
pitrou commented on pull request #8604: URL: https://github.com/apache/arrow/pull/8604#issuecomment-724003667 The test failure is unrelated, will merge. This is an automated message from the Apache Git Service. To respond to

[GitHub] [arrow] pitrou commented on pull request #8600: ARROW-10503: [C++] Uriparser will not compile using Intel compiler

2020-11-09 Thread GitBox
pitrou commented on pull request #8600: URL: https://github.com/apache/arrow/pull/8600#issuecomment-724002339 Thank you @jensenrichardson ! This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] pitrou closed pull request #8600: ARROW-10503: [C++] Uriparser will not compile using Intel compiler

2020-11-09 Thread GitBox
pitrou closed pull request #8600: URL: https://github.com/apache/arrow/pull/8600 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] jorisvandenbossche commented on pull request #8557: ARROW-10433 [Python] Swopped the conditions for checking for fsspec filesystems

2020-11-09 Thread GitBox
jorisvandenbossche commented on pull request #8557: URL: https://github.com/apache/arrow/pull/8557#issuecomment-723994765 Rebased this, and running the tests that I added in https://github.com/apache/arrow/pull/8573 with latest s3fs locally (we currently still get s3fs 0.4 on CI) ---

[GitHub] [arrow] github-actions[bot] commented on pull request #8614: ARROW-10518: [C++][Gandiva] Adding NativeFunction::kCanReturnErrors to cast function in gandiva

2020-11-09 Thread GitBox
github-actions[bot] commented on pull request #8614: URL: https://github.com/apache/arrow/pull/8614#issuecomment-723987750 https://issues.apache.org/jira/browse/ARROW-10518 This is an automated message from the Apache Git Ser

[GitHub] [arrow] naman1996 opened a new pull request #8614: ARROW-10518: [C++][Gandiva] Adding NativeFunction::kCanReturnErrors to cast function in gandiva

2020-11-09 Thread GitBox
naman1996 opened a new pull request #8614: URL: https://github.com/apache/arrow/pull/8614 [](https://github.com/apache/arrow/commit/36bf7a43eefd3bc5563fcfa8d04bc35a49c97978) In this change a bit of refactoring was done and we changes function definitions for cast funct

[GitHub] [arrow] jorisvandenbossche closed pull request #8573: ARROW-10471: [CI][Python] Ensure we have tests with s3fs and run those on CI

2020-11-09 Thread GitBox
jorisvandenbossche closed pull request #8573: URL: https://github.com/apache/arrow/pull/8573 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [arrow] jorisvandenbossche closed pull request #8539: ARROW-10462: [Python] Fix usage of fsspec in ParquetDataset causing path issue on Windows

2020-11-09 Thread GitBox
jorisvandenbossche closed pull request #8539: URL: https://github.com/apache/arrow/pull/8539 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [arrow] kszucs commented on pull request #8573: ARROW-10471: [CI][Python] Ensure we have tests with s3fs and run those on CI

2020-11-09 Thread GitBox
kszucs commented on pull request #8573: URL: https://github.com/apache/arrow/pull/8573#issuecomment-723942963 @ursabot build This is an automated message from the Apache Git Service. To respond to the message, please log on t

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8573: ARROW-10471: [CI][Python] Ensure we have a test build with s3fs

2020-11-09 Thread GitBox
jorisvandenbossche commented on a change in pull request #8573: URL: https://github.com/apache/arrow/pull/8573#discussion_r519691105 ## File path: .github/workflows/python.yml ## @@ -70,11 +70,11 @@ jobs: title: AMD64 Conda Python 3.6 Pandas 0.23 pytho

[GitHub] [arrow] kszucs commented on a change in pull request #8573: ARROW-10471: [CI][Python] Ensure we have a test build with s3fs

2020-11-09 Thread GitBox
kszucs commented on a change in pull request #8573: URL: https://github.com/apache/arrow/pull/8573#discussion_r519708758 ## File path: .github/workflows/python.yml ## @@ -70,11 +70,11 @@ jobs: title: AMD64 Conda Python 3.6 Pandas 0.23 python: 3.6

[GitHub] [arrow] kszucs commented on a change in pull request #8573: ARROW-10471: [CI][Python] Ensure we have a test build with s3fs

2020-11-09 Thread GitBox
kszucs commented on a change in pull request #8573: URL: https://github.com/apache/arrow/pull/8573#discussion_r519708758 ## File path: .github/workflows/python.yml ## @@ -70,11 +70,11 @@ jobs: title: AMD64 Conda Python 3.6 Pandas 0.23 python: 3.6

  1   2   >