[GitHub] [arrow] romainfrancois commented on a change in pull request #8341: ARROW-10093: [R] Add ability to opt-out of int64 -> int demotion

2020-10-07 Thread GitBox
romainfrancois commented on a change in pull request #8341: URL: https://github.com/apache/arrow/pull/8341#discussion_r500779786 ## File path: r/src/array_to_vector.cpp ## @@ -960,6 +960,18 @@ bool ArraysCanFitInteger(ArrayVector arrays) { return all_can_fit; } +bool GetB

[GitHub] [arrow] BryanCutler commented on a change in pull request #8337: ARROW-10151: [Python] Add support for MapArray conversion to Pandas

2020-10-07 Thread GitBox
BryanCutler commented on a change in pull request #8337: URL: https://github.com/apache/arrow/pull/8337#discussion_r500780264 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -791,6 +791,117 @@ Status ConvertListsLike(const PandasOptions& options, const ChunkedArray

[GitHub] [arrow] BryanCutler commented on pull request #8337: ARROW-10151: [Python] Add support for MapArray conversion to Pandas

2020-10-07 Thread GitBox
BryanCutler commented on pull request #8337: URL: https://github.com/apache/arrow/pull/8337#issuecomment-704745055 Thanks for reviewing @pitrou, I updated but still had a couple questions and I'm not totally sure I got the reference counting right. I'll have to take a closer look tomorrow

[GitHub] [arrow] kou commented on pull request #8373: ARROW-10202: [CI][Windows] Use sf.net mirror for MSYS2

2020-10-07 Thread GitBox
kou commented on pull request #8373: URL: https://github.com/apache/arrow/pull/8373#issuecomment-704759165 +1 @nealrichardson FYI: We can revert this change once http://repo.msys2.org/ is up again. This is an automate

[GitHub] [arrow] kou closed pull request #8373: ARROW-10202: [CI][Windows] Use sf.net mirror for MSYS2

2020-10-07 Thread GitBox
kou closed pull request #8373: URL: https://github.com/apache/arrow/pull/8373

[GitHub] [arrow] thamht4190 opened a new pull request #8375: ARROW-9318: [C++] Two level cache with expiraton

2020-10-07 Thread GitBox
thamht4190 opened a new pull request #8375: URL: https://github.com/apache/arrow/pull/8375 This pull partly implements the ticket ARROW-9318 and is extracted from the mother pull https://github.com/apache/arrow/pull/8023. This part is about the cache to cache a concurrent map for each ac

[GitHub] [arrow] jorisvandenbossche closed pull request #8352: ARROW-10178: [CI] Remove patch to fix Spark master build

2020-10-07 Thread GitBox
jorisvandenbossche closed pull request #8352: URL: https://github.com/apache/arrow/pull/8352

[GitHub] [arrow] emkornfield commented on pull request #8363: ARROW-10174: [Java] Fix reading/writing dict structs

2020-10-07 Thread GitBox
emkornfield commented on pull request #8363: URL: https://github.com/apache/arrow/pull/8363#issuecomment-704767430 @liyafan82 do you have time to review?

[GitHub] [arrow] emkornfield opened a new pull request #8376: ARROW-7960: Add support fo reading additional types

2020-10-07 Thread GitBox
emkornfield opened a new pull request #8376: URL: https://github.com/apache/arrow/pull/8376 New types supported: - Fixed Size list (will throw an incorrect error if nulls are present, but this is best fixed after we recursively support applying types). - LargeList - Maps

[GitHub] [arrow] jorisvandenbossche commented on pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

2020-10-07 Thread GitBox
jorisvandenbossche commented on pull request #8343: URL: https://github.com/apache/arrow/pull/8343#issuecomment-704765279 Remaining failures look unrelated?

[GitHub] [arrow] emkornfield commented on pull request #7110: ARROW-8952: [C++] WIP Support for textual, JSON schema representation

2020-10-07 Thread GitBox
emkornfield commented on pull request #7110: URL: https://github.com/apache/arrow/pull/7110#issuecomment-704768720 @chrish42 are you still working on this? Maybe close and then reopen the PR once you have the bandwidth to get something working?

[GitHub] [arrow] emkornfield commented on pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

2020-10-07 Thread GitBox
emkornfield commented on pull request #8244: URL: https://github.com/apache/arrow/pull/8244#issuecomment-704769151 @jorisvandenbossche can this be merged now?

[GitHub] [arrow] emkornfield commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-10-07 Thread GitBox
emkornfield commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-704771435 @wjones1 I think the ming/windows ones are ignorable. Lint one is real: ``` INFO:archery:Running Python formatter (autopep8) INFO:archery:Running Python linter (

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8340: ARROW-10165: [Rust] [DataFusion]: Remove special case DataFusion casting checks in favor of Arrow cast kernel

2020-10-07 Thread GitBox
jorgecarleitao commented on a change in pull request #8340: URL: https://github.com/apache/arrow/pull/8340#discussion_r500710152 ## File path: rust/datafusion/src/logical_plan/mod.rs ## @@ -323,21 +322,19 @@ impl Expr { /// /// # Errors /// -/// This function

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
jorisvandenbossche commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500820685 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -298,7 +298,14 @@ class PyValue { value = internal::PyDelta_to_us(dt);

[GitHub] [arrow] github-actions[bot] commented on pull request #8375: ARROW-9318: [C++] Two level cache with expiraton

2020-10-07 Thread GitBox
github-actions[bot] commented on pull request #8375: URL: https://github.com/apache/arrow/pull/8375#issuecomment-704779353 https://issues.apache.org/jira/browse/ARROW-9318

[GitHub] [arrow] github-actions[bot] commented on pull request #8376: ARROW-7960: Add support fo reading additional types

2020-10-07 Thread GitBox
github-actions[bot] commented on pull request #8376: URL: https://github.com/apache/arrow/pull/8376#issuecomment-704779354 https://issues.apache.org/jira/browse/ARROW-7960

[GitHub] [arrow] jduo commented on pull request #8325: ARROW-10206: [C++][Python][FlightRPC] Allow disabling server validation

2020-10-07 Thread GitBox
jduo commented on pull request #8325: URL: https://github.com/apache/arrow/pull/8325#issuecomment-704792871 I've changed this PR to cover just C++ and Python.

[GitHub] [arrow] jduo opened a new pull request #8377: ARROW-10205: [Java][FlightRPC] Allow disabling server validation

2020-10-07 Thread GitBox
jduo opened a new pull request #8377: URL: https://github.com/apache/arrow/pull/8377 - Add option to Java FlightClient.Builder to turn off server verification (verification is on by default).

[GitHub] [arrow] jduo commented on pull request #8377: ARROW-10205: [Java][FlightRPC] Allow disabling server validation

2020-10-07 Thread GitBox
jduo commented on pull request #8377: URL: https://github.com/apache/arrow/pull/8377#issuecomment-704794424 Separated out the Java work to this PR and the C++/Python work to #8323 FYI @lidavidm

[GitHub] [arrow] jduo commented on pull request #8325: ARROW-10206: [C++][Python][FlightRPC] Allow disabling server validation

2020-10-07 Thread GitBox
jduo commented on pull request #8325: URL: https://github.com/apache/arrow/pull/8325#issuecomment-704794645 The Java work has been moved to #8377.

[GitHub] [arrow] github-actions[bot] commented on pull request #8325: ARROW-10206: [C++][Python][FlightRPC] Allow disabling server validation

2020-10-07 Thread GitBox
github-actions[bot] commented on pull request #8325: URL: https://github.com/apache/arrow/pull/8325#issuecomment-704796210 https://issues.apache.org/jira/browse/ARROW-10206

[GitHub] [arrow] github-actions[bot] commented on pull request #8377: ARROW-10205: [Java][FlightRPC] Allow disabling server validation

2020-10-07 Thread GitBox
github-actions[bot] commented on pull request #8377: URL: https://github.com/apache/arrow/pull/8377#issuecomment-704796206 https://issues.apache.org/jira/browse/ARROW-10205

[GitHub] [arrow] rdettai commented on pull request #8300: ARROW-10135: [Rust] [Parquet] Refactor file module to help adding sources

2020-10-07 Thread GitBox
rdettai commented on pull request #8300: URL: https://github.com/apache/arrow/pull/8300#issuecomment-704796994 The discussion with @alamb about the need for an intermediate layer when reading a parquet file is discussed on [JIRA](https://issues.apache.org/jira/browse/ARROW-10135)

[GitHub] [arrow] pitrou commented on a change in pull request #8366: ARROW-9943: [C++] Recursively apply Arrow metadata when reading from Parquet

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8366: URL: https://github.com/apache/arrow/pull/8366#discussion_r500861539 ## File path: cpp/src/parquet/arrow/schema.cc ## @@ -688,32 +688,21 @@ Status GetOriginSchema(const std::shared_ptr& metadata, Result ApplyOriginalMetadat

[GitHub] [arrow] pitrou commented on a change in pull request #8366: ARROW-9943: [C++] Recursively apply Arrow metadata when reading from Parquet

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8366: URL: https://github.com/apache/arrow/pull/8366#discussion_r500862082 ## File path: cpp/src/parquet/arrow/schema.cc ## @@ -689,10 +686,62 @@ Status GetOriginSchema(const std::shared_ptr& metadata, // but that is not necessaril

[GitHub] [arrow] pitrou commented on a change in pull request #8366: ARROW-9943: [C++] Recursively apply Arrow metadata when reading from Parquet

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8366: URL: https://github.com/apache/arrow/pull/8366#discussion_r500862229 ## File path: cpp/src/parquet/arrow/schema.cc ## @@ -725,23 +778,18 @@ Status ApplyOriginalStorageMetadata(const Field& origin_field, SchemaField* infe

[GitHub] [arrow] pitrou commented on a change in pull request #8366: ARROW-9943: [C++] Recursively apply Arrow metadata when reading from Parquet

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8366: URL: https://github.com/apache/arrow/pull/8366#discussion_r500863193 ## File path: cpp/src/parquet/arrow/schema.h ## @@ -91,7 +91,6 @@ struct PARQUET_EXPORT SchemaField { std::shared_ptr<::arrow::Field> field; // If fiel

[GitHub] [arrow] jorisvandenbossche commented on pull request #8317: ARROW-10134: [Python][Dataset] Add ParquetFileFragment.num_row_groups

2020-10-07 Thread GitBox
jorisvandenbossche commented on pull request #8317: URL: https://github.com/apache/arrow/pull/8317#issuecomment-704808851 > you want f.num_row_groups to potentially perform IO? Yes, that's indeed the consequence for now (if the metadata was not yet parsed before). Long term I would l

[GitHub] [arrow] pitrou commented on a change in pull request #8337: ARROW-10151: [Python] Add support for MapArray conversion to Pandas

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8337: URL: https://github.com/apache/arrow/pull/8337#discussion_r500866571 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -791,6 +791,117 @@ Status ConvertListsLike(const PandasOptions& options, const ChunkedArray& dat

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500867702 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -809,6 +809,475 @@ struct IsUpperAscii : CharacterPredicateAscii { } }; +/

[GitHub] [arrow] pitrou commented on a change in pull request #8337: ARROW-10151: [Python] Add support for MapArray conversion to Pandas

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8337: URL: https://github.com/apache/arrow/pull/8337#discussion_r500867795 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -791,6 +791,117 @@ Status ConvertListsLike(const PandasOptions& options, const ChunkedArray& dat

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500868020 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -809,6 +809,475 @@ struct IsUpperAscii : CharacterPredicateAscii { } }; +/

[GitHub] [arrow] pitrou commented on a change in pull request #8337: ARROW-10151: [Python] Add support for MapArray conversion to Pandas

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8337: URL: https://github.com/apache/arrow/pull/8337#discussion_r500868159 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -791,6 +791,117 @@ Status ConvertListsLike(const PandasOptions& options, const ChunkedArray& dat

[GitHub] [arrow] pitrou commented on a change in pull request #8337: ARROW-10151: [Python] Add support for MapArray conversion to Pandas

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8337: URL: https://github.com/apache/arrow/pull/8337#discussion_r500868740 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -791,6 +791,111 @@ Status ConvertListsLike(PandasOptions options, const ChunkedArray& data, r

[GitHub] [arrow] pitrou commented on pull request #8337: ARROW-10151: [Python] Add support for MapArray conversion to Pandas

2020-10-07 Thread GitBox
pitrou commented on pull request #8337: URL: https://github.com/apache/arrow/pull/8337#issuecomment-704816126 Will merge if CI is ok, thank you.

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500873548 ## File path: python/pyarrow/_compute.pyx ## @@ -560,6 +560,29 @@ cdef class MatchSubstringOptions(FunctionOptions): return self.match_subs

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500874404 ## File path: python/pyarrow/tests/test_compute.py ## @@ -230,6 +230,51 @@ def test_match_substring(): assert expected.equals(result) +def

[GitHub] [arrow] pitrou commented on a change in pull request #8366: ARROW-9943: [C++] Recursively apply Arrow metadata when reading from Parquet

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8366: URL: https://github.com/apache/arrow/pull/8366#discussion_r500875953 ## File path: cpp/src/parquet/arrow/schema.cc ## @@ -689,10 +686,62 @@ Status GetOriginSchema(const std::shared_ptr& metadata, // but that is not necessaril

[GitHub] [arrow] pitrou commented on pull request #8366: ARROW-9943: [C++] Recursively apply Arrow metadata when reading from Parquet

2020-10-07 Thread GitBox
pitrou commented on pull request #8366: URL: https://github.com/apache/arrow/pull/8366#issuecomment-704819253 Will merge if CI green. Thank you for the reviews!

[GitHub] [arrow] pitrou commented on a change in pull request #8374: ARROW-10203: Give guidance on big-endian support in the contributors docs

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8374: URL: https://github.com/apache/arrow/pull/8374#discussion_r500877701 ## File path: docs/source/developers/contributing.rst ## @@ -304,3 +304,40 @@ to your branch, which they sometimes do to help move a pull request along. In

[GitHub] [arrow] pitrou commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500879612 ## File path: python/pyarrow/_compute.pyx ## @@ -560,6 +560,29 @@ cdef class MatchSubstringOptions(FunctionOptions): return self.match_substring_opt

[GitHub] [arrow] pitrou commented on pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

2020-10-07 Thread GitBox
pitrou commented on pull request #8343: URL: https://github.com/apache/arrow/pull/8343#issuecomment-704824694 Yes, they are.

[GitHub] [arrow] pitrou closed pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

2020-10-07 Thread GitBox
pitrou closed pull request #8343: URL: https://github.com/apache/arrow/pull/8343

[GitHub] [arrow] pitrou commented on pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
pitrou commented on pull request #8271: URL: https://github.com/apache/arrow/pull/8271#issuecomment-704826423 @maartenbreddels Please ping me when you need a new review.

[GitHub] [arrow] jorisvandenbossche commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

2020-10-07 Thread GitBox
jorisvandenbossche commented on pull request #8149: URL: https://github.com/apache/arrow/pull/8149#issuecomment-704827008 @github-actions crossbow submit -g integration

[GitHub] [arrow] pitrou commented on a change in pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8255: URL: https://github.com/apache/arrow/pull/8255#discussion_r500886938 ## File path: python/pyarrow/__init__.py ## @@ -207,6 +207,28 @@ def show_versions(): import pyarrow.types as types + +if _sys.version_info >= (3, 7): +

[GitHub] [arrow] pitrou commented on a change in pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8255: URL: https://github.com/apache/arrow/pull/8255#discussion_r500888011 ## File path: python/pyarrow/serialization.py ## @@ -482,7 +490,23 @@ def register_default_serialization_handlers(serialization_context): _register_pyd

[GitHub] [arrow] pitrou commented on a change in pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8255: URL: https://github.com/apache/arrow/pull/8255#discussion_r500888167 ## File path: python/pyarrow/serialization.py ## @@ -482,7 +490,23 @@ def register_default_serialization_handlers(serialization_context): _register_pyd

[GitHub] [arrow] pitrou commented on a change in pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8255: URL: https://github.com/apache/arrow/pull/8255#discussion_r500888667 ## File path: python/pyarrow/tests/test_serialization.py ## @@ -52,6 +52,9 @@ sparse = None +pytestmark = pytest.mark.filterwarnings("ignore:'pyarro

[GitHub] [arrow] xhochy commented on pull request #8371: WIP: ARROW-4960: [R] Build r-arrow conda package in crossbow

2020-10-07 Thread GitBox
xhochy commented on pull request #8371: URL: https://github.com/apache/arrow/pull/8371#issuecomment-704832356 @github-actions crossbow submit conda-linux-gcc-py36-cpu

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
jorisvandenbossche commented on a change in pull request #8255: URL: https://github.com/apache/arrow/pull/8255#discussion_r500891790 ## File path: python/pyarrow/tests/test_serialization.py ## @@ -52,6 +52,9 @@ sparse = None +pytestmark = pytest.mark.filterwarnings("ig

[GitHub] [arrow] github-actions[bot] commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

2020-10-07 Thread GitBox
github-actions[bot] commented on pull request #8149: URL: https://github.com/apache/arrow/pull/8149#issuecomment-704833499 Revision: dfdc62246bb4bb450bbbaba2b4247d3edeeb2264 Submitted crossbow builds: [ursa-labs/crossbow @ actions-607](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] alamb commented on pull request #8340: ARROW-10165: [Rust] [DataFusion]: Remove special case DataFusion casting checks in favor of Arrow cast kernel

2020-10-07 Thread GitBox
alamb commented on pull request #8340: URL: https://github.com/apache/arrow/pull/8340#issuecomment-704834619 @jorgecarleitao > Doesn't this mean that the plan can fail arbitrarily when a user performs an impossible cast? This can happen like 10hs after the execution starts, when th

[GitHub] [arrow] alamb commented on a change in pull request #8340: ARROW-10165: [Rust] [DataFusion]: Remove special case DataFusion casting checks in favor of Arrow cast kernel

2020-10-07 Thread GitBox
alamb commented on a change in pull request #8340: URL: https://github.com/apache/arrow/pull/8340#discussion_r500443182 ## File path: rust/datafusion/src/logical_plan/mod.rs ## @@ -323,21 +322,19 @@ impl Expr { /// /// # Errors /// -/// This function errors w

[GitHub] [arrow] alamb closed pull request #8340: ARROW-10165: [Rust] [DataFusion]: Remove special case DataFusion casting checks in favor of Arrow cast kernel

2020-10-07 Thread GitBox
alamb closed pull request #8340: URL: https://github.com/apache/arrow/pull/8340

[GitHub] [arrow] xhochy commented on pull request #8371: WIP: ARROW-4960: [R] Build r-arrow conda package in crossbow

2020-10-07 Thread GitBox
xhochy commented on pull request #8371: URL: https://github.com/apache/arrow/pull/8371#issuecomment-704835713 @kszucs Do you have an idea why my new `r_config` parameter doesn't work? I get the following traceback by the bot: ``` Cloning into '/tmp/tmpzsnkz8m4/arrow'... From ht

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8317: ARROW-10134: [Python][Dataset] Add ParquetFileFragment.num_row_groups

2020-10-07 Thread GitBox
jorisvandenbossche commented on a change in pull request #8317: URL: https://github.com/apache/arrow/pull/8317#discussion_r500896637 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -530,17 +548,17 @@ Status ParquetFileFragment::EnsureCompleteMetadata(parquet::arrow::

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500897047 ## File path: python/pyarrow/_compute.pyx ## @@ -560,6 +560,29 @@ cdef class MatchSubstringOptions(FunctionOptions): return self.match_subs

[GitHub] [arrow] pitrou commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500898893 ## File path: python/pyarrow/_compute.pyx ## @@ -560,6 +560,29 @@ cdef class MatchSubstringOptions(FunctionOptions): return self.match_substring_opt

[GitHub] [arrow] pitrou commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
pitrou commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500899382 ## File path: python/pyarrow/_compute.pyx ## @@ -560,6 +560,29 @@ cdef class MatchSubstringOptions(FunctionOptions): return self.match_substring_opt

[GitHub] [arrow] dhirschfeld commented on pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
dhirschfeld commented on pull request #8255: URL: https://github.com/apache/arrow/pull/8255#issuecomment-704841013 > *what do you use serialize for currently?* I've just got a proof-of-concept arrow serialization framework which can serialize arbitrary Python objects (inheriting from

[GitHub] [arrow] pitrou commented on pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
pitrou commented on pull request #8255: URL: https://github.com/apache/arrow/pull/8255#issuecomment-704842775 `pickle` is the best thing available for arbitrary Python types and heterogeneous data, IMO.

[GitHub] [arrow] dhirschfeld commented on pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
dhirschfeld commented on pull request #8255: URL: https://github.com/apache/arrow/pull/8255#issuecomment-704844598 > `pickle` is the best thing available for arbitrary Python types and heterogenous data, IMO. Yep, but I also want to be able to read the serialized data in from R, Typ

[GitHub] [arrow] pitrou commented on pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
pitrou commented on pull request #8255: URL: https://github.com/apache/arrow/pull/8255#issuecomment-704844966 Then you'll have to invent your own serialization format, or find another existing one.

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500907752 ## File path: python/pyarrow/_compute.pyx ## @@ -560,6 +560,29 @@ cdef class MatchSubstringOptions(FunctionOptions): return self.match_subs

[GitHub] [arrow] alamb commented on a change in pull request #8300: ARROW-10135: [Rust] [Parquet] Refactor file module to help adding sources

2020-10-07 Thread GitBox
alamb commented on a change in pull request #8300: URL: https://github.com/apache/arrow/pull/8300#discussion_r500903617 ## File path: rust/parquet/src/util/cursor.rs ## @@ -0,0 +1,203 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500909112 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -809,6 +809,475 @@ struct IsUpperAscii : CharacterPredicateAscii { } }; +/

[GitHub] [arrow] dhirschfeld commented on pull request #8255: ARROW-9518: [Python] Deprecate pyarrow serialization

2020-10-07 Thread GitBox
dhirschfeld commented on pull request #8255: URL: https://github.com/apache/arrow/pull/8255#issuecomment-704850273 Yeah, that's what I figured 😞. I've previously invented my own (with protocol buffers) but it's a *big* job so I was hoping to leverage off existing efforts. Will have to do s

[GitHub] [arrow] kszucs commented on pull request #8371: WIP: ARROW-4960: [R] Build r-arrow conda package in crossbow

2020-10-07 Thread GitBox
kszucs commented on pull request #8371: URL: https://github.com/apache/arrow/pull/8371#issuecomment-704852898 @xhochy you need to add the `r_config` param to all tasks using the same yml template. If you want to set it for just specific builds then handle it as an optional variable from j

[GitHub] [arrow] alamb commented on a change in pull request #8364: ARROW-5350: [Rust] Allow filtering on simple lists

2020-10-07 Thread GitBox
alamb commented on a change in pull request #8364: URL: https://github.com/apache/arrow/pull/8364#discussion_r500910829 ## File path: rust/arrow/src/compute/kernels/filter.rs ## @@ -230,6 +231,86 @@ macro_rules! filter_dictionary_array { }}; } +macro_rules! filter_primi

[GitHub] [arrow] kszucs edited a comment on pull request #8371: WIP: ARROW-4960: [R] Build r-arrow conda package in crossbow

2020-10-07 Thread GitBox
kszucs edited a comment on pull request #8371: URL: https://github.com/apache/arrow/pull/8371#issuecomment-704852898 @xhochy you need to add the `r_config` param to all tasks using the same yml template. If you want to set it for just specific builds then handle it as an optional variable

[GitHub] [arrow] pitrou closed pull request #8337: ARROW-10151: [Python] Add support for MapArray conversion to Pandas

2020-10-07 Thread GitBox
pitrou closed pull request #8337: URL: https://github.com/apache/arrow/pull/8337

[GitHub] [arrow] alamb commented on a change in pull request #8370: ARROW-10015: [Rust] Simd aggregate kernels

2020-10-07 Thread GitBox
alamb commented on a change in pull request #8370: URL: https://github.com/apache/arrow/pull/8370#discussion_r500915629 ## File path: rust/arrow/src/array/array.rs ## @@ -1907,15 +1907,16 @@ impl TryFrom> for StructArray { let mut null: Option = None; for (fie

[GitHub] [arrow] xhochy commented on pull request #8371: WIP: ARROW-4960: [R] Build r-arrow conda package in crossbow

2020-10-07 Thread GitBox
xhochy commented on pull request #8371: URL: https://github.com/apache/arrow/pull/8371#issuecomment-704867308 @github-actions crossbow submit conda-linux-gcc-py36-cpu

[GitHub] [arrow] github-actions[bot] commented on pull request #8371: WIP: ARROW-4960: [R] Build r-arrow conda package in crossbow

2020-10-07 Thread GitBox
github-actions[bot] commented on pull request #8371: URL: https://github.com/apache/arrow/pull/8371#issuecomment-704868371 Revision: b642d015ef7bf0b3780ced01ba162ed387e14d24 Submitted crossbow builds: [ursa-labs/crossbow @ actions-608](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] xhochy commented on a change in pull request #8325: ARROW-10206: [C++][Python][FlightRPC] Allow disabling server validation

2020-10-07 Thread GitBox
xhochy commented on a change in pull request #8325: URL: https://github.com/apache/arrow/pull/8325#discussion_r500933507 ## File path: cpp/src/arrow/flight/client.cc ## @@ -34,6 +34,9 @@ #include #endif +#include "grpc/grpc_security_constants.h" +#include "grpcpp/security/

[GitHub] [arrow] kszucs commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500945878 ## File path: python/pyarrow/tests/test_convert_builtin.py ## @@ -1894,18 +1937,30 @@ def test_dictionary_from_strings(): assert a.dictionary.equals(exp

[GitHub] [arrow] pitrou closed pull request #8366: ARROW-9943: [C++] Recursively apply Arrow metadata when reading from Parquet

2020-10-07 Thread GitBox
pitrou closed pull request #8366: URL: https://github.com/apache/arrow/pull/8366

[GitHub] [arrow] kszucs commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500948820 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -298,7 +298,14 @@ class PyValue { value = internal::PyDelta_to_us(dt); brea

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500950247 ## File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc ## @@ -66,6 +66,11 @@ static void MatchSubstring(benchmark::State& state) {

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500951363 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -809,6 +809,475 @@ struct IsUpperAscii : CharacterPredicateAscii { } }; +/

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
jorisvandenbossche commented on a change in pull request #8271: URL: https://github.com/apache/arrow/pull/8271#discussion_r500954601 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -809,6 +809,475 @@ struct IsUpperAscii : CharacterPredicateAscii { } };

[GitHub] [arrow] maartenbreddels commented on pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
maartenbreddels commented on pull request #8271: URL: https://github.com/apache/arrow/pull/8271#issuecomment-704888455 @pitrou @jorisvandenbossche I think this is ready for review, provided the points below are not an issue. Open questions * kernel name: string_split_pattern vs s

[GitHub] [arrow] kszucs commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500957250 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -298,7 +298,14 @@ class PyValue { value = internal::PyDelta_to_us(dt); brea

[GitHub] [arrow] pitrou commented on pull request #8271: ARROW-9991: [C++] split kernels for strings/binary

2020-10-07 Thread GitBox
pitrou commented on pull request #8271: URL: https://github.com/apache/arrow/pull/8271#issuecomment-704890478 > I can imagine that we also want to support binary_split_pattern, so should we have two separate kernel names for this, or keep it split_pattern and implement support for binary in

[GitHub] [arrow] kszucs commented on pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on pull request #8349: URL: https://github.com/apache/arrow/pull/8349#issuecomment-704890653 @github-actions crossbow submit test-conda-python-3.8-hypothesis

[GitHub] [arrow] kszucs commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500958013 ## File path: python/pyarrow/tests/strategies.py ## @@ -293,3 +359,32 @@ def tables(draw, type, rows=None, max_fields=None): all_chunked_arrays = chunked_ar

[GitHub] [arrow] kszucs commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500959959 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -1004,6 +991,9 @@ Result> ConvertPySequence(PyObject* obj, PyObject* PyObject* seq; OwnedR

[GitHub] [arrow] kszucs commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500961615 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -1004,6 +991,9 @@ Result> ConvertPySequence(PyObject* obj, PyObject* PyObject* seq; OwnedR

[GitHub] [arrow] github-actions[bot] commented on pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
github-actions[bot] commented on pull request #8349: URL: https://github.com/apache/arrow/pull/8349#issuecomment-704895599 Revision: 018e03ef9d6609a7fdc5e8e2d884c958cf38b192 Submitted crossbow builds: [ursa-labs/crossbow @ actions-609](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] jorisvandenbossche commented on pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

2020-10-07 Thread GitBox
jorisvandenbossche commented on pull request #8343: URL: https://github.com/apache/arrow/pull/8343#issuecomment-704897806 Thanks @bkietz !

[GitHub] [arrow] jorisvandenbossche commented on pull request #8149: ARROW-9645: [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs

2020-10-07 Thread GitBox
jorisvandenbossche commented on pull request #8149: URL: https://github.com/apache/arrow/pull/8149#issuecomment-704897703 @pitrou @kszucs more feedback here?

[GitHub] [arrow] jorisvandenbossche commented on pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

2020-10-07 Thread GitBox
jorisvandenbossche commented on pull request #8244: URL: https://github.com/apache/arrow/pull/8244#issuecomment-704898306 I don't think @arw2019 already pushed the update? (although answered to the comments)

[GitHub] [arrow] kszucs commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500967626 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -1004,6 +991,9 @@ Result> ConvertPySequence(PyObject* obj, PyObject* PyObject* seq; OwnedR

[GitHub] [arrow] lidavidm closed pull request #8377: ARROW-10205: [Java][FlightRPC] Allow disabling server validation

2020-10-07 Thread GitBox
lidavidm closed pull request #8377: URL: https://github.com/apache/arrow/pull/8377

[GitHub] [arrow] kszucs commented on a change in pull request #8349: ARROW-3080: [Python] Unify Arrow to Python object conversion paths

2020-10-07 Thread GitBox
kszucs commented on a change in pull request #8349: URL: https://github.com/apache/arrow/pull/8349#discussion_r500970961 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -1004,6 +991,9 @@ Result> ConvertPySequence(PyObject* obj, PyObject* PyObject* seq; OwnedR

[GitHub] [arrow] jhorstmann commented on a change in pull request #8370: ARROW-10015: [Rust] Simd aggregate kernels

2020-10-07 Thread GitBox
jhorstmann commented on a change in pull request #8370: URL: https://github.com/apache/arrow/pull/8370#discussion_r500977682 ## File path: rust/arrow/src/array/array.rs ## @@ -1907,15 +1907,16 @@ impl TryFrom> for StructArray { let mut null: Option = None; for

[GitHub] [arrow] drusso commented on a change in pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-10-07 Thread GitBox
drusso commented on a change in pull request #8222: URL: https://github.com/apache/arrow/pull/8222#discussion_r500979025 ## File path: rust/datafusion/src/physical_plan/distinct_expressions.rs ## @@ -0,0 +1,203 @@ +// Licensed to the Apache Software Foundation (ASF) under one +
