[jira] [Resolved] (ARROW-11350) [C++] Bump dependency versions

2021-02-02 Thread Kouhei Sutou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-11350. Resolution: Fixed Issue resolved by pull request 9296 [https://github.com/apache/arrow/pull/92

[jira] [Updated] (ARROW-11470) [C++] Overflow occurs on integer multiplications in ComputeRowMajorStrides, ComputeColumnMajorStrides, and CheckTensorStridesValidity

2021-02-02 Thread Kenta Murata (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenta Murata updated ARROW-11470: Description: OSS-Fuzz reports the integer multiplication in ComputeRowMajorStrides function occ

[jira] [Updated] (ARROW-11470) [C++] Overflow occurs on integer multiplications in ComputeRowMajorStrides, ComputeColumnMajorStrides, and CheckTensorStridesValidity

2021-02-02 Thread Kenta Murata (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenta Murata updated ARROW-11470: Summary: [C++] Overflow occurs on integer multiplications in ComputeRowMajorStrides, ComputeColu
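ARROW-11470 concerns integer overflow in the multiplications that compute tensor strides. As a hedged illustration only (the actual fix is not shown in the snippets above), the sketch below computes row-major strides in bytes for a given shape and element width, and reports failure instead of letting a signed multiplication overflow. It assumes non-negative `int64_t` extents, a compiler that provides the GCC/Clang builtin `__builtin_mul_overflow`, and a hypothetical helper name `ComputeRowMajorStridesChecked`; it is not Arrow's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not Arrow's API): row-major (C-order) strides in bytes.
// Assumes every extent in `shape` is non-negative.
// Returns std::nullopt if any product overflows int64_t, which also rejects
// shapes whose total byte size does not fit in int64_t.
std::optional<std::vector<int64_t>> ComputeRowMajorStridesChecked(
    const std::vector<int64_t>& shape, int64_t byte_width) {
  assert(byte_width > 0);
  std::vector<int64_t> strides(shape.size());
  int64_t stride = byte_width;  // the innermost dimension is contiguous
  for (std::size_t i = shape.size(); i-- > 0;) {
    strides[i] = stride;
    // The next (outer) dimension's stride is this stride times this extent.
    // Check for overflow instead of relying on undefined signed wraparound.
    if (__builtin_mul_overflow(stride, shape[i], &stride)) {
      return std::nullopt;
    }
  }
  return strides;
}
```

For example, shape {2, 3, 4} with 8-byte elements yields strides {96, 32, 8}, while shape {2^40, 2^40} with 8-byte elements returns `std::nullopt` because 2^43 times 2^40 exceeds the `int64_t` range.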

[jira] [Closed] (ARROW-11398) [C++][Compute] Test failures with gcc-9.3 on aarch64

2021-02-02 Thread Yibo Cai (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibo Cai closed ARROW-11398. Resolution: Won't Fix To summarize: - gcc-9.3 aarch64 auto vectorization generates buggy code for this [co

[jira] [Commented] (ARROW-11398) [C++][Compute] Test failures with gcc-9.3 on aarch64

2021-02-02 Thread Yibo Cai (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277665#comment-17277665 ] Yibo Cai commented on ARROW-11398: Wrote a simple test program to reproduce this issue

[jira] [Comment Edited] (ARROW-10255) [JS] Reorganize imports and exports to be more friendly to ESM tree-shaking

2021-02-02 Thread Paul Taylor (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277659#comment-17277659 ] Paul Taylor edited comment on ARROW-10255 at 2/3/21, 3:44 AM:

[jira] [Commented] (ARROW-10255) [JS] Reorganize imports and exports to be more friendly to ESM tree-shaking

2021-02-02 Thread Paul Taylor (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277659#comment-17277659 ] Paul Taylor commented on ARROW-10255: [~bhulette] I vote no on the current PR for 3

[jira] [Commented] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Tao He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277647#comment-17277647 ] Tao He commented on ARROW-11463: Thanks for the background of pickle 5 [~lausen] . [~la

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: Description: When reading a large parquet file, I have this error: {noformat} df: Final = pd.re

[jira] [Updated] (ARROW-11479) Add method to return compressed size of row group

2021-02-02 Thread Manoj Karthick (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Karthick updated ARROW-11479: Issue Type: New Feature (was: Improvement)

[jira] [Updated] (ARROW-11479) Add method to return compressed size of row group

2021-02-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11479: Labels: pull-request-available (was: )

[jira] [Created] (ARROW-11479) Add method to return compressed size of row group

2021-02-02 Thread Manoj Karthick (Jira)
Manoj Karthick created ARROW-11479: Summary: Add method to return compressed size of row group Key: ARROW-11479 URL: https://issues.apache.org/jira/browse/ARROW-11479 Project: Apache Arrow

[jira] [Commented] (ARROW-11478) [R] Consider ways to make arrow.skip_nul option more user-friendly

2021-02-02 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277628#comment-17277628 ] Ian Cook commented on ARROW-11478: I think I'm in favor of option 3, assuming it's fea

[jira] [Created] (ARROW-11478) [R] Consider ways to make arrow.skip_nul option more user-friendly

2021-02-02 Thread Ian Cook (Jira)
Ian Cook created ARROW-11478: Summary: [R] Consider ways to make arrow.skip_nul option more user-friendly Key: ARROW-11478 URL: https://issues.apache.org/jira/browse/ARROW-11478 Project: Apache Arrow

[jira] [Resolved] (ARROW-951) [JS] Fix generated API documentation

2021-02-02 Thread Neal Richardson (Jira)
[ https://issues.apache.org/jira/browse/ARROW-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-951. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9375 [https://git

[jira] [Resolved] (ARROW-11467) [R] Fix reference to json_table_reader() in R docs

2021-02-02 Thread Neal Richardson (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson resolved ARROW-11467. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9393 [https

[jira] [Commented] (ARROW-11433) [R] Unexpectedly slow results reading csv

2021-02-02 Thread Jonathan Keane (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277546#comment-17277546 ] Jonathan Keane commented on ARROW-11433: Yeah, I tried it with the system alloca

[jira] [Commented] (ARROW-11433) [R] Unexpectedly slow results reading csv

2021-02-02 Thread Neal Richardson (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277544#comment-17277544 ] Neal Richardson commented on ARROW-11433: "Only on mac" and "freeing memory" ma

[jira] [Commented] (ARROW-11433) [R] Unexpectedly slow results reading csv

2021-02-02 Thread Jonathan Keane (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277543#comment-17277543 ] Jonathan Keane commented on ARROW-11433: We thought it might be due to the mmapi

[jira] [Commented] (ARROW-11433) [R] Unexpectedly slow results reading csv

2021-02-02 Thread Jonathan Keane (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277537#comment-17277537 ] Jonathan Keane commented on ARROW-11433: Ben and I spent some time on this today

[jira] [Updated] (ARROW-11477) [R][Doc] Reorganize and improve README and vignette content

2021-02-02 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated ARROW-11477: Description: Collecting various ideas here for general ways to improve the R package README and vignett

[jira] [Commented] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Leonard Lausen (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277515#comment-17277515 ] Leonard Lausen commented on ARROW-11463: Thank you for sharing the tests / examp

[jira] [Resolved] (ARROW-11310) [Rust] Implement arrow JSON writer

2021-02-02 Thread Andrew Lamb (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11310. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9256 [https://githu

[jira] [Commented] (ARROW-11477) [R][Doc] Reorganize and improve README and vignette content

2021-02-02 Thread Neal Richardson (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277511#comment-17277511 ] Neal Richardson commented on ARROW-11477: Re the "Using the Arrow C++ Library i

[jira] [Created] (ARROW-11477) [R][Doc] Reorganize and improve README and vignette content

2021-02-02 Thread Ian Cook (Jira)
Ian Cook created ARROW-11477: Summary: [R][Doc] Reorganize and improve README and vignette content Key: ARROW-11477 URL: https://issues.apache.org/jira/browse/ARROW-11477 Project: Apache Arrow I

[jira] [Closed] (ARROW-11474) [C++] Update bundled re2 version

2021-02-02 Thread Neal Richardson (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson closed ARROW-11474. Assignee: Kouhei Sutou Resolution: Duplicate. Done in ARROW-11350 after all

[jira] [Updated] (ARROW-11476) [Rust][DataFusion] Test running of TPCH benchmarks in CI

2021-02-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11476: Labels: pull-request-available (was: )

[jira] [Created] (ARROW-11476) [Rust][DataFusion] Test running of TPCH benchmarks in CI

2021-02-02 Thread Daniël Heres (Jira)
Daniël Heres created ARROW-11476: Summary: [Rust][DataFusion] Test running of TPCH benchmarks in CI Key: ARROW-11476 URL: https://issues.apache.org/jira/browse/ARROW-11476 Project: Apache Arrow

[jira] [Commented] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS

2021-02-02 Thread Ali Cetin (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277432#comment-17277432 ] Ali Cetin commented on ARROW-11427: Cool. I can give it a try in the coming days.

[jira] [Commented] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277421#comment-17277421 ] Antoine Pitrou commented on ARROW-11427: (removed previous post, sorry) [~ali.c

[jira] [Resolved] (ARROW-11435) Allow creating ParquetPartition from external crate

2021-02-02 Thread Andrew Lamb (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11435. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9369 [https://githu

[jira] [Created] (ARROW-11475) [C++] Upgrade mimalloc

2021-02-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11475: Summary: [C++] Upgrade mimalloc Key: ARROW-11475 URL: https://issues.apache.org/jira/browse/ARROW-11475 Project: Apache Arrow Issue Type: New Feature

[jira] [Updated] (ARROW-11474) [C++] Update bundled re2 version

2021-02-02 Thread Neal Richardson (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-11474: Description: I tried increasing the re2 version to 2020-11-01 in ARROW-11350 but it failed

[jira] [Created] (ARROW-11474) [C++] Update bundled re2 version

2021-02-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11474: Summary: [C++] Update bundled re2 version Key: ARROW-11474 URL: https://issues.apache.org/jira/browse/ARROW-11474 Project: Apache Arrow Issue Type: New

[jira] [Issue Comment Deleted] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-11427: Comment: was deleted (was: [~ali.cetin] Could you try installing this wheel and see if it fi

[jira] [Commented] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277392#comment-17277392 ] Antoine Pitrou commented on ARROW-11463: PyArrow serialization is deprecated, us

[jira] [Updated] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS

2021-02-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11427: Labels: pull-request-available (was: )

[jira] [Commented] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Leonard Lausen (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277373#comment-17277373 ] Leonard Lausen commented on ARROW-11463: Specifically, do you mean that PyArrow

[jira] [Updated] (ARROW-11308) [Rust] [Parquet] Add Arrow decimal array writer

2021-02-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11308: Labels: pull-request-available (was: )

[jira] [Commented] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277352#comment-17277352 ] Antoine Pitrou commented on ARROW-11427: [~ali.cetin] Could you try installing t

[jira] [Commented] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Leonard Lausen (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277348#comment-17277348 ] Leonard Lausen commented on ARROW-11463: Thank you [~apitrou] for the background

[jira] [Updated] (ARROW-11469) [Python] Performance degradation parquet reading of wide dataframes

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-11469: Summary: [Python] Performance degradation parquet reading of wide dataframes

[jira] [Updated] (ARROW-11469) [Python] Performance degradation wide dataframes

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-11469: Summary: [Python] Performance degradation wide dataframes (was: Performance d

[jira] [Resolved] (ARROW-11421) [Rust][DataFusion] Support group by Date32

2021-02-02 Thread Andrew Lamb (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11421. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9355 [https://githu

[jira] [Resolved] (ARROW-11442) [Rust] Expose the logic used to interpret date/times

2021-02-02 Thread Andrew Lamb (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11442. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9378 [https://githu

[jira] [Updated] (ARROW-11427) [Python] Windows Server 2012 w/ Xeon Platinum 8171M crashes after upgrading to pyarrow 3.0

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-11427: Component/s: C++

[jira] [Updated] (ARROW-11427) [Python] Windows Server 2012 w/ Xeon Platinum 8171M crashes after upgrading to pyarrow 3.0

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-11427: Priority: Major (was: Blocker)

[jira] [Updated] (ARROW-11427) [C++] Arrow uses AVX512 instructions even when not supported by the OS

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-11427: Summary: [C++] Arrow uses AVX512 instructions even when not supported by the OS (was: [Pyth
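The renamed summary of ARROW-11427 separates two conditions: the CPU advertising AVX-512, and the operating system actually enabling the extended register state. The sketch below is a hedged illustration of how such a check is commonly written on x86 with GCC or Clang, not Arrow's own detection code. It reads CPUID for the OSXSAVE and AVX512F bits, then reads XCR0 via XGETBV to confirm that the OS saves the SSE/AVX state (bits 1 and 2) and the opmask/ZMM state (bits 5 to 7). The function name `OsSupportsAvx512F` is hypothetical, `<cpuid.h>` is assumed to be available (GCC 7+ or a recent Clang), and on non-x86 targets the function simply returns false.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Hypothetical helper: true only if the CPU reports AVX512F *and* the OS has
// enabled saving of the AVX-512 register state (checked through XCR0).
bool OsSupportsAvx512F() {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  // Leaf 1: ECX bit 27 (OSXSAVE) means the OS allows XGETBV to be used.
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27))) {
    return false;
  }
  // Leaf 7, subleaf 0: EBX bit 16 is AVX512F (a CPU capability only).
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
      !(ebx & (1u << 16))) {
    return false;
  }
  // XGETBV with ECX = 0 reads XCR0, the OS-controlled state-enable mask.
  uint32_t xcr0_lo = 0, xcr0_hi = 0;
  __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
  (void)xcr0_hi;  // the relevant bits all live in the low 32 bits
  // Bits 1-2: SSE/AVX state; bits 5-7: opmask, ZMM_Hi256, Hi16_ZMM state.
  const uint32_t kAvx512StateMask = 0xE6;
  return (xcr0_lo & kAvx512StateMask) == kAvx512StateMask;
#else
  return false;  // not an x86 target
#endif
}
```

A runtime dispatcher built on this idea would pick AVX-512 kernels only when the function returns true, so a CPU that supports AVX-512 on an OS that does not enable it would fall back to other code paths.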

[jira] [Updated] (ARROW-11427) [Python] Windows Server 2012 w/ Xeon Platinum 8171M crashes after upgrading to pyarrow 3.0

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-11427: Fix Version/s: 4.0.0

[jira] [Updated] (ARROW-11473) Needs a handling for missing columns while reading parquet file

2021-02-02 Thread jason khadka (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jason khadka updated ARROW-11473: Description: Currently there is no way to handle the error raised by missing columns in parquet

[jira] [Created] (ARROW-11473) Needs a handling for missing columns while reading parquet file

2021-02-02 Thread jason khadka (Jira)
jason khadka created ARROW-11473: Summary: Needs a handling for missing columns while reading parquet file Key: ARROW-11473 URL: https://issues.apache.org/jira/browse/ARROW-11473 Project: Apache Arro

[jira] [Comment Edited] (ARROW-11469) Performance degradation wide dataframes

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277261#comment-17277261 ] Joris Van den Bossche edited comment on ARROW-11469 at 2/2/21, 4:27 PM:

[jira] [Updated] (ARROW-11469) Performance degradation wide dataframes

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-11469: Attachment: profile_wide300.svg

[jira] [Commented] (ARROW-11469) Performance degradation wide dataframes

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277261#comment-17277261 ] Joris Van den Bossche commented on ARROW-11469: [~Axelg1] Thanks for the

[jira] [Resolved] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-11463. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9394 [https:/

[jira] [Resolved] (ARROW-11462) [Developer] Remove needless quote from the default DOCKER_VOLUME_PREFIX

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-11462. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9391 [https:/

[jira] [Commented] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277239#comment-17277239 ] Joris Van den Bossche commented on ARROW-11456: bq. If you still need co

[jira] [Comment Edited] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277234#comment-17277234 ] Pac A. He edited comment on ARROW-11456 at 2/2/21, 4:12 PM:

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: Description: When reading a large parquet file, I have this error: {noformat} df: Final = pd.re

[jira] [Updated] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pac A. He updated ARROW-11456: Description: When reading a large parquet file, I have this error: {noformat} df: Final = pd.re

[jira] [Commented] (ARROW-11456) [Python] Parquet reader cannot read large strings

2021-02-02 Thread Pac A. He (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277234#comment-17277234 ] Pac A. He commented on ARROW-11456: For what it's worth, {{fastparquet}} v0.5.0 had n

[jira] [Commented] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277230#comment-17277230 ] Antoine Pitrou commented on ARROW-11463: [~lausen] I'm not sure your question ha

[jira] [Updated] (ARROW-11469) Performance degradation wide dataframes

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-11469: Description: I noticed a relatively big performance degradation in version 1.0

[jira] [Comment Edited] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Leonard Lausen (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277170#comment-17277170 ] Leonard Lausen edited comment on ARROW-11463 at 2/2/21, 2:52 PM:

[jira] [Comment Edited] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Leonard Lausen (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277170#comment-17277170 ] Leonard Lausen edited comment on ARROW-11463 at 2/2/21, 2:51 PM:

[jira] [Commented] (ARROW-11463) Allow configuration of IpcWriterOptions 64Bit from PyArrow

2021-02-02 Thread Leonard Lausen (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277170#comment-17277170 ] Leonard Lausen commented on ARROW-11463: Thank you Tao! How can we specify the

[jira] [Commented] (ARROW-11400) [Python] Pickled ParquetFileFragment has invalid partition_expresion with dictionary type in pyarrow 2.0

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277142#comment-17277142 ] Joris Van den Bossche commented on ARROW-11400: Marking it as 3.0.0, as i

[jira] [Resolved] (ARROW-11400) [Python] Pickled ParquetFileFragment has invalid partition_expresion with dictionary type in pyarrow 2.0

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-11400. Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Updated] (ARROW-11472) [Python][CI] Kartothek integrations build is failing with numpy 1.20

2021-02-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11472: Labels: pull-request-available (was: )

[jira] [Assigned] (ARROW-11472) [Python][CI] Kartothek integrations build is failing with numpy 1.20

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-11472: Assignee: Joris Van den Bossche

[jira] [Commented] (ARROW-11472) [Python][CI] Kartothek integrations build is failing with numpy 1.20

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277113#comment-17277113 ] Joris Van den Bossche commented on ARROW-11472: Looking into this, and th

[jira] [Updated] (ARROW-11472) [Python][CI] Kartothek integrations build is failing with numpy 1.20

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-11472: Description: See eg https://github.com/ursacomputing/crossbow/runs/1804464537,

[jira] [Created] (ARROW-11472) [Python][CI] Kartothek integrations build is failing with numpy 1.20

2021-02-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-11472: Summary: [Python][CI] Kartothek integrations build is failing with numpy 1.20 Key: ARROW-11472 URL: https://issues.apache.org/jira/browse/ARROW-11472

[jira] [Assigned] (ARROW-7288) [C++][R] read_parquet() freezes on Windows with Japanese locale

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-7288: Assignee: Kouhei Sutou (was: Neal Richardson)

[jira] [Resolved] (ARROW-7288) [C++][R] read_parquet() freezes on Windows with Japanese locale

2021-02-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-7288. Resolution: Fixed Issue resolved by pull request 9367 [https://github.com/apache/arrow/pull/9

[jira] [Commented] (ARROW-11410) [Rust][Parquet] Implement returning dictionary arrays from parquet reader

2021-02-02 Thread Andrew Lamb (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277063#comment-17277063 ] Andrew Lamb commented on ARROW-11410: [~yordan-pavlov] I think this would be amazin

[jira] [Updated] (ARROW-11464) [Python] pyarrow.parquet.read_pandas doesn't conform to its docs

2021-02-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-11464: Fix Version/s: 4.0.0

[jira] [Assigned] (ARROW-11471) [Rust] DoubleEndedIterator for BitChunks

2021-02-02 Thread Jörn Horstmann (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jörn Horstmann reassigned ARROW-11471: Assignee: Jörn Horstmann

[jira] [Created] (ARROW-11471) [Rust] DoubleEndedIterator for BitChunks

2021-02-02 Thread Jörn Horstmann (Jira)
Jörn Horstmann created ARROW-11471: Summary: [Rust] DoubleEndedIterator for BitChunks Key: ARROW-11471 URL: https://issues.apache.org/jira/browse/ARROW-11471 Project: Apache Arrow Issue Typ

[jira] [Updated] (ARROW-11470) [C++] Overflow occurs on integer multiplications in Compute(Row|Column)MajorStrides

2021-02-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11470: Labels: pull-request-available (was: )

[jira] [Created] (ARROW-11470) [C++] Overflow occurs on integer multiplications in Compute(Row|Column)MajorStrides

2021-02-02 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-11470: Summary: [C++] Overflow occurs on integer multiplications in Compute(Row|Column)MajorStrides Key: ARROW-11470 URL: https://issues.apache.org/jira/browse/ARROW-11470 P

[jira] [Created] (ARROW-11469) Performance degradation wide dataframes

2021-02-02 Thread Axel G (Jira)
Axel G created ARROW-11469: Summary: Performance degradation wide dataframes Key: ARROW-11469 URL: https://issues.apache.org/jira/browse/ARROW-11469 Project: Apache Arrow Issue Type: Bug